This tutorial is a part of the Learn everything about Javascript in one course.
Pattern matching for string
A pattern is simply any sequence of characters you create, for example:
- `hello`: a simple literal pattern
- `sunny*.+`: a pattern that matches `sunn`, then zero or more `y`, then one or more of any character. In practice it matches `sunny` and whatever comes after it.
Pattern matching is a powerful way to manipulate strings (process string - IPO). In this process, a pattern is given and searched for in the string. If one or more parts of the string match the pattern, a follow-up action is carried out. The follow-up action can be:
- return `true` to indicate that the string contains the given pattern.
- replace the found part with a new part.
- return the count of matches.
- ...
// Pattern matching for string
// in this example
// the string to search for the pattern is: `It's always sunny in Sunnyvale`
// the pattern is `sunny*.+`
// `sunny*.+` matches `sunny` and whatever comes after it
// the `replace` method replaces the found part with the given replacement
// '***' is the replacement
let sentence = `It's always sunny in Sunnyvale`
console.log(sentence.replace(/sunny*.+/, '***')) // It's always ***
console.log(sentence.replace('always', '_never_')) // It's _never_ sunny in Sunnyvale
We will learn more about the built-in `replace` method of the String data structure later in this tutorial.
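
The follow-up actions listed above can be sketched as follows. This is a minimal illustration rather than the only way to do it: the variable names are made up for this example, `test` and `match` are built-in methods of regular expressions and strings, and the `g` and `i` modifiers are explained later in this tutorial.

```javascript
// Hypothetical variable names, used only for this sketch
const text = `It's always sunny in Sunnyvale`

// 1. Return true if the string contains the pattern
const containsSunny = /sunny/.test(text)

// 2. Replace the found part with a new part
const replaced = text.replace(/sunny/, 'rainy')

// 3. Return the count of matches
// `match` with the `g` modifier returns an array of all matches, or null if there are none
const matches = text.match(/sunny/gi)
const count = matches === null ? 0 : matches.length

console.log(containsSunny) // true
console.log(replaced) // It's always rainy in Sunnyvale
console.log(count) // 2
```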
Powerful pattern matching with Regular Expression
Regular Expression is a set of rules that makes pattern matching more powerful. The `*.+` in the example above is part of these rules. The whole sequence `/sunny*.+/` is a regular expression.
Learning Regular Expression means learning to apply these rules to create a pattern and use it to process strings.
Syntax: `/pattern/modifiers` or `new RegExp(pattern, modifiers)`
Creating a regular expression in Javascript
There are 2 ways to create a regular expression in Javascript:
- A literal: the pattern enclosed between `/` characters, for example: `/sunny*.+/g`
- The constructor of the global `RegExp` object, which takes the pattern as a string, for example: `new RegExp('sunny*.+', 'g')`
// Creating a pattern (a regular expression object)
// two ways of creation, identical results
// 'pattern' is `sunny*.+`
// 'modifiers' is 'g'
// regex1 and regex2 are regular expressions in Javascript
let regex1 = /sunny*.+/g
let regex2 = new RegExp('sunny*.+', 'g') // we will talk more about this syntax later
console.log('regex1:', regex1) // regex1: /sunny*.+/g
console.log('regex2:', regex2) // regex2: /sunny*.+/g
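
One practical difference between the two forms, shown here as a small sketch: the constructor takes the pattern as a string, so a backslash in the pattern has to be written twice, and the pattern can be built from a variable. The names `digitLiteral`, `digitConstructed`, and `keyword` are made up for this illustration, and `\d` (any digit) is explained in the character classes section.

```javascript
// The literal form writes the backslash once
const digitLiteral = /\d/g
// The constructor form needs '\\d', because '\d' inside a string is just 'd'
const digitConstructed = new RegExp('\\d', 'g')

console.log(digitLiteral.source === digitConstructed.source) // true
console.log(digitLiteral.flags === digitConstructed.flags) // true

// The constructor can build a pattern from a value that is only known at runtime
const keyword = 'sunny' // hypothetical input
const keywordRegex = new RegExp(keyword, 'gi')
const masked = `It's always sunny in Sunnyvale`.replace(keywordRegex, '***')
console.log(masked) // It's always *** in ***vale
```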
Modifiers in Regular Expression
Modifiers are used to specify whether the match is case-insensitive, global, or multiline.
Modifier | Enhancement |
---|---|
`g` | global match (find all matches rather than stopping after the first match) |
`i` | case-insensitive match |
`m` | multiline match (`^` and `$` also match at the beginning and end of each line) |
// Modifiers in Regular Expression
// 'pattern' is `sunny`
// 'modifiers' are 'gi', both `global` and `case-insensitive` are applied
console.log(sentence.replace(/sunny/gi, '***')) // It's always *** in ***vale
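
The `m` modifier is not used in the example above, so here is a small sketch of its effect. It relies on `^` (beginning of the string), which is covered in the position specifiers section. The string `forecast` is made up for this illustration, and `JSON.stringify` is used only to make the line break visible in the output.

```javascript
// Hypothetical two-line string
const forecast = 'sunny morning\nsunny evening'

// Without `m`, `^` only matches at the very beginning of the string
const firstLineOnly = forecast.replace(/^sunny/g, '***')
// With `m`, `^` also matches right after each line break
const everyLine = forecast.replace(/^sunny/gm, '***')

console.log(JSON.stringify(firstLineOnly)) // "*** morning\nsunny evening"
console.log(JSON.stringify(everyLine)) // "*** morning\n*** evening"
```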
Create pattern with literal character
These are characters that match exactly one character.
Character | Matches |
---|---|
Alphanumeric | itself |
`\t` | tab |
`\v` | vertical tab |
`\n` | newline |
`\r` | carriage return |
`\f` | form feed |
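
A short sketch of literal characters in use. The strings are made up for this illustration, and `JSON.stringify` is used only to make tabs and newlines visible in the output.

```javascript
// Alphanumeric characters match themselves
const masked = 'hello'.replace(/ell/, '***')
console.log(masked) // h***o

// Hypothetical tab-separated text spread over two lines
const rows = 'name\tage\ncat\t3'

// `\t` matches a tab (only the first one is replaced, because there is no `g` modifier)
const commaFirst = rows.replace(/\t/, ', ')
console.log(JSON.stringify(commaFirst)) // "name, age\ncat\t3"

// `\n` matches a newline
const oneLine = rows.replace(/\n/, ' | ')
console.log(JSON.stringify(oneLine)) // "name\tage | cat\t3"
```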
Create pattern with character classes
These are characters that can match any one of multiple characters in the same class. Examples of classes: digits, letters, ...
Character | Matches |
---|---|
`.` | any single character except a line break |
`\d` | any digit `[0-9]` |
`\D` | any NON-digit character |
`\w` | any alphanumeric character or underscore `_` |
`\W` | any character that is NOT alphanumeric or underscore `_` |
`\s` | a single white space character (space, tab, form feed, line feed, ...) |
`\S` | a single character other than white space |
// Create pattern with character classes
// `.` can match any character, we can use it as many times as needed
console.log(sentence.replace(/sunny./, '***')) // It's always ***in Sunnyvale
console.log(sentence.replace(/sunny.../, '***')) // It's always *** Sunnyvale
// `\d` matches any single digit
console.log('1917 is a movie about war'.replace(/\d/, '***')) // ***917 is a movie about war
// `\D` matches any single non-digit
console.log('1917 is a movie about war'.replace(/\D/, '***')) // 1917***is a movie about war
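
The remaining classes from the table (`\w`, `\W`, `\s`, `\S`) can be sketched the same way. The string `todo` is made up for this illustration, and the `g` modifier is used so that every match is replaced.

```javascript
// Hypothetical sample string
const todo = 'to do: 42%'

// `\w` matches letters, digits and underscore
const wordChars = todo.replace(/\w/g, '*')
console.log(wordChars) // ** **: **%

// `\W` matches everything that `\w` does not
const nonWordChars = todo.replace(/\W/g, '*')
console.log(nonWordChars) // to*do**42*

// `\s` matches white space
const spaces = todo.replace(/\s/g, '_')
console.log(spaces) // to_do:_42%

// `\S` matches everything that is not white space
const nonSpaces = todo.replace(/\S/g, '*')
console.log(nonSpaces) // ** *** ***
```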
Create pattern with position specifiers
Position specifiers state where in the string the pattern must match.
Position specifier | Matches |
---|---|
`\b` | beginning or end of a word |
`\B` | not at the beginning or end of a word |
`^` | beginning of the string |
`$` | end of the string |
`(?=ok)` | lookahead assertion: requires that the following characters match the pattern `ok`, but does not include those characters in the match |
`(?!ok)` | negative lookahead assertion: requires that the following characters do not match the pattern `ok` |
// Create pattern with position specifiers
let str = 'hello, look at you!'
// '\b' match at word starts
console.log(str.replace(/\blo/, '**')) // hello, **ok at you!
// '\b' match at the end of word
console.log(str.replace(/lo\b/, '**')) // hel**, look at you!
// `\B` NOT match at the beginning of the word
console.log(str.replace(/\Blo/, '**')) // hel**, look at you!
// `\B` NOT match at the end of the word
console.log(str.replace(/lo\B/, '**')) // hello, **ok at you!
// `^` beginning of the string
console.log(str.replace(/^lo/, '**')) // hello, look at you!, replace nothing because string does not start with 'lo'
console.log(str.replace(/^he/, '**')) // **llo, look at you!, replaced with '**' because string starts with 'he'
// `$` end of the string
console.log(str.replace(/lo$/, '**')) // hello, look at you!, replace nothing because string does not end with 'lo'
console.log(str.replace(/u!$/, '**')) // hello, look at yo**, replaced with '**' because string ends with 'u!'
// `(?=ok)` match `lo`, following character must match `ok`
console.log(str.replace(/lo(?=ok)/, '**')) // hello, **ok at you!
// `(?!ok)` match `lo`, where `lo` must NOT be followed by `ok`
console.log(str.replace(/lo(?!ok)/, '**')) // hel**, look at you!
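
Position specifiers are often combined. The sketch below shows two common combinations: `\b` on both sides to match a whole word, and `^` together with `$` to check the whole string. The string `weather` and the variable names are made up for this illustration, and `test` is the built-in method that returns `true` or `false`.

```javascript
// Hypothetical sample string
const weather = 'sun, sunny, sunshine'

// Without position specifiers, `sun` is found inside longer words too
const anywhere = weather.replace(/sun/g, '*')
console.log(anywhere) // *, *ny, *shine

// `\b` on both sides: only the whole word `sun` matches
const wholeWord = weather.replace(/\bsun\b/g, '*')
console.log(wholeWord) // *, sunny, sunshine

// `^` and `$` together: the whole string must be exactly `sun`
const exactSun = /^sun$/.test('sun')
const exactSunny = /^sun$/.test('sunny')
console.log(exactSun) // true
console.log(exactSunny) // false
```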
Create pattern to match a range of characters with brackets
Syntax | Matches |
---|---|
`[abc]` | any character between the brackets |
`[^abc]` | any character not between the brackets |
`[a-z]` | any character from a to z |
`[A-Z]` | any character from A to Z |
`[0-9]` | any digit from 0 to 9 |
`[^0-9]` | any character not from 0 to 9 |
`[a-z0-9]` | any character from a to z or 0 to 9 |
`(x\|y)` | character x or y (alternation) |
// Create pattern to match a range of characters with brackets
// [abc] matches any character a, b or c
console.log(str.replace(/[abc]/, '*')) // hello, look *t you!
// (look|stare) matches 'look' or 'stare'
console.log(str.replace(/(look|stare)/, '****')) // hello, **** at you!
console.log('stare at you!'.replace(/(look|stare)/, '****')) // **** at you!
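
The ranges and the negated form from the table can be sketched like this. The string `plate` is made up for this illustration, and the `g` modifier is used so that every match is replaced.

```javascript
// Hypothetical sample string
const plate = 'AB-1234-cd'

// [0-9] matches any digit
const digits = plate.replace(/[0-9]/g, '#')
console.log(digits) // AB-####-cd

// [^0-9] matches any character that is not a digit
const nonDigits = plate.replace(/[^0-9]/g, '#')
console.log(nonDigits) // ###1234###

// [a-z] and [A-Z] are case-sensitive ranges
const lower = plate.replace(/[a-z]/g, '#')
const upper = plate.replace(/[A-Z]/g, '#')
console.log(lower) // AB-1234-##
console.log(upper) // ##-1234-cd

// Ranges can be combined inside one pair of brackets
const lowerOrDigit = plate.replace(/[a-z0-9]/g, '#')
console.log(lowerOrDigit) // AB-####-##
```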
Create pattern with quantifiers
Quantifiers specify how many times a pattern can be matched. Note that there is no space inside the braces.
Syntax | Matches |
---|---|
`n{3}` | character n exactly 3 times |
`n{3,}` | character n at least 3 times |
`n{3,5}` | character n at least 3 times, no more than 5 times |
`n?` | character n 0 or 1 time (equal to `n{0,1}`) |
`n+` | character n at least 1 time (equal to `n{1,}`) |
`n*` | character n 0 or more times (equal to `n{0,}`) |
// Create pattern with quantifiers
const loud = 'Annnnnnddd the winner is...'
console.log(loud.replace(/n{3}/, '***')) // A***nnnddd the winner is...
console.log(loud.replace(/n{3,}/, '***')) // A***ddd the winner is...
console.log(loud.replace(/n{3,5}/, '***')) // A***nddd the winner is...
console.log(loud.replace(/h?/, '*')) // *Annnnnnddd the winner is...
console.log(loud.replace(/n+/, '*')) // A*ddd the winner is...
console.log(loud.replace(/An*/, '*')) // *ddd the winner is...
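
Quantifiers also apply to character classes, not only to a single literal character. The sketch below shows a few typical uses. All strings and variable names are made up for this illustration.

```javascript
// `?` makes the preceding character optional
const spelling = 'color colour'.replace(/colou?r/g, '*')
console.log(spelling) // * *

// `+` applied to a class: one or more digits in a row count as one match
const phone = 'tel: 555 1234'.replace(/\d+/g, '#')
console.log(phone) // tel: # #

// `{4}` applied to a class: exactly four digits
const year = '2024-01-15'.replace(/\d{4}/, 'YYYY')
console.log(year) // YYYY-01-15

// `{2,3}`: runs of two or three digits match, a single digit does not
const runs = 'a1b22c333'.replace(/\d{2,3}/g, '#')
console.log(runs) // a1b#c#
```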
Create pattern with alternation, grouping, and references
Syntax | Meaning |
---|---|
`\|` | Alternation. Match either the subexpression to the left or the subexpression to the right. The alternatives are tried from left to right. |
`(...)` | Grouping. Group items into a single unit that can be used with `*`, `+`, `?`, `\|`, and so on. Also remember the characters that match this group for use with later references. |
`(?:...)` | Grouping only. Group items into a single unit, but do not remember the characters that match this group. |
`\n` | Here `n` is a group number such as `1` or `2`. Match the same characters that were matched when group number n was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with `(?:` are not numbered. |
// Create pattern with alternation, grouping, and references
// alternation
const look = 'I am looking at you.'
const see = 'I see you.'
const watch = 'Someone is watching.'
console.log(look.replace(/(look|see|watch)/, '***')) // I am ***ing at you.
console.log(see.replace(/(look|see|watch)/, '***')) // I *** you.
console.log(watch.replace(/(look|see|watch)/, '***')) // Someone is ***ing.
// grouping and references
const js = `javascript is funjavascriptfun, isn't it?`
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)/, '***')) // ***javascriptfun, isn't it?
// each '()' creates a group
// `\number` reference to that group, in left-to-right order
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)\1/, '***')) // ***fun, isn't it?
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)\1\3/, '***')) // ***, isn't it?
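
The example above does not use `(?:...)`, so here is a small sketch of it, together with a typical use of a reference: finding a word that is repeated by mistake. All strings and variable names are made up for this illustration. `match` without the `g` modifier returns the whole match followed by one entry per remembered group.

```javascript
// `(?:ha)+` groups `ha` so that `+` applies to both characters, without remembering it
const laugh = 'hahaha!'.replace(/(?:ha)+/, '***')
console.log(laugh) // ***!

// A remembered group adds one entry to the `match` result, a `(?:...)` group does not
const withCapture = 'hahaha!'.match(/(ha)+/).length
const withoutCapture = 'hahaha!'.match(/(?:ha)+/).length
console.log(withCapture) // 2
console.log(withoutCapture) // 1

// `(?:so-)?` is not numbered, so `\1` refers to `(fun)`
const numbered = 'so-fun-fun'.replace(/(?:so-)?(fun)-\1/, '***')
console.log(numbered) // ***

// `\1` must repeat exactly what group 1 matched: this finds a doubled word
const doubled = 'this is is a test'.replace(/\b(\w+)\s\1\b/, '***')
console.log(doubled) // this *** a test
```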
Create pattern with reserved characters
As we've seen above, the following characters have a special meaning in regular expression syntax: `^`, `$`, `.`, `*`, `+`, `?`, `=`, `!`, `:`, `|`, `\`, `/`, `(`, `)`, `[`, `]`, `{`, `}`.
However, it's quite common that we want to match these characters themselves. To do so, we simply put a backslash `\` character before each of them.
// Create pattern with reserved characters
const str2 = 'I am a ^spe(ial $tring.'
console.log(str2.replace(/\^sp/, '***')) // I am a ***e(ial $tring.
console.log(str2.replace(/\(ial/, '****')) // I am a ^spe**** $tring.
console.log(str2.replace(/\$tr/, '***')) // I am a ^spe(ial ***ing.
console.log(str2.replace(/ing\./, '****')) // I am a ^spe(ial $tr****
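
When the text to search for is only known at runtime, the backslashes have to be added by code. Below is a minimal sketch of such a helper. The function name `escapeForRegExp` is made up for this illustration and is not a built-in. It leaves out `=`, `!`, `:` and `/`, on the assumption that the result is passed to `new RegExp`, where those characters are only special inside constructs such as `(?=...)`.

```javascript
// Hypothetical helper: puts a backslash before each special character,
// so that the text can be used as a literal pattern with `new RegExp`
function escapeForRegExp(text) {
  // `$&` in the replacement stands for the matched character itself
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const escaped = escapeForRegExp('1+1=2')
console.log(escaped) // 1\+1=2

// Without escaping, `^spe(ial` would not be a valid pattern because of the unclosed `(`
const special = 'I am a ^spe(ial $tring.'
const safeRegex = new RegExp(escapeForRegExp('^spe(ial'))
const result = special.replace(safeRegex, '***')
console.log(result) // I am a *** $tring.
```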
Pattern matching examples
Pattern | Possible matches |
---|---|
`/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/` | `javascript is fun`, `Javascript is fun`, `JavaScript is fun`, `Java is funny`, `Java is funny_haha` |
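
The possible matches above can be checked with a short sketch. The variable names and the non-matching string `java is boring` are made up for this illustration.

```javascript
const pattern = /([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/

// The five strings from the table, plus one hypothetical string that should not match
const candidates = [
  'javascript is fun',
  'Javascript is fun',
  'JavaScript is fun',
  'Java is funny',
  'Java is funny_haha',
  'java is boring'
]

// `test` returns true when the pattern is found in the string
const results = candidates.map((candidate) => pattern.test(candidate))
console.log(results) // [ true, true, true, true, true, false ]

// `match` returns the whole match followed by the content of each group
const groups = 'JavaScript is funny'.match(pattern)
console.log(groups[0]) // JavaScript is funny
console.log(groups[1]) // JavaScript
console.log(groups[2]) // Script
console.log(groups[3]) // funny
```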
We have learnt how to build a pattern. In the next lecture, we will learn to run these patterns and process input strings with the built-in String and Regular Expression (Regex) methods.
Summary
- Regular Expression is about pattern matching. It is very powerful.
- A lot of practice and frequent use of a reference will help to master this skill.