Pattern matching and Regular Expression in Javascript

This tutorial is a part of the Learn everything about Javascript in one course.

Pattern matching for string

A pattern is simply any sequence of characters you create, for example:

  • hello: simple literal pattern
  • sunny*.+: a pattern that matches sunn, then zero or more y characters, then at least one more character of any kind (in practice: sunny and whatever comes after it).

Pattern matching is a powerful way to manipulate strings (string processing - IPO). In this process, a pattern is given and the string is searched for it. If one or more parts of the string match the pattern, a follow-up action is performed. The follow-up action can be:

  • return true if the string contains the given pattern.
  • replace the found part with a new part.
  • return the count of matches.
  • ...
pattern.js
// Pattern matching for string
// in this example
// the string to search is: `It's always sunny in Sunnyvale`
// the pattern is `sunny*.+`
// `sunny*.+` matches `sunn`, then zero or more `y`, then one or more of any character
// here that is `sunny` and whatever comes after it
// the `replace` method replaces the found part with the given replacement
// '***' is the replacement
let sentence = `It's always sunny in Sunnyvale`
console.log(sentence.replace(/sunny*.+/, '***')) // It's always ***
console.log(sentence.replace('always', '_never_')) // It's _never_ sunny in Sunnyvale
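
The other follow-up actions from the list above can be sketched in the same way. This is only a minimal illustration: it uses the built-in `test` and `match` methods and the `g` and `i` modifiers, all of which are explained later, and the variable names are made up for this example.

```javascript
// Follow-up actions once a pattern is found in a string
const text = `It's always sunny in Sunnyvale`

// return true if the string contains the pattern
const hasSunny = /sunny/.test(text)
console.log(hasSunny) // true

// replace the found part with a new part
const replaced = text.replace(/sunny/, 'rainy')
console.log(replaced) // It's always rainy in Sunnyvale

// return the count of matches
// `match` returns null when nothing matches, hence the `|| []` fallback
const count = (text.match(/sunny/gi) || []).length
console.log(count) // 2 ('sunny' and 'Sunny')
```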

We will learn more about the built-in replace method of String later in this tutorial.

Powerful pattern matching with Regular Expression

A regular expression is a pattern written with a set of rules that make pattern matching more powerful. The characters *, . and + in the example above are part of these rules. The whole sequence /sunny*.+/ is a regular expression.

Learning regular expressions means learning to apply these rules to create patterns and use them to process strings.

Syntax: /pattern/modifiers or new RegExp(pattern, modifiers)

Creating a regular expression in Javascript

There are 2 ways to create a regular expression in Javascript:

  1. A regular expression literal: the pattern enclosed between slashes /, example: /sunny*.+/g
  2. The constructor of the global RegExp object, called with the pattern as a string, example: new RegExp('sunny*.+', 'g')
pattern.js
// Creating a pattern (a regular expression object)
// two ways of creating it, equivalent results
// 'pattern' is `sunny*.+`
// 'modifiers' is 'g'
// regex1 and regex2 are both regular expression objects
let regex1 = /sunny*.+/g
let regex2 = new RegExp('sunny*.+', 'g') // we will talk more about this syntax later
console.log('regex1:', regex1) // regex1: /sunny*.+/g
console.log('regex2:', regex2) // regex2: /sunny*.+/g
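
The constructor form is mostly useful when the pattern is only known at run time. The sketch below assumes a hypothetical `word` value standing in for such input. It also shows that a backslash has to be doubled inside a string, using `\d` (any digit, covered later in this tutorial).

```javascript
// Building a pattern from a value known only at run time
const word = 'sunny' // assumption: this could come from user input
const dynamic = new RegExp(word + '.', 'i')
console.log(dynamic.source, dynamic.flags) // sunny. i

const replacedDynamic = 'Sunnyvale'.replace(dynamic, '***')
console.log(replacedDynamic) // ***ale

// Inside a string, a backslash must be doubled to reach the regular expression
const fromLiteral = /\d+/g
const fromString = new RegExp('\\d+', 'g')
const samePattern = fromLiteral.source === fromString.source
console.log(samePattern) // true
```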

Modifiers in Regular Expression

Modifiers (also called flags) specify whether the match is global, case-insensitive, or multiline. They can be combined.

Modifier enhancement
g global match (find all matches rather than stopping after the first match)
i case-insensitive match
m multiline match (^ and $ also match at the beginning and end of each line)
pattern.js
// Modifiers in Regular Expression
// 'pattern' is `sunny`
// 'modifiers' are 'gi', both `global` and `case-insensitive` are applied
console.log(sentence.replace(/sunny/gi, '***')) // It's always *** in ***vale
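
The effect of each modifier is easier to see when the same pattern is run with and without it. The following is a small sketch with made-up sample strings. It uses `^` (beginning of the string), which is introduced with the position specifiers later in this tutorial.

```javascript
// Comparing results with and without modifiers
const twice = 'sunny sunny'
const firstOnly = twice.replace(/sunny/, '***')
const everyMatch = twice.replace(/sunny/g, '***')
console.log(firstOnly) // *** sunny
console.log(everyMatch) // *** ***

const lines = 'sunny day\nsunny night'
// without 'm', `^` matches only at the very beginning of the string
const withoutM = lines.replace(/^sunny/g, '***')
// with 'm', `^` also matches right after each line break
const withM = lines.replace(/^sunny/gm, '***')
console.log(JSON.stringify(withoutM)) // "*** day\nsunny night"
console.log(JSON.stringify(withM)) // "*** day\n*** night"
```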

Create pattern with literal character

Each of these matches exactly one specific character.

Character matches
Alphanumeric itself
\t tab
\v vertical tab
\n newline
\r carriage return
\f form feed
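
As a short illustration of the table above, the sketch below matches plain letters, a tab, and a newline literally. The sample strings are invented for this example.

```javascript
// Create pattern with literal characters
// alphanumeric characters match themselves
const literal = 'hello world'.replace(/world/, '***')
console.log(literal) // hello ***

// `\t` matches a tab character
const tabbed = 'name\tage'.replace(/\t/, ', ')
console.log(tabbed) // name, age

// `\n` matches a newline character
const joined = 'line one\nline two'.replace(/\n/, ' | ')
console.log(joined) // line one | line two
```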

Create pattern with character classes

These characters match any single character that belongs to a class. Examples of classes: digits, letters, ...

Character matches
. any single character except a line break
\d any digit [0-9]
\D any NON-digit character
\w any alphanumeric character or underscore _
\W any character that is NOT alphanumeric or underscore _
\s a single white space character (space, tab, form feed, line feed, ...)
\S a single character other than white space
pattern.js
// Create pattern with character classes
// `.` matches any single character, we can use it as many times as needed
console.log(sentence.replace(/sunny./, '***')) // It's always ***in Sunnyvale
console.log(sentence.replace(/sunny.../, '***')) // It's always *** Sunnyvale

// `\d` matches any single digit
console.log('1917 is a movie about war'.replace(/\d/, '***')) // ***917 is a movie about war

// `\D` matches any single non-digit
console.log('1917 is a movie about war'.replace(/\D/, '***')) // 1917***is a movie about war
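
The remaining classes from the table, `\w`, `\W`, `\s` and `\S`, behave in the same way. Here is a minimal sketch. The `userName` value is a made-up sample, and the `g` modifier is added where every occurrence should be replaced.

```javascript
// More character classes
const movie = '1917 is a movie about war'

// `\w` matches the first alphanumeric or underscore character
const firstWordChar = movie.replace(/\w/, '*')
console.log(firstWordChar) // *917 is a movie about war

// `\s` with 'g' matches every white space character
const underscored = movie.replace(/\s/g, '_')
console.log(underscored) // 1917_is_a_movie_about_war

// `\S` matches the first character that is not white space
const firstVisible = '  padded'.replace(/\S/, '*')
console.log(firstVisible) // two spaces, then *added

// `\W` with 'g' matches everything except alphanumerics and underscore
const userName = 'user_name-01' // hypothetical sample value
const cleaned = userName.replace(/\W/g, '')
console.log(cleaned) // user_name01
```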

Create pattern with position specifiers

Position specifiers tell the pattern where in the string a match is allowed to occur.

Position specifier matches
\b beginning or end of a word
\B not at the beginning or end of the word
^ beginning of the string
$ the end of the string
(?=ok) lookahead assertion: require that the following characters match the pattern ok, but do not include those characters in the match.
(?!ok) negative lookahead assertion: require that the following characters do not match the pattern ok
pattern.js
// Create pattern with position specifiers
let str = 'hello, look at you!'

// '\b' match at word starts
console.log(str.replace(/\blo/, '**')) // hello, **ok at you!

// '\b' match at the end of word
console.log(str.replace(/lo\b/, '**')) // hel**, look at you!

// `\B` NOT match at the beginning of the word
console.log(str.replace(/\Blo/, '**')) // hel**, look at you!

// `\B` NOT match at the end of the word
console.log(str.replace(/lo\B/, '**')) // hello, **ok at you!

// `^` beginning of the string
console.log(str.replace(/^lo/, '**')) // hello, look at you!, replace nothing because string does not start with 'lo'
console.log(str.replace(/^he/, '**')) // **llo, look at you!, replaced with '**' because string starts with 'he'

// `$` end of the string
console.log(str.replace(/lo$/, '**')) // hello, look at you!, replace nothing because string does not end with 'lo'
console.log(str.replace(/u!$/, '**')) // hello, look at yo**, replaced with '**' because string ends with 'u!'

// `(?=ok)` match `lo`, following character must match `ok`
console.log(str.replace(/lo(?=ok)/, '**')) // hello, **ok at you!

// `(?!ok)` match `lo`, where `lo` must NOT be followed by `ok`
console.log(str.replace(/lo(?!ok)/, '**')) // hel**, look at you!
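
Position specifiers are often combined. The sketch below shows two common combinations, under the assumption that we want to match a whole word and to check an entire string. The sample strings and variable names are invented, and `test` is the built-in method that returns true or false.

```javascript
// Combining position specifiers
const phrase = 'This is his island'

// `\bis\b` matches 'is' only when it stands alone as a whole word
const wholeWord = phrase.replace(/\bis\b/g, '**')
console.log(wholeWord) // This ** his island

// `^` and `$` together require the pattern to cover the entire string
const isFourDigits = /^\d\d\d\d$/
const exact = isFourDigits.test('1917')
const trailing = isFourDigits.test('1917 is a movie')
const leading = isFourDigits.test('in 1917')
console.log(exact, trailing, leading) // true false false
```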

Create pattern to match a range of characters with brackets

Syntax matches
[abc] any one of the characters between the brackets (a, b or c)
[^abc] any one character that is NOT between the brackets
[a-z] any character from a to z
[A-Z] any character from A to Z
[0-9] any digit from 0 to 9
[^0-9] any character not from 0 to 9
[a-z0-9] any character from a to z or 0 to 9
(x|y) character x or y (alternation)
pattern.js
// Create pattern to match a range of characters with brackets
// [abc] matches any character a, b or c
console.log(str.replace(/[abc]/, '*')) // hello, look *t you!

// (look|stare) matches 'look' or 'stare'
console.log(str.replace(/(look|stare)/, '****')) // hello, **** at you!
console.log('stare at you!'.replace(/(look|stare)/, '****')) // **** at you!
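
Ranges and negated brackets from the table can be tried out in the same way. This is a small sketch on a made-up `order` string, with the `g` modifier added where every matching character should be replaced.

```javascript
// Ranges and negation inside brackets
const order = 'Order #A-1024'

// [0-9] with 'g' matches every digit
const maskedDigits = order.replace(/[0-9]/g, '*')
console.log(maskedDigits) // Order #A-****

// [^0-9] with 'g' matches every character that is not a digit
const digitsOnly = order.replace(/[^0-9]/g, '')
console.log(digitsOnly) // 1024

// [A-Z] matches the first uppercase letter
const firstUpper = order.replace(/[A-Z]/, '*')
console.log(firstUpper) // *rder #A-1024

// [a-z0-9] with 'g' matches every lowercase letter or digit
const maskedLowerAndDigits = order.replace(/[a-z0-9]/g, '*')
console.log(maskedLowerAndDigits) // O**** #A-****
```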

Create pattern with quantifiers

Quantifiers specify how many times the preceding item must occur for the pattern to match.

Syntax matches
n{3} character n exactly 3 times
n{3,} character n at least 3 times
n{3,5} character n at least 3 times, no more than 5 times
n? character n 0 or 1 time (equal to n{0,1})
n+ character n at least 1 time (equal to n{1,})
n* character n 0 or more times (equal to n{0,})
pattern.js
// Create pattern with quantifiers
const loud = 'Annnnnnddd the winner is...'
console.log(loud.replace(/n{3}/, '***')) // A***nnnddd the winner is...
console.log(loud.replace(/n{3,}/, '***')) // A***ddd the winner is...
console.log(loud.replace(/n{3,5}/, '***')) // A***nddd the winner is...
console.log(loud.replace(/h?/, '*')) // *Annnnnnddd the winner is...
console.log(loud.replace(/n+/, '*')) // A*ddd the winner is...
console.log(loud.replace(/An*/, '*')) // *ddd the winner is...
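
One detail that is easy to get wrong: the braces of a quantifier must not contain white space. In a regular expression literal without the `u` modifier, a form such as `{3, }` is treated as literal text rather than as a quantifier. The sketch below demonstrates this and adds a typical use of `?`. The sample strings are made up.

```javascript
// Quantifier pitfalls and an optional character
const noisy = 'Annnnnnddd'

// `n{3,}` is a quantifier: three or more 'n'
const quantified = noisy.replace(/n{3,}/, '*')
console.log(quantified) // A*ddd

// with a space inside the braces, the braces are matched literally
const withSpace = noisy.replace(/n{3, }/, '*')
console.log(withSpace) // Annnnnnddd (nothing replaced)
const literalBraces = 'n{3, }'.replace(/n{3, }/, '*')
console.log(literalBraces) // *

// `u?` makes the 'u' optional, so both spellings match
const american = 'color'.replace(/colou?r/, '*')
const british = 'colour'.replace(/colou?r/, '*')
console.log(american, british) // * *
```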

Create pattern with alternation, grouping, and references

Syntax means
| Alternation. Match either the subexpression to the left or the subexpression to the right. Considered left to right.
(...) Grouping. Group items into a single unit that can be used with *, +, ?, |, and so on, and remember the characters that match this group for use with later references.
(?:...) Grouping only. Group items into a single unit, but do not remember the characters that match this group.
\n (where n is a group number, for example \1) Match the same characters that were matched when group number n was first matched. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed with (?: are not numbered.
pattern.js
// Create pattern with alternation, grouping, and references

// alternation
const look = 'I am looking at you.'
const see = 'I see you.'
const watch = 'Someone is watching.'
console.log(look.replace(/(look|see|watch)/, '***')) // I am ***ing at you.
console.log(see.replace(/(look|see|watch)/, '***')) // I *** you.
console.log(watch.replace(/(look|see|watch)/, '***')) // Someone is ***ing.

// grouping and references
const js = `javascript is funjavascriptfun, isn't it?`
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)/, '***')) // ***javascriptfun, isn't it?

// each '()' creates a group
// `\number` refers back to the text matched by that group, groups are numbered left to right
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)\1/, '***')) // ***fun, isn't it?
console.log(js.replace(/([Jj]ava([Ss]cript)?)\sis\s(fun)\1\3/, '***')) // ***, isn't it?
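
To see how groups are numbered, it helps to look at what each group actually captured. The sketch below uses the built-in `match` method, which for a pattern without the `g` modifier returns the whole match followed by one entry per numbered group. The variable names are only for this illustration.

```javascript
// Inspecting group numbers
const phrase = 'javascript is fun'

const groups = phrase.match(/([Jj]ava([Ss]cript)?)\sis\s(fun)/)
console.log(groups[0]) // javascript is fun (the whole match)
console.log(groups[1]) // javascript (group 1, outer parentheses)
console.log(groups[2]) // script (group 2, nested parentheses)
console.log(groups[3]) // fun (group 3)

// `(?:...)` groups without numbering, so `(fun)` becomes group 2 here
const fewer = phrase.match(/([Jj]ava(?:[Ss]cript)?)\sis\s(fun)/)
console.log(fewer.length) // 3 (whole match, group 1, group 2)
console.log(fewer[2]) // fun
```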

Create pattern with reserved characters

As we've seen above, the following characters have special meaning in regular expression syntax: ^, $, ., *, +, ?, =, !, :, |, \, /, (, ), [, ], {, }.

However, it's quite common that we want to match these characters themselves. To do so, we simply put a backslash \ character before each of them.

pattern.js
// Create pattern with reserved characters
const str2 = 'I am a ^spe(ial $tring.'
console.log(str2.replace(/\^sp/, '***')) // I am a ***e(ial $tring.
console.log(str2.replace(/\(ial/, '****')) // I am a ^spe**** $tring.
console.log(str2.replace(/\$tr/, '***')) // I am a ^spe(ial ***ing.
console.log(str2.replace(/ing\./, '****')) // I am a ^spe(ial $tr****
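
When the text to search for comes from a variable, the backslashes have to be added by code. The following is a sketch only: `escapeRegExp` is a hypothetical helper written for this example, not a built-in function. It relies on `$&` in the replacement string, which stands for the matched text.

```javascript
// Hypothetical helper (not a built-in): puts a backslash before every
// character that has a special meaning in a regular expression
function escapeRegExp(text) {
  // `$&` in the replacement inserts the character that was matched
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const price = 'Total: $5.00 (approx.)'
const needle = '$5.00' // assumption: this could come from user input

const escaped = escapeRegExp(needle)
console.log(escaped) // \$5\.00

// escaped: `$` and `.` are matched as ordinary characters
const safeResult = price.replace(new RegExp(escaped), '***')
console.log(safeResult) // Total: *** (approx.)

// not escaped: `$` means "end of the string", so nothing matches
const unsafeResult = price.replace(new RegExp(needle), '***')
console.log(unsafeResult) // Total: $5.00 (approx.)
```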

Pattern matching examples

Pattern Possible matches
/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/ javascript is fun
Javascript is fun
JavaScript is fun
Java is funny
Java is funny_haha
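
The possible matches above can be checked with a short loop. This sketch uses the built-in `test` method. The last two entries are additional made-up strings that should not match.

```javascript
// Checking the pattern against a list of candidate strings
const pattern = /([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/
const candidates = [
  'javascript is fun',
  'Javascript is fun',
  'JavaScript is fun',
  'Java is funny',
  'Java is funny_haha',
  'JAVA is fun', // no match: only the first letter may be uppercase
  'java isfun' // no match: `\s` requires white space before 'fun'
]

const results = candidates.map((candidate) => pattern.test(candidate))
candidates.forEach((candidate, index) => {
  console.log(candidate, '->', results[index])
})
```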

Important learning resources

We have learnt how to build patterns. In the next lecture, we will learn to run these patterns and process input strings with the built-in String and Regular Expression (Regex) methods.

Summary

  1. Regular Expression is about pattern matching. It is very powerful.
  2. A lot of practice and regular use of reference material will help to master this skill.