What is a regular expression?
Regular expressions are patterns used to match character combinations in strings.
In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String.
“Regular expressions - JavaScript | MDN” (MDN Web Docs). Retrieved February 5, 2025.
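As a rough illustration of the methods listed above, the following sketch applies them to a made-up date pattern and input string (both are assumptions for the example, not taken from the source):

```javascript
// Hypothetical pattern: an ISO-like date with three capture groups.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const text = 'Released 2025-02-05, updated 2025-03-01';

// Methods of RegExp
const hasDate = datePattern.test(text);    // is there at least one match?
const firstMatch = datePattern.exec(text); // match object for the first date

// Methods of String
const position = text.search(datePattern); // index of the first match
const allDates = Array.from(
  text.matchAll(/\d{4}-\d{2}-\d{2}/g),     // matchAll() requires flag /g
  (match) => match[0]
);
const redacted = text.replace(datePattern, '<date>'); // replaces the first match only
const parts = 'a1b22c'.split(/\d+/);       // splits at runs of digits

console.log(hasDate, firstMatch[1], position);
console.log(allDates, redacted, parts);
```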
How can you create a regular expression?
The two main ways of creating regular expressions are:
* Literal: /abc/iv
* Constructor: new RegExp('abc', 'iv')
Both regular expressions have the same two parts:
* The pattern abc.
* The flags i and v. Flags configure how the pattern is interpreted. For example, i enables case-insensitive matching and v enables Unicode sets mode.
“Literal vs. constructor” (exploringjs.com). Retrieved February 18, 2025.
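A small sketch of how the two ways of creating a regular expression compare. It uses the flags g and i instead of v so that it also runs on older engines; the helper createWordRegExp() is a hypothetical name introduced only for this example:

```javascript
// The same pattern, once as a literal and once via the constructor.
// In the string argument, each backslash has to be doubled.
const literal = /a\.b/gi;
const constructed = new RegExp('a\\.b', 'gi');

console.log(literal.source === constructed.source); // same pattern
console.log(literal.flags === constructed.flags);   // same flags

// The constructor is handy when the pattern is only known at runtime.
// Assumption: `word` contains no syntax characters that would need escaping.
function createWordRegExp(word) {
  return new RegExp('\\b' + word + '\\b', 'i');
}

console.log(createWordRegExp('cat').test('The Cat sat'));
console.log(createWordRegExp('cat').test('concatenate'));
```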
How can you clone a regular expression?
There are two variants of the constructor RegExp():
* new RegExp(pattern : string, flags = '') [ES3] - A new regular expression is created as specified via pattern. If flags is missing, the empty string '' is used.
* new RegExp(regExp : RegExp, flags = regExp.flags) [ES6] - regExp is cloned. If flags is provided, then it determines the flags of the clone.
The second variant is useful for cloning regular expressions, optionally while modifying them. Flags are immutable and this is the only way of changing them, for example:
function copyAndAddFlags(regExp, flagsToAdd='') {
  // The constructor doesn’t allow duplicate flags;
  // make sure there aren’t any:
  const newFlags = Array.from(
    new Set(regExp.flags + flagsToAdd)
  ).join('');
  return new RegExp(regExp, newFlags);
}
assert.equal(/abc/i.flags, 'i');
assert.equal(copyAndAddFlags(/abc/i, 'g').flags, 'gi');
“Cloning and non-destructively modifying regular expressions” (exploringjs.com). Retrieved February 18, 2025.
What are the syntax characters of a regular expression?
At the top level of a regular expression, the following characters are special and are known as syntax characters. They are escaped by prefixing a backslash (\).
\ ^ $ . * + ? ( ) [ ] { } |
In regular expression literals, we must escape slashes:
> /\//.test('/')
true
In the argument of new RegExp(), we don’t have to escape slashes:
> new RegExp('/').test('/')
true
“Syntax characters” (exploringjs.com). Retrieved February 18, 2025.
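One common use of this rule is matching arbitrary text literally. The helper below is a minimal sketch (escapeForRegExp is a hypothetical name, not an API from the source) that prefixes every syntax character with a backslash before the text is passed to new RegExp():

```javascript
// Prefix each syntax character \ ^ $ . * + ? ( ) [ ] { } | with a backslash.
// In the replacement string, $& stands for the matched character.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

const raw = '1+1=2?';
const escaped = escapeForRegExp(raw);

// Unescaped, + and ? act as quantifiers, so the pattern also matches '11=2'.
console.log(new RegExp(raw).test('11=2'));
// Escaped, the pattern only matches the literal text.
console.log(new RegExp(escaped).test('11=2'));
console.log(new RegExp(escaped).test('1+1=2?'));
```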
When is it illegal to escape a non-syntax character?
Without the flags /u and /v, an escaped non-syntax character at the top level matches itself:
> /^\a$/.test('a')
true
With flag /u or /v, escaping a non-syntax character at the top level is a syntax error:
assert.throws(
() => eval(String.raw`/\a/v`),
{
name: 'SyntaxError',
message: 'Invalid regular expression: /\\a/v: Invalid escape',
}
);
assert.throws(
() => eval(String.raw`/\-/v`),
{
name: 'SyntaxError',
message: 'Invalid regular expression: /\\-/v: Invalid escape',
}
);
“Illegal top-level escaping” (exploringjs.com). Retrieved February 19, 2025.
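The same behavior can be observed without eval() by going through the constructor, which reports invalid patterns at runtime. This is only a sketch: isValidPattern is a hypothetical helper, and it uses flag /u rather than /v so that it also runs on engines without /v support.

```javascript
// Returns true if `pattern` compiles with the given flags, false if the
// constructor rejects it with a SyntaxError.
function isValidPattern(pattern, flags = '') {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

console.log(isValidPattern('\\a'));        // no /u: escaped 'a' matches itself
console.log(isValidPattern('\\a', 'u'));   // /u: illegal top-level escape
console.log(isValidPattern('\\-', 'u'));   // /u: '-' is not a syntax character
console.log(isValidPattern('\\.', 'u'));   // /u: '.' is a syntax character
console.log(isValidPattern('[\\-]', 'u')); // /u: '-' may be escaped inside a class
```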
What are character classes [. . .]?
A character class wraps class ranges in square brackets. The class ranges specify a set of characters:
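* [«class ranges»] matches any character in the set.
* [^«class ranges»] matches any character not in the set.
Rules for class ranges:
* Non-syntax characters stand for themselves: [abc]
* Only the following four characters are special and must be escaped with a backslash: ^ \ - ]
  * ^ only has to be escaped if it comes first.
  * - need not be escaped if it comes first or last.
* Character escapes (\n, \u{1F44D}, etc.) have the usual meaning.
  * Watch out: \b stands for backspace. Elsewhere in a regular expression, it matches word boundaries.
* Character class escapes (\d, \P{White_Space}, \p{RGI_Emoji}, etc.) have the usual meanings.
* Ranges of characters are specified with dashes: [a-z]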
“Syntax: character classes” (exploringjs.com). Retrieved February 20, 2025.
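A few illustrative checks of these rules. The inputs are made up for the example:

```javascript
// A plain set and its negation
const inSet = /[abc]/.test('b');
const digitsOnly = 'a1b2'.replace(/[^0-9]/g, ''); // remove everything that is not a digit

// A range of characters
const capitals = 'Hello World'.match(/[A-Z]/g);

// A character class escape inside a class
const digitOrUnderscore = /^[\d_]+$/.test('12_34');

// Inside a class, \b is a backspace (U+0008), not a word boundary
const matchesBackspace = /[\b]/.test('\u0008');
const matchesLetterB = /[\b]/.test('b');

console.log(inSet, digitsOnly, capitals, digitOrUnderscore);
console.log(matchesBackspace, matchesLetterB);
```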
What are the escaping rules inside character classes [. . .]?
Rules for escaping inside character classes without flag /v:
* We must always escape: \ ]
* Some characters only have to be escaped in some locations:
  * - only has to be escaped if it doesn’t come first or last.
  * ^ only has to be escaped if it comes first.
Rules with flag /v:
* A single ^ only has to be escaped if it comes first.
* Class set syntax characters have to be escaped:
  ( ) [ ] { } / - \ |
* Class set reserved double punctuators have to be escaped:
  && !! ## $$ %% ** ++ ,, .. :: ;; << == >> ?? @@ ^^ `` ~~
“Escaping inside character classes ([···])” (exploringjs.com). Retrieved February 19, 2025.
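The rules without flag /v can be checked with a few small patterns. This is a sketch with made-up inputs; it deliberately avoids /v so that it runs on older engines too:

```javascript
// A dash that comes first or last is a literal dash
const dashFirst = /[-az]/.test('-');
const dashLast = /[az-]/.test('-');

// A dash in the middle creates a range unless it is escaped
const rangeMatchesM = /[a-z]/.test('m');
const escapedDashMatchesM = /[a\-z]/.test('m');
const escapedDashMatchesDash = /[a\-z]/.test('-');

// A caret is only special if it comes first
const caretLater = /[a^]/.test('^');
const caretEscapedFirst = /[\^a]/.test('^');
const negatedClass = /[^a]/.test('a');

// Backslash and closing bracket must always be escaped
const bracketAndBackslash = /^[\]\\]+$/.test(']\\');

console.log(dashFirst, dashLast, rangeMatchesM, escapedDashMatchesM, escapedDashMatchesDash);
console.log(caretLater, caretEscapedFirst, negatedClass, bracketAndBackslash);
```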
What are syntax atoms of regular expressions?
Atoms are the basic building blocks of regular expressions.
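* Pattern characters are all characters except syntax characters (^, $, etc.). Pattern characters match themselves. Examples: A b %
* . matches any character. We can use the flag /s (dotAll) to control if the dot matches line terminators or not.
* Character escapes (each escape matches a single fixed character):
  * Control escapes (for a few control characters):
    * \f: form feed (FF)
    * \n: line feed (LF)
    * \r: carriage return (CR)
    * \t: character tabulation
    * \v: line tabulation
  * Arbitrary control characters: \cA (Ctrl-A), …, \cZ (Ctrl-Z)
  * Unicode code units: \u00E4
  * Unicode code points (require flag /u or /v): \u{1F44D}
* Character class escapes (each escape matches one out of a set of characters):
  * Basic character class escapes: \d \D \s \S \w \W
  * Unicode character property escapes: \p{White_Space}, \P{White_Space}, etc. They require flag /u or /v.
  * Unicode string property escapes: \p{RGI_Emoji}, etc. They require flag /v.
“Syntax: atoms of regular expressions” (exploringjs.com). Retrieved February 20, 2025.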
What do the following character class escapes (sets of code units) do: \d \D \s \S \w \W ?
* \d → Matches any digit (equivalent to [0-9]).
* \D → Matches any non-digit (equivalent to [^0-9]).
* \s → Matches any whitespace character (spaces, tabs, line terminators, etc.).
* \S → Matches any non-whitespace character.
* \w → Matches any “word” character (equivalent to [a-zA-Z0-9_]).
* \W → Matches any non-word character (equivalent to [^a-zA-Z0-9_]).
Examples:
> 'a7x4'.match(/\d/g)
[ '7', '4' ]
> 'a7x4'.match(/\D/g)
[ 'a', 'x' ]
> 'high - low'.match(/\w+/g)
[ 'high', 'low' ]
> 'hello\t\n everyone'.replaceAll(/\s/g, '-')
'hello---everyone'
“Basic character class escapes (sets of code units): \d \D \s \S \w \W” (exploringjs.com). Retrieved February 24, 2025.
What are Unicode character properties?
In the Unicode standard, each character has properties – metadata describing it. Properties play an important role in defining the nature of a character.
These are a few examples of properties:
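* Name: a unique name, composed of uppercase letters, digits, hyphens, and spaces. For example:
  * A: Name = LATIN CAPITAL LETTER A
  * 🙂: Name = SLIGHTLY SMILING FACE
* General_Category: categorizes characters. For example:
  * x: General_Category = Lowercase_Letter
  * $: General_Category = Currency_Symbol
* White_Space: marks invisible spacing characters such as spaces, tabs, and newlines. For example:
  * \t: White_Space = True
  * π: White_Space = False
* Age: the version of the Unicode standard in which a character was introduced. For example, the euro sign € was added in version 2.1 of the Unicode standard.
  * €: Age = 2.1
* Block: a contiguous range of code points. For example:
  * S: Block = Basic_Latin (range 0x0000..0x007F)
  * 🙂: Block = Emoticons (range 0x1F600..0x1F64F)
* Script: a collection of characters used by one or more writing systems. For example:
  * α: Script = Greek
  * Д: Script = Cyrillic
“Unicode character properties” (exploringjs.com). Retrieved February 25, 2025.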
What are Unicode character property escapes?
With flag /u or flag /v, we can use \p{} and \P{} to specify sets of code points via Unicode character properties. That looks like this:
(1) \p{prop=value}: matches all characters whose Unicode character property prop has the value value.
(2) \P{prop=value}: matches all characters that do not have a Unicode character property prop whose value is value.
(3) \p{bin_prop}: matches all characters whose binary Unicode character property bin_prop is True.
(4) \P{bin_prop}: matches all characters whose binary Unicode character property bin_prop is False.
Without the flags /u and /v, \p is the same as p.
Forms (3) and (4) can be used as abbreviations if the property is General_Category. For example, the following two escapes are equivalent:
\p{Uppercase_Letter}
\p{General_Category=Uppercase_Letter}
Examples:
Checking for whitespace:
> /^\p{White_Space}+$/u.test('\t \n\r')
true
Checking for Greek letters:
> /^\p{Script=Greek}+$/u.test('μετά')
true
Deleting any letters:
> '1π2ü3é4'.replace(/\p{Letter}/ug, '')
'1234'
Deleting lowercase letters:
> 'AbCdEf'.replace(/\p{Lowercase_Letter}/ug, '')
'ACE'
“Unicode character property escapes [ES2018]” (exploringjs.com). Retrieved February 25, 2025.
What are Unicode string property escapes?
With /u, we can use Unicode property escapes (\p{} and \P{}) to specify sets of code points via Unicode character properties.
With /v, we can additionally use \p{} to specify sets of code point sequences via Unicode string properties (negation via \P{} is not supported):
> /^\p{RGI_Emoji}$/v.test('⛔') // 1 code point (1 code unit)
true
> /^\p{RGI_Emoji}$/v.test('🙂') // 1 code point (2 code units)
true
> /^\p{RGI_Emoji}$/v.test('😵💫') // 3 code points
true
Let’s see how the character property Emoji would do with these inputs:
> /^\p{Emoji}$/u.test('⛔') // 1 code point (1 code unit)
true
> /^\p{Emoji}$/u.test('🙂') // 1 code point (2 code units)
true
> /^\p{Emoji}$/u.test('😵💫') // 3 code points
false
“Unicode string property escapes [ES2024]” (exploringjs.com). Retrieved February 26, 2025.
Regexp syntax quantifiers
By default, all of the following quantifiers are greedy (they match as many characters as possible):
* ?: match never or once
* *: match zero or more times
* +: match one or more times
* {n}: match n times
* {n,}: match n or more times
* {n,m}: match at least n times, at most m times.
To make them reluctant (so that they match as few characters as possible), put question marks (?) after them:
> /".*"/.exec('"abc"def"')[0] // greedy
'"abc"def"'
> /".*?"/.exec('"abc"def"')[0] // reluctant
'"abc"'
“Syntax: quantifiers” (exploringjs.com). Retrieved February 27, 2025.
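The counted quantifiers and the optional quantifier can be tried out in the same way. The patterns and inputs below are assumptions chosen for illustration:

```javascript
// {n}: exactly n times. This only checks the shape of a date, not its validity.
const isoDateShape = /^\d{4}-\d{2}-\d{2}$/;
const shapeOk = isoDateShape.test('2025-02-27');
const shapeBad = isoDateShape.test('25-2-27');

// {n,m}: greedy takes as many as allowed, reluctant as few as allowed
const greedyRun = 'aaaa'.match(/a{2,3}/)[0];
const reluctantRun = 'aaaa'.match(/a{2,3}?/)[0];

// ?: the preceding atom is optional
const spellings = 'color colour'.match(/colou?r/g);

// + versus +? when extracting tags
const greedyTags = '<b>x</b>'.match(/<.+>/g);
const reluctantTags = '<b>x</b>'.match(/<.+?>/g);

console.log(shapeOk, shapeBad, greedyRun, reluctantRun);
console.log(spellings, greedyTags, reluctantTags);
```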
Regexp syntax assertions
* ^ matches only at the beginning of the input
* $ matches only at the end of the input
* \b matches only at a word boundary
* \B matches only when not at a word boundary
“Syntax: assertions” (exploringjs.com). Retrieved February 28, 2025.
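A short sketch of the four assertions in action, using a made-up sentence. Assertions match positions, not characters, so they never add anything to the matched text:

```javascript
const sentence = 'cat concat cats';

// ^ and $ anchor the pattern to the whole input
const exact = /^cat$/.test('cat');
const notExact = /^cat$/.test(sentence);

// \b requires a word boundary on the given side
const wholeWords = sentence.match(/\bcat\b/g);
const wordStarts = sentence.replace(/\bcat/g, 'CAT');

// \B requires the absence of a word boundary
const insideWords = sentence.replace(/\Bcat/g, 'CAT');

console.log(exact, notExact, wholeWords);
console.log(wordStarts);
console.log(insideWords);
```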
What are lookaround assertions?
Lookaround assertions are special types of assertions in regular expressions that allow you to match a pattern based on what comes before (lookbehind) or after (lookahead) it, without including those parts in the match.
Positive lookahead: (?=«pattern») matches if pattern matches what comes next.
Example: sequences of lowercase letters that are followed by an X.
> 'abcX def'.match(/[a-z]+(?=X)/g)
[ 'abc' ]
Note that the X itself is not part of the matched substring.
Negative lookahead: (?!«pattern») matches if pattern does not match what comes next.
Example: sequences of lowercase letters that are not followed by an X.
> 'abcX def'.match(/[a-z]+(?!X)/g)
[ 'ab', 'def' ]
Positive lookbehind: (?<=«pattern») matches if pattern matches what came before.
Example: sequences of lowercase letters that are preceded by an X.
> 'Xabc def'.match(/(?<=X)[a-z]+/g)
[ 'abc' ]
Negative lookbehind: (?<!«pattern») matches if pattern does not match what came before.
Example: sequences of lowercase letters that are not preceded by an X.
> 'Xabc def'.match(/(?<!X)[a-z]+/g)
[ 'bc', 'def' ]
Example: replace “.js” with “.html”, but not in “Node.js”.
> 'Node.js: index.js and main.js'.replace(/(?<!Node)\.js/g, '.html')
'Node.js: index.html and main.html'
“Lookahead assertions” (exploringjs.com). Retrieved March 3, 2025.
Explain regexp syntax disjunction (|)
Caveat: this operator has low precedence. Use groups if necessary:
* ^aa|zz$ - matches all strings that start with 'aa' and/or end with 'zz'. Note that | has a lower precedence than ^ and $.
* ^(aa|zz)$ - matches the two strings 'aa' and 'zz'.
* ^a(a|z)z$ - matches the two strings 'aaz' and 'azz'.
“Syntax: disjunction (|)” (exploringjs.com). Retrieved March 3, 2025.
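The precedence caveat is easy to verify. The following sketch runs the three patterns against a handful of made-up inputs:

```javascript
// Without a group, ^ binds only to 'aa' and $ binds only to 'zz'
const ungrouped = /^aa|zz$/;
const startsWithAa = ungrouped.test('aaX');
const endsWithZz = ungrouped.test('Xzz');

// With a group, the anchors apply to the whole alternative
const grouped = /^(aa|zz)$/;
const groupedAaX = grouped.test('aaX');
const groupedZz = grouped.test('zz');

// A group can also limit the disjunction to part of the pattern
const inner = /^a(a|z)z$/;
const innerResults = ['aaz', 'azz', 'aa'].map((s) => inner.test(s));

console.log(startsWithAa, endsWithZz, groupedAaX, groupedZz, innerResults);
```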
Explain regexp /i (.ignoreCase) flag
The /i (.ignoreCase) flag switches on case-insensitive matching:
> /a/.test('A')
false
> /a/i.test('A')
true
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
Explain regexp /g (.global) flag
The /g (.global) flag fundamentally changes how the following methods work:
* RegExp.prototype.test()
* RegExp.prototype.exec()
* String.prototype.match()
In a nutshell, without /g, the methods only consider the first match for a regular expression in an input string. With /g, they consider all matches.
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
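A brief sketch of the difference for each of the three methods, using a made-up input. With /g, exec() and test() continue at the regular expression's .lastIndex, so repeated calls walk through the input:

```javascript
const input = 'a1a2a3';

// match(): one match object without /g, an Array of all matches with /g
const firstOnly = input.match(/a./);
const allMatches = input.match(/a./g);

// exec(): with /g, each call returns the next match, then null
const globalRegExp = /a./g;
const found = [];
let match;
while ((match = globalRegExp.exec(input)) !== null) {
  found.push(`${match[0]}@${match.index}`);
}

// test(): with /g, the result depends on .lastIndex from the previous call
const statefulRegExp = /a/g;
const testResults = [
  statefulRegExp.test('aa'), // matches at index 0
  statefulRegExp.test('aa'), // matches at index 1
  statefulRegExp.test('aa'), // no match left; .lastIndex is reset to 0
];

console.log(firstOnly[0], firstOnly.index, allMatches, found, testResults);
```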
Explain regexp /d (.hasIndices) flag
Some RegExp-related methods return match objects that describe where the regular expression matched in an input string. If the /d (.hasIndices) flag is on, each match object includes match indices which tell us where each group capture starts and ends.
Match indices for numbered groups
This is how we access the captures of numbered groups:
const matchObj = /(a+)(b+)/d.exec('aaaabb');
assert.equal(
  matchObj[1], 'aaaa'
);
assert.equal(
  matchObj[2], 'bb'
);
Due to the regular expression flag /d, matchObj also has a property .indices that records for each numbered group where it was captured in the input string:
assert.deepEqual(
  matchObj.indices[1], [0, 4]
);
assert.deepEqual(
  matchObj.indices[2], [4, 6]
);
Match indices for named groups
The captures of named groups are accessed like this:
const matchObj = /(?<as>a+)(?<bs>b+)/d.exec('aaaabb');
assert.equal(
  matchObj.groups.as, 'aaaa'
);
assert.equal(
  matchObj.groups.bs, 'bb'
);
Their indices are stored in matchObj.indices.groups:
assert.deepEqual(
  matchObj.indices.groups.as, [0, 4]
);
assert.deepEqual(
  matchObj.indices.groups.bs, [4, 6]
);
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
Explain regexp /m (.multiline) flag
If the /m (.multiline) flag is on, ^ matches the beginning of each line and $ matches the end of each line. If it is off, ^ matches the beginning of the whole input string and $ matches the end of the whole input string.
> 'a1\na2\na3'.match(/^a./gm)
[ 'a1', 'a2', 'a3' ]
> 'a1\na2\na3'.match(/^a./g)
[ 'a1' ]
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
Explain regexp /s (.dotAll) flag
By default, the dot does not match line terminators. With the /s (.dotAll) flag, it does:
> /./.test('\n')
false
> /./s.test('\n')
true
Workaround: If /s isn’t supported, we can use [^] instead of a dot.
> /[^]/.test('\n')
true
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
Explain regexp /y (.sticky) flag
The /y (.sticky) flag mainly makes sense in conjunction with /g. When both are switched on, any match must directly follow the previous one (that is, it must start at index .lastIndex of the regular expression object). Therefore, the first match must be at index 0.
> 'a1a2 a3'.match(/a./gy)
[ 'a1', 'a2' ]
> '_a1a2 a3'.match(/a./gy) // first match must be at index 0
null
> 'a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
> '_a1a2 a3'.match(/a./g)
[ 'a1', 'a2', 'a3' ]
The main use case for /y is tokenization (during parsing).
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
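To make the tokenization use case more concrete, here is a minimal sketch of a tokenizer for simple arithmetic input. The function name tokenize, the token shapes, and the tiny grammar (integers and the operators + - * / ( )) are all assumptions made for this example. It uses /y without /g and sets .lastIndex by hand, so that every token has to start exactly where the previous one ended:

```javascript
function tokenize(input) {
  // Trailing whitespace would otherwise be left over after the last token.
  const text = input.trimEnd();
  // Optional whitespace, then either a number (group 1) or an operator (group 2).
  const tokenRegExp = /\s*(?:(\d+)|([-+*\/()]))/y;
  const tokens = [];
  let position = 0;
  while (position < text.length) {
    // Sticky matching: the match must start exactly at .lastIndex.
    tokenRegExp.lastIndex = position;
    const match = tokenRegExp.exec(text);
    if (match === null) {
      throw new SyntaxError(`Unexpected input at or after index ${position}`);
    }
    if (match[1] !== undefined) {
      tokens.push({ type: 'number', value: match[1] });
    } else {
      tokens.push({ type: 'operator', value: match[2] });
    }
    position = tokenRegExp.lastIndex;
  }
  return tokens;
}

console.log(tokenize('12 + 3*(4)').map((token) => token.value));
```

Without /y, the regular expression would silently skip characters it cannot match (such as the ? in '1 ? 2') and continue with the next token it finds. With /y, such input is reported as an error.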
Explain regexp /u (.unicode) flag
The /u (.unicode) flag provides better support for Unicode code points.
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
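One visible effect of this flag, sketched below with an emoji written as a code point escape: without /u, the dot and other atoms work on UTF-16 code units, so a character outside the Basic Multilingual Plane counts as two. With /u, it counts as one code point.

```javascript
// U+1F642 (slightly smiling face) is one code point but two code units
const smiley = '\u{1F642}';
const codeUnits = smiley.length;

// The dot matches one code unit without /u and one code point with /u
const dotWithoutU = /^.$/.test(smiley);
const dotWithU = /^.$/u.test(smiley);

// Counting "characters" in a string of two emojis
const countWithoutU = (smiley + smiley).match(/./g).length;
const countWithU = (smiley + smiley).match(/./gu).length;

// Code point escapes in patterns require /u (or /v)
const codePointEscape = /^\u{1F642}$/u.test(smiley);

console.log(codeUnits, dotWithoutU, dotWithU, countWithoutU, countWithU, codePointEscape);
```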
Explain regexp /v (.unicodeSets) flag
The /v (.unicodeSets) flag improves on flag /u and provides limited support for multi-code-point grapheme clusters. It also supports set operations in character classes.
“Regular expression flags” (exploringjs.com). Retrieved March 3, 2025.
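The set operations can be sketched as follows. Because not every engine supports /v yet, the example creates the regular expressions via the constructor and returns null if the flag is rejected; tryCreateUnicodeSetsRegExp is a hypothetical helper introduced only for this purpose. With /v, -- inside a character class means difference and && means intersection:

```javascript
// Returns a RegExp with flag /v, or null if the engine does not support /v.
function tryCreateUnicodeSetsRegExp(pattern) {
  try {
    return new RegExp(pattern, 'v');
  } catch (error) {
    if (error instanceof SyntaxError) return null;
    throw error;
  }
}

// Difference: lowercase ASCII letters except vowels
const consonants = tryCreateUnicodeSetsRegExp('^[[a-z]--[aeiou]]+$');
// Intersection: characters that are both letters and ASCII
const asciiLetters = tryCreateUnicodeSetsRegExp('^[\\p{Letter}&&\\p{ASCII}]+$');

if (consonants !== null && asciiLetters !== null) {
  console.log(consonants.test('xyz'), consonants.test('xya'));
  console.log(asciiLetters.test('abc'), asciiLetters.test('a\u03C0'));
} else {
  console.log('Flag /v is not supported by this engine');
}
```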