What is a regular expression?
A regular expression (also known as regex or regexp) is a search pattern made up of a sequence of characters and optional flags. You can use it to find, validate, or replace data in a text.
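
As a minimal sketch of the definition above, the JavaScript snippet below writes the same pattern once as a literal and once with the `RegExp` constructor. The variable names and the sample text are illustrative assumptions, not part of the original reference.

```javascript
// The same pattern, "cat" with the i (ignore case) flag, written two ways.
const literal = /cat/i;
const constructed = new RegExp("cat", "i");

const text = "Cat videos"; // hypothetical sample text

console.log(literal.test(text));     // true
console.log(constructed.test(text)); // true
console.log(text.search(literal));   // 0 (index of the first match)
```

The literal form is usually easier to read; the constructor is handy when the pattern is assembled from a string at run time.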
Flags
Flags or modifiers in regular expressions are used to customize searching.
Flag | Description |
---|---|
g | Global match - finds all matches instead of stopping at the first one |
i | Ignore case - performs a case-insensitive search |
m | Multiline - allows ^ and $ to match the start and end of each line |
u | Unicode - enables full Unicode support |
y | Sticky - matches only at the lastIndex position |
s | Singleline - also known as "dotall", allows . to match newlines \n |
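
The flags in the table can be sketched in JavaScript as follows. The sample string and variable names are assumptions chosen for illustration.

```javascript
const sample = "One\ntwo\nONE"; // hypothetical three-line input

// g + i: find every match, ignoring case.
const allOnes = sample.match(/one/gi);     // ["One", "ONE"]

// m: ^ matches at the start of each line, not only the start of the string.
const lineStarts = sample.match(/^\w/gm);  // ["O", "t", "O"]

// s: the dot also matches the newline between "One" and "two".
const withDotAll = /One.two/s.test(sample);   // true
const withoutDotAll = /One.two/.test(sample); // false

// u: the dot matches a whole code point instead of half a surrogate pair.
const withUnicode = /^.$/u.test("😀");    // true
const withoutUnicode = /^.$/.test("😀");  // false

// y: the match must begin exactly at lastIndex.
const sticky = /two/y;
const stickyMiss = sticky.test(sample);   // false, "two" is not at index 0
sticky.lastIndex = 4;
const stickyHit = sticky.test(sample);    // true, "two" starts at index 4
```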
Brackets
Brackets define a character set: the pattern matches any one character from the listed characters or range.
Expression | Description |
---|---|
[...] | One of the characters in the brackets |
[^...] | One of the characters NOT in the brackets |
[a-z] | One of the characters from a to z |
[^a-z] | One of the characters NOT from a to z |
[A-Z] | One of the characters from A to Z |
[^A-Z] | One of the characters NOT from A to Z |
[0-9] | One of the characters from 0 to 9 (a digit character) |
[^0-9] | One of the characters NOT from 0 to 9 (a non-digit character) |
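
A short JavaScript sketch of the bracket expressions above, using a made-up input string:

```javascript
const input = "Room 42B"; // hypothetical sample text

const digits = input.match(/[0-9]/g);          // ["4", "2"]
const upper = input.match(/[A-Z]/g);           // ["R", "B"]
const notLower = input.match(/[^a-z]/g);       // ["R", " ", "4", "2", "B"]

// A set can also list individual characters, here the lowercase vowels.
const masked = input.replace(/[aeiou]/g, "*"); // "R**m 42B"
```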
Groups
A group is a part of a search pattern enclosed in parentheses (...).
Expression | Description |
---|---|
(...) | A capturing group |
(?:...) | A non-capturing group |
(a\|b) | Either a or b |
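
The difference between capturing and non-capturing groups shows up in the match result. The sketch below uses an illustrative word and variable names that are not from the original text.

```javascript
// Capturing group: the alternative that matched is stored at index 1.
const captured = "grey".match(/gr(a|e)y/);
console.log(captured[0]); // "grey"
console.log(captured[1]); // "e"

// Non-capturing group: same match, but nothing is stored for the group.
const notCaptured = "grey".match(/gr(?:a|e)y/);
console.log(notCaptured[0]);     // "grey"
console.log(notCaptured.length); // 1
```

A non-capturing group is a reasonable choice when the parentheses are only needed for grouping or alternation and the matched text is not used afterwards.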
Character Classes
Character classes are symbols and escape sequences that each stand for a whole set of characters, such as digits or whitespace.
Character | Description |
---|---|
. | Any single character except a newline \n |
\d | A digit character. Equivalent to [0-9]. |
\D | A non-digit character. Equivalent to [^0-9]. |
\w | A word character: alphanumeric or underscore. Equivalent to [a-zA-Z0-9_]. |
\W | A non-word character: anything that is not alphanumeric or an underscore. Equivalent to [^a-zA-Z0-9_]. |
\s | A whitespace character |
\S | A non-whitespace character |
[\b] | A literal backspace character |
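
The character classes above can be tried out in JavaScript like this. The input line is a made-up example.

```javascript
const line = "id_7: ok\t!"; // hypothetical input containing a tab

const digitChars = line.match(/\d/g);   // ["7"]
const wordChars = line.match(/\w/g);    // ["i", "d", "_", "7", "o", "k"]
const nonWordChars = line.match(/\W/g); // [":", " ", "\t", "!"]
const spaces = line.match(/\s/g);       // [" ", "\t"]

// The dot matches any character except a newline.
const dotMatchesDash = /a.c/.test("a-c");     // true
const dotMatchesNewline = /a.c/.test("a\nc"); // false

// Inside brackets, \b means a backspace character (U+0008).
const backspace = /[\b]/.test("\b");          // true
```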
Special Characters
Character | Description |
---|---|
\ | An escape character |
\0 | A null character |
\n | A newline character |
\t | A tab character |
\v | A vertical tab character |
\r | A carriage return character |
\f | A form feed character |
\cX | A control character where X is a character from A-Z |
\ooo | The character specified by three octal digits |
\xhh | The character specified by two hexadecimal digits |
\uhhhh | The Unicode character specified by four hexadecimal digits |
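
A few of the escapes above, sketched in JavaScript with illustrative inputs. The octal form is left out here because JavaScript rejects it in patterns that use the u flag.

```javascript
// The backslash removes the special meaning of the dot.
const literalDot = /\./.test("a.b");  // true
const noDot = /\./.test("ab");        // false

const tab = /\t/.test("a\tb");        // true
const nul = /\0/.test("\0");          // true

// \x41 and \u0041 both denote "A" (code point 0x41).
const hexA = /\x41/.test("A");        // true
const unicodeA = /\u0041/.test("A");  // true

// \cJ is Ctrl-J, the line feed character (0x0A).
const controlJ = /\cJ/.test("\n");    // true
```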
Quantifiers
Quantifiers specify how many times the preceding character, character class, or group must occur.
Character | Description |
---|---|
* | Zero or more times |
? | Zero or one time |
+ | One or more times |
{n} | Exactly n times |
{n,m} | n to m times |
{n,} | n times or more |
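
Each quantifier in the table can be checked with a small JavaScript test. The patterns and inputs below are illustrative assumptions; the ^ and $ anchors force the whole string to match.

```javascript
const zeroOrMore = /^ab*$/.test("a");         // true, b occurs zero times
const zeroOrOne = /^colou?r$/.test("color");  // true, u is optional
const oneOrMore = /^\d+$/.test("");           // false, at least one digit needed

const exactlyThree = /^\d{3}$/.test("123");   // true
const notThree = /^\d{3}$/.test("1234");      // false
const twoToFour = /^\d{2,4}$/.test("12345");  // false, five digits
const twoOrMore = /^\d{2,}$/.test("12345");   // true

// Quantifiers are greedy by default and take as many characters as allowed.
const greedy = "aaaa".match(/a{2,3}/)[0];     // "aaa"
```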
Assertions
Assertions are anchors, word boundaries, and lookaheads. They match a position in the text rather than characters: the match succeeds only if the condition holds at that position.
Character | Description |
---|---|
^ | Start of string, or start of a line when the m flag is set |
$ | End of string, or end of a line when the m flag is set |
\b | A word boundary |
\B | A non-word boundary |
(?=...) | Positive lookahead |
(?!...) | Negative lookahead |
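
The assertions above can be sketched in JavaScript as follows. The sample strings and variable names are assumptions made for illustration.

```javascript
const startsWithCat = /^cat/.test("catalog"); // true
const endsWithCat = /cat$/.test("bobcat");    // true

// \b requires a word boundary, \B requires that there is none.
const wholeWord = "cat concat".replace(/\bcat\b/g, "dog"); // "dog concat"
const insideWord = "cat concat".replace(/\Bcat/g, "dog");  // "cat condog"

const prices = "100 USD, 200 EUR"; // hypothetical input

// Positive lookahead: digits that are followed by " USD".
const usd = prices.match(/\d+(?= USD)/g);          // ["100"]

// Negative lookahead: whole numbers that are NOT followed by " USD".
// The \b anchors stop the engine from backtracking to a partial number.
const notUsd = prices.match(/\b\d+\b(?! USD)/g);   // ["200"]
```

Lookaheads check the text that follows without consuming it, so " USD" is not part of the returned matches.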