Regex (Python)

Metacharacters

. ^ $ * + ? { } [ ] \ | ( )

[ ] are used for specifying a character class, which is a set of characters that you wish to match.

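^ as the first character of the class complements the set, so the class matches any character that is not listed. For example, [^5] matches any character except '5'.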
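A minimal sketch of character classes and their complement, using the standard re module. The sample strings and variable names are made up for illustration.

```python
import re

# [aeiou] matches any single character listed inside the brackets.
vowels = re.findall(r"[aeiou]", "regex rules")
print(vowels)       # ['e', 'e', 'u', 'e']

# A leading ^ complements the class: any character that is NOT a vowel.
non_vowels = re.findall(r"[^aeiou]", "regex")
print(non_vowels)   # ['r', 'g', 'x']

# Anywhere other than the first position, ^ is just a literal caret.
carets = re.findall(r"[a^]", "a^b")
print(carets)       # ['a', '^']
```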

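\ (backslash) can be followed by various characters to signal various special sequences. It’s also used to escape metacharacters so that they can be matched literally; for example, \[ matches '[' and \\ matches '\'.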
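As a small illustration of escaping, the sketch below compares an unescaped and an escaped dot, then uses re.escape() to escape a whole string. The sample text and variable names are assumptions chosen for the example.

```python
import re

sample = "abc a.c"

# Unescaped, . is a metacharacter that matches any character except a newline.
any_char = re.findall(r"a.c", sample)
print(any_char)       # ['abc', 'a.c']

# Escaped, \. matches only a literal dot.
literal_dot = re.findall(r"a\.c", sample)
print(literal_dot)    # ['a.c']

# re.escape() adds the backslashes when the text to match is arbitrary.
price = "$9.99"
found = re.search(re.escape(price), "Total: $9.99 (was $9x99)")
print(found.group())  # $9.99
```

Writing patterns as raw strings (the r"..." prefix) keeps Python's own string escaping from interfering with the backslashes meant for the regex engine.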

Special sequences

The class equivalences below hold for bytes patterns, and for str patterns compiled with the re.ASCII flag. Without that flag, str patterns also match the corresponding Unicode characters.

\d Matches any decimal digit; this is equivalent to the class [0-9].

\D Matches any non-digit character; this is equivalent to the class [^0-9].

\s Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].

\S Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

\w Matches any word character (alphanumeric or underscore); this is equivalent to the class [a-zA-Z0-9_].

\W Matches any non-word character; this is equivalent to the class [^a-zA-Z0-9_].
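The sequences above can be tried out with re.findall(). This is only a sketch: the sample text is invented, and the last part assumes Python 3 str patterns, where \d matches non-ASCII decimal digits unless re.ASCII is given.

```python
import re

text = "Room 42, floor 3\tEast_Wing"

digits = re.findall(r"\d", text)
print(digits)        # ['4', '2', '3']

whitespace = re.findall(r"\s", text)
print(whitespace)    # [' ', ' ', ' ', '\t']

words = re.findall(r"\w+", text)
print(words)         # ['Room', '42', 'floor', '3', 'East_Wing']

non_word = re.findall(r"\W", text)
print(non_word)      # [' ', ',', ' ', ' ', '\t']

# For str patterns, \d also matches non-ASCII decimal digits
# unless the re.ASCII flag is set.
mixed = "7\u0663"    # ASCII 7 followed by ARABIC-INDIC DIGIT THREE
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, flags=re.ASCII)
print(len(unicode_digits), len(ascii_digits))   # 2 1
```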

Repeating Things

The first metacharacter for repeating things that we’ll look at is *. * doesn’t match the literal character '*'; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.

For example, ca*t will match 'ct' (0 'a' characters), 'cat' (1 'a'), 'caaat' (3 'a' characters), and so forth.
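The ca*t example can be checked directly. The sketch below uses fullmatch() so that the whole string has to fit the pattern; 'cbt' is an extra, made-up non-matching input.

```python
import re

star = re.compile(r"ca*t")

# fullmatch() succeeds only if the entire string fits the pattern.
star_results = {s: bool(star.fullmatch(s)) for s in ["ct", "cat", "caaat", "cbt"]}
print(star_results)
# {'ct': True, 'cat': True, 'caaat': True, 'cbt': False}

# * takes as many repetitions as it can, and zero repetitions is acceptable.
longest = re.match(r"a*", "aaab").group()
empty = re.match(r"a*", "b").group()
print(repr(longest))   # 'aaa'
print(repr(empty))     # ''
```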

Another repeating metacharacter is +, which matches one or more times. Pay careful attention to the difference between * and +: * matches zero or more times, so whatever’s being repeated may not be present at all, while + requires at least one occurrence. To use a similar example, ca+t will match 'cat' (1 'a') and 'caaat' (3 'a's), but won’t match 'ct'.
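To see the difference between * and + side by side, here is a short sketch that runs both patterns over the same inputs. The list name candidates is just for illustration.

```python
import re

candidates = ["ct", "cat", "caaat"]

# * accepts zero 'a' characters, so 'ct' is included.
star_matches = [s for s in candidates if re.fullmatch(r"ca*t", s)]
print(star_matches)   # ['ct', 'cat', 'caaat']

# + needs at least one 'a', so 'ct' is rejected.
plus_matches = [s for s in candidates if re.fullmatch(r"ca+t", s)]
print(plus_matches)   # ['cat', 'caaat']
```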

The question mark character, ?, matches either once or zero times; you can think of it as marking something as being optional. For example, home-?brew matches either 'homebrew' or 'home-brew'.
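A quick check of the home-?brew example; 'home--brew' is an added input (not from the notes above) showing that ? allows at most one hyphen.

```python
import re

optional_hyphen = re.compile(r"home-?brew")

inputs = ["homebrew", "home-brew", "home--brew"]
hyphen_results = {s: bool(optional_hyphen.fullmatch(s)) for s in inputs}
print(hyphen_results)
# {'homebrew': True, 'home-brew': True, 'home--brew': False}
```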

The most complicated repetition quantifier is {m,n}, where m and n are decimal integers. This quantifier means there must be at least m repetitions, and at most n. For example, a/{1,3}b will match 'a/b', 'a//b', and 'a///b'. It won’t match 'ab', which has no slashes, or 'a////b', which has four.
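The a/{1,3}b example can be verified the same way. This is a minimal sketch using fullmatch(), with the five strings taken from the paragraph above.

```python
import re

slashes = re.compile(r"a/{1,3}b")

candidates = ["ab", "a/b", "a//b", "a///b", "a////b"]

# Keep only the strings with between one and three slashes.
matched = [s for s in candidates if slashes.fullmatch(s)]
print(matched)   # ['a/b', 'a//b', 'a///b']
```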