Python Regex Cheat Sheet
In this Python regular expression cheat sheet, we cover each piece of syntax with a
simple example that shows how it works.
Metacharacters
Metacharacters are regular expression building blocks. The following table highlights
different metacharacters in Python RegEx, along with their descriptions and suitable
examples:
[] Square Brackets
Square brackets define a character class: a set of characters, any one of which may match.
For example, the character class [abc] will match any single character a, b, or c. You
can also specify a range of characters with a hyphen.
For example:
● [0-3] is the same as [0123]
● [a-c] is the same as [abc]
You can invert the character class by placing the caret (^) symbol first. For example:
● [^0-3] means any character except 0, 1, 2, or 3
● [^a-c] means any character except a, b, or c
For example:
import re
txt = "The rain in Spain"
x = re.findall("[a-m]", txt)
print(x)
Output:
['h', 'e', 'a', 'i', 'i', 'a', 'i']
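The range and negation rules above can be checked directly. This is a minimal sketch; the sample string is an assumption chosen only for illustration.

```python
import re

sample = "a1b2c3d4"  # hypothetical sample string

# [0-3] is shorthand for [0123]
digits_range = re.findall(r"[0-3]", sample)
digits_list = re.findall(r"[0123]", sample)

# [^0-3] matches any character that is NOT 0, 1, 2, or 3
others = re.findall(r"[^0-3]", sample)

print(digits_range)  # ['1', '2', '3']
print(digits_list)   # ['1', '2', '3']
print(others)        # ['a', 'b', 'c', 'd', '4']
```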
\ Backslash
The backslash (\) makes sure that the character following it is not treated in a special
way. It is used to escape metacharacters.
For example, a dot (.) in a pattern is normally treated as a special character that matches
any character. To search for a literal dot in the string, put a backslash (\) just before
the dot (\.).
The backslash also introduces special sequences such as \d (any digit). For example:
import re
txt = "That will be 59 dollars"
#Find all digit characters:
x = re.findall(r"\d", txt)
print(x)
Output:
['5', '9']
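To show the escaping of a metacharacter itself, the sketch below compares an unescaped dot with an escaped one. The sample string is an assumption made up for this illustration.

```python
import re

price = "Total: 3.50 or 3x50"  # hypothetical sample string

# Unescaped dot: matches ANY character between "3" and "50"
loose = re.findall(r"3.50", price)

# Escaped dot: matches only a literal "."
strict = re.findall(r"3\.50", price)

print(loose)   # ['3.50', '3x50']
print(strict)  # ['3.50']
```

Writing patterns as raw strings (r"...") keeps Python from interpreting the backslash before the regex engine sees it.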
. Dot
The dot symbol matches any single character except the newline character.
● a.b will match strings that contain a, then any one character, then b, such as acb,
acbd, abbb, etc.
● .. will check if the string contains at least two characters
For example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":
x = re.findall("he..o", txt)
print(x)
Output:
['hello']
^ Caret (Start With)
The caret (^) symbol matches the beginning of the string. It checks whether the
string starts with the specific character(s) or not.
For example:
● ^g will check if the string starts with g, such as girl, globe, gym, g, etc.
● ^ge will check if the string starts with ge, such as gem, gel, etc.
For example:
import re
txt = "hello planet"
#Check if the string starts with "hello":
x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")
Output:
Yes, the string starts with 'hello'
$ Dollar (Ends With)
The dollar ($) symbol matches the end of the string. It checks whether the
string ends with the given character(s) or not.
For example:
● s$ will check for the string that ends with s such as sis, ends, s, etc.
● ks$ will check for the string that ends with ks such as seeks, disks, ks, etc.
For example:
import re
txt = "hello planet"
#Check if the string ends with "planet":
x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")
Output:
Yes, the string ends with 'planet'
* Star
This symbol will match zero or more occurrences of the regular expression preceding
the * symbol.
For example:
● ab*c will be matched for the string ac, abc, abbbc, dabc, etc. but will not be
matched for abdc because b is not followed by c.
For example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":
x = re.findall("he.*o", txt)
print(x)
Output:
['hello']
+ Plus
This symbol will match one or more occurrences of the regular expression preceding
the + symbol.
For example:
● ab+c will be matched for the string abc, abbc, dabc, but will not be matched for
ac, abdc because there is no b in ac and b is not followed by c in abdc.
For example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 1 or more (any) characters, and an "o":
x = re.findall("he.+o", txt)
print(x)
Output:
['hello']
? Question mark
This symbol will match zero or one occurrence of the regular expression preceding
the ? symbol.
For example:
● ab?c will match strings ac, acb, and dabc, but will not be matched for abbc
because there are two bs. Similarly, it will not be matched for abdc because b is
not followed by c.
For example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or 1 (any) character, and an "o":
x = re.findall("he.?o", txt)
print(x)
Output:
[]
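The claims in the Star, Plus, and Question mark bullets can be checked in one loop. This is a small sketch that runs each quantifier pattern against a few of the strings named in those bullets; the `candidates` list and `results` dictionary are names introduced here for illustration.

```python
import re

# Candidate strings taken from the bullet examples above
candidates = ["ac", "abc", "abbc", "abdc"]

results = {}
for pattern in (r"ab*c", r"ab+c", r"ab?c"):
    # re.search succeeds if the pattern matches anywhere in the string
    results[pattern] = [s for s in candidates if re.search(pattern, s)]
    print(pattern, results[pattern])

# ab*c ['ac', 'abc', 'abbc']
# ab+c ['abc', 'abbc']
# ab?c ['ac', 'abc']
```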
{m,n} Braces
Braces match from m to n repetitions (inclusive) of the preceding RegEx.
For example:
● a{2,4} will be matched for the strings aaab, baaaac, gaad, but will not be matched
for strings like abc, bc because there is only one a or no a in these cases.
For example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by exactly 2 (any) characters, and an "o":
x = re.findall("he.{2}o", txt)
print(x)
Output:
['hello']
| OR
The OR symbol checks whether the pattern before or after it is present in
the string or not.
For example:
● a|b will match any string that contains a or b such as acd, bcd, abcd, etc.
For example:
import re
txt = "The rain in Spain falls mainly in the plain!"
#Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['falls']
Yes, there is at least one match!
(<RegEx>) Group
Parentheses are used to group sub-patterns.
For example:
● (a|b)cd will match for strings like acd, abcd, gacd, etc.
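A short sketch of the grouped alternation above: it runs (a|b)cd against the listed strings plus one non-matching string ("ccd", added here as an assumption) and shows both the whole match and the captured group.

```python
import re

pattern = r"(a|b)cd"

matches = {}
for text in ["acd", "abcd", "gacd", "ccd"]:
    m = re.search(pattern, text)
    # group(0) is the whole match, group(1) is what the parentheses captured
    matches[text] = (m.group(0), m.group(1)) if m else None
    print(text, matches[text])

# acd ('acd', 'a')
# abcd ('bcd', 'b')
# gacd ('acd', 'a')
# ccd None
```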
Special Sequences
In this section of our RegEx Python cheat sheet, we’ll discuss various special
sequences with suitable examples.
Character   Description                                                                Example
\A          Matches only at the beginning of the string                                "\AThe"
\b          Matches at the beginning or at the end of a word                           r"\bain", r"ain\b"
\B          Matches where \b does not (NOT at the beginning or end of a word)          r"\Bain", r"ain\B"
\d          Matches any digit (numbers from 0-9)                                       "\d"
\D          Matches any character that is NOT a digit                                  "\D"
\s          Matches any white space character                                          "\s"
\S          Matches any character that is NOT a white space character                  "\S"
\w          Matches any word character (letters, digits from 0-9, and the underscore _)  "\w"
\W          Matches any character that is NOT a word character                         "\W"
\Z          Matches only at the end of the string                                      "Spain\Z"
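\A
import re
txt = "The rain in Spain"
#Check if the string starts with "The":
x = re.findall(r"\AThe", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")
Output:
['The']
Yes, there is a match!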
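\b
import re
txt = "The rain in Spain"
#Check if "ain" is present at the beginning of a WORD:
x = re.findall(r"\bain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[]
No match
#Check if "ain" is present at the end of a WORD:
x = re.findall(r"ain\b", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['ain', 'ain']
Yes, there is at least one match!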
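\B
import re
txt = "The rain in Spain"
#Check if "ain" is present, but NOT at the beginning of a word:
x = re.findall(r"\Bain", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['ain', 'ain']
Yes, there is at least one match!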
\d
import re
txt = "The rain in Spain"
#Check if the string contains any digits (numbers from 0-9):
x = re.findall(r"\d", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[]
No match
\D
import re
txt = "The rain in Spain"
#Return a match at every no-digit character:
x = re.findall(r"\D", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', 'S', 'p', 'a',
'i', 'n']
Yes, there is at least one match!
\s
import re
txt = "The rain in Spain"
#Return a match at every white-space character:
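x = re.findall(r"\s", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[' ', ' ', ' ']
Yes, there is at least one match!
\S
import re
txt = "The rain in Spain"
#Return a match at every NON white-space character:
x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!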
\w
import re
txt = "The rain in Spain"
#Return a match at every word character (characters from a to Z, digits from 0-9, and the underscore _ character):
x = re.findall(r"\w", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!
\W
import re
txt = "The rain in Spain"
#Return a match at every NON word character (characters NOT between a and Z. Like "!", "?" white-space etc.):
x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[' ', ' ', ' ']
Yes, there is at least one match!
\Z
import re
txt = "The rain in Spain"
#Check if the string ends with "Spain":
x = re.findall(r"Spain\Z", txt)
print(x)
if x:
    print("Yes, there is a match!")
else:
    print("No match")
Output:
['Spain']
Yes, there is a match!
Sets
A set is a group of characters enclosed in square brackets [] with a special meaning. In this
section of our Python regular expressions cheat sheet, we’ll explain all set types with
examples.
[arn]
This will return a match where one of the specified characters (a, r, or n) is present.
For example:
import re
txt = "The rain in Spain"
#Check if the string has any a, r, or n characters:
x = re.findall("[arn]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['r', 'a', 'n', 'n', 'a', 'n']
Yes, there is at least one match!
[a-n]
This will return a match for any lower case character, alphabetically between a and n.
For example:
import re
txt = "The rain in Spain"
#Check if the string has any characters between a and n:
x = re.findall("[a-n]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['h', 'e', 'a', 'i', 'n', 'i', 'n', 'a', 'i', 'n']
Yes, there is at least one match!
[^arn]
This will return a match for any character EXCEPT a, r, and n.
For example:
import re
txt = "The rain in Spain"
#Check if the string has other characters than a, r, or n:
x = re.findall("[^arn]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['T', 'h', 'e', ' ', 'i', ' ', 'i', ' ', 'S', 'p', 'i']
Yes, there is at least one match!
[0123]
This will return a match where any of the specified digits (0, 1, 2, or 3) are present.
For example:
import re
txt = "The rain in Spain"
#Check if the string has any 0, 1, 2, or 3 digits:
x = re.findall("[0123]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[]
No match
[0-9]
This will return a match for any digit between 0 and 9.
For example:
import re
txt = "8 times before 11:45 AM"
#Check if the string has any digits:
x = re.findall("[0-9]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['8', '1', '1', '4', '5']
Yes, there is at least one match!
[0-5][0-9]
This will return a match for any two-digit number from 00 to 59.
For example:
import re
txt = "8 times before 11:45 AM"
#Check if the string has any two-digit numbers, from 00 to 59:
x = re.findall("[0-5][0-9]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['11', '45']
Yes, there is at least one match!
[a-zA-Z]
This will return a match for any character alphabetically between a and z, lower case
OR upper case.
For example:
import re
txt = "8 times before 11:45 AM"
#Check if the string has any characters from a to z lower case, and A to Z upper case:
x = re.findall("[a-zA-Z]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
['t', 'i', 'm', 'e', 's', 'b', 'e', 'f', 'o', 'r', 'e', 'A', 'M']
Yes, there is at least one match!
[+]
In sets, +, *, ., |, (), $, {} have no special meaning. So, [+] means: return a match for any +
character in the string.
For example:
import re
txt = "8 times before 11:45 AM"
#Check if the string has any + characters:
x = re.findall("[+]", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Output:
[]
No match
Regex Module in Python
Python comes with a built-in module named ‘re’, which the examples above import. This
module provides several functions for working with regular expressions.
findall() function
The findall() function of the ‘re’ module returns all matches of a pattern in a string.
Syntax:
re.findall(pattern, string, flags=0)
The string is scanned from left to right and all non-overlapping matches of the pattern are
returned. However, the form of the result depends on the pattern.
● If the pattern has no capturing groups, the findall() function returns a list of
strings that match the whole pattern.
● If the pattern has one capturing group, the findall() function returns a list of
strings that match the group.
● If the pattern has multiple capturing groups, the findall() function returns a list of
tuples of strings that match the groups.
● It’s important to note that the non-capturing groups do not affect the form of the
return result.
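The four cases above can be compared side by side. This is a minimal sketch; the sample string `s` and the patterns are assumptions chosen for illustration, not a reconstruction of the original example.

```python
import re

s = "black, blue and brown"  # hypothetical sample string

# No capturing group: list of whole matches
no_group = re.findall(r"bl\w+", s)

# One capturing group: list of what the group matched
one_group = re.findall(r"bl(\w+)", s)

# Multiple capturing groups: list of tuples, one item per group
two_groups = re.findall(r"(bl(\w+))", s)

# Non-capturing group: result has the same form as with no group
non_capturing = re.findall(r"(?:bl)\w+", s)

print(no_group)       # ['black', 'blue']
print(one_group)      # ['ack', 'ue']
print(two_groups)     # [('black', 'ack'), ('blue', 'ue')]
print(non_capturing)  # ['black', 'blue']
```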
For example:
print(matches)
Output:
['black', 'blue']
Output:
['ack', 'ue']
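finditer() function
The finditer() function returns an iterator that yields a Match object for each
non-overlapping match of the pattern in the string.
Syntax:
re.finditer(pattern, string, flags=0)
For example: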
import re
s = 'Readability counts.'
pattern = r'[aeoui]'
matches = re.finditer(pattern, s)
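for match in matches:
    print(match)
Output:
<re.Match object; span=(1, 2), match='e'>
<re.Match object; span=(2, 3), match='a'>
<re.Match object; span=(4, 5), match='a'>
<re.Match object; span=(6, 7), match='i'>
<re.Match object; span=(8, 9), match='i'>
<re.Match object; span=(13, 14), match='o'>
<re.Match object; span=(14, 15), match='u'>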
search() function
The search() function scans the string from left to right and finds the first location where
the pattern produces a match. It returns a Match object if the search was successful or
None otherwise.
Syntax:
re.search(pattern, string, flags=0)
For example:
Output:
<re.Match object; span=(7, 8), match='3'>
Output:
('CPython', 'CPy')
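A minimal sketch of search() returning either a Match object or None. The sample string is an assumption, picked so that the first digit sits at index 7.

```python
import re

s = "Python 3 was released in 2008"  # hypothetical sample string

# search() stops at the FIRST location where the pattern matches
m = re.search(r"\d+", s)
if m:
    print(m.group())  # '3'
    print(m.span())   # (7, 8)

# When nothing matches, search() returns None
missing = re.search(r"\d+", "no digits here")
print(missing)  # None
```

Because the result may be None, checking it before calling group() or span() avoids an AttributeError.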
fullmatch() function
This function returns a Match object if the whole string matches the regular
expression pattern, or None otherwise.
Syntax:
re.fullmatch(pattern, string, flags=0)
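As a sketch of the whole-string requirement, the example below validates a time-like string. The pattern and inputs are assumptions made up for illustration.

```python
import re

pattern = r"\d{2}:\d{2}"  # hypothetical pattern: two digits, a colon, two digits

# The entire string matches the pattern
whole = re.fullmatch(pattern, "11:45")

# Extra text after the time means the whole string no longer matches
partial = re.fullmatch(pattern, "11:45 AM")

print(whole)    # <re.Match object; span=(0, 5), match='11:45'>
print(partial)  # None
```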
match() function
The match() function of the re module searches for a pattern at the beginning
of the string only. It returns a Match object if the pattern matches there, or None otherwise.
Syntax:
re.match(pattern, string, flags=0)
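The difference between match() and search() is easiest to see on the same input. A minimal sketch, reusing the "hello planet" string from the earlier examples:

```python
import re

s = "hello planet"

# match() only looks at the beginning of the string
at_start = re.match(r"hello", s)
not_at_start = re.match(r"planet", s)

# search() scans the whole string, so it finds "planet"
found_by_search = re.search(r"planet", s)

print(at_start)         # <re.Match object; span=(0, 5), match='hello'>
print(not_at_start)     # None
print(found_by_search)  # <re.Match object; span=(6, 12), match='planet'>
```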
sub() function
This function of the re module replaces the matches of a pattern in a string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
It will search for the pattern in the string and replace the matched strings with the
replacement (repl). If the sub() function couldn’t find a match, it will return the original
string. Otherwise, the sub()function returns the string after replacing the matches.
For example:
● To turn the phone number (212)-456-7890 into 2124567890
import re
phone_no = '(212)-456-7890'
pattern = r'\D'
result = re.sub(pattern, '', phone_no)
print(result)
Output:
2124567890
● Backreference example
import re
s = 'Make the World a *Better Place*'
pattern = r'\*(.*?)\*'
replacement = r'<b>\1</b>'
html = re.sub(pattern, replacement, s)
print(html)
Output:
Make the World a <b>Better Place</b>
subn() function
This function is similar to sub() in all ways, except in how it provides output. It returns a
tuple containing the new string and the total number of replacements made, rather than just the string.
Syntax:
re.subn(pattern, repl, string, count=0, flags=0)
For example:
import re
print(re.subn('ub', '~*', 'Subject has Uber booked already'))
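t = re.subn('ub', '~*', 'Subject has Uber booked already', flags=re.IGNORECASE)
print(t)
print(len(t))
# This will give same output as sub() would have
print(t[0])
Output:
('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
2
S~*ject has ~*er booked already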
escape() function
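This function returns the string with a backslash added before every character that has a
special meaning in regular expressions (before Python 3.7, before every non-alphanumeric
character). This is useful if you want to match an arbitrary literal string that may have
regular expression metacharacters in it.
Syntax:
re.escape(string)
For example:
import re
print(re.escape("This is Awesome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))
Output (Python 3.7 and later; the \t in the second string is a tab character, printed as whitespace after its backslash):
This\ is\ Awesome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW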
compile() function
This function compiles a regular expression into a pattern object, which has
methods for various operations such as searching for pattern matches or performing
string substitutions.
Syntax:
re.compile(pattern, flags=0)
For example:
import re
p = re.compile('[a-e]')
# findall() searches for the Regular Expression
# and return a list upon finding
print(p.findall("Aye, said Mr. Gibenson Stark"))
Output:
['e', 'a', 'd', 'b', 'e', 'a']
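split() function
It splits a string by the matches of a regular expression and returns the resulting list of substrings.
Syntax:
re.split(pattern, string, maxsplit=0, flags=0)
For example, with a pattern and an input string s already defined:
l = re.split(pattern, s)
print(l)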
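A self-contained sketch of split(), including the maxsplit argument and the effect of a capturing group in the pattern. The sample string is an assumption chosen for illustration.

```python
import re

s = "A1B22C333D"  # hypothetical sample string

# Split on every run of digits
parts = re.split(r"\d+", s)

# maxsplit=1 stops after the first split
limited = re.split(r"\d+", s, maxsplit=1)

# A capturing group keeps the separators in the result
with_seps = re.split(r"(\d+)", s)

print(parts)      # ['A', 'B', 'C', 'D']
print(limited)    # ['A', 'B22C333D']
print(with_seps)  # ['A', '1', 'B', '22', 'C', '333', 'D']
```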
Groups
A group is a part of a regular expression enclosed in the parentheses () metacharacters.
Expression      Explanation
()              Matches the expression inside the parentheses and groups it so it can be captured as required
(?#…)           A comment; its contents are ignored
(?P<name>AB)    Matches the expression AB, which can be retrieved with the group name
(?:A)           Matches the expression represented by A, but it cannot be retrieved afterwards
(?P=group)      Matches the expression matched by an earlier group named “group”
For example:
import re
example = re.search(r"(?:AB)", "ACABC")
print(example)
print(example.groups())
result = re.search(r"(\w*), (\w*)", "seeks, zest")
print(result.groups())
Output:
<re.Match object; span=(2, 4), match='AB'>
()
('seeks', 'zest')
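Named groups and named backreferences from the table are not covered by the example above, so here is a small sketch of both. The group names (year, month, word) and the sample strings are assumptions introduced for illustration.

```python
import re

# (?P<name>...) creates a group that can be retrieved by name
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Released 2008-12")
print(m.group("year"))   # 2008
print(m.group("month"))  # 12
print(m.groupdict())     # {'year': '2008', 'month': '12'}

# (?P=name) matches the same text that the named group matched earlier
repeated = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")
print(repeated.group())        # is is
print(repeated.group("word"))  # is
```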
Assertions
In Python RegEx, lookahead and lookbehind are used as assertions. An assertion succeeds or
fails depending on whether a pattern is found to the right (lookahead) or to the left
(lookbehind) of the parser’s current position, without consuming any characters.
Expression   Explanation
A(?=B)       Matches the expression A only if it is followed by B. (Positive lookahead assertion)
A(?!B)       Matches the expression A only if it is not followed by B. (Negative lookahead assertion)
(?<=B)A      Matches the expression A only if B is immediately to its left. (Positive lookbehind assertion)
(?<!B)A      Matches the expression A only if B is not immediately to its left. (Negative lookbehind assertion)
For example:
import re
print(re.search(r"z(?=a)", "pizza"))
print(re.search(r"z(?!a)", "pizza"))
Output:
<re.Match object; span=(3, 4), match='z'>
<re.Match object; span=(2, 3), match='z'>
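The example above covers lookahead only. A comparable sketch for the two lookbehind forms follows; the sample string is an assumption, and the \b in the second pattern is added so that the match cannot start in the middle of a number.

```python
import re

text = "price: $42, code: 17"  # hypothetical sample string

# Positive lookbehind: digits that come right after a dollar sign
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers NOT preceded by a dollar sign
not_after_dollar = re.findall(r"(?<!\$)\b\d+", text)

print(after_dollar)      # ['42']
print(not_after_dollar)  # ['17']
```

The dollar sign is checked but is not part of the returned match, since assertions do not consume characters.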
Flags or Modifiers
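Expression   Explanation
a            Matches ASCII only (re.A, re.ASCII)
i            Ignore case (re.I, re.IGNORECASE)
L            Locale-dependent character classes (re.L, re.LOCALE)
m            ^ and $ match at the start and end of each line (re.M, re.MULTILINE)
s            Dot (.) matches everything, including newline (re.S, re.DOTALL)
u            Matches Unicode character classes (re.U, re.UNICODE)
x            Allow spaces and comments in the pattern (re.X, re.VERBOSE)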
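A brief sketch of how a few of these flags change matching. Flags can be passed through the flags argument or written inline at the start of the pattern, for example (?i). The two-line sample string is an assumption chosen for illustration.

```python
import re

text = "Hello\nhello planet"  # hypothetical two-line sample string

# i: ignore case
ignore_case = re.findall(r"hello", text, flags=re.IGNORECASE)

# Without m, ^ matches only at the start of the whole string
single_line = re.findall(r"^hello", text, flags=re.I)

# With m, ^ also matches at the start of each line
multi_line = re.findall(r"^hello", text, flags=re.I | re.M)

# Without s, the dot does not match the newline between the two words
dot_default = re.search(r"Hello.hello", text)
dot_all = re.search(r"Hello.hello", text, flags=re.S)

# Inline form of the i flag
inline = re.findall(r"(?i)hello", text)

print(ignore_case)  # ['Hello', 'hello']
print(single_line)  # ['Hello']
print(multi_line)   # ['Hello', 'hello']
print(dot_default)  # None
print(dot_all is not None)  # True
print(inline)       # ['Hello', 'hello']
```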