regex operations
findall
metacharacters
['h', 'e', 'a', 'i', 'i', 'a', 'i']
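The list above is the kind of result findall() returns for a set of characters. Here is a minimal sketch that produces exactly this list. The input "The rain in Spain" and the set [a-m] are assumptions chosen for illustration; the notes do not give them.

```python
import re

txt = "The rain in Spain"
# [a-m] matches any single lowercase letter from a to m;
# findall() returns every non-overlapping match as a list of strings
x = re.findall("[a-m]", txt)
print(x)  # ['h', 'e', 'a', 'i', 'i', 'a', 'i']
```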
import re
email = "john@example.com"
# Two capture groups: ([a-z]+) before the @ and ([a-z]+) before ".com"
pattern = r"([a-z]+)@([a-z]+)\.com"
# Apply the pattern and extract the capture groups
match = re.match(pattern, email)
if match:
    print(match.group())  # the whole matched text
    username = match.group(1)
    domain = match.group(2)
    print("Username:", username)
    print("Domain:", domain)
else:
    print("Email address is not valid")
john@example.com
Username: john
Domain: example
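The pattern only accepts lowercase ASCII letters and requires ".com", so many real addresses are rejected. This small sketch uses sample addresses made up for illustration. For each one it records either the captured groups or None.

```python
import re

# Same pattern as above: group 1 = username, group 2 = domain name
pattern = r"([a-z]+)@([a-z]+)\.com"

results = {}
for email in ["john@example.com", "John@example.com", "john@example.org"]:
    match = re.match(pattern, email)
    # groups() returns every capture group as a tuple; None means no match
    results[email] = match.groups() if match else None
    print(email, "->", results[email])
```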
special sequences
character class
\A: Returns a match if the specified characters are at the beginning of the string
\b: Returns a match where the specified characters are at the beginning or at the end of a word (write the pattern as a raw string such as r"\bain"; in a normal string "\b" is a backspace character)
\B: Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (again, use a raw string such as r"\Bain")
\d: Returns a match where the string contains digits (numbers from 0 to 9)
\D: Returns a match for every character that is NOT a digit
import re
txt = "The rain in 33 Spain"
# Return a match at every non-digit character:
x = re.findall(r"\D", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
['T', 'h', 'e', ' ', 'r', 'a', 'i', 'n', ' ', 'i', 'n', ' ', ' ', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!
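The \A, \b, \B and \d sequences listed above have no example of their own. Here is a short sketch that applies each one to the same sample string:

```python
import re

txt = "The rain in 33 Spain"

# \A: "The" only at the very start of the string
at_start = re.findall(r"\AThe", txt)
# \b: "ain" at the end of a word (rain, Spain)
word_end = re.findall(r"ain\b", txt)
# \b: "ain" at the beginning of a word (none in this text)
word_start = re.findall(r"\bain", txt)
# \B: "ain" present but NOT at the beginning of a word
inside_word = re.findall(r"\Bain", txt)
# \d: every digit character
digits = re.findall(r"\d", txt)

print(at_start, word_end, word_start, inside_word, digits)
# ['The'] ['ain', 'ain'] [] ['ain', 'ain'] ['3', '3']
```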
\s: Returns a match where the string contains a white space character
\S: Returns a match for every character that is NOT a white space character
import re
txt = "The rain in Spain"
# Return a match at every NON white-space character:
x = re.findall(r"\S", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n']
Yes, there is at least one match!
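The example above uses \S. Its counterpart \s finds the white space itself. A short sketch follows; the messy input string is made up for illustration.

```python
import re

txt = "The rain in Spain"
# \s matches each whitespace character (space, tab, newline, ...)
spaces = re.findall(r"\s", txt)
print(spaces)  # [' ', ' ', ' ']

# \s+ matches a whole run of whitespace, handy for normalising spacing
messy = "The  rain\tin\nSpain"
cleaned = re.sub(r"\s+", " ", messy)
print(cleaned)  # The rain in Spain
```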
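\w: Returns a match where the string contains any word characters (letters a to z and A to Z, digits from 0 to 9, and the underscore _ character). For str patterns Python also counts other Unicode letters and digits as word characters unless the re.ASCII flag is set.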
import re
txt = "The rain in Spain 35 _"
# Return a match at every word character (letters a to z and A to Z, digits from
# 0 to 9, and the underscore _ character):
x = re.findall(r"\w", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
['T', 'h', 'e', 'r', 'a', 'i', 'n', 'i', 'n', 'S', 'p', 'a', 'i', 'n', '3', '5', '_']
Yes, there is at least one match!
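The following sketch shows the Unicode behaviour of \w and how re.ASCII restricts it. The sample word is an assumption.

```python
import re

txt = "caf\u00e9_2"  # "café_2"
# For str patterns, \w also matches non-ASCII letters such as "é"
unicode_words = re.findall(r"\w", txt)
print(unicode_words)  # ['c', 'a', 'f', 'é', '_', '2']
# With re.ASCII, \w is limited to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w", txt, flags=re.ASCII)
print(ascii_words)    # ['c', 'a', 'f', '_', '2']
```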
\W: Returns a match for every character that is NOT a word character
import re
txt = "The rain in % Spain"
# Return a match at every NON word character (anything other than letters,
# digits and the underscore, e.g. "!", "?", "%" or white space):
x = re.findall(r"\W", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
[' ', ' ', ' ', '%', ' ']
Yes, there is at least one match!
\Z: Returns a match if the specified characters are at the end of the string
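Here is a short sketch of \Z. It also shows how \Z differs from the $ metacharacter when the string ends with a newline:

```python
import re

txt = "The rain in Spain"
at_end = re.findall(r"Spain\Z", txt)
print(at_end)  # ['Spain']

# $ also matches just before a trailing newline,
# while \Z only matches at the very end of the string
dollar_hit = re.search(r"Spain$", "Spain\n") is not None
z_hit = re.search(r"Spain\Z", "Spain\n") is not None
print(dollar_hit, z_hit)  # True False
```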
A set is a group of characters inside square brackets [] that has a special meaning:
[arn]: Returns a match where one of the specified characters (a, r, or n) is present.
[a-n]: Returns a match for any lowercase character alphabetically between a and n.
[^arn]: Returns a match for any character EXCEPT a, r, and n.
[0123]: Returns a match where any of the specified digits (0, 1, 2, or 3) is present.
[0-5][0-9]: Returns a match for any two-digit number from 00 to 59.
[a-zA-Z]: Returns a match for any character alphabetically between a and z, lowercase OR uppercase.
[+]: In sets, +, *, ., |, (), $ and {} have no special meaning, so [+] returns a match for any + character in the string.
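The sketch below applies a few of the sets listed above. The input strings are made up for illustration.

```python
import re

# [arn]: any single a, r or n
letters = re.findall(r"[arn]", "The rain in Spain")
print(letters)  # ['r', 'a', 'n', 'n', 'a', 'n']

# [0-5][0-9]: a two-digit number from 00 to 59
times = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")
print(times)    # ['11', '45']

# [+]: inside a set, + is a literal plus sign
pluses = re.findall(r"[+]", "1 + 2 + 3")
print(pluses)   # ['+', '+']
```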
finditer
import re
s = 'Readability counts.'
pattern = r'[aeoui]'
matches = re.finditer(pattern, s)
for match in matches:
    print(match)
<re.Match object; span=(1, 2), match='e'>
<re.Match object; span=(2, 3), match='a'>
<re.Match object; span=(4, 5), match='a'>
<re.Match object; span=(6, 7), match='i'>
<re.Match object; span=(8, 9), match='i'>
<re.Match object; span=(13, 14), match='o'>
<re.Match object; span=(14, 15), match='u'>
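Unlike findall(), finditer() yields Match objects one at a time, so the position of each match is available. This short sketch collects the start offset and matched text for the same string:

```python
import re

s = 'Readability counts.'
# Each item is a Match object; start() and group() give position and text
positions = [(m.start(), m.group()) for m in re.finditer(r'[aeoui]', s)]
print(positions)
# [(1, 'e'), (2, 'a'), (4, 'a'), (6, 'i'), (8, 'i'), (13, 'o'), (14, 'u')]
```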
search
The search() function scans the string from left to right and finds the first location where the pattern produces a match. It returns a Match object if the search was successful, or None otherwise.
Syntax: re.search(pattern, string, flags=0)
import re
s = 'CPython, IronPython, or Cython'
pattern = r'\b((\w+)thon)\b'
match = re.search(pattern, s)
if match is not None:
    print(match.groups())
# The pattern r'\b((\w+)thon)\b' has two capturing groups:
# - group 1, ((\w+)thon), captures the whole word.
# - group 2, (\w+), captures the characters at the beginning of the word, before "thon".
('CPython', 'CPy')
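search() stops at the first match. The sketch below reads single groups by number and contrasts search() with findall(), which returns one tuple per match when the pattern has several groups:

```python
import re

s = 'CPython, IronPython, or Cython'
pattern = r'\b((\w+)thon)\b'

match = re.search(pattern, s)
# group(n) returns one capture group of the first match
first = (match.group(1), match.group(2))
print(first)  # ('CPython', 'CPy')

# findall() keeps scanning; with several groups it returns a list of tuples
all_words = re.findall(pattern, s)
print(all_words)
# [('CPython', 'CPy'), ('IronPython', 'IronPy'), ('Cython', 'Cy')]
```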
fullmatch
The fullmatch() function returns a Match object if the whole string matches the regular expression, or None otherwise.
Syntax: re.fullmatch(pattern, string, flags=0)
pattern is the regular expression.
string is the input string provided by the user.
flags is optional and defaults to 0. It accepts one or more regex flags, which change how the regex engine matches the pattern.
email validation
import re
email = 'no-reply@pythontutorial.net'
# local part @ domain . top-level domain of at least two letters
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
match = re.fullmatch(pattern, email)
if match is not None:
    print(f'The email "{match.group()}" is valid')
else:
    print(f'The email "{email}" is not valid')
The email "no-reply@pythontutorial.net" is valid
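The difference between fullmatch() and search() shows up when the string has extra characters. This sketch uses made-up inputs:

```python
import re

pattern = r"\d+"
checks = {}
# fullmatch() needs the WHOLE string to match; search() accepts any substring
for text in ["2024", "2024abc"]:
    full = re.fullmatch(pattern, text) is not None
    part = re.search(pattern, text) is not None
    checks[text] = (full, part)
    print(f"{text!r}: fullmatch={full}, search={part}")
# '2024': fullmatch=True, search=True
# '2024abc': fullmatch=False, search=True
```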
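match
re.match(pattern, string, flags=0)
The match() function checks for a match only at the beginning of the string. It returns a Match object if the pattern matches there, or None otherwise.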
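match() only tries position 0, while search() scans the whole string. Here is a short sketch using a sample word:

```python
import re

s = "CPython"
# match() only tries position 0, so "Py" (at index 1) is not found
at_pos0 = re.match(r"Py", s)
print(at_pos0)       # None
# search() scans the whole string and finds it at index 1
found = re.search(r"Py", s)
print(found.span())  # (1, 3)
```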
sub
re.sub(pattern, repl, string, count=0, flags=0)
- pattern is a regular expression or Pattern object.
- repl is the replacement: a string, or a function that receives each Match object and returns the replacement string.
- string is the input string provided by the user.
- count specifies the maximum number of matches that sub() should replace. If you pass zero or skip it, sub() replaces all the matches.
- flags is one or more regex flags that modify the standard pattern behaviour.
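This sketch exercises the repl and count parameters of sub() described above. The helper function double is a hypothetical name chosen for illustration.

```python
import re

text = "3 apples and 10 pears"

# Replace every number with "#"
replaced = re.sub(r"\d+", "#", text)
print(replaced)    # # apples and # pears
# count=1 replaces only the first match
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)  # # apples and 10 pears

# repl can be a function that receives each Match object (hypothetical helper)
def double(m):
    return str(int(m.group()) * 2)

doubled = re.sub(r"\d+", double, text)
print(doubled)     # 6 apples and 20 pears
```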
escape
re.escape(pattern)
Escapes the regex special characters in pattern, so that the result matches that text literally.
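Escaping matters when user text contains characters such as + that have a special meaning in a pattern. Here is a sketch with a made-up input:

```python
import re

literal = "1+1"
text = "1+1=2, but 11 is not 1+1"

# re.escape() turns "+" into "\+", so it is matched literally
escaped = re.escape(literal)
print(escaped)  # 1\+1
with_escape = re.findall(escaped, text)
print(with_escape)     # ['1+1', '1+1']

# Unescaped, "1+1" means "one or more 1s followed by 1"
without_escape = re.findall(literal, text)
print(without_escape)  # ['11']
```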
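compile
re.compile(pattern, flags=0)
Compiles a regular expression into a Pattern object. The Pattern object has match(), search(), findall(), sub(), split() and other methods, which is useful when the same pattern is used many times.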
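The sketch below compiles a pattern once and then reuses it through the Pattern object's methods. The inputs are made up for illustration.

```python
import re

# Compile once, then reuse the Pattern object's methods
number = re.compile(r"\d+")
numbers = number.findall("a1b22c333")
print(numbers)  # ['1', '22', '333']
masked = number.sub("#", "a1b22")
print(masked)   # a#b#

# Flags can be passed when compiling
word = re.compile(r"spain", re.IGNORECASE)
hit = word.search("The rain in Spain").group()
print(hit)      # Spain
```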
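split
re.split(pattern, string, maxsplit=0, flags=0)
Splits string at every match of pattern and returns a list of the pieces. If maxsplit is nonzero, at most maxsplit splits occur and the rest of the string is returned as the final element.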
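This sketch shows split() with a set of separators, with maxsplit, and with a capture group. The input strings are made up for illustration.

```python
import re

text = "a, b;c,  d"
# Split on a comma or semicolon followed by optional whitespace
parts = re.split(r"[,;]\s*", text)
print(parts)        # ['a', 'b', 'c', 'd']
# maxsplit=1: split only once, keep the rest intact
first_split = re.split(r"[,;]\s*", text, maxsplit=1)
print(first_split)  # ['a', 'b;c,  d']
# A capture group in the pattern keeps the separators in the result
with_seps = re.split(r"([,;])", "a,b;c")
print(with_seps)    # ['a', ',', 'b', ';', 'c']
```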
FLAGS
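re.A or re.ASCII (ASCII-only matching: \w, \b, \d and \s match only ASCII characters)
re.I or re.IGNORECASE (case-insensitive matching)
re.M or re.MULTILINE (multi-line matching: ^ and $ also match at the start and end of each line)
re.S or re.DOTALL (dot matches any character, including newline)
re.X or re.VERBOSE (allows white space and comments inside the regex):
import re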
pattern = re.compile(r'''
\d+ # Match one or more digits
\s* # Match zero or more whitespace characters
[a-zA-Z]+ # Match one or more letters
''', flags=re.VERBOSE)
text = "123 ABC"
match = pattern.match(text)
print(match.group()) # Output: 123 ABC
123 ABC
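Only re.VERBOSE has an example above. This sketch, using a made-up three-line string, shows re.M, re.I and re.S, and how flags combine with |:

```python
import re

text = "one\nTwo\nthree"

# re.M: ^ matches at the start of every line, not just the string
lines = re.findall(r"^\w+", text, flags=re.M)
print(lines)     # ['one', 'Two', 'three']
# re.I: case-insensitive matching
ci = re.findall(r"two", text, flags=re.I)
print(ci)        # ['Two']
# Flags combine with the | operator
combined = re.findall(r"^t\w+", text, flags=re.M | re.I)
print(combined)  # ['Two', 'three']
# re.S: . also matches the newline character
no_dotall = re.search(r"one.Two", text)
dotall = re.search(r"one.Two", text, flags=re.S)
print(no_dotall is None, dotall is not None)  # True True
```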
Manish Patel