Python Regular Expression


Python's re module handles regular expressions.

search method: returns a match object for the first match anywhere in the string, or None if there is no match

>>> import re
>>> x = "EndMemo Python Online Tutorial"
>>> m = re.search('Python',x)
>>> if m: print("found")
...
found
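
The match object returned by search also carries the matched text and its position. A minimal sketch, reusing the sample string from above (the variable names matched_text and position are only illustrative):

```python
import re

x = "EndMemo Python Online Tutorial"

# search scans the whole string and stops at the first match
m = re.search(r'Python', x)

if m is not None:
    matched_text = m.group()   # the text that matched
    position = m.span()        # (start, end) indexes into x
    print(matched_text, position)
```

Slicing x with the two numbers from span() gives back the matched text.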

match method: returns a match object only if the pattern matches at the beginning of the string, otherwise None
>>> m = re.match('Python',x)
>>> if m: print("found")  # prints nothing, 'Python' is not at the start
...
>>> if m is None: print("no match")
...
no match
>>> m = re.match('EndMemo',x)
>>> if m is None: print("no match")
...
>>> if m: print("found")
...
found
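
The difference between match and search can be sketched side by side. This assumes the same sample string; the names at_start, anchored and anywhere are hypothetical:

```python
import re

x = "EndMemo Python Online Tutorial"

# match only tries position 0, so 'Python' is not found
at_start = re.match(r'Python', x)

# search with a leading ^ behaves the same way on a single-line string
anchored = re.search(r'^Python', x)

# search without an anchor finds the word further along
anywhere = re.search(r'Python', x)

print(at_start, anchored, anywhere.start())
```

Note that match only anchors the start: the pattern does not have to reach the end of the string.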

split method: splits the string at each match of the pattern
>>> re.split(r'\s',x)
['EndMemo', 'Python', 'Online', 'Tutorial']
>>> re.split('o',x)
['EndMem', ' Pyth', 'n Online Tut', 'rial']
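
When the separators are uneven, the choice of pattern matters. A small sketch, assuming a made-up input string with runs of spaces and a tab:

```python
import re

messy = "EndMemo   Python\tOnline  Tutorial"  # hypothetical input with uneven whitespace

# \s splits on every single whitespace character, leaving empty strings
single = re.split(r'\s', messy)

# \s+ treats a run of whitespace as one separator
runs = re.split(r'\s+', messy)

# maxsplit limits the number of splits; the remainder stays in the last item
first_split = re.split(r'\s+', messy, maxsplit=1)

print(single)
print(runs)
print(first_split)
```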

findall method: returns a list of all non-overlapping matches
>>> re.findall('o[a-z]',x)
['on', 'or']
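
What findall returns changes once the pattern contains a group, and finditer is the variant to use when positions are needed. A short sketch on the same sample string (variable names are illustrative):

```python
import re

x = "EndMemo Python Online Tutorial"

# No group: findall returns the full text of each match
pairs = re.findall(r'o[a-z]', x)

# One group: findall returns only what the group captured
after_o = re.findall(r'o([a-z])', x)

# finditer yields match objects, so positions are available too
starts = [m.start() for m in re.finditer(r'o[a-z]', x)]

print(pairs, after_o, starts)
```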

sub method: replaces each match of the pattern with a replacement string
>>> re.sub('o',"XXX", x)
'EndMemXXX PythXXX Online TutXXXial'
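
sub also accepts a count, group references in the replacement, and a function in place of the replacement string. A minimal sketch of the three, again on the sample string:

```python
import re

x = "EndMemo Python Online Tutorial"

# count limits how many matches are replaced
first_only = re.sub(r'o', 'XXX', x, count=1)

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', x, count=1)

# A function can compute each replacement from the match object
upper_o = re.sub(r'o', lambda m: m.group().upper(), x)

print(first_only)
print(swapped)
print(upper_o)
```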

Non-greedy regular expression: a quantifier followed by ? matches as few characters as possible
>>> x = "EndMemXXXXXXXXXXXXX PythXXXXXXn"
>>> re.sub('X{6,13}','o',x)
'EndMemo Python'
>>> x = "EndMemXXXXXXXXXXXXX PythXXXXXXn"
>>> re.sub('X{6,13}?','o',x)
'EndMemooX Python'
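
The leftover X in the last result is easier to see by listing the individual matches. The sketch below rebuilds the same string with explicit counts and then repeats the comparison with .* and .*? on a made-up tagged string:

```python
import re

# Same sample as above, built with repetition so the X counts are explicit
x = "EndMem" + "X" * 13 + " Pyth" + "X" * 6 + "n"

# Greedy: each match takes as many X as allowed (13, then 6)
greedy_runs = re.findall(r'X{6,13}', x)

# Non-greedy: each match stops at the minimum of 6, so the first run
# yields two matches and one leftover X that is too short to match
lazy_runs = re.findall(r'X{6,13}?', x)

print([len(r) for r in greedy_runs])  # lengths of the greedy matches
print([len(r) for r in lazy_runs])    # lengths of the non-greedy matches

tagged = "<b>EndMemo</b> <b>Python</b>"  # hypothetical sample text

# The same contrast with .* and .*?
greedy_tag = re.findall(r'<.*>', tagged)   # one match spanning the whole string
lazy_tag = re.findall(r'<.*?>', tagged)    # one match per tag
```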

Regular Expression Syntax:
Syntax          Description
\d              Digit: 0, 1, 2 ... 9
\D              Non-digit
\s              Whitespace character (space, tab, newline, etc.)
\S              Non-whitespace character
\w              Word character: letter, digit or underscore
\W              Non-word character
\t              Tab
\n              Newline
^               Beginning of the string
$               End of the string
\               Escapes a special character, e.g. \\ matches "\", \+ matches "+"
|               Alternation, e.g. (e|d)n matches "en" and "dn"
.               Any character except \n
[ab]            a or b
[^ab]           Any character except a and b
[0-9]           Any digit
[A-Z]           Any uppercase letter A to Z
[a-z]           Any lowercase letter a to z
[A-Za-z]        Any uppercase or lowercase letter (note: [A-z] also matches [ \ ] ^ _ `)
i+              i one or more times
i*              i zero or more times
i?              i zero or one time
i{n}            i exactly n times in sequence
i{n1,n2}        i between n1 and n2 times in sequence
i{n1,n2}?       Non-greedy version of i{n1,n2}, see the example above
i{n,}           i at least n times
*?, +?, ??      Non-greedy versions of *, + and ?
(...)           Groups the enclosed pattern and captures what it matches
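
Several of the syntax elements can be tried together. The sketch below uses made-up sample strings, and all variable names are illustrative only:

```python
import re

text = "Room 12, floor 3\tbuilding B_7"  # hypothetical sample text

digits = re.findall(r'\d+', text)      # runs of one or more digits
words = re.findall(r'\w+', text)       # runs of letters, digits and underscores
has_tab = re.search(r'\t', text) is not None

# ^ and $ anchor the pattern to the start and end of the string
starts_with_room = re.search(r'^Room', text) is not None
ends_with_digit = re.search(r'\d$', text) is not None

# {n} repeats the preceding item exactly n times
looks_like_date = re.search(r'^\d{4}-\d{2}-\d{2}$', "2024-03-15") is not None

# A backslash makes a special character literal
plus_signs = re.findall(r'\+', "1+2+3")

# [^ab] matches anything except a and b
only_ab = re.sub(r'[^ab]', '', "abcabd")

# (e|d)n matches "en" or "dn"; group(1) holds what the parentheses captured
m = re.search(r'(e|d)n', "Edna")
alt_match, alt_group = m.group(), m.group(1)

# The range A-z also covers the punctuation between Z and a in ASCII
a_to_z_range = re.findall(r'[A-z]', "a_Z^1")
letters_only = re.findall(r'[A-Za-z]', "a_Z^1")
```

Writing patterns as raw strings (r'...') keeps Python from interpreting the backslashes before the re module sees them.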