Python Regular Expression
The Python re module handles regular expressions.
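The functions in this tutorial take the pattern as a string each time. When one pattern is reused, it can also be compiled once with the standard re.compile function. The sketch below shows a compiled pattern and the re.IGNORECASE flag; the variable names are only illustrative.

```python
import re

# re.compile turns a pattern string into a reusable Pattern object whose
# methods mirror the module-level functions (search, match, findall, ...).
word_o = re.compile(r"o[a-z]")

x = "EndMemo Python Online Tutorial"
print(word_o.findall(x))         # ['on', 'or']
print(word_o.search(x).group())  # on

# Flags change how a pattern behaves. re.IGNORECASE makes letters match
# regardless of case, so the capital "O" in "Online" now counts too.
word_o_any_case = re.compile(r"o[a-z]", re.IGNORECASE)
print(word_o_any_case.findall(x))  # ['on', 'On', 'or']
```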
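search method: scan the whole string and return a Match object for the first location where the pattern matches, or None if there is no match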
>>> import re
>>> x = "EndMemo Python Online Tutorial"
>>> m = re.search('Python', x)
>>> if m: print("found")
...
found
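Because search returns a Match object rather than just True or False, you can also read what matched and where. A minimal sketch using standard Match methods; the "Java" lookup is just an invented pattern that does not occur in the string:

```python
import re

x = "EndMemo Python Online Tutorial"

# search returns a Match object for the first match, or None.
m = re.search("Python", x)

if m is not None:
    print(m.group())           # the matched text: Python
    print(m.start(), m.end())  # where the match begins and ends: 8 14
    print(m.span())            # both positions as a tuple: (8, 14)

# A pattern that does not occur anywhere gives None.
print(re.search("Java", x))  # None
```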
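match method: return a Match object only if the pattern matches at the beginning of the string, otherwise None
>>> m = re.match('Python', x)
>>> if m: print("found")   # no match, so nothing is printed
...
>>> if m is None: print("no match")
...
no match
>>> m = re.match('EndMemo', x)
>>> if m is None: print("no match")
...
>>> if m: print("found")
...
found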
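To make the difference between match and search concrete, this sketch runs both on the same string. It also uses re.fullmatch (a standard re function since Python 3.4), which succeeds only when the pattern covers the entire string:

```python
import re

x = "EndMemo Python Online Tutorial"

# match only tries the pattern at position 0 of the string.
print(re.match("Python", x))           # None: "Python" is not at the start
print(re.match("EndMemo", x).group())  # EndMemo

# search looks anywhere, so "^" is needed to pin it to the start.
print(re.search("^Python", x))         # None
print(re.search("Python", x).group())  # Python

# fullmatch requires the pattern to cover the whole string.
print(re.fullmatch("EndMemo", x))              # None: text remains after it
print(re.fullmatch(r"[\w ]+", x) is not None)  # True
```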
split method: split the string at every match of the pattern
>>> re.split(r'\s', x)
['EndMemo', 'Python', 'Online', 'Tutorial']
>>> re.split('o',x)
['EndMem', ' Pyth', 'n Online Tut', 'rial']
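A few more split variations, as a sketch on invented strings: patterns are written as raw strings so backslashes reach the regex engine unchanged, maxsplit caps the number of splits, and a capturing group keeps the separators in the result.

```python
import re

x = "EndMemo Python Online Tutorial"

# Raw strings (r"...") avoid escape problems: a plain "\s" literal is an
# invalid string escape and triggers a SyntaxWarning from Python 3.12 on.
print(re.split(r"\s", x))  # ['EndMemo', 'Python', 'Online', 'Tutorial']

# maxsplit limits the number of splits; the rest stays in the last item.
print(re.split(r"\s", x, maxsplit=1))  # ['EndMemo', 'Python Online Tutorial']

# With a capturing group in the pattern, the separators are kept too.
print(re.split(r"(\s)", x))  # ['EndMemo', ' ', 'Python', ' ', 'Online', ' ', 'Tutorial']

# \s+ treats a run of whitespace as a single separator.
print(re.split(r"\s+", "a  b\tc"))  # ['a', 'b', 'c']
```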
findall method: return a list of all non-overlapping matches
>>> re.findall('o[a-z]',x)
['on', 'or']
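findall changes what it returns when the pattern contains groups, which is easy to trip over. This sketch shows the three cases, plus re.finditer, which yields Match objects (with positions) instead of plain strings:

```python
import re

x = "EndMemo Python Online Tutorial"

# No groups: the matched strings themselves.
print(re.findall(r"o[a-z]", x))  # ['on', 'or']

# One capturing group: only the text of that group.
print(re.findall(r"o([a-z])", x))  # ['n', 'r']

# Several groups: one tuple of group texts per match.
print(re.findall(r"(o)([a-z])", x))  # [('o', 'n'), ('o', 'r')]

# finditer yields Match objects, which also carry positions.
for m in re.finditer(r"o[a-z]", x):
    print(m.group(), m.start())  # on 12, then or 25
```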
sub method: replace every match of the pattern with a replacement string
>>> re.sub('o', "XXX", x)
'EndMemXXX PythXXXn Online TutXXXrial'
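sub accepts more than a fixed replacement string. This sketch, reusing the tutorial's sample string, shows the count argument, a group backreference in the replacement, a function as the replacement, and re.subn, which also reports how many replacements were made:

```python
import re

x = "EndMemo Python Online Tutorial"

# count limits how many matches are replaced (0, the default, means all).
print(re.sub("o", "XXX", x, count=1))  # EndMemXXX Python Online Tutorial

# \1 and \2 in the replacement refer to the text captured by groups 1 and 2.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Python Online"))  # Online Python

# The replacement can be a function that receives each Match object.
print(re.sub(r"[a-z]+", lambda m: m.group().upper(), x))
# ENDMEMO PYTHON ONLINE TUTORIAL

# subn returns the new string together with the number of replacements.
print(re.subn("o", "XXX", x))  # ('EndMemXXX PythXXXn Online TutXXXrial', 3)
```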
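Greedy vs. non-greedy regular expression: X{6,13} is greedy and consumes as many X's as it can (up to 13), while X{6,13}? is non-greedy and consumes as few as it can (6). Since sub replaces every match, the run of 13 X's becomes two 6-X matches plus one leftover X:
>>> x = "EndMemXXXXXXXXXXXXX PythXXXXXXn"
>>> re.sub('X{6,13}', 'o', x)
'EndMemo Python'
>>> re.sub('X{6,13}?', 'o', x)
'EndMemooX Python'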
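A more common place where greediness matters is matching tag-like text. The HTML string below is an invented example; the last two lines show the same contrast with + and +?:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as it can, so the single match runs from the
# first "<" to the last ">" in the string.
print(re.findall(r"<.*>", html))   # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? takes as little as it can, so each match ends at the
# nearest ">".
print(re.findall(r"<.*?>", html))  # ['<b>', '</b>', '<i>', '</i>']

# The same contrast with + and +? on a run of digits.
print(re.search(r"\d+", "abc12345").group())   # 12345
print(re.search(r"\d+?", "abc12345").group())  # 1
```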
Regular Expression Syntax:
Syntax | Description |
\d | Digit, 0,1,2 ... 9 |
\D | Not Digit |
\s | Whitespace character (space, tab, newline, etc.) |
\S | Not whitespace |
\w | Word character: letter, digit or underscore |
\W | Not a word character |
\t | Tab |
\n | New line |
^ | Beginning of the string |
$ | End of the string |
\ | Escape special characters, e.g. \\ is "\", \+ is "+" |
| | Alternation, e.g. (e|d)n matches "en" and "dn" |
. | Any character except newline \n (with re.DOTALL, newline too) |
[ab] | a or b |
[^ab] | Any character except a and b |
[0-9] | Any digit 0 to 9 |
[A-Z] | Any uppercase letter A to Z |
[a-z] | Any lowercase letter a to z |
[A-Za-z] | Any uppercase or lowercase letter (not [A-z], which also matches [ \ ] ^ _ and `) |
i+ | i one or more times |
i* | i zero or more times |
i? | i zero or one time |
i{n} | i exactly n times in sequence |
i{n1,n2} | i n1 to n2 times in sequence (greedy) |
i{n1,n2}? | Non-greedy version of i{n1,n2}, see the example above |
i{n,} | i n or more times |
*?, +?, ?? | Non-greedy versions of *, + and ? |
(...) | Group: matches whatever pattern is inside the parentheses and captures the matched text |
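A sketch that exercises several rows of the table on small invented strings, including the [A-z] pitfall and the effect of re.DOTALL on the dot:

```python
import re

# \d matches a digit; \W matches anything that is not a word character.
print(re.findall(r"\d", "a1b2c3"))     # ['1', '2', '3']
print(re.findall(r"\W", "a_1, b-2!"))  # [',', ' ', '-', '!']

# ^ and $ anchor the pattern to the beginning and end of the string.
print(bool(re.search(r"^End", "EndMemo")))  # True
print(bool(re.search(r"End$", "EndMemo")))  # False

# . matches any character except a newline, unless re.DOTALL is given.
print(re.findall(r"a.c", "abc a\nc"))             # ['abc']
print(re.findall(r"a.c", "abc a\nc", re.DOTALL))  # ['abc', 'a\nc']

# Alternation inside a group. finditer is used because findall would
# return only the group's text ('e', 'd') instead of the full match.
print([m.group() for m in re.finditer(r"(e|d)n", "en dn an")])  # ['en', 'dn']

# [A-z] is a range of code points, so it also contains [ \ ] ^ _ and `.
print(re.findall(r"[A-z]", "a_B^"))     # ['a', '_', 'B', '^']
print(re.findall(r"[A-Za-z]", "a_B^"))  # ['a', 'B']

# {n} means exactly n repeats; {n,} means n or more.
print(re.findall(r"\d{3}", "12 345 6789"))   # ['345', '678']
print(re.findall(r"\d{3,}", "12 345 6789"))  # ['345', '6789']

# (...) groups part of the pattern and captures what it matched.
m = re.search(r"(\d+)-(\d+)", "Call 555-1234 now")
print(m.group(1), m.group(2))  # 555 1234
```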