Regular Expression – REGEX
regex – pattern matching
Characters inside square brackets form a character class – without a quantifier, the class matches exactly one character.
[ri] – r or i
[a-z] – a to z
[A-Z] – A to Z
[a-zA-Z] – a to z or A to Z (no space inside the brackets, otherwise a space is matched too)
[0-9] – 0 to 9
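Python's re.findall lists every match, so it shows that a bracket without a quantifier matches one character at a time. This is a small sketch; the sample strings are made up for illustration.

```python
import re

text = "rain 7 Rio iris_42"

# Without a quantifier, a class matches exactly ONE character per match
print(re.findall(r"[ri]", text))    # ['r', 'i', 'i', 'i', 'r', 'i'] (case-sensitive)
print(re.findall(r"[A-Z]", text))   # ['R']
print(re.findall(r"[0-9]", text))   # ['7', '4', '2']

# Every character inside the brackets counts, including a space
print(re.findall(r"[a-zA-Z]", "Hi 5"))    # ['H', 'i']
print(re.findall(r"[a-z A-Z]", "Hi 5"))   # ['H', 'i', ' ']
```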
Brackets with quantifiers – a quantifier after the brackets sets how many times the class must match, so the pattern can match more than one character.
[ ]? – 0 or 1 time – used for an optional character
[ ]+ – 1 or more times
[ ]* – 0 or more times
[ ]{n} – exactly n times
[ ]{n,} – n or more times
[ ]{n,m} – at least n and at most m times (both ends inclusive)
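A minimal sketch of each quantifier using Python's re module. re.fullmatch requires the whole string to match, which makes the {n}, {n,} and {n,m} limits visible; the test strings are illustrative only.

```python
import re

# ?  optional: 0 or 1 'u'
print(re.findall(r"colou?r", "color colour colouur"))   # ['color', 'colour']
# +  one or more digits
print(re.findall(r"[0-9]+", "a1 b22 c333"))             # ['1', '22', '333']
# *  zero or more 'b'
print(re.findall(r"ab*", "a ab abbb"))                  # ['a', 'ab', 'abbb']

# {n}, {n,} and {n,m}: fullmatch checks the whole string
for s in ["12", "123", "1234", "12345"]:
    exactly_3 = re.fullmatch(r"[0-9]{3}", s) is not None
    three_plus = re.fullmatch(r"[0-9]{3,}", s) is not None
    two_to_4 = re.fullmatch(r"[0-9]{2,4}", s) is not None   # 2 and 4 both allowed
    print(s, exactly_3, three_plus, two_to_4)
```

The loop prints False False True for "12", True True True for "123", False True True for "1234" and False True False for "12345".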
Metacharacters
. = any character except newline character
\d = [0-9]
\D = not a digit
\w = [a-zA-Z_0-9]
\W = not a word character (opposite of \w)
\s = whitespace (space, tab, newline)
\S = not whitespace
$ = end of string
^ (caret) = beginning of string
| = either or
() = group, e.g. (www\.)? makes the whole www. prefix optional: it matches whether or not www. is entered.
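The metacharacters above can be tried one by one with Python's re module. This is a sketch on made-up strings; the dot in the group example is escaped so it only matches a literal ".".

```python
import re

s = "Order 66 shipped\tto NZ_office"

print(re.findall(r"\d", s))     # ['6', '6']
print(re.findall(r"\w+", s))    # ['Order', '66', 'shipped', 'to', 'NZ_office']
print(re.findall(r"\s", s))     # [' ', ' ', '\t', ' ']
print(re.findall(r"o.d", "ord old o\nd"))   # ['ord', 'old'] - '.' skips '\n'

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r"^Order", s)), bool(re.search(r"^66", s)))   # True False
print(bool(re.search(r"office$", s)))                               # True

# | means either/or
print(re.findall(r"cat|dog", "cat dog cow"))   # ['cat', 'dog']

# (www\.)? makes the whole "www." group optional
for url in ["www.example.com", "example.com", "wwwXexample.com"]:
    print(url, re.fullmatch(r"(www\.)?example\.com", url) is not None)
```

The last loop prints True, True, False: without the backslash, (www.)? would also accept "wwwX".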
Escape character
Use a backslash whenever you want to match a character that has a special meaning, e.g. ? + * . all have special meaning.
To search for those characters literally, add \ in front: \? matches a question mark, \. matches a dot.
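A short Python sketch of escaping: an unescaped + is read as a quantifier, while \+ is a literal plus sign. re.escape adds the backslashes automatically, which helps when the text to search for comes from user input.

```python
import re

text = "Total: 5+3=8? yes"

print(re.findall(r"5+3", text))    # [] - '+' is a quantifier here ("5" one or more times, then "3")
print(re.findall(r"5\+3", text))   # ['5+3'] - literal plus sign
print(re.findall(r"8\?", text))    # ['8?'] - literal question mark

# re.escape backslash-escapes every special character in the string
pattern = re.escape("5+3=8?")
print(re.findall(pattern, text))   # ['5+3=8?']
```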
Examples
- To check that a mobile number has the correct format:
The number should start with 0, then 2, followed by any digits, and the whole number should be 10 or 11 digits long, e.g. 021 333 2222 or 021 444 22224 (the pattern matches digits only, so remove the spaces before checking).
[0][2][0-9]{8,9}
Explanation: 0, then 2, followed by 8 or 9 digits from 0-9, so the total is either 10 or 11 digits.
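A sketch of the mobile-number check in Python, assuming numbers are written with spaces as in the examples and that stripping those spaces is acceptable. The quantifier needs a comma, {8,9}: written as {8-9}, Python's re module does not treat the braces as a quantifier and matches them as the literal text "{8-9}". The helper name is_valid_mobile is hypothetical.

```python
import re

# 0, then 2, then 8 or 9 more digits -> 10 or 11 digits in total
MOBILE = re.compile(r"[0][2][0-9]{8,9}")

def is_valid_mobile(number: str) -> bool:
    # Hypothetical helper: the pattern has no spaces, so remove them first
    digits = number.replace(" ", "")
    # fullmatch: the WHOLE string must fit, so 12 digits are rejected
    return MOBILE.fullmatch(digits) is not None

for n in ["021 333 2222", "021 444 22224", "021 444 222245", "031 333 2222"]:
    print(n, is_valid_mobile(n))   # True, True, False, False
```

With re.search instead of fullmatch, the 12-digit number would still pass because its first 11 digits match.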
- Phone number with country code: only 61 or 64 is allowed, followed by a 10-digit number
[6][14][0-9]{10}
Explanation: 6, then 1 or 4, then 10 digits from 0-9 (no space inside [14], otherwise a space would also be accepted as the second character).
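A minimal Python check for the country-code pattern, on made-up numbers. It also shows why a space inside the brackets matters: [1 4] accepts "1", " " or "4".

```python
import re

# 6, then 1 or 4, then exactly 10 digits (12 digits in total)
COUNTRY = re.compile(r"[6][14][0-9]{10}")

for n in ["610212345678", "640212345678", "620212345678", "6102123456"]:
    print(n, COUNTRY.fullmatch(n) is not None)   # True, True, False, False

# With a space inside the brackets, a space is accepted as the 2nd character
print(re.fullmatch(r"[6][1 4][0-9]{10}", "6 0212345678") is not None)   # True
```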
- Another example:
[A-Z][a-z]+[0-9][a-z]+
An uppercase letter, followed by one or more lowercase letters, then a single digit, then one or more lowercase letters again
Ravi3raju
R-avi-3-raju – Pattern breakdown
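The R-avi-3-raju breakdown can be reproduced in Python with named groups. The group names (upper, lower, digit, rest) are hypothetical labels chosen for this sketch; the first pattern is the one from the notes.

```python
import re

PATTERN = re.compile(r"[A-Z][a-z]+[0-9][a-z]+")

for word in ["Ravi3raju", "ravi3raju", "Ravi33raju", "R3raju"]:
    print(word, PATTERN.fullmatch(word) is not None)   # True, False, False, False

# Hypothetical group names that mirror the R-avi-3-raju breakdown
parts = re.fullmatch(
    r"(?P<upper>[A-Z])(?P<lower>[a-z]+)(?P<digit>[0-9])(?P<rest>[a-z]+)", "Ravi3raju"
)
print(parts.groupdict())   # {'upper': 'R', 'lower': 'avi', 'digit': '3', 'rest': 'raju'}
```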
- Email Address:
[a-zA-Z_.-]+[@][A-Za-z]{2,}[.][a-z]{2,3}
1st part – [a-zA-Z_.-]+ one or more letters, underscores, dots or hyphens (the hyphen goes last so it is a literal -, not a range like _-.)
2nd part – @
3rd part – [A-Za-z]{2,} two or more letters, e.g. be in be.com (2) or gmail in gmail.com (5)
4th part – [.] a literal dot (inside brackets . loses its special meaning)
5th part – [a-z]{2,3} two or three lowercase letters, e.g. com, in, fr, nz or au
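A Python sketch of the email check, with the hyphen moved to the end of the first bracket. Written as _-. the bracket contains a range from _ down to ., which Python's re module rejects as a bad character range; the try block shows that. The addresses are made up.

```python
import re

# Hyphen last inside the first bracket, so it is a literal '-'
EMAIL = re.compile(r"[a-zA-Z_.-]+[@][A-Za-z]{2,}[.][a-z]{2,3}")

for addr in ["ravi.raju@gmail.com", "ravi_r@be.nz", "ravi@g.com", "ravi@gmail.info"]:
    print(addr, EMAIL.fullmatch(addr) is not None)   # True, True, False, False

# "_-." is read as a reversed range (from '_' down to '.'), which is an error
try:
    re.compile(r"[a-zA-Z_-.]+")
except re.error as exc:
    print("invalid pattern:", exc)
```

As written, the pattern also rejects digits before the @ (e.g. ravi3@gmail.com); adding 0-9 to the first bracket would allow them.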
Regex is commonly used in SQL (e.g. Hive or Spark SQL) to filter rows that match (WHERE clause with rlike), to extract the matching part (regexp_extract) and to replace the matching part (regexp_replace).
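To try the filter, extract and replace ideas without a Hive or Spark cluster, here is a sketch using Python's built-in sqlite3. SQLite has no regex functions of its own, but "X REGEXP Y" calls a user-defined function regexp(Y, X). The regexp_extract and regexp_replace functions below are hypothetical stand-ins registered by this sketch, written to follow the Hive/Spark argument order; the table and rows are made up.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# "X REGEXP Y" calls regexp(Y, X), i.e. regexp(pattern, value)
conn.create_function(
    "regexp", 2, lambda pattern, value: re.search(pattern, value) is not None
)

# Hypothetical stand-ins mimicking Hive/Spark SQL's (value, pattern, ...) order
def regexp_extract(value, pattern, idx):
    m = re.search(pattern, value)
    return m.group(idx) if m else ""

def regexp_replace(value, pattern, replacement):
    return re.sub(pattern, replacement, value)

conn.create_function("regexp_extract", 3, regexp_extract)
conn.create_function("regexp_replace", 3, regexp_replace)

conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ravi", "021 333 2222"), ("Anna", "n/a"), ("Sam", "021 444 22224")],
)

# Filter: like WHERE phone RLIKE '...' in Hive/Spark SQL
matched = conn.execute(
    "SELECT name FROM contacts WHERE phone REGEXP '^[0-9 ]+$' ORDER BY name"
).fetchall()

# Extract: first group of digits (the area code)
area = conn.execute(
    "SELECT name, regexp_extract(phone, '^([0-9]+)', 1) FROM contacts ORDER BY name"
).fetchall()

# Replace: strip the spaces
digits = conn.execute(
    "SELECT regexp_replace(phone, ' ', '') FROM contacts WHERE name = 'Ravi'"
).fetchone()[0]

print(matched)   # [('Ravi',), ('Sam',)]
print(area)      # [('Anna', ''), ('Ravi', '021'), ('Sam', '021')]
print(digits)    # 0213332222
```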
Pandas – regex is used through the .str accessor, e.g. str.contains, str.extract and str.replace.
Think of it as a SQL “like” operator that matches against a more complex pattern.
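A minimal pandas sketch of filtering and extracting with regex, assuming a made-up DataFrame. str.contains returns a boolean mask (like a WHERE clause), and str.extract returns the text captured by the group in parentheses. The email pattern is anchored with ^ and $ so the whole value must match.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ravi", "Anna", "Sam"],
    "email": ["ravi.raju@gmail.com", "anna@example", "sam_k@be.nz"],
})

# Anchored email pattern; hyphen last in the first bracket
EMAIL = r"^[a-zA-Z_.-]+@[A-Za-z]{2,}\.[a-z]{2,3}$"

# Filter (like WHERE ... RLIKE): keep rows whose email matches
valid = df[df["email"].str.contains(EMAIL, regex=True)]
print(valid["name"].tolist())   # ['Ravi', 'Sam']

# Extract (like regexp_extract): the captured group becomes the value
df["domain"] = df["email"].str.extract(r"@([A-Za-z]+)", expand=False)
print(df["domain"].tolist())    # ['gmail', 'example', 'be']
```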

Python – regex
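A sketch of the main functions in Python's re module on a made-up string: match only tries at the start, search finds the first match anywhere, findall returns all matches, fullmatch requires the whole string to match, and compile keeps a pattern for reuse. finditer is used at the end because findall returns only the group when the pattern contains one.

```python
import re

text = "Call 021 333 2222 or 021 444 22224"

print(re.match(r"\d+", text))                 # None: match() only tries at the start
print(re.search(r"\d+", text).group())        # '021': first match anywhere
print(re.findall(r"\d+", text))               # ['021', '333', '2222', '021', '444', '22224']
print(re.fullmatch(r"\d+", "0213332222") is not None)   # True

# Compile once when the same pattern is reused
number = re.compile(r"0\d{2}( \d+)+")
print([m.group() for m in number.finditer(text)])   # ['021 333 2222', '021 444 22224']
```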

Regex replace
Pandas – remove letters and keep only the digits
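A minimal pandas sketch on a made-up Series. Removing only letters with [a-zA-Z] leaves spaces and punctuation behind, while \D removes everything that is not a digit. regex=True is passed explicitly because pandas 2.0 changed the str.replace default to regex=False.

```python
import pandas as pd

s = pd.Series(["abc123", "Ph: 021-333", "no digits"])

# Remove letters only: spaces and punctuation remain
letters_removed = s.str.replace(r"[a-zA-Z]", "", regex=True)
# Remove every non-digit: only the numbers remain
only_digits = s.str.replace(r"\D", "", regex=True)

print(letters_removed.tolist())   # ['123', ': 021-333', ' ']
print(only_digits.tolist())       # ['123', '021333', '']
```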



Regex replace
Python – remove letters and keep only the digits
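The plain-Python equivalent uses re.sub(pattern, replacement, string). This sketch uses a made-up string and shows the same difference: [a-zA-Z] removes letters only, \D removes everything except digits.

```python
import re

raw = "Ph: 021 333 2222 (home)"

letters_removed = re.sub(r"[a-zA-Z]", "", raw)   # drop letters only
only_digits = re.sub(r"\D", "", raw)             # drop every non-digit

print(letters_removed)   # : 021 333 2222 ()
print(only_digits)       # 0213332222
```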

