Python Regular Expression to extract email from text


In this tutorial, we’ll look at how to use a Python regular expression to extract an email address from a piece of text.

Related Post: Python Regular Expression

Python Regular Expression to extract email

Import the regex module

All Python regex functions live in the re module. Remember to import it at the beginning of your Python code, or any time IDLE is restarted.

>>> import re
Create Regex object

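We create a Regex object by passing a string that represents the regular expression to re.compile(). The re.VERBOSE flag lets us spread the pattern over several lines and annotate each part with a # comment.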

To match the email pattern:

>>> regex = re.compile(r'''(
...    [a-zA-Z0-9._%+-]+ # username
...    @ # @ symbol
...    ([a-zA-Z0-9.-]+) # domain name
...    (\.[a-zA-Z]{2,4}) # dot-something
... )''', re.VERBOSE)
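
To see what re.VERBOSE buys us, here is a minimal sketch comparing the pattern above with a one-line version of the same expression. The names verbose_regex, compact_regex and sample are ours, chosen only for this illustration.

```python
import re

# Verbose form, as in the tutorial: whitespace and # comments are ignored.
verbose_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    ([a-zA-Z0-9.-]+)       # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)

# Illustrative compact form of the same pattern, without re.VERBOSE.
compact_regex = re.compile(
    r'([a-zA-Z0-9._%+-]+@([a-zA-Z0-9.-]+)(\.[a-zA-Z]{2,4}))'
)

sample = 'Send us an email to customer_service_007@grokonez.com'

# Both patterns should capture the same groups from the same text.
same = verbose_regex.search(sample).groups() == compact_regex.search(sample).groups()
print(same)
```

Both forms describe the same pattern; the verbose one is simply easier to read and maintain.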
Get Match object

A Regex object has a search() method that scans a string for the first place where the pattern matches. It returns:
– None if the pattern is not found
– a Match object if the pattern is found

>>> text = 'Send us an email to customer_service_007@grokonez.com'
>>> mo = regex.search(text)
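
Because search() can return None, calling group() on the result without checking first raises an AttributeError when nothing matches. A small sketch of a guard, assuming a helper name find_email that we made up for this example:

```python
import re

regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    ([a-zA-Z0-9.-]+)       # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)


def find_email(text):
    """Return the first email-like substring in text, or None if there is none."""
    mo = regex.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return None
    return mo.group()


print(find_email('Send us an email to customer_service_007@grokonez.com'))
print(find_email('No address in this sentence.'))
```

The first call prints the address, the second prints None.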
Get matched text

We call the Match object’s group() method to get the matched text. With no argument (or 0) it returns the whole match; with a number it returns the text captured by that group.

>>> mo.group()
'customer_service_007@grokonez.com'
>>> mo.group(2)
'grokonez'
>>> mo.group(3)
'.com'

In this example, we use parentheses to split the pattern into three groups: group 1 wraps the whole email, group 2 is the domain name, and group 3 is the dot-something part. The groups() method returns all of them at once as a tuple.

>>> mo.groups()
('customer_service_007@grokonez.com', 'grokonez', '.com')
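
Since groups() returns a plain tuple, one option is to unpack it straight into variables. This is only a sketch; the variable names email, domain_name and top_level are ours.

```python
import re

regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    ([a-zA-Z0-9.-]+)       # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)

mo = regex.search('Send us an email to customer_service_007@grokonez.com')

# groups() yields (group 1, group 2, group 3); unpack them in that order.
email, domain_name, top_level = mo.groups()
print(email)
print(domain_name)
print(top_level)
```

Unpacking only works when the number of variables equals the number of groups in the pattern, so it has to be updated whenever a group is added or removed.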
Extract email

Now we want to store the email data in three variables: email, domainName, toplevel. They come from groups 0, 2 and 3 (the whole email, the domain name, and the top-level domain).

>>> email = mo.group()
>>> email
'customer_service_007@grokonez.com'

>>> domainName = mo.group(2)
>>> domainName
'grokonez'

>>> toplevel = mo.group(3)
>>> toplevel
'.com'
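
Numeric indices such as group(2) are easy to mix up. As an alternative sketch, the re module also supports named groups with the (?P<name>...) syntax. The group names email, domain and toplevel below are our own choice, not part of the original pattern.

```python
import re

# Same pattern as before, but each group is given a name (names are illustrative).
named_regex = re.compile(r'''(?P<email>
    [a-zA-Z0-9._%+-]+                # username
    @                                # @ symbol
    (?P<domain>[a-zA-Z0-9.-]+)       # domain name
    (?P<toplevel>\.[a-zA-Z]{2,4})    # dot-something
)''', re.VERBOSE)

mo = named_regex.search('Send us an email to customer_service_007@grokonez.com')

# Look groups up by name instead of by position.
print(mo.group('email'))
print(mo.group('domain'))
print(mo.group('toplevel'))

# groupdict() returns every named group in one dictionary.
parts = mo.groupdict()
print(parts)
```

Named groups still keep their numbers, so mo.group(2) and mo.group('domain') return the same text here.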

Full code

>>> import re

>>> regex = re.compile(r'''(
...    [a-zA-Z0-9._%+-]+ # username
...    @ # @ symbol
...    ([a-zA-Z0-9.-]+) # domain name
...    (\.[a-zA-Z]{2,4}) # dot-something
... )''', re.VERBOSE)

>>> text = 'Send us an email to customer_service_007@grokonez.com'
>>> mo = regex.search(text)

>>> email = mo.group()
>>> email
'customer_service_007@grokonez.com'

>>> domainName = mo.group(2)
>>> domainName
'grokonez'

>>> toplevel = mo.group(3)
>>> toplevel
'.com'
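
search() stops at the first match. If a text may contain several addresses, one possible approach is finditer(), which yields a Match object for every non-overlapping match. The sample text and the example.com / example.org addresses below are made up for this sketch.

```python
import re

regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    ([a-zA-Z0-9.-]+)       # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
)''', re.VERBOSE)

# Hypothetical text containing two addresses.
text = 'Contact sales@example.com or support@example.org for help.'

# Collect the whole match (group 0) of every Match object.
emails = [mo.group() for mo in regex.finditer(text)]
print(emails)
```

We use finditer() rather than findall() here because, with a pattern that contains several groups, findall() returns a list of tuples of groups instead of a list of whole matches.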


By grokonez | January 6, 2019.

