Python RegEx Tutorial

Regular Expressions (RegEx) are a powerful tool in Python for searching, matching, and manipulating strings. They allow you to define complex search patterns and work with text data efficiently. Whether you're filtering logs, validating input, or parsing data, RegEx is an essential skill for any Python programmer.


📌 What is RegEx?

A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It is commonly used for:

  • Validating input (e.g., emails, phone numbers, passwords)
  • Searching for specific patterns in text
  • Replacing or splitting text based on rules
  • Data cleaning and preprocessing

📌 RegEx Module in Python

Python provides the built-in re module to work with regular expressions.


import re

Once imported, you can call functions such as findall(), search(), split(), and sub(), and work with the Match objects that search() returns.


📌 RegEx Functions in Python

Function    Description                                  Example
findall()   Returns all matches in a list                re.findall(r"ai", "The rain in Spain") → ['ai', 'ai']
search()    Returns a Match object for the first match   re.search(r"ai", "The rain in Spain")
split()     Splits the string at each match              re.split(r"\s", "Hello World") → ['Hello', 'World']
sub()       Replaces each match with the given text      re.sub(r"\s", "-", "Hello World") → 'Hello-World'

📌 Metacharacters in RegEx

Metacharacters are special symbols in RegEx with specific meanings.

Metacharacter   Description                        Example
.               Any character except newline       "he..o" → matches "hello"
^               Starts with                        "^Hello" → matches at the start of "Hello World"
$               Ends with                          "World$" → matches at the end of "Hello World"
*               0 or more occurrences              "aix*" → matches "ai", "aix"
+               1 or more occurrences              "aix+" → matches "aix"
{}              Specified number of occurrences    "al{2}" → matches "all"
[]              Set of characters                  "[a-m]" → matches any one lowercase letter from a to m
()              Grouping                           "(abc)?" → matches "abc" or nothing
|               OR                                 "cat|dog" → matches "cat" or "dog"
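
The rows of the table above can be checked directly in Python. The following is a small sketch; the sample strings and variable names are made up for illustration and are not part of the original table.

```python
import re

# Patterns come from the metacharacter table; the sample strings are illustrative.
any_char = re.findall(r"he..o", "hello world")        # . stands for any single character
starts = bool(re.search(r"^Hello", "Hello World"))    # ^ anchors the match at the start
ends = bool(re.search(r"World$", "Hello World"))      # $ anchors the match at the end
zero_or_more = re.findall(r"aix*", "ai aix aixx")     # x may appear zero or more times
one_or_more = re.findall(r"aix+", "ai aix aixx")      # x must appear at least once
exactly_two = re.findall(r"al{2}", "all tall al")     # exactly two l characters
in_set = re.findall(r"[a-m]", "Python")               # single characters from a to m
either = re.findall(r"cat|dog", "cat, dog, bird")     # either alternative

print(any_char)       # ['hello']
print(starts, ends)   # True True
print(zero_or_more)   # ['ai', 'aix', 'aixx']
print(one_or_more)    # ['aix', 'aixx']
print(exactly_two)    # ['all', 'all']
print(in_set)         # ['h']
print(either)         # ['cat', 'dog']
```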

📌 Flags in RegEx

Flags modify the behavior of RegEx patterns.

Flag   Description
re.I   Case-insensitive matching
re.M   Multi-line matching: ^ and $ also match at the start and end of each line
re.S   Dot (.) also matches newline
re.X   Verbose mode: whitespace and # comments in the pattern are ignored
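
To see what each flag changes, it can help to run the same pattern with and without it. This is a minimal sketch; the sample text, the date pattern, and the variable names are assumptions chosen for illustration.

```python
import re

text = "first line\nSecond line"

# re.I: "second" matches "Second" regardless of case.
case_insensitive = re.findall(r"second", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only the start of the string.
with_m = re.findall(r"^\w+", text, flags=re.M)
without_m = re.findall(r"^\w+", text)

# re.S: the dot also matches the newline between the two lines.
with_s = re.search(r"line.Second", text, flags=re.S)
without_s = re.search(r"line.Second", text)

# re.X: whitespace and comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", flags=re.X)
parts = date.search("Released on 2024-01-15").groups()

print(case_insensitive)   # ['Second']
print(with_m, without_m)  # ['first', 'Second'] ['first']
print(with_s is not None, without_s is None)  # True True
print(parts)              # ('2024', '01', '15')
```

Flags can be combined with the | operator, for example flags=re.I | re.M.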

📌 Special Sequences

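Sequence   Description
\d         Matches a digit ([0-9]; for str patterns, any Unicode decimal digit unless re.ASCII is set)
\D         Matches a non-digit
\s         Matches a whitespace character (space, tab, newline, and similar)
\S         Matches a non-whitespace character
\w         Matches a word character (letter, digit, or underscore)
\W         Matches a non-word character
\b         Matches a word boundary (the empty position at the start or end of a word)
\A         Matches only at the start of the string
\Z         Matches only at the end of the string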
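
A short sketch of the special sequences in use follows. The sample strings and variable names are invented for this illustration.

```python
import re

text = "Order 66 shipped to cat_lover at 10am"

digits = re.findall(r"\d+", text)                          # runs of digits
words = re.findall(r"\w+", text)                           # runs of word characters
tokens = re.findall(r"\S+", "a  b\tc")                     # runs of non-whitespace
only_digits = re.sub(r"\D", "", "+1 (555) 010-9999")       # drop every non-digit
whole_word = re.findall(r"\bcat\b", "cat concat cat_lover")  # "cat" as a whole word only
at_start = bool(re.search(r"\AOrder", text))               # anchored at string start
at_end = bool(re.search(r"10am\Z", text))                  # anchored at string end

print(digits)       # ['66', '10']
print(words)        # ['Order', '66', 'shipped', 'to', 'cat_lover', 'at', '10am']
print(tokens)       # ['a', 'b', 'c']
print(only_digits)  # 15550109999
print(whole_word)   # ['cat']
print(at_start, at_end)  # True True
```

Note that the underscore counts as a word character, which is why "cat_lover" does not produce a whole-word match for "cat".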

📌 Sets

A set matches any single character listed inside []. A hyphen defines a range, and a leading ^ negates the set.

  • [abc] → Matches 'a', 'b', or 'c'
  • [a-z] → Matches lowercase letters
  • [0-9] → Matches digits
  • [^0-9] → Matches non-digits
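
The four sets listed above behave as follows on a few sample strings. This is an illustrative sketch; the strings and variable names are assumptions.

```python
import re

text = "Room 42B, floor 3"

abc = re.findall(r"[abc]", "a big cab")         # any single a, b, or c
lowercase = re.findall(r"[a-z]+", text)         # runs of lowercase letters
numbers = re.findall(r"[0-9]+", text)           # runs of digits
non_digits = re.findall(r"[^0-9]+", "a1b22c")   # runs of anything except digits

print(abc)         # ['a', 'b', 'c', 'a', 'b']
print(lowercase)   # ['oom', 'floor']
print(numbers)     # ['42', '3']
print(non_digits)  # ['a', 'b', 'c']
```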

📌 Match Object

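A Match object holds details about a successful match. Note that search() returns None when nothing matches, so check the result before calling methods on it.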


import re

txt = "The rain in Spain"
x = re.search("ai", txt)

print(x.span())   # (5, 7)
print(x.start())  # 5
print(x.end())    # 7
print(x.string)   # The rain in Spain
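
Beyond span(), start(), and end(), a Match object also exposes the matched text through group(), and the text captured by parentheses through group(n) and groups(). These methods are not covered in the example above; the sketch below shows them together with a None check, using patterns made up for illustration.

```python
import re

txt = "The rain in Spain"

# A word that starts with an uppercase S.
m = re.search(r"\bS\w+", txt)
if m is not None:
    print(m.group())  # Spain
    print(m.span())   # (12, 17)

# search() returns None when the pattern is not found.
missing = re.search(r"Portugal", txt)
print(missing)        # None

# Parentheses create numbered groups that can be read back individually.
pair = re.search(r"(\w+) in (\w+)", txt)
print(pair.group(1), pair.group(2))  # rain Spain
print(pair.groups())                 # ('rain', 'Spain')
```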

📌 Examples of RegEx Functions

✔ findall()


import re
txt = "The rain in Spain"
print(re.findall("ai", txt))  # ['ai', 'ai']

✔ search()


import re

txt = "The rain in Spain"
x = re.search(r"ai", txt)  # Match object for the first "ai", or None if absent
print("First match at:", x.start())  # First match at: 5

✔ split()


import re

txt = "The rain in Spain"
print(re.split(r"\s", txt))  # ['The', 'rain', 'in', 'Spain']

✔ sub()


import re

txt = "The rain in Spain"
print(re.sub(r"\s", "-", txt))  # The-rain-in-Spain

💡 Tips for Using RegEx

  • Test patterns in a tool such as Regex101, with the Python flavor selected.
  • Keep patterns simple and readable.
  • Use raw strings in Python (r"pattern") to avoid escaping issues.
  • Use grouping () and backreferences for complex replacements.
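
Two of these tips are easy to demonstrate: why raw strings matter, and how groups and backreferences help with replacements. The following is a sketch with invented sample data and variable names.

```python
import re

# Without the r prefix, "\b" is a backspace character, not a word boundary.
plain = re.search("\bcat\b", "a cat")   # looks for backspace + "cat" + backspace
raw = re.search(r"\bcat\b", "a cat")    # looks for "cat" as a whole word

# Groups in the pattern can be reused in the replacement as \1, \2, ...
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

# A backreference inside the pattern matches the same text again,
# here used to collapse a word that is repeated twice in a row.
deduped = re.sub(r"\b(\w+) \1\b", r"\1", "the the rain in in Spain")

print(plain)        # None
print(raw.group())  # cat
print(swapped)      # Ada Lovelace
print(deduped)      # the rain in Spain
```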

📝 Exercises

  1. Write a RegEx to validate an email address.
  2. Extract all numbers from the string "Order 123, Bill 456, Item 789".
  3. Split a text into words using whitespace as a delimiter.
  4. Replace all vowels in a string with *.

❓ FAQs

Q1. What is the difference between search() and match()?
match() checks only at the beginning of the string, while search() looks anywhere in the string.
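
A quick way to see the difference is to run both functions on the same string. The sketch below also shows re.fullmatch(), which is not mentioned above and requires the whole string to match; the variable names are illustrative.

```python
import re

txt = "The rain in Spain"

at_start = re.match(r"rain", txt)    # "rain" is not at the beginning
anywhere = re.search(r"rain", txt)   # found later in the string
prefix = re.match(r"The", txt)       # "The" is at the beginning
whole = re.fullmatch(r"The", txt)    # the pattern does not cover the whole string

print(at_start)          # None
print(anywhere.start())  # 4
print(prefix.group())    # The
print(whole)             # None
```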

Q2. How do I ignore case sensitivity in RegEx?
Use the re.I flag.

Q3. Are RegEx patterns the same in all languages?
The core syntax is largely shared, but features and details vary between implementations. Python's re module uses a syntax similar to Perl's.

Q4. Can RegEx handle multiline text?
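Yes. Use the re.M flag to make ^ and $ match at the start and end of each line, and the re.S flag to let . match newline characters.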


✅ Conclusion

Python RegEx is a powerful tool for string manipulation and pattern matching. By mastering the re module, you can validate, search, replace, and split text with ease. Understanding metacharacters, flags, and match objects will make you more efficient in handling text-based data.

