16 Common Regular Expression Tasks in Python

Installation and Import

The re module is part of Python's standard library, so it needs no separate installation. Import it before use.

import re

Character Matching

Single character: square brackets [] match any one character from the enclosed set.

# Match any letter
pattern = "[a-zA-Z]"
string = "Hello World!"
match = re.search(pattern, string)
print(match.group())  # Output: H

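Multiple characters: * matches zero or more occurrences of the preceding element.

# Match any amount of leading whitespace
pattern = r"\s*"
string = "    Hello World!"
match = re.match(pattern, string)
print(repr(match.group()))  # Output: '    '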

Range Matching

Inside square brackets, - defines a range of characters.

# Match the lowercase letters a through e
pattern = "[a-e]"
string = "abcdeABCDE"
matches = re.findall(pattern, string)
print(matches)  # Output: ['a', 'b', 'c', 'd', 'e']

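Excluding Characters

A ^ at the start of a character set negates it, so the set matches any character that is not listed.

# Match every character except the lowercase letters a to z
pattern = "[^a-z]"
string = "123ABCdef"
matches = re.findall(pattern, string)
print(matches)  # Output: ['1', '2', '3', 'A', 'B', 'C']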

Combining Character Sets

Several ranges can be combined in a single set.

# Match runs of digits or uppercase letters
pattern = "[0-9A-Z]+"
string = "Hello123World"
matches = re.findall(pattern, string)
print(matches)  # Output: ['H', '123W']

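Position Anchors

^ matches the start of the string and $ matches the end. With re.MULTILINE they match the start and end of each line.

# Match a capitalized word at the start of the string
pattern = "^[A-Z][a-zA-Z]*"
string = "Hello world"
matches = re.findall(pattern, string)
print(matches)  # Output: ['Hello']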
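The $ anchor works the same way at the other end of the string. A minimal sketch, with sample strings invented purely for illustration:

```python
import re

# Match a run of digits only when it sits at the very end of the string
pattern = r"\d+$"
end_match = re.search(pattern, "Order number 12345")
print(end_match.group())  # Output: 12345

# No match here, because the digits are followed by more text
no_match = re.search(pattern, "12345 is the order number")
print(no_match)  # Output: None
```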

Grouping and References

Parentheses () create a capturing group.

# Capture the parts of a date
pattern = r"(\d{4})-(\d{2})-(\d{2})"
string = "Today is 2023-04-01."
# re.search scans the whole string; re.match would only try the start
match = re.search(pattern, string)
if match:
    year = match.group(1)
    month = match.group(2)
    day = match.group(3)
    print(f"Year: {year}, Month: {month}, Day: {day}")
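Captured groups can also be referenced in a replacement string. The sketch below reuses the date pattern to reorder the parts with re.sub; the target day/month/year layout is an assumption chosen for illustration.

```python
import re

# Capture year, month, and day, then reorder them in the replacement
pattern = r"(\d{4})-(\d{2})-(\d{2})"
text = "Today is 2023-04-01."

# \3, \2, \1 refer to the third, second, and first captured groups
reordered = re.sub(pattern, r"\3/\2/\1", text)
print(reordered)  # Output: Today is 01/04/2023.

# groups() returns every captured group as a tuple
parts = re.search(pattern, text).groups()
print(parts)  # Output: ('2023', '04', '01')
```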

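Non-Capturing Groups

When part of a pattern needs grouping but its content does not need to be captured, use (?:...).

# Group the minutes without capturing them
pattern = r"(\d{2}):(?:\d{2}):\d{2}"
string = "09:30:15"
match = re.match(pattern, string)
if match:
    hour = match.group(1)
    print(hour)  # Output: 09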
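The effect is easiest to see by comparing groups() for the two kinds of group. This is a small sketch; the variable names are chosen here for illustration.

```python
import re

time_string = "09:30:15"

# Three capturing groups: hour, minute, second
capturing = re.match(r"(\d{2}):(\d{2}):(\d{2})", time_string)
print(capturing.groups())  # Output: ('09', '30', '15')

# The minutes are grouped but not captured, so only two groups remain
non_capturing = re.match(r"(\d{2}):(?:\d{2}):(\d{2})", time_string)
print(non_capturing.groups())  # Output: ('09', '15')
```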

Replacing Text

re.sub() replaces every match in a string.

# Replace each run of whitespace with a single underscore
pattern = r"\s+"
string = "Hello   World"
new_string = re.sub(pattern, "_", string)
print(new_string)  # Output: Hello_World

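Greedy and Non-Greedy Matching

Greedy matching: by default, a quantifier matches as many characters as possible.
Non-greedy matching: adding ? after a quantifier makes it lazy, so it matches as few characters as possible.

# Greedy matching
pattern = "<.*>"
string = "<div>Hello World</div>"
match = re.search(pattern, string)
print(match.group())  # Output: <div>Hello World</div>

# Non-greedy matching
pattern = "<.*?>"
string = "<div>Hello World</div>"
match = re.search(pattern, string)
print(match.group())  # Output: <div>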
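The difference becomes more visible when a string contains several tags. The following sketch uses re.findall on a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # sample input invented for illustration

# Greedy: .* runs to the last '>' in the string, giving one long match
greedy = re.findall(r"<.*>", html)
print(greedy)  # Output: ['<b>bold</b> and <i>italic</i>']

# Lazy: .*? stops at the first '>' it can, giving one match per tag
lazy = re.findall(r"<.*?>", html)
print(lazy)  # Output: ['<b>', '</b>', '<i>', '</i>']
```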

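Named Groups and Backreferences

(?P<name>...) gives a capturing group a name, and (?P=name) refers back to it. Unnamed groups are referenced by number: \1 matches again whatever the first group matched.

# Collapse a repeated word into a single word
pattern = r"\b(\w+)\b\s+\1\b"
string = "hello hello world"
result = re.sub(pattern, r"\1", string, flags=re.IGNORECASE)
print(result)  # Output: hello world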
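The same task can be written with a named group. In this sketch the group name word and the sample sentence are assumptions made for illustration. (?P=word) is the backreference inside the pattern, and \g<word> is its counterpart in the replacement string.

```python
import re

# (?P<word>...) names the group; (?P=word) requires the same text again
pattern = r"\b(?P<word>\w+)\s+(?P=word)\b"
text = "This is is a test test sentence"

# \g<word> inserts the named group's text into the replacement
cleaned = re.sub(pattern, r"\g<word>", text)
print(cleaned)  # Output: This is a test sentence

# Named groups can also be read from a match object by name
first = re.search(pattern, text)
print(first.group("word"))  # Output: is
```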

Repetition Quantifiers

{n} matches exactly n repetitions.
{n,} matches at least n repetitions.
{n,m} matches between n and m repetitions.

# Match 'a' repeated exactly three times
pattern = r"a{3}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # Output: ['aaa']

# Match 'a' repeated at least twice
pattern = r"a{2,}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # Output: ['aaa']

# Match 'a' repeated two to four times
pattern = r"a{2,4}"
string = "aaabbbccc"
matches = re.findall(pattern, string)
print(matches)  # Output: ['aaa']
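Because all three examples above return the same result, a second sketch with different inputs may help show how the upper bound behaves. The sample strings are invented for illustration.

```python
import re

# {2,4} is greedy: it takes up to four 'a's, then matches again on the rest
runs = re.findall(r"a{2,4}", "aaaaaa")
print(runs)  # Output: ['aaaa', 'aa']

# Combined with \b, {3,4} keeps only whole numbers of three or four digits
numbers = re.findall(r"\b\d{3,4}\b", "7 42 123 4567 89012")
print(numbers)  # Output: ['123', '4567']
```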

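Special Characters

Dot .: matches any character except a newline.
Backslash \: escapes a special character so that it is matched literally.

# Match runs of any characters; the dot stops at the newline
pattern = r".+"
string = "hello\nworld"
matches = re.findall(pattern, string)
print(matches)  # Output: ['hello', 'world']

# Match a literal dot
pattern = r"\."
string = "hello.world"
matches = re.findall(pattern, string)
print(matches)  # Output: ['.']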

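Boundary Assertions

Word boundary \b: matches the position at the start or end of a word.
Non-word boundary \B: matches any position that is not a word boundary.

# Match 'hello' as a whole word
pattern = r"\bhello\b"
string = "hello world"
matches = re.findall(pattern, string)
print(matches)  # Output: ['hello']

# Match 'world' only inside a longer word; here it stands alone, so nothing matches
pattern = r"\Bworld\B"
string = "hello world"
matches = re.findall(pattern, string)
print(matches)  # Output: []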
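For contrast, here is a sketch in which \B does produce a match. The sample text is an assumption chosen so that "cat" appears both inside a longer word and on its own.

```python
import re

text = "concatenate cat"

# \Bcat\B matches only the 'cat' embedded inside 'concatenate'
inner = re.search(r"\Bcat\B", text)
print(inner.start())  # Output: 3

# \bcat\b matches only the standalone word
whole = re.search(r"\bcat\b", text)
print(whole.start())  # Output: 12
```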

Flags

Ignore case, re.IGNORECASE: makes matching case-insensitive.
Multiline mode, re.MULTILINE: makes ^ and $ match at the start and end of every line.
Dot matches newline, re.DOTALL: makes . match any character, including a newline.

# Ignore case
pattern = r"hello"
string = "Hello world"
matches = re.findall(pattern, string, re.IGNORECASE)
print(matches)  # Output: ['Hello']

# Multiline mode
pattern = r"^hello"
string = "hello\nworld"
matches = re.findall(pattern, string, re.MULTILINE)
print(matches)  # Output: ['hello']

# Dot matches newline
pattern = r"hello.*world"
string = "hello\nworld"
matches = re.findall(pattern, string, re.DOTALL)
print(matches)  # Output: ['hello\nworld']
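Flags can be combined with the | operator. The sketch below, using a sample string invented for illustration, shows how the result changes as flags are added:

```python
import re

text = "Hello\nhello world"

# No flags: ^ matches only at the start of the string, and 'Hello' differs in case
plain = re.findall(r"^hello", text)
print(plain)  # Output: []

# MULTILINE: ^ also matches after the newline
multiline = re.findall(r"^hello", text, re.MULTILINE)
print(multiline)  # Output: ['hello']

# MULTILINE and IGNORECASE together: both lines match
combined = re.findall(r"^hello", text, re.MULTILINE | re.IGNORECASE)
print(combined)  # Output: ['Hello', 'hello']
```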

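Case Study

Suppose we need to extract every email address from a block of text. Regular expressions make this easy. Email addresses generally take the form username@domain.com, so a simple pattern covering that shape is enough for a basic extraction.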
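One possible sketch of such an extraction is shown here. The pattern is a deliberately simplified assumption that covers common addresses rather than every valid one, and the sample text and addresses are invented for illustration.

```python
import re

# Simplified pattern: local part, '@', domain labels, then a dot and a top-level domain
email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

text = "Contact alice@example.com or bob.smith@mail.example.org for details."

# findall returns every non-overlapping match as a list of strings
emails = re.findall(email_pattern, text)
print(emails)  # Output: ['alice@example.com', 'bob.smith@mail.example.org']
```

A pattern this loose suits quick extraction. Strict validation of addresses generally calls for more than a single regular expression.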
