Mostly Harmless Regular Expressions - 宇宙尽头的餐馆 (The Restaurant at the End of the Universe)

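I've collected the regular expression syntax I use most often day to day; for non-specialists I think this is enough. This post is also available in PDF and TeX versions, which can be downloaded at the end.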

Quantifiers#

d? : d occurs 0 or 1 times
a* : a occurs 0 or more times
a+ : a occurs 1 or more times
a{6} : a occurs exactly 6 times
a{2,} : a occurs at least 2 times
a{2,6} : a occurs 2 to 6 times
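
The quantifier rules above can be checked quickly in Python. This is only a minimal sketch: the helper name `fits` is made up for illustration, and it uses `re.fullmatch` so that the whole string has to satisfy the quantifier.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

print(fits(r"d?", ""))             # True: zero occurrences are allowed
print(fits(r"d?", "dd"))           # False: at most one d
print(fits(r"a*", ""))             # True: zero or more
print(fits(r"a+", ""))             # False: at least one a is required
print(fits(r"a{6}", "aaaaaa"))     # True: exactly six
print(fits(r"a{2,}", "a"))         # False: fewer than two
print(fits(r"a{2,6}", "aaaaaaa"))  # False: seven is more than six
```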

Matching Multiple Characters (Grouping)#

(ab)+ : the sequence ab occurs 1 or more times
- Example: (ab)+ matches ab, abab, ababab, etc.

Alternation#

The vertical bar | means "or", much like a logical OR in a programming language.

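a (cat|dog) : matches a cat or a dog
- Example: the group limits the alternation, so the leading "a " is required before either word.
a cat|dog : matches a cat or dog
- Example: without a group, | splits the whole pattern, so this matches either "a cat" or just "dog".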
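
To make the difference between the grouped and ungrouped alternation concrete, here is a small Python sketch. The variable names `grouped` and `ungrouped` are just illustrative, and `fullmatch` is used so that each test string must match in full.

```python
import re

# Grouping: the + quantifier applies to the whole group "ab".
print(bool(re.fullmatch(r"(ab)+", "ababab")))  # True
print(bool(re.fullmatch(r"(ab)+", "aba")))     # False: trailing "a" is left over

# Alternation inside a group: both branches share the leading "a ".
grouped = re.compile(r"a (cat|dog)")
# Alternation without a group: the two branches are "a cat" and "dog".
ungrouped = re.compile(r"a cat|dog")

for text in ["a cat", "a dog", "dog"]:
    print(text, bool(grouped.fullmatch(text)), bool(ungrouped.fullmatch(text)))
# a cat True True
# a dog True False
# dog False True
```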

Character Classes#

[abc]+ : any of the characters a, b, c, occurring 1 or more times
- Example: [abc]+ matches a, b, c, aa, abc, bca, etc.
[a-zA-Z0-9] : matches any single uppercase letter, lowercase letter, or digit
- Example: Matches A, z, 5.
[^0-9] : negated class; matches any single non-digit character (including a newline)
- Example: [^0-9] matches a, $, (space), etc., but not 1.
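
A short Python sketch of the three character classes above. The sample strings are made up for illustration.

```python
import re

# [abc]+ matches runs made up only of a, b, and c.
print(re.findall(r"[abc]+", "Room 42, bca!"))   # ['bca']

# [a-zA-Z0-9] matches one letter or digit at a time.
print(re.findall(r"[a-zA-Z0-9]", "A-z 5"))      # ['A', 'z', '5']

# [^0-9] matches any single non-digit, and that includes the newline.
print(re.findall(r"[^0-9]", "1a\n2"))           # ['a', '\n']
```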

Metacharacters and Shorthand Character Sets#

\d : a digit (0-9)
- Example: \d+ matches one or more digits, like 123.
\D : a non-digit character
- Example: \D matches a, $, (space), etc.
\w : a word character (letter, digit, or underscore _)
- Example: \w+ matches word, _var, num1.
\W : a non-word character
- Example: \W matches !, @, (space), etc.
\s : a whitespace character (space, tab, newline, etc.)
- Example: hello\sworld matches hello world.
\S : a non-whitespace character
- Example: \S+ matches word, !@#, etc.
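\b : a word boundary (the start or end of a word, or the boundary between a word character and a symbol)
- Example: \bcat\b matches cat in the cat sat but not in catalog.
\B : a non-boundary (a position between two symbols, or between two characters inside a word)
- Example: cat\B matches cat in catalog but not in the cat.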
. : any character (except a newline)
- Example: a.c matches abc, a1c, a!c, etc.
\. : matches a literal dot . (escaped with \)
- Example: example\.com matches example.com.
\n : a newline (line feed)
\r : a carriage return
\t : a tab
^ : matches the start of a line
- Example: ^Start matches Start only if it's at the beginning of a line.
$ : matches the end of a line
- Example: end$ matches end only if it's at the end of a line.
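
The shorthand sets, boundaries, and anchors above can be tried out in Python as follows. This is a sketch with made-up sample strings. One Python-specific detail that the list does not cover: ^ and $ only match at every line start and end when the `re.MULTILINE` flag is set; without it they anchor to the whole string.

```python
import re

print(re.findall(r"\d+", "order 123, item 45"))         # ['123', '45']
print(re.findall(r"\w+", "_var = num1"))                # ['_var', 'num1']
print(re.findall(r"\bcat\b", "the cat sat; catalog"))   # ['cat']

# cat\B needs another word character right after "cat".
print(re.search(r"cat\B", "the cat") is None)           # True
print(re.search(r"cat\B", "catalog").span())            # (0, 3)

# The dot does not cross a newline; \. is a literal dot.
print(re.findall(r"a.c", "abc a1c a\nc"))               # ['abc', 'a1c']
print(re.findall(r"example\.com", "example.com exampleXcom"))  # ['example.com']

# Anchors checked line by line (re.MULTILINE is a Python flag).
lines = "Start here\nnot the Start\nthe end\nend is not last"
print(re.findall(r"^Start", lines, flags=re.MULTILINE))  # ['Start']
print(re.findall(r"end$", lines, flags=re.MULTILINE))    # ['end']
```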

NOTE

In VS Code, \s does not match a line break; you have to write \n explicitly to match one.

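For example, End\r?\nBegin matches "End" and "Begin" only when "End" is the last string on a line and "Begin" is the first string on the next line. On Windows, most lines end with "\r\n" (a carriage return followed by a line feed). These characters are invisible, but they are present in the editor and are passed to the .NET regular expression service. When working with files from the web or from non-Windows operating systems, be sure to account for files that use only a line feed as the line break. (Reference)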
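
As a rough illustration of why the optional \r matters, the sketch below runs the same pattern in Python against a Windows-style and a Unix-style string. The sample strings are made up. Note that Python's own engine behaves differently from the VS Code search box: in Python, \s does match line breaks.

```python
import re

pattern = re.compile(r"End\r?\nBegin")

windows_text = "The End\r\nBegin again"  # CRLF line ending
unix_text = "The End\nBegin again"       # LF line ending

print(bool(pattern.search(windows_text)))  # True
print(bool(pattern.search(unix_text)))     # True

# Without the optional \r, the Windows-style text no longer matches.
print(bool(re.search(r"End\nBegin", windows_text)))    # False

# In Python (unlike the VS Code behaviour noted above), \s covers \r and \n.
print(bool(re.search(r"End\s+Begin", windows_text)))   # True
```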

Escaping Special Characters#

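The backslash \ escapes the character that immediately follows it in an expression. Use it to match the special characters { } [ ] ( ) / \ + * . $ ^ | ? literally.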
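
A small Python sketch of escaping, with made-up sample strings. It also uses `re.escape`, a standard-library helper that escapes a whole string for you; that helper goes slightly beyond what the text above describes.

```python
import re

price = "cost: $5.00 (approx.)"

# Escaped by hand: $ . ( ) are matched as literal characters.
print(re.search(r"\$5\.00", price).group())        # $5.00
print(re.search(r"\(approx\.\)", price).group())   # (approx.)

# re.escape builds a pattern that matches the given text literally.
literal = "a+b*c?"
escaped = re.escape(literal)
print(re.fullmatch(escaped, literal) is not None)  # True
# Unescaped, + * ? act as quantifiers, so the literal text does not match itself.
print(re.fullmatch(literal, literal) is not None)  # False
```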

Greedy vs Lazy Matching#

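*, +, {} are greedy by default (they match as many characters as possible)
- Example: <.+> applied to a tag-wrapped string such as <a>https://www.flyalready. com</a> matches the entire string, from the first < to the last >.
Adding ? after a quantifier (*?, +?, {...}?) makes it lazy (it matches as few characters as possible)
- Example: <.+?> applied to the same kind of string matches the opening tag and the closing tag separately.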
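
The greedy and lazy behaviour can be compared side by side in Python. The HTML snippet below is a made-up sample, not taken from the original post.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
print(re.findall(r"<.+>", html))   # ['<b>bold</b> and <i>italic</i>']

# Lazy: .+? stops at the first ">" it can reach, so each tag is matched on its own.
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>', '<i>', '</i>']

# The same applies to {m,n}: greedy takes the maximum, lazy the minimum.
print(re.search(r"a{2,4}", "aaaaa").group())   # aaaa
print(re.search(r"a{2,4}?", "aaaaa").group())  # aa
```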

That covers what I use most often; the zero-width assertions below are a bit more specialized.

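Lookarounds (Zero-Width Assertions)#

The usual Chinese term for this, 零宽断言 (literally "zero-width assertion"), is a truly baffling translation. I have no idea what an "assertion" is even supposed to be!

Lookarounds check the surrounding characters but do not include them in the match result. They are "zero-width", meaning they do not consume any characters in the string.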

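Symbol	Description
?=	Positive lookahead (must be followed by)
?!	Negative lookahead (must not be followed by)
?<=	Positive lookbehind (must be preceded by)
?<!	Negative lookbehind (must not be preceded by)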

(?=...) : Positive lookahead. Matches a position that is followed by ...
- Example: (T|t)he(?=\sfat) matches The or the only when it is immediately followed by (space)fat.
- In The fat cat sat on the mat. only the leading The matches.
(?!...) : Negative lookahead. Matches a position that is not followed by ...
- Example: (T|t)he(?!\sfat) matches The or the when it is not followed by (space)fat.
- In The fat cat sat on the mat. only the word the in front of mat matches.
(?<=...) : Positive lookbehind. Matches a position that is preceded by ...
- Example: (?<=(T|t)he\s)(fat|mat) matches fat or mat when it is preceded by The or the and a space.
- In The fat cat sat on the mat. both fat and mat match.
(?<!...) : Negative lookbehind. Matches a position that is not preceded by ...
- Example: (?<!(T|t)he\s)(cat) matches cat when it is not preceded by The or the and a space.
- In The cat sat on cat. only the second cat matches.
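
The four lookaround examples above can be reproduced in Python as a quick check. This is only a sketch: the helper name `spans` is made up, and it reports each match together with its start index so that it is clear which occurrence was found. Python's `re` module requires lookbehind patterns to have a fixed width, which holds here because (T|t)he\s is always four characters long.

```python
import re

def spans(pattern: str, text: str) -> list:
    """Return (matched text, start index) for every match (hypothetical helper)."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]

sentence = "The fat cat sat on the mat."

# Positive lookahead: only the "The" that is followed by " fat".
print(spans(r"(T|t)he(?=\sfat)", sentence))          # [('The', 0)]
# Negative lookahead: only the "the" that is NOT followed by " fat".
print(spans(r"(T|t)he(?!\sfat)", sentence))          # [('the', 19)]
# Positive lookbehind: "fat" and "mat" preceded by "The " or "the ".
print(spans(r"(?<=(T|t)he\s)(fat|mat)", sentence))   # [('fat', 4), ('mat', 23)]
# Negative lookbehind: only the "cat" not preceded by "The " or "the ".
print(spans(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat."))  # [('cat', 15)]
```

In every case the lookaround text itself (for example " fat" or "The ") is absent from the result, which is what "zero-width" means in practice.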

Download#

PDF version: PDF; LaTeX version: LaTeX
