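1. What is a regular expression?
A regular expression (often shortened in code to regex, regexp, or RE) is a concept from computer science. (Source: Baidu Baike)
Put simply, it is a rule string built from special characters according to fixed rules, used mainly to filter strings.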
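2. Three uses of regular expressions:
- Quickly match a specified string;
- Replace strings that conform to the rules of the regular expression;
- Extract specified substrings from a target string.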
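3. Regular expression syntax:
A regular expression is a text pattern made up of ordinary characters (for example, a-z) and metacharacters (special characters). It serves as a template (a rule string) against which the target string is matched.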
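The difference between an ordinary character and a metacharacter can be sketched with the dot, which means "any single character" unless it is escaped. The class name `MetacharDemo` and the helper `fullMatch` are made up for this illustration; `Pattern.matches` and `Pattern.quote` are standard `java.util.regex` calls.

```java
import java.util.regex.Pattern;

class MetacharDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, "." is a metacharacter and matches any single character.
        System.out.println(fullMatch("a.c", "abc"));    // true
        System.out.println(fullMatch("a.c", "a.c"));    // true
        // Escaped, "\\." is an ordinary dot.
        System.out.println(fullMatch("a\\.c", "abc"));  // false
        System.out.println(fullMatch("a\\.c", "a.c"));  // true
        // "+" is also a metacharacter, so "1+1=2" does not match itself...
        System.out.println(fullMatch("1+1=2", "1+1=2"));                // false
        // ...unless the whole string is quoted as a literal.
        System.out.println(fullMatch(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```

When the text to search for comes from user input, quoting it this way is usually safer than escaping characters by hand.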
4. Common metacharacters:
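4.1 Quantifiers
Quantifier | Description |
---|---|
? | Matches the preceding character or subexpression zero or one time; equivalent to {0,1} |
* | Matches the preceding character or subexpression zero or more times; equivalent to {0,} |
+ | Matches the preceding character or subexpression one or more times; equivalent to {1,} |
{n} | Matches exactly n times |
{n,} | Matches at least n times |
{n,m} | Matches at least n and at most m times |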
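A short sketch of the quantifiers in the table, checked against whole strings. `QuantifierDemo` and `fullMatch` are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));   // true:  "u" zero times
        System.out.println(fullMatch("colou?r", "colour"));  // true:  "u" once
        System.out.println(fullMatch("ab*c", "ac"));         // true:  "b" zero times
        System.out.println(fullMatch("ab+c", "ac"));         // false: "+" needs at least one "b"
        System.out.println(fullMatch("\\d{3}", "123"));      // true:  exactly three digits
        System.out.println(fullMatch("\\d{3}", "1234"));     // false: four digits
        System.out.println(fullMatch("\\d{3,}", "1234"));    // true:  at least three digits
        System.out.println(fullMatch("\\d{2,4}", "12345"));  // false: more than four digits
    }
}
```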
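4.2 Character matchers
Matcher | Description |
---|---|
\d | Matches a digit; equivalent to [0-9] |
\D | Matches a non-digit; equivalent to [^0-9] |
\w | Matches any word character; equivalent to [A-Za-z0-9_] |
\W | Matches any non-word character; equivalent to [^A-Za-z0-9_] |
\f | Matches a form feed |
\n | Matches a line feed |
\r | Matches a carriage return |
\s | Matches any whitespace character (space, tab, line break, form feed, and so on) |
\S | Matches any non-whitespace character |
\t | Matches a tab |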
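The character matchers above might be used as follows. This is a minimal sketch; `CharClassDemo` and its three helper methods are hypothetical names chosen for the example, and only `String.replaceAll`, `String.split`, and `String.matches` come from the JDK.

```java
import java.util.Arrays;

class CharClassDemo {
    // Removes every non-digit character (\D).
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+).
    static String[] splitOnWhitespace(String s) {
        return s.trim().split("\\s+");
    }

    // True when the string consists only of word characters (\w).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 010-1234"));                     // 0101234
        System.out.println(Arrays.toString(splitOnWhitespace("a\tb \n c"))); // [a, b, c]
        System.out.println(isWord("user_01"));  // true:  the underscore counts as a word character
        System.out.println(isWord("user-01"));  // false: the hyphen does not
    }
}
```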
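4.3 Alternation and character sets
Construct | Description |
---|---|
x|y | Matches x or y |
[xyz] | Matches any one of the listed characters |
[^xyz] | Negated set: matches any one character that is not listed |
[a-z] | Matches any one character in the given range |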
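The constructs in this table can be tried out with a few whole-string checks. As before, `SetDemo` and `fullMatch` are names invented for the sketch.

```java
import java.util.regex.Pattern;

class SetDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("cat|dog", "dog"));       // true:  second alternative
        System.out.println(fullMatch("cat|dog", "cow"));       // false: neither alternative
        System.out.println(fullMatch("gr(a|e)y", "grey"));     // true:  alternation inside a group
        System.out.println(fullMatch("[abc]", "b"));           // true:  "b" is in the set
        System.out.println(fullMatch("[^abc]", "b"));          // false: "b" is excluded
        System.out.println(fullMatch("[^abc]", "z"));          // true:  "z" is not listed
        System.out.println(fullMatch("[a-f0-9]+", "c0ffee"));  // true:  every character is in a range
    }
}
```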
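4.4 Anchors
Anchor | Description |
---|---|
^ | Matches the start of the input string |
$ | Matches the end of the input string |
\b | Matches a word boundary, such as the position between a word character and a space |
\B | Matches a position that is not a word boundary |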
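Anchors match positions rather than characters, which is easiest to see with a search inside a longer string. The sketch below assumes a hypothetical helper `contains` built on `Matcher.find()`, which looks for the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true when the regex matches somewhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("^abc", "abcdef"));            // true:  "abc" is at the start
        System.out.println(contains("^abc", "xabc"));              // false: "abc" is not at the start
        System.out.println(contains("abc$", "xabc"));              // true:  "abc" is at the end
        System.out.println(contains("\\bcat\\b", "a cat sat"));    // true:  "cat" is a whole word
        System.out.println(contains("\\bcat\\b", "concatenate"));  // false: "cat" is inside a word
        System.out.println(contains("\\Bcat\\B", "concatenate"));  // true:  no boundary on either side
    }
}
```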
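5. Commonly used regular expressions
The constants below are precompiled java.util.regex.Pattern objects. They must be declared inside a class, and the file needs import java.util.regex.Pattern;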
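// One or more digits
public final static Pattern NUMBERS = Pattern.compile("\\d+");
// One or more ASCII letters
public final static Pattern WORD = Pattern.compile("[a-zA-Z]+");
// Whole string consists of letters, digits, and underscores
public final static Pattern GENERAL = Pattern.compile("^\\w+$");
// A single Chinese character (CJK Unified Ideographs block), then one or more of them
public final static Pattern CHINESE = Pattern.compile("[\u4E00-\u9FFF]");
public final static Pattern CHINESES = Pattern.compile("[\u4E00-\u9FFF]" + "+");
// Whole string consists of Chinese characters, letters, digits, and underscores
public final static Pattern GENERAL_WITH_CHINESE = Pattern.compile("^[\u4E00-\u9FFF\\w]+$");
// A group reference such as $1; the index is captured in group 1
public final static Pattern GROUP_VAR = Pattern.compile("\\$(\\d+)");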
public final static Pattern EMAIL = Pattern.compile("(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)])", Pattern.CASE_INSENSITIVE);
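// Mainland China mobile number, with an optional 0, 86, or +86 prefix
public final static Pattern MOBILE = Pattern.compile("(?:0|86|\\+86)?1[3456789]\\d{9}");
// 18-character Chinese resident ID number (the last character may be X)
public final static Pattern CITIZEN_ID = Pattern.compile("[1-9]\\d{5}[1-2]\\d{3}((0\\d)|(1[0-2]))(([012]\\d)|3[0-1])\\d{3}(\\d|X|x)");
// Six-digit Chinese postal code
public final static Pattern ZIP_CODE = Pattern.compile("[1-9]\\d{5}(?!\\d)");
// Birthday such as 1990-01-31, 1990/1/31, or 1990年1月31日
public final static Pattern BIRTHDAY = Pattern.compile("^(\\d{2,4})([/\\-.年]?)(\\d{1,2})([/\\-.月]?)(\\d{1,2})日?$");
// Time of day such as 9:30 or 09:30:15
public static final Pattern TIME = Pattern.compile("\\d{1,2}:\\d{1,2}(:\\d{1,2})?");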
public final static Pattern URL = Pattern.compile("[a-zA-Z]+://[^\\s]*");
public final static Pattern URL_HTTP = Pattern.compile("(https://|http://)?([\\w-]+\\.)+[\\w-]+(:\\d+)*(/[\\w- ./?%&=]*)?");
// UUIDs are hexadecimal, so only 0-9 and a-f (in either case) are accepted
public final static Pattern UUID = Pattern.compile("^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$");
public final static Pattern UUID_SIMPLE = Pattern.compile("^[0-9a-fA-F]{32}$");
public final static Pattern MONEY = Pattern.compile("^(\\d+(?:\\.\\d+)?)$");
public static final Pattern MAC_ADDRESS = Pattern.compile("((?:[A-F0-9]{1,2}[:-]){5}[A-F0-9]{1,2})|(?:0x)(\\d{12})(?:.+ETHER)", Pattern.CASE_INSENSITIVE);
public static final Pattern HEX = Pattern.compile("^[a-f0-9]+$", Pattern.CASE_INSENSITIVE);
// Chinese vehicle license plate number
public final static Pattern PLATE_NUMBER = Pattern.compile(
"^(([京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领][A-Z](([0-9]{5}[ABCDEFGHJK])|([ABCDEFGHJK]([A-HJ-NP-Z0-9])[0-9]{4})))|" +
"([京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领]\\d{3}\\d{1,3}[领])|" +
"([京津沪渝冀豫云辽黑湘皖鲁新苏浙赣鄂桂甘晋蒙陕吉闽贵粤青藏川宁琼使领][A-Z][A-HJ-NP-Z0-9]{4}[A-HJ-NP-Z0-9挂学警港澳使领]))$");
// 18-character Chinese unified social credit code
public static final Pattern CREDIT_CODE = Pattern.compile("^[0-9A-HJ-NPQRTUWXY]{2}\\d{6}[0-9A-HJ-NPQRTUWXY]{10}$");
public final static Pattern IPV4 = Pattern.compile("\\b((?!\\d\\d\\d)\\d+|1\\d\\d|2[0-4]\\d|25[0-5])\\.((?!\\d\\d\\d)\\d+|1\\d\\d|2[0-4]\\d|25[0-5])\\.((?!\\d\\d\\d)\\d+|1\\d\\d|2[0-4]\\d|25[0-5])\\.((?!\\d\\d\\d)\\d+|1\\d\\d|2[0-4]\\d|25[0-5])\\b");
public final static Pattern IPV6 = Pattern.compile("(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]+|::(ffff(:0{1,4})?:)?((25[0-5]|(2[0-4]|1?[0-9])?[0-9])\\.){3}(25[0-5]|(2[0-4]|1?[0-9])?[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1?[0-9])?[0-9])\\.){3}(25[0-5]|(2[0-4]|1?[0-9])?[0-9]))");
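One possible way to use a few of these constants is sketched below. The three patterns are copied from the list above so that the example is self-contained; the class name `CommonPatternDemo` and the helpers `isMobile`, `isMoney`, and `groupIndexes` are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommonPatternDemo {
    // Copied from the list above.
    static final Pattern MOBILE = Pattern.compile("(?:0|86|\\+86)?1[3456789]\\d{9}");
    static final Pattern MONEY = Pattern.compile("^(\\d+(?:\\.\\d+)?)$");
    static final Pattern GROUP_VAR = Pattern.compile("\\$(\\d+)");

    // matches() requires the whole input to match, which suits validation.
    static boolean isMobile(String s) {
        return MOBILE.matcher(s).matches();
    }

    static boolean isMoney(String s) {
        return MONEY.matcher(s).matches();
    }

    // find() walks through every occurrence; group(1) is the captured index.
    static List<Integer> groupIndexes(String template) {
        List<Integer> result = new ArrayList<>();
        Matcher m = GROUP_VAR.matcher(template);
        while (m.find()) {
            result.add(Integer.parseInt(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isMobile("13812345678"));     // true
        System.out.println(isMobile("+8613812345678"));  // true
        System.out.println(isMobile("12812345678"));     // false: second digit must be 3-9
        System.out.println(isMoney("19.99"));            // true
        System.out.println(isMoney("19."));              // false: digits must follow the dot
        System.out.println(groupIndexes("$2-$1-$10"));   // [2, 1, 10]
    }
}
```

Patterns without ^ and $ (such as MOBILE) match substrings when used with find(), so matches() is the safer call for validating a complete input.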
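6. Using regular expressions in Java
6.1 Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        String str = "abc123";
        String regex = "\\w{3,}";
        // String.matches tests the whole string against the regex.
        boolean b1 = str.matches(regex);
        System.out.println("Direct match: " + b1);
        // A compiled Pattern can be reused across many inputs.
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        boolean b2 = m.matches();
        System.out.println("Compiled match: " + b2);
        // Strip opening and closing tags from the input.
        String input = "hello <b>Regular</b> <i>Expression</i>";
        String regex1 = "<\\w+>|</\\w+>";
        String output = input.replaceAll(regex1, "");
        System.out.println("Replace filter: " + output);
    }
}
6.2 Output
Direct match: true
Compiled match: true
Replace filter: hello Regular Expression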
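Section 2 also lists extracting substrings as a use of regular expressions. A minimal sketch of that, using `Matcher.find()` with capturing groups and a backreference, could look like this. The `TAG` pattern, the class `ExtractDemo`, and the method `tagContents` are all made up for the example, and the pattern only handles simple, non-nested tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractDemo {
    // Group 1 is the tag name, group 2 the text inside;
    // \1 requires the closing tag to repeat the opening tag name.
    static final Pattern TAG = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    // Returns one "tag=text" entry per matched element.
    static List<String> tagContents(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "hello <b>Regular</b> <i>Expression</i>";
        System.out.println(tagContents(input)); // [b=Regular, i=Expression]
    }
}
```

Regular expressions are fine for small, predictable fragments like this one; real HTML is usually better handled by a parser.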