Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Escaping special characters in Java Regular Expressions

Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

This would be very handy in dynamically building a regular expression, without having to manually escape each individual character.

For example, consider a simple regex like \d+\.\d+ that matches numbers with a decimal point like 1.2, as well as the following code:

String digit = "d";
String point = ".";
String regex1 = "\\d+\\.\\d+";
String regex2 = Pattern.quote(digit + "+" + point + digit + "+");

Pattern numbers1 = Pattern.compile(regex1);
Pattern numbers2 = Pattern.compile(regex2);

System.out.println("Regex 1: " + regex1);

if (numbers1.matcher("1.2").matches()) {
    System.out.println("Match");
} else {
    System.out.println("No match");
}

System.out.println("Regex 2: " + regex2);

if (numbers2.matcher("1.2").matches()) {
    System.out.println("Match");
} else {
    System.out.println("No match");
}

Not surprisingly, the output produced by the above code is:

Regex 1: \d+\.\d+
    Match
Regex 2: \Qd+.d+\E
    No match

That is, regex1 matches 1.2 but regex2 (which is "dynamically" built) does not (instead, it matches the literal string d+.d+).

So, is there a method that would automatically escape each regex meta-character?

If there were, let's say, a static escape() method in java.util.regex.Pattern, the output of

Pattern.escape('.')

would be the string "\.", but

Pattern.escape(',')

should just produce ",", since it is not a meta-character. Similarly,

Pattern.escape('d')

could produce "\d", since 'd' is used to denote digits (although escaping may not make sense in this case, as 'd' could mean a literal 'd', which the regex interpreter would not mistake for something else, as it would with '.').


1 Reply


Is there any method in Java or any open source library for escaping (not quoting) a special character (meta-character), in order to use it as a regular expression?

If you are looking for a way to create constants that you can use in your regex patterns, then just prepending them with a backslash (written "\\" in a Java string literal) should work, but there is no built-in Pattern.escape('.') function to help with this.
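Since no such method ships with java.util.regex, here is a minimal sketch of what a hand-rolled helper might look like. The class name RegexEscapeSketch, the method escape, and the META character set are all assumptions made for illustration, not part of any library. The sketch only covers the usual metacharacters outside a character class and ignores cases such as '-' inside [...] or whitespace under the COMMENTS flag.

```java
import java.util.regex.Pattern;

public class RegexEscapeSketch {
    // Assumed set of metacharacters for this sketch: \ ^ $ . | ? * + ( ) [ ] { }
    private static final String META = "\\^$.|?*+()[]{}";

    /** Hypothetical helper: prefixes every metacharacter in s with a backslash. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (META.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("."));  // prints \.
        System.out.println(escape(","));  // prints ,

        // Only the literal part is escaped; \d+ keeps its special meaning.
        String regex = "\\d+" + escape(".") + "\\d+";
        System.out.println(Pattern.matches(regex, "1.2"));  // prints true
        System.out.println(Pattern.matches(regex, "1x2"));  // prints false
    }
}
```

Note that a letter such as 'd' is left alone here: escaping it would turn a literal into the digit class, which is the opposite of what escaping is for.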

So if you are trying to match "\d" (the literal string \d rather than a decimal digit), then you would do:

// this will match the literal text \d as opposed to a decimal digit
String matchBackslashD = "\\\\d";
// as opposed to
String matchDecimalDigit = "\\d";

The 4 backslashes in the Java string turn into 2 backslashes in the regex pattern, and 2 backslashes in a regex pattern match the backslash itself. Prepending any special character with a backslash turns it into a normal character instead of a special one.

String matchPeriod = "\\.";
String matchPlus = "\\+";
String matchParens = "\\(\\)";
... 
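To make the two levels of escaping concrete, the following sketch checks how long each Java string really is and what it matches. The class name BackslashLayers and the constant names are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

public class BackslashLayers {
    // Java source "\\d" is the 2-character string \d, which the regex engine reads as "any digit".
    public static final String MATCH_DECIMAL_DIGIT = "\\d";
    // Java source "\\\\d" is the 3-character string \\d, which the regex engine reads as
    // "a literal backslash followed by the letter d".
    public static final String MATCH_BACKSLASH_D = "\\\\d";

    public static void main(String[] args) {
        System.out.println(MATCH_DECIMAL_DIGIT.length());  // prints 2
        System.out.println(MATCH_BACKSLASH_D.length());    // prints 3

        System.out.println(Pattern.matches(MATCH_DECIMAL_DIGIT, "7"));   // prints true
        System.out.println(Pattern.matches(MATCH_BACKSLASH_D, "\\d"));   // prints true
        System.out.println(Pattern.matches(MATCH_BACKSLASH_D, "7"));     // prints false

        // Escaped metacharacters behave as ordinary characters.
        System.out.println(Pattern.matches("\\(\\)", "()"));  // prints true
    }
}
```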

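In your post you use the Pattern.quote(string) method. This method wraps your string between "\Q" and "\E" so that everything in between is matched literally, even if it happens to contain a special regex character (+, ., \d, etc.). That is why quoting the whole expression turns the + quantifiers into literal characters as well.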
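One possible workaround when building a pattern dynamically is to quote only the fragment that should be literal and concatenate it with the parts that should stay special. The sketch below assumes a hypothetical helper named decimalWithSeparator; it is an illustration, not an existing API. Strictly speaking this is still quoting rather than escaping, but the resulting pattern matches the same input.

```java
import java.util.regex.Pattern;

public class QuoteFragmentSketch {
    /** Hypothetical helper: builds digits + literal separator + digits. */
    public static Pattern decimalWithSeparator(String separator) {
        // Only the separator is quoted, so \d+ keeps its meaning.
        return Pattern.compile("\\d+" + Pattern.quote(separator) + "\\d+");
    }

    public static void main(String[] args) {
        Pattern p = decimalWithSeparator(".");
        System.out.println(p.pattern());                 // prints \d+\Q.\E\d+
        System.out.println(p.matcher("1.2").matches());  // prints true
        System.out.println(p.matcher("1x2").matches());  // prints false

        // Quoting the whole string, as in the question, matches only that literal text.
        System.out.println(Pattern.matches(Pattern.quote("d+.d+"), "d+.d+"));  // prints true
        System.out.println(Pattern.matches(Pattern.quote("d+.d+"), "1.2"));    // prints false
    }
}
```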

