Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
816 views
in Technique[技术] by (71.8m points)

c# - Regex to match two or more consecutive characters

Using regular expressions I want to match a word which

  • starts with a letter
  • has english alpahbets
  • numbers, period(.), hyphen(-), underscore(_)
  • should not have two or more consecutive periods or hyphens or underscores
  • can have multiple periods or hyphens or underscore

For example,

flin..stones or flin__stones or flin--stones

are not allowed.

fl_i_stones or fli_st.ones or flin.stones or flinstones

is allowed .

So far My regular expression is ^[a-zA-Z][a-zA-Zd._-]+$

So My question is how to do it using regular expression

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters. The starting letter and another one (due to the +). You probably want to make that + and * so that the second character class can be repeated 0 or more times:

^(?!.*(.)1)[a-zA-Z][a-zA-Zd._-]*$

How does the lookahead work? Firstly, it's a negative lookahead. If the pattern inside finds a match, the lookahead causes the entire pattern to fail and vice-versa. So we can have a pattern inside that matches if we do have two consecutive characters. First, we look for an arbitrary position in the string (.*), then we match single (arbitrary) character (.) and capture it with the parentheses. Hence, that one character goes into capturing group 1. And then we require this capturing group to be followed by itself (referencing it with 1). So the inner pattern will try at every single position in the string (due to backtracking) whether there is a character that is followed by itself. If these two consecutive characters are found, the pattern will fail. If they cannot be found, the engine jumps back to where the lookahead started (the beginning of the string) and continue with matching the actual pattern.

Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:

^[a-zA-Z][a-zA-Zd._-]*$

And one for the consecutive characters (where you can invert the match result):

(.)1

This would greatly increase the readability of your code (because it's less obscure than that lookahead) and it would also allow you to detect the actual problem in pattern and return an appropriate and helpful error message.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...