Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - Regular expression for recognizing URLs

I want to create a regex that extracts all links from an input string. It should recognize the following URL formats:

  • http(s)://www.webpage.com
  • http(s)://webpage.com
  • www.webpage.com

and also more complicated URLs such as: http://www.google.pl/#sclient=psy&hl=pl&site=&source=hp&q=regex+url&pbx=1&oq=regex+url&aq=f&aqi=g1&aql=&gs_sm=e&gs_upl=1582l3020l0l3199l9l6l0l0l0l0l255l1104l0.2.3l5l0&bav=on.2,or.r_gc.r_pw.&fp=30a1604d4180f481&biw=1680&bih=935

I currently have this one:

((www\.|https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)

but it does not recognize the pattern www.webpage.com. Can someone please help me create an appropriate regex?

EDIT: It should find each link and also place a hyperlink at the matching position in the text, like this:

private readonly Regex RE_URL = new Regex(@"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", RegexOptions.Multiline);

foreach (Match match in RE_URL.Matches(new_text))
{
    // Copy raw string from the last position up to the match
    if (match.Index != last_pos)
    {
        var raw_text = new_text.Substring(last_pos, match.Index - last_pos);
        text_block.Inlines.Add(new Run(raw_text));
    }

    // Create a hyperlink for the match
    var link = new Hyperlink(new Run(match.Value))
    {
        NavigateUri = new Uri(match.Value)
    };
    link.Click += OnUrlClick;

    text_block.Inlines.Add(link);

    // Update the last matched position
    last_pos = match.Index + match.Length;
}

1 Reply


I don't know why your match result is only http://, but I cleaned up your regex a bit:

((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)

(?:...) are non-capturing groups. That means only one capturing group is left, and it contains the complete matched string.

(?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.) The link now has to start either with one of the schemes from the first list, followed by an optional www., or with www. alone.

[\w\d:#@%/;$()~_?\+,\-=\\.&] I added a comma to the list (otherwise your long example does not match), escaped the - (you were creating a character range), and unescaped the . (escaping is not needed in a character class).
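
To try the cleaned pattern from C#, here is a minimal sketch. It writes the pattern as a verbatim string with its backslash escapes spelled out (\w\d, www\., \-, and \\\\ for the double-backslash alternative). The class name UrlFinder and the method FindUrls are hypothetical names chosen for this example, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // The cleaned pattern as a C# verbatim string.
    // Either "<scheme>://" (or "<scheme>:\\") with an optional "www.", or a bare "www.",
    // followed by one or more characters from the allowed character class.
    private static readonly Regex UrlRegex = new Regex(
        @"((?:(?:https?|ftp|gopher|telnet|file|notes|ms-help):(?://|\\\\)(?:www\.)?|www\.)[\w\d:#@%/;$()~_?\+,\-=\\.&]+)");

    // Returns every substring of the input that the pattern matches, in order.
    public static List<string> FindUrls(string input)
    {
        var urls = new List<string>();
        foreach (Match match in UrlRegex.Matches(input))
        {
            // Group 1 is the only capturing group and spans the whole match.
            urls.Add(match.Groups[1].Value);
        }
        return urls;
    }
}
```

Calling UrlFinder.FindUrls on a string that contains www.webpage.com, http://webpage.com and https://www.webpage.com separated by spaces should return all three. Because . and , are part of the character class, punctuation that directly follows a link in running text becomes part of the match, so trimming trailing punctuation may be needed.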

See this here on Regexr, a useful tool to test regexes.

But URL matching is not a simple task; please see this question here.
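
One practical consequence for the loop in the question: once the pattern also matches scheme-less links such as www.webpage.com, new Uri(match.Value) throws a UriFormatException, because that constructor expects an absolute URI. The sketch below is one possible way around it. The helper name ToAbsoluteUri is hypothetical, and prefixing http:// to www. matches is an assumption about what the link should point to.

```csharp
using System;

public static class UrlNormalizer
{
    // Hypothetical helper: converts matched text into an absolute Uri,
    // or returns null when the text cannot be parsed as one.
    public static Uri ToAbsoluteUri(string matchedText)
    {
        // Assumption: a match that starts with "www." is meant as an http link.
        string candidate = matchedText.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? "http://" + matchedText
            : matchedText;

        // TryCreate reports failure through its return value instead of throwing.
        Uri uri;
        return Uri.TryCreate(candidate, UriKind.Absolute, out uri) ? uri : null;
    }
}
```

In the question's loop, NavigateUri could then be set from UrlNormalizer.ToAbsoluteUri(match.Value), with the match added as plain text when the result is null.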

