Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


.net - Complex Regular expression for two consecutive words or a single word. C#

I have a list of every city in the world in my database, and an application written in C# that needs to search an incoming string to determine whether any of those cities appear in it. However, I'm having trouble figuring out the regex pattern because some city names are two words, like "San Francisco". Thanks in advance for any help.



1 Reply


Probably the easiest way is to load all your city names into memory (for example with select name from cities) and then use a regex or simple string methods to check whether any of them appear in the text.

 List<string> cities = GetCitiesFromDatabase(); // you need to implement this yourself
 string text = @"the text containing city names such as Amsterdam and San Francisco";

 // For a case-insensitive search, use text.IndexOf(city, StringComparison.CurrentCultureIgnoreCase) >= 0 instead of Contains
 bool containsACity = cities.Any(city => text.Contains(city));
 IEnumerable<string> containedCities = cities.Where(city => text.Contains(city));

To ensure that 'Amsterdam' doesn't match inside 'Amsterdamned', you could use a regular expression with word boundaries (\b) instead of Contains:

 // Add RegexOptions.IgnoreCase for case-insensitive matches.
 bool containsACity = cities.Any(city => Regex.IsMatch(text, @"\b" + Regex.Escape(city) + @"\b"));
 IEnumerable<string> containedCities = cities.Where(city => Regex.IsMatch(text, @"\b" + Regex.Escape(city) + @"\b"));
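
As a quick way to try the word-boundary idea in isolation, here is a minimal, self-contained sketch. The class name, the `FindCities` helper, and the sample city list are made up for illustration and stand in for whatever your database query returns. Each name is wrapped in `\b` anchors after passing through `Regex.Escape`, so a two-word name like "San Francisco" is treated the same way as a single word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Returns the cities that occur in the text as whole words (case-insensitive).
    public static List<string> FindCities(IEnumerable<string> cities, string text)
    {
        return cities
            .Where(city => Regex.IsMatch(
                text,
                @"\b" + Regex.Escape(city) + @"\b",
                RegexOptions.IgnoreCase))
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for the database query.
        var cities = new List<string> { "Amsterdam", "San Francisco", "Paris" };

        var hits = FindCities(cities, "I watched Amsterdamned in san francisco.");

        // "Amsterdam" is rejected because it is followed by a word character.
        Console.WriteLine(string.Join(", ", hits)); // prints "San Francisco"
    }
}
```

Note that `\b` only matches between a word character and a non-word character, so a name that ends in punctuation (such as an abbreviation with a trailing period) may need different handling.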

Alternatively, you can build a large regular expression to search for any city and execute that once:

 string regex = @"\b(?:" + String.Join("|", cities.Select(city => Regex.Escape(city)).ToArray()) + @")\b";
 bool containsACity = Regex.IsMatch(text, regex, RegexOptions.IgnoreCase);
 IEnumerable<string> containedCities = Regex.Matches(text, regex, RegexOptions.IgnoreCase).Cast<Match>().Select(m => m.Value);
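
One thing to watch with a single combined pattern: the .NET regex engine tries alternatives from left to right and keeps the first one that succeeds, so if one city name is a prefix of another, the shorter name can win. The sketch below is one possible way to deal with that, by sorting the names longest-first before joining them. The class and method names and the sample cities are illustrative assumptions, not part of the original answer; it also de-duplicates the results, since `Regex.Matches` returns one entry per occurrence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedPatternDemo
{
    // Builds one alternation pattern, longest names first, wrapped in word boundaries.
    public static Regex BuildCityRegex(IEnumerable<string> cities)
    {
        var alternatives = cities
            .OrderByDescending(city => city.Length)
            .Select(city => Regex.Escape(city));

        string pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // Returns each matched city once, in order of first appearance in the text.
    public static List<string> FindCities(Regex cityRegex, string text)
    {
        return cityRegex.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data; "Santa Cruz" is a prefix of the longer name.
        var cities = new List<string> { "Santa Cruz", "Santa Cruz de Tenerife", "Amsterdam" };
        Regex cityRegex = BuildCityRegex(cities);

        var hits = FindCities(cityRegex, "Flights from Santa Cruz de Tenerife to Amsterdam");
        Console.WriteLine(string.Join(", ", hits)); // prints "Santa Cruz de Tenerife, Amsterdam"
    }
}
```

The values returned are the text as it appears in the input, not the canonical spelling from the city list, so a lookup back into the list may be needed if casing matters.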

You can improve the performance of these calls by caching the list of cities or the regular expression, and improve it further by holding the pattern in a static readonly Regex object created with RegexOptions.Compiled.
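
A rough sketch of the cached variant might look like the following. `CityMatcher` and `LoadCities` are hypothetical names, and the hard-coded list is only a placeholder for the real database query. The pattern is built once, when the class is first used, and reused for every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CityMatcher
{
    // Hypothetical stand-in for the database query; replace with your own loader.
    private static List<string> LoadCities()
    {
        return new List<string> { "Amsterdam", "San Francisco" };
    }

    // Built once and shared by all callers; Regex instances are safe to use concurrently.
    private static readonly Regex CityRegex = new Regex(
        @"\b(?:" + string.Join("|", LoadCities().Select(city => Regex.Escape(city))) + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool ContainsCity(string text)
    {
        return CityRegex.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsCity("Landing in san francisco tonight")); // prints "True"
        Console.WriteLine(ContainsCity("Amsterdamned"));                     // prints "False"
    }
}
```

RegexOptions.Compiled trades a slower one-time construction for faster matching, so with a very large city list it is worth measuring both startup time and match time before committing to it.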

Another solution would be to do the matching in the database. Instead of keeping a local list of cities in memory, send the input text to the database and use a LIKE statement or a regex inside the database to compare the list of cities against the text. Depending on the number of cities and the size of the text, this might be faster, but whether it is possible at all depends on the database being used.

