Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Remove text in-between delimiters in a string (using a regex?)

Consider the requirement to find a matched pair of delimiter characters and remove the delimiters together with any characters between them.

Here are the sets of delimiters:

 []    square brackets
 ()    parentheses
 ""    double quotes
 ''    single quotes

Here are some examples of strings that should match:

 Given:                       Results In:
-------------------------------------------
 Hello "some" World           Hello World
 Give [Me Some] Purple        Give Purple
 Have Fifteen (Lunch Today)   Have Fifteen
 Have 'a good'day             Have day

And some examples of strings that should not match:

 Does Not Match:
------------------
 Hello "world
 Brown]co[w
 Cheese'factory

If the given string doesn't contain a matching pair of delimiters, it isn't modified. The input string may contain many matching pairs of delimiters. If two pairs of delimiters overlap (e.g. he[llo "worl]d"), that's an edge case we can ignore here.

The algorithm would look something like this:

string myInput = "Give [Me Some] Purple (And More) Elephants";
string pattern; //some pattern
string output = Regex.Replace(myInput, pattern, string.Empty);

Question: How would you achieve this with C#? I am leaning towards a regex.

Bonus: Are there easy ways of keeping those start and end delimiters in constants or in a list of some kind? The solution I am looking for should make it easy to change the delimiters in case the business analysts come up with new sets of delimiters.


1 Reply


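A simple regex would be:

// requires: using System.Text.RegularExpressions;
string input = "Give [Me Some] Purple (And More) Elephants";
// Verbatim string: backslashes are literal and "" is an escaped double quote.
// Lazy .*? stops at the nearest closing delimiter rather than the last one.
string regex = @"(\[.*?\])|("".*?"")|('.*?')|(\(.*?\))";
string output = Regex.Replace(input, regex, "");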

To make the delimiters configurable, build the regex up from one part per delimiter pair:

('.*?')  // example of the single-quote check

Then concatenate the individual parts with an OR (the | in regex), as in my original example. Once you have the regex string built, run it once per input. The key is to get everything into a single regex: running many separate regex matches against each item, and then iterating through a lot of items, will probably be noticeably slower.
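As a rough sketch of that idea, the parts can be kept in an array, joined with | once, and the resulting Regex reused for every item. The class name DelimitedTextRemover and its methods are made up for illustration, and RegexOptions.Compiled is an optional choice here, not something the approach depends on.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from any library.
public static class DelimitedTextRemover
{
    // One hand-written part per delimiter pair.
    // Lazy .*? stops at the nearest closing delimiter.
    private static readonly string[] Parts =
    {
        @"\[.*?\]",   // square brackets
        @"\(.*?\)",   // parentheses
        @""".*?""",   // double quotes ("" escapes a quote in a verbatim string)
        @"'.*?'"      // single quotes
    };

    // Joined with | and constructed once, then reused for every input.
    private static readonly Regex Combined =
        new Regex(string.Join("|", Parts), RegexOptions.Compiled);

    // Removes every delimited span; strings without a matched pair are returned unchanged.
    public static string Strip(string input) =>
        Combined.Replace(input, string.Empty);

    // Applies the same single regex to many items.
    public static List<string> StripAll(IEnumerable<string> inputs)
    {
        var results = new List<string>();
        foreach (var item in inputs)
            results.Add(Strip(item));
        return results;
    }
}
```

Note that only the delimited span itself is removed, so DelimitedTextRemover.Strip("Give [Me Some] Purple") leaves both of the spaces that surrounded the brackets; collapsing whitespace would be a separate step.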

In my first example, the built-up regex would take the place of the hard-coded pattern:

string input = "Give [Me Some] Purple (And More) Elephants";
string regex = "Your built up regex here";
string output = Regex.Replace(input, regex, "");

I am sure someone will post a cool LINQ expression that builds the regex from an array of delimiter objects, or something similar.
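One possible shape for such a LINQ expression is sketched below. It assumes C# 7 tuples for the delimiter pairs; DelimiterRegexBuilder, DefaultPairs and Build are hypothetical names. Regex.Escape is used so that characters such as [ and ( are treated literally, which means a new pair only has to be added to the list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical builder; the names are illustrative, not from any library.
public static class DelimiterRegexBuilder
{
    // Edit this list to add or remove delimiter pairs.
    public static readonly (string Open, string Close)[] DefaultPairs =
    {
        ("[", "]"),
        ("(", ")"),
        ("\"", "\""),
        ("'", "'")
    };

    // Builds a single alternation with one part per pair:
    // escaped opener, lazy "anything", escaped closer.
    public static Regex Build((string Open, string Close)[] pairs)
    {
        var parts = pairs.Select(p =>
            Regex.Escape(p.Open) + ".*?" + Regex.Escape(p.Close));
        return new Regex(string.Join("|", parts));
    }
}
```

Usage would look something like DelimiterRegexBuilder.Build(DelimiterRegexBuilder.DefaultPairs).Replace(input, ""), with the Regex built once and kept around if many strings are processed.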

