Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - Regex to remove all special characters from a string?

I'm hopeless with regular expressions, so I need some help with a problem that I think is best solved with them.

I have a list of strings in C#:

using System.Collections.Generic;
using System.Text.RegularExpressions;

List<string> lstNames = new List<string>();
lstNames.Add("TRA-94:23");
lstNames.Add("TRA-42:101");
lstNames.Add("TRA-109:AD");

foreach (string n in lstNames) {
  // logic goes here that somehow uses regex to remove all special characters
  string regExp = "NO_IDEA";
  string tmp = Regex.Replace(n, regExp, "");
}

I need to be able to loop over the list and return each item without any special characters. For example, item one would be "TRA9423", item two would be "TRA42101" and item three would be "TRA109AD".

Is there a regular expression that can accomplish this for me?

Also, the list contains more than 4000 items, so I need the search and replace to be efficient and quick if possible.

EDIT: I should have specified that any character other than a-z, A-Z and 0-9 counts as special in my case.


1 Reply

It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:

tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
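
As a fuller sketch of the whitelist pattern applied to the list from the question: the character class [^0-9a-zA-Z] matches anything that is not a digit or an ASCII letter, and the + lets one replacement consume a whole run of such characters. The names NameCleaner, Clean and NonAlphanumeric are made up for this illustration. Creating the Regex once with RegexOptions.Compiled is one way to address the speed concern, though for about 4000 short strings the plain static call is likely fast enough too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // Built once and reused for every item. RegexOptions.Compiled trades
    // some startup cost for faster matching on repeated use.
    private static readonly Regex NonAlphanumeric =
        new Regex("[^0-9a-zA-Z]+", RegexOptions.Compiled);

    // Removes every character that is not 0-9, a-z or A-Z.
    public static string Clean(string input)
    {
        return NonAlphanumeric.Replace(input, "");
    }
}

public static class Program
{
    public static void Main()
    {
        var lstNames = new List<string> { "TRA-94:23", "TRA-42:101", "TRA-109:AD" };
        foreach (string n in lstNames)
        {
            Console.WriteLine(NameCleaner.Clean(n));
        }
        // Prints TRA9423, TRA42101 and TRA109AD, one per line.
    }
}
```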

You should be careful with your current approach, because the following two items will both be converted to "TRA12123" and will therefore be indistinguishable:

"TRA-12:123"
"TRA-121:23"
