Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Replacing multiple characters in a string, the fastest way?

I am importing some number of records with multiple string fields from an old db to a new db. It seems to be very slow and I suspect it's because I do this:

foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = oldObj.Name.Trim().Replace('^', '?').Replace('@', '?').Replace('[', '?')
        .Replace(']', '?').Replace('`', '?').Replace('}', '?')
        .Replace('~', '?').Replace('{', '?').Replace('\\', '?');
    newObj.Surname = oldObj.Surname.Trim().Replace('^', '?').Replace('@', '?').Replace('[', '?')
        .Replace(']', '?').Replace('`', '?').Replace('}', '?')
        .Replace('~', '?').Replace('{', '?').Replace('\\', '?');
    newObj.Address = oldObj.Address.Trim().Replace('^', '?').Replace('@', '?').Replace('[', '?')
        .Replace(']', '?').Replace('`', '?').Replace('}', '?')
        .Replace('~', '?').Replace('{', '?').Replace('\\', '?');
    newObj.Note = oldObj.Note.Trim().Replace('^', '?').Replace('@', '?').Replace('[', '?')
        .Replace(']', '?').Replace('`', '?').Replace('}', '?')
        .Replace('~', '?').Replace('{', '?').Replace('\\', '?');
    /*
    ... some processing ...
    */
}

I have read several posts and articles online and seen many different opinions about this. Some say it would be better to use a Regex with a MatchEvaluator; others say it is best to leave it as is.

While it's possible that it'd be easier for me to just do a benchmark case for myself, I decided to ask a question here in case someone else has been wondering about the same question, or in case someone knows in advance.

So what is the fastest way to do this in C#?

EDIT

I have posted the benchmark here. At first sight it looks like Richard's way might be the fastest. However, neither his way nor Marc's would do anything because of the wrong Regex pattern. After correcting the pattern from

@"\^@\[\]`\}~\{\\"

to

@"\^|@|\[|\]|`|\}|~|\{|\\"

it appears as if the old way with chained .Replace() calls is the fastest after all.
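
To illustrate why the pattern matters, here is a minimal sketch. It assumes the intended patterns escape the regex metacharacters with backslashes; the class name PatternDemo, its members, and the character-class variant are hypothetical names added for illustration and are not part of the posted benchmark. A pattern without alternation matches only the whole nine-character sequence in order, so ordinary input passes through unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the benchmark in this thread.
static class PatternDemo
{
    // No alternation: matches only the literal sequence ^@[]`}~{\ in exactly that order.
    public const string SequencePattern = @"\^@\[\]`\}~\{\\";

    // Alternation: matches any single one of the nine characters.
    public const string AlternationPattern = @"\^|@|\[|\]|`|\}|~|\{|\\";

    // Character class: matches the same nine characters as the alternation.
    public const string ClassPattern = @"[\^@\[\]`}~{\\]";

    // Replaces every match of the given pattern with a question mark.
    public static string Clean(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "?");
    }
}
```

With the input A^@[BCD, Clean returns the string unchanged for SequencePattern and A???BCD for both AlternationPattern and ClassPattern.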


1 Reply


Thanks for your input, everyone. I wrote a quick and dirty benchmark to test your suggestions. It processes 4 strings over 500,000 iterations and runs 4 passes. The results are as follows:

*** Pass 1
Old (Chained String.Replace()) way completed in 814 ms
logicnp (ToCharArray) way completed in 916 ms
oleksii (StringBuilder) way completed in 943 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms
Richard (Regex w/ MatchEvaluator) way completed in 215 ms
Marc Gravell (Static Regex) way completed in 1008 ms

*** Pass 2
Old (Chained String.Replace()) way completed in 786 ms
logicnp (ToCharArray) way completed in 920 ms
oleksii (StringBuilder) way completed in 905 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms
Richard (Regex w/ MatchEvaluator) way completed in 217 ms
Marc Gravell (Static Regex) way completed in 1025 ms

*** Pass 3
Old (Chained String.Replace()) way completed in 775 ms
logicnp (ToCharArray) way completed in 903 ms
oleksii (StringBuilder) way completed in 931 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms
Richard (Regex w/ MatchEvaluator) way completed in 214 ms
Marc Gravell (Static Regex) way completed in 1022 ms

*** Pass 4
Old (Chained String.Replace()) way completed in 799 ms
logicnp (ToCharArray) way completed in 908 ms
oleksii (StringBuilder) way completed in 938 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms
Richard (Regex w/ MatchEvaluator) way completed in 225 ms
Marc Gravell (Static Regex) way completed in 1050 ms

The code for this benchmark is below. Please review the code and confirm that @Richard has got the fastest way. Note that I haven't checked if outputs were correct, I assumed they were.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace StringReplaceTest
{
    class Program
    {
        static string test1 = "A^@[BCD";
        static string test2 = "E]FGH";
        static string test3 = "ijk`l}m";
        static string test4 = "nopq~{r";

        static readonly Dictionary<char, string> repl =
            new Dictionary<char, string> 
            { 
                {'^', "?"}, {'@', "?"}, {'[', "?"}, {']', "?"}, {'`', "?"}, {'}', "?"}, {'~', "?"}, {'{', "?"}, {'\\', "?"} 
            };

        static readonly Regex replaceRegex;

        static Program() // static initializer 
        {
            StringBuilder pattern = new StringBuilder().Append('[');
            foreach (var key in repl.Keys)
                pattern.Append(Regex.Escape(key.ToString()));
            pattern.Append(']');
            replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
        }

        public static string Sanitize(string input)
        {
            return replaceRegex.Replace(input, match =>
            {
                return repl[match.Value[0]];
            });
        } 

        static string DoGeneralReplace(string input) 
        { 
            var sb = new StringBuilder(input);
            return sb.Replace('^', '?').Replace('@', '?').Replace('[', '?').Replace(']', '?').Replace('`', '?').Replace('}', '?').Replace('~', '?').Replace('{', '?').Replace('\\', '?').ToString(); 
        }

        //Method for replacing chars with a mapping 
        static string Replace(string input, IDictionary<char, char> replacementMap)
        {
            return replacementMap.Keys
                .Aggregate(input, (current, oldChar)
                    => current.Replace(oldChar, replacementMap[oldChar]));
        } 

        static void Main(string[] args)
        {
            for (int i = 1; i < 5; i++)
                DoIt(i);
        }

        static void DoIt(int n)
        {
            Stopwatch sw = new Stopwatch();
            int idx = 0;

            Console.WriteLine("*** Pass " + n.ToString());
            // old way
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = test1.Replace('^', '?').Replace('@', '?').Replace('[', '?').Replace(']', '?').Replace('`', '?').Replace('}', '?').Replace('~', '?').Replace('{', '?').Replace('\\', '?');
                string result2 = test2.Replace('^', '?').Replace('@', '?').Replace('[', '?').Replace(']', '?').Replace('`', '?').Replace('}', '?').Replace('~', '?').Replace('{', '?').Replace('\\', '?');
                string result3 = test3.Replace('^', '?').Replace('@', '?').Replace('[', '?').Replace(']', '?').Replace('`', '?').Replace('}', '?').Replace('~', '?').Replace('{', '?').Replace('\\', '?');
                string result4 = test4.Replace('^', '?').Replace('@', '?').Replace('[', '?').Replace(']', '?').Replace('`', '?').Replace('}', '?').Replace('~', '?').Replace('{', '?').Replace('\\', '?');
            }
            sw.Stop();
            Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            Dictionary<char, char> replacements = new Dictionary<char, char>();
            replacements.Add('^', '?');
            replacements.Add('@', '?');
            replacements.Add('[', '?');
            replacements.Add(']', '?');
            replacements.Add('`', '?');
            replacements.Add('}', '?');
            replacements.Add('~', '?');
            replacements.Add('{', '?');
            replacements.Add('\\', '?');

            // logicnp way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                char[] charArray1 = test1.ToCharArray();
                for (int i = 0; i < charArray1.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test1[i], out newChar))
                        charArray1[i] = newChar;
                }
                string result1 = new string(charArray1);

                char[] charArray2 = test2.ToCharArray();
                for (int i = 0; i < charArray2.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test2[i], out newChar))
                        charArray2[i] = newChar;
                }
                string result2 = new string(charArray2);

                char[] charArray3 = test3.ToCharArray();
                for (int i = 0; i < charArray3.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test3[i], out newChar))
                        charArray3[i] = newChar;
                }
                string result3 = new string(charArray3);

                char[] charArray4 = test4.ToCharArray();
                for (int i = 0; i < charArray4.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test4[i], out newChar))
                        charArray4[i] = newChar;
                }
                string result4 = new string(charArray4);
            }
            sw.Stop();
            Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // oleksii way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = DoGeneralReplace(test1);
                string result2 = DoGeneralReplace(test2);
                string result3 = DoGeneralReplace(test3);
                string result4 = DoGeneralReplace(test4);
            }
            sw.Stop();
            Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // André Christoffer Andersen way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Replace(test1, replacements);
                string result2 = Replace(test2, replacements);
                string result3 = Replace(test3, replacements);
                string result4 = Replace(test4, replacements);
            }
            sw.Stop();
            Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Richard way
            sw.Reset();
            sw.Start();
            // Each alternative is a single escaped character, so the regex matches any one of them.
            Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
            MatchEvaluator eval = match =>
            {
                switch (match.Value)
                {
                    case "^": return "?";
                    case "@": return "?";
                    case "[": return "?";
                    case "]": return "?";
                    case "`": return "?";
                    case "}": return "?";
                    case "~": return "?";
                    case "{": return "?";
                    case "\\": return "?";
                    default: throw new Exception("Unexpected match!");
                }
            };
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = reg.Replace(test1, eval);
                string result2 = reg.Replace(test2, eval);
                string result3 = reg.Replace(test3, eval);
                string result4 = reg.Replace(test4, eval);
            }
            sw.Stop();
            Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Marc Gravell way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Sanitize(test1);
                string result2 = Sanitize(test2);
                string result3 = Sanitize(test3);
                string result4 = Sanitize(test4);
            }
            sw.Stop();
            Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
        }
    }
}
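
Since the benchmark does not check whether the outputs are correct, a check along these lines could be run before trusting any timings. This is only a sketch under stated assumptions: the class OutputCheck and its methods are hypothetical names, the character-array and regex variants are simplified stand-ins rather than the exact code posted by the other answerers, and the chained String.Replace() version is treated as the reference result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical correctness check; names are illustrative, not from the thread.
static class OutputCheck
{
    static readonly char[] Targets = { '^', '@', '[', ']', '`', '}', '~', '{', '\\' };
    static readonly Regex ClassRegex = new Regex(@"[\^@\[\]`}~{\\]", RegexOptions.Compiled);

    // Reference implementation: the chained String.Replace() calls from the question.
    public static string Chained(string s)
    {
        return s.Replace('^', '?').Replace('@', '?').Replace('[', '?')
                .Replace(']', '?').Replace('`', '?').Replace('}', '?')
                .Replace('~', '?').Replace('{', '?').Replace('\\', '?');
    }

    // Simplified character-array variant: one pass over a copy of the input.
    public static string CharArray(string s)
    {
        char[] chars = s.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
            if (Array.IndexOf(Targets, chars[i]) >= 0)
                chars[i] = '?';
        return new string(chars);
    }

    // Simplified regex variant using a character class.
    public static string ViaRegex(string s)
    {
        return ClassRegex.Replace(s, "?");
    }

    // Returns every input for which the candidate disagrees with the reference.
    public static List<string> Mismatches(Func<string, string> candidate, IEnumerable<string> inputs)
    {
        var failed = new List<string>();
        foreach (string input in inputs)
            if (candidate(input) != Chained(input))
                failed.Add(input);
        return failed;
    }
}
```

Calling Mismatches with each candidate and the four test strings (plus one containing a backslash, which none of the benchmark inputs cover) should return an empty list. A variant whose pattern never matches would show up here immediately instead of appearing to be the fastest.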

EDIT June 2020
Since this Q&A is still getting hits, I wanted to update it with additional input from user1664043 using StringBuilder w/ IndexOfAny, this time compiled using .NET Core 3.1, and here are the results:

*** Pass 1
Old (Chained String.Replace()) way completed in 199 ms
logicnp (ToCharArray) way completed in 296 ms
oleksii (StringBuilder) way completed in 416 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 870 ms
Richard (Regex w/ MatchEvaluator) way completed in 1722 ms
Marc Gravell (Static Regex) way completed in 395 ms
user1664043 (StringBuilder w/ IndexOfAny) way completed in 459 ms

*** Pass 2
Old (Chained String.Replace()) way completed in 215 ms
logicnp (ToCharArray) way completed in 239 ms
oleksii (StringBuilder) way completed in 341 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 758 ms
Richard (Regex w/ MatchEvaluator) way completed in 1591 ms
Marc Gravell (Static Regex) way completed in 354 ms
user1664043 (StringBuilder w/ IndexOfAny) way completed in 426 ms

*** Pass 3
Old (Chained String.Replace()) way completed in 199 ms
logicnp (ToCharArray) way completed in 265 ms
oleksii (StringBuilder) way completed in 337 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 817 ms
Richard (Regex w/ MatchEvaluator) way completed in 1666 ms
Marc Gravell (Static Regex) way completed in 373 ms
user1664043 (StringBuilder w/ IndexOfAny) way completed in 412 ms

*** Pass 4
Old (Chained String.Replace()) way completed in 199 ms
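
The code for the StringBuilder w/ IndexOfAny entry is not included in this post. As a rough idea of what such an approach might look like, here is a minimal sketch. It is an assumption about the general technique, not user1664043's actual code, and the class name IndexOfAnyReplacer and the method Sanitize are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical sketch of a StringBuilder + IndexOfAny replacer; not the benchmarked code.
static class IndexOfAnyReplacer
{
    static readonly char[] Targets = { '^', '@', '[', ']', '`', '}', '~', '{', '\\' };

    public static string Sanitize(string input)
    {
        int hit = input.IndexOfAny(Targets);
        if (hit < 0)
            return input; // nothing to replace, so return the original string without allocating

        var sb = new StringBuilder(input.Length);
        int start = 0;
        while (hit >= 0)
        {
            sb.Append(input, start, hit - start); // copy the clean run before the hit
            sb.Append('?');                       // substitute the offending character
            start = hit + 1;
            hit = input.IndexOfAny(Targets, start);
        }
        sb.Append(input, start, input.Length - start); // copy whatever follows the last hit
        return sb.ToString();
    }
}
```

The idea is that strings without any of the target characters are returned as is, which may help when most of the imported fields are already clean. With short inputs that nearly all contain hits, like the four test strings, that advantage would not show up in the timings.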
