Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

C#: how to replace characters in an array quickly

I am using an XmlTextReader on an XML file that may contain characters that are invalid for the reader. My initial idea was to write my own version of the StreamReader that cleans out the bad characters, but it is severely slowing down my program.

using System.IO;
using System.Linq; // provides the Contains extension method used on the array

public class ClensingStream : StreamReader
{
    private static char[] badChars = { '\x00', '\x09', '\x0A', '\x10' };
    //snip (constructors)

    public override int Read(char[] buffer, int index, int count)
    {
        var tmp = base.Read(buffer, index, count);

        // Only the tmp characters starting at index were filled by this call.
        for (int i = index; i < index + tmp; ++i)
        {
            // Check whether the element in the buffer is one of the bad characters.
            if (badChars.Contains(buffer[i]))
                buffer[i] = ' ';
        }

        return tmp;
    }
}

According to my profiler, the code spends 88% of its time in if (badChars.Contains(buffer[i])). What is the correct way to do this without causing such slowness?

1 Reply

It spends so much time on that line because Contains has to loop through the array looking for the character, and it does that once for every character in the buffer.
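
To make the cost concrete, here is a rough sketch of the linear scan that an array membership test amounts to. LinearSearch.ArrayContains is a hypothetical helper for illustration; it is a simplification, not the actual Enumerable.Contains source, which reaches its search through extra layers (the extension method and an interface call on the array).

```csharp
// Hypothetical helper: a simplified picture of what checking membership
// in a char[] costs. Not the real Enumerable.Contains implementation.
public static class LinearSearch
{
    public static bool ArrayContains(char[] items, char c)
    {
        // Compare c against each element in turn until a match is found.
        foreach (char item in items)
        {
            if (item == c)
                return true; // match found after scanning part of the array
        }
        return false; // every element was checked without a match
    }
}
```

With only four bad characters each scan is short, but it runs once per character read, so the overhead adds up across a large file.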

Put the characters in a HashSet<char> instead:

private static readonly HashSet<char> badChars =
  new HashSet<char>(new char[] { '\x00', '\x09', '\x0A', '\x10' });

The check, badChars.Contains(buffer[i]), looks the same as it did with the array, but the set finds the character through its hash code instead of looping through every item.
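
A minimal sketch of the whole reader with the HashSet, assuming a constructor that takes a Stream (the question elides its constructors with //snip) and scanning only the characters that base.Read actually filled in:

```csharp
using System.Collections.Generic;
using System.IO;

public class ClensingStream : StreamReader
{
    // Hash-based lookup: each Contains call is a hash probe, not an array scan.
    private static readonly HashSet<char> badChars =
        new HashSet<char>(new[] { '\x00', '\x09', '\x0A', '\x10' });

    // Assumed constructor; the original question omits its constructors.
    public ClensingStream(Stream stream) : base(stream) { }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);

        // Only the 'read' characters starting at 'index' are new in this call.
        for (int i = index; i < index + read; ++i)
        {
            if (badChars.Contains(buffer[i]))
                buffer[i] = ' ';
        }

        return read;
    }
}
```

Because only the newly read range is scanned, the work per call is proportional to the number of characters read rather than to the size of the caller's buffer.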

Alternatively, you could put the characters in a switch statement, which lets the compiler generate an efficient comparison:

switch (buffer[i]) {
  case '\x00':
  case '\x09':
  case '\x0A':
  case '\x10': buffer[i] = ' '; break;
}

With more cases, the compiler can turn a switch on a char into a jump table or a binary search over the case values instead of a chain of comparisons, so it stays fast as the list of bad characters grows, much like the HashSet.
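
For comparison, here is a sketch of the same Read override built around the switch. The class name SwitchCleansingReader and its Stream constructor are hypothetical, chosen for illustration:

```csharp
using System.IO;

// Hypothetical reader that replaces bad characters using a switch.
public class SwitchCleansingReader : StreamReader
{
    public SwitchCleansingReader(Stream stream) : base(stream) { }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);

        // Scan only the characters filled in by this call.
        for (int i = index; i < index + read; ++i)
        {
            // The compiler chooses how to lower this switch
            // (for example a jump table or a few comparisons).
            switch (buffer[i])
            {
                case '\x00':
                case '\x09':
                case '\x0A':
                case '\x10':
                    buffer[i] = ' ';
                    break;
            }
        }

        return read;
    }
}
```

The switch avoids the method call and hashing of the HashSet, at the cost of hard-coding the character list into the method.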



...