Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c# - Split using delimiter except when delimiter is escaped

I'm reading clipboard data coming from Excel using

var stream = (System.IO.Stream) ( Forms.Clipboard.GetDataObject() ).GetData( Forms.DataFormats.CommaSeparatedValue );

but unfortunately, Excel passes the cell text rather than the cell values. When the cells use special formatting (such as the thousands separator), the clipboard data for a row of cells that looks like this:

 1,234,123.00    2,345.00    342.00      12,345.00

is stored as this:

" 1,234,123.00 "," 2,345.00 ", 342.00 ," 12,345.00 "

when what I really want is this:

 1234123.00, 2345.00, 342.00, 12345.00

I had previously been using clipData.Split(new string[] { "," }, StringSplitOptions.None) to turn my CSV clipboard data into a series of cells, but this fails when a quoted, formatted cell contains commas.
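To make the failure concrete, here is a minimal sketch of the naive approach. The class and method names (NaiveSplitDemo, NaiveSplit) are hypothetical and exist only for illustration; the Split call itself is the standard string.Split overload.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Splits on every comma, with no awareness of quoted fields.
    // Hypothetical helper, shown only to illustrate the problem.
    public static string[] NaiveSplit(string clipData)
    {
        return clipData.Split(new string[] { "," }, StringSplitOptions.None);
    }
}
```

Applied to the sample row above, this yields eight fragments rather than four cells, because each thousands separator is treated as a cell boundary.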


Can anyone think of a way to split this string into a set of cells, ignoring the commas inside the double-quoted sections, since that is how Excel escapes cells containing commas?

In short, how can I turn a single string containing this:

" 1,234,123.00 "," 2,345.00 ", 342.00 ," 12,345.00 "

into an array of strings containing this:

{ "1,234,123.00", "2,345.00", "342.00", "12,345.00" }

All without ruining my ability to parse a simple comma-delimited string.

Edit:

Follow-up question (formulated as a DFA) here: Split a string based on each time a Deterministic Finite Automata reaches a final state?



1 Reply


First off, I've dealt with data from Excel before. What you typically see is comma-separated values: if a value is considered to be a string, it will have double quotes around it (and can contain commas and double quotes). If it is considered to be numeric, there are no double quotes. Additionally, if the data contains a double quote, it is escaped by doubling it, like "". Assuming all of that, here's how I've dealt with this in the past:

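using System.Collections.Generic;

public static class ExcelRowExtensions
{
    public static IEnumerable<string> SplitExcelRow(this string value)
    {
        // Hide escaped quotes ("") behind a placeholder so they do not
        // toggle the quoted state below.
        value = value.Replace("\"\"", "&quot;");
        bool quoted = false;
        int currStartIndex = 0;
        for (int i = 0; i < value.Length; i++)
        {
            char currChar = value[i];
            if (currChar == '"')
            {
                // Entering or leaving a quoted field.
                quoted = !quoted;
            }
            else if (currChar == ',')
            {
                // Only a comma outside quotes ends a cell.
                if (!quoted)
                {
                    // Trim, drop the enclosing quotes, then restore escaped quotes.
                    yield return value.Substring(currStartIndex, i - currStartIndex)
                        .Trim()
                        .Replace("\"", "")
                        .Replace("&quot;", "\"");
                    currStartIndex = i + 1;
                }
            }
        }
        // The last cell has no trailing comma.
        yield return value.Substring(currStartIndex, value.Length - currStartIndex)
            .Trim()
            .Replace("\"", "")
            .Replace("&quot;", "\"");
    }
}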

Of course, this assumes the incoming data is valid, so malformed input such as "fo,o"b,ar","bar""foo" will not work. Additionally, if your data contains the literal text &quot;, it will be turned into a ", which may or may not be desirable.
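If the &quot; substitution is a concern, one possible variation is to handle the doubled quote inside the loop instead of using a placeholder. The following is only a sketch under the same assumptions about Excel's output described above, not a full CSV parser. The names QuotedCsvSketch, SplitRow and ParseCell are hypothetical, and ParseCell assumes the numbers use "," for thousands and "." for decimals (the invariant culture), which may not match your locale.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class QuotedCsvSketch
{
    // Splits one CSV row, treating commas inside double quotes as data
    // and a doubled quote ("") inside a quoted field as a literal quote.
    public static List<string> SplitRow(string row)
    {
        var cells = new List<string>();
        var current = new StringBuilder();
        bool quoted = false;

        for (int i = 0; i < row.Length; i++)
        {
            char c = row[i];
            if (c == '"')
            {
                if (quoted && i + 1 < row.Length && row[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    quoted = !quoted;    // opening or closing quote, not data
                }
            }
            else if (c == ',' && !quoted)
            {
                // Unquoted comma: the current cell is complete.
                cells.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        cells.Add(current.ToString().Trim()); // last cell has no trailing comma
        return cells;
    }

    // Converts a cell such as "1,234,123.00" to a decimal.
    // Assumption: invariant culture ("," thousands, "." decimal point).
    public static decimal ParseCell(string cell)
    {
        return decimal.Parse(cell, NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}
```

Calling QuotedCsvSketch.SplitRow on the sample row from the question should give the four cells asked for, and ParseCell can then turn each one into a numeric value. Note that Trim here runs after the quotes are removed, so padding inside a quoted cell is also discarded; if that whitespace matters in your data, drop the Trim calls.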

