I would read the whole shared string table into your dictionary in one go rather than looking up each value as required. This lets you move through the file in order and stash the values ready for a hashed lookup, which will be more efficient than scanning the SST for each value you require.
Running something like the following at the start of your process will allow you to access each value using dictionary[parsedValue].
private static void LoadDictionary()
{
    int i = 0;
    // Walk the shared string table once and cache each entry against its index.
    foreach (var ss in sharedStringTablePart.SharedStringTable.ChildElements)
    {
        dictionary.Add(i++, ss.InnerText);
    }
}
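Both methods here assume a class-level dictionary and a reference to the shared string table part. As a rough sketch (the field names are just placeholders, and there is more than one way to get at the part), something like this would sit alongside them, with using directives for System.Collections.Generic and DocumentFormat.OpenXml.Packaging:
private static Dictionary<int, string> dictionary = new Dictionary<int, string>();
private static SharedStringTablePart sharedStringTablePart;

private static void LoadSharedStringPart(SpreadsheetDocument document)
{
    // Grab the shared string table part once; the document is assumed to stay
    // open for as long as the dictionary-loading methods and lookups are used.
    sharedStringTablePart = document.WorkbookPart.SharedStringTablePart;
}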
If your file is very large, you might see further gains by using a SAX approach to read the shared string table rather than the DOM approach above:
private static void LoadDictionarySax()
{
    using (OpenXmlReader reader = OpenXmlReader.Create(sharedStringTablePart))
    {
        int i = 0;
        while (reader.Read())
        {
            if (reader.ElementType == typeof(SharedStringItem))
            {
                // Load only the current item rather than the whole table into memory.
                SharedStringItem ssi = (SharedStringItem)reader.LoadCurrentElement();
                dictionary.Add(i++, ssi.Text != null ? ssi.Text.Text : string.Empty);
            }
        }
    }
}
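For completeness, here is a hedged sketch of the lookup side once the dictionary is populated. GetCellText is a hypothetical helper, not part of your code or the SDK; it resolves shared-string cells through the dictionary instead of scanning the SST (Cell and CellValues come from DocumentFormat.OpenXml.Spreadsheet):
private static string GetCellText(Cell cell)
{
    string value = cell.CellValue == null ? string.Empty : cell.CellValue.Text;

    // Shared string cells store an index into the SST rather than the text itself,
    // so resolve the index against the dictionary built by either method above.
    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
    {
        return dictionary[int.Parse(value)];
    }

    return value;
}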
On my machine, using a file with 60,000 rows and 2 columns, it was around 300 times quicker to use the LoadDictionary method above instead of the GetValue method from your question. The LoadDictionarySax method gave similar performance, but on a larger file (100,000 rows with 10 columns) the SAX approach was around 25% faster than the LoadDictionary method. On an even larger file (100,000 rows, 26 columns), the LoadDictionary method threw an out of memory exception, while LoadDictionarySax worked without issue.