For the purposes of the discussion below, I'll assume that you have some way of reading the CSV files into a class. I'll call that class MyRecord
.
Load the files into separate lists, call them NewList
and OldList
:
List<MyRecord> NewList = LoadFile("newFilename");
List<MyRecord> OldList = LoadFile("oldFilename");
There's perhaps a more elegant way to do this with LINQ, but the idea is to do a straight merge. First you have to sort the two lists. Either your MyRecord
class implements IComparable
, or you supply your own comparison delegate:
NewList.Sort(/* delegate here */);
OldList.Sort(/* delegate here */);
You can skip the delegate if MyRecord
implements IComparable
.
Now it's a straight merge.
int ixNew = 0;
int ixOld = 0;
while (ixNew < NewList.Count && ixOld < OldList.Count)
{
// Again with the comparison delegate.
// I'll assume that MyRecord implements IComparable
int cmpRslt = OldList[ixOld].CompareTo(NewList[ixNew]);
if (cmpRslt == 0)
{
// records have the same customer id.
// compare for changes.
++ixNew;
++ixOld;
}
else if (cmpRslt < 0)
{
// this old record is not in the new file. It's been deleted.
++ixOld;
}
else
{
// this new record is not in the old file. It was added.
++ixNew;
}
}
// At this point, one of the lists might still have items.
while (ixNew < NewList.Count)
{
// NewList[ixNew] is an added record
++ixNew;
}
while (ixOld < OldList.Count)
{
// OldList[ixOld] is a deleted record
}
With just 120,000 records, that should execute very quickly. I would be very surprised if doing the merge took as long as loading the data from disk.
EDIT: A LINQ solution
I pondered how one would do this with LINQ. I can't do exactly the same thing as the merge above, but I can get the added, removed, and changed items in separate collections.
For this to work, MyRecord
will have to implement IEquatable<MyRecord>
and also override GetHashCode
.
var AddedItems = NewList.Except(OldList);
var RemovedItems = OldList.Except(NewList);
var OldListLookup = OldList.ToLookup(t => t.Id);
var ItemsInBothLists =
from newThing in NewList
let oldThing = OldListLookup[newThing.Id].FirstOrDefault()
where oldThing != null
select new { oldThing = oldThing, newThing = newThing };
In the above, I assume that MyRecord
has an Id
property that is unique.
If you want just the changed items instead of all the items that are in both lists:
var ChangedItems =
from newThing in NewList
let oldThing = OldListLookup[newThing.Id].FirstOrDefault()
where oldThing != null && CompareItems(oldThing, newThing) != 0
select new { oldThing = oldThing, newThing = newThing };
The assumption is that the CompareItems
method will do a deep comparison of the two items and return 0 if they compare equal or non-zero if something has changed.