Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
166 views
in Technique[技术] by (71.8m points)

c# - how does except method work in linq

I have the classes:

class SomeClass
{
   public string Name{get;set;}
   public int SomeInt{get;set;}
}


class SomeComparison: IEqualityComparer<SomeClass>
{
     public bool Equals(SomeClass s, SomeClass d)
     {
         return s.Name == d.Name;
     }

     public int GetHashCode(SomeClass a)
     {
         return (a.Name.GetHashCode() * 251);
     }
}

I also have two large List<SomeClass> called list1 and list2

before I used to have:

 var q = (from a in list1
         from b in list2
         where a.Name != b.Name
         select a).ToList();

and that took about 1 minute to execute. Now I have:

var q =  list1.Except(list2,new SomeComparison()).ToList();

and that takes less than 1 second!

I will like to understand what does the Except method do. Does the method creates a hash table of each list and then perform the same comparison? If I will be performing a lot of this comparisons should I create a Hashtable instead?


EDIT

Now instead of having lists I have two HashSet<SomeClass> called hashSet1 and hashSet2

when I do:

   var q = (from a in hashSet1
           form b in hashSet2
           where a.Name != b.Name
           select a).ToList();

that still takes a long time... What am I doing wrong?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your guess was close - the Linq to Objects Except extension method uses a HashSet<T> internally for the second sequence passed in - that allows it to look up elements in O(1) while iterating over the first sequence to filter out elements that are contained in the second sequence, hence the overall effort is O(n+m) where n and m are the length of the input sequences - this is the best you can hope to do since you have to look at each element at least once.

For a review of how this might be implemented I recommend Jon Skeet's EduLinq series, here part of it's implementation of Except and the link to the full chapter:

private static IEnumerable<TSource> ExceptImpl<TSource>(
    IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer)
{
    HashSet<TSource> bannedElements = new HashSet<TSource>(second, comparer);
    foreach (TSource item in first)
    {
        if (bannedElements.Add(item))
        {
            yield return item;
        }
    }
}

Your first implementation on the other hand will compare each element in the first list to each element in the second list - it is performing a cross product. This will require nm operations so it will run in O(nm) - when n and m become large this becomes prohibitively slow very fast. (Also this solution is wrong as is since it will create duplicate elements).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...