There is a big difference between these two approaches:
List<int> Result1 = new HashSet<int>(myList).ToList(); //3700 ticks
List<int> Result2 = myList.Distinct().ToList(); //4700 ticks
The first one can (and probably will) change the order of the elements in the returned List<>: the elements of Result1 won't necessarily be in the same order as those of myList. The second maintains the original ordering.
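A minimal sketch of the ordering difference, assuming a small sample list. The `OrderDemo` class and its method names are made up for illustration. Since the enumeration order of a `HashSet<int>` is not documented, the example only states the expected output for the `Distinct()` version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderDemo
{
    // Each value is yielded at its first occurrence, so the source order is kept.
    public static List<int> ViaDistinct(List<int> source) => source.Distinct().ToList();

    // Enumeration order of HashSet<T> is not documented; do not rely on it.
    public static List<int> ViaHashSet(List<int> source) => new HashSet<int>(source).ToList();

    public static void Main()
    {
        var myList = new List<int> { 3, 1, 3, 2, 1 };

        Console.WriteLine(string.Join(", ", ViaDistinct(myList))); // 3, 1, 2
        Console.WriteLine(string.Join(", ", ViaHashSet(myList)));  // same values, order unspecified
    }
}
```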
There is probably no faster way than the first one.
There is probably no "more correct" way (for a certain definition of "correct" based on ordering) than the second one.
(the third one is similar to the second one, only slower)
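The tick counts above come from one particular run, so they will differ on other machines and inputs. A rough way to repeat the comparison, assuming a random list of 100,000 integers and a hypothetical `Measure` helper built on `Stopwatch` (a single-run sketch, not a rigorous benchmark):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class DistinctTiming
{
    // Hypothetical helper: runs the action once and returns the elapsed Stopwatch ticks.
    public static long Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // Assumed test data: 100,000 values drawn from 0..999, so duplicates are frequent.
        var rng = new Random(42);
        var myList = Enumerable.Range(0, 100_000).Select(_ => rng.Next(1_000)).ToList();

        // Warm-up so JIT compilation is not included in the timings.
        new HashSet<int>(myList).ToList();
        myList.Distinct().ToList();

        long hashSetTicks = Measure(() => new HashSet<int>(myList).ToList());
        long distinctTicks = Measure(() => myList.Distinct().ToList());

        Console.WriteLine($"HashSet:  {hashSetTicks} ticks");
        Console.WriteLine($"Distinct: {distinctTicks} ticks");
    }
}
```

For numbers worth quoting, repeat each measurement many times and compare averages or medians.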
Just out of curiosity, the implementation of Distinct() is:
// Reference source http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,712
public static IEnumerable<TSource> Distinct<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return DistinctIterator<TSource>(source, null);
}

// Reference source http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,722
static IEnumerable<TSource> DistinctIterator<TSource>(IEnumerable<TSource> source, IEqualityComparer<TSource> comparer) {
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource element in source)
        if (set.Add(element)) yield return element;
}
So in the end Distinct() simply uses an internal implementation of a HashSet<> (called Set<>) to check for the uniqueness of items.
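Since Set<> is internal, the same pattern can be sketched with the public `HashSet<T>`. This is an illustration of the idea under that assumption, not the framework's code, and the name `DistinctInOrder` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctSketch
{
    // Hypothetical name; mirrors the shape of DistinctIterator using the public HashSet<T>.
    public static IEnumerable<T> DistinctInOrder<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, comparer);
    }

    private static IEnumerable<T> Iterator<T>(IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        // A null comparer makes HashSet<T> fall back to EqualityComparer<T>.Default.
        var seen = new HashSet<T>(comparer);
        foreach (T element in source)
        {
            // Add returns false when the element is already present, so only first occurrences are yielded.
            if (seen.Add(element)) yield return element;
        }
    }

    public static void Main()
    {
        var values = new[] { 3, 1, 3, 2, 1 };
        Console.WriteLine(string.Join(", ", values.DistinctInOrder())); // 3, 1, 2
    }
}
```

Because each element is yielded the first time it is seen, the output order follows the input order, which is the behaviour described for the second approach.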
For completeness' sake, I'll add a link to the question Does C# Distinct() method keep original ordering of sequence intact?