Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
353 views
in Technique[技术] by (71.8m points)

c# HashSet init takes too long

I'm dealing with the fact that I need to init a HashSet with a set of elements but without any kind of comparation class.

After the init, any element added to the HashSet need to be passed with a comparator.

How can I accomplish it?

Now I have this:

HashSet<Keyword> set = new HashSet<Keyword>(new KeyWordComparer());

The problem is that the init takes to long and there's no necessity in applying the comparation.

KeywordComparer Class:

class KeyWordComparer : EqualityComparer<Keyword>
{
    public override bool Equals(Keyword k1, Keyword k2)
    {
        int equals = 0;
        int i = 0;
        int j = 0;

        // based on sorted ids
        while (i < k1._lista_modelos.Count && j < k2._lista_modelos.Count)
        {
            if (k1._lista_modelos[i] < k2._lista_modelos[j])
            {
                i++;
            }
            else if (k1._lista_modelos[i] > k2._lista_modelos[j])
            {
                j++;
            }
            else
            {
                equals++;
                i++;
                j++;
            }
         }

         return equals >= 8;
    }
    public override int GetHashCode(Keyword keyword)
    {
        return 0;//notice that using the same hash for all keywords gives you an O(n^2) time complexity though.
    }
}

Note: This is a follow-up question to c# comparing list of IDs.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

every keyword has 20 IDs, so when I want to add a new Keyword to the HashSet, the KeywordComparer check that the new one does not have more than 8 ID's repeated with any keyword of the HashSet.In such case, new keyword is not included, if not, it's included.

Collecting these keywords is not a job for a hash set here. A hash set is generally not suited for items which depend on other elements of the set. You should only use it for things where a useful hash can be calculated for every item. Since it depends on the existing set of items whether a new item gets added to your set, this is totally the wrong tool.

Here’s an attempt to solve this problem according to your short description of what you actually want to do. Here, we are simply collecting the keywords in a list. In order to verify that they may be added, we use an addition hash set to collect the ids of the keywords. That way, we can quickly check for a new item, whether 8 or more of its ids are already contained within the list of keywords.

var keywords = new List<Keyword>();
var selectedIds = new HashSet<int>(); // I’m assuming that the ids are ints here

foreach (var keyword in GetListOfAllKeywords())
{
    // count the number of keyword ids that are already in the selectedIds set
    var duplicateIdCount = keyword.Ids.Count(id => selectedIds.Contains(id));

    if (duplicateIdCount <= 8)
    {
        // less or equal to 8 ids are already selected, so add this keyword
        keywords.Add(keyword);

        // and collect all the keyword’s ids
        selectedIds.AddRange(keyword.Ids);
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...