Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c++ - Implementation of a "hits in last [second/minute/hour]" data structure

I think this is a fairly common question, but I can't seem to find an answer by googling around (maybe there's a more precise name for the problem that I don't know?).

You need to implement a structure with a hit() method, used to report a hit, and hitsInLastSecond/hitsInLastMinute/hitsInLastHour methods. You have a timer with, say, nanosecond accuracy. How do you implement this efficiently?

My thought was something like this (in pseudo-C++):

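#include <chrono>
#include <map>

class HitCounter {
  using Clock = std::chrono::steady_clock;

public:
  void hit() {
    // Store the number of hits *before* this one. emplace() leaves an existing
    // entry alone, so two hits with the same timestamp are still both counted.
    hits_at.emplace(Clock::now(), last_count++);
  }

  int hitsInLastSecond() {
    // Oldest hit whose timestamp is >= now - 1s.
    auto first = hits_at.lower_bound(Clock::now() - std::chrono::seconds(1));
    if (first == hits_at.end()) { return 0; }  // no hits inside the window
    return last_count - first->second;
  }

  // etc for Minute, Hour

private:
  std::map<Clock::time_point, int> hits_at;
  int last_count = 0;
};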

Does this work? Is it good? Is there something better?

Update: Added pruning and switched to a deque as per comments:

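#include <algorithm>
#include <chrono>
#include <deque>
#include <utility>

class HitCounter {
  using Clock = std::chrono::steady_clock;
  using Entry = std::pair<Clock::time_point, int>;  // (time of hit, hits before it)

public:
  void hit() {
    // Hits arrive in time order, so the deque stays sorted by timestamp.
    hits.emplace_back(Clock::now(), last_count++);
  }

  int hitsInLastSecond() {
    auto first = firstAtOrAfter(Clock::now() - std::chrono::seconds(1));
    if (first == hits.end()) { return 0; }  // no hits inside the window
    return last_count - first->second;
  }

  // etc for Minute, Hour

  void prune() {
    // Drop everything older than the longest window. If old == end(), every
    // entry is stale and all of them must go, so erase unconditionally.
    auto old = firstAtOrAfter(Clock::now() - std::chrono::hours(1));
    hits.erase(hits.begin(), old);
  }

private:
  // Oldest entry whose time is >= t (binary search on the sorted deque).
  std::deque<Entry>::iterator firstAtOrAfter(Clock::time_point t) {
    return std::lower_bound(hits.begin(), hits.end(), t,
        [](const Entry& e, Clock::time_point tp) { return e.first < tp; });
  }

  std::deque<Entry> hits;
  int last_count = 0;
};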
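Because hits are appended in time order, the stale entries in the deque version are always at the front, so pruning can also be done by popping from the front every time a hit is recorded, instead of a separate binary search and erase. A minimal sketch of that variant; the class name SlidingHitCounter, the maxWindow constructor argument and the explicit time parameters (used instead of calling now() internally, so the logic can be tested) are all hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical variant of the deque-based counter: stores timestamps only,
// and pops stale entries from the front whenever a hit is recorded.
class SlidingHitCounter {
public:
    using Clock = std::chrono::steady_clock;

    // maxWindow is the longest window you will ever query (e.g. one hour).
    explicit SlidingHitCounter(Clock::duration maxWindow) : m_maxWindow(maxWindow) {}

    // Record one hit at time `now`; times must be passed in non-decreasing order.
    void hit(Clock::time_point now) {
        assert(m_hits.empty() || m_hits.back() <= now);
        m_hits.push_back(now);
        // Anything older than the longest window can never be counted again.
        // The loop always stops at the hit just pushed, so the deque never empties.
        while (m_hits.front() < now - m_maxWindow) {
            m_hits.pop_front();
        }
    }

    // Exact number of hits with a timestamp in [now - window, now],
    // provided window <= maxWindow.
    std::size_t hitsInLast(Clock::duration window, Clock::time_point now) const {
        auto first = std::lower_bound(m_hits.begin(), m_hits.end(), now - window);
        auto last = std::upper_bound(first, m_hits.end(), now);
        return static_cast<std::size_t>(last - first);
    }

private:
    Clock::duration m_maxWindow;
    std::deque<Clock::time_point> m_hits;  // sorted, oldest first
};
```

Each timestamp is pushed and popped at most once, so pruning is amortized O(1) per hit and a query is O(log n), but memory still grows with the number of hits inside the longest window.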

1 Reply

What you are describing is called a histogram.

Storing one entry per hit in a map or hash keyed on nanosecond timestamps will eat up much of your CPU. You probably want a ring buffer for storing the data.

Use std::chrono to get whatever timing precision you require, but frankly, hits per second seems to be the finest granularity you need, and if you are looking at the overall big picture, the exact timer precision won't matter much.
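If per-second granularity is enough, a nanosecond time point from std::chrono only has to be reduced to a whole-second bucket number. A minimal sketch; the helper name secondBucket and the idea of counting from a fixed start time are assumptions for illustration:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: map a steady_clock time point to a whole-second bucket
// number, counted from an arbitrary start time. duration_cast truncates toward
// zero, which for non-negative elapsed times is the same as rounding down.
std::int64_t secondBucket(std::chrono::steady_clock::time_point start,
                          std::chrono::steady_clock::time_point t) {
    assert(t >= start);
    return std::chrono::duration_cast<std::chrono::seconds>(t - start).count();
}
```

The bucket number modulo the ring size picks a ring-buffer slot, and the difference between two bucket numbers tells you how many slots to advance.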

This is a partial, introductory sample of how you might go about it:

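#include <array>
#include <algorithm>
#include <cstddef>

template<std::size_t RingSize>
class Histogram
{
    std::array<std::size_t, RingSize> m_ringBuffer;  // hit count per interval
    std::size_t m_total;                             // sum of all RingSize slots
    std::size_t m_position;                          // slot of the current interval
public:
    Histogram() : m_total(0), m_position(0)
    {
        std::fill_n(m_ringBuffer.begin(), RingSize, 0);
    }

    // Record one hit in the current interval.
    void addHit()
    {
        ++m_ringBuffer[m_position];
        ++m_total;
    }

    // Start the next interval: move to the next slot and drop the count it
    // still holds from RingSize intervals ago.
    void incrementPosition()
    {
        if (++m_position >= RingSize)
            m_position = 0;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    // Mean hits per interval over the whole ring.
    double runningAverage() const
    {
        return static_cast<double>(m_total) / static_cast<double>(RingSize);
    }

    std::size_t runningTotal() const { return m_total; }
};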

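Histogram<60> secondsHisto;  // last 60 seconds, one slot per second
Histogram<60> minutesHisto;  // last 60 minutes, one slot per minute
Histogram<24> hoursHisto;    // last 24 hours, one slot per hour
Histogram<7> weeksHisto;     // last week, one slot per day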

This is a naive implementation which assumes you call incrementPosition() once every second, and that every RingSize intervals you carry runningTotal() from one histogram into the next (so every 60 s, add secondsHisto.runningTotal() to the current slot of minutesHisto).
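The carry from one histogram to the next can be sketched as follows. The answer's Histogram can only add one hit at a time, so this sketch assumes a hypothetical addHits(n) and a position() accessor; the Cascade struct and its tickOneSecond() driver are likewise hypothetical. One detail matters: the carry has to happen while a histogram is on its last slot, when its total covers exactly the RingSize intervals just completed, and before incrementPosition() discards the oldest of them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t RingSize>
class Histogram {
    std::array<std::size_t, RingSize> m_ringBuffer{};  // all slots start at 0
    std::size_t m_total = 0;
    std::size_t m_position = 0;
public:
    // Hypothetical: add n hits to the current slot at once.
    void addHits(std::size_t n) {
        m_ringBuffer[m_position] += n;
        m_total += n;
    }
    void addHit() { addHits(1); }

    void incrementPosition() {
        if (++m_position >= RingSize)
            m_position = 0;
        assert(m_total >= m_ringBuffer[m_position]);  // total includes every slot
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }

    // Hypothetical accessor, used to detect the last slot before a wrap.
    std::size_t position() const { return m_position; }
    std::size_t runningTotal() const { return m_total; }
};

// Hypothetical driver, called once at the end of every second.
struct Cascade {
    Histogram<60> secondsHisto;
    Histogram<60> minutesHisto;
    Histogram<24> hoursHisto;

    void tickOneSecond() {
        // On its last slot, secondsHisto's total is exactly the minute just ended.
        if (secondsHisto.position() == 59) {
            minutesHisto.addHits(secondsHisto.runningTotal());
            // Likewise, on its last slot minutesHisto now holds exactly one hour.
            if (minutesHisto.position() == 59) {
                hoursHisto.addHits(minutesHisto.runningTotal());
                hoursHisto.incrementPosition();
            }
            minutesHisto.incrementPosition();
        }
        secondsHisto.incrementPosition();
    }
};
```

With one hit per second for two simulated hours, hoursHisto.runningTotal() ends at 7200, while minutesHisto and secondsHisto report 3540 and 59, because each has just cleared its slot for the new interval. A weeksHisto would be fed from hoursHisto the same way.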

Hopefully this is a useful place to start.

If you want to keep a longer history of hits per second, you can do that with this model: increase the ring size and add a second total that tracks only the most recent entries, so that m_subTotal = sum(m_ringBuffer[m_position - N .. m_position]) (indices taken modulo RingSize), maintained the same way as m_total. For example, with N = 10:

std::size_t m_10sTotal = 0;  // current slot plus the 10 before it; needs RingSize > 10

...

void addHit()
{
    ++m_ringBuffer[m_position];
    ++m_total;
    ++m_10sTotal;
}

void incrementPosition()
{
    // drop the slot that is about to fall more than 10 intervals behind.
    m_10sTotal -= m_ringBuffer[(m_position + RingSize - 10) % RingSize];
    // for the naive total, do the subtraction after we
    // advance position, since it will coincide with the
    // location of the value RingSize ago.
    if (++m_position >= RingSize)
        m_position = 0;
    m_total -= m_ringBuffer[m_position];
    m_ringBuffer[m_position] = 0;  // reuse the slot for the new interval
}
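Assembled into one class, the sub-total idea looks roughly like this. WindowedHistogram and subTotal() are hypothetical names, and N is made a template parameter so a static_assert can check that the sub-window fits inside the ring; as in the fragment, the sub-total covers the current slot plus the N slots before it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical assembled version: a total over the whole ring plus a
// sub-total over the current slot and the N slots before it.
template <std::size_t RingSize, std::size_t N>
class WindowedHistogram {
    static_assert(N < RingSize, "the sub-window must fit inside the ring");
    std::array<std::size_t, RingSize> m_ringBuffer{};  // all slots start at 0
    std::size_t m_total = 0;
    std::size_t m_subTotal = 0;
    std::size_t m_position = 0;
public:
    void addHit() {
        ++m_ringBuffer[m_position];
        ++m_total;
        ++m_subTotal;
    }

    void incrementPosition() {
        // The slot N positions back is about to fall out of the sub-window.
        m_subTotal -= m_ringBuffer[(m_position + RingSize - N) % RingSize];
        if (++m_position >= RingSize)
            m_position = 0;
        // The slot we move onto still holds data from RingSize intervals ago.
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
        assert(m_subTotal <= m_total);  // the sub-window is part of the ring
    }

    std::size_t runningTotal() const { return m_total; }
    std::size_t subTotal() const { return m_subTotal; }
};
```

Both totals are updated in O(1) per hit and per tick; only the ring itself grows with the history length.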

You don't have to make the histograms these sizes; this is simply a naive model that scrapes each total into the next histogram. There are various alternatives, such as incrementing every histogram at the same time:

secondsHisto.addHit();
minutesHisto.addHit();
hoursHisto.addHit();
weeksHisto.addHit();

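Each one rolls over independently (secondsHisto every second, minutesHisto every minute, and so on), so all of them always hold current values. Size each histogram according to how far back you want data at that granularity to go.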
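A sketch of this simultaneous variant, assuming time is supplied as whole seconds since some start point (for example obtained with std::chrono::duration_cast). MultiHistogram, advanceTo() and the lastMinute/lastHour/lastDay accessors are hypothetical names; every histogram sees every hit and calls incrementPosition() once per interval of its own length.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <std::size_t RingSize>
class Histogram {
    std::array<std::size_t, RingSize> m_ringBuffer{};  // all slots start at 0
    std::size_t m_total = 0;
    std::size_t m_position = 0;
public:
    void addHit() { ++m_ringBuffer[m_position]; ++m_total; }
    void incrementPosition() {
        if (++m_position >= RingSize)
            m_position = 0;
        m_total -= m_ringBuffer[m_position];
        m_ringBuffer[m_position] = 0;
    }
    std::size_t runningTotal() const { return m_total; }
};

// Hypothetical driver: each histogram gets every hit and rolls over
// at the end of each interval of its own length.
class MultiHistogram {
    Histogram<60> m_secondsHisto;  // one slot per second
    Histogram<60> m_minutesHisto;  // one slot per minute
    Histogram<24> m_hoursHisto;    // one slot per hour
    std::int64_t m_now = 0;        // current time in whole seconds
public:
    // Move the clock forward one second at a time, rolling each histogram
    // over whenever one of its intervals ends.
    void advanceTo(std::int64_t nowSeconds) {
        assert(nowSeconds >= m_now);
        for (; m_now < nowSeconds; ++m_now) {
            std::int64_t next = m_now + 1;
            m_secondsHisto.incrementPosition();
            if (next % 60 == 0) m_minutesHisto.incrementPosition();
            if (next % 3600 == 0) m_hoursHisto.incrementPosition();
        }
    }

    void hit(std::int64_t nowSeconds) {
        advanceTo(nowSeconds);
        m_secondsHisto.addHit();
        m_minutesHisto.addHit();
        m_hoursHisto.addHit();
    }

    std::size_t lastMinute() const { return m_secondsHisto.runningTotal(); }
    std::size_t lastHour() const { return m_minutesHisto.runningTotal(); }
    std::size_t lastDay() const { return m_hoursHisto.runningTotal(); }
};
```

Because each histogram's current slot is still filling, lastMinute() covers between 59 and 60 seconds, and likewise for the coarser ones. advanceTo() steps one second at a time, which keeps the sketch simple but costs time proportional to a long idle gap.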


...