Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
342 views
in Technique[技术] by (71.8m points)

python - Deciding and implementing a trending algorithm in Django

I have a Django application in which I need to implement a simple trending/ranking algorithm. I'm very lost as a :

I have two models, Book and Reader. Every night, new books are added to my database. The number of readers for each book are updated too every night i.e. One book will have multiple reader statistic records (one record for each day).

Over a given period (past week, past month or past year), I would like to list the most popular books, what algorithm should I use for this?

The popularity doesn't need to be realtime in any way because the reader count for each book is only updated daily.

I found one article which was referenced in another SO post that showed how they calculated trending Wikipedia articles but the post only showed how the current trend was calculated.

As someone pointed out on SO, it is a very simple baseline trend algorithm and only calculates the slope between two data points so I guess it shows the trend between yesterday and today.

I'm not looking for a uber complex trending algorithm like those used on Hacker News, Reddit, etc.

I have only two data axes, reader count and date.

Any ideas on what and how I should implement. For someone who's never worked with anything statistics/algorithm related, this seems to be a very daunting undertaking.

Thanks in advance everyone.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Probably the simplest possible trending "algorithm" I can think of is the n-day moving average. I'm not sure how your data is structured, but say you have something like this:

books = {'Twilight': [500, 555, 580, 577, 523, 533, 556, 593],
         'Harry Potter': [650, 647, 653, 642, 633, 621, 625, 613],
         'Structure and Interpretation of Computer Programs': [1, 4, 15, 12, 7, 3, 8, 19]
        }

A simple moving average just takes the last n values and averages them:

def moving_av(l, n):
    """Take a list, l, and return the average of its last n elements.
    """
    observations = len(l[-n:])
    return sum(l[-n:]) / float(observations)

The slice notation simply grabs the tail end of the list, starting from the nth to last variable. A moving average is a fairly standard way to smooth out any noise that a single spike or dip could introduce. The function could be used like so:

book_scores = {}
for book, reader_list in books.iteritems():
    book_scores[book] = moving_av(reader_list, 5)

You'll want to play around with the number of days you average over. And if you want to emphasize recent trends you can also look at using something like a weighted moving average.

If you wanted to focus on something that looks less at absolute readership and focuses instead on increases in readership, simply find the percent change in the 30-day moving average and 5-day moving average:

d5_moving_av = moving_av(reader_list, 5)
d30_moving_av = moving_av(reader_list, 30)
book_score = (d5_moving_av - d30_moving_av) / d30_moving_av

With these simple tools you have a fair amount of flexibility in how much you emphasize past trends and how much you want to smooth out (or not smooth out) spikes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...