Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


database - How to create my own recommendation engine?



1 Reply


I've built one for a video portal myself. The main idea was to collect data about everything:

  • Who uploaded a video?
  • Who commented on a video?
  • Which tags were created?
  • Who visited the video? (also tracking anonymous visitors)
  • Who favorited a video?
  • Who rated a video?
  • Which channels was the video assigned to?
  • Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts weight on each of the data sources.

Next I created functions that return a list of (id, weight) tuples for each of the above points. Some only consider a limited number of videos (e.g. the last 50); some adjust the weight by, e.g., rating or tag count (a tag used more often is less expressive). There are functions that return the following lists:

  • Similar videos by fulltext search
  • Videos uploaded by the same user
  • Other videos that this video's commenters also commented on
  • Other videos that the users who favorited this video also favorited
  • Other videos that this video's raters also rated (weighted)
  • Other videos in the same channels
  • Other videos with the same tags (weighted by "expressiveness" of tags)
  • Other videos played by people who played this video (limited to the latest N plays)
  • Similar videos by comments fulltext
  • Similar videos by title fulltext
  • Similar videos by description fulltext
  • Similar videos by tags fulltext
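
To make the shape of these functions concrete, here is a minimal sketch of the "also favorited" one. It is not the code from my project: the FAVORITES hash, the FAVORITE_WEIGHT constant and the method signature are all made up for illustration, and plain in-memory data stands in for the database.

```ruby
# Hypothetical in-memory data: user id => ids of the videos that user favorited.
FAVORITES = {
  "alice" => [1, 2, 3],
  "bob"   => [1, 3],
  "carol" => [2, 4]
}.freeze

FAVORITE_WEIGHT = 1.0 # assumed base weight for this signal

# Returns [video_id, weight] pairs for the other videos favorited by the
# users who favorited `video_id`. The video itself is excluded. A video
# favorited by several of those users appears once per user, so its
# weights add up when the lists are summed later.
def related_by_favorites(video_id, favorites = FAVORITES)
  favorites.each_value
           .select { |ids| ids.include?(video_id) }
           .flat_map { |ids| ids.reject { |id| id == video_id } }
           .map { |id| [id, FAVORITE_WEIGHT] }
end
```

Every other signal can follow the same pattern: find the users (or tags, or channels) linked to the video, collect what else they are linked to, and attach a weight.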

All of these are combined into a single list by summing the weights per video id; the result is then sorted by weight. This currently works pretty well for around 1000 videos, but you need background processing or aggressive caching for it to be fast.
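
The combining step could look roughly like the sketch below. The method name combine_recommendations and the sample lists are assumptions for illustration only; ties are broken by video id, which is my own choice and not something the approach requires.

```ruby
# Sums the weights per video id across several [id, weight] lists and
# returns [id, total_weight] pairs, highest total first.
def combine_recommendations(*lists)
  totals = Hash.new(0.0)
  lists.each do |list|
    list.each { |id, weight| totals[id] += weight }
  end
  # Sort by descending weight; break ties by ascending id for stable output.
  totals.sort_by { |id, weight| [-weight, id] }
end

# Hypothetical results from two of the signal functions.
by_tags      = [[2, 0.9], [3, 0.3]]
by_favorites = [[3, 1.0], [4, 1.0]]

ranked = combine_recommendations(by_tags, by_favorites)
ranked.each { |id, weight| puts "video #{id}: #{weight.round(2)}" }
```

Video 3 appears in both lists, so it ends up first even though neither list ranks it highest on its own.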

I'm hoping to reduce this to a generic recommendation engine or similarity calculator soon and release it as a Rails/ActiveRecord plugin. For now it is still tightly integrated into my project.

To give a small hint, in Ruby code it looks like this:

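# Returns [video_id, weight] pairs for every video that carries one of this
# video's tags (this video included). Tags attached to many videos are less
# expressive, so the weight shrinks with log2 of the tag's video count.
def related_by_tags
  tag_names.find(:all, :include => :videos).inject([]) { |result, t|
    result + t.video_ids.map { |v|
      # 0.1 keeps the divisor non-zero when a tag has exactly one video.
      [v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]
    }
  }
end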
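
To see how the tag weight falls off, here is the same formula pulled out into a standalone sketch. TAG_WEIGHT = 1.0 is an assumed value (the real constant is not shown above), and tag_weight is a helper name introduced only for this example.

```ruby
TAG_WEIGHT = 1.0 # assumed value for illustration

# Weight contributed by one tag that is attached to `video_count` videos.
# Uses Math.log2, which equals Math.log(x) / Math.log(2).
def tag_weight(video_count)
  TAG_WEIGHT / (0.1 + Math.log2(video_count))
end

[1, 2, 8, 64].each do |count|
  puts format("%3d videos -> weight %.3f", count, tag_weight(count))
end
```

With this assumed base weight, a tag used on a single video contributes about 10.0, a tag on 2 videos about 0.909, on 8 videos about 0.323, and on 64 videos about 0.164, so rare tags dominate and very common tags barely count.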

I would be interested in how other people approach this kind of problem.

