Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - How to check uniqueness (non-duplication) of a post in an RSS feed

When retrieving posts from an RSS feed and caching or saving them in a database, how can I determine that:

  1. a post is the same post I already stored (for example, when typos are fixed in the feed, or the title or date changes)
  2. posts from different feeds talk about the same topic (for example, the same story from different sources)

Are there any best practices for these things?

Thanks a lot.



1 Reply


Some RSS feeds include a guid element on each item as an identifier. Posts that share a guid are probably duplicates, even if the title, date, or text has been edited. Some feeds simply put the post's URL in the guid, which indicates that the post's uniqueness is tied to its URL. Note that if the URL matches but the guid does not, the posts may not be duplicates: a feed that does not maintain an archive might publish successive posts at the same, unchanging URL. That situation is probably pretty rare.
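
A minimal C# sketch of that idea, assuming an RSS 2.0 feed (un-namespaced item, guid, and link elements) and using LINQ to XML. The FeedDedup class, its method names, and the "guid:"/"link:" key prefixes are hypothetical and only for illustration; the in-memory set stands in for whatever lookup your database provides. The guid is preferred as the key, and the link is used only when the feed supplies no guid.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of any library: derives an identity key
// for each RSS 2.0 <item> and skips items whose key was already stored.
public static class FeedDedup
{
    // Prefer <guid>; fall back to <link> only when no guid is present.
    // Returns null when the item has neither, so the caller can decide
    // how to handle posts that cannot be identified.
    public static string GetPostKey(XElement item)
    {
        string guid = (string)item.Element("guid");
        if (!string.IsNullOrWhiteSpace(guid))
            return "guid:" + guid.Trim();

        string link = (string)item.Element("link");
        if (!string.IsNullOrWhiteSpace(link))
            return "link:" + link.Trim();

        return null;
    }

    // Returns the items whose key is not yet in seenKeys and records
    // the new keys. seenKeys stands in for a database lookup.
    public static List<XElement> FilterNew(string rssXml, ISet<string> seenKeys)
    {
        var fresh = new List<XElement>();
        foreach (XElement item in XDocument.Parse(rssXml).Descendants("item"))
        {
            string key = GetPostKey(item);
            if (key != null && seenKeys.Add(key))
                fresh.Add(item);
        }
        return fresh;
    }
}
```

Calling FeedDedup.FilterNew(xml, seen) on each fetch returns only the items not seen before. An item whose key is already stored is treated here as the same post, so a corrected title or date would be handled as an update to the stored row rather than as a new post; this sketch does not perform that update.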

