Firstly, I am a complete newbie when it comes to Python. However, I have written a piece of code to look at an RSS feed, open the link and extract the text from the article. This is what I have so far:
from BeautifulSoup import BeautifulSoup
import feedparser
import urllib
# Dictionaries
links = {}
titles = {}
# Variables
n = 0
rss_url = "feed://www.gfsc.gg/_layouts/GFSC/GFSCRSSFeed.aspx?Division=ALL&Article=All&Title=News&Type=doc&List=%7b66fa9b18-776a-4e91-9f80- 30195001386c%7d%23%7b679e913e-6301-4bc4-9fd9-a788b926f565%7d%23%7b0e65f37f-1129-4c78-8f59-3db5f96409fd%7d%23%7bdd7c290d-5f17-43b7-b6fd-50089368e090%7d%23%7b4790a972-c55f-46a5-8020-396780eb8506%7d%23%7b6b67c085-7c25-458d-8a98-373e0ac71c52%7d%23%7be3b71b9c-30ce-47c0-8bfb-f3224e98b756%7d%23%7b25853d98-37d7-4ba2-83f9-78685f2070df%7d%23%7b14c41f90-c462-44cf-a773-878521aa007c%7d%23%7b7ceaf3bf-d501-4f60-a3e4-2af84d0e1528%7d%23%7baf17e955-96b7-49e9-ad8a-7ee0ac097f37%7d%23%7b3faca1d0-be40-445c-a577-c742c2d367a8%7d%23%7b6296a8d6-7cab-4609-b7f7-b6b7c3a264d6%7d%23%7b43e2b52d-e4f1-4628-84ad-0042d644deaf%7d"
# Parse the RSS feed
feed = feedparser.parse(rss_url)
# view the entire feed, one entry at a time
for post in feed.entries:
# Create variables from posts
link = post.link
title = post.title
# Add the link to the dictionary
n += 1
links[n] = link
for k,v in links.items():
# Open RSS feed
page = urllib.urlopen(v).read()
page = str(page)
soup = BeautifulSoup(page)
# Find all of the text between paragraph tags and strip out the html
page = soup.find('p').getText()
# Strip ampersand codes and WATCH:
page = re.sub('&w+;','',page)
page = re.sub('WATCH:','',page)
# Print Page
print(page)
print(" ")
# To stop after 3rd article, just whilst testing ** to be removed **
if (k >= 3):
break
This produces the following output:
>>> (executing lines 1 to 45 of "RSS_BeautifulSoup.py")
?Total deposits held with Guernsey banks at the end of June 2012 increased 2.1% in sterling terms by £2.1 billion from the end of March 2012 level of £101 billion, up to £103.1 billion. This is 9.4% lower than the same time a year ago. Total assets and liabilities increased by £2.9 billion to £131.2 billion representing a 2.3% increase over the quarter though this was 5.7% lower than the level a year ago. The higher figures reflected the effects both of volume and exchange rate factors.
The net asset value of total funds under management and administration has increased over the quarter ended 30 June 2012 by £711 million (0.3%) to reach £270.8 billion.For the year since 30 June 2011, total net asset values decreased by £3.6 billion (1.3%).
The Commission has updated the warranties on the Form REG, Form QIF and Form FTL to take into account the Commission’s Guidance Notes on Personal Questionnaires and Personal Declarations. In particular, the following warranty (varies slightly dependent on the application) has been inserted in the aforementioned forms,
>>>
The problem is that this is the first paragraph of each article, however I need to show the entire article. Any help would be gratefully received.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…