Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


pandas - Create a DataFrame in Python from BeautifulSoup extract

I am able to extract data, a list of words, from the web using BeautifulSoup. The data is collected in synonyms[i].text. However, when I convert the extracted data into a DataFrame, the words come out split into letters rather than as complete words. How do I convert the data into a proper list of words in a DataFrame, i.e. one where a word like 'analyse' appears as 'analyse' and not split into 'a','n','a','l','y','s','e'?

import requests
from bs4 import BeautifulSoup
import pandas as pd
page = requests.get("https://www.wordhippo.com/what-is/another-word-for/guard.html")
soup = BeautifulSoup(page.content, 'html.parser')


keyword = "guard"

synonyms = soup.select('.relatedwords')
for i in range(0, 1):
    print('synonyms section ' + str(i + 1))
    print(pd.DataFrame(list(synonyms[i].text)))

#Output that I need to convert into a DataFrame
synonyms section 1

fighter
trooper
warrior
serviceman

#The output I am getting instead

(Screenshot missing from the source; per the description above, it showed each word split into single letters.)

Thanks in advance.

question from:https://stackoverflow.com/questions/65896733/create-a-dataframe-in-python-from-beautifulsoup-extract


1 Reply


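Calling list() on a string splits it into single characters, which is why you get letters. I think you need strip() to remove the leading and trailing whitespace, and then split() to get a list of words: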

for i in range(0, 1):
    print ('synonyms section ' + str(i + 1))
    print (pd.DataFrame({'text': synonyms[i].text.strip().split()}))
    
          text
0     guardian
1    custodian
2       warden
3       keeper
4       sentry
..         ...
211    soldier
212       park
213     ranger
214       more
215          ?

[216 rows x 1 columns]
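
To see the difference without hitting the website, here is a minimal sketch. The string sample_text is a made-up stand-in for synonyms[i].text (the real page text will differ); the point is only that list() walks a string character by character, while strip().split() cuts it at whitespace.

```python
import pandas as pd

# Hypothetical stand-in for synonyms[i].text; the real text comes from the page.
sample_text = "\nfighter\ntrooper\nwarrior\nserviceman\n"

# list() on a string yields one element per character, newlines included.
chars = list(sample_text)

# strip() drops leading/trailing whitespace; split() cuts at any whitespace run.
words = sample_text.strip().split()

df_chars = pd.DataFrame({'text': chars})
df_words = pd.DataFrame({'text': words})

print(len(df_chars))  # one row per character of sample_text
print(df_words)       # one row per word
```

The second DataFrame has one row per word, which is the shape asked for in the question.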

If you need all values in one DataFrame, use the extend method to add each section's words to the list L, and then call the DataFrame constructor outside the loop:

L = []
for i, syno in enumerate(synonyms):
    print ('synonyms section ' + str(i + 1))
    L.extend(syno.text.strip().split())

df = pd.DataFrame({'text':L})
print(df)
           text
0      guardian
1     custodian
2        warden
3        keeper
4        sentry
        ...
7667  Languages
7668          g
7669         gu
7670        gua
7671       guar

[7672 rows x 1 columns]
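
One caveat: rows 212 and 213 of the first result ('park', 'ranger') suggest that a two-word synonym was cut in two, because split() with no arguments cuts at every whitespace run. If each synonym sits on its own line in the extracted text (an assumption about the page, not something verified here), splitting on line breaks may keep such entries whole. A rough sketch with made-up text:

```python
import pandas as pd

# Hypothetical extracted text, assuming one synonym per line.
sample_text = "\nsoldier\npark ranger\n\nsentry\n"

# split() breaks "park ranger" into two separate words.
by_whitespace = sample_text.strip().split()

# splitlines() keeps each line whole; skip lines that are empty after stripping.
by_line = [line.strip() for line in sample_text.splitlines() if line.strip()]

df = pd.DataFrame({'text': by_line})
print(df)
```

Whether this fits the real page depends on how its text is laid out, so it is worth printing repr(synonyms[0].text) first to check where the line breaks fall.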


...