pandas - Create a DataFrame in Python from BeautifulSoup extract

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am able to extract data, a list of words, from the web using BeautifulSoup. The data is collected in synonyms[i].text. However, when I convert the extracted data into a DataFrame, the words are split into letters rather than kept as complete words. How do I convert the data into a proper list of words in a DataFrame, i.e., where a word like 'analyse' appears as 'analyse' and not split into 'a', 'n', 'a', 'l', 'y', 's', 'e'?

import requests
from bs4 import BeautifulSoup
import pandas as pd

page = requests.get("https://www.wordhippo.com/what-is/another-word-for/guard.html")
soup = BeautifulSoup(page.content, 'html.parser')

keyword = "guard"

synonyms = soup.select('.relatedwords')
for i in range(0, 1):
    print('synonyms section ' + str(i + 1))
    print(pd.DataFrame(list(synonyms[i].text)))

Output that I need to convert into a DataFrame:

synonyms section 1

fighter
trooper
warrior
serviceman

The output I am getting instead has every word split into single letters.

Thanks in advance.

Question from: https://stackoverflow.com/questions/65896733/create-a-dataframe-in-python-from-beautifulsoup-extract

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:16:45+0000

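I think you need strip to remove the leading and trailing whitespace and then split to get a list of words. Calling list() on a string iterates over it one character at a time, which is why you get letters: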

for i in range(0, 1):
    print('synonyms section ' + str(i + 1))
    print(pd.DataFrame({'text': synonyms[i].text.strip().split()}))

          text
0     guardian
1    custodian
2       warden
3       keeper
4       sentry
..         ...
211    soldier
212       park
213     ranger
214       more
215          ?

[216 rows x 1 columns]
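A minimal, stdlib-only sketch of why the question's code produced letters and what strip()/split() does instead. The sample string is hypothetical and stands in for synonyms[i].text; the real text comes from the page. Note that split() breaks on every run of whitespace, which is why a multi-word entry such as "park ranger" shows up as two rows (212 and 213) in the output above. If the section text has one synonym per line (an assumption suggested by the question's sample output, not confirmed), splitlines() keeps such entries whole.

```python
# Hypothetical stand-in for synonyms[i].text; the real string comes from the page
text = "\nguardian\ncustodian\npark ranger\n"

# list() on a str iterates character by character
letters = list("analyse")
print(letters)  # ['a', 'n', 'a', 'l', 'y', 's', 'e']

# strip() + split() splits on any run of whitespace, including spaces inside entries
words = text.strip().split()
print(words)  # ['guardian', 'custodian', 'park', 'ranger']

# splitlines() keeps multi-word entries together, assuming one entry per line
entries = [line.strip() for line in text.splitlines() if line.strip()]
print(entries)  # ['guardian', 'custodian', 'park ranger']
```

str.split() with no arguments already ignores leading and trailing whitespace, so calling strip() before it is harmless but not strictly required.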

If you need the words from all sections in one DataFrame, use the extend method to add each section's list of words to the list L, and then call the DataFrame constructor once, outside the loop:

L = []
for i, syno in enumerate(synonyms):
    print('synonyms section ' + str(i + 1))
    L.extend(syno.text.strip().split())

df = pd.DataFrame({'text': L})
print(df)

           text
0      guardian
1     custodian
2        warden
3        keeper
4        sentry
        ...
7667  Languages
7668          g
7669         gu
7670        gua
7671       guar

[7672 rows x 1 columns]
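A sketch of the same collect-then-construct pattern that runs offline. The section strings below are hypothetical stand-ins for syno.text, and splitting one synonym per line is an assumption about the page's text layout. It also records which section each word came from in a section column, which is an optional addition rather than part of the answer above.

```python
import pandas as pd

# Hypothetical stand-ins for syno.text of each '.relatedwords' section;
# in the real script these strings come from soup.select('.relatedwords').
section_texts = [
    "\nguardian\ncustodian\nwarden\n",
    "\nsentry\npark ranger\n",
]

rows = []
for i, text in enumerate(section_texts, start=1):
    # Assumes one synonym per line, so multi-word entries stay whole
    for line in text.splitlines():
        word = line.strip()
        if word:  # skip blank lines
            rows.append((i, word))

# Build the DataFrame once, outside the loop
df = pd.DataFrame(rows, columns=['section', 'text'])
print(df)
```

Building df once from the accumulated rows, rather than inside the loop, avoids creating a throwaway DataFrame for each section.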
