Python Beautiful Soup and Regex - Double quotes not getting replaced

Question

Posted Jan 31, 2022 in Technique by 深蓝

I am trying to scrape this website using BeautifulSoup and regex. While doing so, I came across a question that contained "double quotes", and I wanted to replace (remove) the double quotes before saving the text to a .txt file. However, the double quotes are not being replaced. I tried the .replace() method, but it didn't work. The code is as follows:

import re
import requests
from bs4 import BeautifulSoup as bs

url = 'http://www.sanfoundry.com/operating-system-mcqs-process-scheduling-queue/'
r = requests.get(url)
soup = bs(r.content, 'html.parser')
data = soup.find_all('div', {'class': 'entry-content'})
data1 = data[0].text
pattern = r'^\d{1,2}[.|)]([\s|\S].*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
#pattern = r'^\d{1,2}[.|)]\s*(.*)|(^[a-z]\)\s.*)|^View Answer\s?(Answer:.*)'
reg = re.compile(pattern)
#with open(r'C:\Users\dhvani\Google Drive\Python\Data Scraping\yb.txt', 'a') as f:
with open(r'C:\Users\Jeri_Dabba\Google Drive\Python\Data Scraping\yb.txt', 'a') as f:
    for i in data1.split('\n'):
        m = reg.search(i)
        # search() returns None for lines matching none of the alternatives
        if m and m.group(1):
            y = m.group(1)
            y = y.replace('"', '')
            f.write(y + '\n')

When I checked the .txt file, the double quotes had not been replaced. What might be the problem?

I am new to Python.

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:04+0000

This website uses characters that aren't the 'normal' ASCII double quote, i.e. not " (U+0022).

The site uses the Unicode left and right double quotation marks “ (U+201C) and ” (U+201D), so y.replace('"', '') never matches them.
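To check which quote characters a scraped string actually contains, it can help to list its non-ASCII characters together with their code points. This is a minimal sketch using only the standard library; the function name non_ascii_chars and the sample sentence are hypothetical, and with the question's code you would pass data1 instead of the sample.

```python
import unicodedata


def non_ascii_chars(text):
    """Map each distinct non-ASCII character in text to (code point, Unicode name)."""
    found = {}
    for ch in text:
        if ord(ch) > 127 and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
    return found


# Hypothetical sample line; in the question's code this would be data1.
sample = 'Which of the following is a \u201cready queue\u201d?'
for ch, (code_point, name) in non_ascii_chars(sample).items():
    print(ch, code_point, name)
# “ U+201C LEFT DOUBLE QUOTATION MARK
# ” U+201D RIGHT DOUBLE QUOTATION MARK
```

Because '"' == '\u201c' is False, a replace() call that only targets the ASCII quote leaves the curly ones untouched.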

You can replace these:

y = y.replace('"', '')
y = y.replace('“', '')
y = y.replace('”', '')
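If you prefer a single call over three chained replace() calls, either str.translate or a regex character class removes all three characters in one pass. This is a sketch of two equivalent alternatives, not the answerer's method; the names QUOTE_TABLE, QUOTE_RE, strip_quotes_translate and strip_quotes_regex are hypothetical.

```python
import re

# Characters to delete: ASCII " (U+0022), “ (U+201C) and ” (U+201D).
# The third argument of str.maketrans lists characters mapped to None (deleted).
QUOTE_TABLE = str.maketrans('', '', '"\u201c\u201d')

# The same set as a regex character class; re understands \u escapes.
QUOTE_RE = re.compile(r'["\u201c\u201d]')


def strip_quotes_translate(s):
    """Remove all three double-quote characters using str.translate."""
    return s.translate(QUOTE_TABLE)


def strip_quotes_regex(s):
    """Remove all three double-quote characters using re.sub."""
    return QUOTE_RE.sub('', s)


line = 'What is a \u201cready queue\u201d and a "job queue"?'
print(strip_quotes_translate(line))  # What is a ready queue and a job queue?
print(strip_quotes_regex(line))      # What is a ready queue and a job queue?
```

In the question's loop, y = y.replace('"', '') would become y = y.translate(QUOTE_TABLE). If the page also uses curly single quotes ‘ (U+2018) and ’ (U+2019), they can be appended to the same character string.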