python - How to use Beautiful Soup to extract string in <script> tag?

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

In a given .html page, I have a script tag like so:

    <script>jQuery(window).load(function () {
      setTimeout(function(){
        jQuery("input[name=Email]").val("[email protected]");
      }, 1000);
    });</script>

How can I use Beautiful Soup to extract the email address?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:08:29+0000

This adds a bit to @Bob's answer, assuming you also need to locate the right script tag in HTML that may contain other script tags.

The idea is to define one regular expression and use it both to locate the element with BeautifulSoup and to extract the email value:

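import re

from bs4 import BeautifulSoup


data = """
<body>
    <script>jQuery(window).load(function () {
      setTimeout(function(){
        jQuery("input[name=Email]").val("[email protected]");
      }, 1000);
    });</script>
</body>
"""
# The literal dots and parentheses must be escaped; group 1 captures
# the address passed to .val("...").
pattern = re.compile(r'\.val\("([^@]+@[^@]+\.[^@]+)"\);', re.MULTILINE | re.DOTALL)
soup = BeautifulSoup(data, "html.parser")

# string= is the current name of the older text= argument (Beautiful Soup 4.4.0+).
script = soup.find("script", string=pattern)
if script:
    # .string is the tag's single text child, i.e. the inline JavaScript.
    match = pattern.search(script.string)
    if match:
        email = match.group(1)
        print(email)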

Prints: [email protected].
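If the page may contain several script tags, or more than one matching call, a variation along these lines might help. It is only a sketch: the sample markup, the `extract_emails` helper, and the `user@example.com` address are placeholders invented for illustration, not part of the original page. Instead of passing the pattern to `find`, it loops over every script tag and skips those with no inline content.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page with several script tags; user@example.com is a placeholder.
html = """
<body>
    <script>console.log("unrelated");</script>
    <script src="app.js"></script>
    <script>jQuery(window).load(function () {
      setTimeout(function(){
        jQuery("input[name=Email]").val("user@example.com");
      }, 1000);
    });</script>
</body>
"""

# Same idea as the pattern above, but quotes are also excluded from the
# captured value so a match cannot run past the closing quote.
EMAIL_IN_VAL = re.compile(r'\.val\("([^@"]+@[^@"]+\.[^@"]+)"\);')


def extract_emails(markup):
    """Return every address passed to .val("...") in inline scripts."""
    soup = BeautifulSoup(markup, "html.parser")
    emails = []
    for script in soup.find_all("script"):
        source = script.string  # None for external or empty scripts
        if source is None:
            continue
        emails.extend(EMAIL_IN_VAL.findall(source))
    return emails


emails = extract_emails(html)
print(emails)
```

With one capturing group, `findall` returns the captured addresses directly, so the result is a plain list of strings.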

Here we use a simple regular expression for the email address. It could be made stricter, but I doubt that is necessary in practice for this problem.
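For reference, a somewhat stricter pattern could look like the sketch below. The character classes are an assumption about what counts as a plausible address, not a full email validator, and `find_email` and the sample strings are hypothetical names used only for this example.

```python
import re

# Local part: letters, digits and a few common symbols.
# Domain: dot-separated labels made of letters, digits and hyphens.
STRICT_EMAIL_IN_VAL = re.compile(
    r'\.val\(\s*"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"\s*\)'
)


def find_email(script_source):
    """Return the first address passed to .val("..."), or None."""
    match = STRICT_EMAIL_IN_VAL.search(script_source)
    return match.group(1) if match else None


good = find_email('jQuery("input[name=Email]").val("user@example.com");')
bad = find_email('jQuery("input[name=Email]").val("a b@c d.e f");')
print(good, bad)
```

The second call returns `None` because the value contains spaces, which the looser `[^@]+` pattern would have accepted.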
