python finding embedded mp4 file with Beautifulsoup

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am new to bs4!

I have gone through many tutorials, but none of them work for my case. I want to scrape the mp4 URL from a site, but the embedded markup looks different from the tutorial examples. I have tried the find and find_all functions but can't get them to work. Can anyone help?

<div class="rmp-playlist-container">
<div class="rmp-playlist-player-wrapper">
<div id="rmpPlayer"></div>
</div>
</div>
<p><script>var playlistData = [{src: {mp4:["https://wantedurl.mp4"]},"contentMetadata": {"title": "video1",   "thumbnail":"https://somethumbnail.jpg","poster": [   "https://someposter.jpg"]}

current code:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'From': '[email protected]'  # This is another valid field
}

base_url = "url"

r = requests.get(base_url,headers=headers)

patt = re.compile(r'mp4:\s*\["(.+?)"\]')
soup = BeautifulSoup(r, 'html.parser')
print(soup)

for e in soup.find_all('script'):
    m = patt.search(e.string)
    if m:
        print(m.group(1))

Question from: https://stackoverflow.com/questions/65904149/python-finding-embedded-mp4-file-with-beautifulsoup

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:15:09+0000

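BeautifulSoup can locate the script tag, but it does not parse the JavaScript inside it. Once you have the script text, you can use a regular expression to pull the URL out of it.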

from bs4 import BeautifulSoup
import re

# Match mp4:["..."] and capture the text between the quotes
patt = re.compile(r'mp4:\s*\["(.+?)"\]')

data = '''
<div class="rmp-playlist-container">
<div class="rmp-playlist-player-wrapper">
<div id="rmpPlayer"></div>
</div>
</div>
<p><script>var playlistData = [{src: {mp4:["https://wantedurl.mp4"]},"contentMetadata": {"title": "video1",   "thumbnail":"https://somethumbnail.jpg","poster": [   "https://someposter.jpg"]}}];
</script>
'''

soup = BeautifulSoup(data, 'html.parser')

for e in soup.find_all('script'):
    # e.string is None for script tags with no inline text (e.g. external scripts)
    m = patt.search(e.string or '')
    if m:
        print(m.group(1))
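
To run the same idea against the live page from the question, something like the sketch below might work. It assumes the page is served as static HTML, so the playlistData script is already present in the response body. The names extract_mp4_urls and fetch_mp4_urls are hypothetical helpers, not part of bs4 or requests. Note that it hands the decoded body (response.text) to BeautifulSoup rather than the response object, and it skips script tags that have no inline text.

```python
import re

from bs4 import BeautifulSoup

# Same pattern as in the answer: captures the text between mp4:[" and "]
MP4_PATTERN = re.compile(r'mp4:\s*\["(.+?)"\]')


def extract_mp4_urls(html):
    """Return every mp4 match found in inline <script> tags of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for script in soup.find_all("script"):
        # .string is None for script tags with no inline body (e.g. src=...)
        text = script.string or ""
        urls.extend(MP4_PATTERN.findall(text))
    return urls


def fetch_mp4_urls(page_url, headers=None):
    """Download a page and extract mp4 URLs (hypothetical helper)."""
    import requests  # imported lazily so the parser above works without it

    response = requests.get(page_url, headers=headers, timeout=10)
    response.raise_for_status()
    # Pass the decoded body, not the Response object itself
    return extract_mp4_urls(response.text)
```

Calling fetch_mp4_urls(base_url, headers=headers) with the variables from the question should return a list of matches. If the list comes back empty, the player data is probably injected by JavaScript after the page loads, and a plain HTTP request will not see it.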
