Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python: finding an embedded mp4 file with BeautifulSoup

I am new to bs4.

I have gone through many tutorials, but nothing works. I want to scrape the mp4 URL from a site, but the embedded player markup looks different from what the tutorials show. I have tried find() and find_all() but can't get them to work. Can anyone help?

<div class="rmp-playlist-container">
<div class="rmp-playlist-player-wrapper">
<div id="rmpPlayer"></div>
</div>
</div>
<p><script>var playlistData = [{src: {mp4:["https://wantedurl.mp4"]},"contentMetadata": {"title": "video1",   "thumbnail":"https://somethumbnail.jpg","poster": [   "https://someposter.jpg"]}

current code:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'From': '[email protected]'  # This is another valid field
}

base_url = "url"

r = requests.get(base_url,headers=headers)

patt = re.compile(r'mp4:\s*\["(.+?)"\]')
soup = BeautifulSoup(r, 'html.parser')
print(soup)

for e in soup.find_all('script'):
    m = patt.search(e.string)
    if m:
        print(m.group(1))


Question from: https://stackoverflow.com/questions/65904149/python-finding-embedded-mp4-file-with-beautifulsoup


1 Reply

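The URL is not stored in an HTML element or attribute; it sits inside the JavaScript text of a <script> tag, so find() and find_all() alone cannot return it. You can use a regular expression to pull it out of the script text.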

from bs4 import BeautifulSoup
import re

# Matches mp4:["<url>"] and captures the URL between the quotes
patt = re.compile(r'mp4:\s*\["(.+?)"\]')

data = '''
<div class="rmp-playlist-container">
<div class="rmp-playlist-player-wrapper">
<div id="rmpPlayer"></div>
</div>
</div>
<p><script>var playlistData = [{src: {mp4:["https://wantedurl.mp4"]},"contentMetadata": {"title": "video1",   "thumbnail":"https://somethumbnail.jpg","poster": [   "https://someposter.jpg"]}}];
</script>
'''

soup = BeautifulSoup(data, 'html.parser')

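for e in soup.find_all('script'):
    # e.string is None for scripts without inline text, so guard before searching
    m = patt.search(e.string) if e.string else None
    if m:
        print(m.group(1))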
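
For a live page, the same pattern can be applied to the response body. The following is only a sketch under a few assumptions: extract_mp4_urls is a hypothetical helper name, not part of bs4; it expects the HTML text (for a requests response that would be r.text, not the response object itself); it skips <script> tags whose .string is None; and it uses findall in case the script lists more than one URL.

```python
import re
from bs4 import BeautifulSoup

# Same pattern as above: matches mp4:["<url>"] and captures the URL.
MP4_PATTERN = re.compile(r'mp4:\s*\["(.+?)"\]')


def extract_mp4_urls(html):
    """Return every mp4 URL found inside <script> tags of the given HTML text.

    Hypothetical helper for illustration; the name is an assumption.
    """
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    for script in soup.find_all('script'):
        # .string is None for scripts with no inline text (e.g. <script src="...">)
        if script.string:
            urls.extend(MP4_PATTERN.findall(script.string))
    return urls


# Shortened version of the markup from the question
sample = '<p><script>var playlistData = [{src: {mp4:["https://wantedurl.mp4"]}}];</script></p>'
print(extract_mp4_urls(sample))  # ['https://wantedurl.mp4']
```

With the code from the question, this would be called as extract_mp4_urls(r.text) after the requests.get call.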

