Hi, I am trying to web scrape this University of Reading page: http://www.reading.ac.uk/ready-to-study/study/subject-area/modern-languages-and-european-studies-ug/ba-spanish-and-history.aspx, but I am having trouble extracting the course duration from it. Can anyone help? I used the code below:
```python
duration_title = soup.find('li', text=re.compile(r'Course duration', re.IGNORECASE))
if duration_title:
    duration = duration_title.find_next_sibling('strong')
    if duration:
        duration_text = duration.get_text()
        duration_ = re.search(r"\d+(?:\.\d+)|\d+", duration_text)
        if duration_ is not None:
            if duration_.group() == 1 or '1' in duration_.group():
                course_data['Duration'] = duration_.group()
                course_data['Duration_Time'] = 'Year'
            elif '0.5' in duration_.group():
                course_data['Duration'] = '6'
                course_data['Duration_Time'] = 'Months'
            else:
                course_data['Duration'] = duration_.group()
                course_data['Duration_Time'] = 'Years'
        else:
            course_data['Duration'] = 'Not mentioned'
            course_data['Duration_Time'] = 'Not mentioned'
        print('Duration: ', str(course_data['Duration']) + ' / ' + course_data['Duration_Time'])
```
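Separately from the lookup, the unit logic has a small bug: `duration_.group() == 1` compares a string with an integer and is never true, and `'1' in duration_.group()` is also true for values such as "10" or "1.5", so those would be labelled "Year". A minimal sketch of the same rules, assuming the intent is 1 → Year, 0.5 → 6 Months, anything else → Years (`parse_duration` is a hypothetical helper name, not part of any library):

```python
import re

# Matches an integer or a decimal number such as "3" or "0.5".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_duration(duration_text):
    """Return (duration, unit) parsed from text like '3 years full time'."""
    match = NUMBER_RE.search(duration_text)
    if match is None:
        return "Not mentioned", "Not mentioned"
    number = match.group()
    value = float(number)  # compare numerically, not by substring
    if value == 1:
        return number, "Year"
    if value == 0.5:
        return "6", "Months"
    return number, "Years"


print(parse_duration("1 year"))     # ('1', 'Year')
print(parse_duration("1.5 years"))  # ('1.5', 'Years')
```

The result can then be stored with `course_data['Duration'], course_data['Duration_Time'] = parse_duration(duration_text)`.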
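Try searching on `text` only and remove the `li`:

```python
soup.find(text=re.compile(r'Course duration', re.IGNORECASE))
```

When you pass a tag name together with `text=` (`string=` is the newer name for the same argument), BeautifulSoup only matches tags whose `.string` matches, and `.string` is `None` for an `<li>` that contains more than one child, such as a label plus a `<strong>`. Searching on the text alone returns the matching text node itself, and you can navigate from there (for example with `find_next_sibling('strong')`) if the value sits next to the label.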
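A small self-contained sketch of the difference. The HTML below is hypothetical: it assumes the label and the value share one `<li>` with the value wrapped in `<strong>`, and the values are made up, so check the real page source before relying on this structure. `find_duration_text` is a hypothetical helper name.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, not copied from the real page.
SAMPLE_HTML = """
<ul>
  <li>Example label <strong>Example value</strong></li>
  <li>Course duration <strong>3 years full time</strong></li>
</ul>
"""

DURATION_RE = re.compile(r"Course duration", re.IGNORECASE)


def find_duration_text(soup):
    """Return the text of the <strong> next to the 'Course duration' label."""
    # Searching on the string alone returns the text node "Course duration ".
    label = soup.find(string=DURATION_RE)
    if label is None:
        return None
    value = label.find_next_sibling("strong")
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The <li> has two children (text + <strong>), so its .string is None
# and the name + string search matches nothing.
print(soup.find("li", string=DURATION_RE))  # None
print(find_duration_text(soup))             # 3 years full time
```

The returned text can then go through the same number/unit parsing as in the question.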