I need to parse a nested HTML list and convert it to a parent-child dict. Given this list:
<ul>
  <li>Operating System
    <ul>
      <li>Linux
        <ul>
          <li>Debian</li>
          <li>Fedora</li>
          <li>Ubuntu</li>
        </ul>
      </li>
      <li>Windows</li>
      <li>OS X</li>
    </ul>
  </li>
  <li>Programming Languages
    <ul>
      <li>Python</li>
      <li>C#</li>
      <li>Ruby</li>
    </ul>
  </li>
</ul>
I want to convert it to a dict like this:
{
    'Operating System': {
        'Linux': {
            'Debian': None,
            'Fedora': None,
            'Ubuntu': None,
        },
        'Windows': None,
        'OS X': None,
    },
    'Programming Languages': {
        'Python': None,
        'C#': None,
        'Ruby': None,
    }
}
My initial attempt used find_all('li', recursive=False). It returns the top-level items (Operating System and Programming Languages), but also their children.
How can I do it with BeautifulSoup?
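One possible approach, offered as a minimal sketch rather than the definitive answer: recurse over each <ul>, take only its direct <li> children, and build each key from the text that sits directly inside the <li>. A likely reason the first attempt seemed to return children is that each top-level <li> still contains its nested <ul>, so printing it or calling get_text() shows the descendants too. The function name list_to_dict and the variable names are made up for this example, and the html.parser backend is an assumption (any parser BeautifulSoup supports should behave the same for this markup).

```python
from bs4 import BeautifulSoup, NavigableString


def list_to_dict(ul):
    """Convert a <ul> Tag into a nested dict; leaf items map to None.

    list_to_dict is a hypothetical helper name, not part of BeautifulSoup.
    """
    result = {}
    # recursive=False limits the search to direct <li> children of this <ul>.
    for li in ul.find_all("li", recursive=False):
        # Build the label from text nodes directly inside the <li>,
        # skipping the nested <ul> so child labels do not leak in.
        label = "".join(
            child for child in li.children if isinstance(child, NavigableString)
        ).strip()
        # A nested list, if present, is a direct child of the <li>.
        nested = li.find("ul", recursive=False)
        result[label] = list_to_dict(nested) if nested is not None else None
    return result


html = """
<ul>
<li>Operating System
<ul>
<li>Linux
<ul>
<li>Debian</li>
<li>Fedora</li>
<li>Ubuntu</li>
</ul>
</li>
<li>Windows</li>
<li>OS X</li>
</ul>
</li>
<li>Programming Languages
<ul>
<li>Python</li>
<li>C#</li>
<li>Ruby</li>
</ul>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# soup.find("ul") returns the first <ul> in document order, i.e. the outermost list.
tree = list_to_dict(soup.find("ul"))
print(tree)
```

Note that duplicate labels at the same level would overwrite each other, since dict keys are unique; if that can happen in the real data, a list of (label, children) pairs may be a better fit.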