python - Parsing non-standard XML (CDATA tag)

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

When parsing an XML document in Python with the BeautifulSoup library, I ran into a problem. This is the XML document that I want to parse:

<item>
<title><![CDATA[Title Sample]]></title>
<link /><![CDATA[http://banhada.kr/?cateCode=09&viewCode=S0941580]]>
<time_start>2011-10-10 09:00:00</time_start>
<time_end>2011-10-17 09:00:00</time_end>
<price_original>35000</price_original>
<price_now>20000</price_now>
</item>

As you can see above, the <link /> tag is a little strange: it is closed immediately, and the CDATA section follows it instead of sitting inside it. In my opinion, that is not standard XML, right? How can I parse this terrible form?

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:04:42+0000

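You don't need BeautifulStoneSoup or lxml. Python's included batteries do the job just fine, and there doesn't seem to be anything non-compliant about your XML. The <link /> element is simply empty, and the CDATA section after it is ordinary text inside <item>, which ElementTree exposes as the tail of the link element.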

>>> content = '''
... <item>
... <title><![CDATA[Title Sample]]></title>
... <link /><![CDATA[http://banhada.kr/?cateCode=09&viewCode=S0941580]]>
... <time_start>2011-10-10 09:00:00</time_start>
... <time_end>2011-10-17 09:00:00</time_end>
... <price_original>35000</price_original>
... <price_now>20000</price_now>
... </item>'''
>>> import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
>>> foo = et.XML(content)
>>> for e in foo:
...     print(e.tag, e.text, repr(e.tail))
...
title Title Sample '\n'
link None 'http://banhada.kr/?cateCode=09&viewCode=S0941580\n'
time_start 2011-10-10 09:00:00 '\n'
time_end 2011-10-17 09:00:00 '\n'
price_original 35000 '\n'
price_now 20000 '\n'
>>>
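
If you want the values in a dictionary rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: the name item_to_dict and the rule "use the tail when the element has no text of its own" are my assumptions for this particular feed, not something ElementTree provides.

```python
import xml.etree.ElementTree as ET

ITEM_XML = """<item>
<title><![CDATA[Title Sample]]></title>
<link /><![CDATA[http://banhada.kr/?cateCode=09&viewCode=S0941580]]>
<time_start>2011-10-10 09:00:00</time_start>
<time_end>2011-10-17 09:00:00</time_end>
<price_original>35000</price_original>
<price_now>20000</price_now>
</item>"""


def item_to_dict(xml_text):
    """Hypothetical helper: map each child tag of <item> to its value.

    Assumption: when an element is empty (like <link />), its value is the
    text that follows it, which ElementTree stores in the .tail attribute.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        value = child.text
        if value is None or not value.strip():
            # Empty element: fall back to the text right after the tag.
            value = child.tail
        cleaned = (value or "").strip()
        fields[child.tag] = cleaned or None
    return fields


record = item_to_dict(ITEM_XML)
print(record["link"])
```

Note that every value comes back as a string, so prices and timestamps still need converting (for example with int() or datetime.strptime) if you want typed data.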
