I have a project where I am outputting content from a CMS into XML. I don't fully control the content in the CMS, and we now have a problem: some of the content is not well-formed XML.
<Block PageGuid="xxx" PageId="1234" PageType="block" PageName="blockpage" PageUrl="/en/New-Folder7/New-Folder8/" CreateBlock="false">
<Properties>
<Property PropertyName="EmbedCode" Ignore="false" DefaultLanguageChanged="true" TranslatedChanged="true">
<DefaultLanguage><DIV id=TA_sss class=TA_sss><UL id=sdfsdfsdfsdf class="TA_links xx"><LI id=sdfsdfsf class=sdfsfsf><A href="http://www.tripadvisor.co.uk/">xxxxxxxxx</A></LI></UL></DIV><SCRIPT src="http://www.jscache.com/"></SCRIPT></DefaultLanguage>
<Translation><DIV id=TA_sss class=TA_sss><UL id=xxxx class='TA_links xxx'><LI id=xxxx class=xxxx><A href='http://www.tripadvisor.co.uk/'>xxxxxxxxx</A></LI></UL></DIV><SCRIPT src='http://www.jscache.com/'></SCRIPT></Translation>
<PreviousValues>
<PreviousDefaultText></PreviousDefaultText>
<PreviousTranslationText></PreviousTranslationText>
</PreviousValues>
</Property>
</Properties>
</Block>
See the XML above. I need to find every attribute value that is missing its quotes and add them,
e.g. id=TA_sss and class=TA_sss in <DIV id=TA_sss class=TA_sss>
and every attribute value in single quotes and replace those with double quotes,
e.g. <A href='http://www.tripadvisor.co.uk/'>
I have the entire XML in a string, so I am hoping there is a Regex I can use to do this?
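My solution (using SgmlReader instead of a regex):
using System.IO;
using System.Xml;

// SgmlReader parses lenient HTML/SGML and exposes the result as an XmlReader.
var reader = new StringReader(xml);
var sgmlReader = new Sgml.SgmlReader
{
    DocType = "HTML",
    WhitespaceHandling = WhitespaceHandling.All,
    CaseFolding = Sgml.CaseFolding.ToLower, // names are folded to lower case
    InputStream = reader
};
// XmlResolver = null stops the document from resolving external resources.
var doc = new XmlDocument { PreserveWhitespace = true, XmlResolver = null };
doc.Load(sgmlReader);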
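For anyone who still wants the regex route the question asks about, here is a rough sketch rather than a robust fix. The class name AttributeQuoteFixer and the method Fix are made up for illustration. It assumes every tag opens with a letter and closes with ">", and it will also rewrite tag-like text inside comments or CDATA sections. A bare value that runs straight into a self-closing "/>" (for example class=x/>) would swallow the slash. A real parser such as SgmlReader is the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeQuoteFixer
{
    // One opening tag: '<', a name, then attributes up to '>'.
    // A '>' is only allowed inside a quoted value.
    private static readonly Regex Tag = new Regex(
        @"<[A-Za-z][^\s<>/]*(?:""[^""]*""|'[^']*'|[^'"">])*>",
        RegexOptions.Compiled);

    // One name=value pair whose value is double-quoted, single-quoted, or bare.
    private static readonly Regex Attr = new Regex(
        @"(?<name>[^\s""'<>/=]+)\s*=\s*(?:""(?<dq>[^""]*)""|'(?<sq>[^']*)'|(?<bare>[^\s""'<>]+))",
        RegexOptions.Compiled);

    public static string Fix(string xml)
    {
        // Only touch text inside opening tags, never element content.
        return Tag.Replace(xml, tag => Attr.Replace(tag.Value, a =>
        {
            // Values that are already double-quoted are left untouched.
            if (a.Groups["dq"].Success) return a.Value;

            string value = a.Groups["sq"].Success
                ? a.Groups["sq"].Value
                : a.Groups["bare"].Value;

            // A single-quoted value may contain a literal double quote; escape it.
            return a.Groups["name"].Value + "=\"" + value.Replace("\"", "&quot;") + "\"";
        }));
    }
}
```

Calling AttributeQuoteFixer.Fix(xml) on the string before loading it into an XmlDocument should turn id=TA_sss into id="TA_sss" and href='...' into href="...", while leaving attributes that already use double quotes alone.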