I'm trying to write a regular expression using the PCRE library in PHP.
I need a regex to match only &
, >
and <
chars that exist within string part of any XML node and not the tag declaration themselves.
Input XML:
<pnode>
<cnode>This string contains > and < and & chars.</cnode>
</pnode>
The idea is to to a search and replace these chars and convert them to XML entities equivalents.
If I was to convert the entire XML to entities the XML would look like this:
Entire XML converted to entities
<pnode>
<cnode>This string contains > and < and & chars.</cnode>
</pnode>
I need it to look like this:
Correct XML
<pnode>
<cnode>This string contains > and < and & chars.</cnode>
</pnode>
I have tried to write a regular expression to match these chars using look-ahaead but I don't know enough to get this to work. My attempt (currently only attempting to match > symbols):
/>(?=[^<]*<)/g
Just to make it clear the XML I'm trying to fix comes from a 3rd party and they seem unable to fix it their end hence my attempt to fix it.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…