Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
716 views
in Technique[技术] by (71.8m points)

html - DOM php delete all tags by tag name

I'm having a problem with a script I've been using for some time and it worked for me until I hit this problem.

I have a script with which I'd like to delete all p html tags from a html source code. The script does work partly because it only removes some of the p tags, but leaves some out.

I don't understand why it does that.

$doc = new DOMDocument();

$a = <<<FAIL
<html><body>
<div style="clear:both"></div>
<p class="articletitle">hoo</p>
<p class="articletext">hmmm</p>
<p class="articletext">hmmmm</p>
<p align="center"></p>
</body></html>
FAIL;

$doc->loadHTML($a);
$list = $doc->getElementsByTagName("p");

foreach ($list as $l) {
$l->parentNode->removeChild($l);
$c++;
}
echo $doc->saveHTML() . $c;

the script returns

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div style="clear:both"></div>

<p class="articletext">hmmm</p>

<p align="center"></p>

leaving out two p tags...

Can you please help me to find out why it's skipping some tags

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Try this way:

$doc->loadHTML($a);
$list = $doc->getElementsByTagName("p");

while ($list->length > 0) {
    $p = $list->item(0);
    $p->parentNode->removeChild($p);
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...