Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

php - Use DOM and XPath to remove a node from a sitemap file

I am trying to develop a function that removes certain URL nodes from my sitemap file. Here is what I have so far.

$xpath = new DOMXpath($DOMfile);
$elements = $xpath->query("/urlset/url/loc[contains(.,'$pageUrl')]");
echo count($elements);
foreach($elements as $element){
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

This outputs "111111". I don't understand why nothing inside the foreach loop gets echoed when the count of $elements is 1.
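A likely explanation, assuming the sitemap declares the standard namespace http://www.sitemaps.org/schemas/sitemap/0.9 (as the answer below points out) and the code ran on PHP older than 7.2: the unprefixed path matches nothing, and count() on a DOMNodeList (which only became Countable in PHP 7.2) returns 1 for any object. Each call would then print a single "1" while the loop body never runs. This minimal sketch uses a hypothetical two-entry sitemap and the ->length property, which reports the real number of matches on any PHP version.

```php
<?php
// Hypothetical two-entry sitemap using the standard sitemap namespace.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/about</loc></url>
  <url><loc>http://example.com/contact</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Unprefixed names only match elements in no namespace, so this finds nothing.
$plain = $xpath->query("/urlset/url/loc[contains(., 'about')]");

// Registering a prefix for the sitemap namespace makes the same path match.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$prefixed = $xpath->query("/sm:urlset/sm:url/sm:loc[contains(., 'about')]");

// ->length is the reliable way to count results.
echo $plain->length, "\n";    // 0
echo $prefixed->length, "\n"; // 1
foreach ($prefixed as $loc) {
    // A DOMElement cannot be echoed directly; read its text via nodeValue.
    echo $loc->nodeValue, "\n";
}
```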

Up until now, I've been doing this:

$urls = $dom->getElementsByTagName( "url" );
foreach( $urls as $url ){
    $locs = $url->getElementsByTagName( "loc" );
    $loc = $locs->item(0)->nodeValue;
    echo $loc;
    if($loc == $fullPageUrl){
        $removeUrl = $dom->removeChild($url);
    }
}

This would work fine if my sitemap weren't so big, but right now it times out, so I'm hoping XPath queries will be faster.
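Independent of speed, that loop has two likely problems: $dom->removeChild($url) asks the document to remove a node that is not its direct child (each <url> is a child of <urlset>), which raises a DOMException ("Not Found Error"), and removing nodes from a getElementsByTagName() list while iterating over it can skip entries. A minimal sketch of the same non-XPath approach, assuming a hypothetical in-memory sitemap: collect the matches first, then detach each one from its actual parent.

```php
<?php
// Hypothetical sitemap; the URL to remove is an assumption for the demo.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
  <url><loc>http://example.com/a</loc></url>
</urlset>
XML;
$fullPageUrl = 'http://example.com/a';

$dom = new DOMDocument();
$dom->loadXML($xml);

// First pass: collect matching <url> elements without modifying the tree.
$toRemove = [];
foreach ($dom->getElementsByTagName('url') as $url) {
    $loc = $url->getElementsByTagName('loc')->item(0);
    if ($loc !== null && trim($loc->nodeValue) === $fullPageUrl) {
        $toRemove[] = $url;
    }
}

// Second pass: remove each node via its real parent (<urlset>), not via $dom.
foreach ($toRemove as $url) {
    $url->parentNode->removeChild($url);
}

echo $dom->getElementsByTagName('url')->length, "\n"; // 1
```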

After Gordon's comment, I tried:

$xpath = new DOMXpath($DOMfile);
$query = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
foreach($xpath->query($query) as $element) {
    //this is where I want to delete the URL
    echo $element;
    echo "here".$element->nodeValue;
}

And it's not returning anything.
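One concrete cause is the format specifier: %d formats its argument as an integer, so a URL string becomes 0 and the query ends up comparing <loc> against "0". %s inserts the string unchanged. (Even with %s, the unprefixed names still won't match a sitemap that declares the default namespace.) A minimal sketch with a hypothetical URL:

```php
<?php
$pageUrl = 'http://example.com/about'; // hypothetical value

// %d converts the argument to an integer; a non-numeric URL string becomes 0.
$wrong = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);

// %s inserts the string as-is.
$right = sprintf('/urlset/url[./loc = "%s"]', $pageUrl);

echo $wrong, "\n"; // /urlset/url[./loc = "0"]
echo $right, "\n"; // /urlset/url[./loc = "http://example.com/about"]
```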

I went a step further and tried it on codepad, based on the code from the other post that was mentioned:

<?php
error_reporting(-1);
$xml = <<< XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$query = sprintf('/url/[loc = $id]');
foreach($xpath->query($query) as $record) {
    $record->parentNode->removeChild($record);
}
echo $dom->saveXml();

I'm getting "Warning: DOMXPath::query(): Invalid expression" on the foreach line. Thanks for the other comment about urlset; I'll be sure to include the double slashes in my code, but when I tried it, it returned nothing.
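The expression is invalid because a predicate must follow a location step, and "/url/[...]" puts it after a bare slash. Also, $id sits inside single quotes, so PHP never substitutes it, and an XPath string value needs its own quotes. A minimal corrected sketch of that codepad test, assuming the goal is to remove the matching <loc> (the only elements in this sample that carry the value) and that $id contains no double quote:

```php
<?php
error_reporting(-1);
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<url>
  <loc>professional_services</loc>
  <loc>5professional_services</loc>
  <loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// The predicate is attached to the loc step, and %s injects $id as a
// double-quoted XPath string literal.
$query = sprintf('/url/loc[. = "%s"]', $id);

foreach ($xpath->query($query) as $record) {
    $record->parentNode->removeChild($record);
}
echo $dom->saveXML();
```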


1 Reply


The XML of a sitemap should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
...
</url>
<url>
<loc></loc>
...
</url>
...
</urlset>

Since it has a namespace, the query is a little more complicated than in my previous answer:

$xpath = new DOMXpath($DOMfile);
// Register the sitemap namespace under a prefix of your choice
$xpath->registerNamespace('sm', "http://www.sitemaps.org/schemas/sitemap/0.9");
// This query should match the <url> whose <loc> equals $pageUrl
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "'.$pageUrl.'"]');

foreach($elements as $element){
    // This is a hint from the manual comments
    $element->parentNode->removeChild($element);
}
echo $DOMfile->saveXML();
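For anyone who wants to run this in isolation, here is a self-contained sketch of the same namespaced removal wrapped in a function. The name removeSitemapUrl and the sample sitemap are hypothetical; the function assumes $pageUrl contains no double quote, because XPath 1.0 string literals have no escape sequence.

```php
<?php
// Hypothetical helper: removes matching <url> entries, returns how many.
function removeSitemapUrl(DOMDocument $dom, string $pageUrl): int
{
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    // Assumes $pageUrl has no double quote (XPath 1.0 cannot escape it).
    $nodes = $xpath->query('/sm:urlset/sm:url[sm:loc = "' . $pageUrl . '"]');
    foreach ($nodes as $node) {
        $node->parentNode->removeChild($node);
    }
    return $nodes->length;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/professional_services</loc></url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$removed = removeSitemapUrl($dom, 'http://example.com/professional_services');
echo $removed, "\n"; // 1
echo $dom->saveXML();
```

Afterwards the updated document can be written back with $dom->save() if the sitemap lives on disk.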

I'm writing this from memory just before going to bed. If it doesn't work, I'll test it tomorrow morning. (And yes, I'm aware that this could bring some downvotes.)

If you don't have a namespace (you should, but it's not mandatory, sigh):

$elements = $xpath->query('/urlset/url[loc = "'.$pageUrl.'"]');
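If the same code has to handle both kinds of sitemap, one option is to look at the root element's namespaceURI and pick the matching query. A minimal sketch; the helper name sitemapUrlQuery is hypothetical, and $pageUrl is again assumed to contain no double quote.

```php
<?php
// Hypothetical helper: builds the removal query for either sitemap flavour.
function sitemapUrlQuery(DOMXPath $xpath, DOMDocument $dom, string $pageUrl): string
{
    $ns = $dom->documentElement->namespaceURI;
    if ($ns !== null && $ns !== '') {
        // Namespaced sitemap: bind the document's namespace to a prefix.
        $xpath->registerNamespace('sm', $ns);
        return '/sm:urlset/sm:url[sm:loc = "' . $pageUrl . '"]';
    }
    // No namespace: plain element names match directly.
    return '/urlset/url[loc = "' . $pageUrl . '"]';
}

$dom = new DOMDocument();
$dom->loadXML('<urlset><url><loc>a</loc></url><url><loc>b</loc></url></urlset>');
$xpath = new DOMXPath($dom);
foreach ($xpath->query(sitemapUrlQuery($xpath, $dom, 'a')) as $node) {
    $node->parentNode->removeChild($node);
}
echo $dom->saveXML();
```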

You can see a concrete working example here: http://codepad.org/vuGl1MAc

