Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - Remove HTML node from HtmlDocument: HtmlAgilityPack

In my code, I want to remove every img tag that has no src value. I am using HtmlAgilityPack's HtmlDocument object. I find each img without a src value and try to remove it, but I get the error "Collection was modified; enumeration operation may not execute." Can anyone help me with this? The code I am using is:

foreach (HtmlNode node in doc.DocumentNode.DescendantNodes())
{
    if (node.Name.ToLower() == "img")
    {
        string src = node.Attributes["src"].Value;
        if (string.IsNullOrEmpty(src))
        {
            node.ParentNode.RemoveChild(node, false);
        }
    }
    else
    {
        ..........// I am performing other operations on the document
    }
}


1 Reply


It seems you're modifying the collection during enumeration: HtmlNode.RemoveChild changes the parent's child nodes while the foreach loop over DescendantNodes() is still walking them.
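The same failure can be reproduced without HtmlAgilityPack. The sketch below is only an illustration with a plain List<string>; the class and method names are made up for this example, and it assumes the HtmlAgilityPack node collection behaves like a standard .NET list in this respect.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Hypothetical helper: returns true if removing items inside
    // a foreach over the same list throws InvalidOperationException.
    public static bool RemoveInsideForeachThrows()
    {
        var tags = new List<string> { "img", "p", "img" };
        try
        {
            foreach (var tag in tags)
            {
                if (tag == "img")
                    tags.Remove(tag); // mutates the list the enumerator is walking
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // This is the "Collection was modified" error from the question.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RemoveInsideForeachThrows());
    }
}
```

The enumerator notices on its next step that the list changed underneath it and refuses to continue.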

To fix this, copy the nodes to a separate list or array first, e.g. by calling Enumerable.ToList<T>() or Enumerable.ToArray<T>(), and then remove them while iterating over that copy.

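// Requires: using System.Linq;
// Select every img whose src attribute is missing or blank.
var matches = doc.DocumentNode
    .SelectNodes("//img[not(string-length(normalize-space(@src)))]");

// SelectNodes returns null (not an empty collection) when nothing matches.
if (matches != null)
{
    // ToList() takes a snapshot, so removing nodes is safe inside the loop.
    foreach (var node in matches.ToList())
        node.Remove();
}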

If I'm right, the problem will disappear.
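If you would rather keep a loop close to the one in the question, a minimal sketch could look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; ImgCleaner and RemoveImagesWithoutSrc are hypothetical names used only for illustration, and the sample HTML is made up. It also reads src through GetAttributeValue, because Attributes["src"] is null when the attribute is absent.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: NuGet package HtmlAgilityPack is installed

public static class ImgCleaner
{
    // Hypothetical helper: removes <img> elements whose src is missing or blank.
    public static void RemoveImagesWithoutSrc(HtmlDocument doc)
    {
        // ToList() takes a snapshot, so removing nodes no longer
        // touches the sequence that foreach is walking.
        var images = doc.DocumentNode.Descendants("img").ToList();

        foreach (var img in images)
        {
            // GetAttributeValue returns the given default when the attribute is absent.
            var src = img.GetAttributeValue("src", string.Empty);
            if (string.IsNullOrWhiteSpace(src))
                img.Remove();
        }
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Made-up sample markup for the demonstration.
        doc.LoadHtml("<div><img><img src=\"\"><img src=\"a.png\"><p>text</p></div>");
        RemoveImagesWithoutSrc(doc);
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Any other per-node work from the original loop can run in a separate pass, or over a snapshot of DescendantNodes() taken the same way.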

