Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - How to select nodes of type HtmlNodeType.Comment using HtmlAgilityPack

I want to remove things like the following from HTML:

<!--[if gte mso 9]>
...
<![endif]-->


<!--[if gte mso 10]>
...
<![endif]-->

How can I do this in C# using HtmlAgilityPack?

For normal tags I'm using:

static void RemoveTag(HtmlNode node, string tag)
{
    var nodeCollection = node.SelectNodes("//" + tag);
    if (nodeCollection != null)
    {
        foreach (HtmlNode nodeTag in nodeCollection)
        {
            nodeTag.Remove();
        }
    }
}


1 Reply

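        // Recursively removes every comment node at or below the given node.
        // Requires: using System.Linq; using HtmlAgilityPack;
        public static void RemoveComments(HtmlNode node)
        {
            // ToArray() copies the child list so nodes can be removed while iterating.
            foreach (var n in node.ChildNodes.ToArray())
                RemoveComments(n);
            if (node.NodeType == HtmlNodeType.Comment)
                node.Remove();
        }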


        static void Main(string[] args)
        {
            var doc = new HtmlDocument();
            string html = @"<!--[if gte mso 9]>
...
<![endif]-->

<body>
    <span>
        <!-- comment -->
    </span>
    <!-- another comment -->
</body>

<!--[if gte mso 10]>
...
<![endif]-->";
            doc.LoadHtml(html);

            RemoveComments(doc.DocumentNode);
            Console.WriteLine(doc.DocumentNode.OuterHtml);
            Console.ReadLine();

        }

Or a fun little LINQ-style version:

public static IEnumerable<HtmlNode> Walk(HtmlNode node)
{
    yield return node;
    foreach (var child in node.ChildNodes)
        foreach (var x in Walk(child))
            yield return x;
}

...

foreach (var n in Walk(doc.DocumentNode).OfType<HtmlCommentNode>().ToArray())
    n.Remove();
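
As far as I know, HtmlAgilityPack's `HtmlNode` also has a built-in `Descendants()` method that enumerates every node below a given node, so the hand-written `Walk` helper may not be needed. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the names `CommentStripper` and `StripComments` are made up for illustration:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class CommentStripper
{
    // Hypothetical helper: returns the HTML with every comment node removed.
    public static string StripComments(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ToList() snapshots the matches first, so removing nodes
        // does not disturb the enumeration of the live tree.
        var comments = doc.DocumentNode
            .Descendants()
            .OfType<HtmlCommentNode>()
            .ToList();

        foreach (var comment in comments)
            comment.Remove();

        return doc.DocumentNode.OuterHtml;
    }
}
```

Usage would look like `var cleaned = CommentStripper.StripComments(html);`.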

Even easier (I forgot we can use XPath to find comment nodes):

    var doc = new HtmlDocument();
    string html = @"
<!--[if gte mso 9]>
...
<![endif]-->

<body>
<span>
<!-- comment -->
</span>
<!-- another comment -->
</body>

<!--[if gte mso 10]>
...
<![endif]-->";
    doc.LoadHtml(html);
    foreach (var n in doc.DocumentNode.SelectNodes("//comment()") ?? new HtmlNodeCollection(doc.DocumentNode))
        n.Remove();
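
All of the versions above remove every comment, while the question only asks about conditional comments such as `<!--[if gte mso 9]>`. Here is a hedged sketch of a narrower filter. It assumes that a comment node's `OuterHtml` includes the leading `<!--` delimiter, so a `<!--[if` prefix check picks out conditional comments; `ConditionalCommentRemover` and `RemoveConditionalComments` are names made up here. As far as I know, HtmlAgilityPack also represents `<!DOCTYPE ...>` as a comment node, so filtering like this should leave the doctype alone too.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ConditionalCommentRemover
{
    // Hypothetical helper: removes only comments that start with "<!--[if".
    public static void RemoveConditionalComments(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null)
            return;

        // Assumption: OuterHtml of a comment node is the raw comment text,
        // including the "<!--" and "-->" delimiters.
        var conditional = comments
            .Where(n => n.OuterHtml.TrimStart()
                .StartsWith("<!--[if", StringComparison.OrdinalIgnoreCase))
            .ToList();

        foreach (var n in conditional)
            n.Remove();
    }
}
```

Call it with `ConditionalCommentRemover.RemoveConditionalComments(doc);` after `doc.LoadHtml(html);`. Ordinary comments such as `<!-- comment -->` are left in place.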
