How to select nodes of type HtmlNodeType.Comment using HTMLAgilityPack
I want to remove comments such as <!-- ... --> from my HTML.
How can I do this in C# using HTMLAgilityPack?
I am currently using

    static void RemoveTag(HtmlNode node, string tag)
    {
        // SelectNodes returns null when nothing matches
        var nodeCollection = node.SelectNodes("//" + tag);
        if (nodeCollection != null)
        {
            foreach (HtmlNode nodeTag in nodeCollection)
            {
                nodeTag.Remove();
            }
        }
    }

for ordinary tags.
    public static void RemoveComments(HtmlNode node)
    {
        // Copy the children first (ToArray needs System.Linq) so that
        // removing nodes does not disturb the enumeration
        foreach (var n in node.ChildNodes.ToArray())
            RemoveComments(n);
        if (node.NodeType == HtmlNodeType.Comment)
            node.Remove();
    }

    static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        string html = @" ";
        doc.LoadHtml(html);
        RemoveComments(doc.DocumentNode);
        Console.WriteLine(doc.DocumentNode.OuterHtml);
        Console.ReadLine();
    }
Or, in a fun little LINQ style:
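    public static IEnumerable<HtmlNode> Walk(HtmlNode node)
    {
        // Depth-first traversal: the node itself, then all of its descendants
        yield return node;
        foreach (var child in node.ChildNodes)
            foreach (var x in Walk(child))
                yield return x;
    }

    ...

    // ToArray snapshots the matches before any node is removed
    foreach (var n in Walk(doc.DocumentNode).OfType<HtmlCommentNode>().ToArray())
        n.Remove();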
Even easier (I forgot we can use XPath to find comment nodes):
    var doc = new HtmlDocument();
    string html = @" ";
    doc.LoadHtml(html);
    // SelectNodes returns null when nothing matches, so fall back to an empty collection
    foreach (var n in doc.DocumentNode.SelectNodes("//comment()")
                      ?? new HtmlNodeCollection(doc.DocumentNode))
        n.Remove();
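For anyone who wants the XPath version wrapped up as something reusable, here is a minimal sketch. The class name `CommentStripper`, the method `StripComments`, and the `keepDoctype` parameter are made up for illustration and are not part of HtmlAgilityPack. It also assumes, as has been commonly reported, that HtmlAgilityPack exposes `<!DOCTYPE ...>` as a comment node, so the sketch skips it by default; check that against the version you are using.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class CommentStripper
{
    // Hypothetical helper: returns the markup with comment nodes removed.
    public static string StripComments(string html, bool keepDoctype = true)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null)
            return doc.DocumentNode.OuterHtml;

        // Snapshot the matches before removing anything.
        foreach (var n in comments.ToList())
        {
            // Assumption: a DOCTYPE declaration shows up here as a comment node.
            bool isDoctype = n.OuterHtml.StartsWith(
                "<!DOCTYPE", StringComparison.OrdinalIgnoreCase);
            if (keepDoctype && isDoctype)
                continue;
            n.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `CommentStripper.StripComments("<div><!-- note --><p>text</p></div>")` should give back the markup without the comment while leaving the `<p>` element alone.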
@Mark, I combined your third example to make this, FYI:
    public static string CleanUpRteOutput(this string s)
    {
        if (s != null)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(s);
            RemoveTag(doc, "script");
            RemoveTag(doc, "link");
            RemoveTag(doc, "style");
            RemoveTag(doc, "meta");
            // "comment()" is the XPath node test for comment nodes;
            // plain "comment" would only match <comment> elements
            RemoveTag(doc, "comment()");
            ...
And the RemoveTag function:
    static void RemoveTag(HtmlAgilityPack.HtmlDocument doc, string tag)
    {
        // Fall back to an empty collection when SelectNodes returns null
        foreach (var n in doc.DocumentNode.SelectNodes("//" + tag)
                          ?? new HtmlAgilityPack.HtmlNodeCollection(doc.DocumentNode))
            n.Remove();
    }