Html Agility Pack: remove an element, but not its innerHtml

I can easily remove an element with node.Remove():

    HtmlDocument html = new HtmlDocument();
    html.Load(Server.MapPath(@"~\Site\themes\default\index.cshtml"));
    foreach (var item in html.DocumentNode.SelectNodes("//removeMe"))
    {
        item.Remove();
    }

But this removes the innerHtml as well. What if I only want to remove the tag and keep the innerHtml?

Example:

  

Any help would be greatly appreciated :)

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);
    var node = doc.DocumentNode.SelectSingleNode("//removeme");
    // The second argument (keepGrandChildren) keeps the removed node's children.
    node.ParentNode.RemoveChild(node, true);

This should work:

    // HAP exposes element names in lower case, so the XPath uses "removeme".
    foreach (var item in doc.DocumentNode.SelectNodes("//removeme"))
    {
        if (item.PreviousSibling == null)
        {
            // First element: prepend its children to the parent, in reverse,
            // so they keep their original order. Assigning to
            // ParentNode.InnerHtml instead would re-parse the parent and
            // detach item. Reverse() needs "using System.Linq;".
            foreach (HtmlNode node in item.ChildNodes.Reverse())
            {
                item.ParentNode.PrependChild(node);
            }
        }
        else
        {
            // There is an element before item: add the children after it
            foreach (HtmlNode node in item.ChildNodes)
            {
                item.PreviousSibling.ParentNode.InsertAfter(node, item.PreviousSibling);
            }
        }
        item.Remove();
    }

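The problem with the bool KeepGrandChildren implementation affects anyone whose element to be removed contains text directly. If the removeme tag has text in it, that text is removed as well. For example

    <removeme>text<p>more text</p></removeme>

will become

    <p>more text</p>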

Try this:

    private static void RemoveElementKeepText(HtmlNode node)
    {
        //node.ParentNode.RemoveChild(node, true);
        HtmlNode parent = node.ParentNode;
        HtmlNode prev = node.PreviousSibling;
        HtmlNode next = node.NextSibling;

        foreach (HtmlNode child in node.ChildNodes)
        {
            if (prev != null)
            {
                parent.InsertAfter(child, prev);
                // Advance the anchor so the next child lands after this one;
                // otherwise the children would end up in reverse order.
                prev = child;
            }
            else if (next != null)
                parent.InsertBefore(child, next);
            else
                parent.AppendChild(child);
        }
        node.Remove();
    }

There is a simple way:

    // Swap each <br> for a placeholder, read the text, then restore the <br> tags.
    element.InnerHtml = element.InnerHtml.Replace("<br>", "{1}");
    var innerTextWithBR = element.InnerText.Replace("{1}", "<br>");

Maybe this is what you want?

    foreach (HtmlNode node in html.DocumentNode.SelectNodes("//removeme"))
    {
        HtmlNodeCollection children = node.ChildNodes; // get <removeme>'s children
        HtmlNode parent = node.ParentNode;             // get <removeme>'s parent
        node.Remove();                                 // remove <removeme>
        parent.AppendChildren(children);               // append the children to the parent
    }

Edit: LB's answer is much cleaner. Go with his!

How about this?

    var removedNodes = document.DocumentNode.SelectNodes("//removeme");
    if (removedNodes != null)
    {
        foreach (var rn in removedNodes)
        {
            // Replace the element with a text node that holds its inner markup.
            HtmlTextNode innerNodes = document.CreateTextNode(rn.InnerHtml);
            rn.ParentNode.ReplaceChild(innerNodes, rn);
        }
    }

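Adding my two cents, because none of these approaches handled what I wanted: removing a given set of tags (such as p and div) and handling nesting correctly while preserving the inner tags.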

Here is what I came up with. It passes all of my unit tests, which cover most of the cases I considered:

    var htmlDoc = new HtmlDocument();

    // load html
    htmlDoc.LoadHtml(html);

    // find formatting tags (reversed, so nested tags are handled innermost first)
    var tags = (from tag in htmlDoc.DocumentNode.Descendants()
                where tagNames.Contains(tag.Name)
                select tag).Reverse();

    foreach (var item in tags)
    {
        if (item.PreviousSibling == null)
        {
            // Prepend children to parent node in reverse order
            foreach (HtmlNode node in item.ChildNodes.Reverse())
            {
                item.ParentNode.PrependChild(node);
            }
        }
        else
        {
            // Insert children after previous sibling
            foreach (HtmlNode node in item.ChildNodes)
            {
                item.ParentNode.InsertAfter(node, item.PreviousSibling);
            }
        }

        // remove from tree
        item.Remove();
    }

    // return transformed doc
    html = htmlDoc.DocumentNode.WriteContentTo().Trim();

Here are the cases I used for testing:

 [TestMethod] public void StripTags_CanStripSingleTag() { var input = "

tag

"; var expected = "tag"; var actual = HtmlUtilities.StripTags(input, "p"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripNestedTag() { var input = "

tag

inner

"; var expected = "tag inner"; var actual = HtmlUtilities.StripTags(input, "p"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripTwoTopLevelTags() { var input = "

tag

block
"; var expected = "tag block"; var actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripMultipleNestedTags_2LevelsDeep() { var input = "

tag

inner

"; var expected = "tag inner"; var actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripMultipleNestedTags_3LevelsDeep() { var input = "

tag

inner

superinner

"; var expected = "tag inner superinner"; var actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripTwoTopLevelMultipleNestedTags_3LevelsDeep() { var input = "

tag

inner

superinner

inner

toplevel
"; var expected = "tag inner superinner inner toplevel"; var actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_IgnoresTagsThatArentSpecified() { var input = "

tag

"; var expected = "tag inner superinner"; var actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); input = "

tag

inner

"; expected = "tag inner"; actual = HtmlUtilities.StripTags(input, "p", "div"); Assert.AreEqual(expected, actual); } [TestMethod] public void StripTags_CanStripSelfClosingAndUnclosedTagsLikeBr() { var input = "

tag



"; var expected = "tag"; var actual = HtmlUtilities.StripTags(input, "p", "br"); Assert.AreEqual(expected, actual); }

It may not handle everything, but it works for my needs.

Normally the correct expression would be node.ParentNode.RemoveChild(node, true).

Because of an ordering bug in HtmlNode.RemoveChild() (http://htmlagilitypack.codeplex.com/discussions/79587), I created a similar method. Sorry, it is in VB. If anyone wants a translation, I'll write one.

    'The HTML Agility Pack (1.4.9) includes the HtmlNode.RemoveChild() method but it has an ordering bug with preserving child nodes.
    'The below implementation orders children correctly.
    Private Shared Sub RemoveNode(node As HtmlAgilityPack.HtmlNode, keepChildren As Boolean)
        Dim parent = node.ParentNode
        If keepChildren Then
            For i = node.ChildNodes.Count - 1 To 0 Step -1
                parent.InsertAfter(node.ChildNodes(i), node)
            Next
        End If
        node.Remove()
    End Sub
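For readers who would rather not read VB, here is a rough C# rendering of the same routine. It is only a sketch and not the answerer's own translation: the class name HtmlNodeHelpers is hypothetical, and it assumes, as the VB version does, that HtmlNode.InsertAfter(newChild, refChild) places newChild directly after refChild in the parent.

```csharp
using HtmlAgilityPack;

// Hypothetical container class; only RemoveNode mirrors the VB routine.
public static class HtmlNodeHelpers
{
    public static void RemoveNode(HtmlNode node, bool keepChildren)
    {
        HtmlNode parent = node.ParentNode;

        if (keepChildren)
        {
            // Walk the children from last to first. Each child is inserted
            // directly after `node`, so the child inserted last (the first
            // child) ends up closest to `node` and document order is kept.
            for (int i = node.ChildNodes.Count - 1; i >= 0; i--)
            {
                parent.InsertAfter(node.ChildNodes[i], node);
            }
        }

        // Finally drop the now-unwanted wrapper element itself.
        node.Remove();
    }
}
```

A caller would select the node first, for example with doc.DocumentNode.SelectSingleNode("//removeme"), check the result for null, and then call HtmlNodeHelpers.RemoveNode(node, true).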

I tested this code with the following test markup:

  outertextbegin 

innertext1

innertext2

outertextend

The output is:

 outertextbegin 

innertext1

innertext2

outertextend

Can you do this with a regular expression, or do you need to use htmlagilitypack?

    string html = "";
    // Strip the opening and closing tags, leaving the content in place.
    html = Regex.Replace(html, "<removeme>", "", RegexOptions.Compiled);
    html = Regex.Replace(html, "</removeme>", "", RegexOptions.Compiled);