HtmlAgilityPack does not remove all HTML tags: how can I solve this efficiently?
I use the following method to strip all HTML from a string:
    public static string StripHtmlTags(string html)
    {
        if (String.IsNullOrEmpty(html)) return "";
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.InnerText;
    }
But it seems to ignore the following tags: […]
So the returned string basically looks like this:
> A hungry thief who stole a rack of pork ribs from a grocery store has been sentenced to spend 50 years in prison. Willie Smith Ward felt the full force of the law after being convicted of the crime in Waco, Texas, on Wednesday. The 43-year-old may feel slightly aggrieved over the severity of the […]
How can I make sure these tags are stripped as well?
Any kind of help is appreciated, thanks.
Try HttpUtility.HtmlDecode:
    // HttpUtility lives in the System.Web namespace.
    public static string StripHtmlTags(string html)
    {
        if (String.IsNullOrEmpty(html)) return "";
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);
        // InnerText leaves HTML entities encoded, so decode them afterwards.
        return HttpUtility.HtmlDecode(doc.DocumentNode.InnerText);
    }
HtmlDecode will convert […] to […].
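To illustrate the decoding step on its own, here is a minimal sketch. It assumes the leftover text consists of HTML character entities, and it uses System.Net.WebUtility.HtmlDecode, which performs the same kind of decoding without a reference to System.Web. The class name HtmlDecodeDemo and the sample input are hypothetical and not taken from the question.

```csharp
using System;
using System.Net;

public static class HtmlDecodeDemo
{
    // Converts HTML character entities (e.g. "&amp;") back into plain characters.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    public static void Main()
    {
        // Hypothetical sample input; the entities in the original question are elided.
        Console.WriteLine(Decode("Fish &amp; chips &lt;b&gt;")); // prints "Fish & chips <b>"
    }
}
```

Note that decoding can turn entities such as `&lt;` back into angle brackets, so the result should be treated as plain text rather than re-parsed as markup.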