What is the fastest way to retrieve all text nodes of an HTMLDocument in C#?

I need to run some logic on every text node of an HTMLDocument. This is how I currently do it:

    HTMLDocument pageContent = (HTMLDocument)_webBrowser2.Document;
    IHTMLElementCollection myCol = pageContent.all;
    foreach (IHTMLDOMNode myElement in myCol)
    {
        foreach (IHTMLDOMNode child in (IHTMLDOMChildrenCollection)myElement.childNodes)
        {
            if (child.nodeType == 3)
            {
                // Do something with the text node!
            }
        }
    }

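Since some of the elements in myCol also have child elements, which are themselves in myCol, I visit some nodes more than once! There must be a better way to do this?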

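It is probably better to iterate over childNodes (the direct descendants) in a recursive function, starting at the top level, something like: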

    // Start from the root <html> element of the mshtml document.
    IHTMLDOMNode htmlNode = (IHTMLDOMNode)pageContent.documentElement;
    ProcessChildNodes(htmlNode);

    private void ProcessChildNodes(IHTMLDOMNode node)
    {
        // Cast childNodes to a collection before enumerating, as the question's code does.
        foreach (IHTMLDOMNode childNode in (IHTMLDOMChildrenCollection)node.childNodes)
        {
            if (childNode.nodeType == 3) // 3 = text node
            {
                // ...
            }
            ProcessChildNodes(childNode);
        }
    }

You can access all text nodes at once using XPath in the HTML Agility Pack.

I think it would look like the following, but I have not tried it.

    using HtmlAgilityPack;

    HtmlDocument htmlDoc = new HtmlDocument();

    // filePath is a path to a file containing the html
    htmlDoc.Load(filePath);

    // SelectNodes returns null when no node matches the XPath expression.
    HtmlNodeCollection coll = htmlDoc.DocumentNode.SelectNodes("//text()");
    if (coll != null)
    {
        foreach (HtmlNode node in coll)
        {
            // do the work for a text node here
        }
    }
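Since the answer above was posted untested, here is a minimal self-contained sketch of the same `//text()` idea. It assumes the HtmlAgilityPack NuGet package is referenced and that the HTML is available as a string rather than taken from the WebBrowser control. The class name `TextNodeDemo`, the method `GetTextNodes`, and the sample markup are hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical helper: returns the trimmed text of every non-blank text node.
    public static List<string> GetTextNodes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//text()" selects every text node in the document in one query.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//text()");
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText)) // decode entities such as &amp;
            .Where(t => !string.IsNullOrWhiteSpace(t))       // skip whitespace-only nodes
            .Select(t => t.Trim())
            .ToList();
    }

    public static void Main()
    {
        // Sample markup, assumed for illustration only.
        const string sampleHtml = "<html><body><p>Hello <b>world</b></p></body></html>";

        foreach (string text in GetTextNodes(sampleHtml))
        {
            Console.WriteLine(text);
        }
    }
}
```

Note that this parses a copy of the markup, so it suits read-only processing. If the text nodes have to be modified in the live page shown by the browser control, the recursive childNodes walk is the approach that operates on the actual DOM.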
