HtmlAgilityPack – Getting data from HTML tables

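My program uses HtmlAgilityPack to fetch an HTML page and store it in a variable, and I am trying to get the two tables inside a specific div (class boardcontainer) from that HTML. My current code searches the whole page for every table and displays them, but when a cell is empty it throws an exception:

"NullReferenceException was unhandled – Object reference not set to an instance of an object."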

A small part of the HTML (in this case, I searched the site for 'Microsoft'):

This is my current code. It just grabs the tables and displays the rows and cells, then throws the exception on null.

    string html = myRequest.GetResponse();
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//table"))
    {
        Console.WriteLine("Found: " + table.Id);
        foreach (HtmlNode row in table.SelectNodes("tr"))
        {
            Console.WriteLine("row");
            foreach (HtmlNode cell in row.SelectNodes("th|td")) // Exception is thrown here
            {
                Console.WriteLine("cell: " + cell.InnerText);
            }
        }
    }

How can I change this to search for a specific div class and extract the tables inside it?

Thanks for reading.

Full HTML:

     


Main Database
Company Name 0870 / 0871 0844 / 0845 01 / 02 / 03 Freephone Other Information
Microsoft 0870 601 0100 0844 800 2400 01954 713950 Customer Support
Straight to agent (no menu)
Also for 0870 6010200
Microsoft 0870 601 0100 0844 800 2400 0118 909 7800 Main UK Switchboard
Ask to be put through to required department
Also for 0870 6010200
Microsoft 0870 601 0100 0844 800 2400 +35314502113 Customer Support
Answers as Microsoft Ireland with same options as UK 08 numbers
Reduce cost using 1899 (or similar)
Also for 0870 6010200
Microsoft 0870 241 1963 0844 800 2400 020 3147 4930 0800 0188354 Product Activation
Home & Business (Volume Licensing)
Also: 0800 018 8364 & +800 2284 8283
Also for 0870 6010100 & 0870 6010200
Microsoft 0870 241 1963 0800 9179016 Volume Licensing
Microsoft 020 3027 6039 0800 7318457 Online Services Support
MSN, Hotmail, Live, Messenger etc
Also: 0800 587 2920
Microsoft 0870 607 0700 0844 800 6006 +35317065353 Ask Partner Hotline
Answers with same options
Reduce cost using 1899 (or similar)
Microsoft 0870 607 0700 0844 800 6006 0800 9173128 Partner Network Regional Service Centre
Help with membership questions and tools, benefits and resource queries
Microsoft 0870 601 0100 0844 800 2400 0800 0324479 Direct Services
Also for 0870 6010200
Microsoft 0870 601 0100 0844 800 2400 +35318831002 0800 0517215 MSDN (Microsoft Developers Network)
When calling +353 reduce cost using 1899 (or similar)
Also for 0870 6010200
Microsoft 0870 601 0100 0844 800 2400 +35318831002 0800 281221 Microsoft Technet
When calling +353 reduce cost using 1899 (or similar)
Also for 0870 6010200
Microsoft XBOX 020 7365 9792 0800 5871102 Customer Support

Unverified Numbers Database
Company Name 0870 / 0871 0844 / 0845 01 / 02 / 03 Freephone Other Information
Microsoft 0870 501 0800 0844 800 8338 0118 909 7994 Premier Support
Microsoft AskPartner (Licensing) 0870 607 0700 020 8784 1000 Switchboard of Sitel UK in Kingston where the AskPartner team is based. Ask for Microsoft Team. 0800 - 1800.
Microsoft Office Live Meeting 020 3024 9260 0800 0854811 EMC Conferencing on Meeting Place


The following XPath lets you search the HTML document for a specific DIV (one with the class "boardcontainer"):

 //div[@class='boardcontainer']/table 

To handle empty rows, just check whether the returned HtmlNodeCollection is null.

Here is a complete example:

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

    // Only tables that are direct children of <div class="boardcontainer">
    foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table"))
    {
        Console.WriteLine("Found: " + table.Id);
        foreach (HtmlNode row in table.SelectNodes("tr"))
        {
            Console.WriteLine("row");

            // SelectNodes returns null when the row has no th/td cells
            HtmlNodeCollection cells = row.SelectNodes("th|td");
            if (cells == null)
            {
                continue;
            }

            foreach (HtmlNode cell in cells)
            {
                Console.WriteLine("cell: " + cell.InnerText);
            }
        }
    }

You should also check whether a table was found at all, and whether each table that was found contains any rows.
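
A rough sketch of what those extra checks could look like, assuming the HtmlAgilityPack NuGet package is referenced. The names BoardTableReader and ReadRows are made up for illustration, and the row lookup uses ".//tr" instead of "tr" (my own choice, not something from the question) so that rows wrapped in a tbody are found as well:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class BoardTableReader
{
    // Returns one list of cell texts per row, for every table that is a
    // direct child of <div class="boardcontainer">. Missing tables, tables
    // without rows, and rows without cells are skipped instead of throwing.
    public static List<List<string>> ReadRows(string html)
    {
        var result = new List<List<string>>();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection tables =
            htmlDoc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table");
        if (tables == null)
        {
            return result; // no matching table on the page
        }

        foreach (HtmlNode table in tables)
        {
            // ".//tr" also matches rows nested inside <tbody>.
            HtmlNodeCollection rows = table.SelectNodes(".//tr");
            if (rows == null)
            {
                continue; // table without rows
            }

            foreach (HtmlNode row in rows)
            {
                HtmlNodeCollection cells = row.SelectNodes("th|td");
                if (cells == null)
                {
                    continue; // row without cells
                }

                var texts = new List<string>();
                foreach (HtmlNode cell in cells)
                {
                    texts.Add(cell.InnerText.Trim());
                }
                result.Add(texts);
            }
        }

        return result;
    }
}
```

Calling BoardTableReader.ReadRows(html) then gives you plain lists of strings to print or process. Note that ".//tr" would also pick up rows of tables nested inside a cell; if the page nests tables, "tr" or "tbody/tr" may be the safer choice.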

Try:

 foreach (HtmlNode table in htmlDoc.DocumentNode.SelectNodes("//div[@class='boardcontainer']/table")) 

It is an XPath expression that matches on an attribute. For details, see here:

http://www.exampledepot.com/egs/org.w3c.dom/xpath_getelembyattr.html
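
One caveat with matching on the attribute: [@class='boardcontainer'] compares the whole attribute value, so a div written as class="boardcontainer wide" would not match. If that turns out to matter for your page, a common XPath 1.0 idiom is to match the class as a space-separated token. The sketch below assumes the HtmlAgilityPack package; ClassXPath, DivWithClass and CountTables are hypothetical names used only for illustration:

```csharp
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ClassXPath
{
    // Builds an XPath 1.0 expression that matches <div> elements whose class
    // attribute contains className as one of its space-separated tokens.
    // Assumes className contains no single quote.
    public static string DivWithClass(string className)
    {
        return "//div[contains(concat(' ', normalize-space(@class), ' '), ' "
               + className + " ')]";
    }

    // Counts the tables that are direct children of a matching div.
    public static int CountTables(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection tables =
            doc.DocumentNode.SelectNodes(DivWithClass(className) + "/table");

        // SelectNodes returns null when nothing matches.
        return tables == null ? 0 : tables.Count;
    }
}
```

Padding both the attribute value and the class name with spaces is what keeps "boardcontainer" from matching a longer name such as "boardcontainer2".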