Why does this HtmlAgilityPack operation fail when matching elements really do exist?

I get "InvalidOperationException: Message = Sequence contains no matching element" with the following code:

    private void buttonLoadHTML_Click(object sender, EventArgs e)
    {
        GetParagraphsListFromHtml(@"C:\PlatypiRUs\fitt.html");
    }

    // This code adapted from Kirk Woll's answer at
    // http://stackoverflow.com/questions/4752840/html-agility-pack-c-sharp-paragraph-parsing-problem
    public List<string> GetParagraphsListFromHtml(string sourceHtml)
    {
        var pars = new List<string>();
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(sourceHtml);
        foreach (var par in doc.DocumentNode
            .DescendantNodes()
            .Single(x => x.Id == "body")
            .DescendantNodes()
            .Where(x => x.Name == "p"))
            // .Where(x => x.Name == "h1" || x.Name == "h2" || x.Name == "h3" || x.Name == "hp" || ))
            // <-- This is what I'd really like to do, but I don't know if this is possible or,
            //     if it is, if the syntax is correct
        {
            pars.Add(par.InnerText);
        }
        // test
        foreach (string s in pars)
        {
            MessageBox.Show(s);
        }
        return pars;
    }

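Why doesn't the code find the paragraphs?

What I really want is to find all of the text (h1..3 or higher), but this is a start.

BTW: the html file I'm testing with does have some paragraph elements.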
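One possible explanation, offered as a sketch rather than a confirmed diagnosis: `LoadHtml` parses its argument as markup, so passing a file path parses the path text itself, and `x.Id == "body"` compares the `id` attribute rather than the tag name, so `Single` finds no match and throws. The version below assumes the argument is a path to a file on disk. The class name `HtmlTextExtractor`, the method name `GetTextListFromFile`, and the tag set are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlTextExtractor
{
    // Hypothetical tag set: the headings and paragraphs mentioned in the question.
    private static readonly HashSet<string> WantedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "h1", "h2", "h3", "p" };

    // Hypothetical helper: htmlFilePath is assumed to be a path to a file on disk.
    public static List<string> GetTextListFromFile(string htmlFilePath)
    {
        var doc = new HtmlDocument();
        doc.Load(htmlFilePath); // Load reads a file; LoadHtml parses a markup string.

        // Match on the tag name, not the id attribute. FirstOrDefault avoids the
        // InvalidOperationException that Single throws when nothing matches.
        HtmlNode body = doc.DocumentNode
            .Descendants("body")
            .FirstOrDefault();

        // Fall back to the whole document if there is no <body> element.
        HtmlNode root = body ?? doc.DocumentNode;

        return root.Descendants()
            .Where(n => WantedTags.Contains(n.Name))
            .Select(n => n.InnerText.Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

Calling `HtmlTextExtractor.GetTextListFromFile(@"C:\PlatypiRUs\fitt.html")` would then return the trimmed text of each matching element, or an empty list if there are none.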

UPDATE

In response to Amy's implied request, and for the sake of full disclosure/ultimate illumination, here is the entire test html file:

    body { background-color: orange; font-family: Verdana, sans-serif; }
    h1 { color: Blue; font-family: 'Segoe UI', Verdana, sans-serif; }
    h2 { color: white; font-family: 'Palatino Linotype', 'Palatino', sans-serif; }
    h3 { display: inline-block; }

Found in the Translation

Bilingual Editions of Classic Literature

Around the World in 80 Days by Jules Verne (French & English Side by Side)

Paperback

Kindle

Gulliver's Travels by Jonathan Swift (English & French Side by Side)

Paperback

Kindle

Journey to the Center of the Earth by Jules Verne (French & English Side by Side)

Paperback

Kindle

Treasure Island by Robert Louis Stevenson (English & Finnish Side by Side)

Paperback

Kindle

Robinson Crusoe by Daniel Defoe (English & French Side by Side)

Paperback

Kindle

Don Quixote by Miguel de Cervantes Saavedra (Spanish & English Side by Side)

Paperback


Volume I

Volume II

Volume III


Kindle


Volume I

Volume II

Volume III


Alice's Adventures in Wonderland by Lewis Carroll (English & German Side by Side)

Coming soon; for now, see:


Paperback

Kindle

Alice's Adventures in Wonderland by Lewis Carroll (English & Italian Side by Side)

Coming soon; for now, see:


Paperback

Kindle

Other Sites:

USA Map-O-Rama

Award-winning Movies, Books, and Music

Garrapata State Park in Big Sur Throughout the Seasons

UPDATE 2

This works (although it's a "live" web page, not an html file saved to disk):

    public List<string> GetParagraphsListFromHtml(string sourceHtml)
    {
        var pars = new List<string>();
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(sourceHtml);
        var getHtmlWeb = new HtmlWeb();
        var document = getHtmlWeb.Load("http://www.montereycountyweekly.com/opinion/letters/article_e333a222-942d-11e3-ba9c-001a4bcf6878.html");
        // http://www.bigsurgarrapata.com/ only returned one paragraph
        // http://usamaporama.azurewebsites.net/ <-- none
        // http://www.awardwinnersonly.com/ <- same as bigsurgarrapata
        var pTags = document.DocumentNode.SelectNodes("//p");
        int counter = 1;
        if (pTags != null)
        {
            foreach (var pTag in pTags)
            {
                pars.Add(pTag.InnerText);
                MessageBox.Show(pTag.InnerText);
                counter++;
            }
        }
        MessageBox.Show("done!");
        return pars;
    }

It turns out this is simple. It isn't complete yet, but the following, inspired by this answer, is enough to get started:

    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    // There are various options, set as needed
    htmlDoc.OptionFixNestedTags = true;
    htmlDoc.Load(@"C:\Platypus\dplatypus.htm");
    if (htmlDoc.DocumentNode != null)
    {
        IEnumerable<HtmlNode> textNodes = htmlDoc.DocumentNode.SelectNodes("//text()");
        // SelectNodes can return null when nothing matches, so check before iterating
        if (textNodes != null)
        {
            foreach (HtmlNode node in textNodes)
            {
                if (!string.IsNullOrWhiteSpace(node.InnerText))
                {
                    MessageBox.Show(node.InnerText);
                }
            }
        }
    }
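If only headings and paragraphs are wanted rather than every text node, the same `SelectNodes` approach can probably be narrowed with an XPath union. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, and the `html` argument is assumed to be markup already read into a string.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HeadingAndParagraphReader
{
    // Hypothetical helper: html is assumed to be markup, not a file path.
    public static List<string> GetHeadingAndParagraphText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The | operator unions the node sets selected by each path.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//h1 | //h2 | //h3 | //p");

        // SelectNodes can return null when nothing matches, so guard before enumerating.
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes
            .Select(n => n.InnerText.Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

Returning an empty list instead of null keeps callers from needing their own null check before a `foreach`.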