How can I parse the InnerText of a tag with HtmlAgilityPack?

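Context:

I am trying to parse the list of cities from this page. I have already managed to reproduce the data request that fills the combobox, which is an Ajax call.

Fiddler request: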

    POST http://www.telelistas.net/AjaxHandler.ashx HTTP/1.1
    Host: www.telelistas.net
    Connection: keep-alive
    Content-Length: 106
    Origin: http://www.telelistas.net
    X-Requested-With: XMLHttpRequest
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
    Content-Type: application/x-www-form-urlencoded; charset=UTF-8
    Accept: */*
    Referer: http://www.telelistas.net/
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
    Cookie: cert_Origin=directo; email=bdc.testes@gmail.com; auto=automatico=0; searchparameters=bottom=0&btnsite=0&email=&uf=rj&origem=0&nome=&pagina=1&codlogradouro=&predio=213&tiquete=0&localidadeendmap=&codbairro=0&pcount=25&estacionamento=0&letra=&top=&entrega=0&pchave=&info=&logradouro=rua+da+lapa&codtitulo=-1&chave=&zoom=&comercial=0&ddd=0&comib=0&btnemail=0&pgresultado=&localidade=&telefone=&manobrista=0&codlocalidade=21000&site=&cartoes=0&atividade=&bairro=&reserva=0&residencial=0; perfil=logged=1&iduser=2563063&email=bdc.testes@gmail.com&usertype=2&specialsearch=3&siteusernome=BigDataCorp&siteuserdatanasc=15/01/1988&siteusersexo=M&siteuserlocalidade=21000&siteuseruf=RJ&siteuserddd=21&siteusertelefone=94118439&siteuserprofissao=4&siteuserrenda=5000&siteuserformacao=4&siteusernovidades=0&siteusernovidadesrevista=&siteusernovidadesparceiros=0&siteusercpf=10541308769&siteuseracesso=brasil&siteusercep=22631000&siteuseridade=24&siteuserparceiro=telelistas&siteuserconhecimento=2&siteuseroperadora=oi&siteuserurlorigem=http://www.telelistas.net/&siteuserdatacadastro=13/12/2012 11:45:00; __utma=70879631.392027796.1355939587.1356014801.1356021821.5; __utmb=70879631.1.10.1356021821; __utmc=70879631; __utmz=70879631.1355939587.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)

    PostData : state=rj&style=busca_interna&selectedCity=21000&clientId=pch_localidade_select&method=GetSearchCitiesNamed
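For reference, here is a minimal sketch of how the captured request could be replayed from C# with `HttpClient`. The class and method names (`CityRequestSketch`, `BuildRequest`, `FetchCitiesHtmlAsync`) are hypothetical, the form fields are copied from the PostData line of the capture, and it is only an assumption that the endpoint answers without the cookies shown in the capture.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class CityRequestSketch
{
    // Builds a POST that mirrors the Fiddler capture (headers trimmed to the
    // ones most likely to matter; cookies are deliberately left out).
    public static HttpRequestMessage BuildRequest(string state, string selectedCity)
    {
        // An array keeps the fields in the same order as the captured PostData.
        var form = new[]
        {
            new KeyValuePair<string, string>("state", state),
            new KeyValuePair<string, string>("style", "busca_interna"),
            new KeyValuePair<string, string>("selectedCity", selectedCity),
            new KeyValuePair<string, string>("clientId", "pch_localidade_select"),
            new KeyValuePair<string, string>("method", "GetSearchCitiesNamed")
        };

        var request = new HttpRequestMessage(
            HttpMethod.Post, "http://www.telelistas.net/AjaxHandler.ashx");
        request.Headers.Add("X-Requested-With", "XMLHttpRequest");
        request.Headers.Referrer = new Uri("http://www.telelistas.net/");
        request.Content = new FormUrlEncodedContent(form);
        return request;
    }

    // Sends the request and returns the raw response body (the HTML fragment to parse).
    public static async Task<string> FetchCitiesHtmlAsync(
        HttpClient client, string state, string selectedCity)
    {
        using (HttpRequestMessage request = BuildRequest(state, selectedCity))
        using (HttpResponseMessage response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Calling `FetchCitiesHtmlAsync(client, "rj", "21000")` would correspond to the capture above; the returned string is what gets handed to HtmlAgilityPack.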

Problem:

Here is a fragment of the string returned by this request:

 SelecioneRio de JaneiroAbraãoAfonso ArinosAgência LuterbackAgriões de Dentro 

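What I want is to read the InnerText of each option tag ("Rio de Janeiro", "Abraão", ...), but for some strange reason InnerText is always empty for every option node found.

Here is the code snippet that fails: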

    // Iterating over nodes to build the dictionary
    foreach (HtmlNode city in citiesNodes)
    {
        string key = city.InnerText;
        string value = city.Attributes["value"].Value;
        citiesHash.AddCity(key, value);
    }

Technologies in use:

I am using HtmlAgilityPack (which supports XPath syntax for node selection), C#, and Fiddler2 for web debugging.

Thanks in advance.

Just call HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option"); before loading the HTML:

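    // Requires: using System.Linq;
    // Remove the special parsing flag registered for <option> before parsing.
    HtmlAgilityPack.HtmlNode.ElementsFlags.Remove("option");

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(html);

    var options = doc.DocumentNode.Descendants("option")
        .Skip(1) // skip the first option (the "Selecione" placeholder)
        .Select(n => new
        {
            Value = n.Attributes["value"].Value,
            Text = n.InnerText
        })
        .ToList();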

For some strange reason HtmlAgilityPack does not handle these tags correctly, so this is the workaround that solved my problem:

    // Iterating over nodes to build the dictionary
    foreach (HtmlNode city in citiesNodes)
    {
        if (city.NextSibling != null)
        {
            string key = city.NextSibling.InnerText;
            string value = city.Attributes["value"].Value;
            citiesHash.AddCity(key, value);
        }
    }

Instead of reading the text from the option node itself, I take each option's text from its NextSibling reference.
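Both approaches seem to point at the same cause. As far as I understand it, older HtmlAgilityPack releases register "option" in the static HtmlNode.ElementsFlags table as an element that cannot hold content, so the text ends up in a sibling text node; newer releases may behave differently. The sketch below checks whether such a flag is registered, removes it, and then reads InnerText directly. The class name `OptionTextSketch`, the method `ReadOptions`, and the sample markup are hypothetical; the real response from the site is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class OptionTextSketch
{
    // Hypothetical sample markup, only shaped like a city combobox.
    public const string SampleHtml =
        "<select>" +
        "<option value=\"0\">Selecione</option>" +
        "<option value=\"21000\">Rio de Janeiro</option>" +
        "</select>";

    // Returns (value, text) pairs for every <option> in the given HTML.
    public static List<KeyValuePair<string, string>> ReadOptions(string html)
    {
        // ElementsFlags is static, so this change applies to every document
        // parsed afterwards in the same process. Remove does nothing if the
        // installed version has no entry for "option".
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();
        foreach (HtmlNode option in doc.DocumentNode.Descendants("option"))
        {
            // GetAttributeValue falls back to "" when the attribute is missing.
            string value = option.GetAttributeValue("value", "");
            result.Add(new KeyValuePair<string, string>(value, option.InnerText));
        }
        return result;
    }

    public static void Main()
    {
        // Shows whether this HtmlAgilityPack version treats <option> specially.
        Console.WriteLine("option flagged: " + HtmlNode.ElementsFlags.ContainsKey("option"));

        foreach (var pair in ReadOptions(SampleHtml))
        {
            Console.WriteLine(pair.Key + " => " + pair.Value);
        }
    }
}
```

Because the flag table is shared across the whole process, removing the entry once at startup is probably cleaner than doing it before every parse. The NextSibling workaround avoids touching global state, but it depends on the text node being placed right after the option node.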