HtmlAgilityPack WebGet.Load gives the error "Object reference not set to an instance of an object"

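I am working on a project that fetches new car prices from dealer websites. I can fetch the HTML of most sites, but when I try to load one of them, the WebGet.Load(url) method throws an "Object reference not set to an instance of an object." error. I could not find any difference between these sites.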

Examples of URLs that work:

http://www.renault.com.tr/page.aspx?id=1715

http://www.hyundai.com.tr/tr/Content.aspx?id=fiyatlistesi

The site with the problem:

 http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx 

Thanks for your help.

    var webGet = new HtmlWeb();
    var document = webGet.Load("http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx");

When I use this URL, the document is not loaded.

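The actual problem is inside HtmlAgilityPack. The failing page declares a meta content type in which charset=8859-9 appears to be incorrect. HtmlAgilityPack internally tries to get the appropriate encoding for that string with something like Encoding.GetEncoding("8859-9"), and this throws an error (I think the actual encoding name should be iso-8859-9).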
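The difference between the two charset names can be checked in isolation. The following is a minimal sketch, not part of HtmlAgilityPack: `EncodingProbe` and `TryGetEncoding` are hypothetical names introduced here, and the code assumes .NET Core 3.0 or later (or .NET 5+), where the ISO-8859 code pages are only available after registering `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

public static class EncodingProbe
{
    static EncodingProbe()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+. There, iso-8859-9 is only
        // resolvable after the code pages provider has been registered.
        // On .NET Framework this line is not needed (and requires the
        // System.Text.Encoding.CodePages package to compile).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hypothetical helper: returns the Encoding for a charset name,
    // or null when the name is not recognized.
    public static Encoding TryGetEncoding(string charset)
    {
        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown names.
            return null;
        }
    }

    public static void Main()
    {
        foreach (var name in new[] { "8859-9", "iso-8859-9" })
        {
            var encoding = TryGetEncoding(name);
            Console.WriteLine(name + " -> " +
                (encoding == null ? "not recognized" : encoding.WebName));
        }
    }
}
```

Under these assumptions the bare name "8859-9" should come back as not recognized, while "iso-8859-9" should resolve to the Turkish Latin-5 encoding.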

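Actually, all you need is to tell HtmlAgilityPack not to read the encoding declared in the HtmlDocument (just HtmlDocument.OptionReadEncoding = false), but this does not seem to be possible with HtmlWeb.Load (setting HtmlWeb.AutoDetectEncoding does not work here). So the workaround could be to read the URL manually (the simplest way):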

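    // Requires: using System; using System.Net; using System.Text; using HtmlAgilityPack;
    var document = new HtmlDocument();
    // Do not let HtmlAgilityPack read the (invalid) charset declared in the page.
    document.OptionReadEncoding = false;
    var url = new Uri("http://www.fiat.com.tr/Pages/tr/otomobiller/grandepunto_fiyat.aspx");
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            // Decode the response explicitly as iso-8859-9.
            document.Load(stream, Encoding.GetEncoding("iso-8859-9"));
        }
    }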

This works, and the page is parsed successfully.

Edit: @Simon Mourier: yes, it throws a NullReferenceException because it catches the ArgumentException and sets _declaredencoding = null there. The _declaredencoding.WindowsCodePage line then throws the null reference.

Here is the code block from the ReadDocumentEncoding method in HtmlDocument.cs:

    try
    {
        _declaredencoding = Encoding.GetEncoding(charset);
    }
    catch (ArgumentException)
    {
        _declaredencoding = null;
    }
    if (_onlyDetectEncoding)
    {
        throw new EncodingFoundException(_declaredencoding);
    }

    if (_streamencoding != null)
    {
        if (_declaredencoding.WindowsCodePage != _streamencoding.WindowsCodePage)
        {
            AddError(
                HtmlParseErrorCode.CharsetMismatch,
                _line, _lineposition,
                _index, node.OuterHtml,
                "Encoding mismatch between StreamEncoding: " +
                _streamencoding.WebName + " and DeclaredEncoding: " +
                _declaredencoding.WebName);
        }
    }
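To illustrate why the comparison fails, here is a standalone sketch of the same mismatch check with a null guard added. It is not the library's actual code or its actual fix: `CharsetCheck` and `DescribeMismatch` are hypothetical names, and the method only mirrors the shape of the block quoted above.

```csharp
using System;
using System.Text;

public static class CharsetCheck
{
    // Hypothetical stand-in for the mismatch check: returns a message when
    // the two encodings disagree, or null when there is nothing to report.
    public static string DescribeMismatch(Encoding declared, Encoding stream)
    {
        // Guard: an unrecognized charset leaves 'declared' null, so the
        // comparison must be skipped instead of dereferencing it.
        if (declared == null || stream == null)
        {
            return null;
        }

        if (declared.WindowsCodePage == stream.WindowsCodePage)
        {
            return null;
        }

        return "Encoding mismatch between StreamEncoding: " + stream.WebName +
               " and DeclaredEncoding: " + declared.WebName;
    }

    public static void Main()
    {
        // A null 'declared' mimics the state after GetEncoding("8859-9") fails.
        Console.WriteLine(DescribeMismatch(null, Encoding.UTF8) ?? "(nothing reported)");
        Console.WriteLine(DescribeMismatch(Encoding.UTF8, Encoding.UTF8) ?? "(nothing reported)");
    }
}
```

Without the first guard, the null `declared` argument would reproduce the same NullReferenceException on the WindowsCodePage access.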

Here is my stack trace:

    System.NullReferenceException was unhandled
      Message=Object reference not set to an instance of an object.
      Source=HtmlAgilityPack
      StackTrace:
           at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
           at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
           at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
           at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
           at HtmlAgilityPack.HtmlDocument.Load(Stream stream, Boolean detectEncodingFromByteOrderMarks) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 597
           at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
           at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
           at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
           at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
           at test.console.Program.Main(String[] args) in W:\Projects\Me\test.console\test.console\Program.cs:line 54
           at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
           at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
           at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
           at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
           at System.Threading.ThreadHelper.ThreadStart()
      InnerException: