How to time out a request with Html Agility Pack

I am making a request to a remote web server that is currently offline (on purpose).

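I'd like to find the best way to time out the request. Basically, if the request runs longer than "X" milliseconds, abandon it and return a null response.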

Currently the web request just sits there waiting for a response...

What is the best way to handle this?

Here is my current code:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;

        if (HomePageUrl.RemoteFileExists())
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            {
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Retrieve the contents of your URL with the following method:

    private static string retrieveData(string url)
    {
        // used to build entire input
        StringBuilder sb = new StringBuilder();

        // used on each read operation
        byte[] buf = new byte[8192];

        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
        request.Timeout = 10; // timeout in milliseconds (10 ms here)

        // execute the request; dispose the response and stream when done
        // so the underlying connection is released
        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        {
            string tempString = null;
            int count = 0;

            do
            {
                // fill the buffer with data
                count = resStream.Read(buf, 0, buf.Length);

                // make sure we read some data
                if (count != 0)
                {
                    // translate from bytes to ASCII text
                    tempString = Encoding.ASCII.GetString(buf, 0, count);

                    // continue building the string
                    sb.Append(tempString);
                }
            }
            while (count > 0); // any more data to read?
        }

        return sb.ToString();
    }

Then use the Html Agility Pack to parse the result and retrieve the HTML element you want, like this:

    public static string htmlRetrieveInfo()
    {
        string htmlSource = retrieveData("http://example.com/test.html");
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(htmlSource);

        // Declare node outside any if-block so it is in scope for the return,
        // and return null when the page has no <body>.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
        return node != null ? node.InnerHtml : null;
    }

Html Agility Pack is open source, so you can modify the source yourself. First, add this code to the HtmlWeb class:

    private int _timeout = 20000;

    public int Timeout
    {
        get { return _timeout; }
        set
        {
            // Validate the incoming value, not the current field.
            if (value < 1)
                throw new ArgumentException("Timeout must be greater than zero.");
            _timeout = value;
        }
    }

Then find this method:

    private HttpStatusCode Get(Uri uri, string method, string path, HtmlDocument doc, IWebProxy proxy, ICredentials creds)

and modify it:

    req = WebRequest.Create(uri) as HttpWebRequest;
    req.Method = method;
    req.UserAgent = UserAgent;
    req.Timeout = Timeout; //add this

Or, without touching the library source, something like this:

    htmlWeb.PreRequest = request =>
    {
        request.Timeout = 15000;
        return true;
    };
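To tie the PreRequest hook back to the original question (return null when the server does not answer in time), here is a minimal sketch. The class name `AboutFetcher`, the method `TryLoadInnerHtml`, and its parameters are made up for illustration. It also assumes that `HtmlWeb.Load` lets a `WebException` propagate when the request times out or the connection fails, which is worth verifying against the Html Agility Pack version you use.

```csharp
using System.Net;
using HtmlAgilityPack;

public static class AboutFetcher
{
    // Hypothetical helper: returns the inner HTML of the first node matching
    // xpath, or null if the node is missing or the request fails or times out.
    public static string TryLoadInnerHtml(string url, string xpath, int timeoutMs)
    {
        var web = new HtmlWeb();

        // PreRequest runs just before the request is sent; returning true
        // lets the request proceed.
        web.PreRequest = request =>
        {
            request.Timeout = timeoutMs; // milliseconds
            return true;
        };

        try
        {
            HtmlDocument doc = web.Load(url);
            HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
            return node?.InnerHtml;
        }
        catch (WebException)
        {
            // Assumption: a timeout or connection failure surfaces here.
            return null;
        }
    }
}
```

A call such as `AboutFetcher.TryLoadInnerHtml(HomePageUrl, "//div[@class='wrapper1-border']", 1000)` would then stand in for the `hw.Load(...)` and `SelectSingleNode(...)` pair in the action method.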

I had to make a small adjustment to the code I originally posted:

    public JsonpResult About(string HomePageUrl)
    {
        Models.Pocos.About about = null;

        // ************* CHANGE HERE - added "timeout in milliseconds" to RemoteFileExists extension method.
        if (HomePageUrl.RemoteFileExists(1000))
        {
            // Using the Html Agility Pack, we want to extract only the
            // appropriate data from the remote page.
            HtmlWeb hw = new HtmlWeb();
            HtmlDocument doc = hw.Load(HomePageUrl);
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='wrapper1-border']");

            if (node != null)
            {
                about = new Models.Pocos.About { html = node.InnerHtml };
            }
            //todo: look into whether this else statement is necessary
            else
            {
                about = null;
            }
        }

        return this.Jsonp(about);
    }

Then I modified my RemoteFileExists extension method so that it times out:

    public static bool RemoteFileExists(this string url, int timeout)
    {
        try
        {
            // Create the HttpWebRequest.
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;

            // ************ ADDED HERE
            // time out the request after x milliseconds
            request.Timeout = timeout;
            // ************

            // Use the HEAD method; GET would also work.
            request.Method = "HEAD";

            // Get the web response and dispose it once the status is read,
            // so the connection is released.
            using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
            {
                // Returns true if the status code is 200.
                return (response.StatusCode == HttpStatusCode.OK);
            }
        }
        catch
        {
            // Any exception (including a timeout) returns false.
            return false;
        }
    }

With this approach, if the timeout fires before RemoteFileExists can determine the header response, the method returns false.
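Because the catch-all returns false for every failure, the caller cannot tell "timed out" from "not found". If that distinction matters, a sketch along these lines may help. The names `TimeoutProbe`, `IsTimeout`, and `RemoteFileExistsOrTimedOut` are hypothetical, and the code assumes the timeout is reported as a `WebException` whose `Status` is `WebExceptionStatus.Timeout`, which is how `HttpWebRequest` behaves on .NET Framework.

```csharp
using System;
using System.Net;

public static class TimeoutProbe
{
    // True when the exception represents a request timeout.
    public static bool IsTimeout(WebException ex)
    {
        return ex.Status == WebExceptionStatus.Timeout;
    }

    // Hypothetical variant of RemoteFileExists:
    //   true  -> server answered 200 OK
    //   false -> server answered otherwise, or the request failed
    //   null  -> the request timed out
    // Assumes url is a well-formed http/https URL.
    public static bool? RemoteFileExistsOrTimedOut(string url, int timeoutMs)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = timeoutMs; // milliseconds
            request.Method = "HEAD";

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException ex) when (IsTimeout(ex))
        {
            return null;
        }
        catch (WebException)
        {
            return false;
        }
    }
}
```

The nullable return keeps the original true/false meaning intact and uses null only for the timeout case, so existing callers can treat `!= true` the same way they treated false before.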

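You can use a standard HttpWebRequest to fetch the remote resource and set its Timeout property. Then, if the request succeeds, feed the resulting HTML to the Html Agility Pack for parsing.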