HtmlAgilityPack: getting the Title and meta

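I'm practicing with HtmlAgilityPack, but I've run into a problem. Here is what I've written so far; I can't get the page's title and description correctly. If someone could point out my mistake :)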

    ...
    public static void Main(string[] args)
    {
        string link = null;
        string str;
        string answer;
        int curloc; // holds current location in response
        string url = "http://stackoverflow.com/";
        try
        {
            do
            {
                HttpWebRequest HttpWReq = (HttpWebRequest)WebRequest.Create(url);
                HttpWReq.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
                HttpWebResponse HttpWResp = (HttpWebResponse)HttpWReq.GetResponse();
                //url = null; // disallow further use of this URI
                Stream istrm = HttpWResp.GetResponseStream();
                // Wrap the input stream in a StreamReader.
                StreamReader rdr = new StreamReader(istrm);
                // Read in the entire page.
                str = rdr.ReadToEnd();
                curloc = 0;
                //WebPage result;
                do
                {
                    // Find the next URI to link to.
                    link = FindLink(str, ref curloc); //return the good link
                    Console.WriteLine("Title found: " + curloc);
                    //title = Title(str, ref curloc);
                    if (link != null)
                    {
                        Console.WriteLine("Link found: " + link);
                        using (System.Net.WebClient client = new System.Net.WebClient())
                        {
                            HtmlDocument htmlDoc = new HtmlDocument();
                            var html = client.DownloadString(url);
                            htmlDoc.LoadHtml(link); // load into HtmlAgilityPack
                            var htmlElement = htmlDoc.DocumentNode.Element("html");
                            HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//meta[@name='description']");
                            if (node != null)
                            {
                                string desc = node.GetAttributeValue("content", "");
                                Console.Write("DESCRIPTION: " + desc);
                            }
                            else
                            {
                                Console.WriteLine("No description");
                            }
                            var titleElement = htmlDoc.DocumentNode
                                .Element("html")
                                .Element("head")
                                .Element("title");
                            if (titleElement != null)
                            {
                                string title = titleElement.InnerText;
                                Console.WriteLine("Titre: {0}", title);
                            }
                            else
                            {
                                Console.WriteLine("no Title");
                            }
                            Console.Write("Done");
                        }
                        Console.Write("Link, More, Quit?");
                        answer = Console.ReadLine();
                    }
                    else
                    {
                        Console.WriteLine("No link found.");
                        break;
                    }
                } while (link.Length > 0);
                // Close the Response.
                HttpWResp.Close();
            } while (url != null);
        }
        catch { ... }

Thanks in advance :)

Do it like this:

    HtmlNode mdnode = htmlDoc.DocumentNode.SelectSingleNode("//meta[@name='description']");
    if (mdnode != null)
    {
        // Attributes["content"] is null when the meta tag has no content attribute
        HtmlAttribute desc = mdnode.Attributes["content"];
        if (desc != null)
        {
            string fulldescription = desc.Value;
            Console.Write("DESCRIPTION: " + fulldescription);
        }
    }

I think your problem is here:

    htmlDoc.LoadHtml(link); // load into HtmlAgilityPack

It should be:

    htmlDoc.LoadHtml(html); // load into HtmlAgilityPack

LoadHtml expects a string containing HTML source code, not a URL.

You probably also want to change:

    var html = client.DownloadString(url);

to

    var html = client.DownloadString(link);
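To make the distinction concrete, here is a minimal sketch of parsing markup that has already been downloaded. The `PageInfo` class, its `Extract` method, and the inline sample HTML are hypothetical names made up for illustration; in the question's code the string would come from `client.DownloadString(link)`. It assumes the HtmlAgilityPack NuGet package and C# 7 or later (for the tuple return type).

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class PageInfo
{
    // Hypothetical helper (not from the original post): takes HTML markup,
    // not a URL, and returns the page title and meta description.
    // Either value is null when the corresponding element is missing.
    public static (string Title, string Description) Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses the string it is given as markup

        // SelectSingleNode returns null when nothing matches, which avoids the
        // NullReferenceException that chained Element("html").Element("head")
        // calls can throw on incomplete documents.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        HtmlNode metaNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");

        string title = titleNode != null
            ? HtmlEntity.DeEntitize(titleNode.InnerText).Trim()
            : null;
        string description = metaNode != null
            ? metaNode.GetAttributeValue("content", "")
            : null;

        return (title, description);
    }

    public static void Main()
    {
        // Sample markup invented for this demo
        const string html =
            "<html><head><title>Sample page</title>" +
            "<meta name='description' content='A short summary.'>" +
            "</head><body></body></html>";

        var info = Extract(html);
        Console.WriteLine("Title: " + (info.Title ?? "no title"));
        Console.WriteLine("Description: " + (info.Description ?? "no description"));
    }
}
```

Keeping the download and the parsing in separate steps also makes the parsing part easy to try out on a fixed string, without any network access.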

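Have you set a breakpoint and stepped through the code to see on which line the error occurs?

If you have, then try something like this:

    // Requires: using System; using System.IO; using System.Net; using System.Text;
    string result = string.Empty;
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.google.com");
    request.Method = "GET";
    try
    {
        // Dispose the response together with its stream and reader
        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            result = reader.ReadToEnd();
        }
    }
    catch (WebException ex)
    {
        // Network or HTTP-level failure; result stays empty
        Console.WriteLine("Request failed: " + ex.Message);
    }
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(result);

Then continue with the rest of your code below htmlDoc.LoadHtml.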

    [HttpPost]
    public ActionResult Create(WebSite website)
    {
        string desc = HtmlAgi(website.Url, "description");
        string keyword = HtmlAgi(website.Url, "Keywords");
        if (ModelState.IsValid)
        {
            var userId = ((CustomPrincipal)User).UserId;
            r.Create(new WebSite
            {
                Description = desc,
                Tags = keyword,
                Url = website.Url,
                UserId = userId,
                Category = website.Category
            });
            return RedirectToAction("Index");
        }
        return View(website);
    }

    // Downloads the page at url and returns the content of <meta name="key">.
    // The XPath comparison is case-sensitive, so "Keywords" will not match name="keywords".
    public string HtmlAgi(string url, string key)
    {
        var Webget = new HtmlWeb();
        var doc = Webget.Load(url);
        HtmlNode ourNode = doc.DocumentNode.SelectSingleNode(string.Format("//meta[@name='{0}']", key));
        if (ourNode != null)
        {
            return ourNode.GetAttributeValue("content", "");
        }
        else
        {
            return "not found";
        }
    }
