How to extract the full URL with HtmlAgilityPack – C#

Alright, the approach below extracts only the href value exactly as it is written in the markup.

Extraction code:

    // Collect the raw href value of every <a> element that has one.
    foreach (HtmlNode link in hdDoc.DocumentNode.SelectNodes("//a[@href]"))
    {
        lsLinks.Add(link.Attributes["href"].Value);
    }

Link markup:

    <a href="Login.aspx">Login</a>

Extracted URL:

    Login.aspx

But I want the real link, as the browser resolves it:

    http://www.monstermmorpg.com/Login.aspx

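I could check whether the URL contains http and prepend the domain when it does not, but that can cause problems in some cases, and I don't think it is a very sound solution.

C# 4.0, HtmlAgilityPack 1.4.0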

Assuming you know the URL of the page you crawled, you can resolve each extracted href against it as follows:

    // The address of the page you crawled
    var baseUrl = new Uri("http://example.com/path/to-page/here.aspx");

    // root relative
    var url = new Uri(baseUrl, "/Login.aspx");
    Console.WriteLine(url.AbsoluteUri); // prints 'http://example.com/Login.aspx'

    // relative
    url = new Uri(baseUrl, "../foo.aspx?q=1");
    Console.WriteLine(url.AbsoluteUri); // prints 'http://example.com/path/foo.aspx?q=1'

    // absolute
    url = new Uri(baseUrl, "http://stackoverflow.com/questions/7760286/");
    Console.WriteLine(url.AbsoluteUri); // prints 'http://stackoverflow.com/questions/7760286/'

    // other...
    url = new Uri(baseUrl, "javascript:void(0)");
    Console.WriteLine(url.AbsoluteUri); // prints 'javascript:void(0)'

Note the use of AbsoluteUri rather than ToString(): ToString decodes the URL (to make it more "human readable"), which is usually not what you want.
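To make the difference concrete, here is a minimal sketch that resolves an href containing a percent-escaped space and returns both string forms. The class name UriTextDemo, its two helper methods, and the href "my%20file.aspx" are made up for illustration; only Uri, AbsoluteUri and ToString() come from the framework.

```csharp
using System;

public static class UriTextDemo
{
    // Escaped form: percent-escapes are kept, so the string is safe to request or store.
    public static string Escaped(Uri uri)
    {
        return uri.AbsoluteUri;
    }

    // Display form: ToString() decodes escapes such as %20 for readability.
    public static string Display(Uri uri)
    {
        return uri.ToString();
    }

    public static void Main()
    {
        // Hypothetical page address and href, used only for this demonstration.
        var baseUrl = new Uri("http://example.com/path/to-page/here.aspx");
        var url = new Uri(baseUrl, "my%20file.aspx");

        Console.WriteLine(Escaped(url));
        Console.WriteLine(Display(url));
    }
}
```

The first line printed should keep the %20, while the second should show a literal space, which is why AbsoluteUri is the safer choice for anything you plan to fetch later.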

"I could check whether the URL contains http and prepend the domain when it does not"

That is what you should do. Html Agility Pack offers no help with this:

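    // baseUrl is the crawled page's address as a string; GetLeftPart(UriPartial.Path)
    // drops its query string and fragment before the href is resolved against it.
    var url = new Uri(
        new Uri(new Uri(baseUrl).GetLeftPart(UriPartial.Path)),
        link.Attributes["href"].Value);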
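Putting the pieces together, the resolution step could be wrapped in a small helper that takes the href strings you already extracted. This is only a sketch under a few assumptions: the class LinkResolver, the method ResolveAll and the page address http://www.monstermmorpg.com/Default.aspx are hypothetical, and dropping everything that is not http or https (such as javascript: links) is a design choice of this example, not something the question asked for. It uses only System.Uri, so it runs without Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Resolves raw href values against the page address and returns the
    // absolute http/https URLs. Hrefs that cannot be parsed are skipped.
    public static List<string> ResolveAll(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            Uri resolved;
            // TryCreate returns false instead of throwing on a malformed href.
            if (!Uri.TryCreate(baseUri, href, out resolved))
                continue;

            // Keep web links only; javascript:, mailto: and similar are dropped.
            if (resolved.Scheme == Uri.UriSchemeHttp || resolved.Scheme == Uri.UriSchemeHttps)
                result.Add(resolved.AbsoluteUri);
        }

        return result;
    }

    public static void Main()
    {
        // Hypothetical inputs: in the question these would come from lsLinks.
        var hrefs = new[]
        {
            "Login.aspx",
            "/Login.aspx",
            "http://stackoverflow.com/questions/7760286/",
            "javascript:void(0)"
        };

        foreach (string url in ResolveAll("http://www.monstermmorpg.com/Default.aspx", hrefs))
            Console.WriteLine(url);
    }
}
```

Because the Uri class works out on its own whether an href is relative or absolute, no manual "does it contain http" check is needed.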
