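Logging in to Facebook from .NET using BrowserSession and HtmlAgilityPack

I'm trying to use Rohit Agarwal's BrowserSession class together with HtmlAgilityPack to log in to Facebook and then browse it.

I previously managed the same thing by writing my own HttpWebRequest code. However, that only worked when I manually copied the cookies from my browser and inserted a fresh cookie string into the request for every new "session". Now I'm trying to use BrowserSession to get smarter navigation.

Here is the current code: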

    BrowserSession b = new BrowserSession();
    b.Get(@"http://www.facebook.com/login.php");
    b.FormElements["email"] = "some@email.com";
    b.FormElements["pass"] = "xxxxxxxx";
    b.FormElements["lsd"] = "qDhIH";
    b.FormElements["trynum"] = "1";
    b.FormElements["persistent_inputcheckbox"] = "1";
    var response = b.Post(@"https://login.facebook.com/login.php?login_attempt=1");

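The above works fine. The problem appears when I try to reuse this BrowserSession to fetch another page. I reuse it because BrowserSession saves the cookies from the last response and inserts them into the next request, so I no longer have to paste in cookie data taken from my browser by hand.

However, when I try to do something like this: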

    var profilePage = b.Get(@"https://m.facebook.com/profile.php?id=1111111111");

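The document I get back is empty. I'd appreciate any pointers on what I'm doing wrong.

Sorry, I don't know much about the HTML Agility Pack or the BrowserSession class you mention. But I did try the same scenario with HtmlUnit, and it worked fine. I'm using a .NET wrapper (its source code is available here and it is explained here), and this is the code I used (some details removed to protect the innocent):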

    var driver = new HtmlUnitDriver(true);
    driver.Url = @"http://www.facebook.com/login.php";

    // Fill in the login form
    var email = driver.FindElement(By.Name("email"));
    email.SendKeys("some@email.com");
    var pass = driver.FindElement(By.Name("pass"));
    pass.SendKeys("xxxxxxxx");

    // Find the submit button labelled "login" and click it
    var inputs = driver.FindElements(By.TagName("input"));
    var loginButton = (from input in inputs
                       where input.GetAttribute("value").ToLower() == "login"
                          && input.GetAttribute("type").ToLower() == "submit"
                       select input).First();
    loginButton.Click();

    // Navigate to a page that requires the logged-in session
    driver.Url = @"https://m.facebook.com/profile.php?id=1111111111";
    Assert.That(driver.Title, Is.StringContaining("Title of page goes here"));

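Hope this helps.

In case anyone cares, I fixed the root cause of this. It turns out the cookies were being stored in the CookieContainer of the REQUEST object, not in the response object. I also added the ability to download a file (provided the file is string-based). The code is definitely not thread-safe, but the object wasn't thread-safe to begin with: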

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Text;
    using HtmlAgilityPack;

    public class BrowserSession
    {
        private bool _isPost;
        private bool _isDownload;
        private HtmlDocument _htmlDoc;
        private string _download;

        /// <summary>
        /// System.Net.CookieCollection. Provides a collection container for instances of Cookie class
        /// </summary>
        public CookieCollection Cookies { get; set; }

        /// <summary>
        /// Provide a key-value-pair collection of form elements
        /// </summary>
        public FormElementCollection FormElements { get; set; }

        /// <summary>
        /// Makes a HTTP GET request to the given URL
        /// </summary>
        public string Get(string url)
        {
            _isPost = false;
            _isDownload = false; // reset so an earlier GetDownload() does not consume this response
            CreateWebRequestObject().Load(url);
            return _htmlDoc.DocumentNode.InnerHtml;
        }

        /// <summary>
        /// Makes a HTTP POST request to the given URL
        /// </summary>
        public string Post(string url)
        {
            _isPost = true;
            _isDownload = false; // reset so an earlier GetDownload() does not consume this response
            CreateWebRequestObject().Load(url, "POST");
            return _htmlDoc.DocumentNode.InnerHtml;
        }

        /// <summary>
        /// Makes a HTTP GET request and returns the raw response body as a string
        /// </summary>
        public string GetDownload(string url)
        {
            _isPost = false;
            _isDownload = true;
            CreateWebRequestObject().Load(url);
            return _download;
        }

        /// <summary>
        /// Creates the HtmlWeb object and initializes all event handlers.
        /// </summary>
        private HtmlWeb CreateWebRequestObject()
        {
            HtmlWeb web = new HtmlWeb();
            web.UseCookies = true;
            web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest);
            web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse);
            web.PreHandleDocument = new HtmlWeb.PreHandleDocumentHandler(OnPreHandleDocument);
            return web;
        }

        /// <summary>
        /// Event handler for HtmlWeb.PreRequestHandler. Occurs before an HTTP request is executed.
        /// </summary>
        protected bool OnPreRequest(HttpWebRequest request)
        {
            AddCookiesTo(request);               // Add cookies that were saved from previous requests
            if (_isPost) AddPostDataTo(request); // We only need to add post data on a POST request
            return true;
        }

        /// <summary>
        /// Event handler for HtmlWeb.PostResponseHandler. Occurs after a HTTP response is received
        /// </summary>
        protected void OnAfterResponse(HttpWebRequest request, HttpWebResponse response)
        {
            SaveCookiesFrom(request, response); // Save cookies for subsequent requests

            if (response != null && _isDownload)
            {
                Stream remoteStream = response.GetResponseStream();
                var sr = new StreamReader(remoteStream);
                _download = sr.ReadToEnd();
            }
        }

        /// <summary>
        /// Event handler for HtmlWeb.PreHandleDocumentHandler. Occurs before a HTML document is handled
        /// </summary>
        protected void OnPreHandleDocument(HtmlDocument document)
        {
            SaveHtmlDocument(document);
        }

        /// <summary>
        /// Assembles the Post data and attaches to the request object
        /// </summary>
        private void AddPostDataTo(HttpWebRequest request)
        {
            string payload = FormElements.AssemblePostPayload();
            byte[] buff = Encoding.UTF8.GetBytes(payload.ToCharArray());
            request.ContentLength = buff.Length;
            request.ContentType = "application/x-www-form-urlencoded";
            // Close the request stream once the body has been written
            using (Stream reqStream = request.GetRequestStream())
            {
                reqStream.Write(buff, 0, buff.Length);
            }
        }

        /// <summary>
        /// Add cookies to the request object
        /// </summary>
        private void AddCookiesTo(HttpWebRequest request)
        {
            if (Cookies != null && Cookies.Count > 0)
            {
                request.CookieContainer.Add(Cookies);
            }
        }

        /// <summary>
        /// Saves cookies from the request's CookieContainer and the response to the local CookieCollection object
        /// </summary>
        private void SaveCookiesFrom(HttpWebRequest request, HttpWebResponse response)
        {
            // save the cookies ;)
            if (request.CookieContainer.Count > 0 || response.Cookies.Count > 0)
            {
                if (Cookies == null)
                {
                    Cookies = new CookieCollection();
                }

                Cookies.Add(request.CookieContainer.GetCookies(request.RequestUri));
                Cookies.Add(response.Cookies);
            }
        }

        /// <summary>
        /// Saves the form elements collection by parsing the HTML document
        /// </summary>
        private void SaveHtmlDocument(HtmlDocument document)
        {
            _htmlDoc = document;
            FormElements = new FormElementCollection(_htmlDoc);
        }
    }

    /// <summary>
    /// Represents a combined list and collection of Form Elements.
    /// </summary>
    public class FormElementCollection : Dictionary<string, string>
    {
        /// <summary>
        /// Constructor. Parses the HtmlDocument to get all form input elements.
        /// </summary>
        public FormElementCollection(HtmlDocument htmlDoc)
        {
            var inputs = htmlDoc.DocumentNode.Descendants("input");
            foreach (var element in inputs)
            {
                string name = element.GetAttributeValue("name", "undefined");
                string value = element.GetAttributeValue("value", "");

                if (!this.ContainsKey(name))
                {
                    if (!name.Equals("undefined"))
                    {
                        Add(name, value);
                    }
                }
            }
        }

        /// <summary>
        /// Assembles all form elements and values to POST. Also URL-encodes the values.
        /// </summary>
        public string AssemblePostPayload()
        {
            StringBuilder sb = new StringBuilder();
            foreach (var element in this)
            {
                string value = System.Web.HttpUtility.UrlEncode(element.Value);
                sb.Append("&" + element.Key + "=" + value);
            }
            return sb.ToString().Substring(1);
        }
    }
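The fix above relies on reading cookies back out of a CookieContainer for a given URI. As a rough, network-free sketch of that one step: the class name, the CountCookiesFor helper, the cookie name and value, and the example.com and other.test hosts are all made up for illustration and are not part of BrowserSession.

```csharp
using System;
using System.Net;

public static class CookieContainerDemo
{
    // Returns how many cookies the container would send to the given URL.
    // This mirrors the request.CookieContainer.GetCookies(request.RequestUri) call above.
    public static int CountCookiesFor(CookieContainer container, string url)
    {
        return container.GetCookies(new Uri(url)).Count;
    }

    public static void Main()
    {
        var container = new CookieContainer();

        // Hypothetical cookie scoped to example.com and every path under "/".
        container.Add(new Cookie("session", "abc123", "/", "example.com"));

        // The cookie is returned for a URL on the same host...
        Console.WriteLine(CountCookiesFor(container, "http://example.com/profile"));

        // ...but not for an unrelated host.
        Console.WriteLine(CountCookiesFor(container, "http://other.test/"));
    }
}
```

The point of the sketch is only that the container is keyed by URI: cookies come back only for a URI that matches their domain and path, which is why SaveCookiesFrom passes request.RequestUri.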

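You may want to use WatiN (Web Application Testing in .NET) or Selenium to drive your browser. That saves you from manipulating cookies and doing custom work to make subsequent requests succeed, because you are simulating an actual user.

I had similar symptoms: the login worked, but the authentication cookie was not present in the cookie container, so it was not sent on subsequent requests. I found this was because the web request handles the Location: header internally, redirecting to the new page behind the scenes and losing the cookie in the process. I fixed it by adding: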

    request.AllowAutoRedirect = false; // Location header messing up cookie handling!

...to the OnPreRequest() function. It now looks like this:

    protected bool OnPreRequest(HttpWebRequest request)
    {
        request.AllowAutoRedirect = false;   // Location header messing up cookie handling!
        AddCookiesTo(request);               // Add cookies that were saved from previous requests
        if (_isPost) AddPostDataTo(request); // We only need to add post data on a POST request
        return true;
    }
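With automatic redirects switched off, a 3xx response is handed back to the caller, so any redirect that matters has to be followed by hand. The following is only a sketch of one way to do that with plain HttpWebRequest and a single shared CookieContainer. ManualRedirect, ResolveRedirect, GetFollowingRedirects and the maxHops limit are hypothetical names for illustration, not part of BrowserSession or HtmlAgilityPack.

```csharp
using System;
using System.IO;
using System.Net;

public static class ManualRedirect
{
    // Resolves a Location header value (absolute or relative) against the URL that was requested.
    public static Uri ResolveRedirect(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Issues GET requests, following up to maxHops redirects by hand.
    // Every hop uses the same CookieContainer, so cookies set on a 3xx response are kept.
    public static string GetFollowingRedirects(string url, CookieContainer cookies, int maxHops = 5)
    {
        var current = new Uri(url);
        for (int hop = 0; hop <= maxHops; hop++)
        {
            var request = (HttpWebRequest)WebRequest.Create(current);
            request.AllowAutoRedirect = false; // we handle the Location header ourselves
            request.CookieContainer = cookies;

            using (var response = (HttpWebResponse)request.GetResponse())
            {
                int status = (int)response.StatusCode;
                string location = response.Headers["Location"];

                if (status >= 300 && status < 400 && !string.IsNullOrEmpty(location))
                {
                    current = ResolveRedirect(current, location);
                    continue; // next hop, cookies carried along in the shared container
                }

                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
        }
        throw new InvalidOperationException("Too many redirects.");
    }
}
```

A caller would create one CookieContainer, perform the login with it, and then pass the same instance to GetFollowingRedirects for every later page.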

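I hope this helps anyone who runs into the same problem.

I ran into the same problem today. I was also working with Rohit Agarwal's BrowserSession class together with HtmlAgilityPack. After a full day of trial-and-error programming, I found that the problem was caused by the correct cookies not being set on subsequent requests. I couldn't get the original BrowserSession code to work by changing it, but I added the following functions and slightly modified the SaveCookiesFrom function. In the end it worked well for me.

The added/modified functions are as follows: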

    class BrowserSession
    {
        private bool _isPost;
        private HtmlDocument _htmlDoc;

        // This is the new CookieContainer. It is initialized here so that
        // cookiePot.Add(...) below does not throw a NullReferenceException.
        public CookieContainer cookiePot = new CookieContainer();

        // ...

        public string Get2(string url)
        {
            HtmlWeb web = new HtmlWeb();
            web.UseCookies = true;
            web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
            web.PostResponse = new HtmlWeb.PostResponseHandler(OnAfterResponse2);
            HtmlDocument doc = web.Load(url);
            return doc.DocumentNode.InnerHtml;
        }

        public bool OnPreRequest2(HttpWebRequest request)
        {
            // Reuse the shared container instead of copying cookies into a fresh one
            request.CookieContainer = cookiePot;
            return true;
        }

        protected void OnAfterResponse2(HttpWebRequest request, HttpWebResponse response)
        {
            // do nothing
        }

        private void SaveCookiesFrom(HttpWebResponse response)
        {
            if (response.Cookies.Count > 0)
            {
                if (Cookies == null)
                {
                    Cookies = new CookieCollection();
                }

                Cookies.Add(response.Cookies);
                cookiePot.Add(Cookies); // -> add the Cookies to the cookiePot
            }
        }
    }

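What it does: it saves the cookies from the initial "post response" and assigns the same CookieContainer to the requests made later. I don't fully understand why this didn't work in the original version, since the AddCookiesTo function does more or less the same thing (if (Cookies != null && Cookies.Count > 0) request.CookieContainer.Add(Cookies);). Anyway, with these added functions it should now work fine.

It can be used like this: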

    // initial "Login-procedure"
    BrowserSession b = new BrowserSession();
    b.Get("http://www.blablubb/login.php");
    b.FormElements["username"] = "yourusername";
    b.FormElements["password"] = "yourpass";
    string response = b.Post("http://www.blablubb/login.php");

All subsequent calls should use:

    response = b.Get2("http://www.blablubb/secondpageyouwannabrowseto");
    response = b.Get2("http://www.blablubb/thirdpageyouwannabrowseto");
    ...

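I hope it helps the many people facing the same problem!

Have you checked out their new API? http://developers.facebook.com/docs/authentication/

You can call a simple URL to get an OAuth 2.0 access token and attach it to the rest of your requests...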

    https://graph.facebook.com/oauth/authorize?
        client_id=...&
        redirect_uri=http://www.example.com/oauth_redirect

Change redirect_uri to whatever URL you want, and it will be called back with a parameter named "access_token". Grab it and make whatever automated SDK calls you want.
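For illustration, the authorize URL above could be assembled in code roughly as follows. This is a sketch under assumptions: the endpoint is taken as written in this post and may have changed since, OAuthUrl, BuildAuthorizeUrl and AppendAccessToken are hypothetical helper names, and passing the token as an access_token query parameter on later requests is an assumption about how it gets "attached".

```csharp
using System;

public static class OAuthUrl
{
    // Builds the authorize URL from the post, URL-encoding both parameter values.
    public static string BuildAuthorizeUrl(string clientId, string redirectUri)
    {
        return "https://graph.facebook.com/oauth/authorize"
            + "?client_id=" + Uri.EscapeDataString(clientId)
            + "&redirect_uri=" + Uri.EscapeDataString(redirectUri);
    }

    // Appends the token as an access_token query parameter (assumed convention).
    public static string AppendAccessToken(string url, string accessToken)
    {
        char separator = url.Contains("?") ? '&' : '?';
        return url + separator + "access_token=" + Uri.EscapeDataString(accessToken);
    }

    public static void Main()
    {
        // "123" is a placeholder client id, not a real application.
        Console.WriteLine(BuildAuthorizeUrl("123", "http://www.example.com/oauth_redirect"));
        Console.WriteLine(AppendAccessToken("https://graph.facebook.com/me", "abc"));
    }
}
```

Encoding redirect_uri matters because it is itself a URL; left unescaped, its own ":" and "/" characters (and any query string it carries) can be misread as part of the outer URL.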