Accessing a website from a C# console/server application

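I'm working on a C# project where I need to pull data from a secured website that has no API or web service. My plan is to log in, navigate to the page I need, and parse the HTML to extract the pieces of data I need to log to a database. Right now I'm testing with a console application, but eventually this will be converted into an Azure Service Bus application.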

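To get anything at all, you have to log in through their login.cfm page, which means I need to fill in the username and password input controls on that page and click the submit button. After that I navigate to the page I need to parse.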

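Since I don't have a "browser" to work with the controls, I've tried various C# .NET classes to get to the page, set the username and password, and click submit, but nothing seems to work.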

Are there any examples I can look at, or .NET classes I should review that were designed for this sort of project?

Thanks!

Use the WebClient class in System.Net.

To persist the session cookie across requests, you'll have to create a custom WebClient class.

    using System;
    using System.Net;

    #region webclient with cookies
    public class WebClientX : WebClient
    {
        // Shared cookie jar, so the session cookie set at login is sent on later requests.
        public CookieContainer cookies = new CookieContainer();

        // Attach the cookie jar to every outgoing HTTP request.
        protected override WebRequest GetWebRequest(Uri location)
        {
            WebRequest req = base.GetWebRequest(location);
            if (req is HttpWebRequest)
                (req as HttpWebRequest).CookieContainer = cookies;
            return req;
        }

        // Store any cookies the server returned with the response.
        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse res = base.GetWebResponse(request);
            if (res is HttpWebResponse)
                cookies.Add((res as HttpWebResponse).Cookies);
            return res;
        }
    }
    #endregion

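Use a browser plugin like Firebug, or the developer tools built into Chrome, to capture the HTTP POST data that is sent when you submit the form. Send those same POSTs with the WebClientX class and parse the response HTML.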
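As a rough illustration of that step, the sketch below turns a form-encoded body copied from the browser's network tab into the name/value pairs that `WebClient.UploadValues` accepts. The `CapturedPost` class, the `ToFields` helper, and the `submit=Login` field are hypothetical and only there for illustration; the real field names are whatever the site's form actually sends. It also assumes `System.Web.HttpUtility` is available (on .NET Framework that means referencing System.Web).

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class CapturedPost
{
    // Converts a raw form-encoded body (as shown in the browser's network tab)
    // into the name/value pairs that WebClient.UploadValues expects.
    public static NameValueCollection ToFields(string rawBody)
    {
        // ParseQueryString splits on '&' and '=' and URL-decodes each value.
        return HttpUtility.ParseQueryString(rawBody);
    }

    public static void Main()
    {
        // Hypothetical captured body; "submit=Login" is an assumed extra field.
        string captured = "userid=myname&mypassword=my%20secret&submit=Login";

        NameValueCollection fields = ToFields(captured);
        foreach (string key in fields.AllKeys)
        {
            Console.WriteLine("{0} = {1}", key, fields[key]);
        }
    }
}
```

The resulting collection could then be handed to `UploadValues` on the cookie-aware client, which URL-encodes the values again when it posts them.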

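When you already know the format, the quickest way to parse the HTML is a simple Regex.Match. So you go through the steps in your browser with the developer tools open to record your POSTs, URLs, and HTML content, and then perform the same tasks with WebClientX.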
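A minimal sketch of the Regex.Match approach, assuming the value you want sits in a table cell like `<td class="status">...</td>`. That markup, the `StatusScraper` class, and the `TryExtractStatus` helper are made up for illustration; the pattern has to be adapted to whatever the real page contains.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StatusScraper
{
    // Looks for the first <td class="status"> cell and returns its trimmed text.
    // Returns false (and an empty string) when the cell is not found.
    public static bool TryExtractStatus(string html, out string status)
    {
        Match m = Regex.Match(
            html,
            @"<td class=""status"">\s*(?<value>[^<]+?)\s*</td>",
            RegexOptions.IgnoreCase);

        status = m.Success ? m.Groups["value"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // Hypothetical fragment standing in for the downloaded page.
        string html = "<tr><td class=\"status\"> Shipped </td></tr>";

        string status;
        if (TryExtractStatus(html, out status))
            Console.WriteLine("Status: " + status);
        else
            Console.WriteLine("Status cell not found");
    }
}
```

This only holds up while the page layout stays the same, which is the trade-off of matching against a known format instead of using a full HTML parser.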

OK, so here is the complete code to log in on one page and then read from a second page once logged in.

    using System;
    using System.Collections.Specialized;
    using System.IO;
    using System.Net;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string uriString = "http://www.remotesite.com/login.cfm";

            // Create a new cookie-aware WebClient instance.
            WebClientX myWebClient = new WebClientX();

            // Create a new NameValueCollection to hold the form fields to be posted to the URL.
            NameValueCollection myNameValueCollection = new NameValueCollection();

            // Add the necessary parameter/value pairs to the name/value container.
            myNameValueCollection.Add("userid", "myname");
            myNameValueCollection.Add("mypassword", "mypassword");

            Console.WriteLine("\nUploading to {0} ...", uriString);

            // UploadValues(String, NameValueCollection) implicitly sets HTTP POST as the request method.
            byte[] responseArray = myWebClient.UploadValues(uriString, myNameValueCollection);

            // Decode and display the response.
            Console.WriteLine("\nResponse received was :\n{0}", Encoding.ASCII.GetString(responseArray));

            Console.WriteLine("\n\n\n pausing...");
            Console.ReadKey();

            // Go to the 2nd page on the site to get additional data.
            Stream myStream = myWebClient.OpenRead("https://www.remotesite.com/status_results.cfm?t=8&prog=d");

            Console.WriteLine("\nDisplaying Data :\n");
            StringBuilder sb = new StringBuilder();

            // Read the page line by line; disposing the reader also closes the stream.
            using (StreamReader reader = new StreamReader(myStream, Encoding.UTF8))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    sb.Append(line + "\r\n");
                }
            }

            // Keep a copy of the page on disk.
            using (StreamWriter outfile = new StreamWriter(@"Logfile1.txt"))
            {
                outfile.Write(sb.ToString());
            }

            Console.WriteLine(sb.ToString());

            Console.WriteLine("\n\n\n pausing...");
            Console.ReadKey();
        }
    }

    public class WebClientX : WebClient
    {
        public CookieContainer cookies = new CookieContainer();

        // Attach the shared cookie jar to every outgoing HTTP request.
        protected override WebRequest GetWebRequest(Uri location)
        {
            WebRequest req = base.GetWebRequest(location);
            if (req is HttpWebRequest)
                (req as HttpWebRequest).CookieContainer = cookies;
            return req;
        }

        // Store any cookies the server returned with the response.
        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse res = base.GetWebResponse(request);
            if (res is HttpWebResponse)
                cookies.Add((res as HttpWebResponse).Cookies);
            return res;
        }
    }