ASP.NET Request.Browser.Crawler: dynamic crawler list?

I found out why Request.Browser.Crawler is always false in C# ( http://www.digcode.com/default.aspx?page=ed51cde3-d979-4daf-afae-fa6192562ea9&article=bc3a7a4f-f53e-4f88-8e9c-c9337f6c05a0 ).

Does anyone use a method to dynamically update the list of crawlers, so that Request.Browser.Crawler becomes genuinely useful?

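I've been happy with the results from Ocean's Browsercaps. It detects crawlers that Microsoft's configuration files don't bother to detect. It even parses out the version of each crawler that visits your site, which is more detail than I really need.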

You could check Request.UserAgent against a regex.
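Building on the regex suggestion, the sketch below also touches on the "dynamic list" part of the question: the crawler tokens live in a list that can be replaced at runtime (for example after re-reading a file) without recompiling. Everything here is an assumption for illustration: `CrawlerDetector`, `UpdatePatterns` and the starting token list are hypothetical names, not part of ASP.NET, and tokens are treated as literal, case-insensitive substrings. The check takes a plain string so it runs outside System.Web; in a page you would pass `Request.UserAgent`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper (not an ASP.NET API): a crawler check whose token list can be swapped at runtime.
public static class CrawlerDetector
{
    // Illustrative starting tokens only; replace with the list you actually maintain.
    private static Regex _pattern = Build(new[] { "bot", "crawler", "spider", "slurp", "teoma" });

    // Turns literal tokens into one case-insensitive alternation such as "bot|crawler|spider".
    private static Regex Build(IEnumerable<string> tokens)
    {
        string[] parts = tokens
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .Select(t => Regex.Escape(t.Trim()))
            .ToArray();

        // An empty alternation would match every string, so fall back to a pattern that never matches.
        string pattern = parts.Length == 0 ? "(?!)" : string.Join("|", parts);
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // Replaces the whole list. Publishing a new Regex reference means concurrent
    // callers see either the old list or the new one, never a partial mix.
    public static void UpdatePatterns(IEnumerable<string> tokens)
    {
        if (tokens == null) throw new ArgumentNullException(nameof(tokens));
        Volatile.Write(ref _pattern, Build(tokens));
    }

    public static bool IsCrawler(string userAgent)
    {
        // Some clients send no User-Agent header, so treat null or empty as "not a crawler".
        if (string.IsNullOrEmpty(userAgent)) return false;
        return Volatile.Read(ref _pattern).IsMatch(userAgent);
    }
}
```

In a page this would be `CrawlerDetector.IsCrawler(Request.UserAgent)`. To refresh the list you might call `CrawlerDetector.UpdatePatterns(File.ReadAllLines(path))` from a timer or file-change handler, where `path` is a file of one token per line that you maintain yourself (an assumption; ASP.NET provides no such file). Substring tokens like "bot" are deliberately broad, so expect some false positives.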

Peter Bromberg wrote a good article on writing an ASP.NET Request Logger and Crawler Killer.

Here is the method he uses in his Logger class:

```csharp
using System.Text.RegularExpressions;
using System.Web;

public class Logger
{
    public static bool IsCrawler(HttpRequest request)
    {
        // set next line to "bool isCrawler = false;" to use this to deny certain bots
        bool isCrawler = request.Browser.Crawler;

        // Microsoft doesn't properly detect several crawlers.
        // UserAgent can be null, and Regex.Match(null) throws, so guard first.
        if (!isCrawler && request.UserAgent != null)
        {
            // put any additional known crawlers in the Regex below
            // you can also use this list to deny certain bots instead, if desired:
            // just set bool isCrawler = false; for first line in method
            // and only have the ones you want to deny in the following Regex list
            Regex regEx = new Regex("Slurp|slurp|ask|Ask|Teoma|teoma");
            isCrawler = regEx.Match(request.UserAgent).Success;
        }
        return isCrawler;
    }
}
```