How to Limit the Crawl Rate of Baidu Spider
Using the ngx_http_limit_req_module of nginx
To limit the crawling frequency of Baidu spider, you can use nginx's ngx_http_limit_req_module. This module throttles how often matching clients can send requests, reducing the load the spider places on the server.
Configuring nginx
To limit the crawling frequency of Baidu spider, a few directives must be added to the nginx configuration file. Add the following line in the global (http) context:
limit_req_zone $anti_spider zone=anti_spider:60m rate=200r/m;
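As a sketch, the directive sits inside the http block (the zone name and rate follow the line above; note that 60m is the size of the shared-memory zone used to track request state, not a time window):

```nginx
http {
    # Key is $anti_spider: it stays empty for ordinary visitors,
    # and limit_req skips requests whose key is empty.
    # zone=anti_spider:60m reserves 60 MB of shared memory for counters;
    # rate=200r/m allows at most 200 requests per minute per key.
    limit_req_zone $anti_spider zone=anti_spider:60m rate=200r/m;
}
```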
Then add the following to the relevant server block:

if ($http_user_agent ~* "baiduspider") {
    set $anti_spider $http_user_agent;
}

limit_req zone=anti_spider burst=5 nodelay;
Parameter explanation
In the above configuration, some parameters need explanation:
zone=anti_spider:60m: Names the shared-memory zone and sets its size to 60 MB; this memory holds the request counters, and is not a time value.
rate=200r/m: Allows at most 200 requests per minute for each key.
burst=5: Allows a burst of up to 5 requests above the defined rate to be accepted; requests beyond the burst are rejected. It is not a cap on concurrent connections.
nodelay: Serves burst requests immediately instead of delaying them to fit the rate; requests that exceed the burst are rejected with a 503 error.
if block: Checks whether the User-Agent identifies Baidu spider and, if so, assigns it to the variable $anti_spider. Since limit_req ignores requests whose key is empty, only Baidu spider traffic is rate-limited.
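Putting the pieces together, a minimal configuration might look like the following sketch (the domain, document root, and the optional limit_req_status line are placeholders and assumptions, not part of the configuration above):

```nginx
http {
    # $anti_spider is empty for normal visitors, so only Baidu spider is limited.
    limit_req_zone $anti_spider zone=anti_spider:60m rate=200r/m;

    server {
        listen 80;
        server_name example.com;  # placeholder domain

        # Mark Baidu spider requests by giving them a non-empty limit key.
        if ($http_user_agent ~* "baiduspider") {
            set $anti_spider $http_user_agent;
        }

        # Apply the limit: 200 r/m, burst of 5, excess rejected immediately.
        limit_req zone=anti_spider burst=5 nodelay;
        # limit_req_status 429;  # optional: return 429 instead of the default 503

        location / {
            root /var/www/html;  # placeholder document root
        }
    }
}
```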
With this configuration in place, Baidu spider's crawling frequency is effectively capped, preventing it from putting excessive pressure on the server.