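Blocking the site homepage in nginx
Purpose: block the site homepage without blocking second-level pages, while still allowing crawlers to access the homepage.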
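location = / {
    default_type text/plain;
    charset utf-8;
    # Return 403 unless the User-Agent looks like a crawler.
    # "!~*" is a case-insensitive negative match, so "Googlebot" and "googlebot" are both allowed.
    # The pattern must stay on a single line; a line break inside the quotes becomes part of the regex.
    if ($http_user_agent !~* "googlebot|baiduspider|slurp|ia_archiver|msnbot|bingbot|scooter|webcrawler|yodaobot|sogou|soso|360|spider|bot|crawler") {
        return 403;
    }
    # Crawlers fall through to the normal homepage handling.
    try_files $uri $uri/ /index.php?$query_string;
}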
location **** {
******
*****
}
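If the site is behind a CDN, set the homepage cache lifetime to 0 days. Otherwise, once a crawler has requested the homepage, the CDN caches that response, and ordinary users whose requests go through the CDN are served the homepage content. (Origin routing based on Alibaba Cloud DNS regional resolution cannot block public IPs belonging to Alibaba Cloud, Tencent Cloud, or other cloud providers, because those IPs are not attributed to carriers such as China Telecom, China Mobile, or China Unicom.)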
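As an origin-side complement to the CDN setting above, nginx can mark the homepage response as uncacheable. This is only a sketch, and it assumes the CDN honors the origin's Cache-Control header; many CDNs can be configured to ignore it, in which case the CDN-side rule is still required. The variable name $homepage_cache_control and the "no-store" value are illustrative choices, not part of the original configuration. The map keys on $request_uri because that variable keeps the original request path even after try_files internally redirects to /index.php.

```nginx
# Place in the http context.
map $request_uri $homepage_cache_control {
    default        "";          # every other page: no extra header
    ~^/(\?.*)?$    "no-store";  # "/" with or without a query string
}

server {
    # ... existing listen / server_name / root directives ...

    # An empty value makes nginx omit the header, so only the homepage is affected.
    # "always" attaches the header to the 403 sent to non-crawlers as well.
    add_header Cache-Control $homepage_cache_control always;
}
```

Note that nginx inherits add_header from the server level only into locations that define no add_header of their own, so a location with its own add_header lines would need this one repeated.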
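Blocking the site homepage in IIS
Purpose: block the site homepage without blocking second-level pages, while still allowing crawlers to access the homepage.
Add the following at the corresponding place in web.config: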
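<rule name="index_rewrite" enabled="false" stopProcessing="true">
    <!-- enabled="false" is kept from the original listing; the rule is only evaluated once it is set to "true". -->
    <match url="^$" />
    <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{HTTP_USER_AGENT}" pattern="googlebot|baiduspider|slurp|ia_archiver|msnbot|bingbot|scooter|webcrawler|yodaobot|sogou|soso|360|spider|bot|crawler" negate="true" />
    </conditions>
    <!-- The original listing is cut off after the condition. The closing tags and the 403 action are an assumed completion that mirrors the nginx rule. -->
    <action type="CustomResponse" statusCode="403" statusReason="Forbidden" statusDescription="Forbidden" />
</rule>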