Friday, October 17, 2014

Nutch crawler: accept only English pages






How can I configure the Nutch crawler so that it crawls only pages in English?


I added the following property to my nutch-site.xml file, but it does not work:



<property>
  <name>http.accept.language</name>
  <value>en-us,en-gb,en;q=0.7,*;q=0.3</value>
  <description>Value of the "Accept-Language" request header field.
  This allows selecting non-English language as default one to retrieve.
  It is a useful setting for search engines build for certain national group.
  </description>
</property>
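For reference, here is a minimal sketch of how I read the value above. Under standard HTTP semantics, Accept-Language lists preferences rather than restrictions: a range with no q parameter defaults to q=1.0, and the trailing `*;q=0.3` still accepts any other language at lower priority. A server may also ignore the header entirely. The helper names `parse_accept_language` and `accepts_any_language` are hypothetical and exist only for this illustration; they are not part of Nutch.

```python
def parse_accept_language(value):
    """Parse an Accept-Language value into (tag, q) pairs, highest q first.

    Hypothetical helper for illustration only; not a Nutch API.
    """
    prefs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a language range without an explicit q defaults to 1.0
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            if name.strip().lower() == "q" and raw.strip():
                q = float(raw)
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so ranges with equal q keep their original order
    return sorted(prefs, key=lambda pair: -pair[1])


def accepts_any_language(prefs):
    """Return True if a wildcard range with q > 0 is present."""
    return any(tag == "*" and q > 0 for tag, q in prefs)


if __name__ == "__main__":
    header = "en-us,en-gb,en;q=0.7,*;q=0.3"  # value from the property above
    prefs = parse_accept_language(header)
    for tag, q in prefs:
        print(tag, q)
    print("accepts any language:", accepts_any_language(prefs))
```

If this reading is right, the header only asks servers to prefer English when they offer several language variants of a page. It does not by itself make the crawler skip pages that are served in other languages.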


asked 1 min ago







