How can I configure the Nutch crawler so that it crawls only pages in English?
I set the following property in my nutch-site.xml file, but it does not work:
<property>
<name>http.accept.language</name>
<value>en-us,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting non-English language as default one to retrieve.
It is a useful setting for search engines build for certain national group.
</description>
</property>
asked 1 min ago
Nutch crawler: accept only English pages