I am using Python 2.7.
I want to open the URL of a website and extract information from it. The information I am looking for is on the US version of the website (http://ift.tt/1ms3x4f). Since I am based in Canada, I am automatically redirected to the Canadian version (http://ift.tt/1gVtZ6P). I am looking for a way to avoid this redirect.
If I open http://ift.tt/1ms3x4f in any browser (IE, Firefox, Chrome, ...), I get redirected. The website offers a menu where visitors can pick which country version of the site they want to view. Once I select United States, I am no longer redirected to the Canadian version, and this holds for every new tab in the same browsing session. I suspect this has to do with cookie storage.
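If the country choice is indeed stored in a cookie, a script can imitate the browser by keeping a cookie jar and seeding it with that cookie before the first request. The sketch below assumes a cookie named `country` with value `US` on the placeholder domain `.example.com`; all three are hypothetical, and the real name, value and domain would have to be read from the browser's developer tools after picking United States. It uses only the standard library (`urllib2`/`cookielib` on Python 2.7, `urllib.request`/`http.cookiejar` on Python 3).

```python
try:  # Python 3
    import http.cookiejar as cookiejar
    import urllib.request as urlrequest
except ImportError:  # Python 2.7
    import cookielib as cookiejar
    import urllib2 as urlrequest


def make_cookie(name, value, domain, path="/"):
    """Build a plain Netscape-style (version 0) session cookie."""
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def build_country_opener(cookie_name, cookie_value, domain):
    """Return (opener, jar); the opener sends the jar's cookies with each request.

    The jar is seeded with the (hypothetical) country cookie and also stores
    any cookies the site sets, so they are replayed on later requests.
    """
    jar = cookiejar.CookieJar()
    jar.set_cookie(make_cookie(cookie_name, cookie_value, domain))
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    return opener, jar
```

With the real values filled in, `opener.open(url).read()` would send the cookie on every request, much as a browser session does across tabs. If the site chooses the country version from the visitor's IP address rather than from a cookie, this approach will not help.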
I tried to use the following code to prevent the redirect:
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Instead of following a 3xx redirect, hand back the redirect
    # response itself (HTTPError doubles as a file-like response object).
    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
        result.status = code
        return result

    # Treat 301, 303 and 307 the same way as 302.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(RedirectHandler())
webpage = opener.open('http://ift.tt/1ms3x4f')
but it did not seem to work, since the only content I can extract afterwards is:
<html><head></head><body>‹</body></html>
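The near-empty page is consistent with the handler doing what it was written to do: `http_error_302` stops urllib2 from following the redirect and returns the 3xx response itself, whose body is usually empty or a short stub. Blocking the redirect cannot make the server send the US page; it only shows where the server wanted to send the client. A minimal sketch of the same idea, written another way with only the standard library: overriding `redirect_request` to return `None` makes urllib raise `HTTPError` for the 3xx response, and the redirect target can then be read from its `Location` header. The names `NoRedirectHandler`, `redirect_target` and `probe` are illustrative, not part of any library.

```python
try:  # Python 3
    import urllib.request as urlrequest
    from urllib.error import HTTPError
except ImportError:  # Python 2.7
    import urllib2 as urlrequest
    from urllib2 import HTTPError


class NoRedirectHandler(urlrequest.HTTPRedirectHandler):
    """Refuse to follow 3xx responses.

    Returning None from redirect_request makes urllib fall through to the
    default error handler, which raises HTTPError carrying the 3xx status
    code and the response headers.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def redirect_target(err):
    """Return (status code, Location header) from a 3xx HTTPError."""
    return err.code, err.info().get("Location")


def probe(url):
    """Open url without following redirects (needs network access).

    Returns (code, location) for a redirect, or (code, None) otherwise.
    """
    opener = urlrequest.build_opener(NoRedirectHandler())
    try:
        response = opener.open(url)
    except HTTPError as err:
        if 300 <= err.code < 400:
            return redirect_target(err)
        raise
    return response.getcode(), None
```

Calling `probe('http://ift.tt/1ms3x4f')` from Canada would be expected to report the redirect status and the Canadian URL, which confirms the redirect but does not by itself retrieve the US content.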
One solution would be to use a proxy while scraping the website, but I was wondering whether there is any way to prevent this kind of redirect using only Python or Python packages.
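For the proxy route, urllib2 (and `urllib.request` on Python 3) can send requests through a proxy with `ProxyHandler`, so no tool outside the standard library is required. A minimal sketch, where `http://us-proxy.example:8080` is a hypothetical placeholder for a real US-based proxy:

```python
try:  # Python 3
    import urllib.request as urlrequest
except ImportError:  # Python 2.7
    import urllib2 as urlrequest

# Hypothetical US-based proxy; replace with a real host:port.
US_PROXY = "http://us-proxy.example:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends http and https requests through proxy_url."""
    handler = urlrequest.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default, environment-based one.
    return urlrequest.build_opener(handler)
```

Requests made with `build_proxy_opener(US_PROXY).open(url)` would reach the site from the proxy's address, which only helps if the site picks the country version by IP address.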