Thursday, February 5, 2015

Extracting infobox from (German) Wikipedia using Wikimedia API






I want to extract the information in the infobox from specific Wikipedia pages, mainly those of countries. Specifically, I want to do this through the official API rather than by scraping the page with Python + BeautifulSoup4 (or any other language + library), because I noticed that the CSS classes differ between Wikipedia sites in different languages.


The question mediawiki api: how to get infobox from a wikipedia article states that the following method works, which is indeed true for the title given there (Scary Monsters and Nice Sprites), but unfortunately it doesn't work on the pages I tried (see further below).



http://ift.tt/1DKeJl2
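
If I read that answer correctly, the query behind the link above has this general form (my reconstruction, since the link got shortened; shown with the title from that question):

http://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&rvprop=content&rvsection=0&titles=Scary%20Monsters%20and%20Nice%20Sprites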


However, I suppose Wikimedia changed their infobox template, because when I run the above query, all I get is the article content but not the infobox. E.g., running the query on Europäische_Union (European_Union) yields, among other things, the following snippet:



{{Infobox Europäische Union}}
<!--{{Infobox Staat}} <- Vorlagen-Parameter liegen in [[Spezial:Permanenter Link/108232313]] -->


(The German comment says that the template parameters are kept at [[Spezial:Permanenter Link/108232313]].) The same query works fine on the English Wikipedia, though.
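
For instance, the same kind of request for the English article does return the full infobox wikitext:

http://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&rvprop=content&rvsection=0&titles=European_Union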


So the page I want to extract the infobox from would be: http://ift.tt/HkcYMs


And this is the code I'm using:



#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding("utf-8")  # needed so the UTF-8 byte-string title can be re-encoded below

import lxml.etree
import urllib

title = "Europäische_Union"

# Same revisions query as above: wikitext of section 0 for the given titles.
params = {"format": "xml", "action": "query", "prop": "revisions", "rvprop": "content", "rvsection": 0}
params["titles"] = "API|%s" % urllib.quote(title.encode("utf8"))
qs = "&".join("%s=%s" % (k, v) for k, v in params.items())
url = "http://de.wikipedia.org/w/api.php?%s" % qs  # API endpoint of the German Wikipedia
tree = lxml.etree.parse(urllib.urlopen(url))
revs = tree.xpath('//rev')  # one <rev> element per returned page

print revs[-1].text  # wikitext of the target article's section 0
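

For comparison, here is a minimal variant of the same script against the English Wikipedia, which does print the infobox parameters (reusing the imports and params from above; only the host and title are changed):

title = "European_Union"
params["titles"] = "API|%s" % urllib.quote(title)
qs = "&".join("%s=%s" % (k, v) for k, v in params.items())
url = "http://en.wikipedia.org/w/api.php?%s" % qs  # English Wikipedia endpoint
tree = lxml.etree.parse(urllib.urlopen(url))
print tree.xpath('//rev')[-1].text  # section 0 here contains the full {{Infobox ...}} wikitext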


Am I missing something very substantial?










