Wednesday, February 8, 2017

BeautifulSoup incomplete table parsing in Python

I'm trying to scrape a web table using BeautifulSoup and Python 2.7, and I want to fetch all the cells.

The request succeeds, but the parsing is incomplete: it seems to stop at around 1668 cells regardless of the actual table length.

Here is the code:

import requests
from bs4 import BeautifulSoup

url = 'http://ift.tt/2kn0nDZ'

# Form fields sent with the POST request
params = {'selectplane': 'Cessna 208 Caravan', 'submit': ''}
response = requests.post(url, data=params)

soup = BeautifulSoup(response.text, "lxml")  # INCOMPLETE HERE
table = soup.find('table', attrs={'class': "tablesorter-header tablesorter-headerUnSorted"})

# Collect the <td> cells of every table on the page
# (the original loop overwrote `cells` on each iteration)
cells = []
for table_element in soup.select("table"):
    cells.extend(table_element.find_all('td'))
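One way to tell whether the downloaded page itself is short or the parser is dropping content is to count the `<td>` start tags with a second, independent parser and compare the result with `len(soup.find_all('td'))`. A minimal sketch using the standard library's `html.parser.HTMLParser` rather than BeautifulSoup; `count_td_tags` is a hypothetical helper name, the sample string stands in for `response.text`, and the sketch is written for Python 3 (in Python 2.7 the module is named `HTMLParser`):

```python
from html.parser import HTMLParser


class TdCounter(HTMLParser):
    """Counts <td> start tags without building a document tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so "TD" also matches
        if tag == "td":
            self.count += 1


def count_td_tags(html_text):
    """Hypothetical helper: return how many <td> start tags appear in html_text."""
    counter = TdCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


sample = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>"
print(count_td_tags(sample))  # 3
```

If this count on `response.text` is much higher than what BeautifulSoup with "lxml" returns, the parser, not the request, is the likely cause of the missing cells.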

How can I retrieve all the cells?
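A commonly suggested workaround when lxml silently truncates a large or malformed page is to try another parser, such as Python's built-in "html.parser" (or "html5lib", if installed). Whether that fixes this particular page is an assumption, not something verified here. A minimal sketch that collects every cell row by row; `table_cells` is a hypothetical helper and `sample` stands in for `response.text`:

```python
from bs4 import BeautifulSoup


def table_cells(html_text, parser="html.parser"):
    """Hypothetical helper: return the text of every <td>, grouped by <tr>."""
    soup = BeautifulSoup(html_text, parser)
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip header rows that only contain <th> cells
            rows.append(cells)
    return rows


sample = """
<table>
  <tr><th>Reg</th><th>Status</th></tr>
  <tr><td>N123</td><td>ok</td></tr>
  <tr><td>N456</td><td>ok</td></tr>
</table>
"""
print(table_cells(sample))  # [['N123', 'ok'], ['N456', 'ok']]
```

Keeping the parser as a parameter makes it easy to call `table_cells(response.text, "lxml")` and `table_cells(response.text, "html.parser")` side by side and compare the row counts.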

Also, how can I retrieve only the cells with the "bg-success" class? (A few cells have the "bg-danger" class instead.)
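BeautifulSoup treats `class` as a multi-valued attribute, so `find_all("td", class_="bg-success")` matches any cell whose class list includes "bg-success", even when it has other classes too; the CSS selector `soup.select("td.bg-success")` gives the same result. A sketch on a made-up snippet, since the real page's markup is not shown in the question:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page
sample = """
<table>
  <tr><td class="bg-success">N123</td><td class="bg-danger">N456</td></tr>
  <tr><td class="text-center bg-success">N789</td></tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")

# Keyword form: matches cells whose class list contains "bg-success"
success = [td.get_text(strip=True) for td in soup.find_all("td", class_="bg-success")]

# CSS-selector form, same cells
success_css = [td.get_text(strip=True) for td in soup.select("td.bg-success")]

print(success)  # ['N123', 'N789']
```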

Is it possible to sort the table using the attributes passed to soup.find here:

table=soup.find('table',  attrs={'class':"tablesorter-header tablesorter-headerUnSorted"})
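`soup.find` only locates elements; it cannot reorder them. Class names like "tablesorter-header tablesorter-headerUnSorted" are typically added in the browser by the jQuery tablesorter plugin, so they are likely absent from the HTML that `requests` downloads (an assumption worth checking by searching `response.text` for them). Sorting is simpler after extraction, in plain Python. A sketch assuming rows are lists of cell strings and that column 1 (a hypothetical choice) holds numbers:

```python
def sort_rows(rows, column, numeric=False, reverse=False):
    """Return rows sorted by one column; rows are lists of cell strings."""
    def key(row):
        value = row[column]
        # Convert to float for numeric columns so "950" sorts before "1200"
        return float(value) if numeric else value
    return sorted(rows, key=key, reverse=reverse)


rows = [["N123", "1200"], ["N456", "950"], ["N789", "10400"]]
print(sort_rows(rows, 1, numeric=True))
# [['N456', '950'], ['N123', '1200'], ['N789', '10400']]
```

Without `numeric=True` the strings compare character by character, so "10400" would sort before "950".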

I'm pretty new to web scraping, so any help would be much appreciated.

Thank you!
