3D grphique: Need to switch messy urllib and beautifulsoup code into requests?

Wednesday, 4 February 2015

Vote count: -1

This code is extremely messy and slow. I have tried to work out how to replace the urllib2 and BeautifulSoup parts with simple requests code, but with no luck.


    import re
    import urllib2

    from bs4 import BeautifulSoup

    # regular expression matching anything inside < > tags, or a literal backslash followed by n
    pattern = r'(<.*?>|\\n)'

    # get the index page
    f = urllib2.urlopen("http://hi.com")
    s = f.read()
    f.close()

    # replace every match of the pattern with a newline and write the result to 'refined.txt'
    with open('refined.txt', 'w') as ff:
        ff.write(re.sub(pattern, '\n', s))

    # collect the href of every link on the index page
    soup = BeautifulSoup(s, 'html.parser')
    links = []
    for link in soup.find_all('a', href=True):
        links.append(link['href'])

    for element in links:
        # get the linked page
        f = urllib2.urlopen("http://hi.com/" + element)
        s = f.read()
        f.close()

        # replace every match of the pattern with a newline and write the result to 'refined.txt'
        # (mode 'w' overwrites the file on every iteration)
        with open('refined.txt', 'w') as ff:
            ff.write(re.sub(pattern, '\n', s))

        # read the file back as a list of lines
        with open('refined.txt') as of:
            d = of.readlines()
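
For what it's worth, here is a rough sketch of how the fetching part might look with requests. It is only one possible shape, under several assumptions: it targets Python 3, requests replaces only the urllib2 calls (BeautifulSoup still does the link extraction), and the helper names `strip_markup`, `extract_links` and `fetch_clean_pages` are made up for illustration. It also resolves links with `urljoin` rather than string concatenation, and returns the cleaned pages instead of overwriting `refined.txt` on every iteration, which may or may not match the original intent.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Same pattern as in the question: anything inside < > tags,
# or a literal backslash followed by n.
TAG_OR_ESCAPED_NEWLINE = re.compile(r'(<.*?>|\\n)')


def strip_markup(html):
    """Replace every tag (and literal backslash-n) with a newline."""
    return TAG_OR_ESCAPED_NEWLINE.sub('\n', html)


def extract_links(html, base_url):
    """Return absolute URLs for every <a> element that has an href."""
    soup = BeautifulSoup(html, 'html.parser')
    return [urljoin(base_url, a['href'])
            for a in soup.find_all('a', href=True)]


def fetch_clean_pages(base_url, session=None):
    """Fetch base_url, follow each link once, return (url, cleaned_text) pairs.

    `session` is a hypothetical injection point: anything with a
    requests-style .get(url, timeout=...) method works.
    """
    if session is None:
        # One Session reuses connections across the requests it makes.
        session = requests.Session()

    index = session.get(base_url, timeout=10)
    index.raise_for_status()  # fail loudly on 4xx/5xx responses

    pages = []
    for url in extract_links(index.text, base_url):
        page = session.get(url, timeout=10)
        page.raise_for_status()
        pages.append((url, strip_markup(page.text)))
    return pages
```

Calling `fetch_clean_pages("http://hi.com")` would then give a list that can be written to `refined.txt` in a single `with open(...)` block. Reusing one `Session` keeps connections open between requests, which may help with the slowness, though the number of linked pages is likely the bigger factor.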
