3D grphique: Web scraper with DOMDocument

Sunday, April 5, 2015

Web scraper with DOMDocument

I'm trying to scrape a web page for content, using file_get_contents to fetch the HTML and then parsing it with a DOMDocument object. My problem is that I cannot get the information I need. I'm not sure whether this is because I'm using DOMDocument's methods incorrectly or because the (X)HTML in my source is simply poor.

In the source, there is an element with an id of 'cards', which has two child divs. I want the first child, which has many child divs; each of those contains an anchor, which in turn wraps a div. I want the href from the anchor and the nodeValue from the div.

The structure is like this:


<div id="cards">
    <div class="grid">
        <div class="card-wrap">
            <a href="linkValue">
                <img src="..."/>
                <div>nameValue</div>
            </a>
        </div>
        ...
    </div>
    <div id="...">
    </div>
</div>

I started with $cards = $dom->getElementById("cards"). Its childNodes list contains a DOMText object, a DOMElement object, a DOMText object, a DOMElement object, and a DOMText object. I then use $grid = $cards->childNodes->item(1) to get the first DOMElement, which is presumably the .grid element. However, when I iterate over the child nodes of $grid with:


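// Print the tag name and text content of each div child of .grid.
foreach ($grid->childNodes as $item) {
    if ($item->nodeName === "div") {
        // nodeValue on an element is the combined text of all its descendants.
        echo $item->nodeName, ' | ', $item->nodeValue, '<br>';
    }
}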

I end up with a page full of "div | nameValue" where nameValue is the embedded div's nodeValue, and I am unable to locate the anchors to get their href value.
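
For reference, the DOMText entries in a childNodes list are usually just the whitespace between tags, and nodeValue on an element returns the concatenated text of all its descendants, which would explain the output above. The following is a minimal sketch of walking the cards one level deeper, assuming the real page matches the structure shown earlier. The inline $html sample and the names $cardWrap, $anchor, $nameDiv, and $results are illustrative and not taken from the original page.

```php
<?php
// Sample markup mirroring the structure in the question (an assumption).
$html = <<<'HTML'
<div id="cards">
    <div class="grid">
        <div class="card-wrap">
            <a href="linkValue">
                <img src="..."/>
                <div>nameValue</div>
            </a>
        </div>
    </div>
    <div id="...">
    </div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$cards = $dom->getElementById('cards');

// Pick the first element child of #cards, skipping whitespace text nodes.
$grid = null;
foreach ($cards->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $grid = $child;
        break;
    }
}

$results = [];
foreach ($grid->childNodes as $cardWrap) {
    if (!($cardWrap instanceof DOMElement)) {
        continue; // DOMText nodes here hold only whitespace
    }
    // Both the anchor and the name div are descendants of the card wrapper.
    $anchor  = $cardWrap->getElementsByTagName('a')->item(0);
    $nameDiv = $cardWrap->getElementsByTagName('div')->item(0);
    if ($anchor === null || $nameDiv === null) {
        continue;
    }
    $results[] = [
        'href' => $anchor->getAttribute('href'),
        'name' => trim($nameDiv->textContent),
    ];
}

print_r($results);
```

Checking each node with instanceof DOMElement avoids relying on a fixed index such as item(1), which depends on how much whitespace the source happens to contain.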

Am I doing something obviously wrong with my DOMDocument, or is there something more going on here?
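
One way to avoid manual child-node traversal, offered as a sketch rather than a definitive answer, is to query the document with DOMXPath. The example assumes the class names grid and card-wrap appear exactly as in the structure shown earlier. The sample markup, the href and name values in it, and the variable names $cardQuery and $pairs are made up for illustration.

```php
<?php
// Hypothetical sample with two cards; the values are illustrative only.
$html = <<<'HTML'
<div id="cards">
    <div class="grid">
        <div class="card-wrap">
            <a href="/card/1"><img src="..."/><div>First</div></a>
        </div>
        <div class="card-wrap">
            <a href="/card/2"><img src="..."/><div>Second</div></a>
        </div>
    </div>
    <div id="..."></div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select every card wrapper inside the .grid child of #cards.
$cardQuery = '//div[@id="cards"]/div[@class="grid"]/div[@class="card-wrap"]';

$pairs = [];
foreach ($xpath->query($cardQuery) as $card) {
    // evaluate() with a context node runs the expression relative to that card.
    $pairs[] = [
        'href' => $xpath->evaluate('string(.//a/@href)', $card),
        'name' => $xpath->evaluate('normalize-space(.//div)', $card),
    ];
}

print_r($pairs);
```

Because the expressions match elements by name and attribute, the whitespace text nodes between tags never enter the result set.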
