Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Simple HTML DOM: Cannot fetch ant-pagination

I am trying to scrape a website to estimate the number of products matching a given keyword. Instead of scrolling through each page and counting the products manually, I want to find the last page number shown in the ant-pagination control (the last ant-pagination-item) and multiply it by the number of products on one page to get an estimate. I have written this using simple_html_dom.php; this is what my code looks like:

<?php
    require_once('simple_html_dom.php');

    // URL-encode the search term so spaces and special characters survive.
    $query = $_POST['q'] ?? '';
    $url = "https://www.daraz.pk/catalog/?q=" . urlencode($query);
    $html = file_get_html($url);

    if (!empty($html)) {
        // Index -1 selects the last matching element.
        $pages = $html->find("li.ant-pagination-item", -1);
        $pages = html_entity_decode($pages->plaintext);
    }
    else {
        echo "Something went wrong";
    }

    echo "<div>";
    if (isset($pages)) {
        echo "FOUND $pages";
    }
    echo "</div>";

I pass the query through a form and append it to $url. The problem is that when the code runs it only prints FOUND, which means $pages is set but empty. I checked whether this happens with other elements too, and found that only the pagination behaves this way; I can't figure out how to solve it. It would be a great help if someone could help me understand the mistake I am making. You could try visiting this link, where there are 102 pages, but the script prints only FOUND instead of FOUND 102.

Question from: https://stackoverflow.com/questions/65909632/simple-html-domcannot-fetch-ant-pagination

1 Reply

This is not possible with Simple HTML DOM alone, because the page builds its pagination dynamically. The library runs in PHP and only parses the HTML as the server delivers it; it does not execute JavaScript, so any content that scripts add after the page loads never appears in what it sees. One solution is to combine Simple HTML DOM with CasperJS and PhantomJS. CasperJS can wait while the page finishes loading, which makes it possible to scrape dynamically loaded components.
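
To see why only FOUND is printed, here is a minimal sketch of the difference between the HTML the server sends and the HTML a browser ends up with. It is an illustration, not the site's real markup: both HTML strings, the helper name lastPageNumber and the value of $perPage are made up, and it uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM so that it runs without extra files. When no li.ant-pagination-item exists there is nothing to read, so the page count comes back empty.

```php
<?php
/**
 * Return the number shown in the last li.ant-pagination-item,
 * or null when the markup contains no such element.
 * (Hypothetical helper, written for this example only.)
 */
function lastPageNumber(string $html): ?int
{
    if ($html === '') {
        return null; // DOMDocument refuses to parse an empty string
    }

    // Collect parser warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Match <li> elements whose class list contains "ant-pagination-item".
    $xpath = new DOMXPath($doc);
    $items = $xpath->query(
        "//li[contains(concat(' ', normalize-space(@class), ' '), ' ant-pagination-item ')]"
    );
    if ($items === false || $items->length === 0) {
        return null; // the pagination is absent from this HTML
    }

    $text = trim($items->item($items->length - 1)->textContent);
    return preg_match('/^\d+$/', $text) === 1 ? (int) $text : null;
}

// Made-up markup: roughly what a server might send before JavaScript runs.
$rawHtml = '<html><body><div id="root"></div></body></html>';

// Made-up markup: roughly what a browser might hold after rendering.
$renderedHtml = '<html><body><ul class="ant-pagination">'
    . '<li class="ant-pagination-item ant-pagination-item-1">1</li>'
    . '<li class="ant-pagination-item ant-pagination-item-2">2</li>'
    . '<li class="ant-pagination-item ant-pagination-item-102">102</li>'
    . '</ul></body></html>';

$perPage = 40; // assumed number of products per page, for illustration only

foreach (['raw' => $rawHtml, 'rendered' => $renderedHtml] as $label => $html) {
    $last = lastPageNumber($html);
    if ($last === null) {
        echo "$label: no pagination found\n";
    } else {
        // Estimate = last page number x products per page.
        echo "$label: about " . ($last * $perPage) . " products\n";
    }
}
```

Checking for null before reading the text is what separates "no pagination in this HTML" from a real page count. The same guard applies to the result of find() in Simple HTML DOM, which returns null when the requested index does not exist.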

