php - Simple HTML DOM: Cannot fetch ant pagination

Question

Posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I am trying to scrape a website to estimate the number of products for a given keyword. Instead of scrolling through every page and counting the products manually, I want to find the last page number displayed in the ant-pagination component (the last ant-pagination-item element) and multiply it by the number of products on one page to get an estimated total. I have written this using simple_html_dom.php, and this is what my code looks like:

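```php
<?php
    require_once('simple_html_dom.php');

    // Keyword submitted by the search form, URL-encoded before it goes into the query string
    $query = $_POST['q'];
    $url = "https://www.daraz.pk/catalog/?q=" . urlencode($query);

    // Fetch the page and parse it into a DOM (file_get_html() returns false on failure)
    $html = file_get_html($url);

    if (!empty($html)) {
        // Last li.ant-pagination-item element, expected to hold the last page number
        $pages = $html->find("li.ant-pagination-item", -1);
        $pages = html_entity_decode($pages->plaintext);
    }
    else {
        echo "Something went wrong";
    }

    echo "<div>";
    if (isset($pages)) {
        echo "FOUND $pages";
    }
    echo "</div>";
```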
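The estimate the question aims for (last page number multiplied by products per page) is a single multiplication once the page label has been extracted. A minimal sketch, assuming the label arrives as a string; `estimateProductCount()` is a hypothetical helper name, and the 40 products per page in the usage lines is an illustrative value, not a figure stated for Daraz.

```php
<?php
// Hypothetical helper: estimate the number of products from the text of the
// last pagination item and the number of products shown on one page.
// Returns null when the label is not a positive whole number.
function estimateProductCount(string $lastPageLabel, int $productsPerPage): ?int
{
    $label = trim($lastPageLabel);
    if (preg_match('/^[0-9]+$/', $label) !== 1 || (int) $label === 0) {
        return null;
    }
    // The last page is usually only partly filled, so this tends to overestimate slightly.
    return (int) $label * $productsPerPage;
}

// Usage with an illustrative page size of 40 products per page:
var_dump(estimateProductCount("102", 40)); // int(4080)
var_dump(estimateProductCount("", 40));    // NULL
```

Returning null for an empty label keeps a failed scrape from silently turning into an estimate of zero.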

I am passing the query through a form and appending it to $url. The problem is that when the code runs, it only shows FOUND, which means $pages is set. I checked whether this happens only with the pagination or with other elements too, and found that only the pagination behaves this way. I can't find any way to solve it, so it would be a great help if someone could help me understand the mistake I am making. You can try visiting this link: there are 102 pages, but the script does not print FOUND 102, only FOUND.
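Printing FOUND with nothing after it is consistent with the selector matching nothing. In Simple HTML DOM, `find()` with an index returns `null` when no element exists at that index; the original code then reads `->plaintext` from `null` (a notice or warning, depending on the PHP version, that evaluates to `null`) and decodes it, which leaves `$pages` as an empty string. Because `isset()` only checks for "not null", the FOUND branch still runs. The sketch below reproduces that without network access; `hasPageLabel()` is a hypothetical helper, not part of the library.

```php
<?php
// Simulate the value $pages ends up with when find(..., -1) returned null:
// null->plaintext evaluates to null, and decoding it gives an empty string.
$pages = html_entity_decode((string) null);
var_dump(isset($pages)); // bool(true): an empty string is still "set"

// A stricter check: treat the result as found only if it is a non-empty string.
function hasPageLabel(?string $pages): bool
{
    return $pages !== null && trim($pages) !== '';
}

var_dump(hasPageLabel($pages)); // bool(false)
var_dump(hasPageLabel("102"));  // bool(true)
```

A stricter check like this does not make the pagination appear; it only makes the script report honestly that the element was not in the HTML it parsed.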

Question from: https://stackoverflow.com/questions/65909632/simple-html-domcannot-fetch-ant-pagination

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:12:30+0000

This is not possible because of the page's dynamic behavior. Simple HTML DOM is a PHP library that only parses the HTML the server returns when the page is requested; it does not execute JavaScript, so any content added by JavaScript after the page loads (such as this pagination) never appears in the HTML it sees. One solution is to use Simple HTML DOM together with CasperJS and PhantomJS. PhantomJS is a headless browser that runs the page's JavaScript, and CasperJS lets you add delays until the page has finished loading, which makes it easier to scrape dynamically loaded components.
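One way to confirm that the pagination is rendered client-side is to check whether its class name appears in the raw HTML source at all. A minimal sketch; `classInSource()` is a hypothetical helper and "phone" is an arbitrary example keyword. A plain substring search is crude: a hit inside an inline script would not prove the element exists, but a miss strongly suggests JavaScript creates it later.

```php
<?php
// Hypothetical diagnostic: report whether a class name appears anywhere in the
// HTML source returned by the server. If it does not, the element is created
// later by JavaScript, and a PHP HTML parser cannot find it in that source.
function classInSource(string $html, string $className): bool
{
    return strpos($html, $className) !== false;
}

// Usage sketch (needs network access; URL pattern taken from the question):
// $source = file_get_contents("https://www.daraz.pk/catalog/?q=" . urlencode("phone"));
// if ($source !== false) {
//     var_dump(classInSource($source, "ant-pagination-item"));
// }

// Offline illustration with two small hand-written documents:
$serverHtml   = '<div id="root"></div><script src="app.js"></script>';
$renderedHtml = '<ul><li class="ant-pagination-item">102</li></ul>';
var_dump(classInSource($serverHtml, "ant-pagination-item"));   // bool(false)
var_dump(classInSource($renderedHtml, "ant-pagination-item")); // bool(true)
```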
