Usually it is not safe and reliable to blindly follow browser's suggestion about how to locate an element.
First of all, XPath expression that developer tools generate are usually absolute - starting from the the parent of all parents - html
tag, which makes it being more dependant on the page structure (well, firebug
can also make expressions based on id
attributes).
Also, the HTML code you see in the browser can be pretty much different from what Scrapy receives due to asynchronous nature of the website page load and javascript being dynamically executed in the browser. Scrapy
is not a browser and "sees" only the initial HTML code of a page, before the "dynamic" part.
Instead, inspect what Scrapy really has in the response: open up the Scrapy Shell
, inspect the response and debug your XPath expressions and CSS selectors:
$ scrapy shell https://google.com
>>> response.xpath('//div[@id="myid"]')
...
Here is what I've got for the google image search:
$ scrapy shell "https://www.google.com/search?q=test&tbm=isch&qscrl=1"
In [1]: response.xpath('//*[@id="ires"]//img/@src').extract()
Out[1]:
[u'https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcRO9ZkSuDqt0-CRhLrWhHAyeyt41Z5I8WhOhTkGCvjiHmRiTSvDBfHKYjx_',
u'https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQpwyzbW_qsRenDw3d4wwpwwm8n99ukMtLCVaPiTJxyviyQVBQeRCglVaY',
u'https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSrxtoY3-3QHwhjc5Ofx8090uDYI8VOUbi3gUrd9USxZ-Vb1D5pAbOzJLMS',
u'https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcTQO1A3dDJ07tIaFMHlXNOsOnpiY_srvHKJE1xOpsMZscjL3aKGxaGLOgru',
u'https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ71ukeTGCPLuClWd6MetTtQ0-0mwzo3rn1ug0MUnbpXmKnwNuuBnSWXHU',
u'https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRZmWrYR9A4W97jpjhtIbyUM5Lj3vRL0vgCKG_xfylc5wKFAk6UB8jiiKA',
...
u'https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcRj08jK8sBjX90Tu1RO4BfZkKe5A59U0g1TpMWPFZlNnA70SQ5i5DMJkvV0']
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…