Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
782 views
in Technique[技术] by (71.8m points)

Google Sheets importXML Returns Empty Value

Im trying to scrape this website (https://kamadan.gwtoolbox.com/) with google sheets for material costs for a game that I play. There are two tables; "Common Materials" and "Rare Materials" in a drop down in the top right corner. I am trying to pull the values for both as the prices update. I copied the full Xpath and used the function below in an empty cell on a sheet.

=importxml("https://kamadan.gwtoolbox.com/","/html/body/div[2]/div[1]/div/div[2]/table/tbody")

This returns a #N/A error saying it is returning an empty value.

I also tried it with the regular xpath...

=importxml("https://kamadan.gwtoolbox.com/","//*[@id='trader-overlay-items']")

Which just returns a blank cell. I have also tried both methods using the inspect function through chrome on the ancestors and children they return either of the two errors above.

Sorry if this is a really easy one. I am not familiar at all with Xpaths or html. I mostly dabble in VBA in excel.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Answer:

IMPORTXML can not retrieve data which is populated by a script, and so using this formula to retrieve data from this table is not possible to do.

More Information:

As you've already mentioned, you can attempt to get the data directly from the table using:

=IMPORTXML("https://kamadan.gwtoolbox.com/","//table[@id='trader-overlay-items']")

Which just gets a blank cell.

I went a step further and tried to reverse-engineer this by calling IMPORTXML on the HTML elements on the page in steps:

=IMPORTXML("https://kamadan.gwtoolbox.com/","html")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body/div[1]")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body/div[1]/div[0]")
...

html/body/div[1]/div[0] is the first path which gives no imported content, and we can see from importing html/body that the full body does not contain the imformation and only a template of it - in cell B1 we have references to 'Common materials' and 'Rare materials':

enter image description here

And in D1 we start to see JavaScript and JSON objects which are not called by IMPORTXML and so the results of which can not be retrieved:

enter image description here

As you can see if you disable JavaScript on the site, almost nothing is actually rendered and so can't be obtained using IMPORTXML:

enter image description here

I know this is generally bad news, but I hope this is helpful to you!

References:


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...