Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
157 views
in Technique[技术] by (71.8m points)

SQL Query to retrieve values inside tags(child nodes) from a XML data in SnowFlake

I have a XML(edited) File with the following data:

<dept dept_id="1" dept_name="Marketing">
     <progress>1</progress>
    <employee empname="a">
         <end> 1 </end>
        <address addr1="123 abc">
        ...
        </address>
    </employee>
</dept>
<dept dept_id="2" dept_name="Sales">
    <progress>1</progress>
    <employee empname="b">
        <end> 1 </end>
        <address addr1="456 cde">
        ...
        </address>
    </employee>
</dept>

I put this file in AWS S3 and then used 'copy into' to transfer this data into a external table in snowflake in a variant column. Like Below:

  copy into DB.AWS_S3_STAGE_SCHEMA.test_XML_copy
  from @AWS_S3_LANDING/websiteXML/Test_xml.xml
  FILE_FORMAT = ( TYPE =  XML STRIP_OUTER_ELEMENT = TRUE  )  ;

Now, I can query the data from this table and retrieve data like 'marketing' and 'sales', which are inside tags using the following query(I got the syntax from snowflake docs and used it. ):

SELECT
    GET(xmldata, '@dept_id')::integer as dept_id,
    GET(xmldata, '@dept_name')::string as dept_name
FROM test_XML_copy;

But, I cannot query the data inside tags which are in child nodes. For example: I need data like 'a' and '123 abc'. If anybody can help me in this query, that would be appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use a combination of XMLGET and GET functions to traverse nested objects in an XML document.

The former helps fetch an entire tag object found below the current tag, while the latter allows querying attributes and regular values within the current tag.

SELECT
    -- <dept> (root)
    GET(xmldata, '@dept_id')::integer as dept_id,
    GET(xmldata, '@dept_name')::string as dept_name,
    -- <dept>.<employee>
    GET(XMLGET(xmldata, 'employee'), '@empname')::string as employee_name,
    -- <dept>.<employee>.<address>
    GET(XMLGET(XMLGET(xmldata, 'employee'), 'address'), '@addr1')::string as address_1
FROM test_XML_copy;

Which should yield:

+---------+-----------+---------------+-----------+                             
| DEPT_ID | DEPT_NAME | EMPLOYEE_NAME | ADDRESS_1 |
|---------+-----------+---------------+-----------|
|       1 | Marketing | a             | 123 abc   |
|       2 | Sales     | b             | 456 def   |
| ...     | ...       | ...           | ...       |
+---------+-----------+---------------+-----------+

Your example data shows no repetition of child tags, but if they do repeat (such as multiple employee in each dept) then FLATTEN can first be used to produce one employee per row and the above approach can be reapplied. Alternatively, if it is a fixed form of tag structure and they are always ordered, you can use the instance number in XMLGET to point to each one (implicit default is 0, the first object).

An example that explodes the document into one row per employee and per address inner tags:

SELECT
xmldata:"@dept_id"::integer as dept_id,
xmldata:"@dept_name"::string as dept_name,
emp.value:"@empname"::string as employee_name,
addr.value:"@addr1"::string as address_1
FROM
test_XML_copy,
LATERAL FLATTEN(xmldata:"$") emp,
LATERAL FLATTEN(emp.value:"$") addr
WHERE emp.value:"@" = 'employee' AND addr.value:"@" = 'address';

(This yields results similar to the above, for the example provided in OP's question)

Note: You can also use the path syntax with $ and @ characters to navigate the structure instead of nesting functions, but these rely on strict ordering of the input data structure:

-- See outer / inner structures in 'JSON' form
SELECT
    xmldata:"@" dept_tag,
    xmldata:"$" dept_tag_contents
FROM test_XML_copy;

-- Sample equivalent query using path expressions, relying on ordering:
SELECT
    xmldata:"@dept_id",
    xmldata:"@dept_name",
    xmldata:"$"[1]."@empname",
    xmldata:"$"[1]."$"[1]."@addr1"
FROM test_XML_copy;

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...