Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
502 views
in Technique[技术] by (71.8m points)

U-SQL Output in Azure Data Lake

Would it be possible to automatically split a table into several files based on column values if I don't know how many different key values the table contains? Is it possible to put the key value into the filename?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is our top ask (and has been previously asked on stackoverflow too :). We are currently working on it and hopefully have it available by summer.

Until then you have to write a script generator. I tend to use U-SQL to generate the script but you could do it with Powershell or T4 etc.

Here is an example:

Let's assume you want to write files for the column name in the following table/rowset @x:

name | value1 | value2
-----+--------+-------
A    | 10     | 20
A    | 11     | 21
B    | 10     | 30
B    | 100    | 200

You would write a script to generate the script like the following:

@x = SELECT * FROM (VALUES( "A", 10, 20), ("A", 11, 21), ("B", 10, 30), ("B", 100, 200)) AS T(name, value1, value2);

// Generate the script to do partitioned output based on name column:

@stmts = 
  SELECT "OUTPUT (SELECT value1, value2 FROM @x WHERE name == ""+name+"") TO "/output/"+name+".csv" USING Outputters.Csv();" AS output 
  FROM (SELECT DISTINCT name FROM @x) AS x;

OUTPUT @stmts TO "/output/genscript.usql" 
USING Outputters.Text(delimiter:' ', quoting:false);

Then you take genscript.usql, prepend the calculation of @x and submit it to get the data partitioned into the two files.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...