Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
302 views
in Technique by (71.8m points)

hadoop - Sequence Number UDF in Hive

I have tried this UDF in Hive: UDFRowSequence.

However, it does not generate unique values: the sequence restarts for each mapper.

Suppose I have one file (with 4 records) on HDFS. Hive creates a single mapper for the job, and the result looks like this:

1
2
3
4

But when there are multiple (large) files at the HDFS location, multiple mappers are created for the job, and each mapper generates its own repeating sequence, like this:

1
2
3
4
1
2
3
4
1
2
.
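
The repetition described above can be reproduced with a small simulation. This is a minimal Python sketch, not Hive code, and it rests on an assumption: each mapper holds its own instance of the UDF with a private counter that starts at 1. The names `run_mapper` and `splits` are hypothetical and exist only for illustration.

```python
from itertools import count


def run_mapper(records):
    """Simulate one mapper that owns a private counter starting at 1."""
    # Assumption: stands in for the per-instance state of the UDF.
    counter = count(start=1)
    return [(next(counter), record) for record in records]


# Three hypothetical input splits, one per mapper.
splits = [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["i", "j"]]

# Each mapper numbers its own split independently of the others.
numbered = [row for split in splits for row in run_mapper(split)]
sequence = [number for number, _ in numbered]

print(sequence)  # [1, 2, 3, 4, 1, 2, 3, 4, 1, 2]
```

Because no counter state is shared between the simulated mappers, the numbers are unique only within a split, which matches the output shown in the question.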

Is there any solution for this, so that a unique number is generated for each record?

  asked by Elvish_Blade, translated from SO


1 Reply

0 votes
by (71.8m points)

I think you are looking for ROW_NUMBER().

You can read about it and other "windowing" functions here.

Example:

SELECT *, ROW_NUMBER() OVER () AS row_num
FROM some_database.some_table;
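
To see the numbering a window function produces without a Hive cluster, the same kind of query can be tried locally. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module instead of Hive, assumes the bundled SQLite is version 3.25 or newer (the first release with window functions), and uses a hypothetical `name` column. An `ORDER BY` inside `OVER` is added so the numbering is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (name TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [(letter,) for letter in "abcdefghij"],
)

# ROW_NUMBER() numbers the whole result set, not each input split.
rows = conn.execute(
    "SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS row_num "
    "FROM some_table ORDER BY row_num"
).fetchall()
conn.close()

row_nums = [row_num for _, row_num in rows]
print(row_nums)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Each of the ten rows receives a distinct number. In Hive, a window with no PARTITION BY is generally evaluated over the whole table in one place, so it can be slow on very large inputs.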


...