Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
260 views
in Technique[技术] by (71.8m points)

hadoop - Hive join to get the date wise picture

I am trying to join 2 tables in HIVE using a key and date from table B and the corresponding previous closest date record from table A at the time of join. For example: Below are the 2 input tables

<----------TABLE A------------->            <------------TABLE B------------>
A_id    A_date      changed_col             B_id    B_date      B_value A_id
****    ******      ***********             ****    ******      ******* *****   
A01     2017-03-20      ABC                 B01     2017-04-02  200     A01
A01     2017-04-01      XYZ                 B01     2017-04-04  500     A01
A01     2017-04-05      LLL                 B02     2017-04-07  900     A02
A02     2017-04-06      KKK                 B02     2017-04-30  800     A02
A02     2017-04-12      JJJ

However when I JOIN table B with Table A, it should look for the nearest lowest date in table A for the same key(A_id). Below is the expected output table:

B_id    B_date          A_id        A_date      changed_col   B_value
****    ******          ****        ******      ***********   *******
B01     2017-02-04      A01     2017-01-04      XYZ             200 
B01     2017-04-04      A01     2017-01-04      XYZ             500
B02     2017-04-07      A02     2017-04-06      KKK             900
B02     2017-04-30      A02     2017-04-12      JJJ             800

Any help is much appreciated. Thanks

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Figured out the query myself

select * from 
(select b.*, a.*,
row_number() over (partition by b.a_id, b.b_date 
order by b.b_date, datediff(to_date(b.b_date), to_date(a.a_date)) ) as rnk
FROM b JOIN a 
ON b.a_id= a.a_id 
WHERE a.a_date <= b.b_date
) j1 where j1.rnk = 1

B_id    B_date          B_value A_id            A_date      changed_col rnk   
B01     2017-04-02      200     A01     A01     2017-04-01      XYZ     1
B01     2017-04-04      500     A01     A01     2017-04-01      XYZ     1
B02     2017-04-07      900     A02     A02     2017-04-06      KKK     1
B02     2017-04-30      800     A02     A02     2017-04-12      JJJ     1

You can select the required columns in line 1. For now I have selected all column from A and B including the rank.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...