Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

hadoop - Place a subquery in the column list (Hive)

My DB - Hadoop, Hive I have tried to run a query with subquery on the select and it did not run. Please find the query below

Thanks for helping

select 
to_date(timestamp) as date,
count(*),
(select count(*) from table1 where to_date(timestamp)=current_date() and arrivaltime IS null) as count_null
FROM table1
WHERE to_date(ingest_timestamp) =current_date()
GROUP BY date, count_null
question from:https://stackoverflow.com/questions/65935897/place-a-subquery-in-the-column-list-hive

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

convert the sub query to a case when. also you can not group by count_null column. if you really need to group by count_null then built a query on top of this query. Below SQL will return total_count, count_null, and count_not_null

select 
to_date(timestamp) as date,
count(*) total_count,
SUM(CASE WHEN arrivaltime IS null then 1 else 0 END) as count_null,
SUM(CASE WHEN arrivaltime IS null then 0 else 1 END) as count_not_null,
SUM(CASE WHEN arrivaltime IS null then 0 else 1 END) - SUM(CASE WHEN arrivaltime IS null then 1 else 0 END)
as diff_not_null_and_null
FROM 
table1 
WHERE to_date(timestamp)=current_date
GROUP BY to_date(timestamp)

Example - Lets assume you have data like this -

Flightid, timestamp, arrivaltime 
Flight1, 2021-01-29,2021-01-29 10:00:00  
Flight2, 2021-01-29,
Flight4, 2021-01-29,2021-01-29 11:00:00 
Flight5, 2021-01-29,2021-01-29 10:30:00 
Flight9, 2021-01-29,

Result will be -

timestamp,  total_count, count_null, count_not_null, diff_not_null_and_null
2021-01-29, 5,           2,          3,              1

In above SQL i calculated three attributes -

  1. SUM(CASE WHEN arrivaltime IS null then 1 else 0 END) - when the arrivaltime is null then its going to assign value of 1 to that row. now if you sum them, you will get total count when arrival time is null. in above example its 2.
  2. SUM(CASE WHEN arrivaltime IS null then 0 else 1 END) - opposite to above logic, we assign 1 when arrivaltime is not null. Then sum will result count of all not null rows. in above example its 3.
  3. then the difference between 2 and 1. in above example its 3-2 =1.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...