sql - Hive - Is there a way to further optimize a HiveQL query?

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have written a query to find the 10 busiest airports in the USA for March and April. It produces the desired output; however, I would like to optimize it further.

Are there any HiveQL-specific optimizations that can be applied to the query? Is GROUPING SETS applicable here? I'm new to Hive, and for now this is the shortest query that I've come up with.

SELECT airports.airport, COUNT(Flights.FlightsNum) AS Total_Flights
FROM (
  SELECT Origin AS Airport, FlightsNum
  FROM flights_stats
  WHERE Cancelled = 0 AND Month IN (3, 4)
  UNION ALL
  SELECT Dest AS Airport, FlightsNum
  FROM flights_stats
  WHERE Cancelled = 0 AND Month IN (3, 4)
) Flights
INNER JOIN airports ON (Flights.Airport = airports.iata AND airports.country = 'USA')
GROUP BY airports.airport
ORDER BY Total_Flights DESC
LIMIT 10;

The table columns are as follows:

Airports

|iata|airport|city|state|country|

Flights_stats

|originAirport|destAirport|FlightsNum|Cancelled|Month|

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:07:20+0000

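It might help to do the aggregation before the UNION ALL, so that each branch feeds one row per airport into the join instead of one row per flight. This version counts with COUNT(*) rather than COUNT(FlightsNum), which matches the original only if FlightsNum is never NULL: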

SELECT a.airport, SUM(cnt) AS Total_Flights
FROM ((SELECT Origin AS Airport, COUNT(*) as cnt 
       FROM flights_stats
       WHERE (Cancelled = 0 AND Month IN (3,4))
       GROUP BY Origin
      ) UNION ALL
      (SELECT Dest AS Airport, COUNT(*) as cnt
       FROM flights_stats
       WHERE Cancelled = 0 AND Month IN (3,4)
       GROUP BY Dest
      )
     ) f INNER JOIN
     airports a
     ON f.Airport = a.iata AND a.country = 'USA'
GROUP BY a.airport
ORDER BY Total_Flights DESC
LIMIT 10;
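
As for GROUPING SETS: it may be applicable as a way to compute both per-column counts in a single pass over flights_stats instead of two. The following is only a sketch, not a tested or benchmarked rewrite. It assumes Origin and Dest are never NULL (so COALESCE can pick whichever column the current grouping set kept) and, like the query above, uses COUNT(*) in place of COUNT(FlightsNum).

```sql
-- Sketch: count departures and arrivals with one scan of flights_stats.
-- GROUPING SETS (Origin, Dest) produces one set of rows grouped by Origin
-- (Dest is NULL there) and one set grouped by Dest (Origin is NULL there).
-- Assumption: Origin and Dest themselves are never NULL in the data.
SELECT a.airport, SUM(f.cnt) AS Total_Flights
FROM (
    SELECT COALESCE(Origin, Dest) AS Airport, COUNT(*) AS cnt
    FROM flights_stats
    WHERE Cancelled = 0 AND Month IN (3, 4)
    GROUP BY Origin, Dest GROUPING SETS (Origin, Dest)
) f
INNER JOIN airports a
    ON f.Airport = a.iata AND a.country = 'USA'
GROUP BY a.airport
ORDER BY Total_Flights DESC
LIMIT 10;
```

Whether this actually beats the UNION ALL version depends on the data and the execution engine, so it is worth comparing the plans of both queries with EXPLAIN before settling on one.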
