sql - PostgreSQL optimize query performance that contains Window function with CTE

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

Here the columns amenity_categories and parent_path are JSONB columns with values like ["Tv","Air Condition"] and ["20000","20100","203"] respectively. Apart from those, the other columns are ordinary varchar and numeric types. I have around 2.5M rows, with a primary key on id, which is indexed. The initial CTE is the part that takes time when rp.parent_path matches many rows.
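To make the two JSONB operations the query relies on concrete, here is a small standalone sketch using the literal values from the description above (no table needed). The ? operator tests whether a string is a top-level element of the array, by exact match, and jsonb_array_elements_text expands an array into one text row per element. The containment operator @> is shown as an equivalent way to express the same filter for a single element.

```sql
-- ? is an exact, top-level element test; it does not match substrings.
SELECT '["20000","20100","203"]'::jsonb ? '203'       AS has_203,      -- true
       '["20000","20100","203"]'::jsonb ? '20'        AS has_20,       -- false
       '["20000","20100","203"]'::jsonb @> '["203"]'  AS contains_203; -- true

-- One output row per array element, as in the FROM clause of the query.
SELECT ac.name
FROM   jsonb_array_elements_text('["Tv","Air Condition"]'::jsonb) AS ac(name);
-- two rows: Tv, Air Condition
```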

Current query:

WITH CTE AS
(
  SELECT id,
  property_name,
  property_type_category,
  review_score, 
  amenity_category.name, 
  count(*) AS cnt FROM table_name rp, 
  jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
  WHERE rp.parent_path ? '203' AND number_of_review >= 1
  GROUP BY amenity_category.name,id 
),
CTE2 as
(
  SELECT id, property_name,property_type_category,name,
  ROW_NUMBER() OVER (PARTITION BY property_type_category,
  name ORDER BY review_score DESC),
  COUNT(id) OVER (PARTITION BY property_type_category,
  name ORDER BY name DESC) 
  FROM CTE
)

SELECT id, property_name, property_type_category, name, COUNT 
FROM CTE2
where row_number = 1

So my basic question is: is there another way I can rewrite this query, or optimize the current one?

Question from: https://stackoverflow.com/questions/65540770/postgresql-optimize-query-performance-that-contains-window-function-with-cte

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:52:47+0000

If it's safe to assume that array elements in amenity_categories are distinct (no duplicate array elements), we can radically simplify to:

SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;

If review_score can be NULL, make that:

...
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;
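If amenity_categories can contain duplicate elements, the assumption above does not hold: the original CTE's GROUP BY collapsed duplicate (id, name) pairs, whereas a plain jsonb_array_elements_text join would count such a property twice. A minimal sketch that keeps the original semantics deduplicates the elements per row in a LATERAL subquery. The WITH clause holds made-up sample rows shaped like the question's table (values and column types are assumptions); drop it to run the query against the real table_name.

```sql
WITH table_name (id, property_name, property_type_category, review_score,
                 number_of_review, parent_path, amenity_categories) AS (
  VALUES  -- hypothetical sample rows; row 1 lists "Tv" twice
    (1, 'A', 'Hotel',  9.1, 10, '["20000","20100","203"]'::jsonb, '["Tv","Tv","Air Condition"]'::jsonb),
    (2, 'B', 'Hotel',  8.5,  3, '["20000","203"]'::jsonb,         '["Tv"]'::jsonb),
    (3, 'C', 'Hostel', 7.0,  0, '["203"]'::jsonb,                 '["Tv"]'::jsonb)
)
SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , count(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp
CROSS  JOIN LATERAL (
   -- each element name at most once per row, like GROUP BY name, id did
   SELECT DISTINCT e.name
   FROM   jsonb_array_elements_text(rp.amenity_categories) AS e(name)
   ) ac
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;
-- id | property_name | property_type_category | name          | count
--  1 | A             | Hotel                  | Air Condition |     1
--  1 | A             | Hotel                  | Tv            |     2
```

Without the DISTINCT in the subquery, the Hotel/Tv count would be 3 here, because row 1 would contribute "Tv" twice. The extra per-row de-duplication costs something, so it is only worth it if duplicates can actually occur.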

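The simplified query works because DISTINCT ON is applied after window functions have been evaluated, so count(*) OVER (...) still sees every row of the partition before DISTINCT ON keeps only the first one.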
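A tiny standalone illustration of that evaluation order, on three made-up rows: the window count is computed over all rows of each category, and only afterwards does DISTINCT ON keep the top-scoring row per category.

```sql
SELECT DISTINCT ON (cat)
       id, cat, score
     , count(*) OVER (PARTITION BY cat) AS cnt  -- evaluated over all rows first
FROM   (VALUES (1, 'a', 5), (2, 'a', 9), (3, 'b', 7)) AS t(id, cat, score)
ORDER  BY cat, score DESC;                      -- DISTINCT ON keeps the first row per cat
-- id | cat | score | cnt
--  2 | a   |     9 |   2
--  3 | b   |     7 |   1
```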

parent_path and number_of_review should probably be indexed, but whether that pays off depends on the data distribution and the selectivity of the WHERE conditions, which you didn't disclose.
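A minimal sketch of such indexes, on a throwaway temporary table rp_demo (a hypothetical name; the column types are assumed from the question's description). The key detail is the operator class: the default GIN class for jsonb (jsonb_ops) supports the ? operator, while jsonb_path_ops does not.

```sql
-- Throwaway table mirroring the columns described in the question (types assumed).
CREATE TEMP TABLE rp_demo (
  id                     bigint PRIMARY KEY,
  property_name          varchar,
  property_type_category varchar,
  review_score           numeric,
  number_of_review       integer,
  parent_path            jsonb,
  amenity_categories     jsonb
);

-- Default jsonb_ops GIN index: usable for parent_path ? '203' (and @>).
CREATE INDEX rp_demo_parent_path_gin ON rp_demo USING gin (parent_path);

-- B-tree index for the range condition number_of_review >= 1.
CREATE INDEX rp_demo_number_of_review_idx ON rp_demo (number_of_review);

-- With real data loaded, compare plans to see whether the indexes are used.
ANALYZE rp_demo;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM   rp_demo
WHERE  parent_path ? '203'
AND    number_of_review >= 1;
```

If number_of_review >= 1 excludes a large share of rows, a partial GIN index (USING gin (parent_path) WHERE number_of_review >= 1) may be a smaller alternative; the planner can use it only when the query's WHERE clause implies that predicate. Which variant wins depends on the actual data, so EXPLAIN on the real table is the deciding step.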

About DISTINCT ON, see the Stack Overflow question "Select first row in each GROUP BY group?"

Assuming id is NOT NULL, count(*) is faster and equivalent to count(id).
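The equivalence hinges on NULLs: count(*) counts rows, while count(expr) counts only rows where expr is not NULL. Since id is the primary key in the question (and therefore NOT NULL), the two agree there; this sketch on made-up values shows where they would differ.

```sql
SELECT count(*)  AS all_rows,     -- 3
       count(id) AS non_null_ids  -- 2: the NULL row is skipped
FROM   (VALUES (1), (2), (NULL::int)) AS t(id);
```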
