Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
260 views
in Technique[技术] by (71.8m points)

sql - PostgreSQL optimize query performance that contains Window function with CTE

Here the column amenity_category and parent_path is JSONB column with value like ["Tv","Air Condition"] and ["20000","20100","203"] respectively. Apart from that other columns are normal varchar and numeric type. I've around 2.5M rows with primary key on id and it is indexed. Basically the initial CTE part is taking time when rp.parent_path match multiple rows.

Sample dataset:

enter image description here

Current query:

WITH CTE AS
(
  SELECT id,
  property_name,
  property_type_category,
  review_score, 
  amenity_category.name, 
  count(*) AS cnt FROM table_name rp, 
  jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
  WHERE rp.parent_path ? '203' AND number_of_review >= 1
  GROUP BY amenity_category.name,id 
),
CTE2 as
(
  SELECT id, property_name,property_type_category,name,
  ROW_NUMBER() OVER (PARTITION BY property_type_category,
  name ORDER BY review_score DESC),
  COUNT(id) OVER (PARTITION BY property_type_category,
  name ORDER BY name DESC) 
  FROM CTE
)

SELECT id, property_name, property_type_category, name, COUNT 
FROM CTE2
where row_number = 1

Current Output:

enter image description here

So my basic question is is there any other way I can re-write this query or optimize the current query?

question from:https://stackoverflow.com/questions/65540770/postgresql-optimize-query-performance-that-contains-window-function-with-cte

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If it's safe to assume that array elements in amenity_categories are distinct (no duplicate array elements), we can radically simplify to:

SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;

If review_score can be NULL, make that:

...
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;

This works, because DISTINCT ON is applied as last step (after window functions). See:

parent_path and number_of_review should probably be indexed. Depends on data distribution and selectivity of the WHERE conditions, which you didn't disclose.

About DISTINCT ON:

Assuming id is NOT NULL, count(*) is faster and equivalent to count(id).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...