Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
495 views
in Technique[技术] by (71.8m points)

azure cosmosdb - Does STARTSWITH on Cosmos partition keys optimize "fan-out" of cross-partition queries?

Microsoft makes it clear that cross-partition queries "fan-out" the query to each partition (link):

The following query doesn't have a filter on the partition key (DeviceId). Therefore, it must fan-out to all physical partitions where it is run against each partition's index:

So I am curious if that "fan-out" can be optimized by doing a range query on a partition key, such as STARTSWITH.

To test it, I created a small Cosmos DB with seven documents:

{
    "partitionKey": "prefix1:",
    "id": "item1a"
},
{
    "partitionKey": "prefix1:",
    "id": "item1b"
},
{
    "partitionKey": "prefix1:",
    "id": "item1c"
},
{
    "partitionKey": "prefix1X:",
    "id": "item1d"
},
{
    "partitionKey": "prefix2:",
    "id": "item2a"
},
{
    "partitionKey": "prefix2:",
    "id": "item2b"
},
{
    "partitionKey": "prefix3:",
    "id": "item3a"
}

It has the default indexing policy with partition key "/partitionKey". Then I ran a bunch of queries:

SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix1')
-- Actual Request Charge: 2.92 RUs
SELECT * FROM c WHERE c.partitionKey = 'prefix1:' OR c.partitionKey = 'prefix1X:'
-- Actual Request Charge: 3.02 RUs
SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix1:')
SELECT * FROM c WHERE c.partitionKey = 'prefix1:'
-- Each Query Has Actual Request Charge: 2.89 RUs
SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix2')
SELECT * FROM c WHERE c.partitionKey = 'prefix2:'
-- Each Query Has Actual Request Charge: 2.86 RUs
SELECT * FROM c WHERE STARTSWITH(c.partitionKey, 'prefix3')
SELECT * FROM c WHERE c.partitionKey = 'prefix3:'
-- Each Query Has Actual Request Charge: 2.83 RUs
SELECT * FROM c WHERE c.partitionKey = 'prefix2:' OR c.partitionKey = 'prefix3:'
-- Actual Request Charge: 2.99 RUs

The request charges were consistent when re-running the queries. And the pattern of charge growth seemed consistent with the result set and query complexity, with exception of maybe the 'OR' queries. However, then I tried this:

SELECT * FROM c
-- Actual Request Charge: 2.35 RUs

And the basic fan-out to all partitions is even faster than targeting a specific partition, even with an equality operator. I don't understand how this can be.

All this being said, my sample database is extremely small with only seven documents. The query set is probably not big enough to trust the results.

So, if I had millions of documents, would STARTSWITH(c.partitionKey, 'prefix') be more optimized than fanning out to all partitions?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

As you scale, you get fewer "logical partitions" per "physical partition", until eventually each partition key value has its own physical partition.

So:

if I had millions of documents, would STARTSWITH(c.partitionKey, 'prefix') be more optimized than fanning out to all partitions?

Both queries would fan-out across multiple partitions.

And I'm pretty sure that since "Azure Cosmos DB uses hash-based partitioning to spread logical partitions across physical partitions", there's no locality between partition keys with a common prefix, and each STARTSWITH query will have to fan-out across all the physical partitions.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...