I'm undecided whether it's better, performance-wise, to use a very commonly shared column value (like Country) as the partition key for a compound primary key, or a rather unique column value (like Last_Name).
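To make the two options concrete, here is a minimal Python sketch (not Cassandra code) of what each choice means for partition sizes. Everything in it is an assumption for illustration: the rows are made up, `NUM_NODES` is a hypothetical cluster size, and `node_for` is a toy stand-in that hashes a key with MD5, not Cassandra's actual partitioner. It only counts how many rows share a partition key; it does not measure query performance.

```python
import hashlib
from collections import Counter

NUM_NODES = 4  # hypothetical cluster size, chosen only for illustration


def node_for(partition_key: str) -> int:
    """Toy placement: hash the partition key and map it onto a node."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


# Made-up rows of (country, last_name): 900 users in one country, 100 in another.
rows = [("US", f"Name{i}") for i in range(900)]
rows += [("DE", f"Name{i}") for i in range(900, 1000)]


def rows_per_partition(key_index: int) -> Counter:
    """Count how many rows fall into each partition for the chosen key column."""
    return Counter(row[key_index] for row in rows)


def rows_per_node(key_index: int) -> Counter:
    """Count how many rows land on each toy node for the chosen key column."""
    return Counter(node_for(row[key_index]) for row in rows)


by_country = rows_per_partition(0)
by_last_name = rows_per_partition(1)

# Number of partitions and size of the largest partition for each choice.
print("Country:  ", len(by_country), "partitions, largest =", max(by_country.values()))
print("Last_Name:", len(by_last_name), "partitions, largest =", max(by_last_name.values()))

# How the rows spread over the toy nodes for each choice.
print("Country per node:  ", dict(rows_per_node(0)))
print("Last_Name per node:", dict(rows_per_node(1)))
```

With this made-up data, Country yields 2 partitions (the largest holding 900 rows) that can occupy at most 2 of the toy nodes, while Last_Name yields 1000 partitions of 1 row each. Which of the two is preferable still depends on the queries the table has to serve, which the sketch does not model.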
Looking at Cassandra 1.2's documentation about indexes I get this:
"When to use an index:
Cassandra's built-in indexes are best on a table
having many rows that contain the indexed value. The more unique
values that exist in a particular column, the more overhead you will
have, on average, to query and maintain the index. For example,
suppose you had a user table with a billion users and wanted to look
up users by the state they lived in. Many users will share the same
column value for state (such as CA, NY, TX, etc.). This would be a
good candidate for an index."
"When not to use an index:
Do not use an index to query a huge volume of records for a small
number of results. For example, if you create an index on a column
that has many distinct values, a query between the fields will incur
many seeks for very few results. In the table with a billion users,
looking up users by their email address (a value that is typically
unique for each user) instead of by their state, is likely to be very
inefficient. It would probably be more efficient to manually maintain
the table as a form of an index instead of using the Cassandra
built-in index. For columns containing unique data, it is sometimes
fine performance-wise to use an index for convenience, as long as the
query volume to the table having an indexed column is moderate and not
under constant load."
Looking at the example in the CQL SELECT documentation under "Querying compound primary keys and sorting results", I see something like a UUID being used as the partition key, which would suggest that it's preferable to use something rather unique?