If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:
nodetool --host <hostname> cfstats
This will dump out a list for each column family looking like this:
Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Space used (total): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
Memtable Data Size: 150297312
Memtable Switch Count: 434
Read Count: 9716802
Read Latency: 0.036 ms.
Write Count: 9716806
Write Latency: 0.024 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10428
Bloom Filter False Ratio: 1.00000
Bloom Filter Space Used: 18216448
Compacted row minimum size: 771
Compacted row maximum size: 263210
Compacted row mean size: 1634
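The "Number of Keys (estimate)" row is a good approximation of the key count, and reading it is much faster than explicit count approaches. Keep in mind that cfstats reports figures for the node you query, so on a multi-node cluster you need to collect the estimate from each node and account for replication to get a cluster-wide number.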
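If you want to pull that number out programmatically, here is a minimal sketch in Python. It assumes the output format shown above (a "Column Family:" line followed by its statistics). The function name parse_key_estimates and the SAMPLE text are my own and are not part of nodetool.

```python
import re


def parse_key_estimates(cfstats_output):
    """Map each column family name to its 'Number of Keys (estimate)' value.

    Hypothetical helper: it assumes the line layout shown in the output above.
    """
    estimates = {}
    current_cf = None
    for raw_line in cfstats_output.splitlines():
        line = raw_line.strip()
        # A "Column Family:" line starts a new block of statistics.
        cf_match = re.match(r"Column Family:\s*(\S+)", line)
        if cf_match:
            current_cf = cf_match.group(1)
            continue
        # Record the key estimate for the block we are currently in.
        key_match = re.match(r"Number of Keys \(estimate\):\s*(\d+)", line)
        if key_match and current_cf is not None:
            estimates[current_cf] = int(key_match.group(1))
    return estimates


# Shortened copy of the output above, used only for illustration.
SAMPLE = """\
Column Family: widgets
SSTable count: 11
Number of Keys (estimate): 9709824
Read Count: 9716802
"""

print(parse_key_estimates(SAMPLE))
```

To use it against a live node, you could capture the command's output with subprocess.run(["nodetool", "--host", hostname, "cfstats"], capture_output=True, text=True).stdout and pass that string to the function.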