I want to find the columns that have not been updated for more than a specific time period.
To do that, I want to run a scan against the columns with a time range.
The normal behaviour of HBase is that you then get the latest value in that time range (which is not what I want).
As far as I understand, HBase should work like this: if you set the maximum number of versions for a column family to 1, it retains only the last value that was put into the cell.
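To make that expectation concrete, here is a small Python sketch of the semantics I assumed. It is a toy in-memory model, not HBase code: the `ExpectedCell` class and its methods are hypothetical names, and the immediate discarding of old versions on write is my assumption. The only HBase detail it relies on is that a time range is half-open, [min, max).

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (timestamp, value)


class ExpectedCell:
    """Toy model of one cell as I expected VERSIONS => 1 to behave (hypothetical, not HBase code)."""

    def __init__(self, max_versions: int = 1) -> None:
        self.max_versions = max_versions
        self.versions: List[Version] = []  # kept sorted newest first

    def put(self, value: str, timestamp: int) -> None:
        # Assumption: versions beyond max_versions are discarded immediately on write.
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, min_ts: int = 0, max_ts: Optional[int] = None) -> Optional[Version]:
        # The time range is half-open: min_ts <= timestamp < max_ts.
        for ts, value in self.versions:
            if ts >= min_ts and (max_ts is None or ts < max_ts):
                return (ts, value)
        return None


cell = ExpectedCell(max_versions=1)
cell.put("One", 1000)
cell.put("Two", 2000)
cell.put("Three", 3000)

latest = cell.get()           # the single retained version
in_range = cell.get(0, 1500)  # nothing retained falls inside [0, 1500)
print(latest, in_range)
```

Under this model, a time-range read only returns something when the latest write to the column falls inside the range, which is exactly the "not updated since" check I am after.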
What I found is different.
If I run the following commands in the HBase shell
create 't1', {NAME => 'c1', VERSIONS => 1}
put 't1', 'r1', 'c1', 'One', 1000
put 't1', 'r1', 'c1', 'Two', 2000
put 't1', 'r1', 'c1', 'Three', 3000
get 't1', 'r1'
get 't1', 'r1' , {TIMERANGE => [0,1500]}
the result is this:
get 't1', 'r1'
COLUMN CELL
c1: timestamp=3000, value=Three
1 row(s) in 0.0780 seconds
get 't1', 'r1' , {TIMERANGE => [0,1500]}
COLUMN CELL
c1: timestamp=1000, value=One
1 row(s) in 0.1390 seconds
Why does the second query return a value even though I've set the maximum number of versions to 1?
The HBase version I currently have installed is HBase 0.94.6-cdh4.4.0.
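Put differently, the output looks as if all three cells are still stored and the version limit is applied only after the time range has filtered them. The Python sketch below is a toy model of that ordering, purely to illustrate the discrepancy. I do not know whether this is what HBase does internally, and `ObservedCell` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, str]  # (timestamp, value)


class ObservedCell:
    """Toy model that reproduces the shell output above (hypothetical, not HBase code)."""

    def __init__(self, max_versions: int = 1) -> None:
        self.max_versions = max_versions
        self.stored: List[Version] = []

    def put(self, value: str, timestamp: int) -> None:
        # Assumption: nothing is discarded at write time; every put stays stored.
        self.stored.append((timestamp, value))

    def get(self, min_ts: int = 0, max_ts: Optional[int] = None) -> List[Version]:
        # Step 1: keep only versions inside the half-open range [min_ts, max_ts).
        in_range = [
            (ts, value)
            for ts, value in self.stored
            if ts >= min_ts and (max_ts is None or ts < max_ts)
        ]
        # Step 2: only now apply the version limit, newest first.
        in_range.sort(key=lambda v: v[0], reverse=True)
        return in_range[: self.max_versions]


cell = ObservedCell(max_versions=1)
cell.put("One", 1000)
cell.put("Two", 2000)
cell.put("Three", 3000)

plain_get = cell.get()          # newest of all stored versions
ranged_get = cell.get(0, 1500)  # newest of the versions inside the range
print(plain_get, ranged_get)
```

With this ordering, the plain get returns Three at timestamp 3000 and the ranged get returns One at timestamp 1000, which matches what I see in the shell.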