So databases are for situations where you have a large, complicated schema that is constantly changing. You only have one "table" with a handful of simple numeric fields. I would do it this way:
Prepare a C/C++ struct to hold the record format:
struct StockPrice
{
    char ticker_code[2];   // widen if your symbols need it
    double stock_price;
    struct timespec when;  // from <time.h>
    // ... any other fixed-size fields
};
Then calculate N * sizeof(StockPrice), where N is the number of records. It should only be a few hundred gigabytes, and will fit on a $50 HDD. (You'll want a 64-bit system so the whole thing can be mapped into the address space.)
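As a quick sanity check on that arithmetic (the record count below is a made-up assumption; with the struct above, alignment padding brings each record to 32 bytes on x86-64):

// sizing check -- the record count is a hypothetical example
#include <stdio.h>
#include <time.h>

struct StockPrice { char ticker_code[2]; double stock_price; struct timespec when; };

int main(void)
{
    size_t n = 10000000000ULL; // assume 10 billion records
    printf("%zu bytes per record\n", sizeof(struct StockPrice));        // 32 on x86-64
    printf("%zu GiB total\n", (n * sizeof(struct StockPrice)) >> 30);   // ~298 GiB
    return 0;
}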
Then truncate a file to that size and mmap it into memory (on Linux; on Windows use CreateFileMapping/MapViewOfFile):
// pseudo-code (Linux; error checking omitted)
// note: the fd must be opened O_RDWR for a writable MAP_SHARED mapping
file = open("my.data", O_RDWR | O_CREAT, 0644);
ftruncate(file, N * sizeof(StockPrice));
void* p = mmap(NULL, N * sizeof(StockPrice), PROT_READ | PROT_WRITE, MAP_SHARED, file, 0);
Cast the mmapped pointer to StockPrice* and make a pass over your data, filling out the array. Then munmap and close the file, and you will have your data in one big binary array in a file that can be mmapped again later.
StockPrice* stocks = (StockPrice*) p;
for (size_t i = 0; i < N; i++)
{
    stocks[i] = ParseNextStock(stock_indata_file);
}
munmap(p, N * sizeof(StockPrice));
close(file);
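Putting the write path together, a self-contained sketch might look like this. ParseNextStock above is a placeholder, so the loop below just fills in dummy values; error handling is kept minimal:

// write_path.c -- compilable sketch of the steps above (Linux)
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

struct StockPrice
{
    char ticker_code[2];
    double stock_price;
    struct timespec when;
};

int main(void)
{
    size_t n = 1000; // assumed record count for the sketch
    size_t bytes = n * sizeof(struct StockPrice);

    int fd = open("my.data", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, bytes) != 0) { perror("open/ftruncate"); return 1; }

    struct StockPrice* stocks = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (stocks == MAP_FAILED) { perror("mmap"); return 1; }

    for (size_t i = 0; i < n; i++)
    {
        memcpy(stocks[i].ticker_code, "AA", 2);       // stand-in for real parsing
        stocks[i].stock_price = 100.0 + (double) i;
        clock_gettime(CLOCK_REALTIME, &stocks[i].when);
    }

    munmap(stocks, bytes);
    close(fd);
    return 0;
}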
You can now mmap it again read-only from any program and your data will be readily available:
file = open("my.data", READ_ONLY);
StockPrice* stocks = (StockPrice*) mmap(file, READ_ONLY);
// do stuff with stocks;
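One detail that snippet glosses over: the reader doesn't have to know N in advance, because it can recover the record count from the file size (a sketch, assuming the same struct):

// recover N from the file itself instead of hard-coding it
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int fd = open("my.data", O_RDONLY);
struct stat st;
fstat(fd, &st);
size_t N = st.st_size / sizeof(struct StockPrice);
StockPrice* stocks = (StockPrice*) mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);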
So now you can treat it just like an in-memory array of structs. You can build various kinds of index data structures on top of it, depending on what your "queries" are. The kernel will page the data to/from disk transparently, so it will be insanely fast.
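For example (a sketch, not part of the original recipe), an index for ticker lookups can be just an array of positions sorted by ticker_code, built once at startup and binary-searched afterwards. The global is only there because qsort's comparator takes no context argument (qsort_r avoids it on glibc):

// hypothetical ticker index: positions into stocks[], sorted by ticker_code
#include <stdlib.h>
#include <string.h>

static const struct StockPrice* g_stocks; // comparator context

static int by_ticker(const void* a, const void* b)
{
    size_t ia = *(const size_t*) a, ib = *(const size_t*) b;
    return memcmp(g_stocks[ia].ticker_code, g_stocks[ib].ticker_code, 2);
}

// returns a malloc'd array of n indices, sorted by ticker
size_t* build_ticker_index(const struct StockPrice* stocks, size_t n)
{
    size_t* idx = malloc(n * sizeof *idx);
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    g_stocks = stocks;
    qsort(idx, n, sizeof *idx, by_ticker);
    return idx;
}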
If you expect a certain access pattern (for example, scanning contiguous date ranges), it is best to sort the array in that order so reads hit the disk sequentially.
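On Linux you can also hint that pattern to the kernel with madvise (an optional tweak, not part of the original recipe):

// optional: tell the kernel we'll scan the mapping sequentially,
// so it can read ahead more aggressively
#include <sys/mman.h>
madvise(p, N * sizeof(StockPrice), MADV_SEQUENTIAL);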