Date,Locality,District,New Cases,Hospitalizations,Deaths
5/21/2020,Accomack,Eastern Shore,709,40,11
5/21/2020,Albemarle,Thomas Jefferson,142,19,4
5/21/2020,Alleghany,Alleghany,9,4,0
5/21/2020,Amelia,Piedmont,22,7,1
5/21/2020,Amherst,Central Virginia,25,3,0
5/21/2020,Appomattox,Central Virginia,25,1,0
5/21/2020,Arlington,Arlington,1763,346,89
... // skipped down to the next day
5/20/2020,Accomack,Eastern Shore,709,39,11
5/20/2020,Albemarle,Thomas Jefferson,142,18,4
5/20/2020,Alleghany,Alleghany,10,4,0
5/20/2020,Amelia,Piedmont,21,7,1
5/20/2020,Amherst,Central Virginia,25,3,0
5/20/2020,Appomattox,Central Virginia,24,1,0
5/20/2020,Arlington,Arlington,1728,334,81
5/20/2020,Augusta,Central Shenandoah,88,4,1
... // continued
I have data for a US state, like the above, in a CSV file, and I would like to run some analysis on it so that I can serve the results through a REST API. The analysis consists of various aggregations, such as: total cases across the state by date, total cases for the entire state, total cases grouped by district, total cases for a district by date, total cases for a county by date, and so on. Basically all the group-bys that one could do with this data.
Now, my problem is figuring out how to properly store this data in Java without a database. I have one working implementation that uses a list of Row objects, where each Row object holds a single row of the CSV. Using Java's Stream API, I have been able to filter the list and compute some of these statistics. I then package the statistics into a single Row object or a List<Row> and send it to the API to be converted to JSON. This has worked OK, but I feel that it is not the best way.
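For reference, here is a minimal sketch of what the flat approach looks like. The field names, the `CaseStats` helper class, and the `M/d/yyyy` date pattern are assumptions based on the CSV header, not my exact code, and the parser assumes no quoted fields or embedded commas.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// One CSV line. Field names mirror the header and are otherwise assumptions.
final class Row {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    final LocalDate date;
    final String locality;
    final String district;
    final int cases;
    final int hospitalizations;
    final int deaths;

    Row(LocalDate date, String locality, String district,
        int cases, int hospitalizations, int deaths) {
        this.date = date;
        this.locality = locality;
        this.district = district;
        this.cases = cases;
        this.hospitalizations = hospitalizations;
        this.deaths = deaths;
    }

    // Parses a line such as "5/21/2020,Accomack,Eastern Shore,709,40,11".
    static Row parse(String line) {
        String[] f = line.split(",");
        return new Row(LocalDate.parse(f[0], FMT), f[1], f[2],
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5]));
    }
}

// Hypothetical helper holding the stream-based aggregations.
final class CaseStats {
    // Sum of the cases column per date, sorted by date.
    static Map<LocalDate, Integer> casesByDate(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.date, TreeMap::new, Collectors.summingInt(r -> r.cases)));
    }

    // Same aggregation, restricted to a single district.
    static Map<LocalDate, Integer> casesForDistrictByDate(List<Row> rows, String district) {
        List<Row> filtered = rows.stream()
                .filter(r -> r.district.equals(district))
                .collect(Collectors.toList());
        return casesByDate(filtered);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                Row.parse("5/21/2020,Accomack,Eastern Shore,709,40,11"),
                Row.parse("5/21/2020,Amherst,Central Virginia,25,3,0"),
                Row.parse("5/21/2020,Appomattox,Central Virginia,25,1,0"),
                Row.parse("5/20/2020,Appomattox,Central Virginia,24,1,0"));
        // prints {2020-05-20=24, 2020-05-21=759}
        System.out.println(casesByDate(rows));
        // prints {2020-05-20=24, 2020-05-21=50}
        System.out.println(casesForDistrictByDate(rows, "Central Virginia"));
    }
}
```

Every other group-by follows the same pattern: change the classifier passed to `Collectors.groupingBy` and, if needed, add a `filter` in front of it.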
Is there a more object-oriented way to make use of the Date, District, County, and Cases columns?
I was thinking of doing something like this:
class State {
    List<District> districtList;
    String name;
}

class District {
    List<County> countyList;
    String name;
}

class County {
    LocalDate date;
    String name;
    int cases;
    // more stuff
}
Then I would create one State object with a list of District objects, each with a list of many County objects, one per date.
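To make the trade-off concrete, here is a rough sketch of how a by-date total might be computed over that hierarchy. The constructors, the `casesByDate` method, and the placeholder state name are assumptions added for illustration; only the fields come from the classes above.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// One county on one date, as in the proposed design.
final class County {
    final LocalDate date;
    final String name;
    final int cases;

    County(LocalDate date, String name, int cases) {
        this.date = date;
        this.name = name;
        this.cases = cases;
    }
}

final class District {
    final String name;
    final List<County> countyList;

    District(String name, List<County> countyList) {
        this.name = name;
        this.countyList = countyList;
    }
}

final class State {
    final String name;
    final List<District> districtList;

    State(String name, List<District> districtList) {
        this.name = name;
        this.districtList = districtList;
    }

    // The hierarchy has to be flattened back into a stream of County
    // entries before it can be grouped by date.
    Map<LocalDate, Integer> casesByDate() {
        return districtList.stream()
                .flatMap(d -> d.countyList.stream())
                .collect(Collectors.groupingBy(
                        c -> c.date, TreeMap::new, Collectors.summingInt(c -> c.cases)));
    }

    public static void main(String[] args) {
        LocalDate may20 = LocalDate.of(2020, 5, 20);
        LocalDate may21 = LocalDate.of(2020, 5, 21);
        State state = new State("Example", Arrays.asList(
                new District("Central Virginia", Arrays.asList(
                        new County(may21, "Amherst", 25),
                        new County(may21, "Appomattox", 25),
                        new County(may20, "Appomattox", 24))),
                new District("Eastern Shore", Arrays.asList(
                        new County(may21, "Accomack", 709)))));
        // prints {2020-05-20=24, 2020-05-21=759}
        System.out.println(state.casesByDate());
    }
}
```

Grouping by district is a direct walk over `districtList`, but any grouping that cuts across districts, such as by date, first needs the `flatMap` step.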
Does this seem like overkill? Is there some other clean way to read this dataset into a data structure that makes it easy to aggregate summary information?
My current approach works, but I am looking for a better way!