Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

oop - How best to store data from a CSV in a Java class: a single list of Row objects, or a single object with nested objects?

Date,Locality,District,New Cases,Hospitalizations,Deaths
5/21/2020,Accomack,Eastern Shore,709,40,11
5/21/2020,Albemarle,Thomas Jefferson,142,19,4
5/21/2020,Alleghany,Alleghany,9,4,0
5/21/2020,Amelia,Piedmont,22,7,1
5/21/2020,Amherst,Central Virginia,25,3,0
5/21/2020,Appomattox,Central Virginia,25,1,0
5/21/2020,Arlington,Arlington,1763,346,89
... // skipped down to the next day
5/20/2020,Accomack,Eastern Shore,709,39,11
5/20/2020,Albemarle,Thomas Jefferson,142,18,4
5/20/2020,Alleghany,Alleghany,10,4,0
5/20/2020,Amelia,Piedmont,21,7,1
5/20/2020,Amherst,Central Virginia,25,3,0
5/20/2020,Appomattox,Central Virginia,24,1,0
5/20/2020,Arlington,Arlington,1728,334,81
5/20/2020,Augusta,Central Shenandoah,88,4,1
... // continued

I have data for a US state like the above in a CSV file and would like to do some data analysis on it so that I can serve the results through a REST API. The analysis consists of various aggregations, such as: total cases across the state by date, total cases for the entire state, total cases grouped by district, total cases for a district by date, total cases for a county by date, etc. In short, all the basic group-bys that one could do with this data.

Now, my problem is figuring out how to properly store this data in Java, without a database. I have one successful implementation using a list of Row objects, where each Row object holds exactly one row of the CSV. Using Java's Stream API, I have been able to filter the list and compute some of these statistics. I then package the statistics into a single Row object or a List<Row> and send it to the API to be serialized as JSON. This has worked OK, but I feel that it is not the best way.
Is there some other, more object-oriented way to make use of the Date, District, County, and Cases columns?

I was thinking of doing something like this:

class State {
     List<District> districtList;
     String name;
}

class District {
     List<County> countyList;
     String name;
}

class County {
     LocalDate date;
     String name;
     int cases;
     // more stuff
}

Then I would create one State object with a list of District objects, each with a list of many County objects, one per date.

Does this seem like overkill? Is there some other clean way to read this dataset into a data structure that allows for easily aggregating summary information?

My current approach works, but I am looking for a better way!


1 Reply


From your description, your approach seems sound and properly object-oriented. However, without additional information (e.g. specific aggregations that may dictate otherwise), it seems odd to have multiple "duplicate" County objects in each District object, one per date. For example:

[{"date":"5/21/2020","name":"Accomack"},
 {"date":"5/20/2020","name":"Accomack"}]

From an object-oriented view, it seems you'd want an additional level of aggregation, by "Date" (with each date containing a list of 'County' rows).
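As a rough illustration of that extra date level, here is a minimal sketch. It assumes Java 16+ (for records); the names CountyReport, DailyReports, and byDate are hypothetical and not taken from the question. It only shows one way the per-date grouping could be built from already parsed rows.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical type: one county's figures on one day.
record CountyReport(LocalDate date, String county, String district, int cases) {}

class DailyReports {
    // Matches dates such as 5/21/2020 in the sample data.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/uuuu");

    // Groups the reports by date; the TreeMap keeps dates in chronological order.
    static Map<LocalDate, List<CountyReport>> byDate(List<CountyReport> reports) {
        return reports.stream()
                .collect(Collectors.groupingBy(CountyReport::date, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CountyReport> reports = List.of(
                new CountyReport(LocalDate.parse("5/21/2020", FORMAT), "Accomack", "Eastern Shore", 709),
                new CountyReport(LocalDate.parse("5/20/2020", FORMAT), "Accomack", "Eastern Shore", 709),
                new CountyReport(LocalDate.parse("5/21/2020", FORMAT), "Albemarle", "Thomas Jefferson", 142));

        // Each date now owns its list of county rows, so no county appears twice under one key.
        byDate(reports).forEach((date, rows) ->
                System.out.println(date + " has " + rows.size() + " county row(s)"));
    }
}
```

With this shape, each date holds one entry per county, so a county no longer shows up several times inside the same container.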

One consideration: if your aggregations align better with a database-style approach, I would keep each row from the source data as is and query the list directly, filtering and sorting with Stream lambdas.
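A minimal sketch of that flat, query-the-rows style might look like the following. It assumes Java 16+ and a CSV with no quoted or embedded commas; the names Row.parse, RowQueries, casesByDate, and casesForDistrictOn are hypothetical. The totals simply add up the New Cases column as given in the file.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical type mirroring one CSV line.
record Row(LocalDate date, String locality, String district,
           int newCases, int hospitalizations, int deaths) {

    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/uuuu");

    // Naive comma split; assumes no field contains a quoted comma.
    static Row parse(String line) {
        String[] f = line.split(",");
        return new Row(LocalDate.parse(f[0], FORMAT), f[1], f[2],
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5]));
    }
}

class RowQueries {
    // Statewide total of the New Cases column per date.
    static Map<LocalDate, Integer> casesByDate(List<Row> rows) {
        return rows.stream()
                .collect(Collectors.groupingBy(Row::date, Collectors.summingInt(Row::newCases)));
    }

    // Total of the New Cases column for one district on one date.
    static int casesForDistrictOn(List<Row> rows, String district, LocalDate date) {
        return rows.stream()
                .filter(r -> r.district().equals(district) && r.date().equals(date))
                .mapToInt(Row::newCases)
                .sum();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                Row.parse("5/21/2020,Amherst,Central Virginia,25,3,0"),
                Row.parse("5/21/2020,Appomattox,Central Virginia,25,1,0"),
                Row.parse("5/20/2020,Appomattox,Central Virginia,24,1,0"));

        System.out.println(casesByDate(rows));
        System.out.println(casesForDistrictOn(rows, "Central Virginia", LocalDate.of(2020, 5, 21)));
    }
}
```

Each new aggregation becomes one more small method over the same list, which is the main appeal of keeping the rows flat.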

