I was looking to keep only those rows that have data for both 2015 and 2016 in the following dataset (World Happiness Report):
Initially the data for both years was separate in two different datasets, but I have put them together since I want to use the data against another dataset with information for those same two years.
I came up with this solution:
happiness_final_list = []
# Iterate over the 'country' index elements (one per row)
for i, element in enumerate(happiness_gby_1516.index.get_level_values(0)):
    # Keep the row only if its country has entries for both years
    if len(happiness_gby_1516.loc[element].index.get_level_values(0)) == 2:
        happiness_final_list.append(happiness_gby_1516.iloc[[i]])
happiness_final = pd.concat(happiness_final_list)
happiness_final.head()
I was wondering whether there is a simpler, more straightforward way, in line with common pandas usage, to get the number of entries in the second index level. For example, can I look up directly how many second-level entries belong to each first-level value (i.e. for each country, how many years)? Or does it have to be done with combined booleans or some other convoluted expression, as in my solution?
I now have a dataset of 302 rows, down from the original 315, and the data keeps the desired multi-index structure, so it looks like something has indeed been filtered out.
As I said, is there a more straightforward way that pandas provides to handle this? Or even just a better one for some other reason? I have a feeling there should be.
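To make clearer what I am after, here is a minimal sketch on a small made-up frame. The countries, the `score` column and its values are hypothetical stand-ins, not my real data. It counts the years per country with `groupby(level=...)` and keeps only the countries that have two entries. I am not sure this is the idiomatic approach, which is partly what I am asking.

```python
import pandas as pd

# Hypothetical toy data with a (country, year) MultiIndex, standing in for the real frame
index = pd.MultiIndex.from_tuples(
    [
        ("Denmark", 2015), ("Denmark", 2016),
        ("Norway", 2015),                      # only one year, should be dropped
        ("Iceland", 2015), ("Iceland", 2016),
    ],
    names=["country", "year"],
)
toy = pd.DataFrame({"score": [7.5, 7.6, 7.4, 7.3, 7.2]}, index=index)

# Number of second-level entries (years) for each first-level value (country)
years_per_country = toy.groupby(level="country").size()

# Keep only the rows of countries that have exactly two entries
both_years = toy.groupby(level="country").filter(lambda g: len(g) == 2)
```

Here `years_per_country` is a Series indexed by country, and `both_years` keeps the same multi-index structure as the input.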
Thank you.