Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Producing output from the information in a data column (character operations, sums, etc.)

I have a problem with the question "What are the top 5 countries in research (considering the Countries_Unique_Count attribute)?"

The question asks for the 5 countries that occur most often in the column with the header Countries_Unique_Count, together with the total number of times each one occurs.

For example (using a small portion of the column):

{'United States': 1}
{'Brazil': 1}
{'Sweden': 1}
{'United States': 1}
{'Brazil': 1}
{'USA': 1}
{'Tunisia': 1}
{'Brazil': 1}
{'Germany': 1}
{'Japan': 1, 'Canada': 1, 'Germany': 1, 'Italy': 1, 'Iran': 1}
{'Brazil': 1}
{'United States': 1}
{'Tunisia': 1}
{'Brazil': 1}
{'Tunisia': 1}
{'United States': 1}
{'Germany': 1}

Expected output:

Brazil -> 5
United States -> 4
Tunisia -> 3
Germany -> 2
Sweden -> 1

Here is my code to read that column in R.
NOTE: the Excel file data2.xlsx is not important in itself; I only used it to extract the data.

library(readxl)
my_data2 <- read_excel("data2.xlsx")
a <- my_data2$Countries_Unique_Count
print(a)

Here is the column output in R. I am only sharing part of it because of the character limit on posts.

   [1] "{'United States': 1}"
   [2] "{'Sweden': 1}"
   [3] "{'USA': 1}"
   [4] "{'Brazil': 1}"
   [5] "{'Germany': 1, 'United States': 1}"
   [6] "{'Japan': 1, 'Canada': 1, 'Germany': 1, 'Italy': 1, 'Iran': 1}"
   [7] "{'Austria': 1}"
   [8] "{'Poland': 1}"
   [9] "{'Tunisia': 1}"
  [10] "{'United States': 1}"
  [11] "{'Germany': 1}"
  [12] "{'Spain': 1, 'Norway': 1}"
  [13] "{'United States': 1}"
  [14] "{'Kuwait': 1}"
  [15] "{'United States': 1}"
  [16] "{'India': 1}"
  [17] "{'Belgium': 1}"
  [18] "{'SungKyunKwan Univ.': 1, 'Sejong Univ.': 1, 'Kwangwoon Univ.': 1, 'Chungcheong College': 1}"
  [19] "{'R.L.': 1}"
  [20] "{'United Kingdom': 1, 'Germany': 1, 'Belgium': 1}"
  [21] "{'Spain': 1, 'Austria': 1}"
  [22] "{'France': 1}"
  [23] "{'Thailand': 1}"
  [24] "{'Australia': 1, 'India': 1}"
  [25] "{'China': 1}"
  [26] "{'Germany': 1}"
  [27] "{'Ireland': 1}"
  [28] "{'Sweden': 1}"
  [29] "{'Myanmar': 1}"
  [30] "{'United States': 1}"
  [31] "{'Japan': 1}"
  [32] "{'China': 1}"
  [33] "{'Canada': 1}"
  [34] "{'Canada': 1, 'Ireland': 1, 'Poland': 1}"
  [35] "{'Germany': 1, 'Spain': 1, 'Finland': 1}"
  [36] "{'United Arab Emirates': 1, 'Canada': 1}"
  [37] "{'Germany': 1}"
  [38] "{'United Kingdom': 1, 'United States': 1}"
  [39] "{'Egypt': 1}"


1 Reply

This data is in a JSON-like format, but it uses single quotes where JSON requires double quotes. To make it parseable, first replace ' with " as follows:

data2 <- gsub("'", '"', data)
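As a quick check of what this substitution does, the sketch below applies it to two entries copied from the question. One caveat worth keeping in mind: a blanket replacement assumes that no key contains an apostrophe of its own (a hypothetical entry such as `{'Cote d'Ivoire': 1}` would no longer be valid JSON afterwards). The variable names here are illustrative only.

```r
# Two entries copied from the question's column
sample_entries <- c("{'United States': 1}", "{'Spain': 1, 'Norway': 1}")

# fixed = TRUE treats the pattern as a literal character rather than a regex
converted <- gsub("'", '"', sample_entries, fixed = TRUE)

# cat() shows the raw strings without R's escaping of the double quotes
cat(converted, sep = "\n")
```

Using `cat()` rather than `print()` makes it easier to see that every single quote has become a plain double quote.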

Then we can read it with:

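library(jsonlite)

# Parse each string into a named list, then stack all country/count pairs
# into a one-column matrix whose row names are the country names
out <- do.call(rbind, lapply(data2, function(x)
             do.call(rbind, fromJSON(x))))

The first rows of out are:
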
                    [,1]
United States          1
Sweden                 1
USA                    1
Brazil                 1
Germany                1
United States          1
Japan                  1
Canada                 1
Germany                1
Italy                  1
Iran                   1
Austria                1
Poland                 1
Tunisia                1
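For readers who would rather not depend on jsonlite, the same name/count pairs could probably also be pulled out with a regular expression in base R. This is only a sketch of an alternative, not the approach used in this answer; `entries`, `pair_pattern`, and `parsed` are illustrative names, and the pattern assumes that country names never contain a single quote.

```r
# Sample entries copied from the question's column
entries <- c("{'United States': 1}",
             "{'Germany': 1, 'United States': 1}",
             "{'Spain': 1, 'Norway': 1}")

# Match each 'name': count pair (assumes names contain no single quotes)
pair_pattern <- "'([^']+)':\\s*([0-9]+)"
pairs <- unlist(regmatches(entries, gregexpr(pair_pattern, entries, perl = TRUE)))

# Split every matched pair into its name and its numeric count
country <- sub(pair_pattern, "\\1", pairs, perl = TRUE)
count   <- as.numeric(sub(pair_pattern, "\\2", pairs, perl = TRUE))

parsed <- data.frame(country = country, count = count,
                     stringsAsFactors = FALSE)
print(parsed)
```

The result is a long-format data frame with one row per country mention, which can be summed per country in the same way as the matrix above.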

From this point on, it is only a matter of arranging the data, and you can use whatever tools you like. I prefer base R as usual:

out <- rowsum(out[,1], row.names(out))

Finally,

head(out[order(-out[,1]),],5)

which, for the sample data below, gives:

United States       Germany       Belgium       Austria        Brazil 
            5             4             2             1             1 
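Note that this tally keeps "USA" and "United States" as separate keys, because they are different strings in the column. If they are meant to count as the same country (an assumption, since the question does not say so), one option is to apply a small alias table before summing. The sketch below uses made-up input vectors and a hypothetical `aliases` lookup purely for illustration.

```r
# Hypothetical parsed input: one element per country mention
country <- c("United States", "USA", "Germany", "United States", "Brazil", "Germany")
count   <- c(1, 1, 1, 1, 1, 1)

# Hypothetical alias table: each name on the left is rewritten to the value on the right
aliases <- c("USA" = "United States")

is_alias <- country %in% names(aliases)
country[is_alias] <- unname(aliases[country[is_alias]])

# Sum the counts per country, sort in decreasing order, and keep the top entries
totals <- tapply(count, country, sum)
top <- head(sort(totals, decreasing = TRUE), 5)
print(top)
```

Which spellings belong in the alias table is a judgment call that depends on the full column, so it is worth inspecting the unique keys before deciding.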

Data:

data <- c("{'United States': 1}", "{'Sweden': 1}", "{'USA': 1}", "{'Brazil': 1}", "{'Germany': 1, 'United States': 1}", "{'Japan': 1, 'Canada': 1, 'Germany': 1, 'Italy': 1, 'Iran': 1}", "{'Austria': 1}", "{'Poland': 1}", "{'Tunisia': 1}", "{'United States': 1}", "{'Germany': 1}", "{'Spain': 1, 'Norway': 1}", "{'United States': 1}", "{'Kuwait': 1}", "{'United States': 1}", "{'India': 1}", "{'Belgium': 1}", "{'SungKyunKwan Univ.': 1, 'Sejong Univ.': 1, 'Kwangwoon Univ.': 1, 'Chungcheong College': 1}", "{'R.L.': 1}", "{'United Kingdom': 1, 'Germany': 1, 'Belgium': 1}" )
