Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share



r - Simple frequency tables using data.table

I'm looking for a way to do simple aggregates / counts via data.table.

Consider the iris data, which has 50 observations per species. To count the observations per species, I have to summarise over a column other than Species, for example "Sepal.Length".

library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]

I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.

This is what I would prefer to write, but it doesn't give the right output:

dt[,length(Species), Species]

Correct input and output, but clunky code:

> dt[,length(Sepal.Length), Species]
      Species V1
1:     setosa 50
2: versicolor 50
3:  virginica 50

Incorrect input and output, but nicer code:

> dt[,length(Species), Species]
      Species V1
1:     setosa  1
2: versicolor  1
3:  virginica  1

Is there an elegant way around this?



1 Reply


data.table provides a few special symbols that can be used within the j expression. Notably:

  • .N gives you the number of rows in each group.

See ?data.table, in the Details section for by:

Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.

....

.N is an integer, length 1, containing the number of rows in the group.
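A quick way to see what these symbols hold is to compare them in a single grouped call. The sketch below rebuilds the same iris-based dt; the result column names n, n_sd and len_by are only illustrative. It suggests why length(Species) returned 1 in the question: within each group, the by column is exposed to j as that group's single value, whereas .N and nrow(.SD) count the group's rows.

```r
library(data.table)

dt <- as.data.table(iris)

# For each Species group:
#   .N              -> number of rows in the group
#   nrow(.SD)       -> the same count, via the group's subset of non-grouping columns
#   length(Species) -> 1, because the grouping column holds only the group's value
res <- dt[, .(n = .N, n_sd = nrow(.SD), len_by = length(Species)), by = Species]
print(res)
```

In practice .N is the idiomatic choice: it states the intent directly and avoids referring to an unrelated column such as Sepal.Length.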

For example:

dt[, .N, by = Species]

      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
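A few variations on the same idea may also be handy. This is a sketch rather than an exhaustive list, and the column name count and the Sepal.Length > 5 filter are arbitrary choices for illustration. .N can be renamed inside .(), used without by to count all rows, combined with keyby to get sorted groups, or applied after filtering rows in i.

```r
library(data.table)

dt <- as.data.table(iris)

# Name the count column instead of using the default "N"
counts <- dt[, .(count = .N), by = Species]

# Without 'by', .N is simply the number of rows in the table
total <- dt[, .N]

# keyby groups like 'by', then sorts the result and sets Species as its key
sorted_counts <- dt[, .N, keyby = Species]

# Filter rows in i first; only the matching rows are then counted per group
long_sepals <- dt[Sepal.Length > 5, .N, by = Species]

print(counts)
print(total)
print(sorted_counts)
print(long_sepals)
```

Note that with a filter in i, a group with no matching rows simply does not appear in the result rather than showing a count of 0.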
