There are two closely related layers: geom_bar() and geom_col(). The key difference is how they aggregate the data by default.
For geom_bar(), the default behavior is to count the rows for each x value. It doesn't expect a y value, since it computes the counts itself; in fact, it raises an error if you give it one, since it thinks you're confused. The aggregation is controlled by the stat argument to geom_bar(), whose default is stat = "count".
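As a rough illustration of the default counting behavior, here is a minimal sketch. The pets data frame and its animal column are made up for the example; layer_data() is used only to inspect what the stat computed.

```r
library(ggplot2)

# Hypothetical raw data: one row per observation, no y column.
pets <- data.frame(animal = c("cat", "dog", "cat", "bird", "cat", "dog"))

# geom_bar() defaults to stat = "count", so it tallies the rows per x value.
p_count <- ggplot(pets, aes(x = animal)) +
  geom_bar()

# layer_data() returns the layer's data after the stat has run;
# the count column holds the computed bar heights.
counts <- layer_data(p_count)$count
```

Only x is mapped here; the bar heights come from the row tally, not from a column in the data.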
If you explicitly say stat = "identity" in geom_bar(), you're telling ggplot2 to skip the aggregation and that you'll provide the y values. This mirrors the natural behavior of geom_col() below.
geom_col(), by contrast, won't try to aggregate the data by default. From the docs, "geom_col() uses stat_identity(): it leaves the data as is". So it expects you to have already calculated the y values, and it uses them directly. And geom_col() doesn't have an argument to change that behavior: it always plots the y values you provide, and you need to provide them.
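To make the comparison concrete, here is a small sketch with pre-aggregated data. The totals data frame and its columns are hypothetical; the point is only that both layers should use the supplied y values as the bar heights.

```r
library(ggplot2)

# Hypothetical pre-aggregated data: the y values are already computed.
totals <- data.frame(animal = c("bird", "cat", "dog"), n = c(1, 3, 2))

# geom_col() plots the supplied y values as they are.
p_col <- ggplot(totals, aes(x = animal, y = n)) +
  geom_col()

# geom_bar() does the same once the stat is switched to "identity".
p_bar <- ggplot(totals, aes(x = animal, y = n)) +
  geom_bar(stat = "identity")

# Inspect the bar heights each layer ends up with.
heights_col <- layer_data(p_col)$y
heights_bar <- layer_data(p_bar)$y
```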
If you have y values, you could use either syntax, but I find geom_col() more direct.