Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
321 views
in Technique[技术] by (71.8m points)

r - Error with ggplot2 mapping variable to y and using stat="bin"

I am using ggplot2 to make a histogram:

geom_histogram(aes(x=...), y="..ncount../sum(..ncount..)")

and I get the error:

Mapping a variable to y and also using stat="bin".
  With stat="bin", it will attempt to set the y value to the count of cases in each group.
  This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
  If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
  If you want y to represent values in the data, use stat="identity".
  See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)

What causes this in general? I am confused about the error because I'm not mapping a variable to y, just histogram-ing x and would like the height of the histogram bar to represent a normalized fraction of the data (such that all the bar heights together sum to 100% of the data.)

edit: if I want to make a density plot geom_density instead of geom_histogram, do I use ..ncount../sum(..ncount..) or ..scaled..? I'm unclear about what ..scaled.. does.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The confusion here is a long standing one (as evidenced by the verbose warning message) that all starts with stat_bin.

But users don't typically realize that their confusion revolves around stat_bin, since they typically encounter problems while using either geom_bar or geom_histogram. Note the documentation for each: they both use stat = "bin" (in current ggplot2 versions this stat has been split into stat_bin for continuous data and stat_count for discrete data) by default.

But let's back up. geom_*'s control the actual rendering of data into some sort of geometric form. stat_*'s simply transform your data. The distinction is a bit confusing in practice, because adding a layer of stat_bin will, by default, invoke geom_bar and so it can seem indistinguishable from geom_bar when you're learning.

In any case, consider the "bar"-like geom's: histograms and bar charts. Both are clearly going to involve some binning of data somewhere along the line. But our data could either be pre-summarised or not. For instance, we might want a bar plot from:

x
a
a
a
b
b
b

or equivalently from

x  y
a  3
b  3

The first hasn't been binned yet. The second is pre-binned. The default behavior for both geom_bar and geom_histogram is to assume that you have not pre-binned your data. So they will attempt to call stat_bin (for histograms, now stat_count for bar charts) on your x values.

As the warning says, it will then try to map y for you to the resulting counts. If you also attempt to map y yourself to some other variable you end up in Here There Be Dragons territory. Mapping y to functions of the variables returned by stat_bin (..count.., etc.) should be ok and should not throw that warning (it doesn't for me using @mnel's example above).

The take-away here is that for geom_bar if you've pre-computed the heights of the bars, always remember to use stat = "identity", or better yet use the newer geom_col which uses stat = "identity" by default. For geom_histogram it's very unlikely that you will have pre-computed the bins, so in most cases you just need to remember not to map y to anything beyond what's returned from stat_bin.

geom_dotplot uses it's own binning stat, stat_bindot, and this discussion applies here as well, I believe. This sort of thing generally hasn't been an issue with the 2d binning cases (geom_bin2d and geom_hex) since there hasn't been as much flexibility available in the analogous z variable to the binned y variable in the 1d case. If future updates start allowing more fancy manipulations of the 2d binning cases this could I suppose become something you have to watch out for there.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...