The precise way to accomplish this depends on the data structure you are using.
With DataFrames.jl, you can proceed as follows. Suppose your data is in df:
julia> df
120×3 DataFrame
│ Row │ age │ groups │ bins │
│ │ Int64 │ String │ Categorical… │
├─────┼───────┼────────┼──────────────────┤
│ 1 │ 7 │ Group1 │ Q1: [0.0, 16.8) │
│ 2 │ 2 │ Group1 │ Q1: [0.0, 16.8) │
│ 3 │ 8 │ Group1 │ Q1: [0.0, 16.8) │
│ 4 │ 4 │ Group1 │ Q1: [0.0, 16.8) │
│ 5 │ 9 │ Group1 │ Q1: [0.0, 16.8) │
│ 6 │ 12 │ Group1 │ Q1: [0.0, 16.8) │
│ 7 │ 5 │ Group2 │ Q1: [0.0, 16.8) │
│ 8 │ 1 │ Group2 │ Q1: [0.0, 16.8) │
│ 9 │ 16 │ Group2 │ Q1: [0.0, 16.8) │
│ 10 │ 13 │ Group2 │ Q1: [0.0, 16.8) │
│ 11 │ 1 │ Group2 │ Q1: [0.0, 16.8) │
⋮
│ 109 │ 75 │ Group2 │ Q5: [71.4, 89.0] │
│ 110 │ 82 │ Group2 │ Q5: [71.4, 89.0] │
│ 111 │ 80 │ Group2 │ Q5: [71.4, 89.0] │
│ 112 │ 80 │ Group2 │ Q5: [71.4, 89.0] │
│ 113 │ 86 │ Group2 │ Q5: [71.4, 89.0] │
│ 114 │ 77 │ Group2 │ Q5: [71.4, 89.0] │
│ 115 │ 88 │ Group2 │ Q5: [71.4, 89.0] │
│ 116 │ 75 │ Group2 │ Q5: [71.4, 89.0] │
│ 117 │ 87 │ Group2 │ Q5: [71.4, 89.0] │
│ 118 │ 79 │ Group2 │ Q5: [71.4, 89.0] │
│ 119 │ 83 │ Group2 │ Q5: [71.4, 89.0] │
│ 120 │ 74 │ Group2 │ Q5: [71.4, 89.0] │
- We first calculate the number of observations in each group/bin cell:
df2 = combine(groupby(df, [:groups, :bins]), :age => length => :num)
The column :num holds the number of observations in each cell.
- We then calculate the number of observations in each group and join the resulting data frame with df2. Finally, we calculate the proportion and sort by bin/group:
df3 = combine(groupby(df, :groups), :age => length => :den)
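# Attach each group's total (:den) to every cell of that group
df4 = innerjoin(df3, df2, on = :groups)
# Share of the group's observations that fall in each bin
df4.proportion = df4.num ./ df4.den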
sort!(df4, [:bins, :groups])
julia> df4
10×5 DataFrame
│ Row │ groups │ den │ bins │ num │ proportion │
│ │ String │ Int64 │ Categorical… │ Int64 │ Float64 │
├─────┼────────┼───────┼──────────────────┼───────┼────────────┤
│ 1 │ Group1 │ 43 │ Q1: [0.0, 16.8) │ 6 │ 0.139535 │
│ 2 │ Group2 │ 77 │ Q1: [0.0, 16.8) │ 18 │ 0.233766 │
│ 3 │ Group1 │ 43 │ Q2: [16.8, 36.6) │ 10 │ 0.232558 │
│ 4 │ Group2 │ 77 │ Q2: [16.8, 36.6) │ 14 │ 0.181818 │
│ 5 │ Group1 │ 43 │ Q3: [36.6, 52.4) │ 8 │ 0.186047 │
│ 6 │ Group2 │ 77 │ Q3: [36.6, 52.4) │ 16 │ 0.207792 │
│ 7 │ Group1 │ 43 │ Q4: [52.4, 71.4) │ 11 │ 0.255814 │
│ 8 │ Group2 │ 77 │ Q4: [52.4, 71.4) │ 13 │ 0.168831 │
│ 9 │ Group1 │ 43 │ Q5: [71.4, 89.0] │ 8 │ 0.186047 │
│ 10 │ Group2 │ 77 │ Q5: [71.4, 89.0] │ 16 │ 0.207792 │
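For reference, here is a self-contained sketch of the same computation on a small made-up data set. The data frame toy, its values, and the bin labels "low"/"high" are hypothetical and only stand in for your df. It also assumes a DataFrames.jl version that supports nrow in combine and transform on a grouped data frame, which lets you skip the separate join:

```julia
using DataFrames

# Hypothetical toy data standing in for the 120-row df above
toy = DataFrame(
    age    = [3, 5, 40, 7, 45, 50, 60],
    groups = ["Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2"],
    bins   = ["low", "low", "high", "low", "high", "high", "high"],
)

# Number of observations in each group/bin cell
cells = combine(groupby(toy, [:groups, :bins]), nrow => :num)

# Group totals: the sum of :num within a group is repeated on each of its rows
cells = transform(groupby(cells, :groups), :num => sum => :den)

# Proportion of each group's observations falling in each bin
cells.proportion = cells.num ./ cells.den
sort!(cells, [:bins, :groups])
```

Within each group the proportion column should sum to 1, which is a quick sanity check on the result.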