Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
582 views
in Technique[技术] by (71.8m points)

Find a function to return value based on condition using R

I have a table with values

KId sales_month quantity_sold
100        1    0
100        2    0
100        3    0
496        2    6
511        2    10
846        1    4
846        2    6
846        3    1
338        1    6
338        2    0    

now i require output as

KId sales_month quantity_sold result
100           1     0         1
100           2     0         1
100           3     0         1
496           2     6         1
511           2     10        1
846           1     4         1
846           2     6         1
846           3     1         0
338           1     6         1
338           2     0         1

Here, the calculation has to go as such if quantity sold for the month of march(3) is less than 60% of two months January(1) and February(2) quantity sold then the result should be 1 or else it should display 0. Require solution to perform this.

Thanks in advance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If I understand well, your requirement is to compare sold quantity in month t with the sum of quantity sold in months t-1 and t-2. If so, I can suggest using dplyr package that offer the nice feature of grouping rows and mutating columns in your data frame.

resultData <- group_by(data, KId) %>% 
    arrange(sales_month) %>% 
    mutate(monthMinus1Qty = lag(quantity_sold,1), monthMinus2Qty = lag(quantity_sold, 2)) %>% 
    group_by(KId, sales_month) %>%
    mutate(previous2MonthsQty = sum(monthMinus1Qty, monthMinus2Qty, na.rm = TRUE)) %>%  
    mutate(result = ifelse(quantity_sold/previous2MonthsQty >= 0.6,0,1)) %>%
    select(KId,sales_month, quantity_sold, result)

The result is as below: example of data frame processed by the above code

Adding

select(KId,sales_month, quantity_sold, result)

at the end let us display only columns we care about (and not all these intermediate steps).

I believe this should satisfy your requirement. NA is the result column are due to 0/0 division or no data at all for the previous months. Should you need to expand your calculation beyond one calendar year, you can add year column and adjust group_by() arguments appropriately. For more information on dplyr package, follow this link


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...