I have a data frame of products (apple, pear, banana) sold across different locations (cities) within different categories (food and edibles).
I would like to count how many times any given pair of products appeared together at the same location within a category.
This is an example dataset I'm trying to make this work on:
category <- c('food','food','food','food','food','food','edibles','edibles','edibles','edibles', 'edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV', 'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX', 'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear', 'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)
For example, the pair "apple & banana" appeared together in the "food" category in "houston, TX", and also in the "edibles" category in "charlotte, NC". Therefore, the count for the "apple & banana" pair would be 2.
My desired output is a count for each unordered pair, like this:
apple & banana: 2
apple & pear: 4
Does anyone have an idea how to accomplish this? I'm relatively new to R and have been stuck on this for a while.
I'm trying to use this to calculate affinities between different items.
Additional clarification on output:
My full dataset consists of hundreds of different items. I would like to get a data frame where the first column is the pair and the second column is the count for each pair.
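To make the target concrete, here is a minimal base-R sketch of one way that data frame might be built. It assumes "together" means the same category and the same location, and that a pair is counted at most once per category/location group. The names `groups`, `pairs_in_group`, `all_pairs`, and `pair_counts` are hypothetical and introduced only for this illustration.

```r
# Example data from the question
category <- c('food','food','food','food','food','food',
              'edibles','edibles','edibles','edibles','edibles')
location <- c('houston, TX', 'houston, TX', 'las vegas, NV', 'las vegas, NV',
              'philadelphia, PA', 'philadelphia, PA', 'austin, TX', 'austin, TX',
              'charlotte, NC', 'charlotte, NC', 'charlotte, NC')
item <- c('apple', 'banana', 'apple', 'pear', 'apple', 'pear',
          'pear', 'apple', 'apple', 'pear', 'banana')
food_data <- data.frame(category, location, item, stringsAsFactors = FALSE)

# Assumption: items are "together" when they share a category AND a location.
# Split the items into one group per (category, location) combination.
groups <- split(food_data$item,
                list(food_data$category, food_data$location),
                drop = TRUE)

# Hypothetical helper: all unordered pairs of distinct items in one group.
# Sorting first means "pear, apple" and "apple, pear" yield the same label.
pairs_in_group <- function(items) {
  items <- sort(unique(items))
  if (length(items) < 2) return(character(0))
  combn(items, 2, FUN = paste, collapse = " & ")
}

# Collect the pairs from every group and tabulate them
all_pairs <- unlist(lapply(groups, pairs_in_group), use.names = FALSE)
pair_counts <- as.data.frame(table(pair = all_pairs),
                             responseName = "count",
                             stringsAsFactors = FALSE)

# Most frequent pairs first
pair_counts <- pair_counts[order(-pair_counts$count), ]
print(pair_counts)
```

On the example data this gives one row per pair, including "banana & pear", which is not listed in the desired output above but occurs once in "charlotte, NC". With hundreds of items, the number of pairs grows quadratically with the number of distinct items in a group, so very large groups may need a different approach.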