I'm a beginner in Python and recommendation systems.
I want to implement a recommendation system in Python by following this tutorial:
https://towardsdatascience.com/solving-business-usecases-by-recommender-system-using-lightfm-4ba7b3ac8e62
But when I run the project, it crashes because it runs out of memory:
MemoryError: Unable to allocate 71.5 GiB for an array with shape (162541, 59047) and data type float64
I know that this is because of the DataFrame size (100k rows, 25 columns).
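As a sanity check, the figure in the error message can be reproduced from the reported shape. This is only arithmetic on the numbers in the traceback, assuming one 8-byte float64 cell per entry of the array (presumably one row per user and one column per item after `unstack()`):

```python
# Dense float64 storage needs one 8-byte cell for every (row, column) pair.
n_rows, n_cols = 162541, 59047   # shape reported in the MemoryError
bytes_per_cell = 8               # size of one float64 value

total_bytes = n_rows * n_cols * bytes_per_cell
total_gib = total_bytes / 2**30  # bytes -> GiB

print(f"{total_gib:.1f} GiB")    # prints "71.5 GiB"
```

So the allocation size comes from the number of rows times the number of columns of the pivoted table, regardless of how few of those cells hold a real rating.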
The code that generates this DataFrame:
def create_interaction_matrix(df, user_col, item_col, rating_col, norm=False, threshold=None):
    '''
    Function to create an interaction matrix dataframe from transactional type interactions
    Required Input -
        - df = Pandas DataFrame containing user-item interactions
        - user_col = column name containing user's identifier
        - item_col = column name containing item's identifier
        - rating_col = column name containing user feedback on interaction with a given item
        - norm (optional) = True if a normalization of ratings is needed
        - threshold (required if norm = True) = value above which the rating is favorable
    Expected output -
        - Pandas dataframe with user-item interactions ready to be fed in a recommendation algorithm
    '''
    # Pivot to one row per user and one column per item; missing pairs become 0.
    interactions = (
        df.groupby([user_col, item_col])[rating_col]
        .sum().unstack().reset_index()
        .fillna(0).set_index(user_col)
    )
    if norm:
        # Binarize: 1 if the summed rating exceeds the threshold, else 0.
        interactions = interactions.applymap(lambda x: 1 if x > threshold else 0)
    return interactions
But I have no idea how to solve it.
- Is there an alternative data type or structure to pandas.DataFrame that uses less RAM (ideally less than 4 GB)? Or is there a way to generate a smaller DataFrame?
- Is there a way to save and load the DataFrame on the hard drive, or to keep it in memory with less RAM usage?
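To make the first question concrete, here is a minimal sketch of what a lower-memory structure could look like: a SciPy sparse matrix that stores only the user-item pairs that actually occur. This is not the tutorial's code. The function name `create_sparse_interaction_matrix` and the column names `userId`, `movieId`, and `rating` are hypothetical, and whether the rest of the tutorial's helpers would accept a sparse matrix in place of the DataFrame is an assumption that would need checking.

```python
import pandas as pd
from scipy.sparse import csr_matrix


def create_sparse_interaction_matrix(df, user_col, item_col, rating_col,
                                     norm=False, threshold=None):
    """Hypothetical sparse counterpart of create_interaction_matrix."""
    # Sum duplicate (user, item) pairs, like groupby(...).sum() in the original.
    summed = df.groupby([user_col, item_col], as_index=False)[rating_col].sum()

    # Map raw identifiers to consecutive integer positions (rows / columns).
    users = summed[user_col].astype("category")
    items = summed[item_col].astype("category")

    values = summed[rating_col].to_numpy(dtype="float32")
    if norm:
        # Binarize: 1 if the summed rating exceeds the threshold, else 0.
        values = (values > threshold).astype("float32")

    matrix = csr_matrix(
        (values, (users.cat.codes.to_numpy(), items.cat.codes.to_numpy())),
        shape=(len(users.cat.categories), len(items.cat.categories)),
    )
    matrix.eliminate_zeros()  # do not store explicit zeros
    # Return the id lookups too, since the matrix itself only has positions.
    return matrix, users.cat.categories, items.cat.categories


# Toy data with hypothetical column names, only to show the call.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 2.0, 5.0, 3.0],
})
matrix, user_ids, item_ids = create_sparse_interaction_matrix(
    ratings, "userId", "movieId", "rating", norm=True, threshold=3)
print(matrix.shape, matrix.nnz)
```

With this layout, memory grows with the number of stored interactions rather than with users × items. For the second question, `scipy.sparse.save_npz` and `scipy.sparse.load_npz` can write such a matrix to disk and read it back.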
question from:
https://stackoverflow.com/questions/65922412/pandas-dataframe-cause-crash-because-of-memory-limit