Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Pandas DataFrame causes a crash because of memory limit

I'm a beginner in Python and recommendation systems.

I want to implement a recommendation system in python with this tutorial:

https://towardsdatascience.com/solving-business-usecases-by-recommender-system-using-lightfm-4ba7b3ac8e62

But when I run the project, it crashes because it hits the memory limit:

MemoryError: Unable to allocate 71.5 GiB for an array with shape (162541, 59047) and data type float64
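
The size in the message follows directly from the reported shape: every cell of a dense float64 array takes 8 bytes, whether or not the user ever rated the item. A quick check of that arithmetic (the variable names are only for illustration):

```python
# Shape and dtype taken from the MemoryError message above.
rows, cols = 162541, 59047
bytes_per_value = 8  # float64

dense_bytes = rows * cols * bytes_per_value
dense_gib = dense_bytes / 2**30  # GiB = 2**30 bytes

print(f"{dense_gib:.1f} GiB")
```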

I know that this is because of the DataFrame size (100k rows, 25 columns). This is the code that generates the DataFrame:

def create_interaction_matrix(df, user_col, item_col, rating_col, norm=False, threshold=None):
    '''
    Function to create an interaction matrix dataframe from transactional type interactions
    Required Input -
        - df = Pandas DataFrame containing user-item interactions
        - user_col = column name containing user's identifier
        - item_col = column name containing item's identifier
        - rating_col = column name containing user feedback on interaction with a given item
        - norm (optional) = True if a normalization of ratings is needed
        - threshold (required if norm = True) = value above which the rating is favorable
    Expected output -
        - Pandas dataframe with user-item interactions ready to be fed in a recommendation algorithm
    '''
    # Sum ratings per (user, item) pair, then pivot items into columns.
    # The pivot creates one cell for every user/item combination.
    interactions = (
        df.groupby([user_col, item_col])[rating_col]
        .sum()
        .unstack()
        .reset_index()
        .fillna(0)
        .set_index(user_col)
    )
    if norm:
        # Binarize: 1 if the summed rating is above the threshold, else 0.
        interactions = interactions.applymap(lambda x: 1 if x > threshold else 0)
    return interactions

But I have no idea how to solve it.

  • Is there an alternative data type, or anything other than pandas.DataFrame, that uses less RAM (under 4 GB)? Or a way to generate a smaller DataFrame?
  • Is there a way to save/load the DataFrame on the hard drive, or to keep it in memory with less RAM usage?
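
One direction that may address the first point is a SciPy sparse matrix, which stores only the (user, item) pairs that actually occur instead of every combination. The sketch below is not from the tutorial: `create_sparse_interaction_matrix` is a hypothetical counterpart of `create_interaction_matrix`, and the column names in the usage lines are made up for illustration. It returns the matrix plus the user and item ids that label its rows and columns.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def create_sparse_interaction_matrix(df, user_col, item_col, rating_col,
                                     norm=False, threshold=None):
    """Hypothetical sparse variant of create_interaction_matrix (a sketch)."""
    # Sum ratings per (user, item) pair, as the dense version does.
    summed = df.groupby([user_col, item_col], as_index=False)[rating_col].sum()

    # Map each id to a contiguous row/column position (sorted by id).
    user_codes, user_ids = pd.factorize(summed[user_col], sort=True)
    item_codes, item_ids = pd.factorize(summed[item_col], sort=True)
    values = summed[rating_col].to_numpy(dtype="float32")

    if norm:
        # Keep only favorable pairs and store them as 1; the rest stay implicit 0.
        keep = values > threshold
        user_codes, item_codes = user_codes[keep], item_codes[keep]
        values = np.ones(int(keep.sum()), dtype="float32")

    matrix = csr_matrix(
        (values, (user_codes, item_codes)),
        shape=(len(user_ids), len(item_ids)),
    )
    return matrix, user_ids, item_ids


# Usage with made-up data; column names are assumptions for illustration.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [10, 20, 10, 30],
    "rating": [4.0, 5.0, 3.0, 2.0],
})
matrix, user_ids, item_ids = create_sparse_interaction_matrix(
    ratings, "userId", "movieId", "rating"
)
binary, _, _ = create_sparse_interaction_matrix(
    ratings, "userId", "movieId", "rating", norm=True, threshold=3
)
```

With this layout, memory grows with the number of recorded interactions rather than with users × items. Code that expects a DataFrame with ids as index and columns would need to use `user_ids` and `item_ids` for lookups instead.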
Question from: https://stackoverflow.com/questions/65922412/pandas-dataframe-cause-crash-because-of-memory-limit


1 Reply

Waiting for answers
