I'm using pandas on a web server (Apache + mod_wsgi + Django) and have a hard-to-reproduce bug, which I have now discovered is caused by pandas not being thread-safe.
After a lot of code reduction, I finally found a short standalone program that reproduces the problem. You can see it below.
The point is: contrary to the answer to this question, this example shows that pandas can crash even on very simple operations that do not modify a DataFrame. I can't imagine how this simple snippet could possibly be unsafe with threads...
My question is about using pandas and NumPy in a web server. Is it possible at all? How am I supposed to fix my pandas code? (An example of lock usage would be helpful.)
Here is the code that causes a segmentation fault:

    import threading

    import pandas as pd
    import numpy as np


    def let_crash(crash=True):
        t = 0.02 * np.arange(100000)  # OK with 10000
        data = pd.DataFrame({'t': t})
        if crash:
            data['t'] * 1.5  # CRASH
        else:
            data['t'].values * 1.5  # THIS IS OK!


    if __name__ == '__main__':
        threads = []
        for i in range(100):
            if True:  # asynchronous
                t = threading.Thread(target=let_crash, args=())
                t.daemon = True
                t.start()
                threads.append(t)
            else:  # synchronous
                let_crash()
        for t in threads:
            t.join()
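For the lock usage asked about above, a minimal sketch (not a confirmed fix for this crash) is to guard every pandas call with one module-level `threading.Lock`, so only one thread at a time runs pandas code. `PANDAS_LOCK`, `compute_scaled` and `run_threads` are hypothetical names for illustration only.

```python
import threading

import numpy as np
import pandas as pd

# Hypothetical process-wide lock shared by every thread that touches pandas.
PANDAS_LOCK = threading.Lock()


def compute_scaled(n=100000, factor=1.5):
    """Build a DataFrame and scale one column while holding the lock."""
    t = 0.02 * np.arange(n)  # plain NumPy work, done outside the lock
    # Only one thread at a time can execute the pandas code in this block.
    with PANDAS_LOCK:
        data = pd.DataFrame({'t': t})
        result = data['t'] * factor
    return result


def run_threads(count=100):
    """Start `count` worker threads and collect their results in order."""
    results = [None] * count

    def worker(i):
        results[i] = compute_scaled()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


if __name__ == '__main__':
    out = run_threads()
    print(len(out), out[0].iloc[-1])
```

The lock serializes the guarded code, so it only protects pandas calls that actually go through it, and it gives up concurrency for that section. Under mod_wsgi it also only coordinates threads inside one process; separate daemon processes each get their own lock, which is fine since they do not share memory.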
My environment: Python 2.7.3, NumPy 1.8.0, pandas 0.13.1
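The crash only appears with the Series arithmetic (not with `.values`) and not with 10000 elements. One unconfirmed hypothesis is that pandas hands large arithmetic to the optional numexpr library above a size threshold, so it may help to report whether numexpr is installed and to try switching that path off. This is a diagnostic sketch under that assumption: `describe_environment` and `disable_numexpr` are hypothetical helpers, and the `compute.use_numexpr` option exists in newer pandas releases but, as far as I know, not in 0.13.1.

```python
import sys

import numpy as np
import pandas as pd


def describe_environment():
    """Collect the versions worth quoting in a bug report."""
    try:
        import numexpr
        numexpr_version = numexpr.__version__
    except ImportError:
        numexpr_version = None  # numexpr is not installed
    return {
        'python': sys.version.split()[0],
        'numpy': np.__version__,
        'pandas': pd.__version__,
        'numexpr': numexpr_version,
    }


def disable_numexpr():
    """Ask pandas not to delegate arithmetic to numexpr (newer pandas only)."""
    pd.set_option('compute.use_numexpr', False)
    return pd.get_option('compute.use_numexpr')


if __name__ == '__main__':
    print(describe_environment())
    print('use_numexpr:', disable_numexpr())
```

If the threaded example stops crashing once numexpr is disabled, that would point at numexpr rather than pandas itself; if it still crashes, the hypothesis is wrong.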