Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Performance difference when running the same cell twice in a Python notebook (pandas df.map)

I am currently using a Python notebook to run the following function, which I would like to map over a pandas Series.

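from skimage import io

def get_number_of_activated_pixels(image_path):
    # Read the image at image_path into a numpy array
    im = io.imread(image_path)
    # Count the pixels whose value is greater than 0
    n_activated = (im > 0).sum()
    return n_activated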

The function simply reads the image at the given path into a numpy array using skimage's io module, and then returns the number of non-zero pixels.
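To make the counting step concrete without needing any image files, here is a minimal sketch on a small hand-made array. The array `im` is a hypothetical stand-in for what `io.imread` would return, not data from my actual images.

```python
import numpy as np

# Hypothetical 3x3 grayscale image standing in for the array io.imread returns.
im = np.array([[0, 5, 0],
               [7, 0, 0],
               [0, 0, 255]], dtype=np.uint8)

# (im > 0) is a boolean mask; summing it counts the True entries.
n_activated = int((im > 0).sum())

# For unsigned pixel data, np.count_nonzero gives the same count.
n_nonzero = int(np.count_nonzero(im))

print(n_activated, n_nonzero)
```

Note that the two counts only agree when pixel values cannot be negative, which holds for unsigned integer images like the one assumed here.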

When I use map to apply the function to the Series containing the paths, I get drastically different performance the second time I run the same cell.

I am using the snippet below in the cell:

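from timeit import default_timer as timer  # assumed source of timer(); the import is not shown here

start = timer()
test = test_df.map(get_number_of_activated_pixels)
end = timer()
print(end - start)  # Time in seconds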

The first time I run the cell it takes about 100 seconds, but when I run the same cell a second time it takes only 18 seconds.

What can I attribute this huge difference in performance to? Is Python doing some caching behind the scenes? If so, can someone please elaborate on what is going on?
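One rough way to narrow this down is to time several consecutive passes over the same inputs within a single run, so the per-pass durations can be compared directly. The sketch below is only an illustration: `time_passes` is a hypothetical helper, and the file names are placeholders that are never opened. If only the first pass is slow, caching of file reads outside Python (for example by the operating system) would be one plausible explanation, but that is an assumption the timings alone do not prove.

```python
import time

def time_passes(func, items, passes=2):
    """Apply func to every item `passes` times; return the seconds taken per pass."""
    durations = []
    for _ in range(passes):
        start = time.perf_counter()
        for item in items:
            func(item)
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload (len on placeholder names) so the sketch runs without image files.
durations = time_passes(len, ["a.png", "b.png", "c.png"], passes=3)
print(durations)
```

With the function from above, the call would look like `time_passes(get_number_of_activated_pixels, list(test_df))`.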

Question from: https://stackoverflow.com/questions/65646633/performance-difference-when-running-same-cell-twice-in-python-notebook-pandas-df
