Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Python - debugging an interference between multiprocessing, importing and an external C library

I have hit something of a dead end in a piece of code I am running. The problem seems to involve Python threading, importing, and an external C library.

The gist of the problem is the following. I need to calculate an approximate inverse of a large number of big sparse matrices. Fortunately for me, there is an external library, scikits.sparse.cholmod, that basically exposes the widely known and widely used C library "SuiteSparse".

While it does have some native support for parallelism, I've discovered that on modern multi-core processors it's easy to get some speed-up by adding explicit parallelization on top of the implicit parallelization. To do this, I define a function that wraps an object containing the matrix that needs to be inverted, the cholmod-wrapping function, and a function that processes the inverted matrices and stores the results in the database. This function calls cholmod, and I then send it down Python's multiprocessing Pool with a map:

import logging
from multiprocessing import Pool

with Pool(processes=pool_size) as pool:
  try:
    pool.map(my_function, arguments_puck)
  except Exception:
    # log and re-raise the exception
    logging.exception("pool.map failed")
    raise

It all works rather nicely for the first pass. The problem is that if I perform an intermediate analysis and then need to re-launch the pool within my code, the second launch fails, but silently. Specifically, during the second pool.map call, all the processes hang silently and indefinitely at the line where cholmod is called.

This issue is not present if I do not use the multiprocessing module.

This issue is also not present if I invoke cholmod afterwards in the main process.

I tried to trace the problem by adding thread exception hooks (threading.excepthook = function_that_will_log_and_raise) as well as enabling warning capturing (logging.captureWarnings(True)). Neither of them reports anything that would suggest a lock is occurring.
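For reference, the two hooks mentioned above might be wired up roughly as follows. This is only a sketch: the logger name, the `seen` list, and the function names `log_thread_exception` and `install_debug_hooks` are hypothetical, and unlike the hook named in the question this one logs without re-raising.

```python
import logging
import threading

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pool_debug")  # hypothetical logger name

seen = []  # (thread name, exception type) pairs, kept for later inspection


def log_thread_exception(args):
    """Hypothetical hook: record and log an uncaught exception from a thread."""
    name = args.thread.name if args.thread is not None else "<unknown>"
    seen.append((name, args.exc_type.__name__))
    log.error(
        "Uncaught exception in thread %s",
        name,
        exc_info=(args.exc_type, args.exc_value, args.exc_traceback),
    )


def install_debug_hooks():
    """Install the hooks in the current process only."""
    threading.excepthook = log_thread_exception
    logging.captureWarnings(True)  # route warnings.warn() into logging
```

Both hooks are per-process, so each worker would need them as well, for example via `Pool(processes=pool_size, initializer=install_debug_hooks)`. They also only fire on Python-level exceptions and warnings; a hang inside compiled code raises neither, which would be consistent with the hooks staying silent here.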

I tried to hammer blindly at the problem by forcing a scikits.sparse.cholmod re-import before every new call, using importlib.reload(chmd) after import scikits.sparse.cholmod as chmd.

I also tried to force termination of all the pool's threads by adding a pool.terminate() after the try-except inside the with Pool context.

At this point, I am not sure how to proceed with debugging. I can't really set flags for the cholmod function to see what exactly fails, because it is a thin wrapper around C/C++ code. My hunch is that there is some interference between the threads cholmod spawns internally and the processes and threads created by Python's multiprocessing, but I have no idea how to track the problem down further.
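One way to probe this hunch without touching the C/C++ code is to change how the worker processes are started. On Linux the default start method has traditionally been "fork", which copies the parent's memory, including whatever state a native threaded library has already initialized there; "spawn" starts each worker from a fresh interpreter instead. The sketch below is a diagnostic under that assumption, not a confirmed fix for this case, and `run_pass` is a hypothetical helper name.

```python
import multiprocessing as mp


def run_pass(func, items, pool_size=2, start_method="spawn"):
    """Run one pool.map pass using an explicitly chosen start method.

    Assumption: if the second pass hangs under "fork" but completes under
    "spawn", state inherited from the parent process is a likely culprit.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=pool_size) as pool:
        return pool.map(func, items)
```

In the setting of the question this would be called as `run_pass(my_function, arguments_puck, pool_size)` for each pass. With "spawn", the calling script needs an `if __name__ == "__main__":` guard and `my_function` has to be defined at module level so the workers can import it.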

Is there any possible way to figure out what's going on without having to dive into the C/C++ code?
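One option from the standard library is faulthandler, which can dump the stack of every Python thread in a process after a timeout, even while that process is blocked inside compiled code. The sketch below wraps a single call with such a watchdog; `call_with_watchdog`, the log file naming, and the default timeout are assumptions made for illustration.

```python
import faulthandler
import os
import tempfile


def call_with_watchdog(func, arg, timeout=60.0, log_dir=None):
    """Run func(arg); if it is still running after `timeout` seconds,
    write the stacks of all threads in this process to a per-process file."""
    log_dir = log_dir or tempfile.gettempdir()
    path = os.path.join(log_dir, f"hang_{os.getpid()}.log")
    with open(path, "w") as fh:
        # The watchdog writes straight to the file descriptor, so the file
        # must stay open until the call returns or the dump fires.
        faulthandler.dump_traceback_later(timeout, file=fh)
        try:
            return func(arg)
        finally:
            faulthandler.cancel_dump_traceback_later()
```

Inside the pool this could be used as `pool.map(functools.partial(call_with_watchdog, my_function), arguments_puck)`. The dump only shows Python frames, so it would confirm which Python line each worker is stuck on rather than what the C library is doing underneath it.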

question from:https://stackoverflow.com/questions/66067207/python-debugging-an-interference-between-multiprocessing-importing-and-an-ext
