Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


multithreading - import inside of a Python thread

I have some functions that interactively load Python modules using __import__.

I recently stumbled upon some article about an "import lock" in Python, that is, a lock specifically for imports (not just the GIL). But the article was old so maybe that's not true anymore.

This makes me wonder about the practice of importing in a thread.

  1. Are import/__import__ thread-safe?
  2. Can they create deadlocks?
  3. Can they cause performance issues in a threaded application?

EDIT 12 Sept 2012

Thanks for the great reply, Soravux. So imports are thread-safe, and I'm not worried about deadlocks, since the functions that use __import__ in my code don't call each other.

Do you know if the lock is acquired even if the module has already been imported? If that is the case, I should probably look in sys.modules to check whether the module has already been imported before calling __import__.

Sure, this shouldn't make much difference in CPython, since there is the GIL anyway. However, it could make a big difference on other implementations such as Jython or Stackless Python.

EDIT 19 Sept 2012

About Jython, here's what they say in the doc:

http://www.jython.org/jythonbook/en/1.0/Concurrency.html#module-import-lock

Python does, however, define a module import lock, which is implemented by Jython. This lock is acquired whenever an import of any name is made. This is true whether the import goes through the import statement, the equivalent __import__ builtin, or related code. It’s important to note that even if the corresponding module has already been imported, the module import lock will still be acquired, if only briefly.

So, it seems that it would make sense to check in sys.modules before making an import, to avoid acquiring the lock. What do you think?
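The check proposed here could be sketched as follows. This is only a minimal illustration: `import_if_needed` is a hypothetical helper name, not a standard-library function, and `importlib.import_module` is used in place of a raw `__import__` call because it returns the named module directly.

```python
import importlib
import sys


def import_if_needed(name):
    """Return the module called `name`, consulting sys.modules first.

    Hypothetical helper for illustration; not part of the standard library.
    """
    module = sys.modules.get(name)
    if module is None:
        # Not cached yet: fall back to the regular, lock-protected import.
        module = importlib.import_module(name)
    return module


json_module = import_if_needed("json")
```

One trade-off to keep in mind: a module is placed in sys.modules before its body has finished executing, so this shortcut can hand back a partially initialized module if another thread is still in the middle of importing it.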



1 Reply


Update: Since Python 3.3, import locks are per-module instead of global, and imp is deprecated in favor of importlib. More information is available in the changelog and the related issue ticket.

The original answer below predates Python 3.3

Normal imports are thread-safe because they acquire an import lock prior to execution and release it once the import is done. If you add your own custom imports using the available hooks, be sure to apply the same locking scheme to them. The lock can be accessed through the imp module (imp.lock_held()/acquire_lock()/release_lock()). Edit: This is deprecated since Python 3.3; there is no need to handle the lock manually.
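A small experiment can illustrate this behavior. The sketch below writes a deliberately slow throwaway module (the name `slow_demo_module` is made up for this example) to a temporary directory and imports it from several threads at once. Each execution of the module body would create a new `TOKEN` object, so counting the distinct tokens the threads observe indicates how many times the body actually ran. On a Python 3.3+ interpreter I would expect a single distinct token.

```python
import importlib
import os
import sys
import tempfile
import threading

MODULE_NAME = "slow_demo_module"  # hypothetical module, created on the fly below

MODULE_SOURCE = (
    "import time\n"
    "time.sleep(0.2)   # slow the import down so the threads overlap\n"
    "TOKEN = object()  # a fresh object every time the module body runs\n"
)


def import_from_threads(n_threads=8):
    """Import one module from several threads; return the ids of the TOKENs seen."""
    tokens = []
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, MODULE_NAME + ".py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()

        def worker():
            module = importlib.import_module(MODULE_NAME)
            tokens.append(module.TOKEN)

        try:
            threads = [threading.Thread(target=worker) for _ in range(n_threads)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
        finally:
            # Undo the changes made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
    return {id(token) for token in tokens}


distinct = import_from_threads()
print(len(distinct))
```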

Using this import lock won't create any deadlocks or dependency errors aside from the already well-known circular dependencies (module a imports module b, which imports module a). Edit: Python 3.3 switched to a per-module locking mechanism to prevent the deadlocks caused by circular imports.

There are multiple ways to create new processes or threads, for example fork and clone (assuming a Linux environment), and each yields different memory behavior. By default, a fork copies most memory segments (data, often copy-on-write; stack; code; heap), so the child does not share their contents with its parent. The result of a clone (often called a thread; this is what Python uses for threading) shares all memory segments with its parent except the stack.

The import mechanism in Python uses the global namespace, which is not placed on the stack, so it lives in a segment shared between threads. This means that all memory modifications (except for the stack) performed by an import in a thread will be visible to all the other threads of the process, including the parent. If the imported module is Python-only, importing it is thread-safe by design. If an imported module uses non-Python libraries, make sure those are thread-safe; otherwise they will cause mayhem in your multithreaded Python code.
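The visibility of an import across threads can be checked with a short sketch. Here a worker thread imports `colorsys` (chosen only because it is a small standard-library module; any other would do), and the main thread then finds it in the shared `sys.modules` cache without importing it itself.

```python
import importlib
import sys
import threading

MODULE_NAME = "colorsys"  # arbitrary small standard-library module

# Drop any cached copy so the worker thread performs a real import.
sys.modules.pop(MODULE_NAME, None)


def worker():
    importlib.import_module(MODULE_NAME)


thread = threading.Thread(target=worker)
thread.start()
thread.join()

# The main thread did not import the module, yet it sees it in the cache.
visible_in_main = MODULE_NAME in sys.modules
print(visible_in_main)
```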

By the way, threaded programs in Python suffer from the GIL, which won't allow much performance gain unless your program is I/O-bound or relies on C or other external thread-safe libraries (since they should release the GIL before executing). Running the same imported Python function in two threads won't execute it concurrently because of the GIL. Note that this is only a limitation of CPython; other implementations of Python may behave differently.
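The I/O-bound case can be illustrated with a rough timing sketch. It uses `time.sleep` as a stand-in for a blocking I/O call (an assumption made for the demo; real I/O will vary), since sleeping releases the GIL while waiting. If the threads overlap, four 0.25-second waits should take much closer to 0.25 seconds than to 1 second in total.

```python
import threading
import time


def fake_io(seconds):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Run fake_io in n_threads threads and return the elapsed wall-clock time."""
    threads = [
        threading.Thread(target=fake_io, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.25)
print(f"4 threads x 0.25 s took {elapsed:.2f} s")
```

Replacing `fake_io` with a pure-Python CPU-bound loop would be the natural counter-experiment; on CPython with the GIL I would not expect a comparable speed-up there.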

To answer your edit: imported modules are all cached by Python. If the module is already in the cache, it won't be run again and the import statement (or function) will return right away. You don't have to implement the sys.modules cache lookup yourself; Python does that for you and won't take the import lock for it, aside from holding the GIL for the sys.modules lookup.
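The caching behavior is easy to observe directly. In this short sketch, `json` is just an arbitrary standard-library module; importing it twice, through two different entry points, yields the very same object that is stored in `sys.modules`.

```python
import importlib
import sys

first = importlib.import_module("json")
second = __import__("json")  # a repeated import is served from the cache

same_object = first is second is sys.modules["json"]
print(same_object)
```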

To answer your second edit: I prefer maintaining simpler code over trying to optimize calls to the libraries I use (in this case, the standard library). The rationale is that the time required to perform a task is usually far greater than the time required to import the module that does it. Furthermore, the time required to maintain this kind of code throughout the project is far higher than the execution time it saves. It all boils down to: "programmer time is more valuable than CPU time".

