The speed depends on the TLS implementation.
Yes, you are correct that TLS can be as fast as a pointer lookup. It can even be faster on systems with a memory management unit.
The pointer lookup needs help from the scheduler, though: on every task switch, the scheduler must update the pointer to the TLS data.
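To make the pointer approach concrete, here is a minimal sketch in C that simulates it in user space. Everything in it is hypothetical and only for illustration: `struct tls_block`, `struct task`, `current_tls`, `switch_to` and `tls_counter` are made-up names, and a real system would typically keep the pointer in a dedicated register or per-CPU slot rather than in a global variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task TLS block; the fields are illustrative only. */
struct tls_block {
    int error_code;
    int counter;
};

/* Hypothetical task control block owned by the scheduler. */
struct task {
    struct tls_block tls;
};

/* The single pointer the scheduler keeps up to date (assumption: a plain
   global stands in for whatever register or slot a real system uses). */
static struct tls_block *current_tls = NULL;

/* Called by the simulated scheduler on every task switch. */
void switch_to(struct task *next)
{
    current_tls = &next->tls;
}

/* A TLS access is one pointer dereference: no lock and no OS call. */
int *tls_counter(void)
{
    assert(current_tls != NULL); /* a task must have been switched in */
    return &current_tls->counter;
}
```

After `switch_to(&a)`, every call to `tls_counter()` refers to the counter of task `a`; after `switch_to(&b)`, the same call refers to the counter of task `b`. The cost of TLS is paid once per task switch instead of once per access.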
Another fast way to implement TLS is via the memory management unit (MMU). Here TLS variables are treated like any other data, except that they are allocated in a special segment. On each task switch, the scheduler maps the correct chunk of memory into the address space of the task.
If the scheduler does not support either of these methods, the compiler/library has to do the following:
- Get the current ThreadId
- Take a semaphore
- Look up the pointer to the TLS block by the ThreadId (e.g. using a map)
- Release the semaphore
- Return that pointer
Obviously, doing all this for each TLS data access takes a while and may need up to three OS calls: getting the ThreadId, taking the semaphore, and releasing it.
The semaphore is required to make sure that no thread reads from the TLS pointer list while another thread is in the middle of spawning a new thread (and thus allocating a new TLS block and modifying the data structure).
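The slow path described above might look roughly like the following C sketch using POSIX threads. It rests on several assumptions: a `pthread_mutex_t` stands in for the semaphore, a fixed-size array stands in for the map, and the block is allocated lazily on first access rather than when the thread is spawned. The names `slow_tls_get`, `tls_table`, `MAX_THREADS` and `TLS_BLOCK_SIZE` are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS    64   /* hypothetical table capacity */
#define TLS_BLOCK_SIZE 256  /* hypothetical size of one TLS block */

struct tls_entry {
    pthread_t owner;  /* thread that owns this block */
    void *block;      /* that thread's TLS data */
    int used;         /* nonzero once the slot is taken */
};

static struct tls_entry tls_table[MAX_THREADS];
static pthread_mutex_t tls_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the calling thread's TLS block, or NULL if the table is full
   or the allocation fails. Runs on every TLS access in this scheme. */
void *slow_tls_get(void)
{
    pthread_t self = pthread_self();         /* get the current thread id */
    void *block = NULL;
    int free_slot = -1;

    pthread_mutex_lock(&tls_lock);           /* take the lock */
    for (int i = 0; i < MAX_THREADS; i++) {  /* look up by thread id */
        if (tls_table[i].used) {
            if (pthread_equal(tls_table[i].owner, self)) {
                block = tls_table[i].block;
                break;
            }
        } else if (free_slot < 0) {
            free_slot = i;                   /* remember first empty slot */
        }
    }
    if (block == NULL && free_slot >= 0) {   /* first access: allocate */
        block = calloc(1, TLS_BLOCK_SIZE);
        if (block != NULL) {
            tls_table[free_slot].owner = self;
            tls_table[free_slot].block = block;
            tls_table[free_slot].used = 1;
        }
    }
    pthread_mutex_unlock(&tls_lock);         /* release the lock */
    return block;                            /* return the pointer */
}
```

Compared with a single pointer dereference, every access here pays for a thread id query, a lock/unlock pair and a table search, which is where the overhead comes from.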
Unfortunately it's not uncommon to see the slow TLS implementation in practice.