After reading up on this answer and "Linux Kernel Development" by Robert Love and, subsequently, on the clone()
system call, I discovered that processes and threads in Linux are (almost) indistinguishable to the kernel. There are a few tweaks between them (discussed as being "more sharing" or "less sharing" in the quoted SO question), but I do still have some questions yet to be answered.
I recently worked on a program involving a couple of POSIX threads and decided to experiment on this premise. On a process that creates two threads, all threads of course get a unique value returned by pthread_self()
, however, not by getpid()
.
A sample program I created follows:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>
void* threadMethod(void* arg)
{
int intArg = (int) *((int*) arg);
int32_t pid = getpid();
uint64_t pti = pthread_self();
printf("[Thread %d] getpid() = %d
", intArg, pid);
printf("[Thread %d] pthread_self() = %lu
", intArg, pti);
}
int main()
{
pthread_t threads[2];
int thread1 = 1;
if ((pthread_create(&threads[0], NULL, threadMethod, (void*) &thread1))
!= 0)
{
fprintf(stderr, "pthread_create: error
");
exit(EXIT_FAILURE);
}
int thread2 = 2;
if ((pthread_create(&threads[1], NULL, threadMethod, (void*) &thread2))
!= 0)
{
fprintf(stderr, "pthread_create: error
");
exit(EXIT_FAILURE);
}
int32_t pid = getpid();
uint64_t pti = pthread_self();
printf("[Process] getpid() = %d
", pid);
printf("[Process] pthread_self() = %lu
", pti);
if ((pthread_join(threads[0], NULL)) != 0)
{
fprintf(stderr, "Could not join thread 1
");
exit(EXIT_FAILURE);
}
if ((pthread_join(threads[1], NULL)) != 0)
{
fprintf(stderr, "Could not join thread 2
");
exit(EXIT_FAILURE);
}
return 0;
}
(This was compiled [gcc -pthread -o thread_test thread_test.c
] on 64-bit Fedora; due to the 64-bit types used for pthread_t
sourced from <bits/pthreadtypes.h>
, the code will require minor changes to compile on 32-bit editions.)
The output I get is as follows:
[bean@fedora ~]$ ./thread_test
[Process] getpid() = 28549
[Process] pthread_self() = 140050170017568
[Thread 2] getpid() = 28549
[Thread 2] pthread_self() = 140050161620736
[Thread 1] getpid() = 28549
[Thread 1] pthread_self() = 140050170013440
[bean@fedora ~]$
By using scheduler locking in gdb
, I can keep the program and its threads alive so I can capture what top
says, which, just showing processes, is:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28602 bean 20 0 15272 1112 820 R 0.4 0.0 0:00.63 top
2036 bean 20 0 108m 1868 1412 S 0.0 0.0 0:00.11 bash
28547 bean 20 0 231m 16m 7676 S 0.0 0.4 0:01.56 gdb
28549 bean 20 0 22688 340 248 t 0.0 0.0 0:00.26 thread_test
28561 bean 20 0 107m 1712 1356 S 0.0 0.0 0:00.07 bash
And when showing threads, says:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28617 bean 20 0 15272 1116 820 R 47.2 0.0 0:00.08 top
2036 bean 20 0 108m 1868 1412 S 0.0 0.0 0:00.11 bash
28547 bean 20 0 231m 16m 7676 S 0.0 0.4 0:01.56 gdb
28549 bean 20 0 22688 340 248 t 0.0 0.0 0:00.26 thread_test
28552 bean 20 0 22688 340 248 t 0.0 0.0 0:00.00 thread_test
28553 bean 20 0 22688 340 248 t 0.0 0.0 0:00.00 thread_test
28561 bean 20 0 107m 1860 1432 S 0.0 0.0 0:00.08 bash
It seems to be quite clear that programs, or perhaps the kernel, have a distinct way of defining threads in contrast to processes. Each thread has its own PID according to top
- why?
See Question&Answers more detail:
os