chrisb's answer gives you all you need to know, but if you are game for gory details...
But first, the takeaways from the lengthy analysis below in a nutshell:
- For free functions, there is not much difference between cpdef and rolling it out with cdef+def performance-wise. The resulting C code is almost identical.
- For bound methods, the cpdef approach can be slightly faster in the presence of inheritance hierarchies, but nothing to get too excited about.
- Using the cpdef syntax has its advantages, as the resulting code is clearer (at least to me) and shorter.
Free functions:
When we define something silly like:
cpdef do_nothing_cp():
    pass
the following happens:
- a fast C function is created (in this case it has the cryptic name __pyx_f_3foo_do_nothing_cp because my extension is called foo, but you actually only have to look for the f prefix).
- a Python function is also created (called __pyx_pf_3foo_2do_nothing_cp, prefix pf); it does not duplicate the code but calls the fast function somewhere along the way.
- a Python wrapper is created, called __pyx_pw_3foo_3do_nothing_cp (prefix pw).
- a do_nothing_cp method definition is issued; this is what the Python wrapper is needed for, and this is where it is recorded which function should be called when foo.do_nothing_cp is invoked.
You can see it in the produced c-code here:
static PyMethodDef __pyx_methods[] = {
{"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_3do_nothing_cp, METH_NOARGS, 0},
{0, 0, 0, 0}
};
For a cdef function, only the first step happens; for a def function, only steps 2-4.
Now when we load module foo and invoke foo.do_nothing_cp(), the following happens:
- The function pointer bound to the name do_nothing_cp is found, in our case the Python wrapper, i.e. the pw-function.
- The pw-function is called via the function pointer and calls the pf-function (as a plain C call).
- The pf-function calls the fast f-function.
What happens if we call do_nothing_cp inside the Cython module?
def call_do_nothing_cp():
    do_nothing_cp()
Clearly, Cython doesn't need the Python machinery to locate the function in this case: it can use the fast f-function directly via a C function call, bypassing the pw- and pf-functions.
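To make the two call paths concrete, here is a rough pure-Python model of the three layers. It is only a sketch: the names f_do_nothing_cp, pf_do_nothing_cp, pw_do_nothing_cp, method_table, call_from_python and call_from_cython are made up for illustration, and the real layers are C functions, not Python ones.

```python
# Hypothetical model of the layers Cython generates for a cpdef function.
# All names here are illustrative stand-ins for the generated C functions.
calls = []  # records which layers were entered, in order


def f_do_nothing_cp():
    """Stands in for the fast C function (prefix f)."""
    calls.append("f")
    return None


def pf_do_nothing_cp(module_self):
    """Stands in for the Python-level implementation (prefix pf)."""
    calls.append("pf")
    return f_do_nothing_cp()


def pw_do_nothing_cp(module_self):
    """Stands in for the wrapper registered in the method table (prefix pw)."""
    calls.append("pw")
    return pf_do_nothing_cp(module_self)


# Stands in for the PyMethodDef table: a name mapped to the wrapper.
method_table = {"do_nothing_cp": pw_do_nothing_cp}


def call_from_python(name):
    """A call from Python code: look the name up, then go through pw and pf."""
    return method_table[name](None)


def call_from_cython():
    """A call from inside the module: the f-function is called directly."""
    return f_do_nothing_cp()
```

Calling call_from_python("do_nothing_cp") records pw, pf and f in that order, while call_from_cython() records only f.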
What happens if we wrap a cdef function in a def function?
cdef _do_nothing():
    pass
def do_nothing():
    _do_nothing()
Cython does the following:
- a fast _do_nothing function is created, corresponding to the f-function above.
- a pf-function for do_nothing is created, which calls _do_nothing somewhere along the way.
- a Python wrapper, i.e. a pw-function, is created, which wraps the pf-function.
- the functionality is bound to foo.do_nothing via a function pointer to the Python wrapper, i.e. the pw-function.
As you can see, there is not much difference from the cpdef approach.
The cdef functions are just simple C functions, but def and cpdef functions are first-class Python functions, so you could do something like this:
foo.do_nothing=foo.do_nothing_cp
As to performance, we cannot expect much difference here:
>>> import foo
>>> %timeit foo.do_nothing_cp()
51.6 ns ± 0.437 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit foo.do_nothing()
51.8 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
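Outside IPython, a similar comparison can be made with the standard timeit module. The following is only a sketch: do_nothing_a and do_nothing_b are placeholder functions defined here so the snippet runs without the compiled extension, and time_call is a helper name I made up. With the extension built, you would pass foo.do_nothing_cp and foo.do_nothing instead.

```python
import timeit


def do_nothing_a():
    """Placeholder for foo.do_nothing_cp (assumption: the extension is not built)."""
    pass


def do_nothing_b():
    """Placeholder for foo.do_nothing (assumption: the extension is not built)."""
    pass


def time_call(func, repeat=5, number=100_000):
    """Return the best observed time per call in nanoseconds."""
    timer = timeit.Timer(func)
    # repeat() returns one total time (in seconds) per round of `number` calls.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e9


if __name__ == "__main__":
    for func in (do_nothing_a, do_nothing_b):
        print(f"{func.__name__}: {time_call(func):.1f} ns per call")
```

Taking the minimum over several rounds reduces noise from other processes; %timeit reports the mean and standard deviation instead, so the numbers are not directly comparable.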
If we look at the resulting machine code (objdump -d foo.so), we can see that the C compiler has inlined all calls for the cpdef version do_nothing_cp:
0000000000001340 <__pyx_pw_3foo_3do_nothing_cp>:
1340: 48 8b 05 91 1c 20 00 mov 0x201c91(%rip),%rax
1347: 48 83 00 01 addq $0x1,(%rax)
134b: c3 retq
134c: 0f 1f 40 00 nopl 0x0(%rax)
but not for the rolled-out do_nothing (I must confess, I'm a little surprised and don't understand the reasons yet):
0000000000001380 <__pyx_pw_3foo_1do_nothing>:
1380: 53 push %rbx
1381: 48 8b 1d 50 1c 20 00 mov 0x201c50(%rip),%rbx # 202fd8 <_DYNAMIC+0x208>
1388: 48 8b 13 mov (%rbx),%rdx
138b: 48 85 d2 test %rdx,%rdx
138e: 75 0d jne 139d <__pyx_pw_3foo_1do_nothing+0x1d>
1390: 48 8b 43 08 mov 0x8(%rbx),%rax
1394: 48 89 df mov %rbx,%rdi
1397: ff 50 30 callq *0x30(%rax)
139a: 48 8b 13 mov (%rbx),%rdx
139d: 48 83 c2 01 add $0x1,%rdx
13a1: 48 89 d8 mov %rbx,%rax
13a4: 48 89 13 mov %rdx,(%rbx)
13a7: 5b pop %rbx
13a8: c3 retq
13a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
This could explain why the cpdef version is slightly faster, but in any case the difference is nothing compared to the overhead of a Python function call.
Class-methods:
The situation is a little bit more complicated for class methods, because of the possible polymorphism. Let's start out with:
cdef class A:
    cpdef do_nothing_cp(self):
        pass
At first sight, there is not that much difference from the case above:
- A fast, C-only, f-prefixed version of the function is emitted.
- A Python (prefix pf) version is emitted, which calls the f-function.
- A Python wrapper (prefix pw) wraps the pf-version and is used for registration.
- do_nothing_cp is registered as a method of class A via the tp_methods pointer of the PyTypeObject.
As can be seen in the produced c-file:
static PyMethodDef __pyx_methods_3foo_A[] = {
{"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_1A_1do_nothing_cp, METH_NOARGS, 0},
...
{0, 0, 0, 0}
};
....
static PyTypeObject __pyx_type_3foo_A = {
...
__pyx_methods_3foo_A, /*tp_methods*/
...
};
Clearly, the bound version has to take the implicit parameter self as an additional argument, but there is more to it: the f-function performs a function dispatch if it is not called from the corresponding pf-function. This dispatch looks as follows (I keep only the important parts):
static PyObject *__pyx_f_3foo_1A_do_nothing_cp(CYTHON_UNUSED struct __pyx_obj_3foo_A *__pyx_v_self, int __pyx_skip_dispatch) {
  if (unlikely(__pyx_skip_dispatch)) ; // __pyx_skip_dispatch=1 if called from pf-version
  /* Check if overridden in Python */
  else if (look-up if function is overridden in __dict__ of the object) {
    call the overridden function and return its result
  }
  do the work.
}
Why is it needed? Consider the following extension foo:
cdef class A:
    cpdef do_nothing_cp(self):
        pass
cdef class B(A):
    cpdef call_do_nothing(self):
        self.do_nothing_cp()
What happens when we call B().call_do_nothing()?
- B-pw-call_do_nothing is located and called,
- which calls B-pf-call_do_nothing,
- which calls B-f-call_do_nothing,
- which calls A-f-do_nothing_cp, bypassing the pw- and pf-versions.
What happens when we add the following class C, which overrides the do_nothing_cp function?
import foo
class C(foo.B):
    def do_nothing_cp(self):
        print("I do something!")
Now calling C().call_do_nothing() leads to:
- call_do_nothing of the C class being located and called, which means pw-call_do_nothing of the B class being located and called,
- which calls B-pf-call_do_nothing,
- which calls B-f-call_do_nothing,
- which calls A-f-do_nothing_cp (as we already know!), bypassing the pw- and pf-versions.
And now, in the 4th step, we need to dispatch the call in A-f-do_nothing_cp() in order to get the right C.do_nothing_cp() call! Luckily, we have this dispatch in the function at hand!
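The whole scenario can be imitated in plain Python. This is a simplified model, not what Cython emits: the method name f_do_nothing_cp and the skip_dispatch keyword are stand-ins I chose for the generated C function and its __pyx_skip_dispatch flag, and the real override check is done in C and differs in detail.

```python
class A:
    def f_do_nothing_cp(self, skip_dispatch):
        """Models A's f-function, the C-level entry point."""
        if not skip_dispatch:
            # Called from C-level code: check whether a Python subclass
            # has overridden the method, and if so, hand over to it.
            method = type(self).do_nothing_cp
            if method is not A.do_nothing_cp:
                return method(self)
        # Not overridden (or dispatch skipped): do the actual work.
        return "A does nothing"

    def do_nothing_cp(self):
        """Models the pw/pf pair: Python already picked this method,
        so the f-function is told to skip the dispatch."""
        return self.f_do_nothing_cp(skip_dispatch=True)


class B(A):
    def call_do_nothing(self):
        # Models B's C-level code calling A's f-function directly,
        # bypassing the pw- and pf-versions.
        return self.f_do_nothing_cp(skip_dispatch=False)


class C(B):
    def do_nothing_cp(self):
        """The Python-level override."""
        return "I do something!"
```

In this model, B().call_do_nothing() ends up in A's implementation, while C().call_do_nothing() reaches the override in C even though B calls A's f-function directly.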
To make it more complicated: what if the class C were also a cdef class? Wouldn't the dispatch via __dict__ fail, since cdef classes don't have a __dict__?
For cdef classes, polymorphism is implemented similarly to C++'s "virtual tables", so in B.call_do_nothing() the f-do_nothing_cp function is not called directly but via a pointer, which depends on the class of the object (one can see those "virtual tables" being set up in __pyx_pymod_exec_XXX, e.g. __pyx_vtable_3foo_B.__pyx_base). Thus the __dict__ dispatch in the A-f-do_nothing_cp() function is not needed in the case of a pure cdef hierarchy.
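The virtual-table idea can be sketched in Python as well. Everything below is a hypothetical model: VTable, Obj and the slot names are invented for illustration, and a real vtable is a C struct of function pointers whose derived-class version embeds the base-class one.

```python
class VTable:
    """A table of function slots; a derived table starts as a copy of its base."""

    def __init__(self, base=None, **slots):
        self.slots = dict(base.slots) if base is not None else {}
        self.slots.update(slots)  # new or overriding entries


class Obj:
    """An instance carries a reference to the vtable of its class."""

    def __init__(self, vtab):
        self.vtab = vtab


def a_do_nothing_cp(self):
    return "A.do_nothing_cp"


def c_do_nothing_cp(self):
    return "C.do_nothing_cp"


def b_call_do_nothing(self):
    # The call goes through the object's vtable instead of naming
    # A's function directly, so overrides in cdef subclasses are honoured.
    return self.vtab.slots["do_nothing_cp"](self)


VTABLE_A = VTable(do_nothing_cp=a_do_nothing_cp)
VTABLE_B = VTable(VTABLE_A, call_do_nothing=b_call_do_nothing)
VTABLE_C = VTable(VTABLE_B, do_nothing_cp=c_do_nothing_cp)
```

With Obj(VTABLE_B), b_call_do_nothing reaches A's function; with Obj(VTABLE_C), the same code reaches C's override, and no __dict__ lookup is involved.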
As to performance, comparing cpdef with cdef+def I get:
                  cpdef    def+cdef
A.do_nothing      107ns    108ns
B.call_nothing    109ns    116ns
so the difference isn't that large, with cpdef being, if anything, slightly faster.