Cython+: Extending Cython with Python-like GIL-free Abstractions

TL;DR

  1. Python's official interpreter CPython is throttled by the Global Interpreter Lock
  2. CPython's C API offers a way to plug C/C++ extension modules into Python
  3. Cython makes writing C extensions easier by providing a unified syntax for Python and C
  4. C/C++ is much lower-level than Python, subject to hard errors impossible in Python
  5. We want to extend Cython with higher-level abstractions both GIL-free and thread-safe

1. The Trouble with Python

Python is very popular, but can sometimes be slow, in particular with regard to multithreading.

Python is a very popular programming language, thanks to qualities such as:

  • ease of learning, teaching and use,
  • high productivity,
  • a large and diverse ecosystem.

Due in part to its interpreted nature, Python tends to be much slower and less energy-efficient than compiled languages like C. Python makes up for this with strong support for integrating C and C++ code in performance-critical parts, which is why it is often called a "glue language".

On top of that, CPython (Python's most widely used implementation) is limited by its Global Interpreter Lock (GIL), which restricts the execution of Python bytecode to a single thread at a time.
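
As a rough illustration of this limitation, the sketch below runs the same pure-Python, CPU-bound loop first sequentially and then across several threads. The function names and workload sizes are made up for the example. On a GIL-enabled CPython build the threaded run is typically no faster than the sequential one, although exact timings depend on the machine and the Python version, and free-threaded builds behave differently.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop: the thread needs the GIL to run this bytecode.
    while n > 0:
        n -= 1


def run_sequential(n, workers):
    # Run the workload `workers` times, one after the other.
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    # Run the same workload in `workers` threads at once.
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, workers = 1_000_000, 4
    print(f"sequential: {run_sequential(n, workers):.2f}s")
    print(f"threaded:   {run_threaded(n, workers):.2f}s")
```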

2. Performance with CPython's Python/C API

CPython's Python/C API is the secret behind Python's performant builtins and many libraries.

The established way to overcome Python's performance limitations is to write C extension modules and extension types with CPython's Python/C API. In fact, optimising Python code is often a matter of delegating the work to builtin functions already implemented in a compiled language like C, as this anecdote by Guido van Rossum neatly illustrates. In a way, that's still indirectly relying on the Python/C API, because the Python/C API is also how built-in types and built-in functions are implemented internally in CPython.
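
A minimal sketch of this kind of delegation, assuming a toy task of turning a list of character codes into a string: the first function does the work in a Python-level loop, the second hands it to `str.join`, `map` and `chr`, which are all implemented in C inside CPython. The function names are illustrative and the timings will vary from one machine to another.

```python
import timeit


def join_chars_loop(codes):
    # Every iteration executes Python bytecode in the interpreter loop.
    result = ""
    for code in codes:
        result += chr(code)
    return result


def join_chars_builtin(codes):
    # The iteration happens inside C-implemented builtins.
    return "".join(map(chr, codes))


if __name__ == "__main__":
    codes = [65 + (i % 26) for i in range(10_000)]
    for func in (join_chars_loop, join_chars_builtin):
        seconds = timeit.timeit(lambda: func(codes), number=100)
        print(f"{func.__name__}: {seconds:.3f}s")
```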

Extension modules are what power NumPy, scikit-learn, and many other emblematic Python libraries.

Extension modules are also one of the reasons the CPython implementation is so important to Python: the only alternative interpreter that supports the Python/C API is PyPy. It also seems very hard to remove the GIL from CPython without breaking the Python/C API.

3. A Unified Python/C Syntax with Cython

Cython makes writing C extensions for Python easier.

Cython takes the Python/C API one step further by providing a unified syntax for Python and C.

Cython as-a-language is a superset of Python with support for calling C functions and declaring C types. That is made possible thanks to the Python/C API.

Cython as-a-compiler is an optimising compiler: the C/C++ code it generates bypasses the evaluation loop of the CPython interpreter through direct calls to the Python/C API, and it tries to accelerate Python instructions with the help of compile-time type annotations and type inference. Given sufficient type information, Cython can bypass the Python interpreter entirely in favor of C equivalents.

On top of that, Cython allows defining extension types similarly to Python classes, thanks to an additional type of class declared with the keywords cdef class.

In particular, Cython's compiler takes care of all Python reference counts by inserting Py_INCREF and Py_DECREF instructions as needed.
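
The reference counts that `Py_INCREF` and `Py_DECREF` manipulate can be observed from plain Python with `sys.getrefcount`. The sketch below is not Cython code and only assumes a CPython interpreter: it measures how the count of an object changes when a list takes a reference to it and then drops it. This is the bookkeeping that a hand-written C extension has to get right and that Cython generates automatically.

```python
import sys


def refcount_deltas():
    obj = object()
    baseline = sys.getrefcount(obj)
    holder = [obj]                      # the list takes a new reference (an INCREF in C)
    with_holder = sys.getrefcount(obj)
    holder.clear()                      # the list drops that reference (a DECREF in C)
    after_clear = sys.getrefcount(obj)
    # Report changes relative to the baseline rather than absolute counts,
    # which include temporary references held by the call itself.
    return with_holder - baseline, after_clear - baseline


if __name__ == "__main__":
    print(refcount_deltas())
```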

All this makes writing C extensions and wrapping C libraries much easier. In fact, in the anecdote from the above section, using Cython is one way Guido van Rossum could have written his proposed C extension optimisation.

Cython has partial support for C++ as well and comes with wrappers for some interfaces of the C++ standard library.

Cython suggests a way to bypass the CPython Global Interpreter Lock by using C. Of course, while the GIL is released the Python/C API is strictly off-limits, leaving only plain C or C++.
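
The effect of C code releasing the GIL can be seen from plain Python, without writing any Cython. In the sketch below, `time.sleep` stands in for a C function that releases the GIL while it works; it is only a stand-in, and the helper names are invented for the example. Because the GIL is released during the wait, several threads can block at the same time and the total duration stays close to that of a single call.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep is implemented in C and releases the GIL while waiting.
    time.sleep(seconds)


def timed_threads(workers, seconds):
    # Start `workers` threads that each block for `seconds`, and time the whole batch.
    threads = [threading.Thread(target=blocking_call, args=(seconds,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_threads(workers=4, seconds=0.2)
    print(f"4 threads x 0.2s each took {elapsed:.2f}s in total")
```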

4. The Trouble with C and C++

C and C++ expose the programmer to low-level errors that are nonexistent in Python.

C and C++ are much lower-level languages than Python.

They expose the programmer to hard-to-debug low-level errors that are nonexistent in Python.

Their programming experiences differ a lot from Python's:

  • classes: C doesn't have any, C++'s method resolution is very different from Python's
  • typing: C and C++ are statically typed while Python is dynamically typed
  • undefined behavior: C and C++ admit undefined behavior, Python doesn't
  • error handling: C doesn't have exceptions and tends to use error codes instead
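
To make the point about method resolution concrete, here is a small Python-only sketch with hypothetical classes `A`, `B`, `C` and `D` arranged in a diamond. Python linearises the hierarchy into a single method resolution order, and `super()` follows that order at run time, so `B.who` ends up calling `C.who` when the instance is a `D`. A C++ `B::who` would instead name its base class explicitly (for example `A::who()`), so `C` would never be reached through `B`.

```python
class A:
    def who(self):
        return ["A"]


class B(A):
    def who(self):
        # super() means "next class in the instance's MRO", not necessarily A.
        return ["B"] + super().who()


class C(A):
    def who(self):
        return ["C"] + super().who()


class D(B, C):
    def who(self):
        return ["D"] + super().who()


if __name__ == "__main__":
    print([cls.__name__ for cls in D.__mro__])
    print(D().who())
```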

So while Cython provides a unified syntax, the semantics and experience remain very different, and this becomes very apparent in Cython as soon as the GIL is released.

5. Introducing Cython+

We wish to extend Cython with GIL-free compile-time abstractions that feel more like Python.

Since 2018, the Cython+ project consortium has been working on an experimental extension to the Cython compiler: Cython+.

Concretely, Cython+ already integrates proofs-of-concept for:

  • Python-compatible, reference-counted, GIL-free objects
  • Asynchronous execution using active objects
  • A work-stealing M:N scheduler inspired by Go's
  • Thread-safety through an ownership type system

Our goal is to extend Cython with GIL-free compile-time abstractions while balancing these three objectives:

  • A programming experience close to Python's
  • Performance, especially through GIL-free concurrency
  • Safety, in particular thread-safety

One interesting challenge is coming up with a static typing approach that is consistent with Python's object model, so that objects can have a sort of "dual-citizenship": be both statically-typed and Python objects.

Since GIL-free concurrency is a driving goal, we want to introduce features to facilitate concurrency that are both flexible and intuitive: active objects, promises, asynchronous I/O.
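
For readers unfamiliar with the terms, the sketch below shows the general active object pattern in plain Python. It is not Cython+ syntax and says nothing about how Cython+ implements it: the `ActiveCounter` class and its methods are hypothetical. The object owns its state and a private worker thread, method calls are turned into messages placed in a mailbox, and each call immediately returns a promise (here a `concurrent.futures.Future`) that is resolved once the worker has processed the message.

```python
import queue
import threading
from concurrent.futures import Future


class ActiveCounter:
    """Hypothetical active object: only its own worker thread touches its state."""

    def __init__(self):
        self._value = 0
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Process messages one at a time, in the order they were sent.
        while True:
            message = self._mailbox.get()
            if message is None:  # sentinel posted by stop()
                break
            func, args, promise = message
            try:
                promise.set_result(func(*args))
            except Exception as exc:
                promise.set_exception(exc)

    def _submit(self, func, *args):
        # Creating a Future by hand is enough for a sketch; executors normally do it.
        promise = Future()
        self._mailbox.put((func, args, promise))
        return promise

    def _add(self, amount):
        # Runs only on the worker thread, so no lock is needed around _value.
        self._value += amount
        return self._value

    def add(self, amount):
        """Asynchronous call: returns a promise instead of blocking the caller."""
        return self._submit(self._add, amount)

    def stop(self):
        self._mailbox.put(None)
        self._worker.join()


if __name__ == "__main__":
    counter = ActiveCounter()
    promises = [counter.add(1) for _ in range(100)]
    print(promises[-1].result(timeout=5))
    counter.stop()
```

Since all mutations are serialised through the mailbox, callers on any thread can use the object without locks, at the cost of an indirection on every call.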

Check out our roadmap.