Dual-Citizenship Objects in Cython+

TL;DR

  • Cython+ introduces statically typed GIL-free classes that compile to C++
  • These classes are also extension types, giving our objects "dual-citizenship"
  • This creates a challenge: being consistent with Python's object model

Introducing cypclass

cypclass is a proof-of-concept fully GIL-free class model with reference counting.

Cython+'s experiments revolve around a new kind of class, introduced in our proof-of-concept by the keyword cypclass.

Cypclasses compile to C++ classes under the hood.

Like in Python, memory is managed with a reference count. But in order to be thread-safe without the GIL, our reference count is updated atomically. This gives us a form of GIL-free garbage collection.

Memory allocation is also GIL-free: under the hood, memory is allocated with C++'s new and freed with delete.

Calling a method or accessing an attribute just compiles to the same operation in C++, again GIL-free.

Granting Dual-Citizenship

Each cypclass is also a Python extension type.

Each cypclass object is also a Python object thanks to CPython's extension type mechanism.

Concretly, this means:

  • for each cypclass there is an associated Python type object PyTypeObject
  • each cypclass object has a pointer to its associated Python type object
  • each cypclass object has a Python reference count as defined in PyObject
  • each cypclass method has wrapper method that can be called from Python
  • each cypclass attribute has an associated Python descriptor

All together, this allows cypclass objects to be seen as generic Python objects by the CPython interpreter.*

Inside our Cython+ proof-of-concept, objects start out as statically typed, and can be cast to generic Python objects at any time. Python objects can be cast back to cypclass if they started out as one.

You might notice that each cypclass object has now two reference counts: the atomically updated one, and the one required to be a Python object. More on this later.

Cython already has its own cdef class for extension types, and they also have a sort of partial duality between static typing and Python objects, but critically, allocating memory, freeing memory and reference counting all require the GIL.

Inheritance and Subtyping

Python's Method Resolution Order is quite different from C++'s, yet we want consistent behavior.

In our proof-of-concept, each object is statically typed. To provide some runtime polymorphism, objects can be cast to their base classes in the inheritance tree. Method calls are dispatched at runtime so that an object retains the same behavior even when cast to a base class. Under the hood every method is a C++ virtual method.

All this becomes more complicated when we introduce multiple inheritance.. For one thing, the Python C API generally does not support multiple inheritance in extension types - more on that in the next section. For another, Python and C++ handle multiple inheritance and method resolution very differently.

Focusing for now on the second thing: in C++ when several base classes define methods with the same name, the method call on an object of the derived class is ambiguous and won't compile. In Python, the elegant C3 linearisation algorithm is used to produce a univocal Method Resolution Order (MRO).

Worse, in C++ if we cast the object with the ambiguous method to one of its base classes, the method call will now resolve to the method defined in that base. But if we cast it to another base it will resolve to the method defined in that other base!

To solve this, we have to compute the C3 linearisation ourselves and introduce a method in the derived class that merely unambiguously calls the method of the correct base class according to C3. This way the object always behaves as expected.

In the future we might separate subtyping from inheritance: make inheritance only about code-reuse, and provide runtime polymorphism by a different subtyping mechanism more along the lines of structural subtyping. This could feel closer to Python's duck-typing and increase code-reusability, now free from subtyping constraints. Also it would avoid the overhead of dynamic dispatch when it's not needed.

Exporting Multiple Inheritance to Python-land

How we managed to define CPython extension types with multiple base classes.

In C, a form of single inheritance can be achieved by embedding a "base" struct as the first field of a derived struct, thus giving the second struct the memory layout of the first, plus whatever additional fields it adds after that. This lets pointers to the derived struct be cast to the first one, providing some kind of subtyping.

In fact, Cython does exactly this under the hood for its cdef class extension types.

The Python/C API does not generally support extension types with multiple base classes, because this inheritance-in-C trick obviously doesn't work with multiple bases.

Or does it? The real problem is that the beginning of the memory layout of the derived struct must match the layout of all the bases, and that's impossible as soon as two bases have conflicting layouts. But it is possible when all the bases' layouts can be "stacked" on top of each other, combining all bases into a single layout.

Luckily, CPython's inheritance sanity checks are refined enough to accept extension types with multiple bases as long as the bases do not have layout conflicts. CPython determines this just by knowing that all extension types start with the basic PyObject layout, which extension types are each type's bases, and the size of each extension type's layout.

So we pretended to the CPython interpreter that all our classes have no fields in addition to the basic PyObject(https://docs.python.org/3/c-api/structures.html#c.PyObject) layout. Of course each class is actually free to define attributes and take up more space in memory, but that is nothing we need CPython to be aware of. All we need is for CPython to be aware of the inheritance tree so that its own Method Resolution Order algorithm produces the correct ordering.

Cooperative Dual Reference Counting

How we make two separate reference counts work together.

Our objects have two distinct reference counts: the "atomic" count and the reference count imposed by CPython.

Our design allows an object to be used concurrently with the GIL by the Python interpreter and without the GIL in multiple other threads.

That's because Python is only aware of the Python reference count and uses it in a way that would not be thread-safe without the GIL, so the other threads must necessarily use a separate reference count without the GIL, and it must be updated atomically because there might be several concurrent threads.

When a cypclass object is cast to a generic Python object, we increment the Python reference count. When a Python object is cast back, we increment the "atomic" reference count. Straightforward and easy enough.

The hard part is to find a way to free the memory only when both reference counts drop to zero. Reference counting in Python cannot be customised at all, but extension types can customise what happens when the reference count drops to zero (usually, how the memory should be freed). And we already intervene when a cypclass object is cast to a generic Python object.

So here is the scheme we came up with:

  • When we cast to a Python object and the Python reference count rises from zero to one, we artificially increment the "atomic" reference count.
  • When the Python reference count drops to zero, we just decrement the "atomic" reference count and free the memory only when it also drops to zero.

The artificial increment ensures that the "atomic" reference count never drops to zero as long as the Python reference count doesn't, and thus we just need to free the memory when the "atomic" count drops to zero.

In the future, this dual reference counting design might change to take advantage of the ownership type system we're working on. The type system could allow reference counting to be non-atomic in some conditions, avoiding unnecessary overhead. In which case it could be combined with Python's reference count.