A Modular Actor Framework for Cython+
TL;DR
- We introduce active objects as a way to structure concurrent programs
- We believe in decoupling program structure from parallelism
- We present a proof-of-concept modular design with plug-in implementations
Active Objects for Concurrency
With GIL-freedom comes the responsibility of providing a way to write concurrent programs.
Now that we have GIL-free objects, we introduce active objects inspired by Actalk. They provide a way to call cypclass methods asynchronously.
Active objects are related to the actor paradigm of concurrent programming.
In the actor world, actors represent fundamental units of computation: they can interact with each other only through asynchronous message passing, but can locally act sequentially to change their own private state in response to messages received. Thread-safety arises from each actor handling received messages one at a time.
Actors are usually associated with some kind of queue to store received messages until they are processed. Sending messages is often translated into calling methods asynchronously: the message is the method to be called and the arguments to be passed.
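As a rough illustration of the mailbox idea, here is a minimal sketch in plain Python (not Cython+). The CounterActor class and its send, stop and _add methods are hypothetical names chosen for this example; the sketch only assumes one worker thread per actor and a thread-safe queue as mailbox.

```python
import queue
import threading


class CounterActor:
    """Hypothetical actor: private state, a mailbox, and one worker thread."""

    def __init__(self):
        self._count = 0                    # private state, touched only by the worker
        self._mailbox = queue.Queue()      # stores messages until they are processed
        self._worker = threading.Thread(target=self._run)
        self._worker.start()

    def send(self, method_name, *args):
        # A message is the name of the method to call plus its arguments.
        self._mailbox.put((method_name, args))

    def stop(self):
        # A None sentinel makes the worker exit after all earlier messages.
        self._mailbox.put(None)
        self._worker.join()
        return self._count

    def _run(self):
        # Messages are handled one at a time, so _count needs no lock.
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            method_name, args = message
            getattr(self, method_name)(*args)

    def _add(self, n):
        self._count += n


def send_many(actor, times):
    for _ in range(times):
        actor.send("_add", 1)


counter = CounterActor()
senders = [threading.Thread(target=send_many, args=(counter, 1000)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
total = counter.stop()
```

Four threads send messages concurrently, yet the counter is only ever modified by the actor's own worker, which is where the thread-safety comes from.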
In the original actor model everything is an actor and every interaction is purely asynchronous.
Actors can be combined with promises that represent the future result of an asynchronous computation, allowing the holder of the promise (presumably the caller) to wait until execution completes. This form of blocking synchronisation is foreign to the initial actor model.
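To make the promise idea concrete, the following Python sketch pairs each message with a concurrent.futures.Future from the standard library. The run_mailbox and send helpers are hypothetical names for this example, and creating Future objects by hand is a shortcut taken for brevity.

```python
import queue
import threading
from concurrent.futures import Future


def run_mailbox(mailbox):
    # Process (promise, function, args) messages one at a time
    # until a None sentinel arrives.
    while True:
        message = mailbox.get()
        if message is None:
            break
        promise, func, args = message
        try:
            promise.set_result(func(*args))
        except Exception as exc:
            promise.set_exception(exc)


def send(mailbox, func, *args):
    # Enqueue the call and hand the caller a promise for its result.
    promise = Future()
    mailbox.put((promise, func, args))
    return promise


mailbox = queue.Queue()
worker = threading.Thread(target=run_mailbox, args=(mailbox,))
worker.start()

promise = send(mailbox, pow, 2, 10)
result = promise.result()   # blocks the caller until the worker has run the call

mailbox.put(None)
worker.join()
```

The call to promise.result() is the blocking synchronisation point that the pure actor model does not have.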
We have chosen an approach that combines standard objects and actor-like active objects: ordinary cypclass objects can be "activated" to become active objects which interact only through asynchronous method calls.
Core Design Principles
Concurrency is about structure; parallelism is about execution.
We agree with Go's view that concurrency and parallelism are related but separate things: the first is about the structure of the program, and the second is about its execution. Decoupling these two things promotes writing concurrently-structured programs that can flexibly adapt to take advantage of the computing resources available at execution. We want to provide abstractions that facilitate writing concurrent programs. We believe that active objects are in this regard a good abstraction.
Following this view, we believe that structuring concurrency should be decoupled from defining behavior: the caller should decide when calls are asynchronous (not the function). In other words, we are not adopting the async/await paradigm, because it splits the language into two separate kinds of functions. The designer of an object then only has to be concerned with the behavior of the object; the user decides if the object should be used asynchronously.
Our activable objects follow this direction: the same cypclass implementation can be used directly or asynchronously.
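The principle that the caller, not the class, decides on asynchrony can be approximated in plain Python with a wrapper. This is only a sketch: Greeter and Active are hypothetical names, and the wrapper stands in for what Cython+ generates at compile time.

```python
import queue
import threading


class Greeter:
    """An ordinary class, written with no knowledge of concurrency."""

    def __init__(self):
        self.greeted = []

    def greet(self, name):
        self.greeted.append(name)


class Active:
    """Hypothetical wrapper: turns method calls on `target` into queued messages."""

    def __init__(self, target):
        self._target = target
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def __getattr__(self, name):
        # Only reached for names that the wrapper itself does not define.
        method = getattr(self._target, name)

        def enqueue(*args):
            self._mailbox.put((method, args))

        return enqueue

    def _run(self):
        while True:
            method, args = self._mailbox.get()
            method(*args)
            self._mailbox.task_done()

    def join(self):
        # Block until every queued message has been processed.
        self._mailbox.join()


direct = Greeter()
direct.greet("sync")      # plain synchronous call

shared = Greeter()
active = Active(shared)
active.greet("async")     # same method, now sent as an asynchronous message
active.join()
```

Greeter contains no concurrency-related code at all; only the call site chooses between the two modes.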
In keeping with decoupling structure and execution, we opt for a modular design to allow alternative scheduler implementations to be plugged in. The first advantage of this approach is that it makes experimenting with our own ideas easier. It also means users will be able to tailor implementations to fit their needs in terms of performance and features. And it might even promote a better understanding of how things work under the hood.
The scheduler's job is to dispatch all the actors concurrently executing asynchronous calls onto the available computing resources. Keeping things modular means decoupling it from the programming interfaces available in the language.
The rest of this article is about how we design such an interface. How the scheduler implementation works will be the topic of a separate article.
A Modular Actor Interface
Our current proof-of-concept coordinates five components:
- activable objects: user-defined objects that can be activated
- messages: function objects that represent asynchronous method calls
- queues: mailbox objects associated with an actor to handle incoming messages
- result objects: promises associated with an asynchronous call
- sync objects: modifiers affecting the execution of an asynchronous call
This is what defining an active object might look like for a programmer:
cdef cypclass Hello activable:
    __init__(self):
        self._active_queue_class = consume MyMailboxQueue()
        self._active_result_class = MyResultConstructor

    void hello(self):
        puts("hello")
The _active_queue_class magic attribute is meant to hold a queue object (and not a class, despite the name) into which messages can be inserted.
The consume keyword is part of our proof-of-concept ownership type system for thread-safety, inspired by Pony's consume and related to Rust's ownership concepts. Here it tells the type-checker to make sure this is the only reference to the queue object. More on this type system in another article.
The activable keyword tells Cython+ to insert the two magic attributes, and to generate all the necessary code to turn method calls on the active object into function objects and insert them in the queue object.
We introduce the type qualifier active to designate activated objects. The activated versions of the methods all take an additional argument in first position to provide an optional sync object, or NULL instead. The original idea for the sync object is to provide a way to defer executing the asynchronous call if some condition is not met, but we haven't really used it.
cdef active Hello h
# ...
h.hello(NULL)
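The article does not specify what a sync object looks like, so the following Python sketch is purely an assumption about how deferral could work: each message carries an optional sync object with a hypothetical is_ready method, and a single-threaded drain loop (standing in for a scheduler) re-queues messages whose sync object is not ready yet.

```python
from collections import deque


class CountdownSync:
    """Hypothetical sync object: reports ready only after `n` earlier polls."""

    def __init__(self, n):
        self.remaining = n

    def is_ready(self):
        if self.remaining == 0:
            return True
        self.remaining -= 1
        return False


def drain(mailbox):
    # Single-threaded stand-in for a scheduler processing one mailbox.
    while mailbox:
        sync, func, args = mailbox.popleft()
        if sync is not None and not sync.is_ready():
            mailbox.append((sync, func, args))   # defer: retry after other messages
            continue
        func(*args)


log = []
mailbox = deque()
mailbox.append((CountdownSync(1), log.append, ("deferred",)))
mailbox.append((None, log.append, ("immediate",)))   # None plays the role of NULL
drain(mailbox)
```

The message sent first runs last, because its sync object was not ready the first time it was polled.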
The _active_result_class attribute is meant to hold a constructor function to create result objects that will be associated with each asynchronous call, if the method returns a value.
This part is still a bit clunky because it means all methods of the same cypclass need to use the same generic type of result object, instead of one specialized for the actual return type of each method, and that involves ugly uses of void *. But implementing a classic actor model without any promises was sufficient for our experiments, so we haven't used this interface much so far. It is definitely slated for improvement.
Towards a Better Active Object Protocol
In the future, we contemplate reworking this programming interface with inspiration from Project Verona's concurrent ownership concept.
The idea is to think of an active object as a concurrent access manager that encapsulates an underlying object and offers only one way to access it: asynchronously, through requests to schedule work on it. The particularity is that instead of limiting possible requests to asynchronous method calls based on the methods defined by the underlying object, we can schedule arbitrary work in the form of asynchronous blocks of code: a kind of specialised lambda function.
It could look something like this (all following snippets are just mockups):
actor = Actor(point)
when actor as p:
    # this block executed asynchronously
    if p.x or p.y:
        p.rotate(30)
The Actor class here would only need to define an entry point for scheduling work, maybe something like:
cdef cypclass Actor:
    __when__(self, callable):
        # put the callable in a queue, execute it later, or now, or never
The compiler would transparently handle turning the asynchronous block into a function object and passing it to the actor's __when__ method.
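Plain Python has no "when" statement, but the protocol can be approximated with a decorator that plays the role of the compiler. In this sketch the when decorator, the Point class with its swap method, and the join helper are all hypothetical; the block receives the wrapped object as its argument, which is one possible reading of the mockup.

```python
import queue
import threading


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def swap(self):
        self.x, self.y = self.y, self.x


class Actor:
    """Hypothetical access manager: the wrapped object is reached only via __when__."""

    def __init__(self, obj):
        self._obj = obj
        self._work = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def __when__(self, block):
        # Single entry point: queue the block so it runs later on the worker.
        self._work.put(block)

    def _run(self):
        while True:
            block = self._work.get()
            block(self._obj)
            self._work.task_done()

    def join(self):
        # Block until all queued blocks have run (helper for this example only).
        self._work.join()


def when(actor):
    # Decorator standing in for the `when actor as p:` syntax.
    def schedule(block):
        actor.__when__(block)
        return block
    return schedule


point = Point(3, 0)
actor = Actor(point)


@when(actor)
def _(p):
    # this block is executed asynchronously on the actor's worker thread
    if p.x or p.y:
        p.swap()


actor.join()
```

Everything the scheduler needs to know about is the __when__ method; the Point class is left untouched.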
Such a design would have multiple advantages over the current one, such as
- Reducing the interface with the scheduler to a single point: a class with a __when__ method
- Removing the need for the activable keyword: any object can be encapsulated
It opens up better ways to handle promises and sync objects:
Promises could be created explicitly as needed, in a way actually more akin to channels:
promise = Promise[Point]()
when actor as p:
    # this block executed asynchronously
    if p.x or p.y:
        p.rotate(30)
    promise.put(p)
# ...
# now wait until the result is available
point = promise.get()
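A channel-like promise with put and get can be sketched in a few lines of Python on top of threading.Event. This is only one possible implementation, assuming a single put per promise; the plain thread below stands in for the asynchronous block.

```python
import threading


class Promise:
    """Hypothetical single-use promise with a channel-like put/get interface."""

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def put(self, value):
        # Store the value first, then wake up any waiting getter.
        self._value = value
        self._ready.set()

    def get(self):
        # Block until put() has been called, then return the value.
        self._ready.wait()
        return self._value


promise = Promise()


def work():
    # stands in for the asynchronous block
    promise.put(sum(range(10)))


worker = threading.Thread(target=work)
worker.start()
value = promise.get()   # now wait until the result is available
worker.join()
```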
As for sync objects, they could simply use the same protocol as active objects to apply arbitrary modifiers to the execution of an asynchronous block:
cdef cypclass RandomlyConditionalActor:
    Actor actor

    __init__(self, Actor actor):
        self.actor = actor

    __when__(self, callable):
        when self.actor as x:
            # evaluate at the last moment whether
            # the asynchronous block should actually be run
            if heads_or_tails():
                callable()

when RandomlyConditionalActor(actor) as x:
    # this block maybe executed asynchronously based on a future coin flip
    puts("heads!")
The point of such sync objects is to encapsulate factorisable behaviors such as synchronisation with other actors, e.g., notifying another actor when the execution completes. Manually notifying another actor every time an asynchronous block is executed would rapidly become very hard to maintain.
This way, sync objects are just another kind of active object with a specific behavior.
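The notification use case mentioned above can be sketched in Python as a wrapper that implements the same __when__ entry point. Both classes are hypothetical, and InlineActor runs blocks immediately instead of scheduling them, purely to keep the example deterministic.

```python
class InlineActor:
    """Hypothetical stand-in for a real actor: runs each block immediately."""

    def __init__(self, obj):
        self._obj = obj

    def __when__(self, block):
        block(self._obj)


class NotifyingActor:
    """Hypothetical sync object: wraps an actor and notifies a listener after each block."""

    def __init__(self, actor, listener):
        self._actor = actor
        self._listener = listener

    def __when__(self, block):
        def block_then_notify(obj):
            block(obj)
            # Notification is itself just work scheduled on another actor.
            self._listener.__when__(lambda log: log.append("done"))

        self._actor.__when__(block_then_notify)


events = []
data = []
listener = InlineActor(events)
actor = NotifyingActor(InlineActor(data), listener)

actor.__when__(lambda items: items.append(1))
actor.__when__(lambda items: items.append(2))
```

The notification logic is written once in the wrapper, so the blocks passed by callers stay free of it.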
In fact, promises could implement this active object protocol as well, so as to allow scheduling work to be done after the promise is fulfilled:
promise = Promise[Point]()
when actor as p:
    # this block executed asynchronously
    if p.x or p.y:
        p.rotate(30)
    promise.put(p)
# ...
# don't wait, just schedule the next thing to do
when promise as point:
    point.translate(10, 20)
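A promise that accepts scheduled work could be sketched in Python as follows. The design is an assumption: blocks registered before fulfilment are stored and run by put, while blocks registered afterwards run immediately. A real implementation would presumably hand the blocks to a scheduler rather than run them on the calling thread.

```python
import threading


class Promise:
    """Hypothetical promise that also implements the __when__ entry point."""

    def __init__(self):
        self._lock = threading.Lock()
        self._fulfilled = False
        self._value = None
        self._pending = []

    def put(self, value):
        with self._lock:
            self._value = value
            self._fulfilled = True
            pending, self._pending = self._pending, []
        # Run the stored blocks outside the lock, in registration order.
        for block in pending:
            block(value)

    def __when__(self, block):
        with self._lock:
            if not self._fulfilled:
                self._pending.append(block)   # run later, when put() is called
                return
        block(self._value)                    # already fulfilled: run right away


seen = []
promise = Promise()
promise.__when__(lambda v: seen.append(("early", v)))   # registered before fulfilment
promise.put(42)
promise.__when__(lambda v: seen.append(("late", v)))    # registered after fulfilment
```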