fedora's inter-app messaging infrastructure

Fedora is huge. It has a lot of live apps and services that work in tandem to make the project work. How does one manage their communication? Where this might be needed is say when a code is pushed, the news can be passed of to the CI servers/QA servers to begin their testing, the static analyzers to static scan the code and report the vulnerabilities, the build server(Koji) to build it.

This can be of immense value in pushing code faster and more confidently. This, without having to manually wait for the previous step to complete. The automatic pipeline can change things.

Fedora solves this problem with a unified message bus, which can be used to pass messages between applications, computer systems even if the projects use different languages and reside on different platforms. This is implemented in the project fedmsg.

Fedmsg is a unified message bus where all the projects can dump their messages and interested parties can listen for them and react accordingly. To be specific, there are topics that can be subscribed to. Example: pkgdb could push messages to topic org.fedoraproject.service.pkgdb whenever an account’s email address changes.

The fedmsg has a broker-less architecture internally, but let’s invoke the Principle of Data Abstraction and treat it as a black box that gives us a stream of messages from all of fedora’s apps.

Fedmsg bus

Then there is datagrepper. It is a Python app that can be used to query the fedmsg bus with HTTP GET requests. One can pass in filters, limits, topics etc to retrieve the desired messages. It is a very powerful service and has been used very effectively by the fedora community.

There are various command line tools that are build on top of datagrepper and here is a short summary of them:

fedora-stats-tools
- this is a command line tool that queries datagrepper and produces SVG charts
- accepts various parameters to filter down to only the required data
thisweekinfedora
- a command line tool also that produces reports on the weekly activities of the fedora community
- gives in-depth details like top contributors, badges earned in the last week etc
fedstats-gsoc/gsoc-stats
- most feature laden of all the plugins
- accepts a lot of options to filter based on user, start/end dates, category etc and generates neat SVG graphics from the data

Datagrepper

These tools make good use of datagrepper, but one also wonders if they over-use datagrepper in some ways.

Datagrepper is just an HTTP endpoint service and has no facilities to persist data.
All the tools mentioned above have their own logic to create create graphics, reports etc.
They function more or less as single run tools that don’t allow dashboards to be built on top of them. This is because they have no facilities for caching their results.

Technically, this is not entirely true, releng-dash is a dashboard build on top of datagrepper but to render the pages, it has to make a lot of requests to datagrepper. This makes it slow and inefficient (not to mention harsh on datagrepper)
All tools rely on datagrepper for all their data and would not work if it is down.
Any change in the datagrepper API would break all the tools

To tackle all these problems, the community built statscache. Just like datagrepper, it too sits on the fedmsg bus and listens to all the messages.

Where it differs from datagrepper however is in the plugin framework it provides. When it receives any message, it passes it to the plugins. The plugins are managed by the statscache daemon, they just implement the message filtering and persisting configuration. Yes, configuration, not logic because that is handled by third party tools like SqlAlchemy.

Statscache shines in this respect. Some of it’s features:

extensibility - it supports plugins that can subscribe to certain “topics” on the fedmsg bus
fault tolerance - if statscache goes down, it falls back on querying datagrepper for the lost data and passing it to the plugins, so no data is ever lost
rest api - it also has a flask front end that can be queried for the data which each plugin receives. It is similar to datagrepper
ordering - statscache guarantees that the messages it sends to plugins are in order.

Statscache

Here, the plugins are a part of statscache and can reap the benefits that statscache provides; data persistence(caching), data ordering, fault tolerance, extensible architecture.

This is where I come in (hopefully!). I applied for the Google Summer of Code project that aims to refactor and extend the statscache ecosystem. It would be my duty to go from datagrepper to statscache. Can’t wait to begin!