There are now several open source federated learning (FL) platforms, including pySyft/pyGrid, NVIDIA Clara, OpenFL, Flower and FATE. Bitfount enables more than just machine learning use cases, but it’s still important to understand why we decided to build another platform supporting FL and what is so different about Bitfount. In this blog post, I’ll explain a bit about our aims with Bitfount, and how they led to some key architectural differences from other FL frameworks on the market.
Bitfount envisions a world with seamless, but secure, data collaboration. To get there, we first prioritised ease of use:
- Setup of a first analysis on remote data needs to be possible in under five minutes.
- Information Security (InfoSec) teams need to be able to easily assess that the platform fits their policies.
- Information Governance (IG) teams need to be able to easily assess that the platform meets their requirements.
The first consequence of this focus was a significant change in the orchestration model. Instead of the standard client-server architecture that all of the frameworks listed above use, we made the decision to use a messaging architecture.
To integrate easily with IG workflows, we also built governance deeply into the platform. This second distinction means that access control policies can be managed through a web interface and are then picked up by the data processing components, which enforce who can do what on which datasets.
Before describing our architecture it is helpful to settle on some terminology. There are two fundamental services that generally run in any federated data science platform*.
- The Processor of Data (in Bitfount terminology, a Pod): This is the service that runs the computation on local data and sends relevant statistics back. Other platforms call this a variety of names including Worker, Client, Node and Satellite.
- The Modelling Service: This is the service that collects statistics from different Pods and, where needed, applies decryption or aggregation. Other platforms often call this the Server, or sometimes, Domain.
*Note: There are some FL algorithms where the Pods take turns in acting as the Modelling Service.
The client-server approach
The classic design for federated learning (FL) and analytics use cases is that the Pod and Modelling Service maintain a direct internet connection. Standard FL operates via the Pod connecting to the Modelling Service to get the most recent iteration of a machine learning model, computing gradients using its local data and then sending those gradients (not the data itself) back to the Modelling Service, which aggregates the gradients from all Pods to give an updated model. Federated analytics works similarly, with statistics from different Pods being aggregated centrally in some way.
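To make the round structure described above concrete, here is a minimal sketch in plain Python. It is an illustration only, not Bitfount's or any other framework's implementation: the linear model, the function names (`local_gradient`, `aggregate`, `federated_round`) and the toy datasets are all assumptions chosen for brevity, and the aggregation is a simple unweighted average.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features, target)


def local_gradient(weights: List[float], data: List[Example]) -> List[float]:
    """Pod step: gradient of mean squared error for a linear model on local data."""
    grad = [0.0] * len(weights)
    for features, target in data:
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):
            grad[i] += 2 * error * x / len(data)
    return grad


def aggregate(gradients: List[List[float]]) -> List[float]:
    """Modelling Service step: unweighted average of the Pods' gradients."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]


def federated_round(weights: List[float], pods: List[List[Example]], lr: float) -> List[float]:
    """One round: each Pod computes a gradient locally, only gradients are shared."""
    gradients = [local_gradient(weights, data) for data in pods]
    averaged = aggregate(gradients)
    return [w - lr * g for w, g in zip(weights, averaged)]


# Two hypothetical Pods, each holding private samples of y = 2x.
pods = [
    [((1.0,), 2.0), ((2.0,), 4.0)],
    [((3.0,), 6.0)],
]
weights = [0.0]
for _ in range(200):
    weights = federated_round(weights, pods, lr=0.05)
```

Note that the raw examples never leave `local_gradient`; only the gradient vectors cross the boundary between the Pods and the aggregation step.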
A client-server architecture to achieve this can be set up with either the Modelling Service or the Pod acting as the server. When the Modelling Service acts as the server, Pods initiate the connection, request the latest model, calculate gradients and send them back. When the Pod acts as the server, the Modelling Service initiates the connection, sends the latest model and expects back the relevant gradient updates.
This architecture leads to several challenges. Most importantly, the service acting as the server must be accessible from the other service’s network. In practice, this means setting it up on the public internet, which is very difficult to get IT teams to approve.
The messaging approach
To alleviate the IT concerns and make initial setup as simple as possible, Bitfount utilises an alternative, messaging-based, architecture. This means that both the Pod and Modelling Service can sit behind firewalls and use only outgoing Internet connections. Infrastructure setup for getting started is minimal, and users can get going in minutes.
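The following sketch shows the general shape of a messaging-based exchange. It is a toy, in-memory stand-in under stated assumptions: the `MessageBroker` class, the mailbox names and the message fields are hypothetical and do not describe Bitfount's actual message service or wire format. The point it illustrates is that both parties only ever call out to the broker (`publish` and `poll`), so neither needs to accept incoming connections.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class MessageBroker:
    """Hypothetical hosted message service with one mailbox per participant."""

    def __init__(self) -> None:
        self._mailboxes: Dict[str, Deque[dict]] = defaultdict(deque)

    def publish(self, recipient: str, message: dict) -> None:
        # Outbound call made by the sender.
        self._mailboxes[recipient].append(message)

    def poll(self, recipient: str) -> Optional[dict]:
        # Outbound call made by the recipient; returns None if nothing is waiting.
        mailbox = self._mailboxes[recipient]
        return mailbox.popleft() if mailbox else None


def pod_step(broker: MessageBroker, pod_id: str, local_values: List[float]) -> bool:
    """Pod polls for a task, computes a statistic locally and publishes the reply."""
    task = broker.poll(pod_id)
    if task is None:
        return False
    result = {
        "sender": pod_id,
        "mean": sum(local_values) / len(local_values),
        "count": len(local_values),
    }
    broker.publish(task["reply_to"], result)
    return True


broker = MessageBroker()
# The Modelling Service sends a task to the Pod's mailbox.
broker.publish("pod-a", {"task": "mean", "reply_to": "modeller-1"})
# The Pod picks it up, runs it on local data and replies via the broker.
handled = pod_step(broker, "pod-a", [1.0, 2.0, 3.0])
# The Modelling Service collects the reply from its own mailbox.
reply = broker.poll("modeller-1")
```

In a real deployment the broker would be a network service and the messages would be encrypted, but the direction of the connections is the same as in this sketch.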
This architecture has the potential drawback of needing additional network transfers. In practice, the resulting latency can be mitigated, and the trade-off is worthwhile for the additional ease of use and the stability of relying on standard infrastructure. In our case, any additional network overhead is incurred by Bitfount, as opposed to being borne by the data custodian or data scientist.
In many practical situations of data collaboration, the technical concerns are less significant than concerns about information governance. To streamline this process, Bitfount includes an access management system that is deeply integrated into the entire platform. Whenever a Pod receives a request for data usage, it checks with this access manager to ensure that the user has the required access rights.
The result of all this is that data custodians can manage access to their data, depending on how they’d like the data to be used. We’ve written about this approach in more detail in our usage-based access control policies blog post.
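As a rough illustration of the check a Pod performs before running a request, consider the sketch below. Everything in it is assumed for the sake of the example: the `AccessManager` and `Pod` classes, the `(user, dataset)` grant structure and the usage names `"mean"` and `"count"` are hypothetical and are not Bitfount's API or policy model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessManager:
    """Hypothetical policy store: maps (user, dataset) to permitted usage types."""

    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def grant(self, user: str, dataset: str, usage: str) -> None:
        self.grants.setdefault((user, dataset), set()).add(usage)

    def is_allowed(self, user: str, dataset: str, usage: str) -> bool:
        return usage in self.grants.get((user, dataset), set())


@dataclass
class Pod:
    dataset: str
    rows: List[float]
    access_manager: AccessManager

    def handle_request(self, user: str, usage: str) -> float:
        # Ask the access manager before touching the data at all.
        if not self.access_manager.is_allowed(user, self.dataset, usage):
            raise PermissionError(f"{user} may not run '{usage}' on {self.dataset}")
        if usage == "count":
            return float(len(self.rows))
        if usage == "mean":
            return sum(self.rows) / len(self.rows)
        raise ValueError(f"unknown usage type: {usage}")


manager = AccessManager()
# The data custodian allows alice to compute means only.
manager.grant("alice", "clinic-data", "mean")
pod = Pod("clinic-data", [4.0, 6.0, 8.0], manager)
alice_mean = pod.handle_request("alice", "mean")
```

Here a request from `alice` for `"count"`, or any request from a user with no grant, raises `PermissionError` before any data is read, which is the behaviour the access check is meant to guarantee.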
To facilitate ease of setup, we host an access manager that can be used if the data custodian trusts Bitfount to coordinate the access controls. Some data custodians may not be comfortable with that level of trust, so we’ve also designed the access manager so that it can be run on the data custodian’s own infrastructure. In that configuration, data custodians and data scientists don’t need to trust Bitfount at all, while still being able to execute secure, end-to-end encrypted, federated analyses.
There are several other architectural differences in our platform relating to extensibility and security, which we will post more about as they come to fruition. For now, we’re very excited to already be testing out our platform with some early beta users. Please do visit us at bitfount.com to join our waitlist or book a demo.