Federated Learning and the Future of Edge Computing

Federated learning, which trains models on distributed data, is a privacy-first approach to machine learning. As machine learning techniques are applied to domains as diverse as game-playing, image recognition, and predictive text, access to troves of data has become an increasingly valuable component in developing high-value algorithms. Data is often likened to the new oil, a nod to the ways in which information has become a commodity. The value of data, much like oil, depends on how much of it you have. The deep learning methods that define the state of the art across a number of domains require enormous numbers of training examples: AlexNet, the canonical image recognition model, was trained on roughly 1.2 million labeled images drawn from ImageNet, a dataset that contains over 14 million images in total. GPT-2, a text generation model released by OpenAI, was trained on 8 million web pages, amounting to over 40GB of text from the internet. In machine learning, the quality of a model is largely dependent on the quality and quantity of the data it was trained on.

While the comparison of data to oil is apt with respect to value, data, unlike oil, is not a finite resource. The earth has a finite reserve of oil, but we produce data at astronomical rates. By one measure, we generate approximately 2.5 quintillion bytes of data every day. Data is generated at every juncture of our connected lives, from the articles we read to the movies we watch, and companies use this data to fuel the algorithms meant to keep our attention. Attention means more engagement and more potential ad revenue. To this point, commercial efforts have been wildly successful. Some of the most valuable companies in the world, including Facebook, Amazon, and Google, now recognize machine learning as a core component of their business.

While companies have taken to aggregating user data, they have also fallen prey to hackers. Data breaches hurt consumer trust and ultimately tarnish the brand of the company that was breached. The centralization of data has created targets with a myriad of attack vectors, and no system is unhackable. While data is an indispensable part of many modern software-oriented businesses, there needs to be a better way to perform necessary data operations without centralizing the data itself. Murphy’s law holds that anything that can go wrong will go wrong, often at the least opportune time, so data infrastructure needs to be designed to reduce the surface area that is exposed to malicious actors. The decentralization of data not only protects users from having their data stolen, it is also a proactive step towards respecting user privacy and building more robust and secure systems.

Decentralized Data Infrastructure

The infrastructure built around data ultimately determines how secure that data is. If data is collected and stored across a distributed database cluster, the integrity of that data rests on the promise that no individual machine in the cluster can be compromised. The problem with this model, which is replicated across the software industry, is that there is no guarantee that any given system is safe from malicious activity. There are steps that can be taken to reduce risk, such as banning internet connections, disallowing the use of external media drives, and physically isolating machines, but these measures make most systems useless or tremendously difficult to use.

A more effective approach to building data infrastructure starts with the nature of the data. One of the most fundamental questions in this regard is: where is the data coming from? In cases where the data comes from edge devices, edge computing provides a fresh perspective on security. Edge computing refers to data processing that takes place on devices in close proximity to the users that produce that data. Mobile phones are a prime example of edge devices that produce troves of sensitive individualized data. Be it Facebook collecting data while you use their mobile application or your choice of health app keeping track of your workouts, mobile phones are a treasure chest of personal data. Applications tend to collect data in order to create a more personalized experience for users: if Amazon is going to be in the business of recommending you things to buy (which it is), it’s more convenient for you if those recommendations are relevant to your preferences. The same is true across services. Netflix’s recommendation engine is only as good as its ability to keep you on the platform. If the data operations that make services as convenient as we’ve been conditioned to expect could be performed on the phone itself, there would be no need to centralize user data.

Federated Learning

Federated learning is a privacy-first machine learning technique that utilizes edge computing in the process of training models on distributed data. In traditional training loops for deep neural networks, data typically resides in a central store. Training deep learning models requires quick access to training data to perform the proper gradient operations necessary to optimize the model’s parameters.

In the most common setup, input data is fed into a model for the forward pass to compute predictions, a loss is computed by comparing those predictions with the targets, a backward pass computes the gradient of that loss with respect to the model’s parameters, and the parameters are then updated. This loop updates the model’s parameters without regard to where the input data is sourced from. Federated learning algorithms don’t depend on data existing at a specific location. Instead, the data itself never leaves edge devices.
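The centralized loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production training loop: it assumes a linear regression model, a mean squared error loss, and synthetic data, none of which come from the original discussion. The function name `train_centralized` and its parameters are hypothetical.

```python
import numpy as np

def train_centralized(X, y, lr=0.1, epochs=200):
    """Full-batch gradient descent for linear regression on centrally stored data."""
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(epochs):
        preds = X @ w                        # forward pass: compute predictions
        error = preds - y
        losses.append(np.mean(error ** 2))   # loss: mean squared error
        grad = 2.0 * X.T @ error / len(y)    # backward pass: gradient of the loss w.r.t. w
        w = w - lr * grad                    # parameter update
    return w, losses

# Synthetic, noise-free data standing in for a central data store (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w, losses = train_centralized(X, y)
print(w)
```

Note that every step of the loop reads `X` and `y` directly, which is exactly the assumption that federated learning removes.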

Each pass of the training loop in a federated learning system assumes a decentralized data source. Specifically, instead of gradient computations and updates happening on a centralized server, these computations happen on edge devices and updates are communicated back to a central server for aggregation. The general training loop in the federated setting has two primary components: client devices at the edge of the network and scalable computing infrastructure for gradient aggregation.

In simplified form, Google’s federated averaging algorithm works as follows. At some iteration i of the training loop, the global model W is sent to a random subset of j edge devices. If this is the first iteration (i.e., i = 0), the model sent to the j devices is pre-trained or randomly initialized. Each edge device has k training examples that are local to that device (if the edge device is a mobile phone, these data points would be the data accessible on the phone). Each edge device computes a gradient g with respect to the global model W it received from the central server, using the k data points that exist on-device. Once an edge device has computed its update, it sends g to the central server. After receiving the updates g from all j edge devices that were included in this iteration, the central server computes a new global model by aggregating them. This process is repeated until convergence or until some other specified training criterion is met. In the full algorithm, each device may take several local gradient steps and send back its updated weights rather than a single gradient; the single-gradient version described here is the simplest case.
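One round of the loop described above might look like the following sketch. It simulates the edge devices as in-memory arrays, uses a linear model with a mean squared error loss, and weights each device's gradient by its number of local examples k when aggregating. All of these are assumptions made for illustration, and the names `local_gradient` and `federated_round` are hypothetical.

```python
import numpy as np

def local_gradient(w, X, y):
    """Runs on the device: gradient of the mean squared error on local data only."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def federated_round(w, devices, j, lr, rng):
    """One iteration: sample j devices, collect their gradients, aggregate on the server."""
    chosen = rng.choice(len(devices), size=j, replace=False)
    grads, counts = [], []
    for idx in chosen:
        X, y = devices[idx]                   # raw data stays on the device
        grads.append(local_gradient(w, X, y)) # only the gradient g is sent back
        counts.append(len(y))                 # k: number of local examples
    # Server-side aggregation, weighted by k (an assumption of this sketch).
    g = np.average(grads, axis=0, weights=counts)
    return w - lr * g

# Simulated devices holding noise-free synthetic data (assumption).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
devices = []
for _ in range(20):
    k = int(rng.integers(10, 40))
    X = rng.normal(size=(k, 3))
    devices.append((X, X @ true_w))

w = np.zeros(3)                               # i = 0: freshly initialized global model
for i in range(300):
    w = federated_round(w, devices, j=5, lr=0.1, rng=rng)
print(w)
```

The server code never touches `X` or `y` outside of the simulated device call; in a real deployment `local_gradient` would execute on the phone and only its return value would cross the network.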

Conclusions

Federated learning is a step towards building scalable, decentralized machine learning models that present less risk than the centralized systems that underlie many production-grade machine learning deployments today. Google recently released work on the use of federated learning to improve Gboard, the Google keyboard. While it is possible to develop performant algorithms using federated learning, there are a number of challenges associated with breaking from the common centralized paradigm. For starters, the Google team notes that stochastic gradient descent (SGD), a commonly used optimization algorithm in machine learning, typically depends on training data being readily accessible in a centralized location in the cloud. The development of alternative optimization algorithms, such as Google’s federated averaging algorithm, highlights how it is possible to obviate the need for data centralization. Research shows that the federated averaging algorithm uses 10-100x less communication than a naively federated implementation of SGD.
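The communication savings come from doing more work on the device between exchanges with the server. The sketch below illustrates that idea under stated assumptions: each simulated client takes `local_steps` gradient steps on a linear model before returning its updated weights, and the server averages those weights in proportion to each client's example count. The function names, the model, and the synthetic data are all hypothetical and are not taken from Google's implementation.

```python
import numpy as np

def local_update(w, X, y, lr, local_steps):
    """Runs on the device: several gradient steps on local data, starting from the global model."""
    w = w.copy()
    for _ in range(local_steps):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

def fedavg_round(w, clients, lr, local_steps):
    """Server side: average the clients' updated weights, weighted by example count."""
    updated = [local_update(w, X, y, lr, local_steps) for X, y in clients]
    counts = [len(y) for _, y in clients]
    return np.average(updated, axis=0, weights=counts)

# Simulated clients holding noise-free synthetic data (assumption).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(30, 3))
    clients.append((X, X @ true_w))

w = np.zeros(3)
for _ in range(100):                          # 100 communication rounds
    w = fedavg_round(w, clients, lr=0.05, local_steps=5)
print(w)
```

With `local_steps=1`, averaging the updated weights is equivalent to applying one step of the averaged gradient, so the single-gradient scheme is a special case. Larger values of `local_steps` trade extra on-device computation for fewer round trips to the server.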

While federated learning is relatively niche, it is a promising research area given the socio-political nature of our most sensitive modern data platforms. As long as sensitive data is centralized, the convenience gained from services built around data-fueled algorithms sits uneasily with modern expectations around security, because centralization carries security risks that can be reduced but never fully eliminated. As more applications of machine learning turn to the federated approach, privacy and functionality could finally exist in harmony.

