Federated Learning Glossary: Key Terms and Concepts

Federated learning is a rapidly growing subfield of machine learning that enables decentralized model training across multiple data sources without moving or aggregating the data centrally. This approach has gained significant attention in sectors like healthcare, finance, and edge computing due to its privacy-preserving nature and scalability. Below is a glossary of key terms and concepts associated with federated learning to help practitioners, researchers, and enthusiasts navigate the space.

Core Concepts

Federated Learning (FL)
A machine learning technique that allows multiple decentralized clients (e.g., mobile devices, hospitals, banks) to collaboratively train a shared model while keeping all the training data local.

Client
An individual participant in a federated learning system that performs local training on its own dataset and contributes updates to the global model.

Server (Aggregator)
A central coordinating entity that collects model updates from clients, aggregates them, and redistributes the updated global model.

Round
A complete cycle of training and communication in federated learning, typically involving client selection, local model training, aggregation, and global model update.
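
The control flow of one round can be sketched in a few lines of Python. The helper names below (select_clients, client.local_train, aggregate) are hypothetical placeholders rather than any particular framework's API:

    def run_round(global_weights, available_clients):
        # 1. Client selection (see Client Sampling below).
        selected = select_clients(available_clients)

        # 2. Local training: each client starts from the current global
        #    weights and returns its locally trained weights.
        updates = [client.local_train(global_weights) for client in selected]

        # 3. Aggregation and global model update (see FedAvg below).
        return aggregate(updates)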

Model Update
The parameters (such as gradients or weights) computed locally by each client and shared with the server for aggregation.

Privacy and Security

Differential Privacy (DP)
A formal privacy guarantee, typically achieved by adding calibrated statistical noise to model updates, that bounds what an adversary can infer about any individual data point in a client's local dataset.
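
A minimal sketch of the common clip-and-noise (Gaussian mechanism) approach, with placeholder parameter values; the actual privacy guarantee depends on careful accounting over clip_norm, noise_multiplier, and the number of rounds:

    import numpy as np

    def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1):
        # Bound the update's L2 norm so no single client dominates...
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        # ...then add Gaussian noise calibrated to that bound.
        noise = np.random.normal(0.0, noise_multiplier * clip_norm,
                                 size=update.shape)
        return clipped + noise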

Secure Aggregation
A cryptographic protocol that enables the server to compute the sum of model updates from clients without learning any individual update.
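
The core idea can be illustrated with a two-client toy example in which additive masks cancel in the sum; production protocols layer pairwise key agreement and dropout recovery on top of this:

    import numpy as np

    rng = np.random.default_rng(42)
    x_a = np.array([1.0, 2.0])       # client A's true update
    x_b = np.array([3.0, 4.0])       # client B's true update

    # A pairwise random mask the two clients agree on (e.g., via a shared seed).
    r = rng.normal(size=2)

    masked_a = x_a + r               # looks like noise to the server
    masked_b = x_b - r

    # The server sums the masked updates; the masks cancel exactly.
    print(masked_a + masked_b)       # equals x_a + x_b = [4.0, 6.0]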

Homomorphic Encryption
An encryption method that allows computations to be performed on encrypted data, producing an encrypted result that, when decrypted, matches the result of operations performed on the plaintext.
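
For example, the Paillier cryptosystem supports addition directly on ciphertexts. A minimal sketch assuming the third-party phe library (pip install phe) is available:

    from phe import paillier

    public_key, private_key = paillier.generate_paillier_keypair()

    enc_a = public_key.encrypt(3.5)      # e.g., one client's value
    enc_b = public_key.encrypt(1.5)      # another client's value

    enc_sum = enc_a + enc_b              # addition performed on ciphertexts
    print(private_key.decrypt(enc_sum))  # 5.0, without decrypting a or b alone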

Trusted Execution Environment (TEE)
A secure hardware-based enclave where sensitive computations can be performed, providing an added layer of privacy for client-side or server-side operations.

Anonymization
The process of removing personally identifiable information from data or model updates to protect user privacy.

Algorithms and Optimization

Federated Averaging (FedAvg)
The foundational federated learning algorithm, in which each selected client trains the model locally and the server updates the global model with an average of the clients' model weights, typically weighted by each client's local dataset size.
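
A minimal sketch, assuming each client's weights are flattened into a single NumPy array and weighting by local dataset size as in the original FedAvg formulation:

    import numpy as np

    def fedavg(client_weights, client_sizes):
        """Weighted average of client weight vectors."""
        total = float(sum(client_sizes))
        return sum(w * (n / total)
                   for w, n in zip(client_weights, client_sizes))

For instance, fedavg([w1, w2], [100, 300]) gives the second client's weights three times the influence of the first's.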

FedProx
An extension of FedAvg that adds a proximal term to each client's local objective, penalizing deviation from the current global model, to better handle statistical and system heterogeneity across clients.
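
In symbols, client k minimizes F_k(w) + (mu / 2) * ||w - w_global||^2. A minimal sketch of that penalty, with mu as a tunable placeholder:

    import numpy as np

    def fedprox_objective(task_loss, w_local, w_global, mu=0.01):
        # The proximal term penalizes drifting far from the global model.
        prox = 0.5 * mu * np.sum((w_local - w_global) ** 2)
        return task_loss + prox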

Personalized Federated Learning
Approaches aimed at creating models that are customized to each client’s specific data distribution, rather than a single global model.
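
One simple personalization strategy is local fine-tuning of the shared global model. A sketch using a hypothetical client.grad helper that returns the gradient on the client's own data:

    def personalize(global_weights, client, steps=5, lr=0.01):
        w = global_weights.copy()
        for _ in range(steps):
            w = w - lr * client.grad(w)   # a few local gradient steps
        return w                          # client-specific model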

Gradient Clipping
A method that bounds the norm (or magnitude) of gradients during training, which can prevent model divergence and, by limiting any single example's influence, reduce the risk of leaking private data.
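
A minimal L2-norm clipping sketch:

    import numpy as np

    def clip_by_norm(grad, max_norm=1.0):
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)   # rescale, preserving direction
        return grad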

Model Compression
Techniques like quantization or pruning used to reduce the size of model updates, making federated learning more efficient for resource-constrained devices.
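
For example, uniform 8-bit quantization shrinks a float32 update roughly fourfold on the wire. A minimal sketch:

    import numpy as np

    def quantize_int8(update):
        scale = np.abs(update).max() / 127.0
        if scale == 0.0:
            scale = 1.0                       # all-zero update
        q = np.round(update / scale).astype(np.int8)
        return q, scale                       # send int8 values plus one float

    def dequantize(q, scale):
        return q.astype(np.float32) * scale   # server-side reconstruction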

Communication and Systems Design

Communication Overhead
The bandwidth and latency costs associated with sending model updates between clients and the server during each training round.
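
A back-of-the-envelope calculation makes the cost concrete; the model size and counts below are illustrative assumptions:

    model_mb = 10.0           # size of one model/update in MB (assumed)
    clients_per_round = 100
    rounds = 1000

    # Each round: the model goes down to every client, an update comes back up.
    total_mb = model_mb * clients_per_round * rounds * 2
    print(total_mb / 1e6, "TB transferred over training")   # 2.0 TB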

Bandwidth Constraints
Limitations in data transmission capacity, especially relevant in mobile or edge-based federated learning setups.

Client Dropout
The phenomenon where clients selected for training fail to complete or participate in a given round due to connectivity or computation issues.

Device Heterogeneity
Variations in hardware capabilities, such as memory, processing power, or network reliability, across the participating client devices.

Straggler Effect
A performance bottleneck caused by slow or delayed clients that hold up synchronization during training.

Data Considerations

Non-IID Data
Data that is not independently and identically distributed across clients, which is a common and challenging characteristic in federated learning scenarios.
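
A common way to simulate label-distribution skew in federated learning experiments is Dirichlet partitioning, where a smaller concentration parameter alpha produces more skew:

    import numpy as np

    rng = np.random.default_rng(0)
    num_clients, num_classes, alpha = 5, 10, 0.1

    # Each row is one client's label distribution; with alpha = 0.1, most of
    # a client's probability mass lands on just a few classes.
    class_shares = rng.dirichlet([alpha] * num_classes, size=num_clients)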

Data Silos
Isolated data repositories maintained by separate institutions or individuals, often due to legal, regulatory, or organizational constraints.

Data Distribution Skew
Significant differences in the type, amount, or statistical properties of data across clients.

Settings and Evaluation

Cross-Device Federated Learning
A federated learning setting that involves numerous low-power, low-resource devices (e.g., smartphones, IoT devices), often with intermittent connectivity.

Cross-Silo Federated Learning
A setting involving a smaller number of reliable clients (e.g., hospitals, banks) with large datasets and stable infrastructure.

Client Sampling
The process of selecting a subset of clients to participate in each training round to reduce communication cost and accommodate system constraints.
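
The simplest scheme is uniform random sampling each round. A minimal sketch:

    import random

    def sample_clients(client_ids, fraction=0.1, seed=None):
        rng = random.Random(seed)
        k = max(1, int(fraction * len(client_ids)))
        return rng.sample(client_ids, k)   # this round's participants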

Fairness in Federated Learning
Ensuring that the resulting global model performs well across diverse clients, especially those with limited data or underrepresented populations.

Conclusion

Understanding federated learning requires familiarity with a broad range of technical, privacy, and system-related concepts. As adoption grows in privacy-sensitive domains, mastering these terms will be essential for engineers, data scientists, policymakers, and product developers working in distributed AI environments.