Introduction: The Scaling Challenge for Large Clusters
As Kubernetes clusters expand to accommodate tens of thousands of nodes, controllers that monitor high-cardinality resources—like Pods—often encounter a critical performance bottleneck. In a traditional setup, each replica of a horizontally scaled controller receives the entire event stream from the API server. This means every replica must deserialize, process, and then discard objects it doesn't manage, leading to wasted CPU, memory, and network resources. Scaling out the controller doesn't reduce per-replica overhead; it actually increases the total load because each new replica duplicates the same full stream.
Introducing Server-Side Sharded List and Watch
Kubernetes v1.36 introduces an alpha feature (KEP-5866) called server-side sharded list and watch. With this feature enabled, the API server filters events at the source, ensuring each controller replica receives only the slice of the resource collection it is responsible for. This fundamentally changes how controllers handle large-scale workloads, moving the filtering logic from the client side into the API server itself.
Why Client-Side Sharding Falls Short
Some controllers, such as kube-state-metrics, already support horizontal sharding. In this model, each replica is assigned a portion of the keyspace and discards objects that don't belong to it (a minimal sketch of this pattern follows the list below). While this approach works functionally, it does nothing to reduce the volume of data flowing from the API server. The core issues include:
- N replicas × full event stream: Every replica deserializes and processes every event, then throws away what it doesn't need.
- Network bandwidth scales with replicas, not shard size: Each replica still receives the entire stream, so network usage grows linearly with the number of replicas.
- Wasted CPU deserialization: The CPU cycles spent on deserialization are entirely wasted for the discarded fraction of events.
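To make the waste concrete, here is a minimal sketch of the client-side pattern in Go (the modulo scheme and helper name are illustrative, not kube-state-metrics' actual implementation). Note that the replica has already paid the deserialization cost before this filter can drop anything:

import (
    "hash/fnv"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ownedByThisReplica implements client-side sharding: every replica still
// receives and deserializes every object, then keeps only its own share.
func ownedByThisReplica(obj metav1.Object, replica, totalReplicas uint64) bool {
    h := fnv.New64a()
    h.Write([]byte(obj.GetUID()))
    return h.Sum64()%totalReplicas == replica
}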
Server-side sharding solves these problems by moving the filtering upstream into the API server. Each replica tells the API server which hash range it owns, and the API server only sends matching events.
How Server-Side Sharded List and Watch Works
Specifying a Shard with shardSelector
The feature adds a shardSelector field to ListOptions. Clients specify a hash range using the shardRange() function. For example:
shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')
The API server computes a deterministic 64-bit FNV-1a hash of the specified field and returns only objects whose hash falls within the range [start, end). This applies to both list responses and watch event streams.
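As a rough sketch of that membership check in Go (assuming the hash input is the raw string value of the selected field; the exact input encoding is defined by the KEP):

import "hash/fnv"

// inShardRange reports whether the 64-bit FNV-1a hash of value falls in
// the half-open interval [start, end), mirroring the server-side filter.
func inShardRange(value string, start, end uint64) bool {
    h := fnv.New64a()
    h.Write([]byte(value))
    sum := h.Sum64()
    return start <= sum && sum < end
}

With the example range above, roughly half of all objects pass the check, since FNV-1a output is close to uniformly distributed.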
Deterministic Hashing Across API Servers
The hash function produces the same result across all API server instances, making the feature safe to use in multi-replica API server deployments. Currently supported field paths are object.metadata.uid and object.metadata.namespace.
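For example, a controller that partitions work per namespace rather than per object could use:

shardRange(object.metadata.namespace, '0x0000000000000000', '0x8000000000000000')

Because the hash is deterministic, every object in a given namespace lands in the same shard, so a namespace is never split across replicas.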
Using Sharded Watches in Controllers
Controllers typically use informers to list and watch resources. To shard the workload, each replica injects the shardSelector into the ListOptions used by its informers via WithTweakListOptions. Below is a Go code example demonstrating how to set this up:
import (
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
)

// newShardedInformerFactory builds an informer factory whose list and watch
// requests carry this replica's shard selector, so the API server streams
// only the objects whose hash falls within the replica's range.
func newShardedInformerFactory(client kubernetes.Interface, resyncPeriod time.Duration) informers.SharedInformerFactory {
    shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"
    return informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
        informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
            opts.ShardSelector = shardSelector // field added by KEP-5866
        }),
    )
}
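Once constructed, the factory behaves like any other; the shard selector is applied transparently to every informer it creates. For example (ctx here is the controller's context):

factory.Core().V1().Pods().Informer() // registers a sharded Pod informer
factory.Start(ctx.Done())
factory.WaitForCacheSync(ctx.Done())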
For a 2-replica deployment, the selectors split the 64-bit hash space in half. Replica 0, for example, takes the lower half:
"shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"
Replica 1 would then cover the upper half to ensure complete coverage.
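Assuming the same half-open [start, end) semantics, Replica 1's selector would look like:

"shardRange(object.metadata.uid, '0x8000000000000000', '0xFFFFFFFFFFFFFFFF')"

How the single maximum hash value 0xFFFFFFFFFFFFFFFF itself is covered under the half-open range is a detail to confirm against KEP-5866.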
Benefits and Considerations
Server-side sharded list and watch offers several advantages:
- Reduced data volume: Each replica receives only the events it needs, cutting network and memory usage dramatically in large clusters (see the rough numbers after this list).
- Linear scalability: Adding replicas no longer multiplies total load on the API server, because each event is delivered only to the replica whose shard owns it.
- Lower CPU overhead: No wasted deserialization of irrelevant objects, freeing CPU for actual processing.
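To put rough numbers on the first point: in a cluster with 100,000 Pods and 10 controller replicas, client-side sharding streams every Pod to all 10 replicas, i.e. 1,000,000 object deliveries per full relist, while server-side sharding delivers each Pod to exactly one replica, i.e. 100,000 in total.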
However, as an alpha feature it is disabled by default (like all alpha features, it must be enabled through its feature gate on the API server), is subject to change, and should be tested thoroughly before production use. Also note that sharding is most effective for controllers whose objects distribute evenly across hash ranges, such as those monitoring Pods or other high-cardinality resources.
Conclusion
Server-side sharded list and watch represents a significant step forward in scaling Kubernetes controllers. By moving event filtering into the API server, it eliminates the inefficiencies of client-side sharding and enables truly horizontal scalability without wasted resources. As the feature matures, it is likely to become a standard tool for operators managing large clusters. For more details, see KEP-5866.