Modelplane: Open-Source AI Inference Control Plane

Summary
– Upbound released Modelplane, an open-source control plane that manages fleet-wide coordination for AI inference across clouds, neoclouds, and on-premise systems, with v0.1.0 under an Apache 2.0 license.
– Modelplane builds on Crossplane, running as a control plane on its own cluster to reconcile a fleet toward a declared state, handling provisioning, scheduling, scaling, and traffic routing.
– The software divides work into two roles: platform teams define the GPU fleet and hardware classes, while developers deploy models via declarative manifests to receive a single OpenAI-compatible endpoint.
– The gateway applies cost, compliance, and sovereignty policies, supporting regulated and sovereign enterprises that run inference on their own governed infrastructure.
– Modelplane is neutral about the serving engine, allowing any container-based engine and deployment topology through one API, and is available for free on GitHub.
Organizations that run open-weight AI models on their own hardware typically manage GPU fleets scattered across cloud providers, neoclouds, and on-premise data centers. Each of these fleets requires careful handling of model placement, replica scaling, infrastructure provisioning, weight distribution, and traffic routing. Until now, teams have built this coordination layer manually, one operator at a time.
Upbound, the company behind the Crossplane project, has released Modelplane, an open-source control plane designed to manage fleet-wide coordination for AI inference. The software installs directly into a user’s own environment and orchestrates models, the serving stack, and the underlying infrastructure. It operates across cloud, neocloud, and on-premise systems, scaling from a single GPU to multi-node deployments. The first public version, v0.1.0, is released under the Apache 2.0 license.
Built on Crossplane
Modelplane is built on Crossplane, a Cloud Native Computing Foundation (CNCF) graduated project used by organizations such as Apple, Nike, SAP, IBM, and Akamai to run internal platforms. Modelplane functions as a control plane on its own cluster, positioned above the inference clusters that serve models. The system continuously reconciles the fleet toward a state declared by the operator, provisioning clusters, scheduling deployments onto compatible clusters, scaling replicas, caching weights, and routing traffic through a single gateway.
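The reconciliation idea described above can be illustrated with a minimal sketch. This is not Modelplane's or Crossplane's code: the `Deployment` class, its fields, and the `reconcile` function are hypothetical names invented here, and the sketch covers only replica counts, not provisioning, weight caching, or routing.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical record, not a Modelplane resource type.
    name: str
    desired_replicas: int  # state declared by the operator
    actual_replicas: int   # state currently observed


def reconcile(deployments: list[Deployment]) -> list[str]:
    """Run one reconciliation pass and return the actions it took."""
    actions = []
    for d in deployments:
        diff = d.desired_replicas - d.actual_replicas
        if diff > 0:
            actions.append(f"scale up {d.name} by {diff}")
        elif diff < 0:
            actions.append(f"scale down {d.name} by {-diff}")
        # Converge the observed state on the declared state.
        d.actual_replicas = d.desired_replicas
    return actions


fleet = [
    Deployment("model-a", desired_replicas=3, actual_replicas=1),
    Deployment("model-b", desired_replicas=1, actual_replicas=2),
    Deployment("model-c", desired_replicas=2, actual_replicas=2),
]
first_pass = reconcile(fleet)   # acts on model-a and model-b
second_pass = reconcile(fleet)  # nothing left to do once converged
```

A second pass over an already converged fleet produces no actions. That property is what lets a control plane run the same loop continuously.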
“Kubernetes became the standard control plane for compute. Crossplane extended that model to cloud infrastructure,” said Bassam Tabbara, CEO and founder of Upbound. He added that AI inference “needs the same layer.”
Two roles and one API
The software divides work into two distinct roles. Platform teams create resources that define the GPU fleet and the hardware classes available on it, all fronted by an inference gateway. Developers declare a model, its engine, and a replica count, and receive a single OpenAI-compatible endpoint. This endpoint supports weighted canary and A/B rollouts across the replicas it selects.
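A weighted canary rollout of the kind mentioned above can be sketched as a weighted random choice between variants. The function name `pick_variant`, the variant labels, and the 90/10 split are assumptions made for illustration. The article does not describe how Modelplane's gateway implements weighting.

```python
import random


def pick_variant(weights: dict[str, int], rng: random.Random) -> str:
    """Pick one variant with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: 90% of requests to the stable model, 10% to the canary.
rng = random.Random(0)  # seeded so the demonstration is repeatable
weights = {"stable": 90, "canary": 10}
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_variant(weights, rng)] += 1
```

Over many requests the share of traffic each variant receives approaches its weight, so shifting a rollout only requires changing the weights.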
A developer deploys a model using a declarative manifest that specifies the model name, the serving engine image, and the GPU memory a node must provide. Modelplane then schedules the replica onto a cluster with free, compatible GPUs and composes the necessary resources between the two roles on the operator’s behalf. The system remains neutral about the serving engine, so one API can serve any container-based engine and any deployment topology. Parallelism, quantization, and KV transfer are controlled through the engine flags the developer writes.
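The placement step can be sketched as a filter over clusters followed by a selection rule. Everything here is hypothetical: the `Cluster` and `ModelRequest` classes, their field names, the image reference, and the best-fit rule are assumptions for illustration, not Modelplane's manifest schema or scheduler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    name: str
    gpu_memory_gb: int  # memory per GPU on this cluster's nodes
    free_gpus: int


@dataclass
class ModelRequest:
    # Mirrors the three manifest fields the article names; names are invented.
    model: str
    engine_image: str
    min_gpu_memory_gb: int


def schedule(req: ModelRequest, clusters: list[Cluster]) -> Optional[str]:
    """Place one replica on a cluster with a free, large-enough GPU."""
    candidates = [
        c for c in clusters
        if c.free_gpus > 0 and c.gpu_memory_gb >= req.min_gpu_memory_gb
    ]
    if not candidates:
        return None
    # Assumed rule: best fit, so larger GPUs stay free for larger models.
    chosen = min(candidates, key=lambda c: c.gpu_memory_gb)
    chosen.free_gpus -= 1
    return chosen.name


clusters = [
    Cluster("onprem-a", gpu_memory_gb=24, free_gpus=4),
    Cluster("cloud-b", gpu_memory_gb=80, free_gpus=1),
    Cluster("neo-c", gpu_memory_gb=48, free_gpus=0),
]
req = ModelRequest("example-model", "example.invalid/engine:latest", 40)
first = schedule(req, clusters)   # only cloud-b is both free and large enough
second = schedule(req, clusters)  # cloud-b is now full, so no placement
```

Returning `None` when nothing fits keeps the decision explicit. A real control plane could instead queue the request or provision more capacity.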
Security and policy controls
The gateway routes inference requests and applies policies related to cost, compliance, and sovereignty, with a fallback option to managed providers. This control point is especially important for regulated and sovereign enterprises, where inference must run inside infrastructure a company governs directly for security, sovereignty, or compliance reasons. The repository ships with a published security policy, a code of conduct, and contribution guidelines.
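A gateway policy of this kind can be sketched as a filter on candidate backends with an optional fallback. The `Backend` fields, the `route` function, and the specific rules (region allow-list, cost cap, fallback that keeps the region rule but ignores the cost cap) are all assumptions for illustration. The article does not specify how Modelplane expresses or evaluates policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    region: str
    cost_per_1k_tokens: float
    self_hosted: bool  # False means a managed provider


def route(
    backends: list[Backend],
    allowed_regions: set[str],
    max_cost: float,
    allow_managed_fallback: bool,
) -> Optional[str]:
    """Return the cheapest backend that satisfies the policy, if any."""
    in_region = [b for b in backends if b.region in allowed_regions]
    eligible = [
        b for b in in_region
        if b.self_hosted and b.cost_per_1k_tokens <= max_cost
    ]
    if eligible:
        return min(eligible, key=lambda b: b.cost_per_1k_tokens).name
    if allow_managed_fallback:
        # Assumed rule: the fallback keeps the region (sovereignty)
        # constraint but is exempt from the cost cap.
        managed = [b for b in in_region if not b.self_hosted]
        if managed:
            return min(managed, key=lambda b: b.cost_per_1k_tokens).name
    return None


backends = [
    Backend("onprem-eu", "eu", 0.40, self_hosted=True),
    Backend("cloud-us", "us", 0.20, self_hosted=True),
    Backend("managed-eu", "eu", 0.90, self_hosted=False),
]
within_budget = route(backends, {"eu"}, 0.50, allow_managed_fallback=False)
fell_back = route(backends, {"eu"}, 0.30, allow_managed_fallback=True)
rejected = route(backends, {"eu"}, 0.30, allow_managed_fallback=False)
```

In this sketch the cheaper US backend is never chosen for an EU-only policy, which is the kind of guarantee a sovereignty rule is meant to provide.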
Upbound identifies three primary user groups for the project: neoclouds and AI factories building managed inference services on their own hardware, regulated and sovereign enterprises, and AI-native companies with large inference spend that are transitioning to open-weight models.
Modelplane is available for free on GitHub.
(Source: Help Net Security)