THNKBIG Partners with Run:ai
THNKBIG Team
Engineering Insights
THNKBIG has partnered with Run:ai, the leading AI compute orchestration platform for Kubernetes. This partnership extends our AI infrastructure practice with a best-in-class GPU management layer for organizations running large-scale AI and machine learning workloads.
What Is Run:ai?
Run:ai is a GPU orchestration platform that sits on top of Kubernetes and adds AI-specific workload management capabilities. It extends the Kubernetes scheduler with AI compute awareness: fractional GPU allocation, workload queuing and preemption, dynamic resource allocation across teams, and GPU utilization reporting at a granularity that standard Kubernetes metrics can't provide.
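For context, vanilla Kubernetes exposes GPUs as an opaque extended resource that can only be requested in whole units; that limitation is exactly what fractional allocation addresses. A baseline pod spec looks like this (pod and image names are illustrative):

```yaml
# Plain Kubernetes GPU request: nvidia.com/gpu is an extended
# resource and must be a whole integer, so even a small inference
# container occupies an entire GPU.
apiVersion: v1
kind: Pod
metadata:
  name: inference-baseline          # illustrative name
spec:
  containers:
    - name: model-server
      image: registry.example.com/model-server:latest   # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1         # whole units only; "0.5" is rejected
```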
NVIDIA acquired Run:ai in 2024, cementing its position as an enterprise standard for GPU orchestration on Kubernetes. Organizations running NVIDIA H100 and A100 infrastructure at scale increasingly standardize on Run:ai to raise cluster utilization and get more return from expensive GPU hardware.
Why Run:ai Changes GPU Economics
- Fractional GPU sharing — allocate 0.25, 0.5, or 0.75 of a GPU to containers, enabling multiple inference workloads on a single expensive GPU without MIG reconfiguration (see the sketch after this list)
- Workload preemption — high-priority training jobs can preempt lower-priority background workloads automatically, maximizing utilization without manual intervention
- Fair-share scheduling — assign GPU quotas to teams or projects; a team can borrow idle capacity beyond its quota from teams not using theirs, and gives it back when the owners need it, increasing overall utilization
- Utilization dashboards — real-time and historical GPU utilization, queue depth, and cost-per-workload reporting for FinOps teams
- Multi-cluster federation — manage GPU pools across multiple Kubernetes clusters, cloud accounts, and regions from a single control plane
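As a concrete illustration of fractional sharing, Run:ai has documented a pattern where a pod requests part of a GPU through an annotation and hands placement to the Run:ai scheduler. Treat the annotation key and scheduler name below as assumptions to verify against your installed Run:ai version:

```yaml
# Sketch: a half-GPU inference pod under the Run:ai scheduler.
# The gpu-fraction annotation and the runai-scheduler name follow
# Run:ai's published pattern but vary by version; verify against
# your deployment before relying on them.
apiVersion: v1
kind: Pod
metadata:
  name: half-gpu-inference          # illustrative name
  annotations:
    gpu-fraction: "0.5"             # request half of one GPU's memory
spec:
  schedulerName: runai-scheduler    # delegate placement to Run:ai
  containers:
    - name: model-server
      image: registry.example.com/model-server:latest   # illustrative image
```

The net effect is two such pods sharing one physical GPU concurrently, with Run:ai enforcing the memory split between them.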
Run:ai Use Cases
Research and Training Environments
Research teams frequently have bursty, unpredictable GPU demand. A team may need 20 GPUs for a training run, then nothing for three days. Run:ai's fair-share scheduling and workload queuing ensure experiments get resources as soon as they are available, while guaranteeing that no team can monopolize the cluster indefinitely.
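Run:ai implements its queuing and fair-share logic above the cluster, but the Kubernetes primitive underneath is ordinary priority and preemption. A minimal sketch of that layer, with illustrative class names, shows how a deadline-driven run can displace exploratory work:

```yaml
# Two priority tiers; the scheduler evicts lower-priority pods
# when a higher-priority pod cannot otherwise be placed.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: training-high               # illustrative name
value: 100000                       # higher value wins contention
preemptionPolicy: PreemptLowerPriority
description: Deadline-driven training runs
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: research-batch              # illustrative name
value: 1000
description: Exploratory experiments that may be evicted under pressure
```

A job opts in by setting priorityClassName in its pod spec; Run:ai's quota-aware queue decides which waiting workload receives reclaimed capacity first.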
Production Inference Serving
Production inference requires predictable, low-latency GPU access. Run:ai provides guaranteed GPU quotas for production serving endpoints — ensuring that training batch jobs can't starve inference pods during peak periods. Combined with Kubernetes HPA and KEDA, Run:ai enables true autoscaling inference infrastructure that right-sizes GPU allocation to request volume.
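One way to sketch the autoscaling half of that picture is a KEDA ScaledObject that scales an inference Deployment on request throughput. The Deployment name, Prometheus address, and metric query below are placeholders, not part of any Run:ai API:

```yaml
# Sketch: scale the model-server Deployment on request rate.
# Deployment name, Prometheus address, and query are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: model-server-scaler
spec:
  scaleTargetRef:
    name: model-server              # inference Deployment to scale
  minReplicaCount: 1                # keep one warm replica for latency
  maxReplicaCount: 8                # cap GPU spend
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{app="model-server"}[2m]))
        threshold: "100"            # roughly one replica per 100 req/s
```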
What This Partnership Means for THNKBIG Clients
As a Run:ai partner, THNKBIG provides end-to-end AI infrastructure services: Kubernetes GPU cluster design, Run:ai deployment and configuration, MLOps pipeline integration, and ongoing infrastructure optimization. Our team brings hands-on Run:ai implementation experience and can cut deployment time from weeks to days, whether you are starting fresh or migrating from manual GPU management.
Run:ai's Atlas platform provides GPU cluster management, intelligent workload scheduling, and GPU virtualization, including fractional GPU sharing and support for NVIDIA Multi-Instance GPU (MIG), enabling near-100% GPU utilization as cited in The Forrester Wave: AI Infrastructure, Q4 2021.
By combining Run:ai’s GPU orchestration with THNKBIG’s deep Kubernetes operational expertise, organizations can:
- Make shared GPU infrastructure practical and efficient for AI/ML teams.
- Reduce GPU idle time by 60–70% when using Run:ai with Kubernetes.
- Provide data science teams with reliable, on-demand access to pooled GPU resources.
Run:ai’s policy-based scheduling treats GPUs as a shared resource pool instead of dedicated machines, allowing multiple training and inference workloads to safely share a single GPU and run concurrently. This reduces contention between teams and significantly improves utilization.
THNKBIG deploys Run:ai into existing Kubernetes environments, sets up namespace-level GPU quotas aligned to team priorities, and integrates Run:ai's scheduler with tools like MLflow, Kubeflow, or custom training pipelines. We also design monitoring for GPU utilization, job queues, and cost attribution by team and project, making the shared GPU model sustainable in production.
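Namespace-level quotas of the kind described above can be expressed with a standard Kubernetes ResourceQuota; the namespace and limit here are illustrative, and Run:ai's project quotas add borrowing and fair-share on top of this baseline:

```yaml
# Cap GPU requests in the ml-research namespace at 8 GPUs.
# Namespace name and limit are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: ml-research
spec:
  hard:
    requests.nvidia.com/gpu: "8"    # pods beyond this are rejected at admission
```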
For US enterprises building or scaling AI infrastructure, the THNKBIG–Run:ai partnership delivers cost-effective, highly utilized GPU clusters accessible to every data science team. Learn more about THNKBIG's AI/MLOps practice at /solutions/ai-mlops/, or reach out via /contact/ to discuss your GPU infrastructure needs.