How it Works
GPUSprint is a lightweight visualization layer. The real work happens inside your cluster, where our open-source daemon collects GPU metrics and exports them in completely open formats, so you can build customized dashboards as you see fit.
The Architecture
1. DaemonSet
Our open-source Go agent runs as a DaemonSet on every accelerator node in your Kubernetes cluster. It runs with host-level access so it can read low-level PCI device telemetry directly from NVIDIA, AMD, Tenstorrent, and Cloud TPU hardware.
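To make the deployment model concrete, here is a minimal sketch of what such a DaemonSet manifest could look like. All names, labels, and the image reference are placeholders, not the actual GPUSprint chart; the node-selector label and privileged settings are assumptions about what host-level telemetry access typically requires.

```yaml
# Hypothetical sketch -- names, labels, and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpusprint-agent
  namespace: gpusprint
spec:
  selector:
    matchLabels:
      app: gpusprint-agent
  template:
    metadata:
      labels:
        app: gpusprint-agent
    spec:
      nodeSelector:
        gpusprint.io/accelerator: "true"   # assumed label marking accelerator nodes
      hostPID: true                        # see host processes to attribute device usage
      containers:
        - name: agent
          image: ghcr.io/gpusprint/agent:latest  # placeholder image
          securityContext:
            privileged: true               # low-level PCI/driver access
          volumeMounts:
            - name: dev
              mountPath: /dev
      volumes:
        - name: dev
          hostPath:
            path: /dev
```

A DaemonSet guarantees exactly one agent Pod per matching node, which is why it is the natural primitive for per-node hardware telemetry.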
2. Metrics Export
The daemon formats raw hardware data into standardized OpenTelemetry (OTLP) and Prometheus metrics. It pushes these data points to your existing metrics sink (e.g., Datadog, Prometheus, or GCP Cloud Monitoring).
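As an illustration of the Prometheus side of this step, the sketch below renders a single sample in the Prometheus text exposition format. The function name and label set are hypothetical; this is not the agent's actual code, just a minimal stdlib-only example of the output shape.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatMetric renders one sample in Prometheus text exposition format,
// e.g. name{k1="v1",k2="v2"} 87.4. Labels are sorted for deterministic output.
func formatMetric(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(formatMetric("gpusprint_gpu_utilization_percent", map[string]string{
		"vendor": "NVIDIA",
		"model":  "H100",
		"pod":    "trainer-job-0",
	}, 87.4))
}
```

Because the exposition format is plain text, any Prometheus-compatible scraper or OTLP pipeline can consume it without vendor-specific tooling.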
3. SaaS or Custom Dashboard
Because the data is stored in standard OpenTelemetry and Prometheus formats, you can use the GPUSprint UI or build fully custom dashboards in Grafana, Datadog, or any tool you choose. We correlate Kubernetes Pod metadata with raw hardware metrics out of the box.
Your Metrics. Completely Open.
Every metric GPUSprint collects is exported in industry-standard open formats, so you are never locked in. Pipe the data wherever you want: build your own Grafana boards, run custom PromQL queries, wire it into your alerting system, or feed it into your own ML pipelines.
Prometheus / OpenMetrics example
gpusprint_gpu_utilization_percent{
  vendor="NVIDIA",
  model="H100",
  pod="trainer-job-0",
  namespace="ai-workloads",
  cluster="us-central1-a"
} 87.4
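To show the kind of query access this format enables, here is a PromQL expression over the sample metric above. The metric and label names come from the sample; the specific aggregation is purely illustrative.

```promql
# Average GPU utilization per namespace over the last 5 minutes
avg by (namespace) (
  avg_over_time(gpusprint_gpu_utilization_percent[5m])
)
```

The same expression can back a Grafana panel or a Prometheus alerting rule unchanged.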
Works with your existing stack
Grafana + Prometheus
Full PromQL query access, unlimited custom boards
Datadog
Push via OTLP receiver, use alongside your existing infra metrics
GCP Cloud Monitoring
Native OTLP ingestion with Google-managed retention
Custom / Bring Your Own
Any OTLP-compatible backend works out of the box
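For the OTLP path, an OpenTelemetry Collector pipeline is one common way to fan metrics out to a backend. The sketch below receives OTLP and forwards to Datadog via the collector-contrib Datadog exporter; the endpoint and API-key handling are assumptions, and any other OTLP-compatible exporter slots in the same way.

```yaml
# Minimal OpenTelemetry Collector sketch (assumed setup, not an official config)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [datadog]
```

Swapping the backend is a one-line change in the `exporters` list, which is the practical meaning of "no lock-in" here.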
100% Open-Source Agent
We believe software that runs with privileged, host-level access shouldn't be a black box. Our node agent is written in Go and completely open source. You can review the code, build it yourself, and verify exactly what telemetry leaves your cluster.
Why a separate DaemonSet?
Deep Hardware Support: Official driver exporters typically support only one vendor. Our agent unifies NVIDIA (NVML), AMD (ROCm), Tenstorrent, and TPUs in a single process.
K8s Context: Hardware exporters know nothing about Kubernetes. Our daemon explicitly ties PCI device processes to specific Pods and Namespaces.
Low Overhead: Written in Go and carefully optimized to keep the CPU footprint small on expensive accelerator hosts.