How it Works
GPUSprint is a lightweight visualization layer. The real work happens inside your cluster, where our open-source daemon collects GPU metrics and exports them in completely open formats, so you can build customized dashboards as you see fit.
The Architecture
1. DaemonSet
Our open-source Go agent runs as a DaemonSet on every accelerator node in your Kubernetes cluster. It runs with host-level access so it can read low-level PCI device telemetry directly from NVIDIA, AMD, Tenstorrent, and Cloud TPU hardware.
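To make the deployment model concrete, here is a minimal sketch of what such a DaemonSet manifest could look like. All names, labels, and the image reference are placeholders, not the actual GPUSprint chart; the node-selector label and privileged settings are assumptions about what host-level telemetry access typically requires.

```yaml
# Hypothetical sketch -- names, labels, and image are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpusprint-agent
  namespace: gpusprint
spec:
  selector:
    matchLabels:
      app: gpusprint-agent
  template:
    metadata:
      labels:
        app: gpusprint-agent
    spec:
      nodeSelector:
        gpusprint.io/accelerator: "true"   # assumed label marking accelerator nodes
      hostPID: true                        # see host processes to attribute device usage
      containers:
        - name: agent
          image: ghcr.io/gpusprint/agent:latest  # placeholder image
          securityContext:
            privileged: true               # low-level PCI/driver access
          volumeMounts:
            - name: dev
              mountPath: /dev
      volumes:
        - name: dev
          hostPath:
            path: /dev
```

A DaemonSet guarantees exactly one agent Pod per matching node, which is why it is the natural primitive for per-node hardware telemetry.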
2. Metrics Export
The daemon formats raw hardware data into standardized OpenTelemetry (OTLP) and Prometheus metrics. It pushes these data points to your existing metrics sink (e.g., Datadog, Prometheus, or GCP Cloud Monitoring).
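As an illustration of the Prometheus side of this step, the sketch below renders a single sample in the Prometheus text exposition format. The function name and label set are hypothetical; this is not the agent's actual code, just a minimal stdlib-only example of the output shape.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatMetric renders one sample in Prometheus text exposition format,
// e.g. name{k1="v1",k2="v2"} 87.4. Labels are sorted for deterministic output.
func formatMetric(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(formatMetric("gpusprint_gpu_utilization_percent", map[string]string{
		"vendor": "NVIDIA",
		"model":  "H100",
		"pod":    "trainer-job-0",
	}, 87.4))
}
```

Because the exposition format is plain text, any Prometheus-compatible scraper or OTLP pipeline can consume it without vendor-specific tooling.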
3. SaaS or Custom Dashboard
Because the data is stored in standard OpenTelemetry and Prometheus formats, you can use the GPUSprint UI or build fully custom dashboards in Grafana, Datadog, or any tool you choose. We correlate Kubernetes Pod metadata with raw hardware metrics out of the box.
Your Metrics. Completely Open.
Every metric GPUSprint collects is exported in industry-standard open formats, so you are never locked in. Pipe the data wherever you want: build your own Grafana boards, run custom PromQL queries, wire it into your alerting system, or feed it into your own ML pipelines.
Prometheus / OpenMetrics example
gpusprint_gpu_utilization_percent{
  vendor="NVIDIA",
  model="H100",
  pod="trainer-job-0",
  namespace="ai-workloads",
  cluster="us-central1-a"
} 87.4
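To show the kind of query access this format enables, here is a PromQL expression over the sample metric above. The metric and label names come from the sample; the specific aggregation is purely illustrative.

```promql
# Average GPU utilization per namespace over the last 5 minutes
avg by (namespace) (
  avg_over_time(gpusprint_gpu_utilization_percent[5m])
)
```

The same expression can back a Grafana panel or a Prometheus alerting rule unchanged.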
Works with your existing stack
Grafana + Prometheus
Full PromQL query access, unlimited custom boards
Datadog
Push via OTLP receiver, use alongside your existing infra metrics
GCP Cloud Monitoring
Native OTLP ingestion with Google-managed retention
Custom / Bring Your Own
Any OTLP-compatible backend works out of the box
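For the OTLP path, an OpenTelemetry Collector pipeline is one common way to fan metrics out to a backend. The sketch below receives OTLP and forwards to Datadog via the collector-contrib Datadog exporter; the endpoint and API-key handling are assumptions, and any other OTLP-compatible exporter slots in the same way.

```yaml
# Minimal OpenTelemetry Collector sketch (assumed setup, not an official config)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [datadog]
```

Swapping the backend is a one-line change in the `exporters` list, which is the practical meaning of "no lock-in" here.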
100% Open-Source Agent
We believe software that runs with privileged, host-level access shouldn't be a black box. Our node agent is written in Go and completely open source. You can review the code, build it yourself, and verify exactly what telemetry leaves your cluster.
Why a separate DaemonSet?
Deep Hardware Support: Official driver exporters typically support only one vendor. Our agent unifies NVIDIA (NVML), AMD (ROCm), Tenstorrent, and TPUs in a single process.
K8s Context: Hardware exporters know nothing about Kubernetes. Our daemon explicitly ties PCI device processes to specific Pods and Namespaces.
Low Overhead: Written in Go and carefully optimized to keep the CPU footprint small on expensive accelerator hosts.