LLM Inference Analytics

Prefill/decode phases, KV cache efficiency, fragmentation, and disaggregation insights

Status: Healthy

Summary (last hour)
- Requests (1h): 0
- Cache Hit Rate: no data yet
- HBM Utilization: no data yet
- TTFT Improvement (potential via disaggregation): no data yet
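These headline cards aggregate a rolling one-hour window. A minimal sketch of how they might be computed, assuming hypothetical per-request records with cache-lookup counters and a device memory reading; all field and function names here are illustrative, not this dashboard's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    # Hypothetical per-request sample; field names are illustrative.
    cache_hits: int      # KV-cache blocks reused from earlier prompts
    cache_lookups: int   # total KV-cache blocks looked up for the prompt

def summarize(records: list[RequestRecord],
              hbm_used_bytes: int, hbm_total_bytes: int) -> dict:
    """Aggregate the headline cards over the trailing hour of traffic."""
    lookups = sum(r.cache_lookups for r in records)
    hits = sum(r.cache_hits for r in records)
    return {
        "requests_1h": len(records),
        "cache_hit_rate": hits / lookups if lookups else 0.0,
        "hbm_utilization": (hbm_used_bytes / hbm_total_bytes
                            if hbm_total_bytes else 0.0),
    }
```

Dividing hits by lookups rather than by requests keeps the hit rate meaningful when a single prompt spans many KV-cache blocks.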

Phase Bottleneck Distribution
- Prefill Bound: 0 (0%). Compute-bound on prompt encoding.
- Decode Bound: 0 (0%). Memory-bound on token generation.
- Balanced: 0 (0%). Phases are balanced.
- Underutilized: 0 (0%). Resources not fully used.
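One plausible way to derive these buckets from per-request telemetry, assuming each request reports prefill and decode wall-time plus a mean GPU utilization; the thresholds and names below are assumptions made for the sketch, not the dashboard's documented logic:

```python
def classify_phase(prefill_s: float, decode_s: float, gpu_util: float,
                   skew: float = 1.5, idle_util: float = 0.3) -> str:
    """Bucket one request by which inference phase dominates.

    prefill_s / decode_s: wall-clock seconds spent in each phase.
    gpu_util: mean GPU utilization (0..1) over the request.
    skew: how lopsided the phases must be to count as 'bound'.
    """
    if gpu_util < idle_util:
        return "underutilized"   # resources not fully used
    if prefill_s > skew * decode_s:
        return "prefill_bound"   # compute-bound on prompt encoding
    if decode_s > skew * prefill_s:
        return "decode_bound"    # memory-bound on token generation
    return "balanced"            # phases are balanced
```

With per-request labels in hand, the distribution above is just a histogram over the four bucket names.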

- Phase Imbalance: 0.0%
- Disagg Benefit Ratio: 0.0%
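Both metrics can be read as functions of the same phase timings. A hedged sketch of one reasonable pair of definitions (the dashboard's exact formulas may differ): phase imbalance measures how lopsided the two phases are, and the disaggregation benefit ratio estimates an upper bound on the fraction of request time that separate prefill and decode pools could overlap.

```python
def phase_imbalance(prefill_s: float, decode_s: float) -> float:
    """Relative gap between the phases as a 0..1 fraction
    (the dashboard renders it as a percentage)."""
    total = prefill_s + decode_s
    return abs(prefill_s - decode_s) / total if total else 0.0

def disagg_benefit_ratio(prefill_s: float, decode_s: float) -> float:
    """Upper-bound estimate of the share of request time that separate
    prefill and decode pools could overlap; an estimate, not a measurement."""
    total = prefill_s + decode_s
    return min(prefill_s, decode_s) / total if total else 0.0
```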

KV Cache Health
- Hit Rate: 0%
- Fragmentation: 0%
- HBM Usage: 0%
- Cache Opportunity: unknown
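Fragmentation here typically means KV-cache memory that is allocated but holds no live tokens, for example partially filled paged-attention blocks. A sketch of how a snapshot could be scored, using a hypothetical block structure:

```python
from dataclasses import dataclass

@dataclass
class KVBlock:
    # Hypothetical snapshot of one paged-attention KV-cache block.
    capacity_tokens: int   # slots in the block
    used_tokens: int       # slots holding live KV entries

def cache_health(blocks: list[KVBlock], hbm_total_blocks: int) -> dict:
    """Score one KV-cache snapshot. Fragmentation is the share of
    allocated slots that hold no live tokens."""
    capacity = sum(b.capacity_tokens for b in blocks)
    used = sum(b.used_tokens for b in blocks)
    return {
        "fragmentation": (1 - used / capacity) if capacity else 0.0,
        "hbm_usage": (len(blocks) / hbm_total_blocks
                      if hbm_total_blocks else 0.0),
    }
```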

Optimization Opportunities
- High Memory / Low Hit Rate: 0
- Fragmentation Issues: 0
- Small Block Overhead: 0
- Normal: 0
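The opportunity buckets combine the cache-health metrics above into a single label per workload. One illustrative rule set, with all thresholds being assumptions chosen for the sketch:

```python
def cache_opportunity(hit_rate: float, fragmentation: float,
                      hbm_usage: float, block_tokens: int) -> str:
    """Map cache-health metrics (0..1 fractions) onto the four buckets.

    block_tokens: the configured KV-cache block size in tokens.
    """
    if hbm_usage > 0.85 and hit_rate < 0.30:
        return "high_memory_low_hit_rate"  # big cache, little reuse
    if fragmentation > 0.40:
        return "fragmentation_issues"      # many half-empty blocks
    if block_tokens < 16:
        return "small_block_overhead"      # per-block metadata dominates
    return "normal"
```

Workloads that trip none of the rules land in the Normal bucket.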
Insights (0)

No actionable insights right now. Insights will appear when inference workloads generate phase timing data.

Cache Recommendations (0)

No cache recommendations available. These appear when KV cache snapshots are collected from running inference workloads.