LLM Inference Analytics

Prefill/decode phases, KV cache efficiency, fragmentation, and disaggregation insights

Status: Healthy

Summary (last hour)
- Requests (1h): 0
- Cache Hit Rate: no data yet
- HBM Utilization: no data yet
- TTFT Improvement (potential via disaggregation): no data yet
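These headline cards aggregate a rolling one-hour window. A minimal sketch of how they might be computed, assuming hypothetical per-request records with cache-lookup counters and a device memory reading; all field and function names here are illustrative, not this dashboard's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    # Hypothetical per-request sample; field names are illustrative.
    cache_hits: int      # KV-cache blocks reused from earlier prompts
    cache_lookups: int   # total KV-cache blocks looked up for the prompt

def summarize(records: list[RequestRecord],
              hbm_used_bytes: int, hbm_total_bytes: int) -> dict:
    """Aggregate the headline cards over the trailing hour of traffic."""
    lookups = sum(r.cache_lookups for r in records)
    hits = sum(r.cache_hits for r in records)
    return {
        "requests_1h": len(records),
        "cache_hit_rate": hits / lookups if lookups else 0.0,
        "hbm_utilization": (hbm_used_bytes / hbm_total_bytes
                            if hbm_total_bytes else 0.0),
    }
```

Dividing hits by lookups rather than by requests keeps the hit rate meaningful when a single prompt spans many KV-cache blocks.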

Phase Bottleneck Distribution
- Prefill Bound: 0 (0%). Compute-bound on prompt encoding.
- Decode Bound: 0 (0%). Memory-bound on token generation.
- Balanced: 0 (0%). Phases are balanced.
- Underutilized: 0 (0%). Resources not fully used.
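One plausible way to derive these buckets from per-request telemetry, assuming each request reports prefill and decode wall-time plus a mean GPU utilization; the thresholds and names below are assumptions made for the sketch, not the dashboard's documented logic:

```python
def classify_phase(prefill_s: float, decode_s: float, gpu_util: float,
                   skew: float = 1.5, idle_util: float = 0.3) -> str:
    """Bucket one request by which inference phase dominates.

    prefill_s / decode_s: wall-clock seconds spent in each phase.
    gpu_util: mean GPU utilization (0..1) over the request.
    skew: how lopsided the phases must be to count as 'bound'.
    """
    if gpu_util < idle_util:
        return "underutilized"   # resources not fully used
    if prefill_s > skew * decode_s:
        return "prefill_bound"   # compute-bound on prompt encoding
    if decode_s > skew * prefill_s:
        return "decode_bound"    # memory-bound on token generation
    return "balanced"            # phases are balanced
```

With per-request labels in hand, the distribution above is just a histogram over the four bucket names.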

- Phase Imbalance: 0.0%
- Disagg Benefit Ratio: 0.0%
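Both metrics can be read as functions of the same phase timings. A hedged sketch of one reasonable pair of definitions (the dashboard's exact formulas may differ): phase imbalance measures how lopsided the two phases are, and the disaggregation benefit ratio estimates an upper bound on the fraction of request time that separate prefill and decode pools could overlap.

```python
def phase_imbalance(prefill_s: float, decode_s: float) -> float:
    """Relative gap between the phases as a 0..1 fraction
    (the dashboard renders it as a percentage)."""
    total = prefill_s + decode_s
    return abs(prefill_s - decode_s) / total if total else 0.0

def disagg_benefit_ratio(prefill_s: float, decode_s: float) -> float:
    """Upper-bound estimate of the share of request time that separate
    prefill and decode pools could overlap; an estimate, not a measurement."""
    total = prefill_s + decode_s
    return min(prefill_s, decode_s) / total if total else 0.0
```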

KV Cache Health
- Hit Rate: 0%
- Fragmentation: 0%
- HBM Usage: 0%
- Cache Opportunity: unknown
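Fragmentation here typically means KV-cache memory that is allocated but holds no live tokens, for example partially filled paged-attention blocks. A sketch of how a snapshot could be scored, using a hypothetical block structure:

```python
from dataclasses import dataclass

@dataclass
class KVBlock:
    # Hypothetical snapshot of one paged-attention KV-cache block.
    capacity_tokens: int   # slots in the block
    used_tokens: int       # slots holding live KV entries

def cache_health(blocks: list[KVBlock], hbm_total_blocks: int) -> dict:
    """Score one KV-cache snapshot. Fragmentation is the share of
    allocated slots that hold no live tokens."""
    capacity = sum(b.capacity_tokens for b in blocks)
    used = sum(b.used_tokens for b in blocks)
    return {
        "fragmentation": (1 - used / capacity) if capacity else 0.0,
        "hbm_usage": (len(blocks) / hbm_total_blocks
                      if hbm_total_blocks else 0.0),
    }
```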

Optimization Opportunities
- High Memory / Low Hit Rate: 0
- Fragmentation Issues: 0
- Small Block Overhead: 0
- Normal: 0
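The opportunity buckets combine the cache-health metrics above into a single label per workload. One illustrative rule set, with all thresholds being assumptions chosen for the sketch:

```python
def cache_opportunity(hit_rate: float, fragmentation: float,
                      hbm_usage: float, block_tokens: int) -> str:
    """Map cache-health metrics (0..1 fractions) onto the four buckets.

    block_tokens: the configured KV-cache block size in tokens.
    """
    if hbm_usage > 0.85 and hit_rate < 0.30:
        return "high_memory_low_hit_rate"  # big cache, little reuse
    if fragmentation > 0.40:
        return "fragmentation_issues"      # many half-empty blocks
    if block_tokens < 16:
        return "small_block_overhead"      # per-block metadata dominates
    return "normal"
```

Workloads that trip none of the rules land in the Normal bucket.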
Insights (0)

No actionable insights right now. Insights will appear when inference workloads generate phase timing data.

Cache Recommendations (0)

No cache recommendations available. These appear when KV cache snapshots are collected from running inference workloads.