LLM Inference Analytics
Prefill/decode phases, KV cache efficiency, fragmentation, and disaggregation insights
Requests (1h): 0
Cache Hit Rate: —
HBM Utilization: —
TTFT Improvement: — (potential via disaggregation)
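The TTFT Improvement card estimates how much time-to-first-token could be recovered by serving prefill from a dedicated pool, away from decode interference. The dashboard's exact formula is not shown here; a minimal sketch, assuming the queueing delay behind colocated decode batches is the recoverable share (RequestTiming and its fields are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class RequestTiming:
    queue_ms: float    # wait before prefill starts (decode interference)
    prefill_ms: float  # prompt encoding time
    decode_ms: float   # token generation time

def ttft_improvement_potential(timings: list[RequestTiming]) -> float:
    """Fraction of aggregate TTFT attributable to queueing, i.e. the
    share that disaggregating prefill could plausibly remove."""
    if not timings:
        return 0.0
    total_ttft = sum(t.queue_ms + t.prefill_ms for t in timings)
    total_queue = sum(t.queue_ms for t in timings)
    return total_queue / total_ttft if total_ttft > 0 else 0.0
```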
Phase Bottleneck Distribution
Prefill Bound: 0 (0%), compute-bound on prompt encoding
Decode Bound: 0 (0%), memory-bound on token generation
Balanced: 0 (0%), phases are balanced
Underutilized: 0 (0%), resources not fully used
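Each request lands in one of these four buckets based on its phase timings and device utilization. A plausible classifier, with illustrative thresholds rather than the dashboard's actual cutoffs:

```python
def classify_bottleneck(prefill_ms: float, decode_ms: float,
                        gpu_util: float) -> str:
    """Bucket one request; thresholds are illustrative assumptions."""
    total = prefill_ms + decode_ms
    if gpu_util < 0.30 or total == 0:
        return "underutilized"      # resources not fully used
    prefill_share = prefill_ms / total
    if prefill_share > 0.65:
        return "prefill_bound"      # compute-bound on prompt encoding
    if prefill_share < 0.35:
        return "decode_bound"       # memory-bound on token generation
    return "balanced"
```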
Phase Imbalance: 0.0%
Disagg Benefit Ratio: 0.0%
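Phase Imbalance measures how lopsided the two phases are, and the Disagg Benefit Ratio estimates what share of traffic a prefill/decode split would plausibly help. One way to compute both, assuming imbalance is defined as |prefill - decode| / (prefill + decode):

```python
def phase_imbalance(prefill_ms: float, decode_ms: float) -> float:
    """|prefill - decode| / (prefill + decode), in [0, 1]."""
    total = prefill_ms + decode_ms
    return abs(prefill_ms - decode_ms) / total if total else 0.0

def disagg_benefit_ratio(samples: list[tuple[float, float]],
                         cutoff: float = 0.5) -> float:
    """Share of (prefill_ms, decode_ms) samples imbalanced enough
    that disaggregation would plausibly help; cutoff is illustrative."""
    if not samples:
        return 0.0
    helped = sum(1 for p, d in samples if phase_imbalance(p, d) > cutoff)
    return helped / len(samples)
```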
KV Cache Health
Hit Rate: 0%
Fragmentation: 0%
HBM Usage: 0%
Cache Opportunity: unknown
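These gauges are derived from KV cache snapshots collected from running workloads. A sketch of how they could be computed from a block-level snapshot (KVCacheSnapshot and its fields are assumptions, not the collector's real schema):

```python
from dataclasses import dataclass

@dataclass
class KVCacheSnapshot:
    total_blocks: int      # KV blocks provisioned in HBM
    allocated_blocks: int  # blocks reserved by the allocator
    used_blocks: int       # blocks holding live KV entries
    hits: int              # prefix-cache lookups served from cache
    lookups: int           # total prefix-cache lookups

def cache_health(s: KVCacheSnapshot) -> dict[str, float]:
    """Hit rate, fragmentation (reserved-but-unused share), HBM usage."""
    hit_rate = s.hits / s.lookups if s.lookups else 0.0
    frag = (1 - s.used_blocks / s.allocated_blocks) if s.allocated_blocks else 0.0
    hbm = s.allocated_blocks / s.total_blocks if s.total_blocks else 0.0
    return {"hit_rate": hit_rate, "fragmentation": frag, "hbm_usage": hbm}
```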
Optimization Opportunities
High Memory / Low Hit Rate: 0
Fragmentation Issues: 0
Small Block Overhead: 0
Normal: 0
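A workload's bucket follows from the cache health metrics above. A rule-based sketch, with illustrative thresholds rather than the dashboard's actual cutoffs:

```python
def classify_opportunity(hit_rate: float, hbm_usage: float,
                         fragmentation: float, block_size: int) -> str:
    """Map cache health to the four dashboard buckets; thresholds
    and the block_size parameter are illustrative assumptions."""
    if hbm_usage > 0.80 and hit_rate < 0.20:
        return "high_memory_low_hit_rate"  # paying for cache with little reuse
    if fragmentation > 0.30:
        return "fragmentation_issues"      # reserved blocks sit unused
    if block_size < 16:
        return "small_block_overhead"      # per-block bookkeeping dominates
    return "normal"
```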
Insights (0)
No actionable insights right now. Insights will appear when inference workloads generate phase timing data.
Cache Recommendations (0)
No cache recommendations available. These appear when KV cache snapshots are collected from running inference workloads.