
Argus vs Traditional JVM Tools

Argus's value proposition is brutally simple: replace the JFR-capture-then-GUI-then-grep workflow with one CLI command. Below are five real production scenarios where Argus collapses 5–10 manual steps into one or two commands.

| Scenario | Traditional path | Argus | Steps saved |
|---|---|---|---|
| GC pressure under traffic | jstat -gc + log grep + interpretation | argus doctor <PID> | 1 cmd vs 5 |
| Direct memory leak | jcmd VM.native_memory baseline + detail.diff + scan 200 lines of output | argus nmt <PID> --diff baseline.json | Manual scan → banner + top growers |
| Lock contention | jstack × 3 + manual "waiting to lock" aggregation | argus pool <PID> | 1 cmd vs 3 jstacks + grep |
| ZGC allocation stalls | jcmd JFR.start (30s) → JMC GUI → ZGC tab → async-profiler -e alloc (30s) | argus zgc <PID> (30s, includes top-5 alloc sites) | 1 cmd vs JFR + GUI + separate profiler |
| ZGC trend across deploy | Author a custom Prometheus rule comparing committed heap vs SoftMax | argus zgc <PID> --save=pre.txt, then --diff=pre.txt | 2 cmds vs DIY metric pipeline |
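
The "interpretation" step in the GC-pressure row usually means turning jstat's cumulative counters into a rate by hand. As a rough illustration of that manual step (not of how Argus computes anything), here is a minimal shell sketch. The function name gc_overhead is hypothetical; it assumes jstat -gc output with a header row that includes the GCT column (total GC time in seconds) and a '.' decimal separator, and it approximates elapsed time from the sampling interval rather than measuring it.

```shell
# gc_overhead INTERVAL_MS
# Hypothetical helper (not part of Argus or the JDK). Reads `jstat -gc` output
# on stdin and prints the percentage of wall-clock time spent in GC between
# the first and last sample. Assumes a header row containing a GCT column
# (cumulative GC time in seconds) and '.' as the decimal separator.
gc_overhead() {
  awk -v interval_ms="$1" '
    NR == 1 {
      # Find the GCT column by name, since the column set varies by JDK and collector.
      for (i = 1; i <= NF; i++) if ($i == "GCT") col = i
      if (!col) { print "gc_overhead: no GCT column" > "/dev/stderr"; bad = 1; exit }
      next
    }
    # Only numeric GCT values count as samples (this skips repeated headers).
    $col ~ /^[0-9]+(\.[0-9]+)?$/ {
      if (samples == 0) first = $col
      last = $col
      samples++
    }
    END {
      if (bad) exit 1
      if (samples < 2) { print "gc_overhead: need at least 2 samples" > "/dev/stderr"; exit 1 }
      # Elapsed time is approximated as (samples - 1) * interval.
      elapsed_s = (samples - 1) * interval_ms / 1000
      printf "%.1f\n", (last - first) / elapsed_s * 100
    }'
}

# Usage: 11 samples one second apart gives the overhead over roughly 10 s
#   jstat -gc 12345 1000 11 | gc_overhead 1000
```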

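Likewise, the "manual aggregation" in the lock-contention row amounts to counting which monitors threads are blocked on across several dumps. The sketch below is a hypothetical top_lock_waits helper, not part of Argus; it assumes the standard jstack line format "- waiting to lock <address> (a ClassName)". Note that java.util.concurrent locks such as ReentrantLock appear as "parking to wait for" instead, so this sketch does not count them.

```shell
# top_lock_waits FILE...
# Hypothetical helper (not part of Argus). Aggregates "waiting to lock" lines
# across several jstack dumps and prints the five most contended monitors
# as "<count> <address> (a ClassName)", most contended first.
top_lock_waits() {
  grep -h -- '- waiting to lock <' "$@" \
    | sed -E 's/.*waiting to lock (<0x[0-9a-f]+> \(a [^)]+\)).*/\1/' \
    | sort | uniq -c | sort -rn | head -n 5 \
    | sed 's/^ *//'
}

# Usage: three dumps a few seconds apart, then aggregate
#   for i in 1 2 3; do jstack 12345 > "dump$i.txt"; sleep 5; done
#   top_lock_waits dump1.txt dump2.txt dump3.txt
```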
The ZGC workflow is the clearest illustration. The traditional path requires three jcmd calls, a GUI tool, and a second profiler session, after which you still have to correlate two output streams by hand. Argus captures the same signals in one command:

```console
# Traditional: 3 jcmd calls + GUI + a separate profiler run
$ jcmd 12345 JFR.start name=zgc settings=profile
$ sleep 30   # let the recording collect 30 s of events
$ jcmd 12345 JFR.dump name=zgc filename=/tmp/zgc.jfr
$ jcmd 12345 JFR.stop name=zgc
$ jmc /tmp/zgc.jfr   # then click ZGC > Allocation Stalls tab
$ asprof -e alloc -d 30 12345   # second 30 s run for call-site identification

# Argus: 1 command, 30 seconds, includes call sites
$ argus zgc 12345
ZGC Diagnosis (PID 12345, JDK 21, Generational)
Allocation Stalls  ✘ 7 stalls in 30s (max 508.8ms)
  Top alloc sources during capture (n=14,832 events)
    1. com.example.JsonParser.parseObject(JsonParser.java:142)  38.2%
    2. java.util.HashMap.resize(HashMap.java:711)               22.1%
```
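
A scriptable middle ground between the JMC GUI and Argus is the JDK's own jfr command-line tool, which can print only the ZGC allocation-stall events (jdk.ZAllocationStall) from the recording. The two helpers below are a hypothetical sketch, not part of Argus or the JDK, and they assume jfr print's text layout: each event opens with a "jdk.ZAllocationStall {" line and carries an eventThread = "name" field. This yields stall counts per thread but no allocation call sites; those still require the separate profiler run.

```shell
# count_zgc_stalls
# Hypothetical helper (not part of Argus or the JDK). Counts ZGC allocation
# stall events in `jfr print` text output read from stdin. Assumes each event
# starts with a line of the form "jdk.ZAllocationStall {".
count_zgc_stalls() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the exit status 0.
  grep -c '^jdk\.ZAllocationStall {' || true
}

# zgc_stalls_by_thread
# Hypothetical helper. Prints "<count> <thread name>" per stalled thread,
# most affected first. Assumes lines like: eventThread = "worker-1" (javaThreadId = 42)
zgc_stalls_by_thread() {
  sed -n -E 's/^[[:space:]]*eventThread = "([^"]*)".*/\1/p' \
    | sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Usage against the recording from the traditional path:
#   jfr print --events jdk.ZAllocationStall /tmp/zgc.jfr | count_zgc_stalls
#   jfr print --events jdk.ZAllocationStall /tmp/zgc.jfr | zgc_stalls_by_thread
```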

For the full methodology and nine scenarios with measured wall-clock times, see the comparison study.

For twelve end-to-end production incidents, including humongous allocations, time-to-safepoint (TTSP) stalls, OOMKilled containers, JIT deoptimization storms, false sharing, and direct buffer exhaustion, see Real-World Scenarios.