Web Agent Evaluation Methods
| Benchmark | Graph Analysis | Multi-path | Evaluation Scope | Judge |
|---|---|---|---|---|
| WebShop | No | No | Final message | Rule-based |
| Mind2Web | No | Yes | Single trajectory | Rule-based |
| WebArena | No | No | Single trajectory | LLM + rules |
| VisualWebArena | No | No | Final message | LLM + rules |
| WebVoyager | No | No | Final message | LLM |
| WebGraphEval (ours) | Yes | Yes | Trajectory ensemble | LLM + structural signals |

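The table separates single-reference evaluation from evaluation over a trajectory ensemble. The sketch below illustrates that distinction only. It is not WebGraphEval's algorithm, and every name in it (`build_trajectory_graph`, `shared_actions`, `edge_support`, the `<START>`/`<END>` sentinels, the example action strings) is a hypothetical assumption. Several action sequences are merged into one directed graph whose edges count how many trajectories use each transition. A candidate is then scored by the fraction of its transitions that appear in the graph, so it gets credit for following any recorded path rather than one fixed reference.

```python
from collections import defaultdict

START, END = "<START>", "<END>"  # hypothetical sentinels marking path boundaries


def build_trajectory_graph(trajectories):
    """Merge action sequences into a directed graph.

    Returns a dict mapping each edge (a, b) to the number of
    trajectories that contain the transition a -> b at least once.
    """
    edge_counts = defaultdict(int)
    for traj in trajectories:
        path = [START, *traj, END]
        # Count each distinct transition once per trajectory.
        for edge in set(zip(path, path[1:])):
            edge_counts[edge] += 1
    return dict(edge_counts)


def shared_actions(trajectories):
    """Actions present in every trajectory (a crude structural signal)."""
    sets = [set(t) for t in trajectories]
    return set.intersection(*sets) if sets else set()


def edge_support(candidate, graph, min_count=1):
    """Fraction of the candidate's transitions seen in >= min_count trajectories."""
    path = [START, *candidate, END]
    edges = list(zip(path, path[1:]))
    supported = sum(1 for e in edges if graph.get(e, 0) >= min_count)
    return supported / len(edges)


if __name__ == "__main__":
    refs = [
        ["search laptop", "click result 1", "add to cart"],
        ["search laptop", "apply filter", "click result 1", "add to cart"],
    ]
    graph = build_trajectory_graph(refs)
    print(shared_actions(refs))
    print(edge_support(["search laptop", "click result 1", "add to cart"], graph))
    print(edge_support(["search laptop", "open reviews", "add to cart"], graph))
```

With these two reference paths, a candidate that skips the filter step is fully supported (score 1.0) even though it would diverge from the second path under a single-reference comparison. A candidate with an unseen detour scores 0.5 (2 of its 4 transitions are supported).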