procurement.txt

Benchmarks

Two benchmarks measure the efficiency of using procurement.txt versus traditional web scraping. Both task an agent with the same procurement workflow: find M10 hex bolts, get a quote for 500 units, place an order if the price is under $0.45/unit, and retrieve tracking information. Both approaches complete the task; procurement.txt does it faster, cheaper, and with less data.

Benchmark 1: Structured Agent

deterministic

A scripted Python agent (no LLM) running against a local mock merchant server. Two variants: a scraping agent that navigates HTML pages, and a procurement.txt agent that fetches /procurement.txt, discovers the OpenAPI spec, and uses JSON APIs. Both complete the full workflow.
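The discovery step the procurement.txt agent performs can be sketched as follows. This is a minimal illustration: the manifest field names and endpoint paths here are assumptions, not the published procurement.txt format.

```python
# Hypothetical /procurement.txt manifest -- field names and paths are
# assumptions for illustration, not the published format.
MANIFEST = """\
Version: 1
OpenAPI: /api/openapi.json
Catalog: /api/products
"""

def parse_manifest(text):
    """Parse simple 'Field: value' lines into a lowercase-keyed dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields

fields = parse_manifest(MANIFEST)
spec_path = fields["openapi"]   # the agent would fetch this next,
catalog_path = fields["catalog"]  # then call the JSON endpoints it describes
```

In the benchmark, this one extra fetch is what lets the agent skip HTML entirely and work against JSON for the rest of the workflow.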

Task completion time

Without

0.051s

With procurement.txt

0.024s

~2x faster

Data transferred

Without

~140 KB (HTML)

With procurement.txt

~15 KB (JSON)

~10x less bandwidth

HTTP requests

Without

7 requests

With procurement.txt

8 requests

Similar request count

Response payload

Without

HTML pages

With procurement.txt

JSON responses

Structured data, no parsing overhead

Run-by-run results

Run | Scraping agent           | procurement.txt agent    | Notes
1   | 0.051s, 7 reqs, ~140 KB  | 0.025s, 7 reqs, ~15 KB   | procurement.txt agent needed 1 retry to narrow catalog search
2   | 0.051s, 7 reqs, ~140 KB  | 0.025s, 8 reqs, ~15 KB   | Both completed full workflow
3-6 | ~0.051s, 7 reqs, ~140 KB | ~0.024s, 8 reqs, ~15 KB  | Consistent results across all runs

6 runs with both agents. The procurement.txt agent uses slightly more HTTP requests (8 vs 7) due to catalog search pagination, but transfers ~10x less data overall.

Key observation

Both agents found human escalation channels, but through different paths. The scraping agent found phone and email from the HTML footer. The procurement.txt agent found live-chat and email from the structured Escalation field — a richer, more machine-parseable result.
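A minimal sketch of what "machine-parseable" buys here: an agent can turn a structured Escalation field into channels it can act on directly. The field syntax shown is an assumption for illustration, not the published format.

```python
# Hypothetical Escalation field value; the real procurement.txt syntax
# may differ.
escalation = "live-chat: https://example.com/chat; email: support@example.com"

channels = {}
for entry in escalation.split(";"):
    # Split each entry at the first colon: channel type, then contact point.
    kind, _, contact = entry.strip().partition(":")
    channels[kind.strip()] = contact.strip()

# channels now maps channel type to a contact point the agent can use
# directly, unlike a phone number buried in an HTML footer.
```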

Benchmark 2: LLM Agent

claude-sonnet-4-20250514

A Claude Sonnet agent benchmarked on the same procurement task with real API calls, token consumption, and cost tracking. Two conditions: one system prompt instructing the agent to browse and scrape HTML, and another instructing it to check for /procurement.txt first. Both approaches complete the task, but the efficiency difference is significant.
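The two conditions differ only in their system prompt. The phrasing below is hypothetical; the exact wording used in the benchmark is not reproduced here.

```python
# Hypothetical system prompts for the two benchmark conditions.
SCRAPING_PROMPT = (
    "You are a procurement agent. Browse the merchant's website, "
    "parse the HTML pages, and complete the purchasing workflow."
)

PROCUREMENT_TXT_PROMPT = (
    "You are a procurement agent. Before scraping any HTML, fetch "
    "/procurement.txt from the merchant's domain and, if it exists, "
    "use the structured APIs it describes to complete the workflow."
)
```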

Average cost per run

Without

$1.23

With procurement.txt

$0.20

~6x cheaper

Average tokens consumed

Without

~341K

With procurement.txt

~58K

~5.8x fewer tokens

Average elapsed time

Without

~713s (~12 min)

With procurement.txt

~213s (~3.5 min)

~3.3x faster

Average tool result data

Without

~120 KB

With procurement.txt

~21 KB

~5.7x less data processed

Run-by-run results

Run | Without procurement.txt              | With procurement.txt
1   | 4 turns · 53K tokens · $0.21 · 62s   | 11 turns · 64K tokens · $0.22 · 92s
2   | 12 turns · 295K tokens · $1.10 · 460s | 10 turns · 57K tokens · $0.19 · 161s
3   | 16 turns · 499K tokens · $1.82 · 945s | 10 turns · 55K tokens · $0.19 · 371s
4   | 12 turns · 518K tokens · $1.78 · 1383s | 10 turns · 57K tokens · $0.19 · 227s

4 runs per condition. Without procurement.txt, the worst-case run consumed 518K tokens ($1.78) and took 23 minutes to complete the same task.

Why scraping is less efficient

Without procurement.txt, the LLM agent receives large HTML pages (18–28 KB each) containing navigation, styling, and other content irrelevant to the procurement task. The agent must parse these pages, extract form fields, and reason about page structure — all of which consumes tokens. With procurement.txt, the agent works with compact JSON API responses (~21 KB total vs ~120 KB), spending tokens on the actual task rather than on interpreting page layout.
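A back-of-envelope check of what that payload difference means in tokens, using the common heuristic of roughly 4 characters per token (an approximation; actual tokenizer counts vary by content):

```python
# Rough heuristic: ~4 characters per token for English/HTML/JSON text.
CHARS_PER_TOKEN = 4

scraping_bytes = 120 * 1024     # ~120 KB of HTML tool results per run
structured_bytes = 21 * 1024    # ~21 KB of JSON tool results per run

scraping_tokens = scraping_bytes // CHARS_PER_TOKEN      # ~31K tokens
structured_tokens = structured_bytes // CHARS_PER_TOKEN  # ~5K tokens
```

Tool results account for only part of the measured totals (~341K vs ~58K); the remainder is the agent's own reasoning and the repeated context in each turn, both of which also shrink when responses are compact.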

Tool usage patterns

Without procurement.txt

http_get, extract_form_fields, parse_html, http_post

With procurement.txt

http_get, parse_procurement_txt, http_post

The procurement.txt path uses fewer, simpler tools — no HTML parsing or form extraction needed.

Summary

Across both benchmarks, agents using procurement.txt consumed significantly less data (~10x less bandwidth in the structured test, ~5.7x less tool result data in the LLM test) and completed tasks faster (~2x in the structured test, ~3.3x in the LLM test).

The LLM benchmark showed the most dramatic efficiency gains: a 6x cost reduction ($0.20 vs $1.23 average per run) and 5.8x fewer tokens consumed. The scraping approach works, but it forces the agent to spend most of its time and budget parsing HTML rather than executing the procurement workflow. Providing structured, machine-readable metadata lets agents focus on the task itself.