What a benchmark is
A benchmark is a comparison table made up of three parts:

- Municipalities — the cities or government entities you want to compare (for example, your neighboring cities or peer governments)
- Attributes — the data points or questions you want answered for each municipality (for example, population, annual budget, or permit processing time)
- Cells — the AI-researched values at each municipality/attribute intersection, each backed by citations
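The three parts above can be pictured as a simple data model. This is a minimal illustrative sketch, not the product's actual schema; all class and field names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    value: str            # short value shown in the table
    explanation: str      # the AI's rationale for the value
    citations: list[str]  # sources backing the value


@dataclass
class Benchmark:
    municipalities: list[str]  # rows: the entities being compared
    attributes: list[str]      # columns: the questions to answer
    # one researched Cell per (municipality, attribute) intersection
    cells: dict[tuple[str, str], Cell] = field(default_factory=dict)
```

Each cell sits at one row/column intersection, so looking up a value is just a `(municipality, attribute)` key lookup.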
Why benchmarking is useful
Compare peer cities
See how your city stacks up against neighboring or comparable municipalities across the metrics that matter to your department.
Evaluate AI quality
Review the sources and explanations behind each cell to assess how well the AI researched each answer.
Research at scale
Get a full comparison table in one request instead of running dozens of individual searches or queries.
Export for reports
Export completed benchmarks to CSV and paste results directly into staff reports or presentations.
How benchmarks work
Benchmarking runs in two phases:

- Setup — You describe what you want to compare in plain language. The AI proposes a list of municipalities and attributes for your table. You can edit, add, or remove rows and columns before the research begins.
- Research — Once you confirm the table structure, the AI researches each cell in parallel. Results stream in as they complete. Each cell includes a short value for the table and an explanation with citations you can inspect.

Research runs in the background. You can leave the page and come back — completed cells are saved automatically as they finish.
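The research phase described above, with each cell researched in parallel, results streaming in as they finish, and completed cells saved immediately, can be sketched roughly as follows. This is an assumption-laden illustration of the pattern, not the product's implementation; `research_cell` and `save` are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def research_cell(municipality: str, attribute: str) -> str:
    # Placeholder for the AI research call that fills in one cell.
    return f"{attribute} for {municipality}"


def run_benchmark(municipalities, attributes, save):
    """Research every (municipality, attribute) cell in parallel.

    `save` is called as soon as each cell finishes, so partial
    progress persists even if the user leaves the page.
    """
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(research_cell, m, a): (m, a)
            for m in municipalities
            for a in attributes
        }
        for fut in as_completed(futures):  # cells stream in as they complete
            key = futures[fut]
            results[key] = fut.result()
            save(key, results[key])        # persist each finished cell immediately
    return results
```

The key design point is that cells are independent of one another, so they can be researched concurrently and saved out of order.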
Who can create benchmarks
Any Prophecy Gov user can create benchmarks. Benchmarks are private by default — only you can see and edit your own. City administrators can view all benchmarks created by members of their city, which is useful for reviewing research quality or reusing work across the team.

Navigating to benchmarks
Benchmarks are created and accessed through the AI chat interface. To start a new benchmark, open a chat and ask the assistant to build a comparison table — for example:

“Compare our city’s permit processing times, annual budget, and population to five comparable cities in California.”

The assistant will recognize the request as a benchmarking task and open the benchmarking interface inline in the chat. A split-screen view appears with the conversation on the left and the live comparison table on the right.

Previously created standalone benchmarks are accessible from the sidebar under your chat history. These legacy benchmarks are read-only — you can browse the table and export the data, but you cannot edit or re-run research on them.
