GEO Reporting: Track AI Citations with Claude Code

Key Takeaways

LLM referral traffic is reported to convert at higher rates than organic search in early industry studies, making AI citation reporting a revenue conversation, not a vanity metrics conversation
AI Overviews now appear across a significant share of Google queries, and the share is growing as Google expands the feature to more query types and markets
Four core GEO metrics matter to leadership: citation rate, mention sentiment, platform coverage share, and competitor citation gap
Claude Code can automate the full data collection and report generation pipeline for under $20/month in API costs
Weekly cadence is the right default for GEO reporting: frequent enough to catch content freshness decay, not so frequent that you're chasing noise
See the AI visibility tools comparison for the full SaaS vs. build decision if you're starting from scratch

Your CMO is asking why you aren't in ChatGPT. Your client wants to know if "all this AI content stuff" is working. Your agency's quarterly business review is in three weeks.

You have citation data scattered across API logs, manual query tests, and half a spreadsheet. You do not have a report.

This is the GEO reporting gap. Most SEO teams have started some form of AI visibility monitoring. Almost none have built a system that produces stakeholder-ready numbers on a repeatable schedule.

Early reports suggest LLM-referred traffic may convert at meaningfully higher rates than traditional organic, though the data is preliminary and varies by industry. The business case for GEO reporting is forming quickly. The problem is the infrastructure to produce those reports reliably.

This guide covers the metrics that matter, how to collect them programmatically with Claude Code, and how to automate weekly output your leadership will read without needing a translation.

Why GEO Needs Its Own Reporting

GEO reporting is the practice of measuring and communicating brand citation performance across AI-powered search surfaces. It is distinct from traditional SEO reporting because AI engines do not return ranked lists. They either cite your brand in a synthesized answer or they do not.

That binary outcome demands different metrics, different data collection methods, and a different reporting structure than the rank/traffic/impressions model SEOs have used for a decade.

Leadership at most organizations now has a simple question: "Are we showing up in AI search?" They are not asking about Domain Authority or average position. They want to know if ChatGPT recommends their product when a potential customer asks for it. Without a structured GEO reporting system, the only honest answer is "we don't know."

The stakes are real. Gartner projected 25% of traditional search volume would shift to AI chatbots by end of 2026 (Gartner, February 2024). Google AI Overviews are expanding in coverage as Google continues rolling the feature out globally. The reporting infrastructure needs to match the channel's growth.

What Leadership Actually Wants to See

Executive stakeholders need three things from GEO reports: a number that shows current status, a trend that shows direction, and a comparison that shows competitive position. Everything else is supporting detail.

Translating GEO performance into those three elements is the reporting job. A citation rate of 23% means nothing without context. A citation rate of 23% versus a competitor's 41% across the same 50 branded queries creates a conversation about investment.

The Reporting Problem Is a Systems Problem

Most SEOs testing AI visibility run ad hoc queries manually. They open ChatGPT, search their brand, screenshot the result, repeat across three platforms, and call it monitoring. That approach does not scale, does not produce time-series data, and cannot be delegated or automated.

Building a repeatable GEO reporting system means moving from manual spot-checks to scripted data collection, structured storage, and automated report generation. Claude Code handles all three stages.

What to Track: The GEO Metrics That Matter

A GEO reporting framework consists of four primary metrics that translate AI citation performance into business-readable numbers. Each metric answers a different question that stakeholders ask.

The four metrics are: citation rate (are we mentioned at all), mention sentiment (how we are mentioned when we are), platform coverage (which AI engines we appear in versus which we are absent from), and competitor citation share (our presence relative to direct competitors across the same prompt set).

Citation Rate

Citation rate is the percentage of relevant prompts where your brand appears in the AI response. It is the baseline GEO metric.

Calculate it across a fixed prompt set that represents how real users ask for products or services in your category. The same prompt set, run weekly, produces trend data.

Metric	Definition	Target Benchmark
Prompted citation rate	Brand mentions / total prompts run	20%+ for established brands
Unprompted citation rate	Mentions in category questions (no brand name)	10%+ indicates strong authority
First-mention rate	% of times brand appears as the first citation	Tracks perceived authority position

Mention Sentiment

Mention sentiment tracks whether AI engines describe your brand positively, neutrally, or negatively when they cite it. A citation that says "Brand X has mixed reviews for customer support" is worse than no citation for some use cases.

Score each mention as positive, neutral, or negative. Track the ratio over time. Sudden shifts in sentiment often precede or follow product changes, PR events, or competitor moves that have altered how AI training data perceives your brand.

Platform Coverage

Platform coverage measures your presence across the main AI search surfaces: ChatGPT, Perplexity, Google AI Overviews, and Gemini. Citation patterns diverge significantly by platform.

ChatGPT favors encyclopedic and long-form content. Perplexity cites Reddit and community sources heavily. Google AI Overviews tend to favor YouTube and structured content. Citation patterns differ enough across platforms that a brand absent from Perplexity but visible in ChatGPT has a specific content gap to address.

Competitor Citation Share

Competitor citation share is the ratio of your brand citations to total brand citations (yours plus competitors) across the same prompt set. It is the metric that gets budget approved.

If three competitors are mentioned 180 times across 50 prompts and your brand is mentioned 20 times, your citation share is 10%. That number, tracked weekly, shows whether GEO investment is moving the needle.

CC for SEO Command Center

Pre-built Claude Code skills for technical audits, keyword clustering, and GSC/GA4 analysis.

Join the Waitlist

Be the first to get access

Building a Citation Monitoring Script with Claude Code

A citation monitoring system built with Claude Code queries AI engines programmatically, parses responses for brand mentions, classifies sentiment, and saves structured output to JSON. For a 50-prompt test set, the pipeline is fast enough to run as a scheduled job without manual intervention.

This section covers the working implementation. The AI search visibility guide covers the broader strategy context if you need it.

Project Structure

geo-reporting/
├── config.json              # Brand info, competitors, prompt library
├── monitor.py               # Main monitoring script
├── sentiment.py             # Mention classifier
├── report.py                # Report generator
├── data/
│   ├── citations/           # Weekly JSON outputs
│   └── reports/             # Generated markdown reports
└── CLAUDE.md                # Context for Claude Code

The Config File

Start with a config.json that defines your brand, competitors, and prompt library. This file drives the entire system.

{
  "brand": {
    "name": "Acme Corp",
    "aliases": ["Acme", "acmecorp.com"],
    "competitors": ["CompetitorA", "CompetitorB", "CompetitorC"]
  },
  "prompts": {
    "category": [
      "What is the best project management software for agencies?",
      "Which tools do SEO agencies use for client reporting?",
      "What software do marketing agencies use to manage projects?"
    ],
    "comparison": [
      "CompetitorA vs CompetitorB vs Acme Corp - which is better?",
      "What are the alternatives to CompetitorA for small agencies?"
    ],
    "branded": [
      "What do people say about Acme Corp?",
      "Is Acme Corp good for agency use?"
    ]
  },
  "platforms": ["openai", "perplexity", "google_aio"]
}

Separate prompts into three categories: category (no brand name, tests organic authority), comparison (includes competitors), and branded (direct brand queries). Each category tells a different story in the report.

The Monitoring Script

#!/usr/bin/env python3
"""
geo-reporting/monitor.py
Query AI platforms, detect citations, save structured output.
"""

import json
import datetime
import re
from openai import OpenAI
from anthropic import Anthropic

def load_config(path="config.json"):
    with open(path) as f:
        return json.load(f)

def query_chatgpt(prompt, client):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=800
    )
    return response.choices[0].message.content

def detect_mentions(response_text, brand_config):
    """Return dict of brand + competitors with mention presence and position."""
    results = {}
    all_entities = [brand_config["name"]] + brand_config["aliases"] + brand_config["competitors"]

    for entity in all_entities:
        pattern = re.compile(re.escape(entity), re.IGNORECASE)
        matches = list(pattern.finditer(response_text))
        results[entity] = {
            "mentioned": len(matches) > 0,
            "count": len(matches),
            "first_position": matches[0].start() if matches else None
        }
    return results

def classify_sentiment(response_text, brand_name, client):
    """Use Claude to classify how the brand is mentioned."""
    prompt = f"""
    In the following AI response, find all mentions of "{brand_name}".
    For each mention, classify the sentiment as: positive, neutral, negative, or not_mentioned.
    Return JSON only: {{"sentiment": "positive|neutral|negative|not_mentioned", "context": "brief quote"}}

    Response to analyze:
    {response_text}
    """
    result = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        return json.loads(result.content[0].text)
    except json.JSONDecodeError:
        return {"sentiment": "parse_error", "context": ""}

def run_monitoring_cycle(config):
    openai_client = OpenAI()  # uses OPENAI_API_KEY env var
    anthropic_client = Anthropic()  # uses ANTHROPIC_API_KEY env var

    results = {
        "run_date": datetime.date.today().isoformat(),
        "brand": config["brand"]["name"],
        "platform_results": {}
    }

    all_prompts = []
    for category, prompts in config["prompts"].items():
        for prompt in prompts:
            all_prompts.append({"category": category, "prompt": prompt})

    platform_results = []
    for item in all_prompts:
        response_text = query_chatgpt(item["prompt"], openai_client)
        mentions = detect_mentions(response_text, config["brand"])
        sentiment = classify_sentiment(response_text, config["brand"]["name"], anthropic_client)

        platform_results.append({
            "prompt_category": item["category"],
            "prompt": item["prompt"],
            "response": response_text,
            "mentions": mentions,
            "sentiment": sentiment
        })
        print(f"  Done: {item['prompt'][:60]}...")

    results["platform_results"]["chatgpt"] = platform_results

    # Save output
    date_str = datetime.date.today().isoformat()
    output_path = f"data/citations/{date_str}.json"
    with open(output_path, "w") as f:
        json.dump(results, f, indent=2)

    print(f"\nSaved: {output_path}")
    return results

if __name__ == "__main__":
    config = load_config()
    run_monitoring_cycle(config)

Run it from the terminal:

cd geo-reporting
python3 monitor.py

Output lands in data/citations/2026-03-06.json. Each run overwrites nothing. Weekly runs accumulate time-series data automatically.

Required Dependencies and API Keys

pip install openai anthropic

# Set environment variables (add to .zshrc or .bashrc)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

API cost for a 50-prompt weekly run: approximately $0.40-0.80 with GPT-4o. Claude handles the sentiment classification for another $0.10-0.20. Total: under $1 per weekly monitoring cycle.

Automating Weekly GEO Reports

Once monitoring data accumulates across multiple weeks, the report generation step converts JSON files into a stakeholder-ready markdown summary. Claude Code reads all citation files in the data/citations/ directory, calculates trend metrics, and writes a formatted report.

The automation goal is zero manual work between "data exists" and "report is ready." The report script handles aggregation, trend calculation, and formatting.

Report Generation Script

#!/usr/bin/env python3
"""
geo-reporting/report.py
Read accumulated citation data, calculate metrics, generate markdown report.
"""

import json
import glob
import datetime
from anthropic import Anthropic

def load_all_runs(data_dir="data/citations"):
    files = sorted(glob.glob(f"{data_dir}/*.json"))
    runs = []
    for f in files:
        with open(f) as fp:
            runs.append(json.load(fp))
    return runs

def calculate_citation_rate(run, brand_name):
    """Brand mentions / total prompts as percentage."""
    results = run.get("platform_results", {}).get("chatgpt", [])
    if not results:
        return 0
    mentioned = sum(
        1 for r in results
        if r["mentions"].get(brand_name, {}).get("mentioned", False)
    )
    return round((mentioned / len(results)) * 100, 1)

def calculate_competitor_share(run, brand_name, competitors):
    """Brand citations / (brand + competitor citations) as percentage."""
    results = run.get("platform_results", {}).get("chatgpt", [])
    if not results:
        return 0

    brand_mentions = sum(
        1 for r in results
        if r["mentions"].get(brand_name, {}).get("mentioned", False)
    )
    competitor_mentions = sum(
        sum(1 for comp in competitors
            if r["mentions"].get(comp, {}).get("mentioned", False))
        for r in results
    )
    total = brand_mentions + competitor_mentions
    if total == 0:
        return 0
    return round((brand_mentions / total) * 100, 1)

def build_report_prompt(runs, config):
    brand = config["brand"]["name"]
    competitors = config["brand"]["competitors"]

    weekly_metrics = []
    for run in runs[-8:]:  # Last 8 weeks
        citation_rate = calculate_citation_rate(run, brand)
        comp_share = calculate_competitor_share(run, brand, competitors)
        weekly_metrics.append({
            "week": run["run_date"],
            "citation_rate": citation_rate,
            "competitor_share": comp_share
        })

    return f"""
    You are writing a weekly GEO report for stakeholders at a B2B SaaS company.
    Brand: {brand}
    Competitors tracked: {', '.join(competitors)}

    Weekly metrics (last 8 weeks):
    {json.dumps(weekly_metrics, indent=2)}

    Write a stakeholder-ready GEO performance report in markdown. Include:
    1. Executive summary (3-4 sentences, numbers first)
    2. This week's metrics table (citation rate, citation share, sentiment breakdown)
    3. 4-week trend table showing direction
    4. Top 3 observations (specific, not generic)
    5. Recommended actions (2-3 max, prioritized by impact)

    Tone: clear, direct, no jargon. Write for a CMO who tracks these numbers monthly.
    Format: clean markdown tables, bold key numbers.
    """

def generate_report(config):
    runs = load_all_runs()
    if not runs:
        print("No citation data found. Run monitor.py first.")
        return

    client = Anthropic()
    prompt = build_report_prompt(runs, config)

    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1500,
        messages=[{"role": "user", "content": prompt}]
    )

    report_text = response.content[0].text
    date_str = datetime.date.today().isoformat()
    output_path = f"data/reports/geo-report-{date_str}.md"

    with open(output_path, "w") as f:
        f.write(f"# GEO Report: {date_str}\n\n")
        f.write(report_text)

    print(f"Report saved: {output_path}")
    print("\n--- REPORT PREVIEW ---\n")
    print(report_text[:1000])

if __name__ == "__main__":
    from monitor import load_config
    config = load_config()
    generate_report(config)

Run both scripts back to back:

python3 monitor.py && python3 report.py

The generated markdown report lands in data/reports/. Copy it into Google Docs or Notion for distribution. The format is clean enough to send directly to a CMO without reformatting.

Scheduling with Cron

Add a weekly cron job to automate the full pipeline:

# Run every Monday at 8am
crontab -e

# Add this line:
0 8 * * 1 cd /path/to/geo-reporting && python3 monitor.py && python3 report.py >> data/cron.log 2>&1

The log file captures any API errors or timeouts for debugging. Check it weekly until the pipeline is stable, then ignore it.

Get Weekly Claude Code SEO Tips

Workflows, skills, and tactics for SEO professionals using Claude Code.

No spam. Unsubscribe anytime.

Benchmarking Against Competitors

Competitor benchmarking is the component that converts a GEO report from "interesting data" to "here is why we need budget." Tracking your citation rate in isolation tells leadership you are monitoring. Tracking it against competitors tells them where you stand.

The monitoring script already captures competitor mentions in the mentions field of each result. The benchmarking layer aggregates those mentions into a share-of-voice calculation across the same prompt set.

Competitor Share of Voice Table

For each weekly report, generate a share-of-voice table across all tracked competitors:

| Brand          | Citations (50 prompts) | Citation Rate | Share of Voice |
|----------------|------------------------|---------------|----------------|
| CompetitorA    | 31                     | 62%           | 44%            |
| CompetitorB    | 22                     | 44%           | 31%            |
| Acme Corp      | 14                     | 28%           | 20%            |
| CompetitorC    | 4                      | 8%            | 6%             |

That table, with a 4-week trend column, answers the CMO's question in ten seconds.

Prompt-Level Breakdown

The more detailed view shows which prompt categories your brand wins versus where competitors dominate. Category prompts (no brand names) reveal organic authority. Comparison prompts show head-to-head positioning.

# Ask Claude Code to analyze the breakdown from your JSON data
claude

> Read data/citations/2026-03-06.json and calculate citation rates by prompt category
> (category, comparison, branded) for Acme Corp and all competitors.
> Format as a markdown table.

Claude Code reads the JSON, runs the calculation, and returns the table without any additional scripting. No Python required for ad hoc breakdowns.

Tracking Freshness Decay

Practitioners have observed that citation rates can decline noticeably as content ages, particularly in Perplexity, which weights recency heavily in its ranking signals. Track the publish dates of pages being cited in your monitoring runs. When citation rates drop for specific prompt clusters, check whether the sourced content has aged significantly before assuming a strategic problem.

Add a content_age_check field to your prompt library config that maps each prompt to the most relevant page URL. The monitoring script can flag when citation rates drop on prompts where your primary page is older than 30 days.

FAQ

What is the difference between GEO reporting and traditional SEO reporting?

Traditional SEO reporting tracks ranked positions, organic traffic, and impression counts from search engines. GEO (Generative Engine Optimization) reporting tracks whether AI engines cite your brand in synthesized answers. The data sources, collection methods, and metrics are entirely different. SEO tools show you where you rank in a list. GEO reporting shows you whether an AI model recommends your brand when a user asks for a product or service recommendation.

How many prompts should I track per week?

Start with 30-50 prompts split across category, comparison, and branded query types. This gives statistically meaningful rates without inflating API costs. Expand the prompt library once you have 4-6 weeks of baseline data and know which prompt categories drive the most variance. A prompt library over 150 prompts per platform starts generating more noise than signal at the weekly cadence.

Which AI platforms should I prioritize for GEO monitoring?

Start with ChatGPT and Google AI Overviews. ChatGPT has the largest user base of any AI chat platform and accounts for the majority of measured AI referral traffic in available studies. Google AI Overviews have broad reach through existing Google search users. Perplexity is the third priority for B2B brands because its user base skews toward research-heavy queries where buyer intent is high. Gemini and Grok are lower priority unless your brand operates in categories those platforms favor.

How do I access Google AI Overviews programmatically?

Google does not offer a direct API for AI Overviews responses. Use a SERP API like DataForSEO (approximately $0.01 per query) or SerpApi (from $75/month) that captures AI Overview content in structured JSON alongside standard SERP results. The AI visibility tools comparison covers the full API options with current pricing.

How do I present GEO reports to clients who have never seen this data before?

Lead with the business number, not the methodology. "Your brand appeared in 23% of relevant AI search queries this week, compared to 41% for CompetitorA" lands immediately. Explain the methodology in a footnote or appendix for clients who ask. Most will not. Keep the primary report to one page with three tables: this week's metrics, the 4-week trend, and the competitor share of voice. Attach the full prompt list and raw data as supplemental for transparency.

What does a good citation rate look like?

There is no established industry benchmark yet because the channel is new. An internal benchmark built from your own 8-12 week baseline is more useful than any published figure. As a rough orientation: established consumer brands with strong digital presence typically see 25-40% citation rates across category prompts. B2B SaaS brands in competitive categories often start at 10-20% and move upward with content investment. What matters is the trend direction and the competitor gap, not the absolute number.

The GEO reporting infrastructure described here is a half-day build for most technical SEOs. After that, the weekly cycle runs with minimal human time: cron triggers the scripts, data accumulates, the report generates, and you spend that time on analysis instead of collection.

The full Claude Code for SEO setup includes pre-built versions of the monitoring and report scripts as part of the SEO Command Center kit, with prompt libraries across 12 B2B and e-commerce verticals already included.