
Programmatic SEO with Claude Code

Generate hundreds of SEO-optimized pages from data using Claude Code. Build templates, validate content quality, and deploy programmatic SEO at scale.

Last updated: 2026-03-06 · 14 min read

Key Takeaways

  • Template quality is the constraint: Generating 10,000 pages is easy. Generating 10,000 pages Google won't treat as thin content is the hard part.
  • Claude Code closes the quality gap by generating unique, contextually-aware content per page rather than variable substitution in a string template.
  • Data preparation determines page quality before a single line of content gets written — garbage in, garbage out applies at every scale.
  • Automated validation catches failure modes early: Check uniqueness scores, meta tag completeness, schema validity, and internal link saturation before deployment.
  • Monitoring at scale requires a different mental model: Track page cohorts by template type, not individual URLs. See which Claude Code SEO skills apply to rank analysis across thousands of pages.
  • The biggest risk is Google's helpful content system — bulk-generated pages that add no real-world value get devalued in aggregate, not just individually.

Programmatic SEO is having a second moment. Zapier has built 100,000+ integration pages targeting tool-specific queries. Nomadlist generates city comparison pages from a database. G2 builds product review landing pages at scale. The pattern works — when the underlying data has genuine value per page.

What's changed in 2026 is the tooling. Earlier workflows required custom scripts for every content variation. Claude Code compresses that into a feedback loop where you describe the output format, hand it structured data, and iterate on content quality without switching contexts.

This post covers the full implementation: data structure, template design, content generation with quality controls, pre-deployment validation, and post-launch monitoring. All commands are tested on real datasets.

What Programmatic SEO Looks Like in 2026

Programmatic SEO is a content strategy that uses structured data to generate large volumes of unique, indexable pages targeting long-tail search queries at scale. Each page targets a specific query variant — city + service, product + use case, tool + integration — where no single page would justify manual creation but the collection captures substantial search demand.

The template-plus-data model has been viable for years. The shift in 2026 is quality. Google's helpful content system now evaluates content at a site level, meaning a large batch of thin pages drags down the ranking potential of your strong editorial content. The threshold for "unique value per page" has moved up, and pure variable substitution in templates rarely clears it.

Programmatic SEO now requires two things to coexist: systematic page generation and per-page content that genuinely varies based on the underlying data. Claude Code handles the second part — generating content that reads and performs differently across pages because the input data is actually different, not because you shuffled synonyms.

The sites that fail at programmatic SEO in 2026 are generating pages where the only difference is a city name. The sites that succeed have differentiated data per entity: local stats, specific product features, real pricing differences, actual user reviews. Claude Code helps you write to those differences at scale.

Data Preparation and Template Design

Data preparation is the highest-leverage step in a programmatic SEO build. The quality of your page content is capped by the quality and completeness of your data. Richer data per entity produces more differentiated content and stronger pages.

Start by auditing your data source for the fields that will drive content differentiation. A city-based service page with only city_name, state, and population produces thin content. Add avg_project_cost, local_competitor_count, license_requirements, seasonal_demand_index, and top_local_neighborhoods, and you have material for a page that actually differs city to city.

Structuring Your Data Source

Store your entities in a flat JSON array or SQLite database. JSON works for datasets under 50,000 rows. SQLite handles anything larger and lets you run queries to validate data completeness before generation.

[
  {
    "city": "Austin",
    "state": "TX",
    "state_abbr": "TX",
    "population": 978908,
    "avg_project_cost_usd": 4200,
    "top_neighborhoods": ["South Congress", "East Austin", "Domain"],
    "license_body": "Texas Department of Licensing and Regulation",
    "seasonal_peak": "spring",
    "local_competitor_count": 47,
    "median_income": 71000,
    "climate_zone": "hot-humid"
  }
]

Check data completeness before you write a single template:

# Count records with missing critical fields
cat data/cities.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
missing = [r['city'] for r in data if not r.get('avg_project_cost_usd')]
print(f'{len(missing)} records missing avg_project_cost_usd')
print(missing[:10])
"

Designing the Template

Your template is a prompt structure, not a string with {{placeholders}}. The distinction matters. A string template substitutes values and returns identical sentence structures across pages. A prompt template gives Claude Code the entity data and describes the target output format — Claude generates content that fits the data, not the other way around.

TEMPLATE = """
You are writing a service page for {city}, {state}.

Entity data:
{entity_json}

Write the following sections. Each must use the specific data provided:

1. HEADLINE (H1): Under 60 characters. Include city name and primary service keyword.
2. INTRO (2 paragraphs): Lead with a city-specific fact. Reference local cost data.
3. LOCAL_CONTEXT (1 paragraph): Use the top neighborhoods. Reference seasonal demand.
4. REQUIREMENTS (1 paragraph): Cite the correct licensing body for this state.
5. PRICE_SECTION (2 sentences): State the local average cost. Give a price range ±20%.

Output as JSON with keys: headline, intro, local_context, requirements, price_section.
Do not add content not grounded in the entity data.
"""

The instruction "do not add content not grounded in the entity data" is load-bearing. Without it, Claude generates plausible-but-fabricated local details — exactly what gets programmatic sites penalized.

Content Generation with Quality Controls

Content generation for programmatic SEO requires treating each page as a structured output problem, not a writing task. Claude Code generates content that is verifiably grounded in the source data when you constrain the output format and validate before accepting.

The generation loop runs entity-by-entity, validates output structure, checks for content uniqueness against already-generated pages, and flags anomalies for human review before writing to disk.


The Generation Script

import json
import anthropic
import hashlib
from pathlib import Path

client = anthropic.Anthropic()

def generate_page(entity: dict, template: str) -> dict:
    prompt = template.format(
        city=entity["city"],
        state=entity["state"],
        entity_json=json.dumps(entity, indent=2)
    )

    message = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

    raw = message.content[0].text
    # Strip markdown code fences if present
    if raw.startswith("```"):
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]

    return json.loads(raw.strip())

def content_fingerprint(content: dict) -> str:
    # Hash key sections to detect near-duplicate output
    blob = content.get("intro", "") + content.get("local_context", "")
    return hashlib.md5(blob.encode()).hexdigest()

def run_generation(entities_path: str, output_dir: str, template: str,
                   batch_size: int = 50):
    entities = json.loads(Path(entities_path).read_text())
    out = Path(output_dir)
    out.mkdir(exist_ok=True)

    seen_fingerprints = set()
    errors = []
    generated = 0

    for entity in entities:
        if generated >= batch_size:
            break  # Stop after one batch; rerun to continue

        city_slug = entity["city"].lower().replace(" ", "-")
        out_path = out / f"{city_slug}.json"

        if out_path.exists():
            continue  # Resume support

        try:
            content = generate_page(entity, template)
            fp = content_fingerprint(content)

            if fp in seen_fingerprints:
                errors.append({"city": entity["city"], "error": "duplicate_fingerprint"})
                continue

            seen_fingerprints.add(fp)
            content["_entity"] = entity  # Attach source data for validation
            out_path.write_text(json.dumps(content, indent=2))
            generated += 1
            print(f"OK: {entity['city']}")

        except json.JSONDecodeError as e:
            errors.append({"city": entity["city"], "error": f"json_parse: {e}"})
        except Exception as e:
            errors.append({"city": entity["city"], "error": str(e)})

    Path("errors.json").write_text(json.dumps(errors, indent=2))
    print(f"\nDone. {len(errors)} errors. See errors.json.")

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--entities", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--batch-size", type=int, default=50)
    args = parser.parse_args()
    # TEMPLATE is the prompt template defined in the previous section
    run_generation(args.entities, args.output, TEMPLATE, args.batch_size)

Run it in batches to manage API rate limits:

python3 generate.py --entities data/cities.json --output generated/ --batch-size 50

Handling Errors at Scale

Generation errors cluster around two patterns: malformed JSON output (Claude sometimes wraps in markdown fences despite instructions) and missing required sections (entity data was incomplete). Both are detectable before you write to disk.

Add a retry with a stricter prompt for JSON errors. Flag incomplete-data entities to a separate queue for human review rather than generating thin content automatically.

SEO Validation Before Deployment

Pre-deployment validation is the difference between a programmatic build that ranks and one that triggers a manual review. Automated checks catch meta tag issues, schema errors, duplicate content, and internal linking gaps before a single URL gets indexed.

Run validation as a separate pass over the generated files, not inline with generation. Separating the steps means you can re-validate without re-generating, and you get a clean audit log per batch.

Validation Script

import json
from pathlib import Path

REQUIRED_FIELDS = ["headline", "intro", "local_context", "requirements", "price_section"]
MIN_INTRO_WORDS = 80
MAX_HEADLINE_CHARS = 60

def validate_page(content: dict, city: str) -> list[str]:
    issues = []

    # Field completeness
    for field in REQUIRED_FIELDS:
        if not content.get(field):
            issues.append(f"MISSING_FIELD:{field}")

    # Headline length
    headline = content.get("headline", "")
    if len(headline) > MAX_HEADLINE_CHARS:
        issues.append(f"HEADLINE_TOO_LONG:{len(headline)}")

    if city.lower() not in headline.lower():
        issues.append("HEADLINE_MISSING_CITY")

    # Intro length
    intro_words = len(content.get("intro", "").split())
    if intro_words < MIN_INTRO_WORDS:
        issues.append(f"INTRO_TOO_SHORT:{intro_words}_words")

    # Grounding check — does city name appear in local content?
    local = content.get("local_context", "")
    if city.lower() not in local.lower():
        issues.append("LOCAL_CONTEXT_NOT_LOCALIZED")

    return issues

def run_validation(generated_dir: str):
    results = []
    files = list(Path(generated_dir).glob("*.json"))

    for f in files:
        content = json.loads(f.read_text())
        city = content.get("_entity", {}).get("city", f.stem)
        issues = validate_page(content, city)
        results.append({"file": f.name, "city": city, "issues": issues, "pass": len(issues) == 0})

    passed = sum(1 for r in results if r["pass"])
    print(f"{passed}/{len(results)} pages passed validation")

    failures = [r for r in results if not r["pass"]]
    Path("validation-report.json").write_text(json.dumps(failures, indent=2))
    return results

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--dir", default="generated/")
    args = parser.parse_args()
    run_validation(args.dir)

python3 validate.py --dir generated/
# 847/900 pages passed validation
# 53 failures written to validation-report.json

Investigate failure patterns in the report. If 40 of 53 failures are INTRO_TOO_SHORT, the issue is the prompt, not the data. Fix the template and regenerate that cohort.
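One low-friction way to regenerate a cohort is to delete its output files and rerun the resumable generation loop. A sketch that reads the validation-report.json produced above; the `issue_prefix` filter is an assumption about how you'd target a single failure class:

```python
import json
from pathlib import Path

def clear_failed_pages(report_path: str, generated_dir: str,
                       issue_prefix: str = "") -> list:
    """Delete generated files flagged in the validation report so the
    resumable generation loop rebuilds them on the next run."""
    failures = json.loads(Path(report_path).read_text())
    cleared = []
    for record in failures:
        # Optionally target one failure class, e.g. "INTRO_TOO_SHORT"
        if issue_prefix and not any(
            issue.startswith(issue_prefix) for issue in record["issues"]
        ):
            continue
        path = Path(generated_dir) / record["file"]
        if path.exists():
            path.unlink()
            cleared.append(record["file"])
    return cleared
```

After fixing the template, call `clear_failed_pages("validation-report.json", "generated/", "INTRO_TOO_SHORT")` and rerun the generation script; only the cleared files regenerate.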

Schema Validation

Generate JSON-LD schema per page and validate it before deployment. For a local service page, the minimum viable schema is LocalBusiness with address, areaServed, priceRange, and aggregateRating if you have review data.

def generate_schema(entity: dict, content: dict) -> dict:
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": f"[Service] in {entity['city']}, {entity['state_abbr']}",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": entity["city"],
            "addressRegion": entity["state_abbr"],
            "addressCountry": "US"
        },
        "areaServed": entity["city"],
        "priceRange": f"${int(entity['avg_project_cost_usd'] * 0.8)}-${int(entity['avg_project_cost_usd'] * 1.2)}"
    }

Validate the output with Google's Rich Results Test API or a local schema validator before submitting your sitemap.
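A cheap local sanity check before the external validator catches structural misses early. A minimal sketch; the required-key list mirrors the generate_schema output above:

```python
def check_schema(schema: dict) -> list:
    """Flag structural problems in a LocalBusiness JSON-LD snippet."""
    issues = []
    for key in ("@context", "@type", "name", "address", "priceRange"):
        if key not in schema:
            issues.append(f"MISSING:{key}")
    address = schema.get("address", {})
    if address.get("@type") != "PostalAddress":
        issues.append("ADDRESS_TYPE_INVALID")
    return issues
```

This only confirms the structure you generated; it does not replace the Rich Results Test, which checks eligibility against Google's current requirements.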


Monitoring and Iteration at Scale

Monitoring thousands of programmatic pages requires thinking in cohorts, not individual URLs. You cannot meaningfully track 10,000 URLs in a spreadsheet. Group pages by template type, data tier, or geographic cluster and measure cohort-level performance.

Set up a monitoring query in Google Search Console that pulls performance data grouped by URL pattern. For a city-based build at /services/{city}/, filter for that path prefix and analyze aggregate impressions, clicks, and average position across the cohort.

Pulling Cohort Data with Claude Code

Set up a GSC fetcher (see the SEO Command Center guide for full setup) and query by URL pattern:

def fetch_programmatic_cohort(service, site_url, url_prefix, start_date, end_date):
    response = service.searchanalytics().query(
        siteUrl=site_url,
        body={
            "startDate": start_date,
            "endDate": end_date,
            "dimensions": ["page", "query"],
            "dimensionFilterGroups": [{
                "filters": [{
                    "dimension": "page",
                    "operator": "contains",
                    "expression": url_prefix
                }]
            }],
            "rowLimit": 25000
        }
    ).execute()
    return response.get("rows", [])

Feed the output to Claude Code and ask:

Analyze this GSC data for our /services/ programmatic pages. Group by state. Which state cohorts have impressions but near-zero clicks? Which have strong CTR but low average position? Identify the 20 pages with the highest impression-to-click gap.

That surfaces two distinct problems: pages with poor title tags (high impressions, low CTR) and pages stuck on page 2-3 (strong CTR per impression, weak position). Each needs a different fix.

Identifying Iteration Priorities

Look for patterns in underperforming pages before rewriting content. Underperformers often cluster around data quality, not content quality. A cohort of pages with consistently low impressions often indicates low search volume for that entity type — not a content problem at all.

Build a simple scoring model in Python:

def score_pages(gsc_rows: list) -> list:
    scored = []
    for row in gsc_rows:
        page = row["keys"][0]
        clicks = row.get("clicks", 0)
        impressions = row.get("impressions", 0)
        position = row.get("position", 100)
        ctr = row.get("ctr", 0)

        # Priority score: high impressions + low CTR = title/meta fix
        # High impressions + low position = content depth fix
        title_fix_score = impressions * (1 - ctr) if impressions > 100 else 0
        content_fix_score = impressions * (position / 10) if position > 10 else 0

        scored.append({
            "page": page,
            "clicks": clicks,
            "impressions": impressions,
            "position": round(position, 1),
            "title_fix_priority": round(title_fix_score),
            "content_fix_priority": round(content_fix_score)
        })

    return sorted(scored, key=lambda x: x["title_fix_priority"], reverse=True)

Run this monthly. Export to JSON, load in Claude Code, and ask it to identify which pages warrant a content depth expansion versus a title/meta rewrite. That's a fundamentally different task at the per-page level — the full skills library covers both workflows.

Protecting Your Strong Content

A large programmatic build creates aggregate risk for your domain's content quality signals. Monitor your non-programmatic pages during and after a major programmatic push. If your editorial content starts losing position, the programmatic pages may be pulling the domain quality signal down.

The fix is triage, not deletion. Pages with zero impressions after 90 days of indexing are candidates for noindex or consolidation. Pages with impressions but zero clicks need title/meta work. Pages with clicks but positions 11-30 need content depth work. Delete only what has no search demand and no traffic pathway.
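Those triage rules translate directly into a bucketing function over the GSC rows pulled earlier. A sketch, assuming the export's date range already covers the 90-day window:

```python
def triage(rows: list) -> dict:
    """Bucket pages by the triage rules: noindex candidates,
    title/meta fixes, and content depth fixes."""
    buckets = {"noindex_candidate": [], "title_meta_fix": [],
               "content_depth_fix": [], "healthy": []}
    for row in rows:
        page = row["keys"][0]
        impressions = row.get("impressions", 0)
        clicks = row.get("clicks", 0)
        position = row.get("position", 100)

        if impressions == 0:
            buckets["noindex_candidate"].append(page)
        elif clicks == 0:
            buckets["title_meta_fix"].append(page)
        elif 11 <= position <= 30:
            buckets["content_depth_fix"].append(page)
        else:
            buckets["healthy"].append(page)
    return buckets
```

Review the noindex bucket manually before acting on it; a page can show zero impressions simply because it was indexed late.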

Track indexed page count in Google Search Console's Page Indexing report. A large spike in "Crawled - currently not indexed" is an early signal that Google's quality threshold for your template isn't being met. Address it at the template level, not page by page.

FAQ

How many programmatic pages can I generate before Google penalizes my site?

There is no published page-count threshold. Google's helpful content guidance evaluates whether pages serve a genuine informational need — not how many pages exist. A smaller set of well-differentiated pages with rich per-entity data will generally outperform a large batch of thin ones. A practical starting point is 100-500 pages: it's enough to get indexing signal without committing your whole domain's quality reputation to an untested template. Monitor indexing rate and impression growth over 60 days before scaling further.

What is the minimum viable data per entity to avoid thin content?

Thin content in a programmatic context means the same factual claims appear on every page with only the entity name swapped. The minimum to avoid this is 4-6 data points that are genuinely different per entity and materially change the content: local pricing, regional regulations, geographic specifics, historical performance data, or demographic context. If your dataset has fewer than 4 differentiating fields per entity, the content will be thin regardless of how much prose Claude generates around it.

Should I use Claude Code for content generation or a dedicated template engine?

Use Claude Code for content that needs to read as genuinely distinct per page — paragraphs, context sections, explanatory copy. Use a template engine (Jinja2, Handlebars, or simple Python f-strings) for structural HTML, meta tags, schema markup, and breadcrumbs. The hybrid approach keeps API costs down and reserves Claude's generation for the parts where it adds actual value.

How long does it take to generate 1,000 pages?

With claude-opus-4-5 and a 1,024-token output limit, generation time per page varies with prompt complexity and API load — expect a range of seconds per page, not minutes. Build in a resumable generation loop (skip already-generated files) and run batches overnight. API costs depend on current Anthropic pricing, prompt length, and output size — check the Anthropic pricing page before budgeting a large run. For cost-sensitive builds, start with Haiku (see the model selection FAQ below).

How do I handle duplicate content across programmatic pages?

Duplicate content in a programmatic build comes from two sources: entities with identical underlying data and generation outputs that converge to similar phrasing. Solve the first with data auditing before generation. Solve the second with the fingerprinting approach above — hash key sections and flag collisions. For near-duplicates that pass the exact-match check, run cosine similarity on the intro paragraphs. Anything above 0.92 similarity warrants regeneration with a more specific prompt constraint.
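The cosine check needs no ML dependency at intro-paragraph length. A stdlib-only sketch using bag-of-words vectors, with the 0.92 threshold suggested above:

```python
import math
import re
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two text blocks."""
    va = Counter(re.findall(r"[a-z']+", a.lower()))
    vb = Counter(re.findall(r"[a-z']+", b.lower()))
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def flag_near_duplicates(intros: dict, threshold: float = 0.92) -> list:
    """Return slug pairs whose intro similarity exceeds the threshold."""
    slugs = sorted(intros)
    return [
        (a, b) for i, a in enumerate(slugs) for b in slugs[i + 1:]
        if cosine_similarity(intros[a], intros[b]) > threshold
    ]
```

The pairwise loop is O(n²), which is fine up to a few thousand pages; beyond that, shard by state or template type before comparing.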

Which Claude model should I use for programmatic generation at scale?

claude-haiku-4-5 is sufficient for well-constrained templates where the entity data is rich and the output format is strict JSON. It is significantly cheaper and faster. Use claude-opus-4-5 for templates where the content needs genuine reasoning about the data — combining multiple data points into coherent insights rather than restating them. For most city-service or product-integration page types, start with Haiku, evaluate output quality on a 50-page sample, and upgrade to Opus only where Haiku's output consistently underperforms on the validation checks.

Vytas Dargis

Founder, CC for SEO

Martech PM & SEO automation builder. Bridges marketing, product, and engineering teams. Builds CC for SEO to help SEO professionals automate workflows with Claude Code.
