Automate Technical SEO Audits with Claude Code

Run crawl-based audits, detect broken schemas, analyze server logs, and fix internal linking issues using Claude Code. Built for technical SEOs.

Author: Vytas Dargis
Last updated: 2026-03-06 · 16 min read

Key Takeaways

  • Crawl analysis is the bottleneck, not the crawl itself. Screaming Frog gives you the data; you still spend hours interpreting it. Claude Code closes that gap.
  • Schema validation at scale means checking JSON-LD across hundreds of pages in one pass, not sampling five and hoping the pattern holds.
  • Server log analysis reveals where Googlebot is spending time versus where you want it to spend time. The gap is your crawl budget problem.
  • Orphan page detection on 10K+ page sites requires joining crawl data, sitemap data, and internal link data simultaneously. Claude Code does this in seconds.
  • Every workflow here runs inside your terminal. No third-party dashboards, no browser tabs, no context switching. See the full setup at CC for SEO Command Center.
  • The time savings compound. Once your audit scripts are in place, monthly re-runs take a fraction of the time. The first setup requires working through the prompts and validating the output against your site's structure.

Technical audits at scale expose a problem that every senior SEO eventually hits: the tools give you rows of data, but reading that data, cross-referencing it, and turning it into a prioritized fix list still falls on a human. That human is you.

The pattern is consistent: crawling is fast, but everything after the crawl -- cross-referencing canonicals with hreflang, joining redirect chains with sitemap status, writing up the prioritized fix list -- is where the hours go. Crawling is automated. The analysis is not.

Claude Code changes the analysis half. You feed it crawl exports, log files, sitemaps, and schema dumps. It reads them, finds the problems, and writes the prioritized fix list. This guide shows you exactly how.

Why Technical Audits Still Take Too Long

Technical SEO audits are slow not because crawling is slow, but because the post-crawl work is entirely manual.

Tools like Screaming Frog, Sitebulb, and DeepCrawl do a capable job of collecting data: status codes, redirect chains, canonical tags, hreflang tags, duplicate content signals. What they cannot do is reason across that data simultaneously. Screaming Frog can tell you a page returns a 200 and has a non-self-referencing canonical. It cannot tell you that this specific combination, on these 340 pages, is collapsing your crawl budget and suppressing the product category you just spent three months building links to.

That reasoning step is where hours go. You export to CSV, open it in Excel or Sheets, build pivot tables, manually join the canonical report with the hreflang report, and try to hold the whole picture in your head long enough to write a coherent recommendation.

Claude Code replaces that reasoning step. You drop the exports into a project directory and ask questions in plain language. It reads across all the files at once.

# Start Claude Code in your audit project directory
cd ~/seo-audits/client-acme
claude

# Then inside the session:
> I have four CSV exports from Screaming Frog in this directory: internal_all.csv,
> response_codes.csv, canonicals.csv, and hreflang.csv. Find every page that
> returns a 200 status code but has a canonical pointing to a different URL.
> List them with the canonical target. Then check whether those canonical targets
> appear in the hreflang report. Flag any where they don't.

Claude Code reads all four files and outputs the list. No pivot tables. No VLOOKUP.
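For a sense of what that cross-reference involves, here is a sketch of the same join in plain Python -- not something you need to write, since Claude Code generates its own analysis, but useful for spot-checking its output. The column names (Address, Status Code, Canonical Link Element 1) follow Screaming Frog's default export headers; adjust them if your exports differ.

```python
import csv

def noncanonical_200s(internal_csv: str, hreflang_csv: str) -> list[dict]:
    """Find pages that return 200 but canonicalize to a different URL,
    and check whether each canonical target appears in the hreflang report.

    Column names (Address, Status Code, Canonical Link Element 1) assume
    Screaming Frog's default export headers.
    """
    with open(hreflang_csv, newline="") as f:
        hreflang_urls = {row["Address"] for row in csv.DictReader(f)}

    flagged = []
    with open(internal_csv, newline="") as f:
        for row in csv.DictReader(f):
            canonical = row.get("Canonical Link Element 1", "").strip()
            if (row["Status Code"] == "200"
                    and canonical
                    and canonical != row["Address"]):
                flagged.append({
                    "url": row["Address"],
                    "canonical_target": canonical,
                    "target_in_hreflang": canonical in hreflang_urls,
                })
    return flagged
```

Running this by hand is exactly the tedium the session replaces, but it gives you a deterministic baseline to validate the model's counts against.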

Crawl-Based Audit with Claude Code

A crawl-based audit with Claude Code uses Screaming Frog or a similar crawler to collect raw data, then feeds that data to Claude Code for cross-referencing and analysis. Claude Code can process multiple export files simultaneously, identify patterns across data sources, and generate a prioritized issue list in seconds.

Start by running your crawl in Screaming Frog with these exports enabled: all internal URLs, response codes, directives (robots/canonical), redirect chains, page titles, and meta descriptions. Save them to one directory.

audit-project/
├── internal_all.csv          # All crawled URLs with metadata
├── response_codes.csv        # Status codes per URL
├── canonicals.csv            # Self-referencing vs. non-self-referencing
├── redirects.csv             # Redirect chains with hop count
├── hreflang.csv              # hreflang annotations per URL
├── sitemap_urls.txt          # Extracted from XML sitemap
└── robots.txt                # Current robots.txt content

Then open Claude Code in that directory and run through the audit in one session:

claude

> Read all CSV files in this directory. Give me a technical audit summary covering:
> 1. All non-200 status codes grouped by type (3xx, 4xx, 5xx) with URL count
> 2. Redirect chains longer than two hops
> 3. Pages with canonical tags pointing outside the domain
> 4. Pages that appear in the sitemap but return non-200 status codes
> 5. Pages with duplicate title tags (group by identical title, show all URLs)
> Output the findings as a markdown report with counts, severity labels
> (Critical / Warning / Info), and recommended fixes per issue type.

Sample output:

## Technical Audit Summary — acme.com

### Critical Issues

**Sitemap URLs returning non-200 status codes** — 47 URLs
Googlebot is being directed to pages that do not resolve correctly.
Action: Update sitemap to include only indexable, 200-status URLs.

**Redirect chains > 2 hops** — 23 chains
Chains of 3-5 redirects found. Longest chain: 5 hops (/old-page-1 → ... → /new-page).
Action: Update source URLs to point directly to final destination.

### Warning Issues

**Non-self-referencing canonicals on 200-status pages** — 340 pages
Pages return 200 but signal to Googlebot to index a different URL.
Review whether these are intentional consolidation signals or misconfigured templates.

**Duplicate title tags** — 18 groups (89 pages total)
[View full list in output below]

This is the same analysis that would otherwise require multiple CSV exports, pivot tables, and manual cross-referencing. The session can compress that significantly -- how much depends on the complexity of the site and how much follow-up questioning the output requires.

For ongoing monitoring, save the session prompts to an AUDIT_PROMPTS.md file in the project directory. Next month's audit re-uses the same prompts against new exports.


Schema Markup Validation at Scale

Schema validation at scale means checking JSON-LD implementation across all page types in a single automated pass, rather than sampling individual pages through Google's Rich Results Test. Claude Code can parse schema markup directly from HTML files or API responses, validate against schema.org specifications, and flag implementation errors across hundreds of pages simultaneously.

Sampling five product pages through the Rich Results Test tells you those five pages work. It tells you nothing about the other 4,000. At scale, the only way to catch schema drift, template regressions, or partial implementations is to fetch and parse every page type programmatically.

Here is a Python script that fetches pages and extracts their JSON-LD. Claude Code writes this for you if you describe the goal.

import requests
from bs4 import BeautifulSoup
import json

def extract_schema(url: str) -> list[dict]:
    """Extract all JSON-LD blocks from a page."""
    try:
        response = requests.get(url, timeout=10, headers={
            "User-Agent": "Mozilla/5.0 (compatible; SEOAuditBot/1.0)"
        })
        soup = BeautifulSoup(response.text, "html.parser")
        schemas = []
        for tag in soup.find_all("script", type="application/ld+json"):
            # tag.string is None when the script tag contains nested nodes
            # or comments, so fall back to get_text() and skip empty blocks
            raw = tag.string or tag.get_text()
            if not raw or not raw.strip():
                continue
            try:
                schemas.append(json.loads(raw))
            except json.JSONDecodeError:
                schemas.append({"error": "invalid_json", "raw": raw[:200]})
        return schemas
    except requests.RequestException as e:
        return [{"error": str(e)}]

# Run against a list of URLs
with open("sitemap_urls.txt") as f:
    urls = f.read().splitlines()

results = {url: extract_schema(url) for url in urls[:500]}

with open("schema_dump.json", "w") as f:
    json.dump(results, f, indent=2)

Run that script. Then take the output into Claude Code:

claude

> I have a schema_dump.json file containing JSON-LD extracted from 500 pages
> of an ecommerce site. Each key is a URL, each value is an array of schema
> objects found on that page.
>
> Find all pages that:
> 1. Have a Product schema but are missing the "offers" property
> 2. Have a Product schema with offers but no "priceCurrency" field
> 3. Have a BreadcrumbList schema where the "item" array has fewer than 2 entries
> 4. Have an Article schema missing "author" or "datePublished"
> 5. Have a JSON parse error (look for objects with an "error" key)
>
> For each issue, list the affected URLs and what's missing.
> Group by issue type. Give me a count per issue.

The output maps every schema error across all 500 pages. You now have a complete remediation list grouped by issue type, ready to hand to a developer.

For sites with templated schema (generated server-side), this analysis can catch template-level bugs in one pass. Fix the template, re-run the script, verify the fix. The cycle is faster than sampling pages manually through the Rich Results Test.
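If you want a deterministic pass you can re-run after each template fix, the five checks from the prompt can also be scripted directly. A sketch against the schema_dump.json format above -- note that schema.org's breadcrumb entries live under itemListElement (the prompt's "item array"), that offers may be an array (this sketch handles the single-object case), and that real pages sometimes use list-valued @type, which a production version would need to handle:

```python
import json

def audit_schemas(dump_path: str) -> dict[str, list[str]]:
    """Re-run the five schema checks deterministically.
    Returns a mapping of issue type -> affected URLs."""
    issues = {"missing_offers": [], "missing_priceCurrency": [],
              "short_breadcrumb": [], "article_missing_fields": [],
              "parse_error": []}
    with open(dump_path) as f:
        dump = json.load(f)
    for url, blocks in dump.items():
        for block in blocks:
            if not isinstance(block, dict):
                continue
            if "error" in block:  # extraction script recorded a failure
                issues["parse_error"].append(url)
                continue
            t = block.get("@type")
            if t == "Product":
                offers = block.get("offers")
                if offers is None:
                    issues["missing_offers"].append(url)
                # offers may also be an array per schema.org; this sketch
                # only inspects the single-object form
                elif isinstance(offers, dict) and "priceCurrency" not in offers:
                    issues["missing_priceCurrency"].append(url)
            elif t == "BreadcrumbList":
                if len(block.get("itemListElement", [])) < 2:
                    issues["short_breadcrumb"].append(url)
            elif t == "Article":
                if "author" not in block or "datePublished" not in block:
                    issues["article_missing_fields"].append(url)
    return issues
```

Claude Code is still the faster way to get the first report; a script like this is for the verify-the-fix loop, where you want identical logic on every run.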

Server Log Analysis for Crawl Budget

Server log analysis for crawl budget involves parsing raw web server access logs to identify which URLs Googlebot crawls, how frequently, and whether that crawl activity aligns with the site's most valuable pages. Claude Code can process large log files, filter for Googlebot activity, and surface crawl inefficiencies that standard crawl tools cannot detect.

Server logs are the only first-party data source that shows exactly what Googlebot did, not what you think it did. GSC coverage reports give you aggregate data. Log files give you every request, timestamped, with status code and bytes transferred.

Most sites with crawl budget problems share the same pattern: Googlebot is spending cycles on faceted navigation, session-based parameters, or stale paginated URLs while under-crawling the content that actually drives revenue.

Parse your logs with Python first, then bring the cleaned data into Claude Code:

import json
import re
from collections import defaultdict

GOOGLEBOT_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d+) \S+.*Googlebot'
)

crawl_data = defaultdict(list)

with open("access.log", "r") as f:
    for line in f:
        match = GOOGLEBOT_PATTERN.search(line)
        if match:
            ip, timestamp, method, path, status = match.groups()
            crawl_data[path].append({
                "timestamp": timestamp,
                "status": int(status),
                "method": method
            })

# Save to JSON for Claude Code analysis
with open("googlebot_crawl.json", "w") as f:
    json.dump(dict(crawl_data), f, indent=2)

Then analyze in Claude Code:

claude

> I have googlebot_crawl.json containing 30 days of Googlebot crawl data.
> Each key is a URL path, each value is an array of crawl events with timestamp
> and HTTP status code.
>
> Answer these questions:
> 1. What are the 20 most-crawled URL paths? Show crawl count and status code breakdown.
> 2. Which URL patterns (group by regex/prefix) account for more than 5% of total crawl activity?
> 3. What percentage of Googlebot requests returned 4xx or 5xx status codes?
> 4. Are there any URL paths being crawled more than 50 times in 30 days that
>    are not likely to be content pages? (Look for patterns like ?sessionid=, /cart,
>    /search?, /tag/, page numbers above 10)
> 5. Which paths appear to be crawled fewer than once per week?
>    These may be under-crawled relative to their importance.
>
> Give me a prioritized crawl budget remediation list based on your findings.

Sample output:

## Crawl Budget Analysis — 30 Days

**Total Googlebot requests:** 142,847
**Unique paths crawled:** 8,203

### Crawl Budget Drains (Critical)

| URL Pattern | Crawl Count | % of Total | Issue |
|---|---|---|---|
| /search?q= | 18,442 | 12.9% | Faceted search — should be noindexed |
| /products?sort= | 9,811 | 6.9% | Sort parameters — canonicalize or block |
| /tag/ | 7,204 | 5.0% | Tag archive pages — low-value, crawled frequently |
| /?ref= | 4,103 | 2.9% | UTM/referral parameters leaking into crawl |

**Recommendation:** Add these patterns to robots.txt Disallow rules,
or consolidate parameter variants with canonical tags.
Blocking these patterns redirects Googlebot toward indexable content.
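Translated into directives, the drains from this sample output could look like the following robots.txt sketch. These patterns are illustrative, not a drop-in recommendation: verify each against your own parameter handling before deploying, and remember that robots.txt blocks crawling but does not remove already-indexed URLs.

```text
User-agent: *
# Internal search results
Disallow: /search?
# Sort-order parameter variants
Disallow: /*?sort=
# Low-value tag archives
Disallow: /tag/
# Referral parameters leaking into crawl
Disallow: /*?ref=
```

Google supports the `*` and `$` wildcards shown here; some other crawlers do not, so test the rules in a robots.txt tester before relying on them.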


Internal Linking Analysis

Internal linking analysis with Claude Code involves combining crawl data, sitemap URLs, and raw HTML link extraction to identify orphan pages, broken internal links, and pages receiving disproportionately low link equity. On sites with 10,000 or more pages, this analysis is impractical to do manually and requires automated cross-referencing of multiple data sources.

On large sites, internal linking problems compound silently. A URL gets migrated, the old link stays in a footer template, the redirect chain grows. A category page gets restructured, orphaning 200 product pages from any navigation path. Googlebot stops visiting them. Rankings drop six months later with no obvious cause.

Finding these problems requires three data sources simultaneously: all pages on the site (crawl or sitemap), all internal links found on those pages (crawl export), and all pages that receive zero internal links (the gap between the first two).

claude

> I have three files in this directory:
> - sitemap_urls.txt: 12,847 URLs extracted from the XML sitemap
> - internal_links.csv: Screaming Frog export of all internal links
>   (columns: Source, Destination, Anchor Text, Status Code)
> - crawled_urls.csv: All URLs discovered during the crawl
>
> Do the following analysis:
>
> 1. ORPHAN PAGES: Find all URLs in sitemap_urls.txt that appear in zero rows
>    as a Destination in internal_links.csv. These pages have no internal links
>    pointing to them. List the first 50 with their URL path prefix grouped
>    (to spot patterns like /blog/2019/ or /products/discontinued/).
>
> 2. BROKEN INTERNAL LINKS: Find all rows in internal_links.csv where
>    Status Code is 404 or 301. Group by Source page.
>    Which source pages have the most broken internal links?
>
> 3. LINK EQUITY CONCENTRATION: Which 10 Destination URLs receive the most
>    internal links? Which 10 receive the fewest (but are in the sitemap)?
>
> 4. ANCHOR TEXT ANALYSIS: For the top 20 most-linked Destination URLs,
>    what is the distribution of anchor text? Are any using generic anchors
>    ("click here", "read more", "here") more than 30% of the time?
>
> Output findings as a markdown report with actionable recommendations.
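If you want to double-check the orphan list Claude Code produces, the core of that first check is a set difference you can run yourself. A sketch, assuming the inlinks export uses the Destination column listed above:

```python
import csv
from pathlib import Path

def find_orphans(sitemap_txt: str, links_csv: str) -> set[str]:
    """Sitemap URLs that never appear as an internal link Destination.
    The 'Destination' column name matches the Screaming Frog inlinks
    export described above -- adjust if your export differs."""
    sitemap_urls = set(Path(sitemap_txt).read_text().splitlines())
    with open(links_csv, newline="") as f:
        linked = {row["Destination"] for row in csv.DictReader(f)}
    return sitemap_urls - linked
```

A real site needs URL normalization first (trailing slashes, http vs. https, URL-encoding), which is exactly the fuzzy matching Claude Code handles for you -- but the set difference is the ground truth to sanity-check against.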

For the orphan page finding specifically, Claude Code can go further. Once it identifies the orphaned URL patterns, ask it to suggest where internal links should be added:

> The orphan page analysis found 340 URLs matching the pattern /resources/guides/*.
> Based on the sitemap structure and URL patterns, suggest which existing pages
> should add internal links to these guides. Prioritize pages that already
> rank for related topics based on the URL structure.

This does not replace keyword-level link targeting, but it identifies the structural gaps that prevent Googlebot from even discovering the content. Fix the structure first.

The full workflow for ongoing internal link monitoring: run the analysis monthly against fresh Screaming Frog exports. Save the Claude Code session prompts to a PROMPTS.md. Each month's analysis re-uses the same prompts against new data, which could cut the time significantly compared to doing it manually from scratch.

For the complete skill file that automates this workflow, including the Python scripts and Claude Code prompts packaged together, see the Claude Code SEO skills library.


FAQ

How is this different from just using Screaming Frog's built-in analysis?

Screaming Frog applies fixed rules to single data sources. Claude Code applies reasoning across multiple data sources simultaneously. Screaming Frog tells you a canonical is non-self-referencing. Claude Code tells you which of those non-self-referencing canonicals are on pages that also appear in your sitemap, fail hreflang validation, and receive zero internal links. That compound analysis is where the actual insight lives.

How large of a site can Claude Code handle?

Claude Code reads files, not URLs. The bottleneck is file size, not page count. CSV exports from a 50,000-page crawl can exceed Claude's context window if you try to load them all at once. For large sites, split exports by URL path prefix (e.g., one CSV per subdirectory or page type) and analyze in batches. The session prompts stay the same.
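A minimal splitter for that batching step might look like this, assuming the export keeps URLs in Screaming Frog's default Address column (adjust the column name if yours differs):

```python
import csv
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlparse

def split_by_prefix(big_csv: str, url_column: str = "Address") -> list[str]:
    """Split a large crawl export into one CSV per top-level path
    segment, written alongside the input file, so each batch fits
    comfortably in a single Claude Code session."""
    rows_by_prefix = defaultdict(list)
    with open(big_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            path = urlparse(row[url_column]).path
            # first path segment, e.g. /blog/post -> "blog"
            prefix = path.strip("/").split("/")[0] or "root"
            rows_by_prefix[prefix].append(row)
    written = []
    for prefix, rows in rows_by_prefix.items():
        out = Path(big_csv).parent / f"crawl_{prefix}.csv"
        with open(out, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(str(out))
    return written
```

Each output file then gets its own session with the same saved prompts, and you merge the markdown reports at the end.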

Do I need to write Python to use these workflows?

No. Describe the goal in plain language inside a Claude Code session and it writes the scripts. "Write a Python script that reads access.log, filters for lines containing 'Googlebot', and saves a JSON file with each URL path as a key and an array of objects as the value." That is enough. It handles the regex, the file I/O, and the JSON serialization.

How do I validate that Claude Code's analysis is accurate?

Spot-check five to ten of the flagged items manually against the source CSV. If the pattern holds, the rest are reliable. Pay particular attention to any count that seems dramatically high or low. LLMs occasionally misread CSV structure, especially files with inconsistent quoting or special characters in URLs. A quick sanity check on the totals takes two minutes.

Can I use this with log files from any web server?

The parsing script above targets common log format (Apache/Nginx default). For IIS or CDN logs (Cloudflare, Fastly, Akamai), the field order differs. Give Claude Code a sample of five to ten log lines and ask it to write a parser that matches your format. It reads the pattern and adapts the regex.

How does this fit into a broader technical audit workflow?

Use this alongside your existing tools, not instead of them. Screaming Frog or Sitebulb for the crawl. Python for log parsing and schema extraction. Claude Code for cross-file reasoning and output generation. The full workflow from crawl to prioritized fix list runs in CC for SEO Command Center, which packages the scripts, prompts, and CLAUDE.md configs for technical audit work. The underlying setup is covered in How to Turn Claude Code Into Your SEO Command Center.

Vytas Dargis

Founder, CC for SEO

Martech PM & SEO automation builder. Bridges marketing, product, and engineering teams. Builds CC for SEO to help SEO professionals automate workflows with Claude Code.
