How to Make Your Squarespace Blog Accessible to AI Bots (and Connect It to a Custom GPT)

As AI search tools and custom GPTs become part of how people discover and interact with content, it’s increasingly important for creators to make their writing readable by machines as well as humans.

Squarespace is an elegant platform for human-facing design, but it hides a critical limitation: by default, it blocks nearly all AI crawlers. If you want your blog posts to appear in AI-powered search results or to train your own custom GPT, you’ll need to create a parallel, machine-readable version of your content.

This guide explains how to export your Squarespace blogs, host them publicly on GitHub Pages, and connect them to a custom GPT.

Step 1: Export Your Blog Content from Squarespace

Squarespace doesn’t offer a content API that custom GPTs can read, and its built-in RSS feeds sit behind the same crawler restrictions, so the first step is to extract your existing content.

  1. Log into your Squarespace dashboard.

  2. Go to Settings → Advanced → Import / Export → Export.

  3. Choose WordPress format and download the .xml file.

This XML file contains every post title, slug, publication date, and full HTML body for your blog.
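
Before converting anything, it helps to sanity-check the export. Here is a minimal sketch, assuming the file is saved as squarespace.xml (the same name used in the conversion script below):

import xml.etree.ElementTree as ET

# Parse the WordPress-format export and list a few of the posts it contains.
channel = ET.parse("squarespace.xml").getroot().find("channel")
items = channel.findall("item")
print(f"Found {len(items)} items in the export")

for item in items[:5]:
    title = item.findtext("title", default="(untitled)")
    link = item.findtext("link", default="")
    print(f"- {title}  {link}")

If the item count roughly matches your number of published posts, the export is complete and ready to convert.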

Step 2: Convert the Exported File into Markdown

AI models read plain text or Markdown far more easily than Squarespace’s HTML-heavy format. You can convert your XML file into Markdown using a simple script or an online converter.

Online option: https://xml2md.vercel.app

Local script example (Python; requires the beautifulsoup4, markdownify, and lxml packages):

from bs4 import BeautifulSoup
from markdownify import markdownify as md

# Parse the WordPress-format export (the "xml" parser requires lxml).
with open("squarespace.xml", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "xml")

for item in soup.find_all("item"):
    body = item.find("content:encoded")
    if body is None or not body.text.strip():
        continue  # skip items without a body, such as attachments

    title = item.title.text
    # Post URLs usually end in a trailing slash, so strip it before
    # taking the last path segment as the slug.
    slug = item.link.text.rstrip("/").split("/")[-1]

    content = md(body.text)  # convert the HTML body to Markdown
    with open(f"{slug}.md", "w", encoding="utf-8") as out:
        out.write(f"# {title}\n\n{content}")

This produces a clean set of .md files—one for each post—ready for publication.

Step 3: Create a GitHub Pages Site for Your Corpus

GitHub Pages is ideal for hosting text that you want to be visible to both humans and AI crawlers.

  1. Go to https://github.com/new and create a new public repository. Naming it yourusername.github.io keeps the URLs short (the examples below assume this); any other name, such as francesca-tabor-blog, also works, but the site is then served under /repository-name/.

  2. Add a folder named /posts/ and upload your Markdown files.

  3. In the repository’s Settings → Pages, enable GitHub Pages by selecting:

    • Source: Deploy from a branch

    • Branch: main

    • Folder: / (root)

Your corpus will be live at:

https://yourusername.github.io/posts/

(If you chose a different repository name, it will be at https://yourusername.github.io/repository-name/posts/ instead.)

Each Markdown file will be a publicly accessible version of your writing.

Step 4: Make the Corpus AI-Crawlable

At the root of your repository, create a robots.txt file. (Crawlers only honor robots.txt at the domain root, which is another reason to keep the corpus in a repository named yourusername.github.io.)

User-agent: *
Allow: /
Sitemap: https://yourusername.github.io/sitemap.xml

You can also generate a simple sitemap.xml listing your posts. These two files make it clear to AI crawlers that they have permission to index your content.
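
Here is a minimal generator sketch for that sitemap, assuming your Markdown files sit in a local posts/ folder and the site lives at https://yourusername.github.io:

from pathlib import Path

BASE_URL = "https://yourusername.github.io"  # adjust to your Pages URL

# One <url> entry per Markdown post in the posts/ folder.
entries = [
    f"  <url><loc>{BASE_URL}/posts/{path.name}</loc></url>"
    for path in sorted(Path("posts").glob("*.md"))
]

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + "\n".join(entries)
    + "\n</urlset>\n"
)

Path("sitemap.xml").write_text(sitemap, encoding="utf-8")
print(f"Wrote sitemap.xml with {len(entries)} URLs")

Commit the generated sitemap.xml to the repository root alongside robots.txt, and re-run the script whenever you add new posts.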

Step 5: Link the Corpus to Your Custom GPT

You now have a public, text-based version of your blog suitable for machine consumption. You can integrate it with a custom GPT in several ways.

Option A – Upload directly

When building your Custom GPT in ChatGPT:

  1. Open the Knowledge section.

  2. Upload your Markdown files directly.

  3. Add this instruction:

    “You are trained on Francesca Tabor’s corpus hosted at francesca-tabor.github.io/posts. Cite the relevant post when providing information.”

This creates a self-contained GPT that understands your entire blog archive.

Option B – Reference live URLs

If you prefer not to upload, give the GPT access to live URLs:

“When referencing Francesca Tabor’s writing, use content from
https://francesca-tabor.github.io/posts/. Each file corresponds to one blog article.”

Models with browsing capabilities will pull the data directly from those pages.

Option C – API integration (advanced)

For more technical setups, add a simple index.json file to your repository:

[
  {
    "title": "How to Check if Squarespace Blocks AI Bots",
    "url": "https://francesca-tabor.github.io/posts/how-to-check-squarespace-bots.md",
    "tags": ["ai", "squarespace", "seo"],
    "date": "2025-11-07"
  }
]

This allows external tools and GPT actions to retrieve structured data from your corpus automatically.
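
As a sketch of how an external tool might consume that index, the following uses the requests library; the index.json location and its fields are the hypothetical ones shown above:

import requests

INDEX_URL = "https://francesca-tabor.github.io/index.json"  # assumed location at the repository root

# Fetch the index, then pull the Markdown body of each listed post.
index = requests.get(INDEX_URL, timeout=10).json()

for entry in index:
    post = requests.get(entry["url"], timeout=10)
    post.raise_for_status()
    print(f'{entry["date"]}  {entry["title"]}  ({len(post.text)} characters)')

A GPT action would consume the same files through an OpenAPI schema pointing at these URLs.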

Step 6: Keep Squarespace for Humans, GitHub for Machines

You don’t need to abandon Squarespace. Keep it as the polished, design-driven site for human visitors. Your GitHub Pages site simply mirrors your written work in a clean, machine-readable format that AI crawlers and custom GPTs can access.

This dual-layer structure achieves both goals:

  • Squarespace: visual presentation, brand, and conversions.

  • GitHub Pages: structured, permanent, AI-visible knowledge base.

Step 7: Verify Access

Finally, confirm that AI crawlers can see your content.

  1. Open your robots.txt and sitemap.xml URLs in a browser to confirm they are being served.

  2. Check OpenAI’s GPTBot documentation at https://openai.com/gptbot for the current user-agent string and confirm your robots.txt doesn’t disallow it (the file from Step 4 allows all agents).

  3. Optionally, test a specific post from the command line (the bare /posts/ URL returns 404 on GitHub Pages unless the folder contains an index file):

    curl -I -A "GPTBot" https://francesca-tabor.github.io/posts/how-to-check-squarespace-bots.md

If the response is 200 OK, your corpus is accessible.
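
If you created the index.json from Step 5, you can also verify every post in one pass. A small sketch that sends GPTBot’s user-agent string and reports each status code (the index location is the one assumed earlier):

import requests

INDEX_URL = "https://francesca-tabor.github.io/index.json"  # assumed location
HEADERS = {"User-Agent": "GPTBot"}  # identify the request as OpenAI's crawler

# HEAD-request every post listed in the index and print its status code.
for entry in requests.get(INDEX_URL, timeout=10).json():
    response = requests.head(entry["url"], headers=HEADERS, timeout=10)
    print(f'{response.status_code}  {entry["url"]}')

A 200 for every URL means the pages are reachable; note this confirms only that they are served, not that any particular crawler has chosen to index them.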

Conclusion

Squarespace is excellent for presentation but restrictive for AI indexing.
By exporting your content, converting it to Markdown, and publishing it on GitHub Pages, you create a second layer of visibility—one built for AI models, APIs, and custom GPTs.

This approach ensures that your ideas and writing are not only read by people but also understood, referenced, and cited by the next generation of intelligent systems.