Escaping Walled Chat Gardens—A Chat History Converter
All my AI conversations were trapped inside the vendors' chat apps, with no way for my PKM to reach them. Here is a converter that turns your exported chat histories into local files for your knowledge management system.
A Problem: Your AI Conversations Are Trapped
Over the past few years, I've accumulated numerous conversations across multiple AI assistants—brainstorming sessions, technical deep-dives, creative projects, research threads, "help me with this recipe" inquiries, and even troubleshooting the random error code on my LG clothes dryer. These conversations constitute an intellectual archive, yet they're scattered across proprietary platforms with no guarantee of long-term access.
What happens when:
- A vendor changes their terms of service.
- You want to revisit that dialog about fountain pen nib design from last fall, but can't remember which AI it was with.
- A vendor starts putting out content, or supporting initiatives, that you find objectionable.
- You want to search across all your AI conversations at once.
- Vendor X suddenly has the "best" model and everyone runs over there to try it. Do you start from scratch?
This is much like the vendor lock-in on social media platforms (the thing the indieweb folks fight against), so I decided to take a crack at solving it: using the latest and greatest LLM tools, I built a conversion pipeline that funnels chat exports from multiple providers into my own Personal Knowledge Management (PKM) system. I've been a supporter of Obsidian as my PKM tool since it went public, because it is built on Markdown and local files: I can work offline, and I actually own and control my content rather than having it sit on someone's server somewhere (see above on vendor lock-in).
TL;DR
Here is the GitHub Repository for the Python code that formats downloaded chat histories into local Markdown suitable for Obsidian. Building it convinced me that there should be a standard format for exported chat content.
The Inputs: Three Providers, Three Philosophies, Endless Challenges
Each AI provider takes a remarkably different approach to data export, reflecting their broader philosophies about user data ownership.
ChatGPT: The Tree Structure
OpenAI provides a proper data export through their settings panel, delivering a conversations.json file. However, the format reveals ChatGPT's underlying architecture:
```json
{
  "mapping": {
    "node-id-1": {
      "parent": null,
      "children": ["node-id-2"],
      "message": { "author": {"role": "user"}, "content": {...} }
    },
    "node-id-2": {
      "parent": "node-id-1",
      "children": ["node-id-3", "node-id-4"],
      "message": { "author": {"role": "assistant"}, "content": {...} }
    }
  }
}
```
Conversations aren't linear arrays—they're trees. This accommodates ChatGPT's "regenerate response" and "edit and resubmit" features, where a single user message might branch into multiple assistant responses. Parsing requires tree traversal to reconstruct the canonical conversation path.
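For illustration, here is a minimal sketch of that traversal against the structure shown above. Following the last child at each branch as the "canonical" path is my heuristic, not anything OpenAI documents:

```python
def canonical_path(mapping: dict) -> list[dict]:
    """Walk the conversation tree from root to leaf, returning messages
    in order. Assumes the mapping structure sketched above; at a branch
    (regenerated or edited responses) we follow the last child, which is
    a heuristic rather than a documented rule."""
    # The root is the node with no parent.
    node_id = next(nid for nid, node in mapping.items()
                   if node.get('parent') is None)
    messages = []
    while node_id is not None:
        node = mapping[node_id]
        if node.get('message'):  # some nodes are structural placeholders
            messages.append(node['message'])
        children = node.get('children') or []
        node_id = children[-1] if children else None
    return messages
```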
Assets add another layer of complexity. Image references appear as `sediment://` and `file-service://` URIs that must be mapped to actual files scattered across subdirectories.
Overall Impression: The content exported from ChatGPT is an absolute mess. In my opinion, that is entirely consistent with my impression of the company: build fast, get stuff out, and maybe come back later to tighten things up. Maybe. The export works, at least for them, but it makes clear that how they deliver content back to their user base (and indeed their customers) is nowhere near the top of the priority list.
Claude: The Structured Export
Anthropic takes a more developer-friendly approach with multiple JSON files:
- `conversations.json` — Flat message arrays with `sender: "human" | "assistant"`
- `memories.json` — Claude's learned context about you
- `projects.json` — Project instructions and attached documents
The message structure is cleaner but includes content type variations:
```json
{
  "content": [
    {"type": "text", "text": "..."},
    {"type": "thinking", "thinking": "..."},
    {"type": "tool_use", "name": "...", "input": {...}},
    {"type": "tool_result", "content": "..."}
  ]
}
```
Extended thinking, tool use, and artifacts all need special handling. Attachments are embedded as extracted_content within the JSON rather than separate files.
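As a sketch, the dispatch over those content types might look like the following; the rendering conventions (blockquoted thinking, labeled tool calls) are choices I'm illustrating, not part of Anthropic's format:

```python
import json

def render_content_blocks(blocks: list[dict]) -> str:
    """Flatten Claude's typed content blocks into one markdown string.
    A sketch: the markdown conventions here are mine, not Anthropic's."""
    parts = []
    for block in blocks:
        kind = block.get('type')
        if kind == 'text':
            parts.append(block['text'])
        elif kind == 'thinking':
            quoted = block['thinking'].replace('\n', '\n> ')
            parts.append(f"> **Thinking:** {quoted}")
        elif kind == 'tool_use':
            args = json.dumps(block.get('input', {}), indent=2)
            parts.append(f"**Tool call** `{block['name']}`:\n{args}")
        elif kind == 'tool_result':
            parts.append(f"**Tool result:** {block.get('content', '')}")
    return '\n\n'.join(parts)
```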
Overall Impressions: This is how it should be done. It is perhaps not surprising, given what we know about Anthropic and its stated philosophies, that the export would be done properly. This is designed for developers and users alike.
Gemini: The Missing Export
Google's approach is... conspicuously absent. Despite Google Takeout supporting dozens of services, Gemini conversations aren't included in any meaningful way. The Takeout export contains only empty placeholder files for "Gems" and "Scheduled Actions."
The workaround requires manual intervention:
- Share each conversation (generates a public URL)
- Open it in Safari
- Save it as a Web Archive (`.webarchive`)
This captures the rendered page as an Apple binary plist containing HTML. Parsing requires:
- Extracting the HTML from the `WebMainResource` plist entry
- Finding user queries in `<div class="user-query-container">` elements
- Finding responses in `<div class="markdown markdown-main-panel">` elements
- Downloading generated images from Google's CDN before the links expire
It's fragile, manual, and could break with any UI update—but it's currently the only option.
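For the plist step, here is a minimal sketch using Python's standard plistlib, assuming the usual `WebMainResource` layout that Safari writes:

```python
import plistlib

def extract_html(webarchive_path: str) -> str:
    """Pull the rendered HTML out of a Safari .webarchive, which is a
    binary plist. A sketch; assumes the standard WebMainResource layout."""
    with open(webarchive_path, 'rb') as f:
        archive = plistlib.load(f)
    resource = archive['WebMainResource']
    encoding = resource.get('WebResourceTextEncodingName', 'utf-8')
    return resource['WebResourceData'].decode(encoding, errors='replace')
```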
Overall Impressions: This is Google all over. The attitude seems to be: there's an HTML view, everything lives on our servers, get over it; we haven't really tried to think through what you'd need to extract your conversations, and perhaps we will kill the application in the near future anyway. OK, I may still be a little salty about the whole Google Reader thing, but it would be nice if even a small fraction of all that ad cash went toward a convenient way to get our conversations out.
The Parsing Challenges
Challenge 1: HTML to Markdown Conversion
Gemini's responses arrive as rendered HTML. Converting back to markdown requires careful ordering:
```python
# Must process tables BEFORE removing generic tags
text = process_tables(text)      # <table> → | col | col |
text = process_lists(text)       # <li><p>text</p></li> → - text
text = process_paragraphs(text)  # <p> → newlines
text = strip_remaining_tags(text)
```
Process in the wrong order and you lose structure. Tables become run-on sentences. Lists collapse into paragraphs.
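To make the table step concrete, here is a rough regex-based sketch of what a `process_tables` could look like. It assumes the simple, non-nested tables Gemini emits and is not the converter's actual implementation:

```python
import re

def process_tables(html: str) -> str:
    """Convert simple <table> markup into markdown pipe tables.
    A sketch: assumes well-formed, non-nested tables."""
    def table_to_md(match: re.Match) -> str:
        rows = re.findall(r'<tr[^>]*>(.*?)</tr>', match.group(1), re.DOTALL)
        lines = []
        for i, row in enumerate(rows):
            cells = re.findall(r'<t[hd][^>]*>(.*?)</t[hd]>', row, re.DOTALL)
            cells = [re.sub(r'<[^>]+>', '', c).strip() for c in cells]
            lines.append('| ' + ' | '.join(cells) + ' |')
            if i == 0:  # markdown needs a separator row after the header
                lines.append('|' + ' --- |' * len(cells))
        return '\n' + '\n'.join(lines) + '\n'
    return re.sub(r'<table[^>]*>(.*?)</table>', table_to_md, html,
                  flags=re.DOTALL)
```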
Challenge 2: Image Extraction
Each provider handles images differently:
| Provider | Image Storage | Extraction Method |
|---|---|---|
| ChatGPT | Separate files with URI references | Map URIs to filesystem paths |
| Claude | Base64 embedded in JSON | Decode and save |
| Gemini | External Google CDN URLs | Download with =s0-rp param for PNG |
Gemini's images are particularly tricky—they're served as JPEGs by default, but appending =s0-rp to the URL requests lossless PNG format from Google's CDN.
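A sketch of that download step; stripping an existing `=s...` suffix first is my assumption about how these CDN URLs are shaped:

```python
import re
import urllib.request

def download_gemini_image(cdn_url: str, dest_path: str) -> None:
    """Fetch a Gemini image as lossless PNG via the =s0-rp size parameter.
    A sketch; the suffix-stripping regex is an assumption about the URLs."""
    base = re.sub(r'=s[\w-]*$', '', cdn_url)
    urllib.request.urlretrieve(base + '=s0-rp', dest_path)
```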
Challenge 3: Schema Drift
Export formats change without notice. The system includes schema fingerprinting to detect when a provider modifies their structure:
```python
CLAUDE_SCHEMA_V1 = SchemaFingerprint(
    files={'conversations.json': ['uuid', 'name', 'chat_messages', ...]},
    message_keys=['uuid', 'text', 'sender', 'content', ...],
    content_types={'text', 'tool_use', 'tool_result', 'thinking'},
)
```
When an export doesn't match the expected schema, the converter warns but attempts to proceed—new fields are usually additive rather than breaking.
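The warn-and-proceed policy can be as simple as a set comparison. A sketch, where `verify_message_schema` is a hypothetical helper rather than the repo's actual code:

```python
import warnings

def verify_message_schema(message: dict, expected_keys: list[str]) -> None:
    """Warn (but don't fail) when an exported message drifts from the
    fingerprint; new fields are usually additive, so we just note them."""
    missing = set(expected_keys) - set(message)
    extra = set(message) - set(expected_keys)
    if missing:
        warnings.warn(f"Schema drift: expected keys missing: {sorted(missing)}")
    if extra:
        warnings.warn(f"Schema drift: unexpected new keys: {sorted(extra)}")
```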
The Output: Obsidian-Ready Markdown
All conversations funnel into a consistent format. I use the provider's name as a tag on every piece of content that enters my repository from an LLM, then format the conversation as regular Markdown. The converter also attempts to turn ASCII math symbols into $\LaTeX$ for proper rendering (though this is still not optimal).
```markdown
---
tags:
  - chatgpt/gemini/claude
relatedTo:
---

> User message appears as a blockquote

Assistant response appears as regular markdown text,
with **formatting**, `code`, and structure preserved.

> Next user message

Next response, with images as Obsidian wiki-links:

![[conversation_img01.png]]
```
The YAML frontmatter enables Obsidian's tagging and linking features. Rendering user messages as blockquotes creates a clear visual distinction. Images land in an attachments/ folder with wiki-link references.
The Architecture
```text
providers/                   # Raw exports (input)
├── chatgpt/2026.01.15/
├── claude/2026.01.17/
└── gemini/2026.01.18/

bin/providers/               # Converter implementations
├── base.py                  # Abstract base class
├── chatgpt/converter.py     # Tree traversal, asset mapping
├── claude/converter.py      # Flat arrays, content types
└── gemini/converter.py      # Webarchive parsing, image download

obsidian_export/             # Converted output
├── chatgpt/2026.01.15/
│   ├── markdown/
│   └── attachments/
└── ...
```
Adding a new provider means implementing one class with two required methods:
```python
class NewProviderConverter(BaseConverter):
    @property
    def provider_name(self) -> str:
        return 'newprovider'

    def convert(self) -> dict:
        # Parse export, write markdown, return stats
        ...
```
Lessons Learned
Data portability is an afterthought. Only Claude treats export as a first-class feature. ChatGPT's tree structure suggests export was bolted onto an architecture designed for other purposes. Gemini has no real export at all.
Rendered HTML is lossy. Gemini's web archive approach captures the presentation of a conversation, not its structure. Every UI change risks breaking the parser.
Standards would help. A common chat export format—even something as simple as JSON Lines with role and content fields—would make interoperability trivial. Instead, we have three incompatible approaches from three companies who all claim to support "open" AI development.
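As a purely hypothetical illustration, such a format could be as minimal as this:

```jsonl
{"role": "user", "content": "How do fountain pen nib designs differ?"}
{"role": "assistant", "content": "Nib designs vary in tipping size, flex, and grind..."}
```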
Local-first wins. Having my conversations in markdown files means I can grep them, back them up, version control them, and know they'll be readable in 20 years. No API access required. No account needed. Just text.
What's Next
- Automated scheduling — Run exports and conversions on a regular cadence
- Deduplication — Detect conversations that appear in multiple exports
- Search integration — Full-text search across all providers
- Additional providers — Perplexity, Copilot, and others as exports become available
The goal isn't just archival—it's building a personal knowledge base where AI conversations become reference material, searchable and linkable alongside notes, documents, and research.
Your conversations are valuable. They shouldn't be trapped in someone else's silo.
The converter is available at ChatConverter Repository. Contributions welcome, especially for additional provider support.