Escaping Walled Chat Gardens—A Chat History Converter
All my AI conversations were trapped inside the vendors' chat apps, with no way for my PKM to reach them. Here is a converter that turns your exported chat histories into local files for your knowledge management system.
A Problem: Your AI Conversations Are Trapped
Over the past few years, I've accumulated numerous conversations across multiple AI assistants—brainstorming sessions, technical deep-dives, creative projects, research threads, "help me with this recipe" inquiries, and even troubleshooting the random error code on my LG clothes dryer. These conversations constitute an intellectual archive, yet they're scattered across proprietary platforms with no guarantee of long-term access.
What happens when:
- A vendor changes their terms of service.
- You want to revisit that dialog about fountain pen nib design from last fall, but can't remember which AI it was with.
- A vendor starts putting out content, or supporting initiatives, that you find objectionable.
- You want to search across all your AI conversations at once.
- Vendor X suddenly has the "best" model and everyone runs over there to try it. Do you start from scratch?
This is much like the vendor lock-in on social media platforms (the thing the indieweb folks fight against), so I decided to take a crack at solving it: using the latest and greatest LLM tools, I built a conversion pipeline that funnels chat exports from multiple providers into my own Personal Knowledge Management (PKM) system. I've been a supporter of Obsidian as my PKM tool since it went public, because it is built on Markdown and local files: I can work offline, and I actually own and control my content rather than having it sit on someone's server somewhere (see above on vendor lock-in).
TL;DR
Here is the GitHub Repository for the Python code that formats downloaded chat histories into local Markdown suitable for Obsidian. Building it convinced me that there should be a standard format for exported chat content.
The Inputs: Three Providers, Three Philosophies, Endless Challenges
Each AI provider takes a remarkably different approach to data export, reflecting their broader philosophies about user data ownership.
ChatGPT: The Tree Structure
OpenAI provides a proper data export through their settings panel, delivering a conversations.json file. However, the format reveals ChatGPT's underlying architecture:
```json
{
  "mapping": {
    "node-id-1": {
      "parent": null,
      "children": ["node-id-2"],
      "message": { "author": {"role": "user"}, "content": {...} }
    },
    "node-id-2": {
      "parent": "node-id-1",
      "children": ["node-id-3", "node-id-4"],
      "message": { "author": {"role": "assistant"}, "content": {...} }
    }
  }
}
```
Conversations aren't linear arrays—they're trees. This accommodates ChatGPT's "regenerate response" and "edit and resubmit" features, where a single user message might branch into multiple assistant responses. Parsing requires tree traversal to reconstruct the canonical conversation path.
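For illustration, here is a minimal sketch of that traversal against the structure shown above. Following the last child at each branch as the "canonical" path is my heuristic, not anything OpenAI documents:

```python
def canonical_path(mapping: dict) -> list[dict]:
    """Walk the conversation tree from root to leaf, returning messages
    in order. Assumes the mapping structure sketched above; at a branch
    (regenerated or edited responses) we follow the last child, which is
    a heuristic rather than a documented rule."""
    # The root is the node with no parent.
    node_id = next(nid for nid, node in mapping.items()
                   if node.get('parent') is None)
    messages = []
    while node_id is not None:
        node = mapping[node_id]
        if node.get('message'):  # some nodes are structural placeholders
            messages.append(node['message'])
        children = node.get('children') or []
        node_id = children[-1] if children else None
    return messages
```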
Assets add another layer of complexity. Image references appear as `sediment://` and `file-service://` URIs that must be mapped to actual files scattered across subdirectories.
Overall Impression: The content exported from ChatGPT is an absolute mess. In my opinion, that is entirely consistent with my impression of the company: build fast, get stuff out, and maybe come back later to tighten things up. Maybe. The export works, at least for them, but it makes clear that how they deliver content back to their user base (and indeed their customers) is nowhere near the top of the priority list.
Claude: The Structured Export
Anthropic takes a more developer-friendly approach with multiple JSON files:
- `conversations.json` — Flat message arrays with `sender: "human" | "assistant"`
- `memories.json` — Claude's learned context about you
- `projects.json` — Project instructions and attached documents
The message structure is cleaner but includes content type variations:
```json
{
  "content": [
    {"type": "text", "text": "..."},
    {"type": "thinking", "thinking": "..."},
    {"type": "tool_use", "name": "...", "input": {...}},
    {"type": "tool_result", "content": "..."}
  ]
}
```
Extended thinking, tool use, and artifacts all need special handling. Attachments are embedded as extracted_content within the JSON rather than separate files.
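As a sketch, the dispatch over those content types might look like the following; the rendering conventions (blockquoted thinking, labeled tool calls) are choices I'm illustrating, not part of Anthropic's format:

```python
import json

def render_content_blocks(blocks: list[dict]) -> str:
    """Flatten Claude's typed content blocks into one markdown string.
    A sketch: the markdown conventions here are mine, not Anthropic's."""
    parts = []
    for block in blocks:
        kind = block.get('type')
        if kind == 'text':
            parts.append(block['text'])
        elif kind == 'thinking':
            quoted = block['thinking'].replace('\n', '\n> ')
            parts.append(f"> **Thinking:** {quoted}")
        elif kind == 'tool_use':
            args = json.dumps(block.get('input', {}), indent=2)
            parts.append(f"**Tool call** `{block['name']}`:\n{args}")
        elif kind == 'tool_result':
            parts.append(f"**Tool result:** {block.get('content', '')}")
    return '\n\n'.join(parts)
```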
Overall Impressions: This is how it should be done. It is perhaps not surprising, given what we know about Anthropic and its stated philosophies, that the export would be done properly. This is designed for developers and users alike.
Gemini: The Missing Export
Google's approach is... conspicuously absent. Despite Google Takeout supporting dozens of services, Gemini conversations aren't included in any meaningful way. The Takeout export contains only empty placeholder files for "Gems" and "Scheduled Actions."
The workaround requires manual intervention:
- Share each conversation (generates a public URL)
- Open it in Safari
- Save it as a Web Archive (`.webarchive`)
This captures the rendered page as an Apple binary plist containing HTML. Parsing requires:
- Extracting the HTML from the `WebMainResource` plist entry
- Finding user queries in `<div class="user-query-container">` elements
- Finding responses in `<div class="markdown markdown-main-panel">` elements
- Downloading generated images from Google's CDN before the links expire
It's fragile, manual, and could break with any UI update—but it's currently the only option.
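For the plist step, here is a minimal sketch using Python's standard plistlib, assuming the usual `WebMainResource` layout that Safari writes:

```python
import plistlib

def extract_html(webarchive_path: str) -> str:
    """Pull the rendered HTML out of a Safari .webarchive, which is a
    binary plist. A sketch; assumes the standard WebMainResource layout."""
    with open(webarchive_path, 'rb') as f:
        archive = plistlib.load(f)
    resource = archive['WebMainResource']
    encoding = resource.get('WebResourceTextEncodingName', 'utf-8')
    return resource['WebResourceData'].decode(encoding, errors='replace')
```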
Overall Impressions: This is Google all over. The attitude seems to be: there's an HTML view, everything lives on our servers, get over it; we haven't really tried to think through what you'd need to extract your conversations, and perhaps we will kill the application in the near future anyway. OK, I may still be a little salty about the whole Google Reader thing, but it would be nice if even a small fraction of all that ad cash went toward a convenient way to get our conversations out.
The Parsing Challenges
Challenge 1: HTML to Markdown Conversion
Gemini's responses arrive as rendered HTML. Converting back to markdown requires careful ordering:
```python
# Must process tables BEFORE removing generic tags
text = process_tables(text)      # <table> → | col | col |
text = process_lists(text)       # <li><p>text</p></li> → - text
text = process_paragraphs(text)  # <p> → newlines
text = strip_remaining_tags(text)
```
Process in the wrong order and you lose structure. Tables become run-on sentences. Lists collapse into paragraphs.
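To make the table step concrete, here is a rough regex-based sketch of what a `process_tables` could look like. It assumes the simple, non-nested tables Gemini emits and is not the converter's actual implementation:

```python
import re

def process_tables(html: str) -> str:
    """Convert simple <table> markup into markdown pipe tables.
    A sketch: assumes well-formed, non-nested tables."""
    def table_to_md(match: re.Match) -> str:
        rows = re.findall(r'<tr[^>]*>(.*?)</tr>', match.group(1), re.DOTALL)
        lines = []
        for i, row in enumerate(rows):
            cells = re.findall(r'<t[hd][^>]*>(.*?)</t[hd]>', row, re.DOTALL)
            cells = [re.sub(r'<[^>]+>', '', c).strip() for c in cells]
            lines.append('| ' + ' | '.join(cells) + ' |')
            if i == 0:  # markdown needs a separator row after the header
                lines.append('|' + ' --- |' * len(cells))
        return '\n' + '\n'.join(lines) + '\n'
    return re.sub(r'<table[^>]*>(.*?)</table>', table_to_md, html,
                  flags=re.DOTALL)
```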
Challenge 2: Image Extraction
Each provider handles images differently:
| Provider | Image Storage | Extraction Method |
|---|---|---|
| ChatGPT | Separate files with URI references | Map URIs to filesystem paths |
| Claude | Base64 embedded in JSON | Decode and save |
| Gemini | External Google CDN URLs | Download with =s0-rp param for PNG |
Gemini's images are particularly tricky—they're served as JPEGs by default, but appending =s0-rp to the URL requests lossless PNG format from Google's CDN.
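A sketch of that download step; stripping an existing `=s...` suffix first is my assumption about how these CDN URLs are shaped:

```python
import re
import urllib.request

def download_gemini_image(cdn_url: str, dest_path: str) -> None:
    """Fetch a Gemini image as lossless PNG via the =s0-rp size parameter.
    A sketch; the suffix-stripping regex is an assumption about the URLs."""
    base = re.sub(r'=s[\w-]*$', '', cdn_url)
    urllib.request.urlretrieve(base + '=s0-rp', dest_path)
```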
Challenge 3: Schema Drift
Export formats change without notice. The system includes schema fingerprinting to detect when a provider modifies their structure:
```python
CLAUDE_SCHEMA_V1 = SchemaFingerprint(
    files={'conversations.json': ['uuid', 'name', 'chat_messages', ...]},
    message_keys=['uuid', 'text', 'sender', 'content', ...],
    content_types={'text', 'tool_use', 'tool_result', 'thinking'},
)
```
When an export doesn't match the expected schema, the converter warns but attempts to proceed—new fields are usually additive rather than breaking.
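The warn-and-proceed policy can be as simple as a set comparison. A sketch, where `verify_message_schema` is a hypothetical helper rather than the repo's actual code:

```python
import warnings

def verify_message_schema(message: dict, expected_keys: list[str]) -> None:
    """Warn (but don't fail) when an exported message drifts from the
    fingerprint; new fields are usually additive, so we just note them."""
    missing = set(expected_keys) - set(message)
    extra = set(message) - set(expected_keys)
    if missing:
        warnings.warn(f"Schema drift: expected keys missing: {sorted(missing)}")
    if extra:
        warnings.warn(f"Schema drift: unexpected new keys: {sorted(extra)}")
```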
The Output: Obsidian-Ready Markdown
All conversations funnel into a consistent format. I use the provider's name as a tag on every piece of content that enters my repository from an LLM, then format the conversation as regular Markdown. The converter also attempts to turn ASCII math symbols into $\LaTeX$ for proper rendering (though this is still not optimal).
```markdown
---
tags:
  - chatgpt/gemini/claude
relatedTo:
---

> User message appears as a blockquote

Assistant response appears as regular markdown text,
with **formatting**, `code`, and structure preserved.

> Next user message

Next response, with images as Obsidian wiki-links:

![[conversation_img01.png]]
```
The YAML frontmatter enables Obsidian's tagging and linking features. Rendering user messages as blockquotes creates a clear visual distinction. Images land in an attachments/ folder with wiki-link references.
The Architecture
```text
providers/                   # Raw exports (input)
├── chatgpt/2026.01.15/
├── claude/2026.01.17/
└── gemini/2026.01.18/

bin/providers/               # Converter implementations
├── base.py                  # Abstract base class
├── chatgpt/converter.py     # Tree traversal, asset mapping
├── claude/converter.py      # Flat arrays, content types
└── gemini/converter.py      # Webarchive parsing, image download

obsidian_export/             # Converted output
├── chatgpt/2026.01.15/
│   ├── markdown/
│   └── attachments/
└── ...
```
Adding a new provider means implementing one class with two required methods:
```python
class NewProviderConverter(BaseConverter):
    @property
    def provider_name(self) -> str:
        return 'newprovider'

    def convert(self) -> dict:
        # Parse export, write markdown, return stats
        ...
```
Lessons Learned
Data portability is an afterthought. Only Claude treats export as a first-class feature. ChatGPT's tree structure suggests export was bolted onto an architecture designed for other purposes. Gemini has no real export at all.
Rendered HTML is lossy. Gemini's web archive approach captures the presentation of a conversation, not its structure. Every UI change risks breaking the parser.
Standards would help. A common chat export format—even something as simple as JSON Lines with role and content fields—would make interoperability trivial. Instead, we have three incompatible approaches from three companies who all claim to support "open" AI development.
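As a purely hypothetical illustration, such a format could be as minimal as this:

```jsonl
{"role": "user", "content": "How do fountain pen nib designs differ?"}
{"role": "assistant", "content": "Nib designs vary in tipping size, flex, and grind..."}
```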
Local-first wins. Having my conversations in markdown files means I can grep them, back them up, version control them, and know they'll be readable in 20 years. No API access required. No account needed. Just text.
What's Next
- Automated scheduling — Run exports and conversions on a regular cadence
- Deduplication — Detect conversations that appear in multiple exports
- Search integration — Full-text search across all providers
- Additional providers — Perplexity, Copilot, and others as exports become available
The goal isn't just archival—it's building a personal knowledge base where AI conversations become reference material, searchable and linkable alongside notes, documents, and research.
Your conversations are valuable. They shouldn't be trapped in someone else's silo.
The converter is available at ChatConverter Repository. Contributions welcome, especially for additional provider support.