Everything essential, nothing wasted.
The 20GB Internet Archive is an experimental project to condense the most valuable information from the internet into a single compact file: encyclopedia entries, technical documentation, historical records, scientific papers, and cultural knowledge, all curated and compressed using our custom DITF (Direct Information Text Format).
This isn't a backup of the entire web. It's a distilled archive of what matters most, designed to be accessible offline, portable across any device, and readable through our open-source app.
This is an early trial. The archive is incomplete, the format is evolving, and the reader app is in active development. Expect bugs, missing content, and breaking changes. We're building this in public — feedback welcome.
DITF format + open-source reader.
The archive is stored in a .ditf file, a custom format we designed to maximize information density while keeping the content easy to read through the reader app. Because the file is compressed and structured, you can't open it directly in a text editor, but our open-source reader app makes it simple to browse, search, and extract information.
20GB of pure information
No images, no ads, no fluff. Just text, data, and structured knowledge compressed to fit on a single drive.
DITF file format
Custom compression and indexing. Designed for fast search, low storage, and cross-platform compatibility.
Open-source reader
The reader app is fully open-source on GitHub. It runs on Windows, macOS, Linux, and mobile devices.
Fast search & indexing
Built-in search lets you query the entire archive in seconds. Find what you need without scrolling through 20GB of text.
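To give a rough feel for how compression and indexed search can work together, here is a minimal Python sketch. It is not the DITF format or the reader's code: it assumes each document is compressed on its own with zlib and that a simple word-to-document index is kept alongside, and the names build_archive and search are hypothetical, chosen only for illustration.

```python
import zlib
from collections import defaultdict


def build_archive(documents):
    """Compress each document separately and build a word -> document-id index.

    Illustrative only: this is an assumed layout, not the real DITF structure.
    """
    blobs = {}
    index = defaultdict(set)
    for doc_id, text in documents.items():
        blobs[doc_id] = zlib.compress(text.encode("utf-8"))
        for raw in text.lower().split():
            word = raw.strip(".,;:!?()")
            if word:
                index[word].add(doc_id)
    return blobs, index


def search(blobs, index, term):
    """Look the term up in the index and decompress only the matching documents."""
    hits = sorted(index.get(term.lower(), ()))
    return {doc_id: zlib.decompress(blobs[doc_id]).decode("utf-8") for doc_id in hits}


# Tiny sample data, invented for this example.
docs = {
    "python": "Python is a programming language.",
    "rome": "Rome was the capital of the Roman Empire.",
}
blobs, index = build_archive(docs)
results = search(blobs, index, "capital")
```

The point of the sketch is that a query touches the index first and decompresses only the documents that match, so the rest of the archive never has to be read. A real format would also need an on-disk layout and a more careful tokenizer.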
A curated selection of the web.
We're focusing on timeless, high-value content across multiple domains:
- Wikipedia core articles (history, science, culture)
- Technical documentation (programming languages, frameworks, APIs)
- Academic papers and scientific research
- Historical texts and public domain literature
- Open-source project documentation
- Educational resources and tutorials
This is not a web scraper dump. Every piece of content is manually reviewed or algorithmically curated for relevance and quality.
Download the Archive
The archive and reader app are currently in development. Check back soon for the first release.
Download 20GB Archive (Coming Soon)
Reader App (GitHub)