
ArchiveBox

Open-source, self-hosted web archiving tool to save URLs, browser history, and more offline.

Introduction

ArchiveBox is an open-source, self-hosted web archiving tool for collecting, saving, and viewing websites offline. It lets individuals and organizations preserve internet content in durable, accessible formats, so the content stays available even if the original disappears. ArchiveBox supports a wide range of input sources and output formats, which makes it suitable for many different archiving needs.

Key Features
  • Distributed Archiving: Users maintain control over their data by self-hosting, avoiding reliance on centralized services.
  • Future-Proof Storage: Saves content in multiple standard formats like HTML, PDF, PNG, MP4, and WARC for long-term accessibility.
  • Extensible and Flexible: Offers a CLI, a Web UI, and a Python API, and accepts community-contributed extractors.
  • Comprehensive Input Support: Imports URLs from browser history, bookmarks, RSS feeds, social media, and more.
  • Robust Output Formats: Extracts and saves diverse content types including media, articles, and source code.
  • Scheduled and Real-Time Archiving: Supports automated imports and real-time archiving via browser extensions or proxies.
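
As a rough illustration of the RSS input path listed above, the sketch below pulls item links out of an RSS 2.0 feed and assembles a command line for the ArchiveBox CLI. The helper names (`extract_feed_links`, `build_add_command`) and the sample feed are hypothetical, and the sketch assumes that `archivebox add` accepts URLs as positional arguments; check the ArchiveBox documentation for your installed version before relying on it.

```python
import xml.etree.ElementTree as ET


def extract_feed_links(rss_xml: str) -> list[str]:
    """Return the unique <item><link> URLs of an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_xml)
    seen: set[str] = set()
    links: list[str] = []
    # In RSS 2.0 the root is <rss>, with items nested under <channel>.
    for link in root.iterfind("./channel/item/link"):
        url = (link.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def build_add_command(urls: list[str]) -> list[str]:
    """Build an argument list for the ArchiveBox CLI (hypothetical helper).

    Assumption: `archivebox add` takes the URLs to archive as positional arguments.
    """
    return ["archivebox", "add", *urls]


# Hypothetical sample feed; the second and third items share a link on purpose.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>A</title><link>https://example.com/a</link></item>
    <item><title>B</title><link> https://example.com/b </link></item>
    <item><title>B again</title><link>https://example.com/b</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    feed_links = extract_feed_links(SAMPLE_FEED)
    print(build_add_command(feed_links))
```

To actually archive the links, the resulting list could be passed to `subprocess.run(...)` from inside an initialized ArchiveBox data directory. The sketch only builds the command so that it can be inspected first.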

Use Cases
  • Individuals: Save personal bookmarks, social media content, or portfolio items for legacy preservation.
  • Journalists: Archive cited pages and research materials for fact-checking and evidence.
  • Lawyers: Collect and preserve digital evidence for legal cases with tagging and review capabilities.
  • Researchers: Analyze trends and gather data for studies or LLM training.
  • Governments and NGOs: Snapshot public service sites and ensure recordkeeping compliance.

ArchiveBox lets users archive both public and private content while keeping privacy and control in their own hands, which makes it a strong fit for anyone concerned with digital preservation.