💬 We offer consulting services to set up, secure, and maintain ArchiveBox on your preferred storage provider.
We use this revenue (from corporate clients who can afford to pay) to support open source development and keep ArchiveBox free.
ArchiveBox supports a wide range of local and remote filesystems using rclone and/or Docker storage plugins. The examples below use Docker Compose bind mounts to demonstrate the concepts; adapt them to your own OS and environment as needed.
```yaml
services:
  archivebox:
    # ...
    volumes:
      # your index db, config, logs, etc. should be stored on a local SSD (usually <10GB)
      - ./data:/data
      # but bulk archive/ content can be located on an HDD or remote filesystem
      - /mnt/archivebox-s3/data/archive:/data/archive
```
[!TIP] These default filesystems are fully supported by ArchiveBox on Linux and macOS (w/wo Docker).
ZFS (recommended for best experience on Linux/BSD) ⭐️
[!TIP] This is the recommended filesystem for ArchiveBox on Linux, macOS, and BSD (w/wo Docker).

```bash
apt install zfsutils-linux
```
Provides RAID, compression, encryption, deduplication, zero-cost point-in-time snapshots, remote sync, integrity verification, and more...
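As a sketch of how this can look in practice, the commands below create a compressed ZFS dataset for the bulk archive (the pool name `tank`, the disk `/dev/sdb`, and the mountpoint are placeholder assumptions; adapt them to your hardware):

```bash
# create a pool on a spare disk (WARNING: destroys any existing data on /dev/sdb)
zpool create tank /dev/sdb

# create a compressed dataset mounted where ArchiveBox expects bulk storage
zfs create -o compression=lz4 -o mountpoint=/opt/archivebox/data/archive tank/archive

# take an instant, zero-cost point-in-time snapshot before risky operations
zfs snapshot tank/archive@before-upgrade
```

Snapshots are cheap because ZFS is copy-on-write; `zfs send`/`zfs receive` can then replicate them to a remote machine for backups.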
[!WARNING] These filesystems are likely supported, but are not officially tested.
EXT2, EXT3, FAT32, exFAT
[!CAUTION] Not recommended. Depending on the filesystem, these cannot store files larger than 4GB and/or more than ~31k–65k Snapshot entries per folder due to file size and directory entry limits.
Supported Remote Filesystems
ArchiveBox supports many common types of remote filesystems using RClone, FUSE, Docker Storage providers, and Docker Volume Plugins.
The data/archive/ subfolder contains the bulk archived content, and it supports being stored on a slower remote server (SMB/NFS/SFTP/etc.) or object store (S3/B2/R2/etc.). For data integrity and performance reasons, the rest of the data/ directory (data/ArchiveBox.conf, data/logs, etc.) must be stored locally while ArchiveBox is running.
[!IMPORTANT] data/index.sqlite3 is your main archive DB; it must be on a fast, reliable, local filesystem that supports FSYNC (SSD/NVMe recommended for best experience).
[!TIP] If you use a remote filesystem, you should switch ArchiveBox's search backend from ripgrep to sonic (or FTS5).
(ripgrep scans every byte of the archive on each search, which is slow and can incur bandwidth costs on remote cloud storage)
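For example, the switch might look like the config below, assuming the key names match your ArchiveBox version (verify against `archivebox config --help` and the official configuration docs before relying on them):

```ini
# in data/ArchiveBox.conf
[SEARCH_BACKEND]
SEARCH_BACKEND_ENGINE = sonic
SEARCH_BACKEND_HOST_NAME = sonic
SEARCH_BACKEND_PASSWORD = SomeSecretPassword
```

With sonic, search queries hit a small local index instead of re-reading every archived file from the remote filesystem.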
NFS (Docker Driver)
docker-compose.yml:
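A minimal sketch using Docker's built-in `local` driver with NFS options (the server address `192.168.1.100` and export path are placeholder assumptions):

```yaml
services:
  archivebox:
    image: archivebox/archivebox
    volumes:
      - ./data:/data
      - archive-nfs:/data/archive

volumes:
  archive-nfs:
    driver: local
    driver_opts:
      type: nfs
      # placeholder NFS server address and mount options
      o: addr=192.168.1.100,rw,nfsvers=4
      device: ":/export/archivebox/archive"
```

Only `data/archive` goes on NFS; the index and config stay on the local `./data` bind mount per the warning above.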
SMB / Ceph (Docker CIFS Driver)
docker-compose.yml:
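A minimal sketch using Docker's built-in `local` driver with CIFS options (server address, share name, and credentials are placeholder assumptions; `uid=911,gid=911` matches ArchiveBox's default container user):

```yaml
services:
  archivebox:
    image: archivebox/archivebox
    volumes:
      - ./data:/data
      - archive-smb:/data/archive

volumes:
  archive-smb:
    driver: local
    driver_opts:
      type: cifs
      # placeholder SMB credentials; consider a credentials file instead of inline secrets
      o: "username=archivebox,password=changeme,uid=911,gid=911,vers=3.0"
      device: "//192.168.1.100/archivebox"
```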
Amazon S3 / Backblaze B2 / Google Drive / etc. (RClone)
Option A: Mounting an RClone FUSE mount as a normal Docker bind mount

Define your remote storage config in ~/.config/rclone/rclone.conf:
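For example, an S3 remote (substitute your own provider and credentials):

```ini
[archivebox-s3]
type = s3
provider = AWS
access_key_id = XXX
secret_access_key = YYY
region = us-east-1
```

Then mount it somewhere on the host, e.g. `rclone mount --allow-other archivebox-s3:data/archive /opt/archivebox/data/archive`, and bind-mount that path into the container.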
[!TIP] You can also create rclone.conf using the RClone Web GUI: rclone rcd --rc-web-gui
[!TIP] You can use any RClone FUSE mount as a normal volume (bind mount) for Docker ArchiveBox; typically no storage plugin is needed as long as allow-other is set up properly.
```bash
docker run -v $PWD:/data -v /opt/archivebox/data/archive:/data/archive archivebox/archivebox
```
docker-compose.yml:
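Assuming the RClone FUSE mount is active at /opt/archivebox/data/archive on the host, the equivalent compose config is a plain bind mount:

```yaml
services:
  archivebox:
    image: archivebox/archivebox
    volumes:
      - ./data:/data
      # host path below is the RClone FUSE mountpoint
      - /opt/archivebox/data/archive:/data/archive
```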
Option B: Running RClone with Docker Storage Plugin
This is only needed if you are unable to use Option A for compatibility or performance reasons, or if you prefer defining your remote storage config in docker-compose.yml instead of rclone.conf.
```bash
# install the RClone and FUSE packages on your host
apt install rclone fuse   # or brew install rclone on macOS

# IMPORTANT: needed to allow FUSE drives to be shared with Docker
echo 'user_allow_other' >> /etc/fuse.conf
```
```ini
# Example rclone.conf using Amazon S3 for storage:
[archivebox-s3]
type = s3
provider = AWS
access_key_id = XXX
secret_access_key = YYY
region = us-east-1
```
```bash
# --allow-other           essential, allows Docker to access FUSE mounts
# --uid/--gid 911         911 is the default user/group used by ArchiveBox
# --vfs-cache-mode=full   cache both file metadata and contents
# --transfers/--checkers  use 16 threads for transfers & 4 for checking
rclone mount \
    --allow-other \
    --uid 911 --gid 911 \
    --vfs-cache-mode=full \
    --transfers=16 --checkers=4 \
    archivebox-s3:data/archive /opt/archivebox/data/archive   # remote:path localpath
```