Troubleshooting
▶️ If you need help or have a question, you can open an issue or reach out on Twitter.
What are you having an issue with?:
Installing
If using archivebox without Docker, make sure you've followed the full guide in the [[Install]] instructions first. Then check here for help depending on what component you need help with.
Next, make sure archivebox is installed and available in your $PATH.
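As a rough sketch of the $PATH check described above, the Python standard library can report where (or whether) an executable resolves. The helper name find_on_path is made up for this example and is not part of ArchiveBox.

```python
import shutil
from typing import Optional


def find_on_path(name: str) -> Optional[str]:
    """Return the absolute path of an executable found on $PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    # "archivebox" is the command name used throughout this page.
    path = find_on_path("archivebox")
    print(path or "archivebox was not found on $PATH")
```

If this prints the "not found" message, revisit the [[Install]] instructions before continuing.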
⭐️ Show the full archivebox version info + info about all installed dependencies:
(ensure the version shown is the most recent available from Releases)
Python
Make sure you have at least Python 3.9 installed on your system.
If you still need help getting Python installed, the official Python docs are a good place to start.
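A minimal sketch of the version check, assuming the 3.9 minimum stated above. The function name is hypothetical; run it with the same interpreter that ArchiveBox is installed under.

```python
import sys

# Minimum version taken from the requirement stated on this page.
MIN_PYTHON = (3, 9)


def python_is_new_enough(version=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Compare only the (major, minor) part of a version tuple to the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_new_enough() else "too old"
    print(f"Python {current}: {status}")
```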
Chromium/Google Chrome
For more info, see the [[Chromium Install]] page.
ArchiveBox depends on being able to access a chromium-browser / google-chrome executable. The executable used defaults to chromium-browser but can be manually specified with the environment variable CHROME_BINARY:
Test to make sure you have Chrome on your $PATH with:
If no executable is displayed, follow the setup instructions to install and link one of them.
If a path is displayed, the next step is to check that it's runnable:
If no version is displayed, try the setup instructions again, or confirm that you have permission to access chrome.
If a version is displayed and it's <111, upgrade it:
If a version is displayed and it's >=111, make sure ArchiveBox is running the right one:
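The checks above (find the executable, confirm it runs, compare its version to 111) could be sketched in Python as follows. This is an illustration only, not how ArchiveBox itself resolves Chrome. The function names are hypothetical, and the version parsing assumes output shaped like "Chromium 120.0.6099.109".

```python
import os
import re
import shutil
import subprocess
from typing import Optional

# Names taken from the text above; chromium-browser is the stated default.
CANDIDATES = ("chromium-browser", "google-chrome")
MIN_MAJOR = 111


def resolve_chrome_binary() -> Optional[str]:
    """Prefer CHROME_BINARY if it is set, otherwise try the default names."""
    configured = os.environ.get("CHROME_BINARY")
    names = (configured,) if configured else CANDIDATES
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def parse_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from a four-part version string, if present."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    binary = resolve_chrome_binary()
    if binary is None:
        print("No Chrome/Chromium executable found")
    else:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, timeout=30
        )
        major = parse_major_version(result.stdout)
        if major is None:
            print(f"{binary}: no version reported")
        elif major < MIN_MAJOR:
            print(f"{binary}: version {major} is older than {MIN_MAJOR}")
        else:
            print(f"{binary}: version {major} is recent enough")
```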
Wget & Curl
If you're missing wget or curl, install them using apt or your package manager of choice. See the "Manual Setup" instructions for more details.
If wget times out or randomly fails to download some sites that you have confirmed are online, upgrade wget to the most recent version with brew upgrade wget or apt upgrade wget. A bug in versions <=1.19.1_1 causes wget to fail for perfectly valid sites.
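To tell whether an installed wget falls in the affected range, a sketch like the following parses the first line of the wget --version output. It assumes that line looks like "GNU Wget 1.21.3 built on linux-gnu." and that the _1 suffix is a packaging revision, so it compares only the numeric part against 1.19.1. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Last affected upstream version per the text above (packaging suffix ignored).
LAST_AFFECTED = (1, 19, 1)


def parse_wget_version(first_line: str) -> Optional[Tuple[int, ...]]:
    """Turn 'GNU Wget 1.21.3 built on ...' into (1, 21, 3), or None."""
    match = re.search(r"Wget (\d+(?:\.\d+)+)", first_line)
    if not match:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_affected(version: Tuple[int, ...]) -> bool:
    """True if the version is at or below the last affected release."""
    return version <= LAST_AFFECTED
```

For example, passing the first line printed by wget --version to parse_wget_version and the result to is_affected indicates whether an upgrade is worth trying first.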
NPM Dependencies
NPM packages like readability, singlefile, etc. are auto-installed by archivebox setup into data/node_modules.
Make sure you have installed NodeJS + NPM first; see their official install docs.
Archiving
No links parsed from export file
Please open an issue with a description of where you got the export, and preferably attach your export file (you can redact the links). We'll fix the parser to support your format.
Lots of skipped sites
If you have already run the archiver once, it won't re-download those sites on subsequent runs; it will only download new links. If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct. You can check the ArchiveBox stdout logs or the Web UI to see what links it's downloading.
If you're still having issues, try deleting or moving the ./archive folder (back it up first!) and running archivebox init again.
Lots of errors
Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally. Open an issue with a description of the errors if you're still having problems.
Lots of broken links from the index
Not all sites can be effectively archived with each method, which is why it's best to use a combination of wget, PDFs, and screenshots. If it seems like more than 10-20% of sites in the archive are broken, open an issue with some of the URLs that failed to be archived and I'll investigate.
Removing unwanted links from the index
archivebox remove --help
Hosting the Archive
If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL. If you don't, there are plenty of tutorials online to help get that set up. Open an issue if you have a problem with a particular nginx config.
Other database or filesystem issues
Docker Permissions issues
Try setting PUID & PGID: https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#puid--pgid
Try using bindfs to work around issues by remapping permissions, for example to remap uid:33 gid:33 on the host to 911:911 inside the container.
docker-compose.yml:
Database
Database and filesystem issues are uncommon but do come up from time to time (especially when using networked storage, large archives, or multiple ArchiveBox processes for a single collection).
ℹ️ Generally, these commands can help you resolve most issues:
Don't be scared by the volume of content here. Almost all of the issues linked below are duplicates or old resolved bugs, but they contain valuable context and troubleshooting steps if you're trying to figure out the cause of a problem with your setup.
Filesystem doesn't support FSYNC (e.g. network mounts)
The index.sqlite3 file must be stored on a filesystem that supports FSYNC (most local filesystems) in order to ensure SQLite3 database integrity when multiple ArchiveBox processes may be accessing it simultaneously. However, the ./archive folder can be on a NAS or other filesystem that does not support FSYNC.
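One rough way to probe a directory is to write a small temporary file there and call os.fsync() on it, as sketched below. The function name is hypothetical. Note the limitation: some network filesystems accept the call without actually flushing to stable storage, so a True result is necessary but not sufficient.

```python
import os
import tempfile


def directory_supports_fsync(directory: str) -> bool:
    """Write a probe file in `directory` and report whether os.fsync() succeeds."""
    fd, path = tempfile.mkstemp(dir=directory, prefix=".fsync-probe-")
    try:
        os.write(fd, b"probe")
        try:
            os.fsync(fd)
        except OSError:
            # The filesystem rejected the sync request outright.
            return False
        return True
    finally:
        # Always clean up the probe file, whatever the outcome.
        os.close(fd)
        os.remove(path)
```

Point it at the directory that holds index.sqlite3, not at the ./archive folder.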
More info:
https://www.geeksforgeeks.org/python-os-fsync-method/
https://man7.org/linux/man-pages/man2/fdatasync.2.html
https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html
https://eclecticlight.co/2022/02/18/how-can-you-trust-a-disk-to-write-data/
Database and filesystem contention issues when running multiple ArchiveBox processes
ArchiveBox can sometimes struggle when archiving many links in parallel with multiple ArchiveBox processes trying to write to the database at the same time, leading to errors like this:
These errors can also be encountered when there are permissions, network, or filesystem issues preventing writes to index.sqlite3.
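For your own scripts that touch a SQLite database alongside other writers, a common mitigation is a busy timeout plus a bounded retry on "locked" errors. The sketch below uses only Python's standard sqlite3 module. It is a generic pattern, not ArchiveBox's internal behavior, and the function names and default values are assumptions chosen for illustration.

```python
import sqlite3
import time


def open_with_busy_timeout(db_path: str, timeout_seconds: float = 30.0) -> sqlite3.Connection:
    """Open a connection that waits for locks instead of failing immediately."""
    return sqlite3.connect(db_path, timeout=timeout_seconds)


def execute_with_retry(conn, sql, params=(), attempts=5, delay=0.5):
    """Run one statement, retrying with a growing pause if the database is locked."""
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        except sqlite3.OperationalError as err:
            # Re-raise anything that is not a lock error, or the final failure.
            if "locked" not in str(err) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))
```

Retries only help with contention. If the underlying cause is permissions or an unsuitable filesystem, the error will persist.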
More info:
https://www.sqlite.org/lockingv3.html
https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/
https://victoria.dev/blog/sqlite-in-production-with-wal/
https://code.djangoproject.com/ticket/29280
https://stackoverflow.com/questions/47761570/how-can-i-avoid-database-is-locked-sqlite3-errors-in-django
Database migrations errors or upgrade issues
Migration or upgrade issues happen occasionally with some niche setups or when skipping major versions during an upgrade. Always back up your archive before upgrading, but know that migrations are deterministic and atomic using Django's migration system, so a failed migration does not mean your archive is unrecoverable. You just have to downgrade to the previous stable major version and then continue upgrading.
More info:
https://docs.djangoproject.com/en/4.0/topics/migrations/
https://realpython.com/django-migrations-a-primer/
https://realpython.com/digging-deeper-into-migrations/
https://www.kite.com/blog/python/django-database-migrations-overview/
https://markusholtermann.eu/2021/06/writing-safe-database-migrations-in-django/
Repairing a corrupted SQLite3 database file
A corrupted database file can theoretically only happen if an external process or filesystem error corrupts the SQLite3 database (there has only been one report of a user encountering this in real life). If you ever need to repair a corrupted ArchiveBox index you can run the following steps.
Note that this is specific to this error; these steps do not apply to other migrations/db errors (see below for other issues):
Generally all index issues should be fixable by running archivebox init.
You can see the status of Snapshots and find any invalid/orphan/missing snapshots with archivebox status.
Error output:
Steps to fix:
More info:
https://github.com/ArchiveBox/ArchiveBox/issues/955
https://stackoverflow.com/questions/5274202/sqlite3-database-or-disk-is-full-the-database-disk-image-is-malformed
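Before attempting a repair, it can help to confirm that the file really is corrupted. The sketch below opens a database read-only and runs SQLite's built-in PRAGMA integrity_check. The function name is hypothetical, and it is meant to be run against a backup copy of index.sqlite3, not as a substitute for the repair steps.

```python
import sqlite3
from typing import List


def integrity_problems(db_path: str) -> List[str]:
    """Return problems found by PRAGMA integrity_check (empty list means healthy)."""
    # mode=ro opens the file read-only so the check cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    except sqlite3.DatabaseError as err:
        # Severely damaged files fail before the check can even run.
        return [str(err)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages
```

An empty list suggests the problem lies elsewhere (permissions, locking, or migrations) and not in file corruption.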
See here for more info:
https://github.com/ArchiveBox/ArchiveBox/wiki/Upgrading
https://github.com/ArchiveBox/ArchiveBox/wiki/Merging-Collections
https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#python-shell-usage
https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#sql-shell-usage
https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#do-not-run-as-root
https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#output-folder