Web Archiving Community

💬 Join us on our new ArchiveBox community chat server: https://Zulip.ArchiveBox.io

🔢 Just getting started and want to learn more about why Web Archiving is important? Check out this article: On the Importance of Web Archiving.


The internet archiving community is surprisingly far-reaching and almost universally friendly!

Whether you want to learn which organizations are the big players in the web archiving space, want to find a specific open source tool for your web archiving need, or just want to see where archivists hang out online, this is my attempt at an index of the entire web archiving community.


The Master Lists

Indexes of archiving institutions and software maintained by other people. If there's anything archivists love doing, it's making lists.


Web Archiving Projects

Bookmarking Services


From the Archive.org & Archive-It teams


From Webrecorder

Webrecorder develops a suite of open source tools to capture websites and replay them later as accurately as possible. Webrecorder also publishes the WACZ file format spec.


From Rhizome.org (Conifer)


From the Old Dominion University: Web Science Team


From the Archives Unleashed Team


From the IIPC team


Other Public Archiving Services

  • https://archive.is / https://archive.today

  • https://ghostarchive.org

  • https://perma.cc

  • https://arquivo.pt

  • https://www.pagefreezer.com

  • https://www.smarsh.com

  • https://www.stillio.com

  • https://archive.st

  • https://theoldnet.com/

  • https://timetravel.mementoweb.org/

  • https://freezepage.com/

  • https://webcitation.org/archive

  • https://archiveofourown.org/

  • https://megalodon.jp/

  • https://www.webarchive.org.uk/ukwa/

  • https://github.com/HelloZeroNet/ZeroNet (super cool project)

  • Google, Bing, DuckDuckGo, and other search engine caches


Other ArchiveBox Alternatives

There are lots more projects listed here too: https://github.com/stars/pirate/lists/internet-archiving

Ones I haven't personally vetted:


Smaller Utilities

Random helpful utilities for web archiving, WARC creation and replay, and more...

  • https://github.com/TheCakeIsNaOH/xbs-to-archivebox A utility to sync xBrowserSync bookmarks with ArchiveBox

  • https://github.com/karlicoss/promnesia A browser extension that collects and collates all the URLs you visit into a hierarchical/graph structure with metadata

  • https://github.com/vrtdev/save-page-state A Chrome extension for saving the state of a page in multiple formats

  • https://github.com/jsvine/waybackpack Command-line tool that lets you download the entire Wayback Machine archive for a given URL

  • https://github.com/hartator/wayback-machine-downloader Download an entire website from the Internet Archive Wayback Machine.

  • https://github.com/Lifesgood123/prevent-link-rot Replace any broken URLs in some content with Wayback Machine URL equivalents

  • https://en.archivarix.com Download an archived page or entire site from the Wayback Machine

  • https://proofofexistence.com Prove that a certain file existed at a given time using the blockchain

  • https://github.com/chfoo/warcat Tool for merging, extracting, and verifying WARC files

  • https://github.com/mozilla/readability Tool for extracting article contents and text

  • https://github.com/mholt/timeliner All your digital life on a single timeline, stored locally

  • https://github.com/wkhtmltopdf/wkhtmltopdf WebKit-based HTML-to-PDF archiver/saver

  • Sheetsee-Pocket: a project that provides a pretty, auto-updating index of your Pocket links (without archiving them)

  • Pocket -> IFTTT -> Dropbox: a post by Christopher Su on his IFTTT recipe for saving Pocket links

  • http://squidman.net/squidman/index.html

  • https://wordpress.org/plugins/broken-link-checker/

  • https://github.com/ArchiveTeam/wpull

  • http://freedup.org/

  • https://en.wikipedia.org/wiki/Furl

  • https://preservica.com/digital-archive-software-1/active-digital-preservation For-profit company offering a digital preservation software suite

  • https://github.com/karlicoss/grasp Capture webpages from Firefox and Chrome into Org-mode documents

  • https://github.com/dgtlmoon/changedetection.io Change detection and monitoring of web page content changes


Reading List

A collection of blog posts and articles about internet archiving. Contact me or open an issue if you want to add a link here!


Blogs by Friends of ArchiveBox

  • https://blog.archive.org

  • https://webrecorder.net/blog

  • https://netpreserveblog.wordpress.com

  • https://blog.conifer.rhizome.org/

  • https://ws-dl.blogspot.com

  • https://siarchives.si.edu/blog

  • https://parameters.ssrc.org

  • https://sr.ithaka.org/publications

  • https://ait.blog.archive.org

  • https://brewster.kahle.org

  • https://ianmilligan.ca

  • https://medium.com/@giovannidamiola


Articles We Like About Internet Archiving

  • https://items.ssrc.org/parameters/on-the-importance-of-web-archiving/

  • https://theconversation.com/your-internet-data-is-rotting-115891

  • https://www.bbc.com/future/story/20190401-why-theres-so-little-left-of-the-early-internet

  • https://sr.ithaka.org/publications/the-state-of-digital-preservation-in-2018/

  • https://gizmodo.com/delete-never-the-digital-hoarders-who-collect-tumblrs-1832900423

  • https://siarchives.si.edu/blog/we-are-not-alone-progress-digital-preservation-community

  • https://www.gwern.net/Archiving-URLs

  • http://brewster.kahle.org/2015/08/11/locking-the-web-open-a-call-for-a-distributed-web-2/

  • https://lwn.net/Articles/766374/

  • https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives

  • https://medium.com/@giovannidamiola/making-the-internet-archives-full-text-search-faster-30fb11574ea9

  • https://xkcd.com/1909/

  • https://samsaffron.com/archive/2012/06/07/testing-3-million-hyperlinks-lessons-learned#comment-31366

  • https://www.gwern.net/docs/linkrot/2011-muflax-backup.pdf

  • https://thoughtstreams.io/higgins/permalinking-vs-transience/

  • http://ait.blog.archive.org/files/2014/04/archiveit_life_cycle_model.pdf

  • https://blog.archive.org/2016/05/26/web-archiving-with-national-libraries/

  • https://blog.archive.org/2014/10/28/building-libraries-together/

  • https://ianmilligan.ca/2018/03/27/ethics-and-the-archived-web-presentation-the-ethics-of-studying-geocities/

  • https://ianmilligan.ca/2018/05/22/new-article-if-these-crawls-could-talk-studying-and-documenting-web-archives-provenance/

  • https://ws-dl.blogspot.com/2019/02/2019-02-08-google-is-being-shuttered.html

If any of these links are dead, you can find an archived version on https://archive.sweeting.me or https://web.archive.org.
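If you have a whole list of dead links, the Wayback Machine fallback can be generated rather than typed by hand. The sketch below is a minimal illustration, assuming Python 3 and the commonly used `https://web.archive.org/web/<timestamp>/<url>` lookup pattern (this pattern is not documented on this page). `wayback_url` is a hypothetical helper name; it only builds the lookup URL and does not check whether a snapshot actually exists.

```python
# Hypothetical helper: builds Wayback Machine lookup URLs for dead links.
# Assumes the web.archive.org/web/<timestamp>/<url> pattern; no network access.

WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(url: str, timestamp: str = "") -> str:
    """Return a Wayback Machine lookup URL for `url`.

    With no timestamp, the Wayback Machine typically redirects to the most
    recent snapshot. A (possibly partial) timestamp such as "2019" or
    "20190401" asks for the snapshot closest to that point in time.
    """
    if timestamp and not timestamp.isdigit():
        # Wayback timestamps are digit strings (YYYYMMDDhhmmss or a prefix of it).
        raise ValueError(f"timestamp must contain only digits, got {timestamp!r}")
    if timestamp:
        return f"{WAYBACK_PREFIX}{timestamp}/{url}"
    return f"{WAYBACK_PREFIX}{url}"


if __name__ == "__main__":
    # Example: fallback links for two entries from the lists above.
    for link in ["https://xkcd.com/1909/", "http://freedup.org/"]:
        print(wayback_url(link))
```

Keeping the helper free of network calls makes it easy to drop into a link-rewriting script; verifying that a snapshot exists would need an extra request to the Wayback Machine.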


ArchiveBox-Specific Posts, Tutorials, and Guides

Beware: many of these may be outdated, as ArchiveBox has frequent updates and continual improvement.

  • "Install ArchiveBox on SaltBox.dev" https://docs.saltbox.dev/sandbox/apps/archivebox/#3-setup

  • "ArchiveBox is an open-source self-hosted web archiving system for the web and the desktop" https://medevel.com/archivebox/

  • "Install ArchiveBox on a One-Click Docker Application" https://www.vultr.com/docs/install-archivebox-on-a-oneclick-docker-application/

  • "ArchiveBox, a solution for creating our own personalized miniature Archive.org" (Spanish) https://www.genbeta.com/herramientas/archivebox-solucion-para-crear-nuestro-propio-archive-org-miniatura-personalizado

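  • "ArchiveBox, an open source web archiving tool that saves a page's text, images, media files, and more for later viewing. A Python-based open source project you can use to build a private web archiving service." (Chinese, video) https://www.bilibili.com/s/video/BV1ib4y1X7SL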

  • "A personal internet archive without the pain" (Russian) https://habr.com/ru/company/vdsina/blog/550180/

  • "Preserve the Internet With ArchiveBox" https://www.cyberpunks.com/preserve-the-internet-with-archivebox/

  • "Be your own archivist: exploring what ArchiveBox can do" (Russian) https://xakep.ru/2021/02/01/archivebox/

  • "Making your own Internet archive with ArchiveBox" (Chinese) http://www.diglog.com/story/1045192.html

  • "How to Make Your Own Internet Archive With ArchiveBox" https://nixintel.info/osint-tools/make-your-own-internet-archive-with-archive-box/

  • "Archiving web pages to your hard drive with ArchiveBox" (German) https://www.linux-community.de/ausgaben/linuxuser/2020/12/mit-archivebox-webseiten-auf-der-festplatte-archivieren/

  • "ArchiveBox: open source web archiving" (Chinese) https://zhen.bushini.de/14738.html / https://www.1fishsauce.com/?p=4206

  • "Two crawler-based projects: Kiwix & ArchiveBox" (Chinese) https://blog.csdn.net/JackLang/article/details/108328791

  • "How to create your own private self-hosted read-it-later app" (Chinese) https://www.pcpc.me/tech/self-hosted-read-later-app

  • "How to install ArchiveBox to preserve websites you care about" https://blog.sleeplessbeastie.eu/2019/06/19/how-to-install-archivebox-to-preserve-websites-you-care-about/

  • "How to remotely archive websites using ArchiveBox" https://blog.sleeplessbeastie.eu/2019/06/26/how-to-remotely-archive-websites-using-archivebox/

  • "How to Create Your Own Private Self-Hosted Read-It-Later App" https://www.makeuseof.com/tag/self-hosted-read-later-app/

  • "How to use CutyCapt inside ArchiveBox" https://blog.sleeplessbeastie.eu/2019/07/10/how-to-use-cutycapt-inside-archivebox/

  • "Automate ArchiveBox with Google Spreadsheet to Backup your internet" https://manfred.life/archivebox

  • "[With demo♪] I tried out ConoHa's ArchiveBox application" (Japanese) https://qiita.com/CloudRemix/items/691caf91efa3ef19a7ad

  • "Web Archive Part 8: wallabag and ArchiveBox" (German) http://webermartin.net/blog/web-archiv-teil-8-wallabag-und-archivebox/

  • https://metaxyntax.neocities.org/entries/7.html

ArchiveBox Discussions in News & Social Media


Communities

Most Active Communities


Web Archiving Communities

Follow these technological and organizational archiving hubs for the latest archiving news.


General Archiving Foundations, Coalitions, Initiatives, and Institutes

Find your local archiving group in the list and see how you can contribute!

You can find more organizations and initiatives on these other lists:


ArchiveBox Community Resources

ArchiveBox Chat Rooms

ArchiveBox on Social Media

ArchiveBox on Package Distribution Platforms

