abx-spec-behaviors @ v0.1.0 [DRAFT]
Last updated
Proposal to allow user scripts to be shared between different browser automation / scraping / crawling tools.
🤔 To scrape Reddit comments using playwright today, you'd probably Google "reddit playwright", attempt to copy/paste some examples, and likely end up writing your own code to scroll pages, wait for lazy loading, expand comments, extract as JSON, etc.
🚀 Instead, imagine if a simple GitHub search for reddit topic:abx-behavior yielded hundreds of community-maintained, spec-compliant scripts, ready to run from any driver library (puppeteer/playwright/webdriver/etc.).
This spec defines a common format for user scripts + some core events that can be triggered from any browser automation environment.
🎭 Behaviors can define event listeners for normal window DOM events, but also for puppeteer lifecycle events, service worker / browser extension events, and other events that your crawling environment may choose to dispatch (see below for examples). It's one step up from Greasemonkey user scripts, with additional inspiration from browsertrix-behaviors.
Dependencies: None, uses the native JS EventTarget API, works consistently across browser and Node.
Easy to Run: import {BehaviorBus} from 'behaviors.js' (< 500 lines), load Behaviors, fire PAGE_LOAD
[!IMPORTANT] This is an early-stage proposal, we're seeking feedback from tool makers who build with browser automation!
To create an inter-operable spec that allows scraping projects to share browser automation scripts.
Everyone scraping today has to hide the same popups / block the same ads / log into the same sites / get around the same CAPTCHAs / expand the same comments, leading to a massive duplication of effort. Most projects manually write their own scripts for every site they want to scrape, and there's no good way to share those scripts consistently.
Greasemonkey grew into a huge community because its very simple spec allows anyone to quickly write a function and share it in a way that's compatible with many different driver extensions (e.g. Tampermonkey, ViolentMonkey, FireBug, etc.).
This Behavior
spec proposal aims to do something similar, but for slightly more powerful user scripts that can leverage puppeteer
, playwright
, and other crawling & scraping driver APIs.
No one wants to maintain, alone, all the user scripts needed to effectively crawl millions of different websites.
Here are some examples of things that could be implemented as Behavior
s and shared between tools:
scroll down to load infinite-scroll content
expand/unroll reddit/twitter comment threads automatically
auto-solve CAPTCHAs
log into a site using some saved credentials
dismiss modals / cookie consent popups / privacy policies
block ad requests / remove ad elements from page
extract youtube videos/audio/subtitles to mp4/mp3/sub files
export discovered outlink URLs to a Google Sheet
send some page content to an LLM with a prompt and store the response
and more...
We're aiming to foster easier collaboration & sharing of browser automation snippets between communities like these:
https://ArchiveBox.io
https://webrecorder.net (https://github.com/webrecorder/browsertrix-behaviors)
https://archive.org
https://conifer.rhizome.org
https://linkwarden.app
https://github.com/gildas-lormeau/singlefile
https://github.com/bellingcat/auto-archiver
https://docs.anthropic.com/en/docs/build-with-claude/computer-use
https://docs.anthropic.com/en/docs/build-with-claude/tool-use / and other AI function calling systems
https://reset.tech
https://mediaforcellc.com
https://www.starlinglab.org
Want to collaborate? Join us on the ArchiveBox Zulip or WebRecorder Discord, or open an issue.
Key Concepts:
Behavior
: a plugin that implements some event listener hook methods
BehaviorBus
: an event bus that coordinates emitting events and firing matching listeners
BehaviorEvent
: an event {type: 'PAGE_LOAD', url}
that goes through a BehaviorBus
BehaviorDriver
: navigates to URLs, sets up BehaviorBus
instances for browser/puppeteer/extensions, registers all the Behavior
event listeners, and fires main crawl lifecycle events
Behavior
Behaviors are the main focus of this proposal. A Behavior
is a plain JS object containing some metadata fields (name
, schema
, version
, description
, ...) and some hooks
(methods that get called to manipulate a page during crawling).
A simple one like HideModalsBehavior
might only provide one hook window: PAGE_LOAD
that deletes div.modal
from the DOM.
A more complex behavior like ExpandComments
might provide a window: PAGE_LOAD
hook that expands <details>
elements in the body, but it could also provide an extra puppeteer: PAGE_LOAD
hook that will run if the crawling environment uses puppeteer. The Behavior
is usable whether you're automating via browser extension or headless browser, because you can run it as long as you have window, but when puppeteer's extra powers (e.g. $$('pierce/...) are available, the Behavior provides extra functionality that makes it work across shadow DOMs and inside <iframe>s.
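As a rough illustration of the shape described above, here is a minimal sketch of what HideModalsBehavior could look like. The hook signature `(event, bus, window)` and the `schema` value are assumptions made for this example, not something this section defines.

```javascript
// Hypothetical Behavior: a plain object with metadata fields plus hooks grouped by context.
const HideModalsBehavior = {
  name: 'HideModalsBehavior',
  schema: 'BehaviorSchema@0.1.0', // placeholder value, assumed for illustration
  version: '0.1.0',
  description: 'Deletes div.modal elements from the DOM once the page has loaded.',
  hooks: {
    window: {
      // Assumed hook signature: (event, bus, window).
      PAGE_LOAD: (event, bus, window) => {
        for (const modal of window.document.querySelectorAll('div.modal')) {
          modal.remove();
        }
      },
    },
  },
};
```

Because the only thing the hook touches is `window.document`, the same object could be loaded by a browser extension or by a headless-browser driver that evaluates it inside the page.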
If we all agree to use a minimal shared event spec like this, then we can all share the benefit of community-maintained pools of "Behaviors" organically on GitHub. You can build a fancy app store style interface in your own tool and just populate it with all GitHub repos tagged with abx-behavior
+ yourtoolname
. Different crawling tools can implement different events and listeners, and when they dispatch events on BehaviorBus
during crawling, BehaviorBus
will run any Behavior
s that respond to those events. You get opt-in plugin functionality for free based on the events you fire, and you barely have to modify existing crawling code at all.
[!TIP] Almost all Behaviors will only need a single PAGE_LOAD or PAGE_CAPTURE method to implement their functionality (under the window context). Hooks for other contexts are only to be used when a Behavior author wants to provide some extra bonus functionality for specific contexts (e.g. puppeteer, serviceworker, etc.).
This Spec is A-La-Carte
You can be minimalist and only fire PAGE_LOAD if you don't want your crawling tool to offer a big surface area to Behavior scripts, or if you want all the functionality plugins have to offer, you can fire all the lifecycle events like PAGE_SETUP, PAGE_CAPTURE, PAGE_CLOSE, etc.
Different browser automation environments provide different APIs to access the page during crawling. We expect all environments to provide window
, but we also provide BehaviorBus
implementations for other contexts like puppeteer
's page
, or serviceworker
's window
, playwright
, and more.
Behavior
hooks
methods are grouped by the name of the context they expect (e.g. window
), and they'll only trigger if you provide that context during your crawl.
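To make the context grouping concrete, here is a small sketch of how a driver might work out which hooks can run for the contexts it actually provides. The helper name `runnableHooks` and the example behavior are hypothetical.

```javascript
// Hypothetical behavior with hooks for two contexts.
const ExpandCommentsBehavior = {
  name: 'ExpandCommentsBehavior',
  hooks: {
    window: { PAGE_LOAD: () => 'expanded <details> via window' },
    puppeteer: { PAGE_LOAD: () => 'expanded shadow DOM content via puppeteer' },
  },
};

// Collect the hooks that can run, given the contexts this crawl provides.
function runnableHooks(behaviors, availableContexts, eventType) {
  const matches = [];
  for (const behavior of behaviors) {
    for (const [contextName, hooks] of Object.entries(behavior.hooks ?? {})) {
      if (availableContexts.includes(contextName) && eventType in hooks) {
        matches.push({ behavior: behavior.name, context: contextName, hook: hooks[eventType] });
      }
    }
  }
  return matches;
}

// A crawl that only has `window` (e.g. a browser extension) skips the puppeteer hook.
const extensionOnly = runnableHooks([ExpandCommentsBehavior], ['window'], 'PAGE_LOAD');
console.log(extensionOnly.map((match) => match.context)); // [ 'window' ]
```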
Behavior Usage
Your crawling code should set up a new BehaviorBus()
for each context you'll have available, then attach that context (e.g. window
or puppeteer
's page
object) + the Behavior
s to run and link the busses together. When the page is ready, fire the main lifecycle events to trigger the Behaviors
.
Behavior Examples
To see more example behaviors, check out: src/example_behaviors.js
and behaviors/
.
Behavior Composition
If you want to have a Behavior
depend on the output of an earlier one, it can simply listen for the relevant events it needs.
No API is provided for Behaviors to directly depend on other specific behaviors (e.g. depends_on: ['SomeOtherBehavior']
), and in general trying to do so is strongly discouraged.
Listening for a generic event allows users to swap out ScreenshotBehavior
for a different screenshot implementation, as long as it emits the same EXTRACTED_SCREENSHOT
event.
Strive for "loose coupling" / duck typing, the only hard contracts between behaviors are the EVENT_NAME
+ args they emit/listen for.
Respect the UNIX philosophy: Expect the output of every program to become the input to another, as yet unknown, program.
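A minimal sketch of this kind of loose coupling, using a plain EventTarget as a stand-in for the bus. The `makeEvent` helper and the `path` payload field are assumptions for illustration.

```javascript
// Stand-in for BehaviorEvent: copies the payload fields onto a native Event.
const makeEvent = ({ type, ...payload }) => Object.assign(new Event(type), payload);
const bus = new EventTarget();

// Producer: any screenshot implementation can be swapped in here,
// as long as it emits the same EXTRACTED_SCREENSHOT event.
const announceScreenshot = (path) =>
  bus.dispatchEvent(makeEvent({ type: 'EXTRACTED_SCREENSHOT', path }));

// Consumer: depends only on the event name and payload, never on a specific behavior.
const thumbnailQueue = [];
bus.addEventListener('EXTRACTED_SCREENSHOT', (event) => {
  thumbnailQueue.push(event.path);
});

announceScreenshot('screenshot.png');
```

The consumer never names the producer, so replacing the screenshot implementation requires no change on the listening side.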
BehaviorBus
BehaviorBus
extends EventTarget
, a simple event bus that can consume/emit events + trigger event listeners.
BehaviorEvent
extends CustomEvent
, both use the native JS event system (and work the same as DOM events).
BehaviorBus Usage
A new BehaviorBus
should be set up for each context as soon as page loading starts.
See src/behaviors.js
for the full implementation.
BehaviorBus Examples
Behaviors define some event listener hooks, which get attached to the BehaviorBus
by BehaviorBus.attachBehaviors([...])
:
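The following is a rough stand-in, not the code in src/behaviors.js: a tiny bus built on EventTarget showing one way attachBehaviors and emit could fit together. The class name `SketchBehaviorBus`, its constructor arguments, and the hook signature are assumptions.

```javascript
// Stand-in bus: method names mirror the ones mentioned above, bodies are assumed.
class SketchBehaviorBus extends EventTarget {
  constructor(contextName, context) {
    super();
    this.contextName = contextName; // e.g. 'window' or 'puppeteer'
    this.context = context;         // e.g. the window object or puppeteer's page
  }

  // Register every hook a behavior defines for this bus's context.
  attachBehaviors(behaviors) {
    for (const behavior of behaviors) {
      const hooks = behavior.hooks?.[this.contextName] ?? {};
      for (const [eventType, hook] of Object.entries(hooks)) {
        this.addEventListener(eventType, (event) => hook(event, this, this.context));
      }
    }
  }

  // Wrap a plain {type, ...payload} object in a native Event and dispatch it.
  emit({ type, ...payload }) {
    return this.dispatchEvent(Object.assign(new Event(type), payload));
  }
}

const seenUrls = [];
const LogUrlBehavior = {
  name: 'LogUrlBehavior',
  hooks: { window: { PAGE_LOAD: (event) => seenUrls.push(event.url) } },
};

const bus = new SketchBehaviorBus('window', {});
bus.attachBehaviors([LogUrlBehavior]);
bus.emit({ type: 'PAGE_LOAD', url: 'https://example.com' });
console.log(seenUrls); // [ 'https://example.com' ]
```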
BehaviorBus
instances get connected across contexts
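One possible way to connect buses is sketched below, assuming every bus lives in the same JS process. The `LinkedBus` class is hypothetical; a real link between, say, puppeteer and the page's window would have to serialize events across the process boundary.

```javascript
// Hypothetical bus that forwards emitted events to every bus it is linked with.
class LinkedBus extends EventTarget {
  constructor(name) {
    super();
    this.name = name;
    this.links = new Set();
  }

  // Link two buses in both directions.
  linkTo(other) {
    this.links.add(other);
    other.links.add(this);
  }

  // Dispatch locally, then pass the same plain payload on to linked buses.
  // `visited` stops an event from bouncing back to a bus that already saw it.
  emit(payload, visited = new Set()) {
    if (visited.has(this)) return;
    visited.add(this);
    const { type, ...fields } = payload;
    this.dispatchEvent(Object.assign(new Event(type), fields));
    for (const other of this.links) other.emit(payload, visited);
  }
}

const puppeteerBus = new LinkedBus('puppeteer');
const windowBus = new LinkedBus('window');
puppeteerBus.linkTo(windowBus);

const received = [];
windowBus.addEventListener('PAGE_LOAD', (event) => received.push(`window saw ${event.url}`));
puppeteerBus.addEventListener('PAGE_LOAD', (event) => received.push(`puppeteer saw ${event.url}`));

// Emitting on one bus reaches listeners on both, exactly once each.
puppeteerBus.emit({ type: 'PAGE_LOAD', url: 'https://example.com' });
```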
BehaviorEvent
BehaviorEvent
extends CustomEvent
which is the standard Event
type that browsers use for all DOM events.
BehaviorEvent Usage
Events can be dispatched by calling BehaviorBus.emit({type: 'EVENT_TYPE', ...})
from any context:
Each event should include relevant context in its payload such as URLs, extracted text, file paths, selectors, etc. Events can contain plain JSON-serializable values only; don't put raw DOM element handles or special objects like window
into events.
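A driver or bus could enforce the JSON-only rule with a check along these lines. This is a sketch: the function name `isPlainJson` is hypothetical, and it does not guard against cyclic objects.

```javascript
// Returns true only for values made of JSON-safe primitives, arrays, and plain objects.
function isPlainJson(value) {
  if (value === null) return true;
  const kind = typeof value;
  if (kind === 'string' || kind === 'boolean') return true;
  if (kind === 'number') return Number.isFinite(value);
  if (Array.isArray(value)) return value.every(isPlainJson);
  if (kind === 'object') {
    // Rejects DOM nodes, window, class instances, and other special objects.
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return false;
    return Object.values(value).every(isPlainJson);
  }
  return false; // functions, undefined, symbols, bigints
}

const goodPayload = { type: 'DISCOVERED_TEXT', selector: 'article', text: 'Hello' };
const badPayload = { type: 'CLICK', element: new EventTarget() }; // stands in for a DOM handle

console.log(isPlainJson(goodPayload)); // true
console.log(isPlainJson(badPayload));  // false
```

Passing a selector string instead of the element itself keeps the event valid in every context, since the receiving side can re-query the element.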
Event type names (e.g. PAGE_LOAD
) should follow these principles:
Use existing DOM event names where applicable
Use NOUN + present tense VERB pattern for events typically fired by driver, that hooks react to (e.g., PAGE_SETUP
, PAGE_LOAD
, PAGE_CHANGE
, PAGE_CLOSE
)
Use past tense VERB + NOUN pattern e.g. DISCOVERED_VIDEO
or EXTRACTED_VIDEO
when a Behavior is reporting a content discovery or extraction it made
Include _COMPLETE
suffix for events that report the ending of a process
Include _ERROR
suffix for error variants of events
A driver striving to be feature-complete should emit all these lifecycle events to the BehaviorBus
at the correct times, however it is not required for it to emit all of them.
A simple driver may only emit PAGE_LOAD
for example, but it would miss out on any more complex Behavior
plugin functionality that might depend on PAGE_SETUP
.
PAGE_SETUP
: Fired when page navigation starts but before DOM is ready (equivalent to document.readyState === 'loading'
)
DOM_CONTENT_LOADED
: Fired when initial HTML is loaded and parsed (maps directly to DOM event)
PAGE_LOAD
: Fired when page has finished loading including images/styles (equivalent to window.onload
)
PAGE_IDLE
: Fired when page has been idle with no network activity for 2+ seconds
PAGE_CAPTURE
: Fired when it's time to extract content/take snapshots of the page
PAGE_CAPTURE_COMPLETE
: Fired when all capture/extraction operations are finished
PAGE_BEFORE_UNLOAD
: Fired before page is about to be unloaded (maps to window.onbeforeunload
)
PAGE_UNLOAD
: Fired when page is being unloaded (maps to window.onunload
)
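The sketch below shows the ordering of these lifecycle events and the difference between a minimalist and a feature-complete driver. The `fireLifecycle` helper is hypothetical, and it fires the events back-to-back purely for illustration; a real driver would fire each one when the page actually reaches that state.

```javascript
// Lifecycle event types, in the order listed above.
const LIFECYCLE_EVENTS = [
  'PAGE_SETUP',
  'DOM_CONTENT_LOADED',
  'PAGE_LOAD',
  'PAGE_IDLE',
  'PAGE_CAPTURE',
  'PAGE_CAPTURE_COMPLETE',
  'PAGE_BEFORE_UNLOAD',
  'PAGE_UNLOAD',
];

// Fire the lifecycle events a driver supports, preserving the order above.
function fireLifecycle(bus, url, supported = LIFECYCLE_EVENTS) {
  for (const type of LIFECYCLE_EVENTS) {
    if (!supported.includes(type)) continue;
    bus.dispatchEvent(Object.assign(new Event(type), { url }));
  }
}

const bus = new EventTarget();
const fired = [];
for (const type of LIFECYCLE_EVENTS) {
  bus.addEventListener(type, (event) => fired.push(event.type));
}

fireLifecycle(bus, 'https://example.com', ['PAGE_LOAD']); // minimalist driver
fireLifecycle(bus, 'https://example.com');                // feature-complete driver
```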
A driver that expects Behaviors
(e.g. ExtractArticleText
) to output files to the filesystem needs to listen for these events and provide implementations for them. e.g. if you're in node you could handle FS_WRITE_FILE
by calling fs.writeFileSync(event.path, event.content)
, but if you are running Behaviors
from a browser you may need to use OPFS instead.
FS_WRITE_FILE
: Fired when a Behavior
is requesting to write a file
FS_MAKE_DIR
: Fired when requesting to create a directory (optional)
FS_DELETE_FILE
: Fired when requesting to delete a file (optional)
FS_REMOVE_DIR
: Fired when requesting to remove a directory (optional)
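For a Node-based driver, handling these events might look roughly like the sketch below. The `path` and `content` fields follow the fs.writeFileSync example above; the temporary output directory and the `makeEvent` helper are assumptions for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Stand-in for BehaviorEvent: copies the payload fields onto a native Event.
const makeEvent = ({ type, ...payload }) => Object.assign(new Event(type), payload);
const bus = new EventTarget();

// Assumed policy: the driver writes everything under one output directory it controls.
const outputDir = fs.mkdtempSync(path.join(os.tmpdir(), 'abx-output-'));

bus.addEventListener('FS_MAKE_DIR', (event) => {
  fs.mkdirSync(path.join(outputDir, event.path), { recursive: true });
});

bus.addEventListener('FS_WRITE_FILE', (event) => {
  const target = path.join(outputDir, event.path);
  fs.mkdirSync(path.dirname(target), { recursive: true }); // make parent dirs as needed
  fs.writeFileSync(target, event.content);
});

// A Behavior such as ExtractArticleText would emit something like this:
bus.dispatchEvent(makeEvent({ type: 'FS_WRITE_FILE', path: 'article/text.txt', content: 'Hello' }));
```

A real driver would likely also want to validate `event.path` so that a Behavior cannot write outside the output directory.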
A driver could choose to implement these if it wants to allow Behaviors
to use LLM APIs to do things. Behaviors should do LLM logic using these events, so that they can be used with any LLM backend of the driver's choosing. Behaviors then won't have to hardcode their own internal logic to make calls to OpenAI's or Anthropic's APIs, and it makes it easier to swap models in and out depending on context.
LLM_REQUEST
: Fired when a Behavior wants to call whatever AI/LLM API might be provided by the driver
LLM_REQUEST_COMPLETE
: Fired when AI/LLM processing completes
LLM_REQUEST_ERROR
: Fired when AI/LLM processing fails
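Here is one way a driver could wire these three events to a backend of its choosing. The payload fields (`requestId`, `prompt`, `response`, `message`) and the `attachLlmBackend` helper are assumptions; this section does not define the payload shape.

```javascript
// Stand-in for BehaviorEvent: copies the payload fields onto a native Event.
const makeEvent = ({ type, ...payload }) => Object.assign(new Event(type), payload);

// `backend` is any async function (prompt) => string, e.g. a wrapper around an LLM API client.
function attachLlmBackend(bus, backend) {
  bus.addEventListener('LLM_REQUEST', async (event) => {
    const { requestId, prompt } = event;
    try {
      const response = await backend(prompt);
      bus.dispatchEvent(makeEvent({ type: 'LLM_REQUEST_COMPLETE', requestId, response }));
    } catch (error) {
      bus.dispatchEvent(makeEvent({ type: 'LLM_REQUEST_ERROR', requestId, message: String(error) }));
    }
  });
}

const bus = new EventTarget();
const completed = [];
bus.addEventListener('LLM_REQUEST_COMPLETE', (event) => completed.push(event.response));

// An echo function stands in for a real model here.
attachLlmBackend(bus, async (prompt) => `echo: ${prompt}`);

// A Behavior only ever emits the request; it never knows which backend answers it.
bus.dispatchEvent(makeEvent({ type: 'LLM_REQUEST', requestId: 1, prompt: 'Summarize this page' }));
```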
... you can coordinate other custom event types for your own private APIs too ...
Behaviors working with these types of content should emit these events when they discover relevant content on the page. You might have a Behavior
that scans <a href>
links on the page, have it emit DISCOVERED_OUTLINK
for each one it finds. Then if your driver wants to do recursive crawling, it could listen for DISCOVERED_OUTLINK
events on the BehaviorBus
, and add the reported URLs to its crawl queue.
DISCOVERED_OUTLINK
: Fired when a new URL is found that could be crawled
DISCOVERED_IMAGE
: Fired when an image resource is found
DISCOVERED_VIDEO
: Fired when a video resource is found
DISCOVERED_AUDIO
: Fired when an audio resource is found
DISCOVERED_DOWNLOAD
: Fired when a download link (ZIP/PDF/DOC/EXE/etc.) is found
DISCOVERED_FEED
: Fired when an RSS/Atom feed is found
DISCOVERED_API
: Fired when an API endpoint is found
DISCOVERED_FORM
: Fired when an interactive form is found
DISCOVERED_TEXT
: Fired when significant text content is found
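The outlink example described above could be sketched like this. The `url` payload field, the `reportOutlinks` helper, and the in-memory queue are assumptions; in a real Behavior the `links` array would come from the <a href> elements on the page.

```javascript
// Stand-in for BehaviorEvent: copies the payload fields onto a native Event.
const makeEvent = ({ type, ...payload }) => Object.assign(new Event(type), payload);
const bus = new EventTarget();

// Behavior side: emit one DISCOVERED_OUTLINK per link found on the page.
function reportOutlinks(targetBus, pageUrl, links) {
  for (const href of links) {
    const url = new URL(href, pageUrl).href; // resolve relative links against the page URL
    targetBus.dispatchEvent(makeEvent({ type: 'DISCOVERED_OUTLINK', url }));
  }
}

// Driver side: a recursive crawler adds each unseen URL to its crawl queue.
const seenUrls = new Set();
const crawlQueue = [];
bus.addEventListener('DISCOVERED_OUTLINK', (event) => {
  if (seenUrls.has(event.url)) return;
  seenUrls.add(event.url);
  crawlQueue.push(event.url);
});

reportOutlinks(bus, 'https://example.com/a/', ['b', '/c', 'https://example.com/c']);
// crawlQueue now holds https://example.com/a/b and https://example.com/c (the duplicate is skipped)
```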
These events are fired when content has been extracted out of a page and saved as a file somewhere.
EXTRACTED_METADATA
: Fired when page metadata has been collected
EXTRACTED_SCREENSHOT
: Fired when a screenshot has been taken
EXTRACTED_PDF
: Fired when a PDF has been generated
EXTRACTED_WARC
: Fired when an archive file has been created
Behaviors can choose to emit these when emulating user steps on a page / listen for them being emitted from other behaviors. These events don't do anything on their own and are not required; it's just recommended to announce them to make it easier for other plugins to listen for changes and coordinate their own logic.
SCROLL
: Fired whenever a page's scroll position is changed
SCROLL_COMPLETE
: Fired when a sequence of scroll operations is finished
FORM_SUBMIT
: Fired when attempting to submit a form
FORM_SUBMIT_COMPLETE
: Fired when form submission is finished
CLICK
: Fired when programmatically clicking an element
HOVER
: Fired when programmatically hovering over an element
INPUT
: Fired when programmatically entering text into a field
INPUT_COMPLETE
: Fired when a sequence of text input operations is finished
DIALOG_OPEN
: Fired when a modal/dialog opens
DIALOG_CLOSE
: Fired when a modal/dialog closes
BehaviorDriver
BehaviorDriver
s are actually just Behavior
s like any other, with the same metadata fields + hooks
.
The only distinction is that BehaviorDriver
s generally implement hooks
to handle the discovery events that Behavior
s use to announce outputs that you can do something with e.g. extracted video/audio/text, URLs to add to crawl queue, etc...
If a crawling project wants to use Behavior
s to extract things out of pages during a crawl, then it should implement a BehaviorDriver
to listen for the announcements about content it cares about.
Like normal Behavior
s, BehaviorDriver
s can also maintain some state
internally (if needed).
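As a sketch of that idea, a driver can be written with the same object shape as any other Behavior, plus some internal state. The driver name, the `schema` value, the choice of the window context, and the `path`/`url` payload fields are all assumptions for illustration.

```javascript
// Hypothetical BehaviorDriver: same metadata + hooks as a Behavior, with internal state.
const CollectOutputsDriver = {
  name: 'CollectOutputsDriver',
  schema: 'BehaviorSchema@0.1.0', // placeholder value, assumed for illustration
  version: '0.1.0',
  description: 'Records extracted files and discovered videos announced by other behaviors.',
  state: { files: [], videos: [] },
  hooks: {
    window: {
      EXTRACTED_SCREENSHOT: (event) => CollectOutputsDriver.state.files.push(event.path),
      EXTRACTED_PDF: (event) => CollectOutputsDriver.state.files.push(event.path),
      DISCOVERED_VIDEO: (event) => CollectOutputsDriver.state.videos.push(event.url),
    },
  },
};

// Attach its hooks like any other behavior (a plain EventTarget stands in for the bus).
const bus = new EventTarget();
for (const [type, hook] of Object.entries(CollectOutputsDriver.hooks.window)) {
  bus.addEventListener(type, hook);
}

bus.dispatchEvent(Object.assign(new Event('EXTRACTED_PDF'), { path: 'output.pdf' }));
bus.dispatchEvent(Object.assign(new Event('DISCOVERED_VIDEO'), { url: 'https://example.com/video.mp4' }));
```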
To see how drivers might implement the core event handlers differently, check out the example drivers:
BehaviorDriver Usage
Here's how you can test a driver:
Here's the example output from a full puppeteer crawl run with all the example Behavior
s:
Proposal Discussions: ArchiveBox Zulip and WebRecorder Discord
Development Announcement: https://docs.sweeting.me/s/archivebox-plugin-ecosystem-announcement
Browsertrix's existing behaviors system: https://github.com/webrecorder/browsertrix-behaviors
Built on: https://developer.mozilla.org/en-US/docs/Web/API/EventTarget
Inspired by: https://pluggy.readthedocs.io/en/stable/index.html