Hello friends,
Today I want to talk about one of the most exciting parts of Ferret v2: new language capabilities that are now available directly in the syntax.
Ferret v2 is not meant to be a completely different language. The goal is to preserve the original spirit of Ferret, a practical language for extracting structured data from messy websites, while making the language more expressive, more consistent, and easier to extend.
The familiar Ferret style remains: small scripts, readable data flow, structured values, and extraction-focused operations. The new syntax is there to make those scripts clearer, not to make Ferret feel like a large general-purpose programming language.
That means adding new keywords, statements, and expressions where they help describe common extraction workflows more clearly: querying data, shaping arrays, building strings, dispatching events, updating and deleting values, handling failures, waiting for conditions, aliasing module namespaces, and defining reusable functions.
So this post is not about replacing Ferret’s identity. It is about giving the language a stronger foundation for the next stage of the project.
Why add new language capabilities?
Ferret v1 proved that a query-oriented language can be useful for web automation and data extraction. It allowed developers to describe browser interaction and data shaping in a way that felt more focused than stitching together low-level automation calls.
That idea is still at the center of Ferret v2.
But as Ferret grew, it became clear that some concepts deserved to be represented more directly in the language. Querying is one example. Waiting for something to appear is another. Dispatching an event, updating a value, or choosing an error strategy are all common parts of extraction workflows.
In Ferret v1, some of these ideas had to be expressed indirectly through functions or domain-specific helpers. That works, but it has limits. It can make scripts harder to read, harder to optimize, and harder to explain through good diagnostics.
Ferret v2 introduces syntax-level support for these concepts not because the language needs to look different, but because these operations are important enough to be first-class.
A function call can perform the same work, but it cannot always describe the same intent.
For example, a compiler can understand that QUERY ".product-item" IN document USING css is a query operation. It can attach better diagnostics to the selector, apply query-specific policies, validate dialect support, and eventually optimize or trace the operation as part of the execution model.
With a plain function call, most of that meaning is hidden behind an arbitrary host function boundary.
This also matters for error reporting. When the compiler understands that a piece of code is a query, a dispatch operation, a wait condition, or an assignment path, it can point to the part of the script that actually caused the problem. Better syntax gives Ferret better structure, and better structure leads to better diagnostics.
The goals are simple:
- keep common extraction workflows readable.
- make behavior explicit where it matters.
- make the language easier to compile and optimize.
- improve diagnostics by giving the compiler clearer structure.
- make Ferret extensible beyond a single hardcoded domain.
In other words, Ferret v2 is still Ferret. It just has a more capable language foundation.
Query as a language capability
The most important example is querying.
In Ferret v1, querying was strongly associated with documents, elements, and selectors, but was implemented via host functions like ELEMENTS or XPATH.
In Ferret v2, querying becomes a general language capability. A value can support queries if it implements the queryable capability.
In a query such as QUERY ".product-item" IN document USING css, the important part is USING css. The query string does not define the meaning by itself. The dialect does.
That means the same language construct can work with different query dialects:
The language does not need separate statements for CSS selectors, XPath, SQL, JSONPath, or future query systems.
Instead, the target value and the selected dialect decide how the query is interpreted.
This is one of the main design directions in Ferret v2: keep the syntax stable, but allow capabilities to define behavior.
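The Go sketch below makes the capability idea concrete. It assumes a hypothetical `Queryable` interface and a toy `fakeDocument` value, and neither is Ferret's actual host API. It shows the point made above: one query construct hands the work to the target value, and that value decides which dialects it understands.

```go
package main

import (
	"fmt"
	"strings"
)

// Queryable is a hypothetical stand-in for Ferret's queryable capability.
// It is not Ferret's real host interface.
type Queryable interface {
	// Query interprets expr according to dialect and returns the matches.
	Query(dialect, expr string) ([]string, error)
}

// fakeDocument is a toy value whose "elements" are reduced to class names.
type fakeDocument struct {
	classes []string
}

// Query understands a tiny subset of "css" (single ".class" selectors).
// The value, not the language, decides which dialects it accepts.
func (d fakeDocument) Query(dialect, expr string) ([]string, error) {
	if dialect != "css" {
		return nil, fmt.Errorf("dialect %q is not supported by this value", dialect)
	}
	if !strings.HasPrefix(expr, ".") {
		return nil, fmt.Errorf("css: only class selectors are supported in this sketch, got %q", expr)
	}
	var out []string
	for _, c := range d.classes {
		if c == strings.TrimPrefix(expr, ".") {
			out = append(out, c)
		}
	}
	return out, nil
}

// runQuery mirrors QUERY expr IN target USING dialect: one construct whose
// meaning comes from the target value and the selected dialect.
func runQuery(target Queryable, expr, dialect string) ([]string, error) {
	return target.Query(dialect, expr)
}

func main() {
	doc := fakeDocument{classes: []string{"product-item", "header", "product-item"}}

	items, err := runQuery(doc, ".product-item", "css")
	fmt.Println(len(items), err) // 2 <nil>

	_, err = runQuery(doc, "//div", "xpath")
	fmt.Println(err) // dialect "xpath" is not supported by this value
}
```

Here an unsupported dialect comes back as an error from the value itself. A compiler or runtime could turn that into a targeted diagnostic instead of a generic function failure.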
Query modifiers
Queries often need different result shapes. Sometimes you want all matches. Sometimes you want one match. Sometimes you only care whether something exists or how many matches there are.
Ferret v2 introduces query modifiers for those cases:
By default, QUERY returns the normal result shape for the selected dialect and target value. For CSS-style document queries, that usually means a collection of matching elements.
QUERY EXISTS returns a boolean value:
QUERY COUNT returns the number of matches:
QUERY ONE returns a single matching result:
This is useful for the common case where a script expects one element and does not want to query a collection and then index into it manually.
Instead of writing:
A script can express the intent directly:
The modifier makes cardinality visible. A query that returns all matches is different from a query that returns one match. A query that checks existence is different from a query that counts matches.
By making that intent visible in the syntax, Ferret can make scripts easier to read and potentially easier to optimize.
It also gives the runtime and diagnostics a clearer model. If a script asks for one result, Ferret can treat that as a distinct operation instead of a collection query followed by an index access.
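Cardinality matters, and one way to see why is to model the four result shapes as distinct operations. The Go sketch below is hypothetical: the `Modifier` constants and the `evaluate` function are illustrations, not Ferret internals. Plain strings stand in for matched elements.

```go
package main

import "fmt"

// Modifier is a hypothetical enum for the result shape a query asks for.
type Modifier int

const (
	All    Modifier = iota // default: every match
	One                    // QUERY ONE: first match only
	Exists                 // QUERY EXISTS: boolean
	Count                  // QUERY COUNT: number of matches
)

// evaluate walks the candidates once and shapes the result according to mod.
// Because the cardinality is known up front, One and Exists return at the
// first match instead of building a full collection first.
func evaluate(mod Modifier, candidates []string, matches func(string) bool) any {
	var all []string
	n := 0
	for _, c := range candidates {
		if !matches(c) {
			continue
		}
		switch mod {
		case One:
			return c
		case Exists:
			return true
		case Count:
			n++
		default:
			all = append(all, c)
		}
	}
	switch mod {
	case One:
		return nil // no match: the equivalent of NONE
	case Exists:
		return false
	case Count:
		return n
	default:
		return all
	}
}

func main() {
	nodes := []string{"header", "product-item", "product-item", "footer"}
	isProduct := func(s string) bool { return s == "product-item" }

	fmt.Println(evaluate(All, nodes, isProduct))    // [product-item product-item]
	fmt.Println(evaluate(One, nodes, isProduct))    // product-item
	fmt.Println(evaluate(Exists, nodes, isProduct)) // true
	fmt.Println(evaluate(Count, nodes, isProduct))  // 2
}
```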
Query shorthand
For common cases, the long form can be too verbose. Ferret v2 also supports shorthand query expressions.
The regular shorthand uses ~:
This is equivalent to a regular query:
When a script expects a single result, Ferret also supports the ~? shorthand:
This is equivalent to QUERY ONE:
The distinction is small, but important.
~ asks for the normal query result shape. For CSS-style document queries, that usually means a collection of matching elements.
~? asks for one matching result.
That makes the common “give me the first matching thing” case more readable without forcing scripts to query a collection and then index into it manually:
More dynamic query expressions should still use the long form:
The same applies to other dialects that need explicit options:
This gives Ferret both sides: concise syntax for everyday cases, and explicit syntax for cases that need more control.
The shorthand forms are intentionally limited. They should make common scripts pleasant to write, but they should not become a second full query language hidden inside brackets.
Array operators for data shaping
Web extraction rarely returns one perfectly shaped value.
More often, a script gets a list of elements, rows, links, products, or records, and then needs to transform that list into clean structured output.
Ferret v2 brings array operators inspired by ArangoDB’s AQL into the language to make this kind of data shaping easier to express.
Instead of writing loops for every small transformation, scripts can map, filter, slice, and project arrays directly.
The basic operators cover the most common cases:
- [n] accesses an array element by index.
- [*] expands an array and allows projecting fields from each item.
- [**], [***], and deeper forms flatten nested arrays.
- [* FILTER ...] filters an array while expanding it.
- [* LIMIT ...] limits the expanded result.
- [* RETURN ...] projects each expanded item into a new shape.
- [? ...] checks whether an array contains values matching a condition.
The [*] operator is especially useful after queries. For example, extracting all link targets from a page can stay compact:
Inline expressions make filtering and projection more explicit. They use . to refer to the current array item:
FILTER, LIMIT, and RETURN can be combined in that order:
The question-mark operator is different. It does not return the filtered items. It answers whether matching items exist.
This returns a boolean value.
That distinction matters: use [* FILTER ...] when you want the matching values, and [? ...] when you want to test whether matching values exist.
Array contraction is useful when querying nested collections:
Array operators pair naturally with queries. QUERY gives the script a collection of values, and array operators help turn that collection into the shape the caller actually needs.
The goal is the same as with the rest of Ferret v2 syntax: make common extraction workflows readable without forcing every small data transformation into a manual loop.
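Ordinary Go generics can approximate the semantics of the inline operators. This is a minimal sketch under stated assumptions. `expand`, `anyMatch`, and `flatten` are hypothetical helpers. A limit of `-1` means "no limit". `flatten` models only a single level of contraction, roughly `[**]` applied to an array of arrays. The sketch follows the order described above: filter first, then limit, then project.

```go
package main

import "fmt"

// expand is a hypothetical model of [* FILTER f LIMIT n RETURN r]:
// items are filtered first, then limited, then projected, in that order.
// A negative limit means "no limit".
func expand[T, R any](items []T, filter func(T) bool, limit int, project func(T) R) []R {
	out := []R{}
	for _, it := range items {
		if filter != nil && !filter(it) {
			continue
		}
		if limit >= 0 && len(out) >= limit {
			break
		}
		out = append(out, project(it))
	}
	return out
}

// anyMatch models [? ...]: it answers whether a match exists
// instead of returning the matching values.
func anyMatch[T any](items []T, cond func(T) bool) bool {
	for _, it := range items {
		if cond(it) {
			return true
		}
	}
	return false
}

// flatten contracts one level of nesting, roughly like [**] on an array of arrays.
func flatten[T any](nested [][]T) []T {
	var out []T
	for _, inner := range nested {
		out = append(out, inner...)
	}
	return out
}

type link struct {
	Href     string
	External bool
}

func main() {
	links := []link{{"/a", false}, {"https://x.example", true}, {"/b", false}, {"/c", false}}

	internal := expand(links, func(l link) bool { return !l.External }, 2, func(l link) string { return l.Href })
	fmt.Println(internal) // [/a /b]

	fmt.Println(anyMatch(links, func(l link) bool { return l.External })) // true
	fmt.Println(flatten([][]int{{1, 2}, {3}, {}}))                        // [1 2 3]
}
```

The LIMIT check runs only after FILTER has accepted an item, so the limit counts matching values rather than raw input items.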
Dispatch as a language capability
Many values are not only queryable. They can also receive events, commands, or signals.
In browser automation, clicking an element is the obvious example. But dispatch is not limited to DOM events. A queue, actor, stream, workflow, or custom host object could also expose dispatch behavior.
Ferret v2 represents this with DISPATCH:
With payload and options, the same statement stays explicit:
For simple payload-less signals, Ferret v2 also has a concise shorthand:
This reads as: send the click signal to button.
The shorthand is intentionally narrow. Once payloads, options, or more explicit behavior are needed, the long form is clearer.
Again, the important part is not only the browser use case. The important part is that dispatch becomes a language-level operation over values that support the dispatchable capability.
The value itself decides how to interpret the signal, what to do with the payload, and how to handle options.
Values with dispatch capabilities are provided by registered host modules.
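Here is a minimal Go sketch of the dispatch idea. The `Dispatchable` interface, `button`, `queue`, and `dispatch` are hypothetical, and Ferret's real host interface may differ. The sketch sends the same operation to two unrelated values, and each value decides what the signal and payload mean.

```go
package main

import (
	"errors"
	"fmt"
)

// Dispatchable is a hypothetical stand-in for Ferret's dispatchable capability.
type Dispatchable interface {
	Dispatch(signal string, payload any, options map[string]any) error
}

// button treats "click" as a payload-less DOM-style event.
type button struct{ clicks int }

func (b *button) Dispatch(signal string, _ any, _ map[string]any) error {
	if signal != "click" {
		return fmt.Errorf("button: unsupported signal %q", signal)
	}
	b.clicks++
	return nil
}

// queue treats any signal as a message and requires a payload.
type queue struct{ messages []string }

func (q *queue) Dispatch(signal string, payload any, _ map[string]any) error {
	if payload == nil {
		return errors.New("queue: a payload is required")
	}
	q.messages = append(q.messages, fmt.Sprintf("%s:%v", signal, payload))
	return nil
}

// dispatch is where a runtime could attach tracing or diagnostics;
// the target value decides what the signal actually means.
func dispatch(target Dispatchable, signal string, payload any, options map[string]any) error {
	return target.Dispatch(signal, payload, options)
}

func main() {
	b := &button{}
	q := &queue{}

	fmt.Println(dispatch(b, "click", nil, nil))          // <nil>
	fmt.Println(dispatch(q, "created", "order-42", nil)) // <nil>
	fmt.Println(dispatch(q, "created", nil, nil))        // queue: a payload is required
	fmt.Println(b.clicks, q.messages)                    // 1 [created:order-42]
}
```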
Match for structured control flow
Ferret v2 also improves control flow with MATCH.
The goal is not to make Ferret feel imperative. The goal is to provide enough control flow for real extraction logic while preserving the language’s declarative feel.
Ternary expressions are still useful for small choices. MATCH is meant for cases where branching logic grows beyond one or two conditions and benefits from a more structured form.
Guard-style matching can express condition-based branching:
Scrutinee-style matching can inspect a value directly:
Ferret v2 also supports object pattern matching. This is useful when a script needs to branch based on the shape or selected fields of a value:
This makes common extraction and normalization logic easier to express. Instead of pulling fields out first and then writing a chain of conditions, the match arm can describe the shape it expects and bind the values it needs.
The pattern system will continue to evolve, but object pattern matching is already supported and is part of Ferret’s current direction. MATCH should become the primary way to express structured branching when a script has multiple cases to handle.
This is especially useful in extraction workflows, where scripts often need to classify responses, handle missing values, normalize data, or branch based on page state.
String templates
Ferret v2 also adds string templates for cases where scripts need to build readable strings from values.
Extraction scripts often need to create URLs, format labels, build messages, or normalize output fields. Plain string concatenation works, but it quickly becomes noisy.
With string templates, the same expression is easier to read:
Expressions inside ${...} are evaluated and inserted into the final string:
String templates are especially useful when building dynamic query strings or URLs:
They also work well for shaping final output:
The goal is simple: keep common string-building cases readable without forcing scripts into long chains of concatenation.
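A few lines of Go can sketch the core of `${...}` interpolation. This model is deliberately reduced and hypothetical. Ferret evaluates arbitrary expressions inside `${...}`, whereas `render` below only looks up plain identifiers in an environment map and leaves unknown names untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches ${name}; this sketch supports identifiers only,
// not full expressions.
var placeholder = regexp.MustCompile(`\$\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}`)

// render replaces each ${name} with the value bound to name, formatted with %v.
// Unknown names are left as-is so the problem stays visible in the output.
func render(tmpl string, env map[string]any) string {
	return placeholder.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		if v, ok := env[name]; ok {
			return fmt.Sprint(v)
		}
		return m
	})
}

func main() {
	env := map[string]any{"base": "https://example.com", "page": 3}
	fmt.Println(render("${base}/products?page=${page}", env)) // https://example.com/products?page=3
}
```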
Mutable values and assignment
Ferret has traditionally favored query-style expression flow, but some tasks are simply easier with local mutable state.
Ferret v2 separates immutable and mutable bindings:
LET remains immutable. VAR is explicit. Reassignment is allowed only when the nearest binding is mutable.
The second example is invalid because attempts was declared with LET.
This keeps mutation available without making every binding mutable by default.
One important distinction is that LET prevents rebinding the variable itself. It does not necessarily make the underlying value deeply immutable. If a value supports mutation, its fields may still be updated.
For objects and other mutable values, Ferret uses familiar assignment syntax:
Safe access also applies naturally to mutation paths:
The goal is not to make Ferret more imperative than necessary. The goal is to support the cases where mutation describes the workflow more naturally, while preserving immutability as the default style.
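A small scope chain can illustrate the rule that reassignment is allowed only when the nearest binding is mutable. The Go types below are hypothetical and are not Ferret's compiler structures. The shadowing in `main` is assumed only to show what "nearest" means.

```go
package main

import "fmt"

// binding records whether a name was declared with VAR (mutable) or LET.
type binding struct {
	mutable bool
	value   any
}

// scope is a hypothetical lexical scope with a link to its parent.
type scope struct {
	parent *scope
	vars   map[string]*binding
}

func newScope(parent *scope) *scope {
	return &scope{parent: parent, vars: map[string]*binding{}}
}

// declare models LET (mutable=false) and VAR (mutable=true).
func (s *scope) declare(name string, mutable bool, v any) {
	s.vars[name] = &binding{mutable: mutable, value: v}
}

// assign finds the nearest binding by walking outward through scopes,
// then rejects the assignment if that binding was declared with LET.
func (s *scope) assign(name string, v any) error {
	for cur := s; cur != nil; cur = cur.parent {
		if b, ok := cur.vars[name]; ok {
			if !b.mutable {
				return fmt.Errorf("cannot reassign %q: declared with LET", name)
			}
			b.value = v
			return nil
		}
	}
	return fmt.Errorf("%q is not declared", name)
}

func main() {
	outer := newScope(nil)
	outer.declare("attempts", true, 0) // roughly: VAR attempts = 0
	inner := newScope(outer)
	fmt.Println(inner.assign("attempts", 1)) // <nil>: the nearest binding is mutable

	inner.declare("attempts", false, 10)     // a LET binding that shadows the VAR
	fmt.Println(inner.assign("attempts", 2)) // cannot reassign "attempts": declared with LET
}
```

This models rebinding only. LET does not make the underlying value deeply immutable, so field updates on a mutable value are a separate check.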
Deleting properties
Assignment lets scripts create or update values, but extraction workflows also often need to remove data.
A script may need to clean up intermediate fields, remove deprecated metadata, drop optional values, or normalize an object before returning it.
Ferret v2 adds DELETE for removing properties from mutable values:
This removes the final property in the path. It does not delete the whole object. In this example, profile remains, but profile.deprecated is removed.
Bracket access is supported as well:
Safe access can be used when intermediate values may be missing:
This makes deletion safe and explicit. If the path cannot be reached because a safe segment evaluates to NONE, the operation does nothing.
Like assignment, DELETE works on values that support mutation. The statement describes the operation, while the value decides how that operation is applied.
The goal is to make cleanup and normalization code readable without introducing helper functions for basic property removal.
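The DELETE behavior described above, including the no-op when a safe segment evaluates to NONE, can be sketched over nested Go maps. `segment` and `deletePath` are hypothetical names, and objects are modeled as `map[string]any`. The sketch also assumes that a missing value behind a non-safe segment is an error.

```go
package main

import (
	"errors"
	"fmt"
)

// segment is one step of a property path; safe marks a step that may be NONE.
type segment struct {
	key  string
	safe bool
}

// deletePath removes only the final property in path. If a safe intermediate
// segment is missing or nil, the operation does nothing; a missing non-safe
// segment is reported as an error.
func deletePath(root map[string]any, path []segment) error {
	if len(path) == 0 {
		return errors.New("empty path")
	}
	cur := root
	for _, seg := range path[:len(path)-1] {
		v, exists := cur[seg.key]
		if !exists || v == nil {
			if seg.safe {
				return nil // safe access reached NONE: no-op
			}
			return fmt.Errorf("%q is NONE", seg.key)
		}
		next, ok := v.(map[string]any)
		if !ok {
			return fmt.Errorf("%q is not an object", seg.key)
		}
		cur = next
	}
	delete(cur, path[len(path)-1].key)
	return nil
}

func main() {
	doc := map[string]any{
		"profile": map[string]any{"name": "Ada", "deprecated": true},
	}

	// Roughly DELETE profile.deprecated: the object stays, one key goes.
	fmt.Println(deletePath(doc, []segment{{"profile", false}, {"deprecated", false}})) // <nil>
	fmt.Println(doc)                                                                   // map[profile:map[name:Ada]]

	// A safe segment that is missing turns the whole operation into a no-op.
	fmt.Println(deletePath(doc, []segment{{"settings", true}, {"theme", false}})) // <nil>
}
```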
Waiting as part of extraction
Waiting is another operation that deserves first-class treatment in web extraction.
Pages are dynamic. Data may appear after a network request, a DOM update, an animation, or a client-side route change. In Ferret v1, this kind of behavior often had to be expressed through helper functions or custom retry logic.
Ferret v2 makes waiting explicit:
This describes the operation directly: evaluate the value repeatedly, use a polling interval, stop after a timeout, and choose a fallback if the condition is not met.
The result is easier to read than hand-written retry logic, and easier for the runtime to trace, optimize, and explain.
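The polling model behind WAITFOR can be sketched as a loop with an interval, a deadline, and a fallback. The Go function below is a hypothetical illustration, not Ferret's runtime. It makes one last check at the deadline before giving up.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor evaluates check every interval until it reports ok or the timeout
// elapses, in which case it returns fallback.
func waitFor[T any](check func() (T, bool), interval, timeout time.Duration, fallback T) T {
	deadline := time.Now().Add(timeout)
	for {
		if v, ok := check(); ok {
			return v
		}
		remaining := time.Until(deadline)
		if remaining <= 0 {
			return fallback
		}
		// Never sleep past the deadline, so the final check happens on time.
		if remaining < interval {
			time.Sleep(remaining)
		} else {
			time.Sleep(interval)
		}
	}
}

func main() {
	start := time.Now()
	// Simulates a value that appears roughly 30ms after the page loads.
	appeared := func() (string, bool) {
		if time.Since(start) > 30*time.Millisecond {
			return "price: 9.99", true
		}
		return "", false
	}
	fmt.Println(waitFor(appeared, 10*time.Millisecond, 200*time.Millisecond, "NONE")) // price: 9.99

	never := func() (string, bool) { return "", false }
	fmt.Println(waitFor(never, 10*time.Millisecond, 50*time.Millisecond, "fallback")) // fallback
}
```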
Waiting for network activity
Some extraction workflows depend not only on the DOM, but also on network activity.
Modern pages often load data lazily through background requests. A product list, search result, price, availability status, or recommendation block may appear only after one or more API calls finish.
The CDP driver exposes network lifecycle events that can be observed from Ferret scripts:
- network.request_started
- network.response_received
- network.request_finished
- network.request_failed
- network.idle
This makes it possible to wait for network behavior directly instead of guessing with fixed delays.
For example, a script can wait until the page becomes network-idle before querying the DOM:
A script can also wait for a specific request to finish before reading the updated page state:
Network events are also useful for debugging or collecting metadata from a page session:
The important part is that network activity becomes observable through the same language-level waiting model. Ferret does not need a separate WAITFOR NETWORK construct. The CDP driver can expose network activity as events, and WAITFOR EVENT can observe them.
This keeps the language general while still supporting browser-specific workflows.
Error and timeout policies close to the operation
Web data extraction often fails for normal reasons: a page is slow, an element is missing, a network request times out, or a site returns an unexpected response.
In Ferret v2, failure policy can live close to the operation that may fail.
Timeout behavior can be expressed in a similar way where supported:
This keeps the happy path readable while making fallback behavior explicit.
It also gives Ferret a clearer execution model. A timeout policy or fallback value is not hidden inside arbitrary user code. It is part of the operation itself.
User-defined functions
Ferret v2 is also moving toward user-defined functions.
The goal is not to turn Ferret into a general-purpose application language. The goal is to let scripts define reusable extraction and normalization logic directly where it belongs.
Functions are especially useful for normalization logic: parsing prices, cleaning text, mapping statuses, extracting IDs, and turning inconsistent page data into stable output shapes.
For larger functions, the block form gives enough structure without relying on indentation-sensitive syntax or END markers.
For smaller functions, the body can stay compact:
This should make Ferret scripts easier to organize while keeping the language focused.
Control flow using pattern matching and user-defined functions
With user-defined functions and pattern matching, Ferret can express more complex logic without relying on host functions or external code.
This is an important step toward making Ferret a more self-contained language for data extraction and processing.
Where values with capabilities come from
One important question is where these values with capabilities come from.
A Ferret script does not manually attach capabilities to a value. Capabilities come from the runtime, modules, and host applications that embed or extend Ferret.
For example, an HTML module can expose a document value that supports CSS queries:
A DOM element can be both queryable and dispatchable:
A database module could expose a connection or table value that supports SQL queries:
A host application embedding Ferret can also provide its own values. That could be a queue, cache, workflow object, browser session, API client, or any other domain-specific object.
From the script’s point of view, these values participate in the same language constructs. The script does not need to know whether a value came from the standard library, a contrib module, or the embedding application. It only matters which capabilities the value exposes.
As the compiler and runtime mature, this model also gives Ferret a clearer way to report capability errors. The problem is not just that a function failed, but that a value does not support the operation being requested.
For example, QUERY ... IN value USING css requires the target value to support querying with the selected dialect. If it does not, Ferret can report that directly.
For the Ferret ecosystem, this separation matters.
Core Ferret can stay small, while modules add support for new document types, protocols, storage systems, APIs, browser integrations, and runtime-specific behavior without requiring new syntax for each one.
That boundary keeps the language compact while leaving plenty of room for extension at the runtime and module level.
It is also the reason HTML drivers are no longer treated as part of the language core. They were moved to a separate repository to make it easier to iterate on browser-specific behavior without affecting the core language. The same applies to other domain-specific modules that may come in the future.
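Checking for a capability before running an operation is what lets a runtime report that a value does not support the requested operation. The Go sketch below uses hypothetical `Queryable`/`Dispatchable` interfaces and a `CapabilityError` type. It is not Ferret's actual runtime. It only illustrates reporting a missing capability instead of a generic host-function failure.

```go
package main

import "fmt"

// Hypothetical capability interfaces; the names are illustrative only.
type Queryable interface {
	Query(dialect, expr string) ([]string, error)
}

type Dispatchable interface {
	Dispatch(signal string) error
}

// CapabilityError says which capability an operation needed and what it got.
type CapabilityError struct {
	Operation  string
	Capability string
	ValueType  string
}

func (e *CapabilityError) Error() string {
	return fmt.Sprintf("%s requires a %s value, got %s", e.Operation, e.Capability, e.ValueType)
}

// element is both queryable and dispatchable (stub implementations).
type element struct{}

func (element) Query(dialect, expr string) ([]string, error) { return []string{expr}, nil }
func (element) Dispatch(signal string) error                 { return nil }

// Compile-time check that element also exposes the dispatch capability.
var _ Dispatchable = element{}

// plainString exposes no capabilities at all.
type plainString string

// query checks the capability before running the operation.
func query(v any, dialect, expr string) ([]string, error) {
	q, ok := v.(Queryable)
	if !ok {
		return nil, &CapabilityError{"QUERY", "queryable", fmt.Sprintf("%T", v)}
	}
	return q.Query(dialect, expr)
}

func main() {
	_, err := query(element{}, "css", ".price")
	fmt.Println(err) // <nil>

	_, err = query(plainString("hello"), "css", ".price")
	fmt.Println(err) // QUERY requires a queryable value, got main.plainString
}
```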
Namespace aliases with USE
Ferret v2 also improves how scripts work with modules.
Modules can expose host functions, but fully-qualified names can become noisy when a script repeatedly uses functions from the same module or namespace.
USE lets a script create a local alias for a fully-qualified namespace or symbol:
This keeps module-based scripts readable without pulling an entire module directly into local scope.
USE does not hide where functionality comes from. The alias is declared explicitly, so scripts stay concise while the shortcut remains visible at the top of the file, and module-heavy scripts stay easy to read without giving up Ferret's explicitness.
The bigger idea: capability-oriented syntax
The common thread behind these additions is capability-oriented design.
Ferret v2 does not need every domain to become a new language feature. Instead, the language provides a small set of operations that can work across different kinds of values:
- QUERY for values that can be queried.
- array operators for transforming and filtering collections.
- string templates for readable string construction.
- DISPATCH for values that can receive events or signals.
- WAITFOR for values or expressions that can be observed over time.
- MATCH for structured branching.
- assignment and DELETE for local/object mutation and property removal.
- USE for explicit aliases to fully-qualified names.
- operation-level policies for errors and timeouts.
The goal is not to turn every useful library operation into syntax. Only operations that shape the structure of extraction workflows should become language constructs.
The syntax stays focused, while capabilities allow modules and host applications to define what those operations mean for their own values.
That is what makes Ferret more than a browser scripting language. Browser automation remains an important use case, but the language is being shaped as a programmable data extraction engine.
Ferret should help developers turn messy websites and data sources into clean structured data. These language-level capabilities are designed around that goal.
Try it in the playground
The best way to understand the new syntax is to try it.
Ferret v2 is available in the playground, so you can experiment with queries, array operators, string templates, pattern matching, waiting, and other language features directly in the browser:
Try Ferret v2 in the Playground
The playground is especially useful for small examples. You can tweak selectors, change return shapes, test query modifiers, and get a feel for how the new language capabilities work together.
Closing thoughts
Ferret v2 is still evolving, but the direction is intentional.
The language should stay concise for common extraction tasks, explicit when behavior matters, and structured enough for better diagnostics, optimization, and tooling.
The new language capabilities are not about changing Ferret for the sake of change. They are about giving important extraction concepts a clear place in the language.
Ferret v2 is still Ferret: practical, focused, and built for developers who need reliable structured data from messy sources.
It just has a stronger foundation now.