workerbee
v0.16.6
Utilities for writing Cloudflare Workers
Worker Bee: 🐝 Cloudflare Worker Composer ☁️
Toolkit for composing Cloudflare Workers, focused on the use case of having an upstream server, and wanting to conditionally manipulate requests and responses.
Example Uses
- All requests to /landing-page/ should strip that subdirectory and proxy from Netlify instead of your normal server.
- Requests from the googleweblight user agent should have Cache-Control: no-transform set on the response.
- Cookies should be stripped for requests to the /shop/ section of your site.
- UTM parameters and Facebook click IDs should be removed from requests to your server to increase cacheability.
- WordPress users should not be logged in on the front of the site unless they’re previewing a post.
- Make your entire site HTTPS except for one section.
- Make all images use browser-native lazy loading.
If you'd like, jump straight to the examples.
Concepts
Cloudflare Worker Utilities is based around three main concepts:
- Handlers: Functions that are run when a request is being received, and/or a response from the server/cache is coming back. They can change the request/response, deliver a new request/response altogether, or conditionally add other handlers.
- Routes: Host/route request path patterns with handlers that are added only for requests that match the pattern.
- Conditions: Functions that determine whether a handler should be applied.
Usage
- Bootstrap your Cloudflare Worker, using Wrangler. Make sure you’re using Webpack.
- Run npm i workerbee from your Worker directory.
- In your Worker, import handleFetch and provide an array of request/response handlers, and/or route-limited request/response handlers.
Example:
import handleFetch from 'workerbee'

handleFetch({
  request: requestHandlers, // Run on every request.
  response: responseHandlers, // Run on every response.
  routes: (router) => {
    router.get('/test', {
      request: requestHandlers, // Run on matching requests.
      response: responseHandlers, // Run on responses from matching requests.
    })
    router.get('/posts/:id', {
      request: requestHandlers, // Run on matching requests.
      response: responseHandlers, // Run on responses from matching requests.
    })
  },
})
Top level request and response handlers will be run on every route, before any route-specific handlers.
For all places where you specify handlers, you can provide one handler, an array of handlers, or no handlers (null, or empty array). Routes can also accept variadic handlers, which will be assumed to be request handlers.
Lifecycle
It goes like this:
- Request is received.
- The Request loops through all request handlers (global, and then route).
- If an early Response wasn’t received, the resulting Request object is fetched (from the cache or the server).
- The resulting Response object is passed through the response handlers (global, and then route).
- The response is returned to the client.
           ┌──────────────────┐
           │ Incoming Request │
           │  to your Worker  │
           └──────────────────┘
                    │
                    ▼
            .───────────────.
           ( Matches route?  )───Yes─┐
            `───────────────'        │
                    │                ▼
                    │    ┌───────────────────────┐
                   No    │ Append route handlers │
                    │    │  to global handlers   │
                    │    └───────────────────────┘
                    │                │
                    └───────┬────────┘
                            │
                            ▼
                   ┌─────────────────┐
                   │    Run next     │
         ┌────────▶│ request handler │
         │         └─────────────────┘
         │                  │
         │                  ▼
         │   .─────────────────────────────.
         │  ( Handler returned a Response?  )───┐
         │   `─────────────────────────────'    │
         │                  │                  Yes
         │                 No                   │
        Yes                 │                   │
         │                  ▼                   │
         │          .───────────────.           │
         └─────────( More handlers?  )          │
                    `───────────────'           │
                            │                   │
                           No                   │
                            │                   │
                            ▼                   │
                 .─────────────────────.        │
          ┌─────( Request in CF cache?  )────┐  │
          │      `─────────────────────'     │  │
         Yes                                No  │
          │  ┌────────────┐  ┌────────────┐  │  │
          │  │ Fetch from │  │ Fetch from │  │  │
          └─▶│   cache    │  │   server   │◀─┘  │
             └────────────┘  └────────────┘     │
                    │               │           │
                    └───────┬───────┘           │
                            │                   │
                            ▼                   │
                      ┌──────────┐              │
                      │ Response │◀─────────────┘
                      └──────────┘
                            │
                            ▼
                  ┌──────────────────┐
                  │     Run next     │
              ┌──▶│ response handler │
              │   └──────────────────┘
              │             │
             Yes            ▼
              │     .───────────────.
              └────( More handlers?  )
                    `───────────────'
                            │
                           No
                            │
                            ▼
                   ┌────────────────┐
                   │ Final Response │
                   └────────────────┘
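The lifecycle above can be sketched roughly as follows. This is a simplified illustration, not workerbee's actual implementation: runLifecycle and fetchUpstream are hypothetical names, and route matching, the cache lookup, and dynamically added handlers are left out. It assumes a runtime with global Request and Response (Cloudflare Workers, or a recent Node.js).

```javascript
// Simplified sketch of the request/response lifecycle (not workerbee's real code).
// `fetchUpstream` is a hypothetical stand-in for the cache/server fetch.
async function runLifecycle(
  request,
  { requestHandlers = [], responseHandlers = [], fetchUpstream }
) {
  const originalRequest = request
  let response

  // Request phase: a handler may return nothing, a new Request, or a Response.
  for (const handler of requestHandlers) {
    const result = await handler({
      originalRequest,
      request,
      current: request,
      phase: 'request',
    })
    if (result instanceof Response) {
      // Early response: skip the remaining request handlers and the fetch.
      response = result
      break
    }
    if (result instanceof Request) {
      request = result
    }
  }

  // No early response, so fetch the (possibly rewritten) request.
  if (!response) {
    response = await fetchUpstream(request)
  }

  // Response phase: a handler may return nothing or a new Response.
  for (const handler of responseHandlers) {
    const result = await handler({
      originalRequest,
      request,
      response,
      current: response,
      phase: 'response',
    })
    if (result instanceof Response) {
      response = result
    }
  }

  return response
}
```

Note that in this sketch an early Response still passes through the response handlers, matching the "Yes" branch in the diagram.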
Routing
The router has functions for all HTTP methods, plus router.all(), which matches any method, e.g. router.get(path, handlers), router.post(path, handlers).
The path argument uses the path-to-regexp library, which has good support for positional path parameters. Here’s what various routes would yield for a given request:
| Pattern | 🆗 | URL | Params |
| -------------------------- | --- | ---------------------------------- | ------------------------------------ |
| /posts/:id | | | |
| | ✅ | /posts/123 | {id: "123"} |
| | ✅ | /posts/hello | {id: "hello"} |
| | ❌ | /posts | |
| /posts/:id? | | | |
| | ✅ | /posts/123 | {id: "123"} |
| | ✅ | /posts/hello | {id: "hello"} |
| | ✅ | /posts | {} |
| | ❌ | /posts/hello/another | |
| /posts/:id(\\d+)/:action | | | |
| | ✅ | /posts/123/edit | {id: "123", action: "edit"} |
| | ❌ | /posts/hello/edit | |
| /posts/:id+ | | | |
| | ✅ | /posts/123 | {id: ["123"]} |
| | ✅ | /posts/123/hello/there | {id: ["123", "hello", "there"]} |
| /posts/:id* | | | |
| | ✅ | /posts | {} |
| | ✅ | /posts/123 | {id: ["123"]} |
| | ✅ | /posts/123/hello | {id: ["123", "hello"]} |
| /bread/:meat+/bread | | | |
| | ✅ | /bread/turkey/bread | {meat: ["turkey"]} |
| | ✅ | /bread/peanut-butter/jelly/bread | {meat: ["peanut-butter", "jelly"]} |
| | ❌ | /bread/bread | |
| /mother{-:type}? | | | |
| | ✅ | /mother | {} |
| | ✅ | /mother-in-law | {type: "in-law"} |
| | ❌ | /mothers | |
If you want to match a path prefix and everything after it, just use a wildcard matcher like /prefix/:any* (and then just ignore what gets matched by :any*).
Note that a trailing slash will match, so /posts/ will match /posts.
See the path-to-regexp documentation for more information.
You can also limit your routes to a specific host, like so:
import handleFetch, { forbidden, setRequestHeaders } from 'workerbee'

handleFetch({
  routes: (router) => {
    router.host('example.com', (router) => {
      router.get('/', setRequestHeaders({ 'x-foo': 'bar' }))
    })
    router.host('*.blogs.example.com', (router) => {
      router.all('/xmlrpc.php', forbidden())
    })
  },
})
This makes it trivial to set up a Worker that services multiple subdomains and routes, instead of having to maintain a bunch of independent Workers.
Handlers
Handlers are functions (preferably async functions). They are passed an object that contains:
{
  addRequestHandler(),
  addResponseHandler(),
  addCfPropertiesHandler(),
  setRedirectMode(),
  originalRequest,
  request,
  response,
  current,
  params,
  phase,
}
- addRequestHandler(handler, options): dynamically adds another request handler (pass {immediate: true} to add it as the first or next handler).
- addResponseHandler(handler, options): dynamically adds another response handler (pass {immediate: true} to add it as the first or next handler).
- addCfPropertiesHandler(handler): adds a callback that receives and returns new properties to pass to fetch() on the cf key (see Cloudflare documentation).
- setRedirectMode(mode): sets the redirect mode for the main fetch. Default is 'manual', but you can set 'follow' or 'error'.
- request: A Request object representing the current state of the request.
- originalRequest: The original Request object (might be different if other handlers returned a new request).
- response: A Response object with the current state of the response.
- current: During the request phase, this will equal request. During the response phase, this will equal response. This is mostly used for conditions. For instance, the header condition works on either requests or responses, as both have headers. Thus it looks at { current: { headers } }.
- params: An object containing any param matches from the route.
- phase: One of "request" or "response".
Request handlers can return three things:
- Nothing: the current request will be passed on to the rest of the request handlers.
- A new Request object: this will get passed on to the rest of the request handlers.
- A Response object: this will skip the rest of the request handlers and get passed through the response handlers.
Response handlers can return two things:
- Nothing: the current response will be passed on to the rest of the response handlers.
- A new Response object: this will get passed on to the rest of the response handlers.
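To illustrate the three possible return values of a request handler, here is a sketch of two custom handlers. Both stripTrackingParams and blockXmlRpc are hypothetical names that are not part of workerbee, and the set of tracking parameters is an assumption. The sketch relies only on the request property of the handler argument and on the standard URL, Request, and Response globals.

```javascript
// Hypothetical request handler: removes tracking parameters to improve cacheability.
// Returns nothing when it declines to act, or a new Request when it changes the URL.
async function stripTrackingParams({ request }) {
  // This sketch copies only the method and headers, so it leaves requests
  // that may carry a body untouched.
  if (request.method !== 'GET' && request.method !== 'HEAD') return

  const url = new URL(request.url)
  const tracking = [...url.searchParams.keys()].filter(
    (key) => key.startsWith('utm_') || key === 'fbclid'
  )
  if (tracking.length === 0) return // Nothing to strip: decline to act.

  tracking.forEach((key) => url.searchParams.delete(key))
  // Return a new Request instead of mutating the current one.
  return new Request(url.toString(), {
    method: request.method,
    headers: request.headers,
  })
}

// Hypothetical request handler: answers early with a Response, which skips
// the remaining request handlers and the upstream fetch.
async function blockXmlRpc({ request }) {
  if (new URL(request.url).pathname !== '/xmlrpc.php') return
  return new Response('Forbidden', { status: 403 })
}
```

Handlers like these would be passed in the request array of handleFetch(), alongside any bundled handlers.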
Bundled Handlers
The following handlers are included:
setUrl(url: string)
setHost(host: string)
setPath(path: string)
setProtocol(protocol: string)
setHttps()
setHttp()
forbidden()
setRequestHeaders([header: string, value: string][] | {[header: string]: string})
appendRequestHeaders([header: string, value: string][] | {[header: string]: string})
removeRequestHeaders(headers: string[])
setResponseHeaders([header: string, value: string][] | {[header: string]: string})
appendResponseHeaders([header: string, value: string][] | {[header: string]: string})
removeResponseHeaders(headers: string[])
copyResponseHeader(from: string, to: string)
lazyLoadImages()
prependPath(pathPrefix: string)
removePathPrefix(pathPrefix: string)
redirect(status: number)
redirectHttps()
redirectHttp()
requireCookieOrParam(param: string, forbiddenMessage: string)
Logic
Instead of bundling logic into custom handlers, you can also use addHandlerIf(condition, ...handlers) together with the any(), all() and none() gates to specify the logic outside of the handler. Here’s an example:
import {
  handleFetch,
  addHandlerIf,
  any,
  contains,
  header,
  forbidden,
} from 'workerbee'

handleFetch({
  request: [
    addHandlerIf(
      any(
        header('user-agent', contains('Googlebot')),
        header('user-agent', contains('Yahoo! Slurp')),
      ),
      forbidden(),
      someCustomHandler(),
    ),
  ],
})
addHandlerIf() takes a single condition as its first argument, but you can nest any(), all() and none() as much as you like to compose a more complex condition.
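To make the gate semantics concrete, here is a minimal sketch of how any(), all() and none() could be written as plain predicate combinators. It assumes a condition is simply a synchronous function that returns a boolean. These are local stand-ins for illustration, not workerbee's source, and the real gates may behave differently (for example, around async conditions). The isBot and isHttps conditions and the shape of their input object are hypothetical.

```javascript
// Local stand-ins for the gates: each takes conditions and returns a new condition.
const any = (...conditions) => (input) =>
  conditions.some((condition) => condition(input))
const all = (...conditions) => (input) =>
  conditions.every((condition) => condition(input))
const none = (...conditions) => (input) =>
  !conditions.some((condition) => condition(input))

// Hypothetical conditions operating on a plain object, for demonstration only.
const isBot = ({ userAgent }) => userAgent.includes('bot')
const isHttps = ({ protocol }) => protocol === 'https:'

// Gates nest freely because each gate is itself a condition.
const secureHuman = all(isHttps, none(isBot))
const botOrInsecure = any(isBot, none(isHttps))
```

Because every gate returns a function with the same shape as its inputs, the result of one gate can be passed straight into another, which is what makes the nesting described above possible.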
Conditions
As hinted above, there are several built-in conditions for you to use:
header(headerName: string, matcher: ValueMatcher)
contentType(matcher: ValueMatcher)
isHtml()
hasParam(paramName: string)
hasRouteParam(paramName: string)
param(paramName: string, matcher: ValueMatcher)
routeParam(paramName: string, matcher: ValueMatcher)
isHttps()
isHttp()
The ones that take a string (or nothing) are straightforward, but what’s up with ValueMatcher?
A ValueMatcher is flexible. It can be:
- string: will match if the string === the value.
- string[]: will match if any of the strings === the value.
- ValueMatchingFunction: a function that takes the value and returns a boolean that decides the match.
- ValueMatchingFunction[]: an array of functions that take the value, any of which can return true and decide the match.
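The four forms listed above can be resolved with a small recursive check. The sketch below is only an illustration of the described semantics: matchesValue is a hypothetical helper name, not a function exported by workerbee, and the library's own resolution logic may differ.

```javascript
// Hypothetical helper: decides whether `value` satisfies a ValueMatcher.
function matchesValue(matcher, value) {
  if (Array.isArray(matcher)) {
    // string[] or ValueMatchingFunction[]: a single matching entry is enough.
    return matcher.some((entry) => matchesValue(entry, value))
  }
  if (typeof matcher === 'function') {
    // ValueMatchingFunction: the function decides the match.
    return Boolean(matcher(value))
  }
  // string: strict equality with the value.
  return matcher === value
}

// The same value checked against each of the four forms.
const contentType = 'text/html'
const results = [
  matchesValue('text/html', contentType),
  matchesValue(['text/plain', 'text/html'], contentType),
  matchesValue((value) => value.startsWith('text/'), contentType),
  matchesValue(
    [(value) => value.endsWith('/json'), (value) => value.endsWith('/html')],
    contentType
  ),
]
```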
The following ValueMatchingFunctions are available:
contains(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)
startsWith(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)
endsWith(value: string | NegatedString | CaseInsensitiveString | NegatedCaseInsensitiveString)
These functions can also accept case-insensitive strings and negated strings with the text('yourtext').i and text('yourtext').not helpers.
addHandlerIf(
  header('User-Agent', startsWith(text('WordPress').not.i)),
  forbidden(),
)
Note that you can use logic functions to compose value matchers! So the example from the Logic section could be rewritten like this:
import {
  handleFetch,
  addHandlerIf,
  any,
  contains,
  header,
  forbidden,
} from 'workerbee'

handleFetch({
  request: [
    addHandlerIf(
      header(
        'user-agent',
        any(contains('Googlebot'), contains('Yahoo! Slurp')),
      ),
      forbidden(),
      someCustomHandler(),
    ),
  ],
})
Two more points:
- The built-in conditionals support partial application. So you can do this:
const userAgent = header('user-agent')
Now, userAgent is a function that accepts a ValueMatcher.
You could take this further and do:
const isGoogle = userAgent(startsWith('Googlebot'))
Now you could just add a handler like:
handleFetch({
  request: [addHandlerIf(isGoogle, forbidden())],
})
- The built-in conditionals automatically apply to current. So if you run them as a request handler, header inspection will look at the request. As a response handler, it’ll look at response. But you can also use the raw conditionals while creating your own handlers. For instance, in a response handler you might want to look at the request that went to the server, or the originalRequest that came to Cloudflare.
import { forbidden } from 'workerbee'
import { hasParam } from 'workerbee/conditions'

export default async function forbiddenIfFooParam({ request }) {
  if (hasParam('foo', request)) {
    return forbidden()
  }
}
In most cases you will not be reaching into the request from the response. A better way to handle this is to have a request handler that conditionally adds a response handler. But if you want to, you can, and you can use those "raw" conditions to help. Note that the raw conditions will not be curried, and you'll have to pass a request or response to them as their last argument.
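One way a partially applicable condition such as header could be built is sketched below. This is an illustration under stated assumptions rather than workerbee's implementation: header and startsWith are defined locally as stand-ins, only function matchers are supported, and the handler argument is reduced to a plain object whose current property carries a standard Headers instance.

```javascript
// Local stand-in for a curried condition (not imported from workerbee).
function header(name, matcher) {
  // Called with only the header name: return a function that awaits the matcher.
  if (matcher === undefined) {
    return (laterMatcher) => header(name, laterMatcher)
  }
  // Fully applied: return a condition that inspects `current`, so the same
  // condition works in both the request phase and the response phase.
  return ({ current }) => {
    const value = current.headers.get(name)
    return value !== null && Boolean(matcher(value))
  }
}

// Local stand-in for a ValueMatchingFunction factory.
const startsWith = (prefix) => (value) => value.startsWith(prefix)

// Partial application in two steps.
const userAgent = header('user-agent')
const isGoogle = userAgent(startsWith('Googlebot'))

// Evaluate the condition against a minimal, hypothetical handler argument.
const fromGoogle = isGoogle({
  current: { headers: new Headers({ 'user-agent': 'Googlebot/2.1' }) },
})
```

A raw, uncurried variant would instead take the request or response as its last argument and return the boolean directly.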
Best Practices
- Always return a new Request or Response object if you want to change things.
- Don’t return anything if your handler is declining to act.
- If you have a response handler that is only needed based on what a request handler does, conditionally add that response handler on the fly in the request handler.
- Use partial application of built-in conditionals to make your code easier to read.
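As a sketch of the third practice, the request handler below registers a response handler only when it is needed. The handler name noStoreForPreviews, the preview query parameter, and the Cache-Control value are all hypothetical choices for illustration. Only request, response, and addResponseHandler come from the handler argument described in the Handlers section.

```javascript
// Hypothetical request handler: when a `preview` query parameter is present,
// add a response handler on the fly instead of running it for every response.
async function noStoreForPreviews({ request, addResponseHandler }) {
  const url = new URL(request.url)
  if (!url.searchParams.has('preview')) return // Decline to act.

  addResponseHandler(async ({ response }) => {
    const headers = new Headers(response.headers)
    headers.set('Cache-Control', 'no-store')
    // Return a new Response rather than mutating the existing one.
    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers,
    })
  })
}
```

The request handler itself returns nothing in both branches, so the current request continues through the remaining handlers unchanged.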
License
MIT License
Copyright © 2020–2021 Mark Jaquith
This software incorporates work covered by the following copyright and permission notices:
tsndr/cloudflare-worker-router
Copyright © 2021 Tobias Schneider
(MIT License)
pillarjs/path-to-regexp
Copyright © 2014 Blake Embrey
(MIT LICENSE)