strobbery

v0.1.0

Published

a year ago

An SPA capture and archival application

b3nsn0w

hand over your source code
this is a strobbery

strobbery

a simple runtime to download single-page web apps and execute them locally

usage

to create a strobbery archive, you need to first write a descriptor file in yaml for the site you intend to capture. for the sake of the example, let's download https://gsplat.tech, a cool tech demo for a 3D rendering technology that would be nice to preserve:

strobbery: 1 # required to indicate the file signature

entryPoint: https://gsplat.tech # the url at which we enter the app
originGuard:
  resource: # all origins we'd like to capture. you'll have to dig into the network tab for these
    - https://gsplat.tech
    - https://kit.fontawesome.com
    - https://cdn.jsdelivr.net
    - https://ka-f.fontawesome.com
    - https://f005.backblazeb2.com
  navigable: # all origins you want to see in the address bar, usually a very short list
    - https://gsplat.tech
  blacklisted: # all origins that should never be loaded. useful for removing trackers
    - https://getinsights.io

allowNetworkFallback: false # whether resources outside the resource origin list can be loaded
networkFallbackMode: whitelist # controls whether unknown origins can be loaded
continuousCapture: false # whether we'd like the fallback responses to be captured as well

allowOutlinks: true # whether we allow the app to open links (these will open in the default browser)

# if the app you're capturing is a client to something, this allows for undisturbed api access
allowApiHosts: false
apiHosts:
  - https://api.example.com

then simply open strobbery, add this as a new capture (file -> new capture, you'll need to select the yaml descriptor file you created, and capture mode will be enabled by default), and navigate through the site. strobbery will capture all the documents in the background, and when you press "save as", it creates an archive of all of them.
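to make the originGuard lists a bit more concrete, here is a minimal python sketch of how a url could be sorted by origin. the function names and the decision order (blacklist first, then the resource list, everything else unknown) are assumptions for illustration, not strobbery's actual implementation:

```python
from urllib.parse import urlsplit

# origin lists copied from the example descriptor above
ORIGIN_GUARD = {
    "resource": [
        "https://gsplat.tech",
        "https://kit.fontawesome.com",
        "https://cdn.jsdelivr.net",
        "https://ka-f.fontawesome.com",
        "https://f005.backblazeb2.com",
    ],
    "navigable": ["https://gsplat.tech"],
    "blacklisted": ["https://getinsights.io"],
}


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] part of a url."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def classify(url: str, guard: dict = ORIGIN_GUARD) -> str:
    """Hypothetical decision order: blacklist wins, then the resource list."""
    origin = origin_of(url)
    if origin in guard["blacklisted"]:
        return "blocked"
    if origin in guard["resource"]:
        return "capture"
    # what happens here would depend on the network fallback settings
    return "unknown"


def can_navigate(url: str, guard: dict = ORIGIN_GUARD) -> bool:
    """True if the url's origin is allowed to show up in the address bar."""
    return origin_of(url) in guard["navigable"]
```

a quick way to use this is to paste the request urls from the network tab through classify() and check that nothing you care about comes back as "unknown" before you start capturing.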

if you already have an archive, you can go to file -> open and open the archive. strobbery will load the settings and the captured data from the file and you can browse the site as if it were live.

if you ever need to extend a capture because you realized you haven't fully explored a site in capture mode, you can continue your work by:

- loading up the file (file -> open)
- switching to capture mode (runtime -> capture mode)
- refreshing the site (runtime -> refresh site)

whenever you refresh or reload, strobbery clears all site data in-browser, so the site will behave as if you were visiting it for the first time. if this creates issues for you, report a bug -- handling of local site data is currently in the ideation phase.

if you'd like to test out your capture, just save it (file -> save as) and reload from disk (runtime -> reload from disk). this will turn capture mode off and load the site from the archive. if you have it configured with network fallback, consider turning it off and refreshing the site to see how it behaves without network access.

file format

strobbery files are just fancy zip files. if you'd like to look inside, just open the file in your favorite archive manager. the file structure is as follows:

- strobbery.yaml: the descriptor file, as discussed above. it is copied into the strobbery archive verbatim, so if you'd like to make any changes, you can do it here.
- captured/: a folder with the captured files. the autogenerated folder structure follows the file structure online, so for example https://example.com/dog.jpg would be found under captured/example.com/dog.jpg. this is done to help make the file structure navigable, but it holds no semantic meaning: files are matched according to their url, not their location in the folder structure.
  - captured/{file} holds the content of the file. this is the raw binary, so if the file is in a standard format (like a jpeg image) you can export it directly.
  - captured/{file}.strb.yaml is the per-file metadata descriptor. this holds data like the original headers and url at the time of capture.
- continuous/: a folder with files captured during continuous capture. the structure of this folder is identical to captured/, with the main distinction being the lower importance of continuous capture -- this allows the "clear continuous capture" setting to work between sessions.
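since the archive is a plain zip, you can also inspect it from a script. the following is a rough python sketch using only the standard library. the helper names are made up for this example, and guess_archive_path() assumes the simplest possible mapping (host plus path); how strobbery handles query strings or trailing slashes is not covered here:

```python
import zipfile
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

SIDECAR_SUFFIX = ".strb.yaml"


def guess_archive_path(url: str, folder: str = "captured") -> str:
    """Guess where a url lands in the archive (assumption: host + path only)."""
    parts = urlsplit(url)
    return f"{folder}/{parts.netloc}{parts.path}"


def list_captures(archive) -> List[Tuple[str, Optional[str]]]:
    """Pair every content file under captured/ with its metadata sidecar."""
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    pairs = []
    for name in sorted(names):
        if not name.startswith("captured/"):
            continue  # skips strobbery.yaml and continuous/
        if name.endswith("/") or name.endswith(SIDECAR_SUFFIX):
            continue  # skips directory entries and the sidecars themselves
        sidecar = name + SIDECAR_SUFFIX
        pairs.append((name, sidecar if sidecar in names else None))
    return pairs
```

list_captures() accepts a path or an open binary file object, and a content file without a sidecar shows up with None in the second slot, which is a cheap way to spot an archive that was edited by hand.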

here's an example for the per-file metadata descriptor:

url: https://gsplat.tech/5cbcc55e748139370334.jpg
headers:
  accept-ranges: bytes
  access-control-allow-origin: "*"
  # ...

internally, strobbery takes these and replays the captured responses to the browser, keeping the response body and headers intact. matching is done based on the url field of the descriptor.
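the url-based matching can be sketched in a few lines of python. this is only an illustration under some assumptions: the function names are hypothetical, and read_sidecar_url() is a naive line-based reader that expects url to be a top-level scalar (a real implementation would use a proper yaml parser):

```python
import zipfile

SIDECAR_SUFFIX = ".strb.yaml"


def read_sidecar_url(text: str):
    """Naively pull the top-level `url:` value out of a sidecar file."""
    for line in text.splitlines():
        if line.startswith("url:"):
            return line[len("url:"):].strip().strip("'\"")
    return None


def build_url_index(archive) -> dict:
    """Map each captured url to the archive member that holds its content."""
    index = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.endswith(SIDECAR_SUFFIX):
                continue
            url = read_sidecar_url(zf.read(name).decode("utf-8"))
            if url is not None:
                # the content file sits next to the sidecar, minus the suffix
                index[url] = name[: -len(SIDECAR_SUFFIX)]
    return index


def lookup(archive, index: dict, url: str):
    """Return the captured bytes for a url, or None if it was never captured."""
    member = index.get(url)
    if member is None:
        return None
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)
```

note that the index is keyed by the url stored in the sidecar, not by the folder path, so a file that was moved inside the archive would still resolve as long as its sidecar moved with it.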
