strobbery
v0.1.0
An SPA capture and archival application
hand over your source code
this is a strobbery
strobbery
a simple runtime to download single-page web apps and execute them locally
usage
to create a strobbery archive, you need to first write a descriptor file in yaml for the site you intend to capture. for the sake of the example, let's download https://gsplat.tech, a cool tech demo for a 3D rendering technology that would be nice to preserve:
strobbery: 1 # required to indicate the file signature
entryPoint: https://gsplat.tech # the url at which we enter the app
originGuard:
  resource: # all origins we'd like to capture. you'll have to dig into the network tab for these
    - https://gsplat.tech
    - https://kit.fontawesome.com
    - https://cdn.jsdelivr.net
    - https://ka-f.fontawesome.com
    - https://f005.backblazeb2.com
  navigable: # all origins you want to see in the address bar, usually a very short list
    - https://gsplat.tech
  blacklisted: # all origins that should never be loaded. useful for removing trackers
    - https://getinsights.io
allowNetworkFallback: false # whether resources outside the resource origin list can be loaded
networkFallbackMode: whitelist # controls whether unknown origins can be loaded
continuousCapture: false # whether we'd like the fallback responses to be captured as well
allowOutlinks: true # whether we allow the app to open links (these will open in the default browser)
# if the app you're capturing is a client to something, this allows for undisturbed api access
allowApiHosts: false
apiHosts:
- https://api.example.com
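to make the origin lists above more concrete, here is a minimal python sketch of how a request url could be sorted into "capture", "network" or "blocked". this is not strobbery's actual code: the function names, the return values and the order of the checks (blacklist first, then the resource list, then network fallback) are assumptions made for illustration, and networkFallbackMode is left out entirely.

```python
from urllib.parse import urlsplit

# example values copied from the descriptor above
RESOURCE = [
    "https://gsplat.tech",
    "https://kit.fontawesome.com",
    "https://cdn.jsdelivr.net",
    "https://ka-f.fontawesome.com",
    "https://f005.backblazeb2.com",
]
BLACKLISTED = ["https://getinsights.io"]


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] part of a url."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def classify(url, resource, blacklisted, allow_network_fallback=False):
    """Hypothetical helper: decide what happens to a request for `url`.

    The precedence used here is an assumption, not documented behavior.
    """
    origin = origin_of(url)
    if origin in blacklisted:
        return "blocked"  # blacklisted origins are never loaded
    if origin in resource:
        return "capture"  # origin is on the capture list
    # anything else depends on the network fallback switch
    return "network" if allow_network_fallback else "blocked"


if __name__ == "__main__":
    for url in (
        "https://gsplat.tech/index.html",
        "https://getinsights.io/js/insights.js",
        "https://unknown.example/script.js",
    ):
        print(url, "->", classify(url, RESOURCE, BLACKLISTED))
```

comparing whole origins (scheme plus host) rather than bare hostnames mirrors how the descriptor lists them, with the https:// prefix included.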
once the descriptor is written, open strobbery and add it as a new capture (file -> new capture, then select the yaml descriptor file you created; capture mode is enabled by default). now navigate through the site. strobbery captures every document in the background, and when you press "save as", it writes all of them into an archive.
if you already have an archive, you can go to file -> open and open the archive. strobbery will load the settings and the captured data from the file and you can browse the site as if it were live.
if you ever need to extend a capture because you realized you haven't fully explored a site in capture mode, you can continue your work by
- loading up the file (file -> open)
- switching to capture mode (runtime -> capture mode)
- refreshing the site (runtime -> refresh site)
whenever you refresh or reload, strobbery clears all site data in-browser, so the site will behave as if you were visiting it for the first time. if this creates issues for you, report a bug -- handling of local site data is currently in the ideation phase.
if you'd like to test out your capture, just save it (file -> save as) and reload from disk (runtime -> reload from disk). this will turn capture mode off and load the site from the archive. if you have it configured with network fallback, consider turning it off and refreshing the site to see how it behaves without network access.
file format
strobbery files are just fancy zip files. if you'd like to look inside, just open the file in your favorite archive manager. the file structure is as follows:
- strobbery.yaml: the descriptor file, as discussed above. it is copied into the strobbery archive verbatim, so if you'd like to make any changes, you can do it here.
- captured/: a folder with the captured files. the autogenerated folder structure follows the file structure online, so for example https://example.com/dog.jpg would be found under captured/example.com/dog.jpg. this is done to help make the file structure navigable, but it holds no semantic meaning: files are matched according to their url, not their location in the folder structure.
  - captured/{file} holds the content of the file. this is a raw binary, so if the file is something that would normally make sense on its own (like a jpeg image), you can export it directly.
  - captured/{file}.strb.yaml is the per-file metadata descriptor. this holds data like original headers and url at the time of capture.
- continuous/: a folder with files captured during continuous capture. the structure of this folder is identical to captured/, with the main distinction being the lower importance of continuous capture -- this allows the "clear continuous capture" setting to work between sessions.
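the url-to-folder mapping described above can be sketched in a few lines of python. this is only an illustration built from the dog.jpg example: the function names are made up, and how strobbery treats query strings, ports or a bare "/" path is not covered here and may differ.

```python
from urllib.parse import urlsplit


def archive_path(url: str, root: str = "captured") -> str:
    """Hypothetical helper: where a url's content would sit inside the archive."""
    parts = urlsplit(url)
    # drop the leading slash so the path can be joined under the host folder
    path = parts.path.lstrip("/")
    return "/".join(piece for piece in (root, parts.netloc, path) if piece)


def metadata_path(url: str, root: str = "captured") -> str:
    """The per-file descriptor sits next to the content with a .strb.yaml suffix."""
    return archive_path(url, root) + ".strb.yaml"


if __name__ == "__main__":
    print(archive_path("https://example.com/dog.jpg"))
    print(metadata_path("https://example.com/dog.jpg"))
```

since the folder layout carries no semantic meaning, a helper like this is only useful for finding files by hand; it should not be relied on for matching.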
here's an example for the per-file metadata descriptor:
url: https://gsplat.tech/5cbcc55e748139370334.jpg
headers:
  accept-ranges: bytes
  access-control-allow-origin: "*"
  # ...
internally, strobbery takes these and replays the captured responses to the browser, keeping the body and headers intact. matching is done based on the url field of the descriptor.
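because an archive is a zip file and matching goes by the url field, you can build a url lookup table yourself with nothing but the python standard library. the sketch below is an assumption-laden illustration, not strobbery's own loader: the helper names are invented, and the url field is read with a naive line scan instead of a real yaml parser, which is only good enough for descriptors shaped like the example above.

```python
import zipfile

SUFFIX = ".strb.yaml"


def read_url_field(text: str):
    """Naively return the value of the first top-level `url:` line, or None."""
    for line in text.splitlines():
        if line.startswith("url:"):
            return line[len("url:"):].strip().strip("\"'")
    return None


def build_url_index(archive) -> dict:
    """Map each captured url to the archive entry that holds its content.

    `archive` can be a file path or a binary file-like object.
    """
    index = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.endswith(SUFFIX):
                continue  # skip content files and the top-level strobbery.yaml
            url = read_url_field(zf.read(name).decode("utf-8"))
            if url is not None:
                # the content file is the descriptor name minus the suffix
                index[url] = name[: -len(SUFFIX)]
    return index
```

calling build_url_index on the path of a saved archive returns a dict you can use to pull a single file out by its original url with zipfile.ZipFile.read.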