`@garethnunns/cli-content-tracker` v0.8.0
# CLI Content Tracker to AirTable
This project came about as a means to track video content on concert shows, but is written generically. It recursively scans folders - which can be mapped network drives or streamed cloud storage like Dropbox, Drive, LucidLink or Suite - and stores a list of the files & folders in AirTable.
## Quick Start
```shell
npx @garethnunns/cli-content-tracker
```
Then you just need to edit the `config.json` file that was created where the command was executed, and re-run with the config command-line option: `tracker -c config.json`.
This requires Node to be installed; on a Mac you'd probably install it with Homebrew:

```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install node
```
## AirTable Setup
You can duplicate this base, or create your own as detailed below. See the section on configuring the AirTable settings for how to link this to the script.
### Manual Base Creation
You'll need a base with the following two tables for folders & files.
#### Folders Table
| Field | Description | Type |
| --- | --- | --- |
| `_path` | Unix path to this folder in the project | Single line text |
| `_fullPath` | Will store the full path of the folder | Single line text |
| `_size` | Size of the folder in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the folder | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the folder | Date - include time, Use same time for all collaborators |
| `_items` | Number of items in the folder | Number - 0 decimal places |
| `_parent` | Parent folder of this folder | Link to record - this table |
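For illustration, this is the rough shape of a single folder record with these fields filled in - all the values here are hypothetical, and the exact payload the tracker sends may differ:

```json
{
  "_path": "/05 Delivery/Screens",
  "_fullPath": "/Volumes/Suite/Project/05 Delivery/Screens",
  "_size": 104857600,
  "_ctime": "2024-05-01T12:34:56.000Z",
  "_mtime": "2024-05-02T09:00:00.000Z",
  "_items": 12
}
```

(`_parent` is omitted here since it's a link to another record rather than a plain value.)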
#### Files Table
Only use this smaller table when the `mediaMetadata` setting is disabled.
| Field | Description | Type |
| --- | --- | --- |
| `_path` | Unix path to this file in the project | Single line text |
| `_fullPath` | Will store the full path of the file | Single line text |
| `_size` | Size of the file in bytes | Number - 0 decimal places |
| `_ctime` | Creation time of the file | Date - include time, Use same time for all collaborators |
| `_mtime` | Last modified time of the file | Date - include time, Use same time for all collaborators |
| `_parent` | Parent folder of this file | Link to record - folders table |
#### File Table with Metadata
All the fields in the files table, plus:
| Field | Description | Type |
| --- | --- | --- |
| `_duration` | Duration of the media | Number - 2 decimal places |
| `_video` | Whether the media has a video stream | Checkbox |
| `_videoStill` | Whether the file is an image | Checkbox |
| `_videoCodec` | Video codec of the media | Single line text |
| `_videoWidth` | Width of the media | Number - 0 decimal places |
| `_videoHeight` | Height of the media | Number - 0 decimal places |
| `_videoFormat` | Pixel format of the media | Single line text |
| `_videoAlpha` | Whether the media has an alpha channel | Checkbox |
| `_videoFPS` | Frames per second of the video, 0 for stills | Number - 2 decimal places |
| `_videoBitRate` | Bit rate of the video (b) | Number - 0 decimal places |
| `_audio` | Whether the media has an audio stream | Checkbox |
| `_audioCodec` | Audio codec of the media | Single line text |
| `_audioSampleRate` | Sample rate of the audio (kHz) | Number - 0 decimal places |
| `_audioChannels` | Number of channels of audio | Number - 0 decimal places |
| `_audioBitRate` | Bit rate of the audio (b) | Number - 0 decimal places |
### Field Notes
The fields are named like this so it's clear which fields are entered by the tracker. You can easily add other fields that reference these values, but do not edit the values in these columns or they will just get overwritten/deleted.
If all the records are being updated every time, it is likely because of a mismatch on the created/modified time - ensure the *Use same time for all collaborators* option is selected on these fields, and you can try setting the timezone to match the computer running the script. Despite the dates being sent as UTC strings, AirTable is a bit funky in how it handles them.
## Command Line Options
Get the latest options by running `tracker -h`, which will return something like:

```
Usage: tracker [options]

CLI File Content Tracking

Options:
  -V, --version          output the version number
  -c, --config <path>    config file path (if none specified template will be created)
  -d, --dry-run          will do everything apart from updating AirTable
  -l, --logging <level>  set the logging level
  -nd, --no-delete       never remove records from AirTable
  -w, --wipe-cache       clear the metadata cache
  -h, --help             display help for command
```
### Config Option: `-c, --config`
When you run `tracker` for the first time without a path option, it will generate the `config.json` file, which you will then need to update. Once updated, run the command again with `tracker -c config.json`. For more info on the contents of the config file, check the section on config files.
### Dry-run Option: `-d, --dry-run`
Inherently, it's nice to know this isn't going to wreak havoc on your AirTable, so if you run `tracker -c config.json -d` it will show you what it's going to do but stop short of modifying the AirTable. It will still run at the frequency you specify. You will need to run with a [logging](#logging-option--l---logging) level of `verbose` to see all the details.
### Logging Option: `-l, --logging`
Specify one of the following levels of logging:
- `error`
- `warn`
- `info`
- `http` (default)
- `verbose`
- `debug`
- `silly`
### No Confirm Option: `-nc, --no-confirm`
By default the script will stop and confirm before it removes more than 10% of the table (if you don't respond within a minute, it will assume not), but when run with `-nc` it will always delete the records in AirTable.
### No Delete Option: `-nd, --no-delete`
This will still insert and update records in AirTable, however it will not remove them if they have been deleted in the local file system - handy if you are freeing up space on your local disk by deleting media but still want the reference in AirTable.
### Wipe Cache Option: `-w, --wipe-cache`
There's a local cache of file metadata built up in `./db`, so files don't have to keep getting queried via FFmpeg, which is a tad computationally expensive - but if you need to rescan all files, run with this option. Files will automatically be rescanned anyway if their size or modified time changes.
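The re-scan rule described above can be sketched like this - `needsRescan` and the shape of the cache entry are hypothetical illustrations, not the tracker's internals:

```javascript
// re-scan a file's metadata when there's no cache entry, or when its
// size or modified time no longer matches the cached values
function needsRescan(cached, stats) {
  return !cached ||
    cached.size !== stats.size ||
    cached.mtimeMs !== stats.mtimeMs;
}

const cached = { size: 1024, mtimeMs: 1700000000000 };
console.log(needsRescan(cached, { size: 1024, mtimeMs: 1700000000000 })); // false
console.log(needsRescan(cached, { size: 2048, mtimeMs: 1700000000000 })); // true
console.log(needsRescan(undefined, { size: 1024, mtimeMs: 0 }));          // true
```

The `-w` flag is effectively the manual override for the cases this check can't catch, e.g. a file whose contents changed without its size or mtime changing.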
## Config File
Create the default config as described above, which will generate something like this:

```json
{
  "settings": {
    "files": {
      "rootPath": "/Users/user/Documents/cli-content-tracker",
      "dirs": [
        "/Users/user/Documents/cli-content-tracker"
      ],
      "frequency": 30,
      "rules": {
        "dirs": {
          "includes": [],
          "excludes": []
        },
        "files": {
          "includes": [],
          "excludes": [
            "/\\.DS_Store$/"
          ]
        }
      },
      "mediaMetadata": true,
      "limitToFirstFile": false,
      "concurrency": 100
    },
    "airtable": {
      "api": "",
      "base": "",
      "foldersID": "",
      "filesID": "",
      "view": "API"
    }
  }
}
```
In general, leave all the parameters in the JSON file - there is some error handling if they're not present, but it's probably for the best to leave everything in there.
### `config.settings`

#### `config.settings.files`
This section relates to all the local file scanning. The script in general builds up a list of all the files and folders and here you get a bit of control over that.
##### `config.settings.files.rootPath`
In an effort to make the script less dependent on exactly where the files are stored on your computer/where the folder is mounted, this string will be removed from the start of file paths.
```json
"rootPath": "/Volumes/Suite/Project"
```
##### `config.settings.files.dirs`
Array of directories to recursively search through, e.g.

```json
"dirs": [
  "/Volumes/Suite/Project/Item 1",
  "/Volumes/Suite/Project/Item 2"
]
```
##### `config.settings.files.frequency`
How often the directory is scanned (in seconds), e.g. if you wanted to scan it every minute:
```json
"frequency": 60
```
You can also set this to `0` and the script will only run once - this is intended for when you want to automate it as part of a cron job.
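For example, with `"frequency": 0` in the config you could schedule the scan from a crontab instead - the schedule and config path here are hypothetical:

```
# run the tracker at the top of every hour
0 * * * * tracker -c /path/to/config.json
```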
##### `config.settings.files.rules`
So I'll be honest, this is the only slightly faffy bit... but it definitely beats entering them as command-line arguments, which was my first plan. The thought process here is that you're filtering which file and folder paths will get included in the tables pushed to AirTable.
Whatever you specify in these fields, the script will still have to traverse all of the folders in the directory - if you have specified a pattern like `/.*\/Delivery\/.*/`, which would match any folder with `/Delivery/` in the path, by the nature of the task you're still going to have to search through every folder.

Now the bit that makes it faffy is you have to stringify JS regex patterns, which usually just means escaping the slashes - this is a handy place to make use of the dry-run option. Note you're matching the entire path of the file/folder in both `dirs` & `files`.
For example, below we're limiting the folders stored in AirTable to only those that include `/05 Delivery/` somewhere in the path, then only including specific image sequence TIFFs:
```json
"rules": {
  "dirs": {
    "includes": [
      "/.*\/05 Delivery\/.*/"
    ],
    "excludes": []
  },
  "files": {
    "includes": [
      "/[_.]v\\d{2,3}\\.tif/",
      "/[_.]0{4,5}\\.tif/"
    ],
    "excludes": []
  }
}
```
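To see what the stringified patterns actually do, here's a small sketch of turning a `/pattern/flags` string back into a JS `RegExp` and matching it against full paths - `parseRule` is a hypothetical helper for illustration, not the tracker's own parser:

```javascript
// convert a "/pattern/flags" config string back into a RegExp
function parseRule(rule) {
  const [, pattern, flags] = rule.match(/^\/(.*)\/([a-z]*)$/s);
  return new RegExp(pattern, flags);
}

// the exclude rule from the default config ("/\\.DS_Store$/" in the JSON)
const dsStore = parseRule('/\\.DS_Store$/');
console.log(dsStore.test('/Volumes/Suite/.DS_Store')); // true

// the folder include rule from the example above
const delivery = parseRule('/.*\\/05 Delivery\\/.*/');
console.log(delivery.test('/Volumes/Suite/Project/05 Delivery/final.mov')); // true
console.log(delivery.test('/Volumes/Suite/Project/01 Assets/final.mov'));   // false
```

Running candidates through patterns like this before a real run is a quick way to sanity-check your escaping.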
##### `config.settings.files.mediaMetadata`
Whether you want to get all the metadata for the media, which will scrape all the fields in the [file table with metadata](#file-table-with-metadata), e.g.

```json
"mediaMetadata": true
```
##### `config.settings.files.limitToFirstFile`
This works in conjunction with the file rules but limits it to only the first file in each folder, with the intention of finding the first image in sequences, e.g. this just gets the first TIFF file in the deliveries folder:
```json
"rules": {
  "dirs": {
    "includes": [],
    "excludes": []
  },
  "files": {
    "includes": [
      "/05_Delivery/.*\\.tif/"
    ],
    "excludes": []
  }
},
"mediaMetadata": true,
"limitToFirstFile": true
```
##### `config.settings.files.concurrency`
This limits the number of concurrent file operations, introduced to throttle the load on Suite. If you have a computer with enough RAM and the files are static, this value can be larger; however, if it's a weak computer and the files are being streamed, you might want to limit this to < 10, e.g.

```json
"concurrency": 1
```
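What a cap like this controls can be sketched with a minimal concurrency limiter - `mapLimit` is illustrative only, not the tracker's implementation:

```javascript
// run an async function over items with at most `limit` calls in flight
async function mapLimit(items, limit, fn) {
  const results = [];
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index before the first await
      results[i] = await fn(items[i]);
    }
  }
  // spawn up to `limit` workers that pull from the shared index
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}

// e.g. process four files with at most two concurrent operations
mapLimit([1, 2, 3, 4], 2, async (n) => n * 2).then(console.log); // [ 2, 4, 6, 8 ]
```

With streamed storage, every in-flight operation may pull data over the network, which is why a small limit is kinder to weak machines.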
#### `config.settings.airtable`
Once you've set up your AirTable, please configure the following settings:
##### `config.settings.airtable.api`
Get your API key from the AirTable Tokens page; it will need the following permissions:
- `data.records:read`
- `data.records:write`

Plus access to the workspace where your base is located.
##### `config.settings.airtable.base`
You'll need to go to the API page for your base, get the base ID and enter it here, e.g.

```json
"base": "app**************"
```
##### `config.settings.airtable.foldersID` / `filesID`
Technically you can just put the table names here, but on the same page as the base ID you can get the IDs for the tables - then if the table names get updated later it won't affect the script, e.g.

```json
"foldersID": "tbl**************",
"filesID": "tbl**************"
```
##### `config.settings.airtable.view`
This is the view the script compares the local file list against - e.g. you could technically store other items in the table and filter them out in this view, or you could have multiple scripts all writing into the same table, filtered per view (though this might be better achieved by writing to multiple tables).
This defaults to a view called `API`:

```json
"view": "API"
```
## Development
Clone the repo and run it locally like so:

```shell
git clone https://github.com/garethnunns/cli-content-tracker.git
cd cli-content-tracker
npm install
npm link
tracker
```
You can always `npm unlink` this later.

Pull requests are very welcome!