firestore-backfire

v2.5.3

Published

a year ago

Ultimate control over backing up and restoring your Firestore data

Downloads

185

0High
0Medium
0Low

benyap

firebase firestore import export backup restore

🔥 Firestore Backfire

Ultimate control over importing and exporting data from Firestore and the Firestore Emulator, on your CLI and in your code.

This documentation is for 2.x. Find documentation for 1.x here.

✨ Features ✨

Import and export your Firestore data with ease
Specify which documents or collections are imported or exported using paths or by matching regex patterns
Control the depth of subcollections to import or export
Limit the number of documents to export
Import and export data as NDJSON to a variety of storage sources:
- local files
- Google Cloud Storage
- AWS S3
- Or implement your own data source

Installation

Install firestore-backfire and @google-cloud/firestore using your favourite package manager.

yarn add firestore-backfire @google-cloud/firestore

pnpm add firestore-backfire @google-cloud/firestore

npm install firestore-backfire @google-cloud/firestore

Peer dependencies for Google Cloud Storage

If you plan to import and export data from Google Cloud Storage, you should install:

@google-cloud/storage

Peer dependencies for AWS S3

If you plan to import and export data from S3, you should install:

@aws-sdk/client-s3

Additionally, if you want to use a credential profile to run this program, you should also install:

@aws-sdk/credential-provider-ini

Usage and examples

CLI

Firestore Backfire can be called on the CLI using backfire. The aliases bf and firestore are also provided for convenience.

If installed in your project, run it using your package manager:

yarn backfire import path-to-my-data ...

If installed globally, you can call it directly:

backfire import path-to-my-data ...

You can also use it in your package.json scripts.

// package.json
{
  "scripts": {
    "import-my-data": "backfire import path-to-my-data ..."
  }
}

CLI options

All options listed in the documentation have a CLI flag equivalent unless otherwise specified. The flag will always be -- followed by the option name. For example, the option limit can be passed on the CLI using --limit. In most cases, a shorthand may be available. Use the backfire [command] --help command to see the available options and their repsective flags.

CLI examples

Export documents...

to a file called emails.ndjson in an S3 bucket called bucket, using the AWS credentials profile named default
from a Firestore project called demo using the credentials found at key.json
from the emails and messages collection

backfire export s3://bucket/emails --awsProfile default -P demo -K key.json --paths emails messages --awsRegion us-east-1

Export documents...

to a local file called emails.ndjson in the export folder
from a Firestore project called demo using the credentials found at key.json
from the emails collection
where the document id starts with "abc" or "123"
where the document id cannot end with "xyz"

backfire export ./export/emails -P demo -K key.json --paths emails --match ^emails/abc ^emails/123 --ignore xyz$

Import documents...

from a file called emails.ndjson in a Google Cloud Storage bucket called bucket, belonging to a project with the ID gcp-demo, using a service account key file called gcp-demo.json
to the demo project in the Firestore Emulator running on port 8080
where the document belongs to a root level collection (depth of 0)
only import the first 10 documents
overwrite any existing data

backfire import gs://bucket/emails --gcpProject gcp-demo --gcpKeyFile gcp-demo.json -P demo -E localhost:8080 --depth 0 --limit 10 --mode overwrite

Node

Firestore Backfire exposes functions in Node that you can use to import and export data using a data source.

import {
  importFirestoreData,
  exportFirestoreData,
  // ...
} from "firebase-backfire";

await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);

Options for specifying the Firestore instance to connect to can be provided through the connection parameter. The reader and writer parameters are data sources (see here for more information on how to create a data source). The options parameter allow you to configure the import/export behaviour.

Exporting data

To export data from Firestore, use the export command on the CLI, or use the exportFirestoreData function in Node. Each document is exported as per the SerializedFirestoreDocument interface as a line of NDJSON.

backfire export <path> [options]

import { exportFirestoreData } from "firestore-backfire";

await exportFirestoreData(connection, writer, options);

When using the CLI, path should point to the location where you want the data to be exported to. This can be a path to a local file, a Google Cloud Storage path (prefixed with gs://), or an S3 path (prefixed with s3://).

When using the exportFirestoreData function, the connection parameter can be an instance of Firestore, or it can be an object that specifies options for creating a connection to Firestore. The writer parameter must be an implementation of IDataSourceWriter. See the section on data sources for more information.

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ExportFirestoreDataOptions interface.

| Option | Type | Description | | ------------------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | paths | string[] | Provide a list of paths where you want to export data from. This can be a collection path (e.g. emails), or a path to a document (e.g. emails/1). If not specified, all paths will be exported, starting from the root collections. | | match | RegExp[] | Provide a list of regex patterns that a document path must match to be exported. | | ignore | RegExp[] | Provide a list of regex patterns that prevent a document from being exported when its path matches any of the patterns. Takes precendence over match. | | depth | number | Limit the subcollection depth to export documents from. Documents in the root collection have a depth of 0. If not specified, no limit is applied. | | limit | number | Limit the number of documents to export. If not specified, no limit is applied. | | overwrite | boolean | Overwrite any existing data at the output path. Defaults to false. | | update | number | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to 5. | | exploreInterval* | number | The interval (in milliseconds) at which chunks of paths are dequeued for exploration using Firestore SDK's listDocuments() or listCollections() methods. Defaults to 10. | | exploreChunkSize* | number | The chunk size to use when dequeuing paths for exploration. Defaults to 5000. | | downloadInterval* | number | The interval (in milliseconds) at which chunks of document paths are dequeued to be filtered and downloaded from Firestore. Defaults to 1000. | | downloadChunkSize* | number | The chunk size to use when dequeueing paths for download. Defaults to limit if supplied, otherwise it dequeues all available paths. |

* Advanced configuration - default values should be suitable for most use cases. Considered internal, so may change as implementation changes.

Logging options

By default, only log messages at the info level and above are printed. Follows the LoggingOptions interface.

| Option | Type | Description | | ------- | --------- | ------------------------------------------------------- | | debug | boolean | Print debug level logs and higher. | | verbose | boolean | Print verbose level logs and higher. Overrides debug. | | quiet | boolean | Silence all logs. Overrides debug and verbose. |

Importing data

To import data into Firestore, use the import command on the CLI, or use the importFirestoreData function in Node. The data being imported is expected to be in NDJSON format, where each line follows the SerializedFirestoreDocument interface.

backfire import <path> [options]

import { importFirestoreData } from "firestore-backfire";

await importFirestoreData(connection, reader, options);

When using the CLI, path should point to the location where you want the data to be imported from. This can be a path to a local file, a Google Cloud Storage path (prefixed with gs://), or an S3 path (prefixed with s3://).

When using the importFirestoreData function, the connection parameter can be an instance of Firestore, or it can be an object that specifies options for creating a connection to Firestore. The reader parameter must be an implementation of IDataSourceReader. See the section on data sources for more information.

⚠️ NOTE: When using the Firestore Emulator, importing a large amount of data can result in errors as the emulator is not designed to scale.

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ImportFirestoreDataOptions interface.

| Option | Type | Description | | ----------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | paths | string[] | Provide a list of paths where you want to import data from. This can be a collection path (e.g. emails), or a path to a document (e.g. emails/1). If not specified, all paths will be imported. | | match | RegExp[] | Provide a list of regex patterns that a document path must match to be imported. | | ignore | RegExp[] | Provide a list of regex patterns that prevent a document from being imported if its path matches any of the patterns. Takes precendence over match. | | depth | number | Limit the subcollection depth to import documents from. Documents in the root collection have a depth of 0. If not specified, no limit is applied. | | limit | number | Limit the number of documents to import. If not specified, no limit is applied. | | mode | "create" "insert" "overwrite" "merge" | Specify how to handle importing documents that would overwrite existing data. See the import mode section for more information. Defaults to create. | | update | number | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to 5. | | flush* | number | The interval (in seconds) at which documents are flushed to Firestore. Defaults to 1. | | processInterval* | number | The interval (in milliseconds) at which documents are processed as they stream in from the data source. Defaults to 10. | | processLimit* | number | The maximum number of pending writes to Firestore. Defaults to 200. |

* Advanced configuration - default values should be suitable for most use cases. Considered internal, so may change as implementation changes.

Import mode

The mode option specifies how to handle importing documents that would overwrite existing data in Firestore. The default import mode is create.

create mode will log an error when impporting documents that already exist in Firestore, and existing documents will not be modified.
insert mode will only import documents that do not exist, and existing documents will not be modified.
overwrite mode will import documents that do not exist, and completely overwrite any existing documents.
merge mode will import documents that do not exist, and merge existing documents.

Logging options

By default, only log messages at the info level and above are printed. Follows the LoggingOptions interface.

Get document

Have you ever wanted to quickly inspect or export a document as JSON from Firestore? This CLI command can help you do just that. path should be a valid Firestore document path. Prints the document as pretty JSON.

Also ensure you provide appropriate options for connecting to Firestore.

backfire get <path> [options]

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the GetFirestoreDataOptions interface.

| Option | Type | Description | | --------- | --------------------- | ----------------------------------------------------------------------------------------------------------------------- | | stringify | boolean or number | JSON.stringify() the output. Pass true to use the default indent of 2, or pass a number to specify the indent amount. |

List documents and collections

List the document IDs or collection IDs at the specified path.

Also ensure you provide appropriate options for connecting to Firestore.

backfire list documents <path> [options]

backfire list collections [path] [options]

When listing collections, you may leave path empty to list root collections, or pass a valid Firestore document path to list its subcollections.

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ListFirestoreDataOptions interface.

| Option | Type | Description | | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | limit | number | Limit the number of documents/collections to return. Note that this does not "truly" limit the API call, it only truncates the output after the data is received from Firebase. |

Count documents and collections

Count the number of the documents in a collection, or the number of collections at the specified path.

Also ensure you provide appropriate options for connecting to Firestore.

backfire count documents <path> [options]

backfire count collections [path] [options]

When counting collections, you may leave path empty to count root collections, or pass a valid Firestore document path to count its subcollections.

Connecting to Firestore

In order to read and write data to Firestore, you will need to specify some options for the connection. Follows the FirestoreConnectionOptions interface.

| Option | Type | Description | | ------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | project | string | The ID of the Firestore project to connect to. | | adc | boolean | Use Application Default Credentials. | | keyFile | string | The path to a service account's private key JSON file. Takes precedence over adc. | | emulator | string or boolean | Connect to a local Firestore emulator. Defaults to localhost:8080. Pass a string value to specify a different host. Takes precedence over adc and keyFile. | | credentials* | object | Service account credentials. Fields client_email and private_key are expected. Takes precedence over adc, keyFile and emulator. |

* Not available in the CLI.

The project option is always required
To connect to a real Firestore instance, you must specify adc or keyFile, or pass a credentials object (Node only)
If you are connecting to a local Firestore emulator, you can use the emulator option

As an alternative, these options can also be provided through a configuration file or as environment variables. Note that CLI options will always take precendence over environment variables.

GOOGLE_CLOUD_PROJECT can be used to provide project
GOOGLE_APPLICATION_CREDENTIALS can be used to provide keyFile
FIRESTORE_EMULATOR_HOST can be used to provide emulator

In Node, you can also pass an existing instance of Firestore instead of providing connection options.

Data sources

A data source provides a way to to read and write data to an external location. This pacakge comes with a few implementations, and exports interfaces for you to implement your own ones in Node if the provided implementations do not suit your needs.

Local

This data source reads and writes data as local files to your machine. To use this data source on the CLI, specify a path that points to a valid file path (note that this is different from v1). If the path is in a directory that does not exist, it will be created for you.

No other configuration options are required.

Google Cloud Storage

This data source reads and writes data from a Google Cloud Storage bucket. To use this data source on the CLI, specify a path beginning with gs://.

Credentials for reading and writing to the Google Cloud Storage bucket must also be provided as CLI options or through a configuration file.

| Option | Type | Description | | ---------------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------- | | gcpProject | string | The Google Cloud project the bucket belongs to. | | gcpAdc | boolean | Use Application Default Credentials. | | gcpKeyFile | string | Path to the service account credentials file to use. Takes precedence over gcpAdc. | | gcpCredentials* | object | Service account credentials. Fields client_email and private_key are expected. Takes precedence over gcpAdc and gcpKeyFile. |

* Not available in the CLI.

The gcpProject option is always required
You must specify gcpAdc or gcpKeyFile, or pass a gcpCredentials object (Node only)

Alternatively, these values can also be provided through the corresponding environment variables:

GOOGLE_CLOUD_PROJECT can be used to provide gcpProject
GOOGLE_APPLICATION_CREDENTIALS can be used to provide gcpKeyFile

IMPORTANT: These environment variables are also used by Firestore connection options. If you need to use different credentials for connecting to Firestore and accessing Google Cloud Storage, you can override the environment variables by passing them as CLI options or through a configuration file.

AWS S3

This data source reads and writes data from an S3 bucket. To use this data source on the CLI, specify a path beginning with s3://.

Credentials for reading and writing to the S3 bucket must also be provided as CLI options or through a configuration file.

| Option | Type | Description | | ------------------ | -------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | awsRegion | string | The AWS region to use. | | awsProfile | string | The name of the profile to use from your local AWS credentials. Requires @aws-sdk/credential-provider-ini to be installed. | | awsAccessKeyId | string | The access key id to use. This takes precendence over the awsProfile option, which means that if you provide awsProfile as well as access keys, the access keys will be used. | | awsSecretAccessKey | string | The secret access key to use. This takes precendence over the awsProfile option, which means that if you provide awsProfile as well as access keys, the access keys will be used. |

The awsRegion option is always required
You can choose to use either awsProfile, or awsAcecssKeyId and awsSecretAccessKey

Alternatively, these values can also be provided through the corresponding environment variables:

AWS_REGION can be used to provide awsRegion
AWS_PROFILE can be used to provide awsProfile
AWS_ACCESS_KEY_ID can be used to provide awsAccessKeyId
AWS_SECRET_ACCESS_KEY can be used to provide awsSecretAccessKey

Creating a data source in Node

All provided data source implementations are registered in a default instance of DataSourceFactory, which is exposed to you in Node. You can create a reader or writer implementation directly from the factory by calling the createReader() or createWriter() method.

The factory will automatically select the data source to create based on the path it was given. The default implementation will fall back to using the local data source if the path does not match any other data sources.

// Import the default factory instance
import {
  dataSourceFactory,
  importFirestoreData,
  exportFirestoreData,
} from "firestore-backfire";

// Create the reader and writer.
// The `options` object should provide the credentials to
// connect to data sources if required (such as GCS or S3).
const path = "s3://my-bucket/exported-data.ndjson";
const reader = await dataSourceFactory.createReader(path, options);
const writer = await dataSourceFactory.createWriter(path, options);

// Use the reader and writer
await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);

Custom data sources

There are two types of data sources: readers and writers. A reader reads text data from a stream, whilst a writer writes lines of text data to a stream.

A data source does not need to provide both a reader and a writer, but obviously if a reader is not provided, you cannot import data, and if a writer is not provided, you cannot export data.

To create a data source and make it useable with Firestore Backfire, follow these steps:

Create at least one of the following:
- A class that implements the IDataSourceReader interface
- A class that implements the IDataSourceWriter interface
Construct a IDataSource object, in which you should define:
- A unique id for the data source
- A match function, which takes a path parameter and returns true if the path can be used with this data source
- A reader property, which can your IDataSourceReader class directly, or provide a function that will create an instance of it (this can be left empty if you do not want to import data)
- A writer property, which can your IDataSourceWriter class directly, or provide a function that will create an instance of it (this can be left empty if you do not want to export data)
Register the data source using the register() method on the default DataSourceFactory instance (exposed as dataSourceFactory)

Once your data source has been registered, you can use the createReader() or createWriter() methods on the default DataSourceFactory instance to construct your data source.

Alternatively, you can instantiate your custom data source yourself and pass it directly to the importFirestoreData or exportFirestoreData if you do not need to support different path types or use the default implementations.

Implementation example

You can always take a look at how the provided implementations are written by looking at the source code, and seeing how they are registered. Below is a basic example as a reference.

import {
  IDataSourceReader,
  IDataSourceWriter,
  dataSourceFactory,
} from "firestore-backfire";

// First define your custom implementations

class MyDataReader implements IDataSourceReader {
  // ...
}

class MyDataWriter implements IDataSourceReader {
  // ...
}

// You might want to define some custom options to use
// with your data source, such as credentials
interface MyCustomOptions {
  username?: string;
  password?: string;
}

// Then register them with the data source factory

dataSourceFactory.register<MyCustomOptions>({
  id: "custom",
  // Use this data source with any paths starting with "custom://"
  match: (path) => path.startsWith("custom://"),
  // You can tell the factory to use the class directly, which will
  // pass the `path` and `options` object to the constructor
  reader: { useClass: MyDataReader },
  // You can also tell the factory to call a function to create
  // the class, which is useful for processing options that are passed
  writer: {
    useFactory: async (path, options) => {
      // E.g. check that the required options are present
      if (!options.username) throw new Error("username is required");
      if (!options.password) throw new Error("password is required");
      // If everything is good, return an instance of your class
      return new MyDataWriter(path, options.username, options.password);
    },
  },
});

// Then create the data source using the factory
const path = "custom://my-data";
const reader = await dataSourceFactory.getReader<MyCustomOptions>(path, {
  username: "...",
  password: "...",
});

Configuration file

Instead of providing options on the CLI, you can also set defaults through a configuration file. You can use the flag --config <path> to point to a specific file to use as configuration. Note that CLI options will always override options provided through a configuration file.

IMPORTANT: Do not to commit any secrets in your config file to version control.

The configuration file is loaded using cosmiconfig, which supports a wide range of configuration file formats. Some examples of supported formats:

.backfirerc.json
.backfirerc.yaml
.backfirerc.js
backfire.config.js

Sample YAML config:

project: demo-project
keyFile: ./service-account.json
emulator: localhost:8080
paths:
  - emails
match:
  - ^emails/123
ignore:
  - xyz$
depth: 2

Sample JSON config:

{
  "project": "demo-project",
  "keyFile": "./service-account.json",
  "emulator": "localhost:8080",
  "paths": ["emails"],
  "match": ["^emails/123"],
  "ignore": ["xyz$"],
  "depth": 2
}

Migration

1.x to 2.x

Firestore Backfire v2 is a rewrite of v1 to provide a more up to date and extensible design. It provides new and improved functionality, uses NDJSON as the data format, and no longer uses worker threads.

Breaking changes

-p has been renamed to -P
-k has been renamed to -K
-e has been renamed to -E
--patterns has been renamed to --match
--workers has been removed as worker threads are no longer used
--logLevel has been removed, use --verbose, --debug or --silent instead
--prettify has been renamed to --stringify
--force has been renamed to --overwrite
--mode values have changed to "create", "insert", "overwrite", "merge"
Import and export file format changed to NDJSON (not backward compatible)

New features

New options:
- ignore (--ignore, -i) to ignore paths
- limit (--limit, -l) to limit number of documents imported/exported
- update (--update) to specify the frequency of update messages
- A few more advanced configuration options
New commands:
- backfire get <path> to get a document from Firestore
- backfire list:documents <path> to list documents in a collection
- backfire list:collections [path] to list root collections or subcollections
Support for passing some options as environment variables
- GOOGLE_CLOUD_PROJECT
- GOOGLE_APPLICATION_CREDENTIALS
- AWS_PROFILE
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
Ability to create custom data sources in Node
Ability to use an existing Firestore instance in Node

Contributing

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome! Please follow the contributing guidelines.

Changelog

Please see CHANGELOG.md.

License

Please see LICENSE.

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

🔥 Firestore Backfire

✨ Features ✨

Table of contents

Installation

Peer dependencies for Google Cloud Storage

Peer dependencies for AWS S3

Usage and examples

CLI

CLI options

CLI examples

Node

Exporting data

Options

Logging options

Importing data

Options

Import mode

Logging options

Get document

Options

List documents and collections

Options

Count documents and collections

Connecting to Firestore

Data sources

Local

Google Cloud Storage

AWS S3

Creating a data source in Node

Custom data sources

Implementation example

Configuration file

Migration

1.x to 2.x

Contributing

Changelog

License