copious

v0.2.0

Published

3 years ago

A framework to provide some guidance for seeding data in developer environments

acuthbert

Copious

Copious provides a way to make seeding data in a project a better experience for developers.

Why Seeded Data?

There are many reasons for choosing to seed data in your project. Here are the main ones:

Privacy. Real data should stay on your production servers, not be downloaded into developer environments.
Quality. Fake data entered by developers and testers is often of low quality. Good seeded data should look and feel much more realistic.
Consistency. Every developer on the team can have the same data without sharing snapshots around.
Onboarding. New developers can get up and running on a project without requiring any data snapshots.

Scenario Data Seeding

It's easy to make some scripts to fill your tables with some random data, but this has a number of problems:

Data Integrity. Simply flooding tables with random data will rarely respect the rules of your application.
Referential Integrity. It's unlikely foreign keys will be properly respected.
Scale. The number of objects will not match real-world ratios: too many of one object, not enough of another. Or you might simply have too many objects. How many is enough?

Instead, Copious introduces scenario-based data seeding. It has a simple maxim:

Every piece of seeded data should exist because of a scenario

What is a scenario?

A scenario creates the data required to complete a key journey in the software. It exposes the functionality by setting up states ready for the user to join and complete key actions. It saves time, as testers don't need to create those states themselves.

Scenarios can be useful to a number of people on the team.

QA : Testers will be able to join complex journeys at key places without the tedium of setting up the relevant state. QA may also set up scenarios for automated testing tools in order to minimise test run times.

PMs : Project Managers will also be interested in testing key journeys. They may also request scenarios configured in order to complete demos with a client.

Designers : Designers often need to assist on components or screens in the middle or at the end of complex journeys. Scenarios can get them there immediately without wasting time exploring how to do it.

Developers : Developers often need to reset data in order to retry modifications to code. Scenario seeding makes this fast and easy.

In practice, the factory objects used for seeding also become useful when writing unit and integration tests.

The golden rules

In our experience, there are some golden rules which, if observed by the whole team, ensure everyone gets the most value from data seeding.

1. Naming: Scenarios name a feature (or sub feature).

The name of a scenario should describe not 'what' it creates, but 'why' it creates it. Why is this scenario important and to whom?

2. Descriptive: Scenarios should describe what they've created

The scenario body should detail to the console the various objects and journeys it has configured. Logins should be detailed, URLs presented etc. This should be done with regard for presentation - tabs, bold etc. to make it as readable as possible.

The output of a scenario should give the user everything they need to make their contribution.

3. Idempotent: Scenarios should not 'spam' the database

If you run a seeder twice in succession it should not create double the records. It should find and reset existing records, and only add if missing. This is extremely important as seeding is designed to be run repeatedly.
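
As a rough illustration of the find-and-reset idea, the sketch below uses an in-memory Map in place of a real database. The Booking type, the seedBooking function and the "BK-1001" reference are hypothetical names made up for this example; they are not part of Copious.

```typescript
// Hypothetical record type and in-memory "table" standing in for a real database.
interface Booking {
  reference: string; // natural key used to find an existing record
  status: "confirmed" | "cancelled";
}

const bookings = new Map<string, Booking>();

// Find the booking by its natural key and reset it, or create it if missing.
function seedBooking(reference: string): Booking {
  const existing = bookings.get(reference);
  if (existing) {
    existing.status = "confirmed"; // reset to the scenario's starting state
    return existing;
  }
  const created: Booking = { reference, status: "confirmed" };
  bookings.set(reference, created);
  return created;
}

// First run creates the record.
seedBooking("BK-1001");
// A tester then cancels the booking while exploring the journey.
bookings.get("BK-1001")!.status = "cancelled";
// Second run resets the same record instead of adding a duplicate.
seedBooking("BK-1001");
```

After both runs the map still holds a single booking, back in its starting state. The key design choice is to look records up by a stable, scenario-chosen value rather than by a generated ID.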

4. Complete: Scenarios should be independent of one another

One scenario should not depend upon another scenario, or indeed upon the order of execution. Users should be able to run all scenarios, or just one scenario and expect it to work.

5. Isolated: Don't delete or update other records

A scenario should only modify the records it has created. It should not empty tables etc. In theory, you should be able to run another person's scenario at any point without it damaging your own data.

6. Composed: Built from shared units

Keep things DRY by using Factories to make objects from Recipes.

Using Copious

First, add Copious to your project:

npm install copious --save-dev

Create a Suite object and add some scenarios to it:

// my-seeding-suite.js
// Assumes Suite is a named export of the copious package.
import { Suite } from "copious";

const suite = new Suite();
suite.addScenario("bookings", "Cancel Booking", (describe, faker) => {
  // Make a booking and get it into a state ready for cancelling.
  describe("A booking exists {insert url} ready to be cancelled.");
});

export default suite;

This scenario is setting up a journey for testing around the cancellation of a booking.

addScenario takes 3 arguments.

A scope - a simple string that groups related scenarios together. This lets you run all the scenarios from a single scope.
A name - another simple string that describes the intention of the scenario.
A callback that, as in the example above, receives a describe callback (used to output messages to the console) followed by an instance of Faker.

To run the seeder we need to either drop a copious.json file into our current folder, or pass a switch to load our suite file.

copious.json

This file is used to provide defaults for copious, like the location of our default suite.

{
  "suite": "./my-seeding-suite"
}

Command line switches

Copious takes two possible arguments:

--config, -c : path to copious.json if not the current working directory

--suite, -s : path to the default suite JS file if not using a copious.json file.

Running Copious

Invoke Copious by calling copious if it is installed globally, or by using npx:

# If installed globally:
copious

# Otherwise:
npx copious

Composing seeders

You don't need to keep everything in a single suite; you can have several. You must, however, compose these into a single suite to pass to Copious. You can use addSuite to do this:

// my-seeding-suite.js
// Assumes Suite is a named export of the copious package.
import { Suite } from "copious";
import { bookingSuite } from "bookings/seeders/booking-suite";
import { orderSuite } from "orders/seeders/order-suite";

const suite = new Suite();

suite.addSuite(bookingSuite);
suite.addSuite(orderSuite);

export default suite;
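
To make the idea of scopes and composition more concrete, here is a toy stand-in written for this explanation. ToySuite, its run method and the "Refund Order" scenario are hypothetical and only mirror the shape of addScenario and addSuite described here; Copious's own Suite class may work quite differently.

```typescript
// Toy stand-in for illustration only; this is not Copious's Suite class.
type Describe = (message: string) => void;
type ScenarioBody = (describe: Describe) => void;

interface Scenario {
  scope: string;
  name: string;
  body: ScenarioBody;
}

class ToySuite {
  private scenarios: Scenario[] = [];

  addScenario(scope: string, name: string, body: ScenarioBody): this {
    this.scenarios.push({ scope, name, body });
    return this;
  }

  // Composition: copy another suite's scenarios into this one.
  addSuite(other: ToySuite): this {
    this.scenarios.push(...other.scenarios);
    return this;
  }

  // Run every scenario, or only those in the given scope,
  // and return the lines each scenario described.
  run(scope?: string): string[] {
    const output: string[] = [];
    for (const scenario of this.scenarios) {
      if (scope && scenario.scope !== scope) continue;
      scenario.body((message) => {
        output.push(`[${scenario.name}] ${message}`);
      });
    }
    return output;
  }
}

const bookingSuite = new ToySuite().addScenario("bookings", "Cancel Booking", (describe) =>
  describe("A booking is ready to be cancelled.")
);
const orderSuite = new ToySuite().addScenario("orders", "Refund Order", (describe) =>
  describe("A paid order is ready to be refunded.")
);

// One composed suite holds the scenarios of both feature suites.
const composedSuite = new ToySuite().addSuite(bookingSuite).addSuite(orderSuite);
```

Calling composedSuite.run() in this sketch runs both scenarios, while composedSuite.run("orders") runs only the order scenario. Keeping each feature's scenarios next to the feature's code and composing them at the top level is what lets teams own their seeders independently.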

Factories and Recipes

Making objects should be the role of a Factory. A Factory often has a single method that takes a Recipe and returns an instance of the object. Nearly all the permutations and complexity of the object are described by the Recipe. Sometimes, however, a Factory may have more than one method if the extra methods provide descriptive and handy shortcuts.

e.g.

createCustomer(customerRecipe)
createIndividualCustomer(customerRecipe)
createCorporateCustomer(customerRecipe)

Factory objects should extend the Factory base class.

A Recipe object is a simple object enclosing all the values, flags and switches that control the objects returned by a Factory.

An overarching principle is that everything in a recipe, including the recipe itself, should be optional. All missing values should be randomised using Faker.

Recipe objects should provide setters to set those values and use a fluent pattern so that those calls can be chained. This allows for inline expressions that are very easy to read. For example:

const customer = new CustomerFactory().createCustomer(
  new CustomerRecipe()
    .withName("John", "Smith")
    .inCategory("Prospects")
    .havingOrders(3)
);

This clearly creates a customer called John Smith in a category called 'Prospects' having 3 orders.
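
One possible shape for such a recipe is sketched below. The method names come from the example above; the category and orderCount field names are assumptions chosen for illustration, and a real recipe would carry whatever fields your factory needs.

```typescript
// Hypothetical recipe: every field is optional and every setter returns `this`
// so that calls can be chained.
class CustomerRecipe {
  firstName?: string;
  lastName?: string;
  category?: string; // assumed field name
  orderCount?: number; // assumed field name

  withName(firstName: string, lastName: string): this {
    this.firstName = firstName;
    this.lastName = lastName;
    return this;
  }

  inCategory(category: string): this {
    this.category = category;
    return this;
  }

  havingOrders(orderCount: number): this {
    this.orderCount = orderCount;
    return this;
  }
}

// Only the values the scenario cares about are set; the rest stay undefined
// for the factory to fill in.
const prospectRecipe = new CustomerRecipe()
  .withName("John", "Smith")
  .inCategory("Prospects")
  .havingOrders(3);
```

Returning `this` rather than the class name keeps the chain working if the recipe is later subclassed.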

Missing Values

As mentioned above it's important that all values should be optional.

Here's how the createCustomer method might start:

class CustomerFactory extends Factory {
  public createCustomer(recipe?: CustomerRecipe) {
    if (!recipe) {
      // First allow for no recipe being passed - the caller doesn't
      // care what sort of customer they get, just a realistic one.
      // Happy to oblige!
      recipe = new CustomerRecipe();
    }

    if (!recipe.firstName) {
      // Faker's generators are functions, so call them to get a value.
      recipe.firstName = this.getFaker().name.firstName();
    }

    if (!recipe.lastName) {
      recipe.lastName = this.getFaker().name.lastName();
    }

    // ... Now make the customer using the properties in the recipe.
  }
}

You can see that we allow for no recipe, and then a recipe without a name.

We're assuming here it's okay for a customer to be valid and not exist in a category or have any orders - those are fine with no value.

Using Faker

Faker is a library for generating random data. You can see in the examples above that a Faker instance is passed both to a scenario seeding callback function and is available to a Factory by calling this.getFaker().

This is the recommended way of using Faker with Copious. The instance retrieved via either of these methods has been seeded with a number based on the name of the scenario. This means that the random data produced by your scenario will be consistent. If you run it multiple times the same values will be generated.

Because the data is random but repeatable from run to run, we meet one of our main goals and give each scenario every chance of being idempotent.
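
The general technique can be sketched without Faker at all. The example below hashes a scenario name into a number and uses it to seed a small pseudo-random generator. The function names, the FNV-1a-style hash and the mulberry32 generator are choices made for this illustration; they are not how Copious or Faker derive their seeds internally.

```typescript
// Hash a scenario name to a 32-bit unsigned integer
// (FNV-1a-style, applied to UTF-16 code units).
function seedFromName(name: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: a small seeded pseudo-random generator returning values in [0, 1).
function createRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const sampleFirstNames = ["Alice", "Bashir", "Chen", "Dana"];

// The same scenario name always yields the same "random" first name.
function pickFirstName(scenarioName: string): string {
  const random = createRandom(seedFromName(scenarioName));
  return sampleFirstNames[Math.floor(random() * sampleFirstNames.length)];
}
```

Calling pickFirstName("Cancel Booking") repeatedly returns the same name every time, which is what allows a seeder to find the record it created on a previous run instead of inserting a new one.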

Using Copious with TypeORM

Copious comes with built-in support for TypeORM through the EntityFactory.

EntityFactory is a generic base class that extends Factory and provides a helper function to find or create entities based on existing data.

A Factory based on TypeORM might look like this:

export class CustomerFactory extends EntityFactory<Customer> {
  getEntity(): any {
    return Customer;
  }

  public async createCustomer(recipe?: CustomerRecipe): Promise<Customer> {
    // Allow for no recipe being passed.
    recipe = recipe ?? new CustomerRecipe();

    // Make sure we have a name
    if (!recipe.firstName) {
      recipe.firstName = this.getFaker().name.firstName();
    }

    if (!recipe.lastName) {
      recipe.lastName = this.getFaker().name.lastName();
    }

    const customer = await this.findOrCreateEntity({
      firstName: recipe.firstName,
      lastName: recipe.lastName
    });

    customer.email = recipe.email;

    await this.getRepository().save(customer);

    return customer;
  }
}

A few things to note in this example:

We're extending EntityFactory<Customer>, where Customer indicates our entity class.
We have to implement the getEntity() method to return our entity class. (If you're interested, this is a limitation of generics in TypeScript: the base class can only use T as a type check, it can't create a new instance of T.)
We call the method this.findOrCreateEntity(). This takes a map of key value pairs and will try to find a matching existing entity using those columns and values. If none exists it will make a new one and set those values and the values of the provided recipe for you.
If you need to perform other operations on the entity before it is committed to the DB, you can pass false to the commit parameter on findOrCreateEntity() so that you get back a constructed entity that hasn't been saved to the DB yet. Remember to commit the object yourself when you have finished with it, using await this.getRepository().save(customer);

This means you achieve a find-and-reset or create-new pattern each time.

One final thing that needs to be done is to give Copious the TypeORM connection object to your database:

EntityFactory.setConnection(connection);

You can do this in the same location as your suite definition or wherever is convenient.