copious
v0.2.0
A framework to provide some guidance for seeding data in developer environments
Copious
Copious provides a way to make seeding data in a project a better experience for developers.
Why Seeded Data?
There are many reasons for choosing to seed data in your project. Here are the main ones:
- Privacy. Real data should stay on your production servers, not be downloaded into developer environments.
- Quality. Fake data entered by developers and testers is often of low quality. Good seeded data should look and feel much more realistic.
- Consistency. Every developer on the team can have the same data without sharing snapshots around.
- Up story. New developers can get up and running on a project without requiring any data snapshots.
Scenario Data Seeding
It's easy to make some scripts to fill your tables with some random data, but this has a number of problems:
- Data Integrity. Simply flooding tables with random data will rarely respect the rules of your application.
- Referential Integrity. It's unlikely foreign keys will be properly respected.
- Scale. The number of objects will not match real world ratios. Too many of one object, not enough of another. Or you might have too many objects - how many is enough?
Instead, Copious introduces scenario-based data seeding. It has a simple maxim:
Every piece of seeded data should exist because of a scenario
What is a scenario?
A scenario creates the data required to complete a key journey in the software. It exposes the functionality by setting up states ready for the user to join and complete key actions. It saves time, as testers don't need to create those states themselves.
Scenarios can be useful to a number of people on the team.
QA : Testers will be able to join complex journeys at key places without the tedium of setting up the relevant state. QA may also set up scenarios for automated testing tools in order to minimise test run times.
PMs : Project Managers will also be interested in testing key journeys. They may also request scenarios configured in order to complete demos with a client.
Designers : Designers often need to assist on components or screens in the middle or at the end of complex journeys. Scenarios can get them there immediately without wasting time exploring how to do it.
Developers : Developers often need to reset data in order to retry modifications to code. Scenario seeding makes this fast and easy.
In practice, the factory objects used for seeding also become useful when writing unit and integration tests.
The golden rules
In our experience, there are some golden rules which, if observed by the whole team, ensure everyone gets the most value from data seeding.
1. Naming: Scenarios name a feature (or sub feature).
The name of a scenario should describe not 'what' it creates, but 'why' it creates it. Why is this scenario important and to whom?
2. Descriptive: Scenarios should describe what they've created
The scenario body should detail to the console the various objects and journeys it has configured. Logins should be detailed, URLs presented etc. This should be done with regard for presentation - tabs, bold etc. to make it as readable as possible.
The output of a scenario should give the user everything they need to make their contribution.
3. Idempotent: Scenarios should not 'spam' the database
If you run a seeder twice in succession it should not create double the records. It should find and reset existing records, and only add if missing. This is extremely important as seeding is designed to be run repeatedly.
4. Complete: Scenarios should be independent of one another
One scenario should not depend upon another scenario, or indeed upon the order of execution. Users should be able to run all scenarios, or just one scenario and expect it to work.
5. Isolated: Don't delete or update other records
Scenarios should only modify data for the records they have created. They should not empty tables etc. In theory, you should be able to run another person's scenario at any point without it damaging your own data.
6. Composed: Built from shared units
Keep things DRY by using Factories to make objects from Recipes.
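As an illustration of rule 3, here is a minimal sketch of the find-and-reset-or-create pattern. It is not Copious code: the in-memory Map stands in for a database table, and the Customer shape and seedCustomer helper are hypothetical names used only for this example.

```typescript
// Hypothetical in-memory "table"; a real project would use its database layer.
interface Customer {
  email: string; // natural key used to find an existing record
  name: string;
  status: "active" | "cancelled";
}

const customers = new Map<string, Customer>();

// Find the record by its natural key and reset it, or create it if missing.
function seedCustomer(email: string, name: string): Customer {
  const existing = customers.get(email);
  if (existing) {
    // Reset to the scenario's starting state instead of inserting a duplicate.
    existing.name = name;
    existing.status = "active";
    return existing;
  }
  const created: Customer = { email, name, status: "active" };
  customers.set(email, created);
  return created;
}

// First run creates the record.
seedCustomer("john.smith@example.com", "John Smith");
// A tester then changes its state...
customers.get("john.smith@example.com")!.status = "cancelled";
// ...and a second run resets it rather than adding a second record.
seedCustomer("john.smith@example.com", "John Smith");

console.log(customers.size); // 1
```

Looking records up by a stable natural key (here the email address) is what lets a repeated run find its own earlier output.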
Using Copious
First, add Copious to your project:
npm install copious --save-dev
Create a Suite object and add some scenarios to it:
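// my-seeding-suite.js
// Assumption: Suite is a named export of the copious package.
import { Suite } from "copious";

const suite = new Suite();
suite.addScenario("bookings", "Cancel Booking", (describe, faker) => {
  // make some booking and get it into a state for cancelling
  describe("A booking exists {insert url} ready to be cancelled.");
});

// Export the suite so that Copious can load this file.
export default suite;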
This scenario is setting up a journey for testing around the cancellation of a booking.
addScenario takes 3 arguments:
- A scope: a simple string that groups related scenarios together. This lets you run all the scenarios from a single scope.
- A name: another simple string that describes the intention of the scenario.
- A callback that receives another callback (called describe), which can be used to output messages to the console, and an instance of faker, in that order.
To run the seeder we need to either drop a copious.json file into our current folder, or pass a switch to load our suite file.
copious.json
This file is used to provide defaults for copious, like the location of our default suite.
{
"suite": "./my-seeding-suite"
}
Command line switches
Copious takes two possible arguments:
--config, -c : path to copious.json if not the current working directory
--suite, -s : path to the default suite JS file if not using a copious.json file.
Running Copious
Invoke Copious either by calling copious directly if it is installed globally, or by using npx:
# If installed globally:
copious
# Otherwise:
npx copious
Composing seeders
You don't need to have a single suite; you can have several. You must, however, compose these into a single suite to pass to Copious. You can use addSuite to do this:
// my-seeding-suite.js
// Assumption: Suite is a named export of the copious package.
import { Suite } from "copious";
import { bookingSuite } from "bookings/seeders/booking-suite";
import { orderSuite } from "orders/seeders/order-suite";

const suite = new Suite();
suite.addSuite(bookingSuite);
suite.addSuite(orderSuite);
export default suite;
Factories and Recipes
Making objects should be the role of a Factory. A Factory often has a single method that takes a Recipe and returns an instance of the object. Nearly all the permutations and complexity of the object are described by the Recipe. Sometimes, however, a Factory may have more than one method if it provides a descriptive and handy shortcut.
e.g.
- createCustomer(customerRecipe)
- createIndividualCustomer(customerRecipe)
- createCorporateCustomer(customerRecipe)
Factory objects should extend the Factory base class.
A Recipe object is a simple object enclosing all the values, flags and switches that control the objects returned by a Factory.
An overarching principle is that everything in a recipe, including the recipe itself, should be optional. All missing values should be randomised using Faker.
Recipe objects should provide setters to set those values and use a fluent pattern so that those calls can be chained. This allows for inline expressions that are very easy to read. For example:
const customer = new CustomerFactory().createCustomer(
  new CustomerRecipe()
    .withName("John", "Smith")
    .inCategory("Prospects")
    .havingOrders(3)
);
This clearly creates a customer called John Smith in a category called 'Prospects' having 3 orders.
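For reference, a recipe supporting that chained style could be sketched as follows. The method names withName, inCategory and havingOrders come from the example above; the field names category and orderCount are assumptions made for this sketch, and the class is an illustration rather than anything shipped with Copious.

```typescript
// Hypothetical recipe: every field is optional and every setter returns `this`
// so that calls can be chained.
class CustomerRecipe {
  firstName?: string;
  lastName?: string;
  category?: string; // assumed field name
  orderCount?: number; // assumed field name

  withName(firstName: string, lastName: string): this {
    this.firstName = firstName;
    this.lastName = lastName;
    return this;
  }

  inCategory(category: string): this {
    this.category = category;
    return this;
  }

  havingOrders(count: number): this {
    this.orderCount = count;
    return this;
  }
}

// Only the values the caller cares about are set; the rest stay undefined
// and are left for the Factory to fill in or ignore.
const recipe = new CustomerRecipe().withName("John", "Smith").havingOrders(3);
console.log(recipe.firstName, recipe.orderCount, recipe.category);
```

Returning `this` rather than the class type keeps the chain working if a more specific recipe later extends this one.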
Missing Values
As mentioned above it's important that all values should be optional.
Here's how the createCustomer method might start:
class CustomerFactory extends Factory {
  public createCustomer(recipe?: CustomerRecipe) {
    if (!recipe) {
      // First allow for no recipe being passed - the caller doesn't
      // care what sort of customer they get, just a realistic one.
      // Happy to oblige!
      recipe = new CustomerRecipe();
    }
    if (!recipe.firstName) {
      // firstName is a Faker method, so it must be called to get a value.
      recipe.firstName = this.getFaker().name.firstName();
    }
    if (!recipe.lastName) {
      recipe.lastName = this.getFaker().name.lastName();
    }
    // ... Now make the customer using the properties in the recipe.
  }
}
You can see that we allow for no recipe, and then a recipe without a name.
We're assuming here it's okay for a customer to be valid and not exist in a category or have any orders - those are fine with no value.
Using Faker
Faker is a library for generating random data. You can see in the examples above that a Faker instance is passed both to a scenario seeding callback function and is available to a Factory by calling this.getFaker().
This is the recommended way of using Faker with Copious. The instance retrieved via either of these methods has been seeded with a number based on the name of the scenario. This means that the random data produced by your scenario will be consistent. If you run it multiple times the same values will be generated.
Because the data is random but the same on every run, we meet one of our big goals and give ourselves every chance of being idempotent.
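The idea of deriving a seed from the scenario name can be sketched without Faker at all. The snippet below is an illustration only: how Copious actually turns a name into a number is not documented here, so the FNV-1a-style hash, the mulberry32 generator and the helper names seedFromName, createRandom and pickName are all assumptions.

```typescript
// FNV-1a-style hash: turn a scenario name into a 32-bit unsigned seed.
function seedFromName(name: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: a small deterministic pseudo-random generator returning [0, 1).
function createRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const firstNames = ["Ada", "Grace", "Linus", "Margaret"];

// Each call builds a fresh generator from the scenario name, so the same
// name always yields the same "random" pick.
function pickName(scenarioName: string): string {
  const random = createRandom(seedFromName(scenarioName));
  return firstNames[Math.floor(random() * firstNames.length)];
}

console.log(pickName("Cancel Booking") === pickName("Cancel Booking")); // true
```

Seeding per scenario, rather than once globally, also means a scenario produces the same values whether it is run alone or alongside others.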
Using Copious with TypeORM
Copious comes with built-in support for TypeORM through the EntityFactory.
EntityFactory is a generic base class that extends Factory and provides a helper function to find or create entities based on existing data.
A Factory based on TypeORM might look like this:
export class CustomerFactory extends EntityFactory<Customer> {
  getEntity(): any {
    return Customer;
  }

  public async createCustomer(recipe?: CustomerRecipe): Promise<Customer> {
    // The recipe is optional, so fall back to an empty one.
    if (!recipe) {
      recipe = new CustomerRecipe();
    }
    // Make sure we have a name (firstName and lastName are Faker methods).
    if (!recipe.firstName) {
      recipe.firstName = this.getFaker().name.firstName();
    }
    if (!recipe.lastName) {
      recipe.lastName = this.getFaker().name.lastName();
    }
    const customer = await this.findOrCreateEntity({
      firstName: recipe.firstName,
      lastName: recipe.lastName
    });
    customer.email = recipe.email;
    await this.getRepository().save(customer);
    return customer;
  }
}
A few things to note in this example:
- We're extending EntityFactory<Customer>, where Customer indicates our entity class.
- We have to implement the getEntity() method to return our entity class. (If you're interested, this is a limitation of generics in TypeScript: the base class can only use T as a type check, it can't create a new instance of T.)
- We call the method this.findOrCreateEntity(). This takes a map of key-value pairs and will try to find a matching existing entity using those columns and values. If none exists, it will make a new one and set those values and the values of the provided recipe for you.
- If you need to perform other operations on your entity before it is committed to the DB, you can pass false to the commit parameter on findOrCreateEntity() so that you get a constructed entity back, but one that hasn't been saved to the DB yet. Remember to commit the object yourself when you have finished with it, with await this.getRepository().save(customer);
This means you achieve a find-and-reset or create-new pattern each time.
One final thing that needs to be done is to give Copious the TypeORM connection object to your database:
EntityFactory.setConnection(connection);
You can do this in the same location as your suite definition or wherever is convenient.