
datagen-test v0.1.4

Materialize Datagen CLI tool

Downloads: 4

Readme

Datagen CLI

Installation

Note: Until the package has been published on npmjs.org, you can install it from source:

git clone https://github.com/MaterializeInc/datagen.git
cd datagen
npm install
npm link

Usage

datagen -h
Usage: datagen [options]

Fake Data Generator

Options:
  -V, --version             output the version number
  -f, --format <char>       The format of the produced data (choices: "json", "avro", default: "json")
  -s, --schema <char>       Schema file to use
  -n, --number <char>       Number of records to generate. For infinite records, use -1 (default: "10")
  -c, --clean               Clean Kafka topic and schema registry before producing data
  -dr, --dry-run            Dry run (no data will be produced to Kafka)
  -d, --debug               Output extra debugging information
  -w, --wait <int>          Wait time in ms between record production
  -rs, --record-size <int>  Record size in bytes, e.g. 1048576 for 1MB
  -h, --help                display help for command

Env variables

To produce records to a Kafka topic, you need to set the following environment variables:

SASL_USERNAME=
SASL_PASSWORD=
SASL_MECHANISM=
KAFKA_BROKERS=
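For example, a .env file might look like the following (the broker address, mechanism, and credentials below are purely illustrative placeholders):

```shell
# Illustrative values only; substitute your own cluster's settings
SASL_USERNAME=datagen
SASL_PASSWORD=changeme
SASL_MECHANISM=PLAIN
KAFKA_BROKERS=broker.example.com:9092
```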

Examples

# Generate 10 records in JSON format
datagen -s products.sql -f json -n 10

Output:

✔  Parsing schema...
✔  Creating Kafka topic...
✔  Producing records...
✔  Record sent to Kafka topic
  {"products":{"id":50720,"name":"white","merchant_id":76809,"price":1170,"status":89517,"created_at":"upset"}}
  ...

JSON Schema

The JSON schema option allows you to define the data that is generated using Faker.js.

[
    {
        "_meta": {
            "topic": "mz_datagen_users"
        },
        "id": "datatype.uuid",
        "name": "internet.userName",
        "email": "internet.exampleEmail",
        "phone": "phone.imei",
        "website": "internet.domainName",
        "city": "address.city",
        "company": "company.name",
        "age": "datatype.number",
        "created_at": "datatype.datetime"
    }
]

The schema needs to be an array of objects; modeling it this way leaves room for producing relational data in the future.

Each object represents a record that will be generated. The _meta key is used to define the topic that the record will be sent to.

See the Faker.js documentation for the list of available functions.

Record Size Option

If you need to generate a large volume of data, you can use the --record-size option to pad each record to a specific size.

For example, --record-size 1048576 pads each record to 1MB, so to generate roughly 1GB of data you run the command with the following options:

datagen -s ./tests/datasize.json -f json -n 1000 --record-size 1048576
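As a quick sanity check on the arithmetic (assuming 1MB here means 1 MiB = 1048576 bytes): the 1000 records above come to roughly 1GB, and an exact GiB would be 1024 records:

```shell
# 1 GiB = 1073741824 bytes; at 1048576 bytes per record, exactly 1024 records fit
echo $(( 1073741824 / 1048576 ))
```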

This will add a recordSizePayload key to the record with the specified size and will send the record to Kafka.

Note: The 'Max Message Size' of your Kafka cluster needs to be set to a higher value than 1MB for this to work.

UPSERT Envelope Support

To use the UPSERT envelope, you need to define an id column in the schema. The value of the id column will be used as the key of the record.
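As an illustration, a minimal schema with an id column might look like this (the field names and Faker.js functions are just examples, reusing the style of the JSON schema above):

```json
[
    {
        "_meta": {
            "topic": "mz_datagen_users"
        },
        "id": "datatype.uuid",
        "name": "internet.userName"
    }
]
```

With this schema, the generated id value becomes the key of each Kafka record.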

Faker.js and SQL Schema

The SQL schema option allows you to define the data that is generated using Faker.js by defining a COMMENT on the column.

CREATE TABLE "ecommerce"."products" (
  "id" int PRIMARY KEY,
  "name" varchar COMMENT 'internet.userName',
  "merchant_id" int NOT NULL COMMENT 'datatype.number',
  "price" int COMMENT 'datatype.number',
  "status" int COMMENT 'datatype.boolean',
  "created_at" datetime DEFAULT (now())
);

The COMMENT needs to be a valid Faker.js function; see the Faker.js documentation for details.

Docker

Build the Docker image:

docker buildx build -t datagen .

Run a command:

docker run \
  --rm -it \
  -v ${PWD}/.env:/app/.env \
  -v ${PWD}/tests/schema.json:/app/blah.json \
    datagen -s blah.json -n 1 --dry-run

Generate records with sequence numbers

To simulate auto incrementing primary keys, you can use the iteration.index variable in the schema.

This is particularly useful when you want to generate a small set of records with a sequence of IDs, for example 1000 records with IDs from 1 to 1000:

[
    {
        "_meta": {
            "topic": "mz_datagen_users"
        },
        "id": "iteration.index",
        "name": "internet.userName"
    }
]

Example:

datagen -s tests/iterationIndex.json --dry-run -f json -n 1000