
@kenzic/story

v0.9.1


Simple plain text format designed to facilitate the creation of fine-tuning training examples for OpenAI.


Readme

Using the Parser

Install

yarn add @kenzic/story

Usage

JavaScript

import { storyToJsonl } from '@kenzic/story';
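
For example, a minimal script (a sketch only: it assumes storyToJsonl accepts the story text as a string and returns the JSONL output as a string, which isn't documented here):

import fs from 'fs';
import { storyToJsonl } from '@kenzic/story';

// Read a .story file, convert it, and write the JSONL next to it.
// Assumes storyToJsonl(storyText) returns a string of JSONL lines.
const storyText = fs.readFileSync('input.story', 'utf8');
const jsonl = storyToJsonl(storyText);
fs.writeFileSync('output.jsonl', jsonl);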

CLI

Convert a .story file to JSONL. If no output path is given, the result is written to stdout:

$ story convert <input> [output]

$ story convert input.story output.jsonl

$ story convert input.story
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

Story Format Documentation

Introduction

The story format is a plain text format designed to facilitate the easy creation of fine-tuning training examples for OpenAI's GPT-3.5 models. Its key advantage lies in its simplicity and easy conversion to the JSONL format required by OpenAI. The story format is ideal for developers who aim to customize the behavior of their GPT-3.5 model without dealing with complex data structures.

The problem

Fine-tuning, the process of training a model to perform a specific task, requires a large amount of training data. This data must be provided as a JSONL file, a text file where each line is a JSON object. This format is difficult for humans to edit and maintain, and it is not ideal for creating a diverse range of conversational scenarios.

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

The solution

The story format allows you to write your training data in a simple, human-readable format. Each story is a sequence of turns, where each turn is a role and a message. The story format is easy to read, write, and commit to your repo, and it produces meaningful diffs.

System: Marv is a factual chatbot that is also sarcastic.
User: What's the capital of France?
Assistant: Paris, as if everyone doesn't know that already.
---
System: Marv is a factual chatbot that is also sarcastic.
User: Who wrote 'Romeo and Juliet'?
Assistant: Oh, just some guy named William Shakespeare. Ever heard of him?
---
System: Marv is a factual chatbot that is also sarcastic.
User: How far is the Moon from Earth?
Assistant: Around 384,400 kilometers. Give or take a few, like that really matters.

Each message is formatted with the role and content separated by a colon and a space. Conversations are separated by three dashes (---).

Basic Structure

A story is essentially a sequence of turns involving a system, user, and assistant. Each turn is designated by the role that initiates it (System, User, or Assistant, which must be capitalized), followed by a colon and the corresponding text.

Note: the "Function" role is not currently supported.

System: Role instruction
User: User input
Assistant: Assistant response

Multiple turns can exist in a single story, and multiple stories can exist in a single file, separated by three dashes (---). Each story represents a single conversational scenario.

Example

Here is an example in the story format:

User: Write a function that takes an argument and does [TASK]
Assistant: Sorry, messages aren’t a great way to share code, but check out my GitHub.

---

System: You are a chatbot
User: Write a function that takes an argument and does [TASK]
Assistant: Hit me up on LinkedIn, I’m happy to help.
User: I need it now
Assistant: I hear you. Have you tried ChatGPT? It can be pretty helpful.

Converting to JSONL

You can easily convert the story format to the JSONL format required by OpenAI. Each turn becomes an object with role and content keys, and the whole story becomes a list of such objects under a messages key. Each story is then serialized as a single JSON line in a .jsonl file.
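
For illustration, a rough sketch of that transformation (this is not the package's implementation, and it assumes well-formed input where each turn fits on one line):

// Illustrative only: convert story-format text to JSONL.
// Assumes each turn is a single "Role: content" line and stories
// are separated by lines containing exactly "---".
function convertStoryToJsonl(text) {
  return text
    .split(/^\s*---\s*$/m)
    .map((story) => story.trim())
    .filter((story) => story.length > 0)
    .map((story) => {
      const messages = story
        .split('\n')
        .filter((line) => line.trim().length > 0)
        .map((line) => {
          const [role, ...rest] = line.split(': ');
          return { role: role.toLowerCase(), content: rest.join(': ') };
        });
      return JSON.stringify({ messages });
    })
    .join('\n');
}

The real converter may handle multi-line content and other edge cases differently.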

Advantages

  • Simplicity: Easy to read, write, and diff, reducing the chance of errors.
  • Flexibility: Allows for a diverse range of conversations and scenarios. A turn's content can be anything from a single sentence to a multi-line paragraph or code example.
  • Easy Conversion: Straightforward transformation to the required JSONL format.

By using the story format, developers can more efficiently prepare fine-tuning data and customize their models to perform specific tasks or behave in particular ways.