markovjs-async v0.1.3

reinforcement learning in javascript (fork supporting promises)

This is a fork of markovjs that supports async Game logic. Note that it does not work yet due to an issue with async iterators in Babel. Stay tuned.

npm install markovjs-async

This is a reference implementation of a basic reinforcement learning environment. It is intended as a playground for anyone interested in this field.

My goal is to provide a minimal and clean implementation of the main concepts, so you can:

  • Plug in some problem you want to try to solve and play around
  • Understand what's going on and how the agent learns
  • Extend functionality via custom data types or functions

What's inside:

  • Basic TD(0) value iteration algorithm
  • Basic memory implementation
  • Common policies

Getting Started

This package exports a function that provides the environment you'll need to try your own problems.

There are three components required for the learning to start:

  • a game implementation
  • a memory implementation
  • policies for the agent to follow

The environment provides helpful methods to set those up, train an agent, and replay its findings within your game. The example below shows basic usage of this package; each step is explained in its own section afterwards.

import markov from 'markovjs'
import {egreedy} from 'markovjs/policies'
import * as memory from 'markovjs/memory'
import * as game from './game'

const α = 0.1 // learning rate
const γ = 0.9 // discount factor
const ε = 0.1 // exploration rate

markov() // creates an environment
  .game(game, game.initial) // sets up the game
  .memory(memory, memory.init(0.0)) // sets up the memory
  .policies(egreedy(ε)) // sets up the policies
  .train(100, α, γ) // train for one hundred episodes
  .play(episode => { /* play time! */ })

.game (game: Game, initialGameState: G)

sets up the game for the learning environment

It takes the game implementation as its first argument and the game initial state as the second one. This initial game state will be used in all game simulations and can only be changed by calling this method again.

You implement the game yourself, following this interface:

// A: Action type
// G: Game state type
type Game<A, G> = {
  actions: (G) => Array<A>, // which actions are allowed in the given state?
  act: (G, A) => Promise<G>, // which state does taking the given action in the given state lead to?
  reward: (G, G) => Promise<number>, // what is the reward for moving from the first state to the second?
  final: (G) => Promise<boolean> // is the given state final?
}
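As an illustration only (not part of this package), here is a sketch of a ./game module that satisfies this interface: a made-up one-dimensional walk where the agent starts in the middle of a five-cell board and is rewarded only for reaching the right end. The async functions match this fork's promise-based Game logic.

// game.js — hypothetical example, not shipped with this package
export const initial = 2 // start in the middle of cells 0..4

export const actions = state => [-1, +1] // step left or right

export const act = async (state, action) =>
  Math.min(4, Math.max(0, state + action)) // move, clamped to the board

export const reward = async (state, nextState) =>
  nextState === 4 ? 1 : 0 // only reaching the goal pays out

export const final = async state =>
  state === 0 || state === 4 // both ends terminate the episode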

This is generally all you need to implement in order to use this package.

That's not to say you shouldn't mess around anywhere else if you feel like it.

tips
  • Need an example? grid-world and n-armed-bandit (coming soon)
  • The way you model your problem affects the agent's ability to learn it. State is what your agent sees and the reward is what it seeks!
  • There might be constraints on your state implementation depending on the memory implementation you use. Check out the memory section for more info

.memory (memory: Memory, initialMemoryState: M)

sets up the memory for the learning environment

This method is analogous to .game. It takes the memory implementation as its first argument and the memory initial state as the second one.

This package provides a basic memory implementation that can be used out of the box. It includes both required functions plus an extra init function that returns an empty memory state. init takes a number to be used as the initial value for all unset state-action pairs.

import * as memory from 'markovjs/memory'

const m0 = memory.init(0.1) // this means all values are defaulted to 0.1
const m1 = memory.update(m0, 0, 1, v => v + 2.0) // updates the value for G=0 A=1
const rater = memory.rater(m1, 0) // gets rater for G=0
rater(0) // rates G=0 A=0, which gives out 0.1
rater(1) // rates G=0 A=1, which gives out 2.1

This memory implementation relies on the toString method to compare your game states. For it to work correctly, you need to make sure that the string returned by your game state's toString really represents that state.
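For instance (a sketch, not from this package), a composite game state could carry its own toString so that two states with the same contents map to the same memory entry:

// hypothetical grid position used as a game state
const position = (x, y) => ({
  x,
  y,
  toString: () => `${x},${y}` // fully describes the state
})

String(position(1, 2)) // "1,2" — equal positions produce equal keys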

tips
  • You might have to implement a custom toString method for your state type. Need an example? grid-world
  • Don't feel like implementing the toString method? Check out this memory implementation
  • Most of the heavy lifting is done by the memory. Want to speed things up? Roll your own faster memory implementation!

.policies (move: Policy, learn: Policy = move, play: Policy = learn)

sets up the policies to be followed by the agent in the learning environment

It takes one required policy (move) and two optional ones (learn and play). If a policy is omitted, it defaults to the previous one. The policies are used by the agent as follows:

  • move: the one followed while learning
  • learn: the one expected to be learned
  • play: the one followed while playing

This package provides the implementation of the most popular policies used in this type of learning algorithm.

import * as policies from 'markovjs/policies'

policies.random // always chooses a random action
policies.greedy // always chooses the action with the highest expected return
policies.egreedy(0.1) // acts randomly with probability 0.1 and greedily with probability 0.9
tips
  • Use the greedy policy carefully, since it can lead to infinite loops during training or playing
  • If your agent follows and learns the same policy during training, call it SARSA
  • If your agent follows one policy while learning the greedy one, call it Q-Learning (both setups are sketched below)
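As a sketch of both setups, assuming env is the environment returned by markov() in Getting Started:

import {egreedy, greedy} from 'markovjs/policies'

// env: assumed to be the environment created with markov() (see Getting Started)

// SARSA-style: follow and learn the same ε-greedy policy
env.policies(egreedy(0.1))

// Q-Learning-style: follow ε-greedy while learning the greedy policy
env.policies(egreedy(0.1), greedy)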

.train (sessions: number, alpha: number, gamma: number)

trains an agent using the game, memory and policies previously set

It takes the number of episode sessions to train your agent for as its first argument. The second and third ones are the learning rate and discount factor parameters.

This method will mutate the environment's memory to reflect the agent's learning. How long it takes for this method to run will depend both on your game's episode length and agent's performance.

Meaning it will not take forever unless your agent is both really stubborn and really disciplined.

tips
  • Both the learning rate and discount factor are problem specific.
  • How many sessions does it take to learn the problem? Great question.

.play (callback: Episode => void)

generates a playing episode using current game, memory and policy settings

The only parameter taken by this method is a callback that receives the resulting episode.

An episode is a JavaScript iterator of Transitions.

export type Transition<A, G> = {|
  gameState: G, // state the agent was at
  action: A, // the action it took
  nextGameState: G, // where the action led
  reward: number // what the agent got out of it
|}
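As a rough sketch, the callback could walk the iterator step by step and sum the rewards (field names as in the Transition type above; logEpisode is just an example name):

// hypothetical callback for .play — walks the episode iterator
const logEpisode = episode => {
  let total = 0
  for (let step = episode.next(); !step.done; step = episode.next()) {
    const {gameState, action, nextGameState, reward} = step.value
    console.log(`${gameState} --${action}--> ${nextGameState} (+${reward})`)
    total += reward // the quantity the agent tries to maximize
  }
  console.log(`total reward: ${total}`)
}

// usage: .play(logEpisode)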
tips
  • The episode isn't guaranteed to be finite (especially if your agent is too greedy)
  • The reward sum is what your agent is trying to maximize!

Going Deeper

Not satisfied with the included memory implementation? Want to try out a custom policy? Is this training environment too simple for you?

This section will expose the main data types and abstractions adopted in this package.

Let me know if you code something awesome with them.

Memory

The included memory implementation is supposed to be basic and easy to understand. Other implementations might focus on performance or even new functionality.

If you want to implement your own, here's what you need to code:

// A: Action type
// G: Game state type
// M: Memory type
export type Memory<A, G, M> = {
  update: (M, G, A, (number) => number) => M, // maps memory value for (G, A) pair using given function
  rater: (M, G) => (A) => number // returns a function that rates actions for state G
}
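As a rough illustration (not the bundled implementation), a Map-backed memory could look like the sketch below. It keeps the same string-keying idea, so the toString caveat from the memory section still applies, and it mutates its Map in place while still returning a value that fits the update signature:

// hypothetical Map-backed memory — the memory state M is a plain object
// holding the stored values plus a fallback for unseen (state, action) pairs
export const init = (fallback = 0.0) => ({values: new Map(), fallback})

// maps the value for the (state, action) pair through fn
export const update = (m, state, action, fn) => {
  const key = `${state}|${action}`
  const current = m.values.has(key) ? m.values.get(key) : m.fallback
  m.values.set(key, fn(current))
  return m
}

// returns a function that rates each action for the given state
export const rater = (m, state) => action => {
  const key = `${state}|${action}`
  return m.values.has(key) ? m.values.get(key) : m.fallback
}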

Policy

If you want to implement your own policies, it is just as easy as writing a simple function. You probably won't need to, since the included ones should have you covered. I sure won't stop you though, so here is the expected signature:

export type Policy<A> = (
  Array<A>, // the array of actions to choose from
  (A) => number // a function that returns the expected return of an action
) => A // chosen action
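For example, a softmax policy (not included in this package, shown here only to illustrate the signature) would look like this. Like egreedy, it is written as a factory that returns a Policy:

// hypothetical softmax policy: actions with a higher expected return are
// picked more often, but every action keeps a non-zero probability
export const softmax = (temperature = 1.0) => (actions, rate) => {
  const weights = actions.map(a => Math.exp(rate(a) / temperature))
  const total = weights.reduce((sum, w) => sum + w, 0)
  let pick = Math.random() * total
  for (let i = 0; i < actions.length; i++) {
    pick -= weights[i]
    if (pick <= 0) return actions[i]
  }
  return actions[actions.length - 1] // guard against floating point drift
}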

Misc

In order to implement the learning environment, I found it useful to code these two primitives:

  • Move: makes a step from given game state following given policy using given memory state.
  • Learn: updates the memory using a 1-step value iteration function, simulating the next move in given game with given policy and memory.

You might find these functions useful to code your own extensions, so here are their signatures:

export type Move<A, G, M> = (
  Game<A, G>,
  G,
  Memory<A, G, M>,
  M,
  Policy<A>
) => Transition<A, G>

export type Learn<A, G, M> = (
  Game<A, G>,
  Transition<A, G>,
  Memory<A, G, M>,
  M,
  Policy<A>
) => M
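As a rough sketch of how these pieces fit together (not the package's own code), a Move along these lines would itself be async in this fork, since act and reward return promises:

// hypothetical Move: pick an action with the policy (rated by the memory),
// simulate it with the game, and package the result as a Transition
const move = async (game, gameState, memory, memoryState, policy) => {
  const actions = game.actions(gameState) // what can the agent do here?
  const action = policy(actions, memory.rater(memoryState, gameState)) // let the policy pick
  const nextGameState = await game.act(gameState, action) // simulate the move
  const reward = await game.reward(gameState, nextGameState) // and score it
  return {gameState, action, nextGameState, reward}
}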

What Next

Coming Soon

  • n-armed-bandit game example
  • eligibility traces support
  • function approximation support

Thanks

Seriously, for reading this whole doc.

You're awesome.