# @rl-js/interfaces

Core interfaces for rl-js: Reinforcement Learning in JavaScript

## Interfaces
## ActionTraces
**Kind**: global interface

### actionTraces.record(state, action) ⇒ ActionTraces
Records a trace for the given state-action pair.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionTraces.update(error) ⇒ ActionTraces
Updates the value function based on the stored traces, and the given error.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### actionTraces.decay(amount) ⇒ ActionTraces
Decay the traces by the given amount.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### actionTraces.reset() ⇒ ActionTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object
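
As a concrete reference point, here is a minimal sketch of how the ActionTraces interface might be satisfied with accumulating eligibility traces over a tabular action-value store. The `qTable` map, the string keys, and the `alpha` step size are assumptions for illustration only; they are not part of `@rl-js/interfaces`.

```js
// Sketch: accumulating eligibility traces over a shared tabular q-value map.
// The qTable layout and alpha are hypothetical, not part of the package.
class TabularActionTraces {
  constructor(qTable, alpha = 0.1) {
    this.qTable = qTable;     // Map from "state|action" keys to q-values (assumed)
    this.alpha = alpha;       // step size used when applying the TD error (assumed)
    this.traces = new Map();  // eligibility of each visited state-action pair
  }

  record(state, action) {
    const key = `${state}|${action}`;
    this.traces.set(key, (this.traces.get(key) || 0) + 1); // accumulate
    return this;
  }

  update(error) {
    // Move every traced q-value in the direction of the TD error,
    // scaled by its eligibility.
    for (const [key, trace] of this.traces) {
      this.qTable.set(key, (this.qTable.get(key) || 0) + this.alpha * error * trace);
    }
    return this;
  }

  decay(amount) {
    // Typically called with amount = gamma * lambda each timestep.
    for (const [key, trace] of this.traces) {
      this.traces.set(key, trace * amount);
    }
    return this;
  }

  reset() {
    this.traces.clear();
    return this;
  }
}
```

Every method returns `this`, as the interface requires, so calls can be chained (e.g. `traces.decay(gamma * lambda).record(state, action)`).
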
## ActionValueFunction ⇐ FunctionApproximator
**Kind**: global interface
**Extends**: FunctionApproximator

- ActionValueFunction ⇐ FunctionApproximator
  - .call(state, action) ⇒ number
  - .update(state, action, error)
  - .gradient(state, action) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### actionValueFunction.call(state, action) ⇒ number
Estimate the expected value of the returns given a specific state-action pair.

**Kind**: instance method of ActionValueFunction
**Overrides**: call
**Returns**: number - The approximated action value (q)

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionValueFunction.update(state, action, error)
Update the value of the function approximator for a given state-action pair.

**Kind**: instance method of ActionValueFunction
**Overrides**: update

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |

### actionValueFunction.gradient(state, action) ⇒ Array.<number>
Compute the gradient of the function approximator for a given state-action pair, with respect to its parameters.

**Kind**: instance method of ActionValueFunction
**Overrides**: gradient
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionValueFunction.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of ActionValueFunction
**Returns**: Array.<number> - The parameters that define the function approximator

### actionValueFunction.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### actionValueFunction.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
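
A short sketch of how an agent typically drives an ActionValueFunction: form a Sarsa target from the reward and the next state-action pair, take the TD error against the current estimate, and feed it back through `update()`. The `sarsaUpdate` helper, its argument names, and the default `gamma` are hypothetical; only `call` and `update` come from the interface.

```js
// Hypothetical helper showing a one-step Sarsa update against any
// ActionValueFunction implementation. Everything except q.call and
// q.update is an assumption for illustration.
function sarsaUpdate(q, { state, action, reward, nextState, nextAction }, gamma = 0.99) {
  const target = reward + gamma * q.call(nextState, nextAction); // Sarsa target
  const error = target - q.call(state, action);                  // TD error
  q.update(state, action, error);
  return error; // also useful for driving ActionTraces.update(error)
}
```
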
## AgentFactory
**Kind**: global interface

### agentFactory.createAgent() ⇒ Agent
Create a new Agent instance.

**Kind**: instance method of AgentFactory
## Agent
**Kind**: global interface

### agent.newEpisode(environment)
Prepare the agent for the next episode. The Agent should perform any cleanup and setup steps that are necessary here. An Environment object is passed in, which the agent should store each time.

**Kind**: instance method of Agent

| Param | Type | Description |
| --- | --- | --- |
| environment | Environment | The Environment object for the new episode. |

### agent.act()
Perform an action for the current timestep. Usually, the agent should at least:

- dispatch an action to the environment, and
- perform any necessary internal updates (e.g. updating the value function).

**Kind**: instance method of Agent
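
As a rough illustration, the sketch below implements the Agent interface with one-step, epsilon-greedy Q-learning. The `actions` list, the injected ActionValueFunction `q`, and the `gamma`/`epsilon` hyperparameters are assumptions; the only contract taken from this page is `newEpisode(environment)` and `act()`, plus the Environment methods documented below.

```js
// Sketch: a one-step Q-learning Agent. The q object is assumed to satisfy
// the ActionValueFunction interface; actions is an assumed list of the
// discrete actions the environment accepts.
class QLearningAgent {
  constructor(q, actions, { gamma = 0.99, epsilon = 0.1 } = {}) {
    this.q = q;
    this.actions = actions;
    this.gamma = gamma;     // discount factor (assumed hyperparameter)
    this.epsilon = epsilon; // exploration rate (assumed hyperparameter)
  }

  newEpisode(environment) {
    this.environment = environment; // stored each episode, per the interface
    this.state = environment.getObservation();
  }

  act() {
    // Choose epsilon-greedily among the known actions.
    const greedy = this.actions.reduce((best, a) =>
      (this.q.call(this.state, a) > this.q.call(this.state, best) ? a : best));
    const action = Math.random() < this.epsilon
      ? this.actions[Math.floor(Math.random() * this.actions.length)]
      : greedy;

    this.environment.dispatch(action);
    const nextState = this.environment.getObservation();
    const reward = this.environment.getReward();

    // One-step Q-learning TD error; terminal states bootstrap from 0.
    const nextValue = this.environment.isTerminated()
      ? 0
      : Math.max(...this.actions.map(a => this.q.call(nextState, a)));
    const error = reward + this.gamma * nextValue - this.q.call(this.state, action);

    this.q.update(this.state, action, error);
    this.state = nextState;
  }
}
```
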
## EnvironmentFactory
**Kind**: global interface

### environmentFactory.createEnvironment() ⇒ Environment
Create a new Environment instance.

**Kind**: instance method of EnvironmentFactory
## Environment
**Kind**: global interface

- Environment
  - .dispatch(action)
  - .getObservation() ⇒ *
  - .getReward() ⇒ number
  - .isTerminated() ⇒ boolean

### environment.dispatch(action)
Apply an action selected by an Agent to the environment. This could be a string representing the action (e.g. "LEFT"), or an array representing the force to apply to actuators, etc.

**Kind**: instance method of Environment

| Param | Type | Description |
| --- | --- | --- |
| action | * | An action object specific to the environment. |

### environment.getObservation() ⇒ *
Get an environment-specific observation for the current timestep. This might be a string identifying the current state, an array representing the current environment parameters, pixel data representing the agent's vision, etc.

**Kind**: instance method of Environment
**Returns**: * - An observation object specific to the environment.

### environment.getReward() ⇒ number
Get the reward for the current timestep. Rewards guide the learning of the agent: positive rewards should be given when the agent selects good actions, and negative rewards should be given when the agent selects bad actions.

**Kind**: instance method of Environment
**Returns**: number - A scalar representing the reward for the current timestep.

### environment.isTerminated() ⇒ boolean
Return whether or not the current episode has terminated (finished). For example, this should return true if the agent has reached some goal, if the maximum number of timesteps has been exceeded, or if the agent has otherwise failed. Otherwise, this should return false.

**Kind**: instance method of Environment
**Returns**: boolean - A boolean representing whether or not the episode has terminated.
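
For reference, here is a toy Environment sketch: a one-dimensional corridor where the agent starts at position 0 and earns +1 for reaching a goal position. The corridor itself is invented for illustration; only the four interface methods are taken from this page.

```js
// Sketch: a minimal corridor Environment. The goal position and the
// "LEFT"/"RIGHT" action strings are assumptions for illustration.
class CorridorEnvironment {
  constructor(goal = 5) {
    this.goal = goal;
    this.position = 0;
    this.reward = 0;
  }

  dispatch(action) {
    // Actions are the strings "LEFT" and "RIGHT".
    this.position += action === 'RIGHT' ? 1 : -1;
    this.position = Math.max(0, this.position); // can't walk off the left end
    this.reward = this.position === this.goal ? 1 : 0;
  }

  getObservation() {
    return this.position; // the observation is just the current position
  }

  getReward() {
    return this.reward;
  }

  isTerminated() {
    return this.position === this.goal;
  }
}
```
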
## FunctionApproximator
**Kind**: global interface

- FunctionApproximator
  - .call(args) ⇒ number
  - .update(args, error)
  - .gradient(args) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### functionApproximator.call(args) ⇒ number
Call the function approximator with the given arguments. The FA should return an estimate of the value of the function at the point given by the arguments.

**Kind**: instance method of FunctionApproximator
**Returns**: number - The approximated value of the function at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | * | Arguments to the function being approximated |

### functionApproximator.update(args, error)
Update the value of the function approximator at the given point.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| args | * | Arguments to the function being approximated |
| error | number | The difference between the target value and the currently approximated value |

### functionApproximator.gradient(args) ⇒ Array.<number>
Compute the gradient of the function approximator at the given point, with respect to its parameters.

**Kind**: instance method of FunctionApproximator
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | Array.<number> | Arguments to the function being approximated |

### functionApproximator.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of FunctionApproximator
**Returns**: Array.<number> - The parameters that define the function approximator

### functionApproximator.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### functionApproximator.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
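
A minimal sketch of a linear FunctionApproximator, assuming `args` is already a fixed-length numeric feature vector. The `alpha` step size is an assumption; for a linear model the gradient with respect to the parameters is simply the feature vector, which keeps the example short.

```js
// Sketch: a linear function approximator f(x) = w · x. The feature-vector
// input and alpha step size are assumptions for illustration.
class LinearApproximator {
  constructor(numFeatures, alpha = 0.01) {
    this.parameters = new Array(numFeatures).fill(0);
    this.alpha = alpha;
  }

  call(args) {
    // Dot product of the parameters with the feature vector.
    return args.reduce((sum, x, i) => sum + x * this.parameters[i], 0);
  }

  update(args, error) {
    // Gradient step scaled by the supplied error.
    this.gradient(args).forEach((g, i) => {
      this.parameters[i] += this.alpha * error * g;
    });
  }

  gradient(args) {
    // For a linear model the gradient w.r.t. the parameters is the input itself.
    return args.slice();
  }

  getParameters() {
    return this.parameters;
  }

  setParameters(parameters) {
    this.parameters = parameters;
  }

  updateParameters(errors) {
    // Apply a per-parameter error, e.g. from a gradient-based algorithm.
    errors.forEach((e, i) => {
      this.parameters[i] += this.alpha * e;
    });
  }
}
```
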
## PolicyTraces
**Kind**: global interface

### policyTraces.record(state, action) ⇒ PolicyTraces
Records a trace for the given state-action pair.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policyTraces.update(error) ⇒ PolicyTraces
Updates the policy based on the stored traces, and the given error.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### policyTraces.decay(amount) ⇒ PolicyTraces
Decay the traces by the given amount.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### policyTraces.reset() ⇒ PolicyTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object
## Policy
**Kind**: global interface

- Policy
  - .chooseAction(state) ⇒ *
  - .chooseBestAction(state) ⇒ *
  - .probability(state, action) ⇒ number
  - .update(state, action, error)
  - .gradient(state, action) ⇒ Array.<number>
  - .trueGradient(state, action) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### policy.chooseAction(state) ⇒ *
Choose an action given the current state.

**Kind**: instance method of Policy
**Returns**: * - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### policy.chooseBestAction(state) ⇒ *
Choose the best known action given the current state.

**Kind**: instance method of Policy
**Returns**: * - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### policy.probability(state, action) ⇒ number
Compute the probability of selecting a given action in a given state.

**Kind**: instance method of Policy
**Returns**: number - The probability, in the range [0, 1]

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.update(state, action, error)
Update the probability of choosing a particular action in a particular state. Generally, a positive error should make choosing the action more likely, and a negative error should make choosing the action less likely.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |
| error | number | The direction and magnitude of the update |

### policy.gradient(state, action) ⇒ Array.<number>
Compute the gradient of the natural logarithm of the probability of choosing the given action in the given state, with respect to the parameters of the policy. This can often be computed more efficiently than the true gradient.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The gradient of log(π(state, action)) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.trueGradient(state, action) ⇒ Array.<number>
Compute the true gradient of the probability of choosing the given action in the given state, with respect to the parameters of the policy. This is in contrast to the log gradient, which is used for most purposes.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The gradient of π(state, action) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the policy.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The parameters that define the policy

### policy.setParameters(parameters)
Set the differentiable parameters of the policy.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | The parameters that define the policy |

### policy.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
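
Below is a sketch of a softmax (Gibbs) Policy over small, enumerable state and action spaces, with one preference parameter per state-action pair. The integer state/action encoding, the `probabilities` helper, and the `alpha` step size are assumptions for illustration; in the tabular case the log-gradient required by `gradient()` is just an indicator vector minus the action probabilities, and `trueGradient()` scales it by π(state, action).

```js
// Sketch: a tabular softmax policy. States are assumed to be integers in
// [0, numStates) and actions integers in [0, numActions).
class TabularSoftmaxPolicy {
  constructor(numStates, numActions, alpha = 0.1) {
    this.numStates = numStates;
    this.numActions = numActions;
    this.alpha = alpha; // step size (assumed hyperparameter)
    this.parameters = new Array(numStates * numActions).fill(0);
  }

  probabilities(state) {
    // Hypothetical helper, not part of the Policy interface.
    const prefs = [];
    for (let a = 0; a < this.numActions; a += 1) {
      prefs.push(this.parameters[state * this.numActions + a]);
    }
    const max = Math.max(...prefs); // subtract max for numerical stability
    const exps = prefs.map(p => Math.exp(p - max));
    const sum = exps.reduce((s, e) => s + e, 0);
    return exps.map(e => e / sum);
  }

  chooseAction(state) {
    // Sample an action according to the softmax distribution.
    const probs = this.probabilities(state);
    let r = Math.random();
    for (let a = 0; a < probs.length; a += 1) {
      r -= probs[a];
      if (r <= 0) return a;
    }
    return probs.length - 1;
  }

  chooseBestAction(state) {
    const probs = this.probabilities(state);
    return probs.indexOf(Math.max(...probs));
  }

  probability(state, action) {
    return this.probabilities(state)[action];
  }

  update(state, action, error) {
    // REINFORCE-style step along the log-gradient, scaled by the error.
    this.gradient(state, action).forEach((g, i) => {
      this.parameters[i] += this.alpha * error * g;
    });
  }

  gradient(state, action) {
    // d/dθ log π(a|s): indicator minus probabilities for this state's block.
    const grad = new Array(this.parameters.length).fill(0);
    const probs = this.probabilities(state);
    for (let a = 0; a < this.numActions; a += 1) {
      grad[state * this.numActions + a] = (a === action ? 1 : 0) - probs[a];
    }
    return grad;
  }

  trueGradient(state, action) {
    // d/dθ π(a|s) = π(a|s) * d/dθ log π(a|s)
    const p = this.probability(state, action);
    return this.gradient(state, action).map(g => p * g);
  }

  getParameters() { return this.parameters; }

  setParameters(parameters) { this.parameters = parameters; }

  updateParameters(errors) {
    errors.forEach((e, i) => { this.parameters[i] += this.alpha * e; });
  }
}
```
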
## StateTraces
**Kind**: global interface

### stateTraces.record(state) ⇒ StateTraces
Records a trace for the given state.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateTraces.update(error) ⇒ StateTraces
Updates the value function based on the stored traces, and the given error.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### stateTraces.decay(amount) ⇒ StateTraces
Decay the traces by the given amount.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### stateTraces.reset() ⇒ StateTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object
## StateValueFunction ⇐ FunctionApproximator
**Kind**: global interface
**Extends**: FunctionApproximator

- StateValueFunction ⇐ FunctionApproximator
  - .call(state) ⇒ number
  - .update(state, error)
  - .gradient(state) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### stateValueFunction.call(state) ⇒ number
Estimate the expected value of the returns given a specific state.

**Kind**: instance method of StateValueFunction
**Overrides**: call
**Returns**: number - The approximated state value (v)

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateValueFunction.update(state, error)
Update the value of the function approximator for a given state.

**Kind**: instance method of StateValueFunction
**Overrides**: update

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |

### stateValueFunction.gradient(state) ⇒ Array.<number>
Compute the gradient of the function approximator for a given state, with respect to its parameters.

**Kind**: instance method of StateValueFunction
**Overrides**: gradient
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateValueFunction.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of StateValueFunction
**Returns**: Array.<number> - The parameters that define the function approximator

### stateValueFunction.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### stateValueFunction.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
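
Finally, a sketch of how the factory and core interfaces are typically wired together into an experiment loop. The `runExperiment` function, its parameters, and the per-episode return bookkeeping are assumptions for illustration; only the method names come from the interfaces documented above.

```js
// Sketch: run a number of episodes using an EnvironmentFactory, an
// AgentFactory, and the Agent/Environment interfaces documented above.
function runExperiment(environmentFactory, agentFactory, numEpisodes = 100) {
  const agent = agentFactory.createAgent();
  const returns = [];

  for (let episode = 0; episode < numEpisodes; episode += 1) {
    const environment = environmentFactory.createEnvironment();
    agent.newEpisode(environment);

    let totalReward = 0;
    while (!environment.isTerminated()) {
      agent.act();                          // the agent dispatches an action internally
      totalReward += environment.getReward();
    }
    returns.push(totalReward);
  }

  return returns; // per-episode return, useful for plotting learning curves
}
```
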