# @rl-js/interfaces

Core interfaces for rl-js: Reinforcement Learning in JavaScript

## Interfaces
## ActionTraces
**Kind**: global interface

### actionTraces.record(state, action) ⇒ ActionTraces
Records a trace for the given state-action pair.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionTraces.update(error) ⇒ ActionTraces
Updates the value function based on the stored traces, and the given error.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### actionTraces.decay(amount) ⇒ ActionTraces
Decay the traces by the given amount.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### actionTraces.reset() ⇒ ActionTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of ActionTraces
**Returns**: ActionTraces - This object
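
As a concrete reference point, here is a minimal sketch of how the ActionTraces interface might be satisfied with accumulating eligibility traces over a tabular action-value store. The `qTable` map, the string keys, and the `alpha` step size are assumptions for illustration only; they are not part of `@rl-js/interfaces`.

```js
// Sketch: accumulating eligibility traces over a shared tabular q-value map.
// The qTable layout and alpha are hypothetical, not part of the package.
class TabularActionTraces {
  constructor(qTable, alpha = 0.1) {
    this.qTable = qTable;     // Map from "state|action" keys to q-values (assumed)
    this.alpha = alpha;       // step size used when applying the TD error (assumed)
    this.traces = new Map();  // eligibility of each visited state-action pair
  }

  record(state, action) {
    const key = `${state}|${action}`;
    this.traces.set(key, (this.traces.get(key) || 0) + 1); // accumulate
    return this;
  }

  update(error) {
    // Move every traced q-value in the direction of the TD error,
    // scaled by its eligibility.
    for (const [key, trace] of this.traces) {
      this.qTable.set(key, (this.qTable.get(key) || 0) + this.alpha * error * trace);
    }
    return this;
  }

  decay(amount) {
    // Typically called with amount = gamma * lambda each timestep.
    for (const [key, trace] of this.traces) {
      this.traces.set(key, trace * amount);
    }
    return this;
  }

  reset() {
    this.traces.clear();
    return this;
  }
}
```

Every method returns `this`, as the interface requires, so calls can be chained (e.g. `traces.decay(gamma * lambda).record(state, action)`).
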
## ActionValueFunction ⇐ FunctionApproximator
**Kind**: global interface
**Extends**: FunctionApproximator

- ActionValueFunction ⇐ FunctionApproximator
  - .call(state, action) ⇒ number
  - .update(state, action, error)
  - .gradient(state, action) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### actionValueFunction.call(state, action) ⇒ number
Estimate the expected value of the returns given a specific state-action pair.

**Kind**: instance method of ActionValueFunction
**Overrides**: call
**Returns**: number - The approximated action value (q)

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionValueFunction.update(state, action, error)
Update the value of the function approximator for a given state-action pair.

**Kind**: instance method of ActionValueFunction
**Overrides**: update

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |

### actionValueFunction.gradient(state, action) ⇒ Array.<number>
Compute the gradient of the function approximator for a given state-action pair, with respect to its parameters.

**Kind**: instance method of ActionValueFunction
**Overrides**: gradient
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### actionValueFunction.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of ActionValueFunction
**Returns**: Array.<number> - The parameters that define the function approximator

### actionValueFunction.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### actionValueFunction.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of ActionValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
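
A short sketch of how an agent typically drives an ActionValueFunction: form a Sarsa target from the reward and the next state-action pair, take the TD error against the current estimate, and feed it back through `update()`. The `sarsaUpdate` helper, its argument names, and the default `gamma` are hypothetical; only `call` and `update` come from the interface.

```js
// Hypothetical helper showing a one-step Sarsa update against any
// ActionValueFunction implementation. Everything except q.call and
// q.update is an assumption for illustration.
function sarsaUpdate(q, { state, action, reward, nextState, nextAction }, gamma = 0.99) {
  const target = reward + gamma * q.call(nextState, nextAction); // Sarsa target
  const error = target - q.call(state, action);                  // TD error
  q.update(state, action, error);
  return error; // also useful for driving ActionTraces.update(error)
}
```
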
## AgentFactory
**Kind**: global interface

### agentFactory.createAgent() ⇒ Agent
Create a new Agent instance.

**Kind**: instance method of AgentFactory
## Agent
**Kind**: global interface

### agent.newEpisode(environment)
Prepare the agent for the next episode. The Agent should perform any cleanup and setup steps that are necessary here. An Environment object is passed in, which the agent should store each time.

**Kind**: instance method of Agent

| Param | Type | Description |
| --- | --- | --- |
| environment | Environment | The Environment object for the new episode. |

### agent.act()
Perform an action for the current timestep. Usually, the agent should at least:

- dispatch an action to the environment, and
- perform any necessary internal updates (e.g. updating the value function).

**Kind**: instance method of Agent
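
As a rough illustration, the sketch below implements the Agent interface with one-step, epsilon-greedy Q-learning. The `actions` list, the injected ActionValueFunction `q`, and the `gamma`/`epsilon` hyperparameters are assumptions; the only contract taken from this page is `newEpisode(environment)` and `act()`, plus the Environment methods documented below.

```js
// Sketch: a one-step Q-learning Agent. The q object is assumed to satisfy
// the ActionValueFunction interface; actions is an assumed list of the
// discrete actions the environment accepts.
class QLearningAgent {
  constructor(q, actions, { gamma = 0.99, epsilon = 0.1 } = {}) {
    this.q = q;
    this.actions = actions;
    this.gamma = gamma;     // discount factor (assumed hyperparameter)
    this.epsilon = epsilon; // exploration rate (assumed hyperparameter)
  }

  newEpisode(environment) {
    this.environment = environment; // stored each episode, per the interface
    this.state = environment.getObservation();
  }

  act() {
    // Choose epsilon-greedily among the known actions.
    const greedy = this.actions.reduce((best, a) =>
      (this.q.call(this.state, a) > this.q.call(this.state, best) ? a : best));
    const action = Math.random() < this.epsilon
      ? this.actions[Math.floor(Math.random() * this.actions.length)]
      : greedy;

    this.environment.dispatch(action);
    const nextState = this.environment.getObservation();
    const reward = this.environment.getReward();

    // One-step Q-learning TD error; terminal states bootstrap from 0.
    const nextValue = this.environment.isTerminated()
      ? 0
      : Math.max(...this.actions.map(a => this.q.call(nextState, a)));
    const error = reward + this.gamma * nextValue - this.q.call(this.state, action);

    this.q.update(this.state, action, error);
    this.state = nextState;
  }
}
```
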
## EnvironmentFactory
**Kind**: global interface

### environmentFactory.createEnvironment() ⇒ Environment
Create a new Environment instance.

**Kind**: instance method of EnvironmentFactory
## Environment
**Kind**: global interface

- Environment
  - .dispatch(action)
  - .getObservation() ⇒ *
  - .getReward() ⇒ number
  - .isTerminated() ⇒ boolean

### environment.dispatch(action)
Apply an action selected by an Agent to the environment. This could be a string representing the action (e.g. "LEFT"), or an array representing the force to apply to actuators, etc.

**Kind**: instance method of Environment

| Param | Type | Description |
| --- | --- | --- |
| action | * | An action object specific to the environment. |

### environment.getObservation() ⇒ *
Get an environment-specific observation for the current timestep. This might be a string identifying the current state, an array representing the current environment parameters, pixel data representing the agent's vision, etc.

**Kind**: instance method of Environment
**Returns**: * - An observation object specific to the environment.

### environment.getReward() ⇒ number
Get the reward for the current timestep. Rewards guide the learning of the agent: positive rewards should be given when the agent selects good actions, and negative rewards should be given when the agent selects bad actions.

**Kind**: instance method of Environment
**Returns**: number - A scalar representing the reward for the current timestep.

### environment.isTerminated() ⇒ boolean
Return whether or not the current episode has terminated (finished). For example, this should return true if the agent has reached some goal, if the maximum number of timesteps has been exceeded, or if the agent has otherwise failed. Otherwise, this should return false.

**Kind**: instance method of Environment
**Returns**: boolean - A boolean representing whether or not the episode has terminated.
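
For reference, here is a toy Environment sketch: a one-dimensional corridor where the agent starts at position 0 and earns +1 for reaching a goal position. The corridor itself is invented for illustration; only the four interface methods are taken from this page.

```js
// Sketch: a minimal corridor Environment. The goal position and the
// "LEFT"/"RIGHT" action strings are assumptions for illustration.
class CorridorEnvironment {
  constructor(goal = 5) {
    this.goal = goal;
    this.position = 0;
    this.reward = 0;
  }

  dispatch(action) {
    // Actions are the strings "LEFT" and "RIGHT".
    this.position += action === 'RIGHT' ? 1 : -1;
    this.position = Math.max(0, this.position); // can't walk off the left end
    this.reward = this.position === this.goal ? 1 : 0;
  }

  getObservation() {
    return this.position; // the observation is just the current position
  }

  getReward() {
    return this.reward;
  }

  isTerminated() {
    return this.position === this.goal;
  }
}
```
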
## FunctionApproximator
**Kind**: global interface

- FunctionApproximator
  - .call(args) ⇒ number
  - .update(args, error)
  - .gradient(args) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### functionApproximator.call(args) ⇒ number
Call the function approximator with the given arguments. The FA should return an estimate of the value of the function at the point given by the arguments.

**Kind**: instance method of FunctionApproximator
**Returns**: number - The approximated value of the function at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | * | Arguments to the function being approximated |

### functionApproximator.update(args, error)
Update the value of the function approximator at the given point.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| args | * | Arguments to the function being approximated |
| error | number | The difference between the target value and the currently approximated value |

### functionApproximator.gradient(args) ⇒ Array.<number>
Compute the gradient of the function approximator at the given point, with respect to its parameters.

**Kind**: instance method of FunctionApproximator
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| args | Array.<number> | Arguments to the function being approximated |

### functionApproximator.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of FunctionApproximator
**Returns**: Array.<number> - The parameters that define the function approximator

### functionApproximator.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### functionApproximator.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of FunctionApproximator

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
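
A minimal sketch of a linear FunctionApproximator, assuming `args` is already a fixed-length numeric feature vector. The `alpha` step size is an assumption; for a linear model the gradient with respect to the parameters is simply the feature vector, which keeps the example short.

```js
// Sketch: a linear function approximator f(x) = w · x. The feature-vector
// input and alpha step size are assumptions for illustration.
class LinearApproximator {
  constructor(numFeatures, alpha = 0.01) {
    this.parameters = new Array(numFeatures).fill(0);
    this.alpha = alpha;
  }

  call(args) {
    // Dot product of the parameters with the feature vector.
    return args.reduce((sum, x, i) => sum + x * this.parameters[i], 0);
  }

  update(args, error) {
    // Gradient step scaled by the supplied error.
    this.gradient(args).forEach((g, i) => {
      this.parameters[i] += this.alpha * error * g;
    });
  }

  gradient(args) {
    // For a linear model the gradient w.r.t. the parameters is the input itself.
    return args.slice();
  }

  getParameters() {
    return this.parameters;
  }

  setParameters(parameters) {
    this.parameters = parameters;
  }

  updateParameters(errors) {
    // Apply a per-parameter error, e.g. from a gradient-based algorithm.
    errors.forEach((e, i) => {
      this.parameters[i] += this.alpha * e;
    });
  }
}
```
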
## PolicyTraces
**Kind**: global interface

### policyTraces.record(state, action) ⇒ PolicyTraces
Records a trace for the given state-action pair.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policyTraces.update(error) ⇒ PolicyTraces
Updates the policy based on the stored traces, and the given error.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### policyTraces.decay(amount) ⇒ PolicyTraces
Decay the traces by the given amount.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### policyTraces.reset() ⇒ PolicyTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of PolicyTraces
**Returns**: PolicyTraces - This object
## Policy
**Kind**: global interface

- Policy
  - .chooseAction(state) ⇒ *
  - .chooseBestAction(state) ⇒ *
  - .probability(state, action) ⇒ number
  - .update(state, action, error)
  - .gradient(state, action) ⇒ Array.<number>
  - .trueGradient(state, action) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### policy.chooseAction(state) ⇒ *
Choose an action given the current state.

**Kind**: instance method of Policy
**Returns**: * - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### policy.chooseBestAction(state) ⇒ *
Choose the best known action given the current state.

**Kind**: instance method of Policy
**Returns**: * - An Action object of type specific to the environment

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### policy.probability(state, action) ⇒ number
Compute the probability of selecting a given action in a given state.

**Kind**: instance method of Policy
**Returns**: number - The probability, in the range [0, 1]

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.update(state, action, error)
Update the probability of choosing a particular action in a particular state. Generally, a positive error should make choosing the action more likely, and a negative error should make choosing the action less likely.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |
| error | number | The direction and magnitude of the update |

### policy.gradient(state, action) ⇒ Array.<number>
Compute the gradient of the natural logarithm of the probability of choosing the given action in the given state, with respect to the parameters of the policy. This can often be computed more efficiently than the true gradient.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The gradient of log(π(state, action)) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.trueGradient(state, action) ⇒ Array.<number>
Compute the true gradient of the probability of choosing the given action in the given state, with respect to the parameters of the policy. This is in contrast to the log gradient, which is used for most purposes.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The gradient of π(state, action) with respect to the policy parameters

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| action | * | Action object of type specific to the environment |

### policy.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the policy.

**Kind**: instance method of Policy
**Returns**: Array.<number> - The parameters that define the policy

### policy.setParameters(parameters)
Set the differentiable parameters of the policy.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | The parameters that define the policy |

### policy.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of Policy

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
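
Below is a sketch of a softmax (Gibbs) Policy over small, enumerable state and action spaces, with one preference parameter per state-action pair. The integer state/action encoding, the `probabilities` helper, and the `alpha` step size are assumptions for illustration; in the tabular case the log-gradient required by `gradient()` is just an indicator vector minus the action probabilities, and `trueGradient()` scales it by π(state, action).

```js
// Sketch: a tabular softmax policy. States are assumed to be integers in
// [0, numStates) and actions integers in [0, numActions).
class TabularSoftmaxPolicy {
  constructor(numStates, numActions, alpha = 0.1) {
    this.numStates = numStates;
    this.numActions = numActions;
    this.alpha = alpha; // step size (assumed hyperparameter)
    this.parameters = new Array(numStates * numActions).fill(0);
  }

  probabilities(state) {
    // Hypothetical helper, not part of the Policy interface.
    const prefs = [];
    for (let a = 0; a < this.numActions; a += 1) {
      prefs.push(this.parameters[state * this.numActions + a]);
    }
    const max = Math.max(...prefs); // subtract max for numerical stability
    const exps = prefs.map(p => Math.exp(p - max));
    const sum = exps.reduce((s, e) => s + e, 0);
    return exps.map(e => e / sum);
  }

  chooseAction(state) {
    // Sample an action according to the softmax distribution.
    const probs = this.probabilities(state);
    let r = Math.random();
    for (let a = 0; a < probs.length; a += 1) {
      r -= probs[a];
      if (r <= 0) return a;
    }
    return probs.length - 1;
  }

  chooseBestAction(state) {
    const probs = this.probabilities(state);
    return probs.indexOf(Math.max(...probs));
  }

  probability(state, action) {
    return this.probabilities(state)[action];
  }

  update(state, action, error) {
    // REINFORCE-style step along the log-gradient, scaled by the error.
    this.gradient(state, action).forEach((g, i) => {
      this.parameters[i] += this.alpha * error * g;
    });
  }

  gradient(state, action) {
    // d/dθ log π(a|s): indicator minus probabilities for this state's block.
    const grad = new Array(this.parameters.length).fill(0);
    const probs = this.probabilities(state);
    for (let a = 0; a < this.numActions; a += 1) {
      grad[state * this.numActions + a] = (a === action ? 1 : 0) - probs[a];
    }
    return grad;
  }

  trueGradient(state, action) {
    // d/dθ π(a|s) = π(a|s) * d/dθ log π(a|s)
    const p = this.probability(state, action);
    return this.gradient(state, action).map(g => p * g);
  }

  getParameters() { return this.parameters; }

  setParameters(parameters) { this.parameters = parameters; }

  updateParameters(errors) {
    errors.forEach((e, i) => { this.parameters[i] += this.alpha * e; });
  }
}
```
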
## StateTraces
**Kind**: global interface

### stateTraces.record(state) ⇒ StateTraces
Records a trace for the given state.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateTraces.update(error) ⇒ StateTraces
Updates the value function based on the stored traces, and the given error.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| error | number | The current TD error |

### stateTraces.decay(amount) ⇒ StateTraces
Decay the traces by the given amount.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object

| Param | Type | Description |
| --- | --- | --- |
| amount | number | The amount to multiply the traces by, usually a value less than 1. |

### stateTraces.reset() ⇒ StateTraces
Reset the traces to their starting values. Usually called at the beginning of an episode.

**Kind**: instance method of StateTraces
**Returns**: StateTraces - This object
## StateValueFunction ⇐ FunctionApproximator
**Kind**: global interface
**Extends**: FunctionApproximator

- StateValueFunction ⇐ FunctionApproximator
  - .call(state) ⇒ number
  - .update(state, error)
  - .gradient(state) ⇒ Array.<number>
  - .getParameters() ⇒ Array.<number>
  - .setParameters(parameters)
  - .updateParameters(errors)

### stateValueFunction.call(state) ⇒ number
Estimate the expected value of the returns given a specific state.

**Kind**: instance method of StateValueFunction
**Overrides**: call
**Returns**: number - The approximated state value (v)

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateValueFunction.update(state, error)
Update the value of the function approximator for a given state.

**Kind**: instance method of StateValueFunction
**Overrides**: update

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |
| error | number | The difference between the target value and the currently approximated value |

### stateValueFunction.gradient(state) ⇒ Array.<number>
Compute the gradient of the function approximator for a given state, with respect to its parameters.

**Kind**: instance method of StateValueFunction
**Overrides**: gradient
**Returns**: Array.<number> - The gradient of the function approximator with respect to its parameters at the given point

| Param | Type | Description |
| --- | --- | --- |
| state | * | State object of type specific to the environment |

### stateValueFunction.getParameters() ⇒ Array.<number>
Get the differentiable parameters of the function approximator.

**Kind**: instance method of StateValueFunction
**Returns**: Array.<number> - The parameters that define the function approximator

### stateValueFunction.setParameters(parameters)
Set the differentiable parameters of the function approximator.

**Kind**: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| parameters | Array.<number> | New parameters for the function approximator |

### stateValueFunction.updateParameters(errors)
Update the parameters in some direction given by an array of errors.

**Kind**: instance method of StateValueFunction

| Param | Type | Description |
| --- | --- | --- |
| errors | Array.<number> | The direction with which to update each parameter |
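
Finally, a sketch of how the factory and core interfaces are typically wired together into an experiment loop. The `runExperiment` function, its parameters, and the per-episode return bookkeeping are assumptions for illustration; only the method names come from the interfaces documented above.

```js
// Sketch: run a number of episodes using an EnvironmentFactory, an
// AgentFactory, and the Agent/Environment interfaces documented above.
function runExperiment(environmentFactory, agentFactory, numEpisodes = 100) {
  const agent = agentFactory.createAgent();
  const returns = [];

  for (let episode = 0; episode < numEpisodes; episode += 1) {
    const environment = environmentFactory.createEnvironment();
    agent.newEpisode(environment);

    let totalReward = 0;
    while (!environment.isTerminated()) {
      agent.act();                          // the agent dispatches an action internally
      totalReward += environment.getReward();
    }
    returns.push(totalReward);
  }

  return returns; // per-episode return, useful for plotting learning curves
}
```
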