@bedrock-libraries/profiler
A performance testing library for Minecraft Bedrock Edition scripting, designed to help you compare the execution time of different functions and determine if the differences are statistically significant.
Features
- Compare the performance of two functions
- Run multiple trials for statistically significant results
- Calculate average execution time and standard deviation
- Perform statistical analysis to determine if performance differences are significant
- Support for setup and cleanup operations before and after trials
Installation
To install the package, run the following command in your project directory:
npm install -D @bedrock-libraries/profiler
Usage
Here's a basic example of how to use the PerformanceTest class:
import { PerformanceTest, PerformanceFunction } from '@bedrock-libraries/profiler';
import { world } from '@minecraft/server';

// Example setup: grab a player and a scoreboard objective to exercise
// (assumes at least one player is online and the objective already exists)
const player = world.getAllPlayers()[0];
const obj = world.scoreboard.getObjective('apples');

// Define two functions to compare
const setDynamicProperty = new PerformanceFunction('setDynamicProperty', i => {
  player.setDynamicProperty('apples', i);
});
const setScoreboard = new PerformanceFunction('setScoreboard', i => {
  obj.setScore(player, i);
});

// Create a new performance test: 50 trials of 10,000 iterations each
const perfTest = new PerformanceTest(setDynamicProperty, setScoreboard, 50, 10000);

// Run the test
await perfTest.executeTests();
API
PerformanceTest
The main class for running performance tests.
Constructor
constructor(
funcA: PerformanceFunction,
funcB: PerformanceFunction,
trials: number = 30,
iterations: number = 1000,
options?: PerformanceTestOptions
)
- funcA: The first function to test
- funcB: The second function to test
- trials: The number of trials to run (must be >= 30)
- iterations: The number of times each function is executed per trial
- options: Additional options (optional)
Methods
- executeTests(): Promise<void>: Runs the performance test and logs the results
PerformanceFunction
A class representing a function to be tested.
Constructor
constructor(
description: string,
fn: PerfIterationTestFn,
beforeEachTrial?: PerfTestTrialFn,
afterEachTrial?: PerfTestTrialFn,
beforeAllTrials?: PerfTestAllTrialFn,
afterAllTrials?: PerfTestAllTrialFn
)
- description: A text description of the function
- fn: The function to be tested
- beforeEachTrial: Function to run before each trial (optional)
- afterEachTrial: Function to run after each trial (optional)
- beforeAllTrials: Function to run once before all trials (optional)
- afterAllTrials: Function to run once after all trials (optional)
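The trial hooks are useful for setup and cleanup work that should not be counted in the measured time. The sketch below reuses the player from the Usage example; note that the exact signatures of the hook callbacks (PerfTestTrialFn, PerfTestAllTrialFn) are not documented here, so the zero-argument form shown is an assumption.
// Sketch: reset state before each trial and clean up after all trials.
// The zero-argument hook callbacks are an assumption; check the package's
// type definitions for PerfTestTrialFn / PerfTestAllTrialFn.
const setDynamicPropertyWithHooks = new PerformanceFunction(
  'setDynamicProperty (with hooks)',
  i => {
    player.setDynamicProperty('apples', i);
  },
  () => {
    // beforeEachTrial: start every trial from a known state
    player.setDynamicProperty('apples', 0);
  },
  undefined, // afterEachTrial: not needed here
  undefined, // beforeAllTrials: not needed here
  () => {
    // afterAllTrials: remove the property once the test is finished
    player.setDynamicProperty('apples', undefined);
  }
);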
PerformanceTestOptions
An interface for additional options when creating a PerformanceTest.
interface PerformanceTestOptions {
skipControl?: boolean;
logger?: (arg: string) => void;
}
- skipControl: If true, the execution time of the control function will not be subtracted from each trial
- logger: A custom logging function (defaults to console.warn)
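As an illustration, the sketch below reuses the two PerformanceFunction instances and the world import from the Usage example, keeps the raw timings by skipping the control subtraction, and routes the report to in-game chat instead of console.warn:
// Sketch: same comparison as above, with options supplied
const perfTestWithOptions = new PerformanceTest(setDynamicProperty, setScoreboard, 50, 10000, {
  skipControl: true, // keep raw per-trial timings
  logger: msg => world.sendMessage(msg), // broadcast results to in-game chat
});
await perfTestWithOptions.executeTests();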
Technical Overview
Overcoming Millisecond Precision Limitations
In the Minecraft Bedrock scripting environment, the only available timing function is Date.now(), which provides millisecond precision. This level of precision can be insufficient for accurately measuring the performance of fast-executing functions, especially in a game engine environment where operations often complete in microseconds.
To overcome this limitation, the @bedrock-libraries/profiler library uses a statistical approach:
- Multiple Iterations: Each function is executed multiple times in a single trial. This allows the total execution time to accumulate to a measurable level.
- Multiple Trials: The test is repeated across multiple trials to gather a statistically significant sample size.
- Statistical Analysis: The results are analyzed using statistical methods to determine if the performance difference between functions is significant.
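Conceptually, a single trial looks something like the simplified sketch below (an illustration of the idea, not the library's internal implementation): each call may take only microseconds, but the total across thousands of iterations is large enough to measure with Date.now().
// Simplified sketch of one trial: run the function `iterations` times and
// measure the total elapsed wall-clock time in milliseconds.
function runTrial(fn: (i: number) => void, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn(i);
  }
  return Date.now() - start;
}
// Repeating this across many trials produces a sample of timings that can be
// compared statistically.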
Selecting Appropriate Values for trials and iterations
The choice of trials and iterations values is crucial for obtaining reliable results:
Trials
- Minimum Value: At least 30 trials are required for the statistical analysis to be valid. This is based on the Central Limit Theorem, which states that the distribution of sample means approximates a normal distribution as the sample size becomes large, generally considered to be 30 or more.
- Recommended Range: 30-100 trials usually provide a good balance between accuracy and execution time.
- Considerations: More trials increase the confidence in your results but also increase the total execution time of your test.
Iterations
- Purpose: The number of iterations determines how many times each function is executed within a single trial.
- Selecting a Value: The goal is to choose a number of iterations that makes each trial last long enough to be measurable with millisecond precision.
  - Start with a lower number (e.g., 1000) and observe the results.
  - If the execution times are too small to be reliably measured (i.e., mostly 0 ms), increase the number of iterations. The testing framework will attempt to do this for you if the measured time is below 2 ms.
  - If execution times are very large, you may decrease the number of iterations to reduce overall test duration.
- Typical Range: Depending on the complexity of the functions being tested, anywhere from 1,000 to 1,000,000 iterations per trial may be appropriate.
Example
// For very fast operations
const fastFunctionTest = new PerformanceTest(fastFuncA, fastFuncB, 50, 100000);
// For slower operations
const slowFunctionTest = new PerformanceTest(slowFuncA, slowFuncB, 30, 1000);
Interpreting Results
The profiler calculates the average execution time per iteration by dividing the total time for each trial by the number of iterations. It then performs a statistical analysis across all trials to determine if the difference in performance between the two functions is significant.
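For example, at 10,000 iterations per trial (as in the Usage example above), a trial measured at 36 ms in total corresponds to an average of 36 / 10000 = 0.0036 ms per iteration, which is the scale of the averages shown in the sample output below.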
By using this approach, the @bedrock-libraries/profiler library can provide meaningful performance comparisons even with the millisecond precision limitation of Date.now().
Understanding Result Variability
When running performance tests, it's important to understand that even back-to-back executions with the same settings can produce slightly different results. This variability is normal and expected due to various factors in the runtime environment. Let's examine an example:
Example: Two Consecutive Executions
Execution 1:
Average time for setDynamicProperty: 0.003643333333333333 ms (Std Dev: 0.0003692334401612799)
Average time for setScoreboard: 0.006083333333333333 ms (Std Dev: 0.00021668876987168895)
This performance test execution suggests that setDynamicProperty is about 1.67 times faster than setScoreboard.
The difference in performance between setDynamicProperty and setScoreboard is significant (p-value: 0.00100)
Execution 2:
Average time for setDynamicProperty: 0.00364 ms (Std Dev: 0.00017140393911700976)
Average time for setScoreboard: 0.00599 ms (Std Dev: 0.0001900090741934766)
This performance test execution suggests that setDynamicProperty is about 1.65 times faster than setScoreboard.
The difference in performance between setDynamicProperty and setScoreboard is significant (p-value: 0.00100)
Analyzing the Variability
- Average Times: The average times for both functions show slight variations between executions. This is normal due to factors like system load, memory state, and other background processes.
- Standard Deviations: The standard deviations also vary between executions, indicating different levels of consistency in the measurements.
- Performance Ratio: The relative performance (1.67x vs 1.65x faster) is similar but not identical, reflecting the inherent variability in performance measurements.
- Statistical Significance: In both cases, the p-value remains the same (0.00100), indicating that despite the small variations, the performance difference remains statistically significant.
Factors Contributing to Variability
Several factors can contribute to variability in performance measurements:
- System Load: Background processes and system activities can affect execution times.
- Memory State: The state of memory (e.g., cache contents) can vary between runs.
- Just-In-Time (JIT) Compilation: For languages with JIT compilation, optimizations may vary between runs.
- Hardware Factors: CPU frequency scaling, thermal throttling, and other hardware-related factors can influence performance.
- Timing Precision: The millisecond precision of Date.now() can lead to rounding effects, especially for very fast operations.
Interpreting Results
When interpreting results:
- Look for Consistent Trends: While individual numbers may vary, the overall trend (which function is faster) should remain consistent across multiple runs.
- Consider the Magnitude: Small variations (like in the example above) are normal. Large variations might indicate inconsistent performance or external factors affecting the test.
- Run Multiple Tests: For critical performance assessments, consider running the test multiple times and analyzing the aggregate results.
- Check Statistical Significance: The p-value helps determine if the observed difference is statistically significant, even with small variations in exact numbers.
By understanding and accounting for this variability, you can make more informed decisions based on your performance test results.
Statistical Methodology
Two-Tailed Independent Samples t-Test
The profiler uses a two-tailed independent samples t-test to compare the performance of two functions. This statistical test is appropriate for our use case because:
- Two-Tailed: We're interested in detecting differences in performance in either direction (whether function A is faster or slower than function B).
- Independent Samples: Each trial of function A is independent of each trial of function B.
- t-Test: We're comparing the means of two groups (the execution times of the two functions) and assuming that the underlying population follows a normal distribution.
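The library performs this analysis for you, but as a rough illustration of the statistic involved, a two-sample t-value can be computed from the per-trial timings along these lines (a simplified Welch-style sketch, not the library's actual code):
// Rough sketch of a two-sample t-statistic; `a` and `b` are the per-trial
// timings (ms) for the two functions being compared.
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}
function tStatistic(a: number[], b: number[]): number {
  const standardError = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  return (mean(a) - mean(b)) / standardError;
}
// The t-value is then converted to a two-tailed p-value via the t-distribution
// and compared against the significance level described below.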
Significance Level
The profiler uses a significance level (α) of 0.01 for determining whether the performance difference between two functions is statistically significant.
Interpreting p-values
For a two-tailed test with α = 0.01:
- If p < 0.01: Strong evidence of a significant performance difference in either direction.
- If 0.01 ≤ p < 0.05: Suggestive but not conclusive evidence of a difference. Consider running more trials or increasing iterations.
- If p ≥ 0.05: Not enough evidence to conclude there's a significant performance difference.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.