@bedrock-libraries/profiler
A performance testing library for Minecraft Bedrock Edition scripting, designed to help you compare the execution time of different functions and determine if the differences are statistically significant.
Features
- Compare the performance of two functions
- Run multiple trials for statistically significant results
- Calculate average execution time and standard deviation
- Perform statistical analysis to determine if performance differences are significant
- Support for setup and cleanup operations before and after trials
Installation
To install the package, run the following command in your project directory:
npm install -D @bedrock-libraries/profiler
Usage
Here's a basic example of how to use the PerformanceTest class:
import { PerformanceTest, PerformanceFunction } from '@bedrock-libraries/profiler';
import { world } from '@minecraft/server';

// Example setup: grab a player and a scoreboard objective to exercise
// (assumes at least one player is online and the objective already exists)
const player = world.getAllPlayers()[0];
const obj = world.scoreboard.getObjective('apples');

// Define two functions to compare
const setDynamicProperty = new PerformanceFunction('setDynamicProperty', i => {
  player.setDynamicProperty('apples', i);
});
const setScoreboard = new PerformanceFunction('setScoreboard', i => {
  obj.setScore(player, i);
});

// Create a new performance test: 50 trials of 10,000 iterations each
const perfTest = new PerformanceTest(setDynamicProperty, setScoreboard, 50, 10000);

// Run the test
await perfTest.executeTests();
API
PerformanceTest
The main class for running performance tests.
Constructor
constructor(
funcA: PerformanceFunction,
funcB: PerformanceFunction,
trials: number = 30,
iterations: number = 1000,
options?: PerformanceTestOptions
)
- funcA: The first function to test
- funcB: The second function to test
- trials: The number of trials to run (must be >= 30)
- iterations: The number of times each function is executed per trial
- options: Additional options (optional)
Methods
- executeTests(): Promise<void>: Runs the performance test and logs the results
PerformanceFunction
A class representing a function to be tested.
Constructor
constructor(
description: string,
fn: PerfIterationTestFn,
beforeEachTrial?: PerfTestTrialFn,
afterEachTrial?: PerfTestTrialFn,
beforeAllTrials?: PerfTestAllTrialFn,
afterAllTrials?: PerfTestAllTrialFn
)
- description: A text description of the function
- fn: The function to be tested
- beforeEachTrial: Function to run before each trial (optional)
- afterEachTrial: Function to run after each trial (optional)
- beforeAllTrials: Function to run once before all trials (optional)
- afterAllTrials: Function to run once after all trials (optional)
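The trial hooks are useful for setup and cleanup work that should not be counted in the measured time. The sketch below reuses the player from the Usage example; note that the exact signatures of the hook callbacks (PerfTestTrialFn, PerfTestAllTrialFn) are not documented here, so the zero-argument form shown is an assumption.
// Sketch: reset state before each trial and clean up after all trials.
// The zero-argument hook callbacks are an assumption; check the package's
// type definitions for PerfTestTrialFn / PerfTestAllTrialFn.
const setDynamicPropertyWithHooks = new PerformanceFunction(
  'setDynamicProperty (with hooks)',
  i => {
    player.setDynamicProperty('apples', i);
  },
  () => {
    // beforeEachTrial: start every trial from a known state
    player.setDynamicProperty('apples', 0);
  },
  undefined, // afterEachTrial: not needed here
  undefined, // beforeAllTrials: not needed here
  () => {
    // afterAllTrials: remove the property once the test is finished
    player.setDynamicProperty('apples', undefined);
  }
);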
PerformanceTestOptions
An interface for additional options when creating a PerformanceTest.
interface PerformanceTestOptions {
skipControl?: boolean;
logger?: (arg: string) => void;
}
- skipControl: If true, the execution time of the control function will not be subtracted from each trial
- logger: A custom logging function (defaults to console.warn)
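As an illustration, the sketch below reuses the two PerformanceFunction instances and the world import from the Usage example, keeps the raw timings by skipping the control subtraction, and routes the report to in-game chat instead of console.warn:
// Sketch: same comparison as above, with options supplied
const perfTestWithOptions = new PerformanceTest(setDynamicProperty, setScoreboard, 50, 10000, {
  skipControl: true, // keep raw per-trial timings
  logger: msg => world.sendMessage(msg), // broadcast results to in-game chat
});
await perfTestWithOptions.executeTests();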
Technical Overview
Overcoming Millisecond Precision Limitations
In the Minecraft Bedrock scripting environment, the only available timing function is Date.now(), which provides millisecond precision. This level of precision can be insufficient for accurately measuring the performance of fast-executing functions, especially in a game engine environment where operations often complete in microseconds.
To overcome this limitation, the @bedrock-libraries/profiler library uses a statistical approach:
- Multiple Iterations: Each function is executed multiple times in a single trial. This allows the total execution time to accumulate to a measurable level.
- Multiple Trials: The test is repeated across multiple trials to gather a statistically significant sample size.
- Statistical Analysis: The results are analyzed using statistical methods to determine if the performance difference between functions is significant.
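Conceptually, a single trial looks something like the simplified sketch below (an illustration of the idea, not the library's internal implementation): each call may take only microseconds, but the total across thousands of iterations is large enough to measure with Date.now().
// Simplified sketch of one trial: run the function `iterations` times and
// measure the total elapsed wall-clock time in milliseconds.
function runTrial(fn: (i: number) => void, iterations: number): number {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    fn(i);
  }
  return Date.now() - start;
}
// Repeating this across many trials produces a sample of timings that can be
// compared statistically.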
Selecting Appropriate Values for trials and iterations
The choice of trials and iterations values is crucial for obtaining reliable results:
Trials
- Minimum Value: At least 30 trials are required for the statistical analysis to be valid. This is based on the Central Limit Theorem, which states that the distribution of sample means approximates a normal distribution as the sample size becomes large, generally considered to be 30 or more.
- Recommended Range: 30-100 trials usually provide a good balance between accuracy and execution time.
- Considerations: More trials increase the confidence in your results but also increase the total execution time of your test.
Iterations
- Purpose: The number of iterations determines how many times each function is executed within a single trial.
- Selecting a Value: The goal is to choose a number of iterations that makes each trial last long enough to be measurable with millisecond precision.
  - Start with a lower number (e.g., 1000) and observe the results.
  - If the execution times are too small to be reliably measured (i.e., mostly 0 ms), increase the number of iterations. The testing framework will attempt to do this for you if the measured time is below 2 ms.
  - If execution times are very large, you may decrease the number of iterations to reduce overall test duration.
- Typical Range: Depending on the complexity of the functions being tested, anywhere from 1,000 to 1,000,000 iterations per trial may be appropriate.
Example
// For very fast operations
const fastFunctionTest = new PerformanceTest(fastFuncA, fastFuncB, 50, 100000);
// For slower operations
const slowFunctionTest = new PerformanceTest(slowFuncA, slowFuncB, 30, 1000);
Interpreting Results
The profiler calculates the average execution time per iteration by dividing the total time for each trial by the number of iterations. It then performs a statistical analysis across all trials to determine if the difference in performance between the two functions is significant.
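For example, at 10,000 iterations per trial (as in the Usage example above), a trial measured at 36 ms in total corresponds to an average of 36 / 10000 = 0.0036 ms per iteration, which is the scale of the averages shown in the sample output below.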
By using this approach, the @bedrock-libraries/profiler library can provide meaningful performance comparisons even with the millisecond precision limitation of Date.now().
Understanding Result Variability
When running performance tests, it's important to understand that even back-to-back executions with the same settings can produce slightly different results. This variability is normal and expected due to various factors in the runtime environment. Let's examine an example:
Example: Two Consecutive Executions
Execution 1:
Average time for setDynamicProperty: 0.003643333333333333 ms (Std Dev: 0.0003692334401612799)
Average time for setScoreboard: 0.006083333333333333 ms (Std Dev: 0.00021668876987168895)
This performance test execution suggests that setDynamicProperty is about 1.67 times faster than setScoreboard.
The difference in performance between setDynamicProperty and setScoreboard is significant (p-value: 0.00100)
Execution 2:
Average time for setDynamicProperty: 0.00364 ms (Std Dev: 0.00017140393911700976)
Average time for setScoreboard: 0.00599 ms (Std Dev: 0.0001900090741934766)
This performance test execution suggests that setDynamicProperty is about 1.65 times faster than setScoreboard.
The difference in performance between setDynamicProperty and setScoreboard is significant (p-value: 0.00100)
Analyzing the Variability
- Average Times: The average times for both functions show slight variations between executions. This is normal due to factors like system load, memory state, and other background processes.
- Standard Deviations: The standard deviations also vary between executions, indicating different levels of consistency in the measurements.
- Performance Ratio: The relative performance (1.67x vs 1.65x faster) is similar but not identical, reflecting the inherent variability in performance measurements.
- Statistical Significance: In both cases, the p-value remains the same (0.00100), indicating that despite the small variations, the performance difference remains statistically significant.
Factors Contributing to Variability
Several factors can contribute to variability in performance measurements:
- System Load: Background processes and system activities can affect execution times.
- Memory State: The state of memory (e.g., cache contents) can vary between runs.
- Just-In-Time (JIT) Compilation: For languages with JIT compilation, optimizations may vary between runs.
- Hardware Factors: CPU frequency scaling, thermal throttling, and other hardware-related factors can influence performance.
- Timing Precision: The millisecond precision of Date.now() can lead to rounding effects, especially for very fast operations.
Interpreting Results
When interpreting results:
- Look for Consistent Trends: While individual numbers may vary, the overall trend (which function is faster) should remain consistent across multiple runs.
- Consider the Magnitude: Small variations (like in the example above) are normal. Large variations might indicate inconsistent performance or external factors affecting the test.
- Run Multiple Tests: For critical performance assessments, consider running the test multiple times and analyzing the aggregate results.
- Check Statistical Significance: The p-value helps determine if the observed difference is statistically significant, even with small variations in exact numbers.
By understanding and accounting for this variability, you can make more informed decisions based on your performance test results.
Statistical Methodology
Two-Tailed Independent Samples t-Test
The profiler uses a two-tailed independent samples t-test to compare the performance of two functions. This statistical test is appropriate for our use case because:
- Two-Tailed: We're interested in detecting differences in performance in either direction (whether function A is faster or slower than function B).
- Independent Samples: Each trial of function A is independent of each trial of function B.
- t-Test: We're comparing the means of two groups (the execution times of the two functions) and assuming that the underlying population follows a normal distribution.
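The library performs this analysis for you, but as a rough illustration of the statistic involved, a two-sample t-value can be computed from the per-trial timings along these lines (a simplified Welch-style sketch, not the library's actual code):
// Rough sketch of a two-sample t-statistic; `a` and `b` are the per-trial
// timings (ms) for the two functions being compared.
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}
function tStatistic(a: number[], b: number[]): number {
  const standardError = Math.sqrt(sampleVariance(a) / a.length + sampleVariance(b) / b.length);
  return (mean(a) - mean(b)) / standardError;
}
// The t-value is then converted to a two-tailed p-value via the t-distribution
// and compared against the significance level described below.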
Significance Level
The profiler uses a significance level (α) of 0.01 for determining whether the performance difference between two functions is statistically significant.
Interpreting p-values
For a two-tailed test with α = 0.01:
- If p < 0.01: Strong evidence of a significant performance difference in either direction.
- If 0.01 ≤ p < 0.05: Suggestive but not conclusive evidence of a difference. Consider running more trials or increasing iterations.
- If p ≥ 0.05: Not enough evidence to conclude there's a significant performance difference.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.