autoheal
v0.0.22
GPT Test driven development. Automatically fix tests and guide GPT to write and fix code using your tests.
autoheal CLI
Auto GPT Agent which automatically fixes code based on failing tests.
How does it work?
Tests can be a reliable description of the expected behavior of a program. When structured well, failing test results can be analysed by GPT-4 to determine possible fixes to the code. GPT-4 can then automatically implement fixes and verify they work by running the tests again.
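The run, analyse, fix, re-run cycle described above can be sketched roughly as follows. This is only an illustration, not autoheal's actual implementation: `healLoop`, `HealDeps`, `runTests`, `proposeFix`, and `applyFix` are hypothetical names, and the callbacks are synchronous to keep the sketch short (a real tool would call the test runner and the model asynchronously).

```typescript
// Hypothetical types for illustration; none of these names come from autoheal itself.
interface TestResult {
  passed: boolean;
  output: string; // raw failure output (diffs, stack traces) that would be shown to the model
}

interface HealDeps {
  runTests: () => TestResult; // would shell out to the project's test command
  proposeFix: (failureOutput: string, hint?: string) => string; // would ask the model for a fix
  applyFix: (fix: string) => void; // would write the proposed change to disk
}

interface HealOutcome {
  healed: boolean;
  attempts: number;
}

// Repeat "propose a fix, apply it, re-run the tests" until the tests pass
// or the attempt limit is reached.
function healLoop(deps: HealDeps, maxAttempts: number, hint?: string): HealOutcome {
  let result = deps.runTests();
  let attempts = 0;
  while (!result.passed && attempts < maxAttempts) {
    const fix = deps.proposeFix(result.output, hint);
    deps.applyFix(fix);
    attempts += 1;
    result = deps.runTests(); // verify the fix by running the tests again
  }
  return { healed: result.passed, attempts };
}
```

Capping the number of attempts is an assumption made here so that a fix the model cannot find does not loop (and consume API credit) indefinitely.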
How to use
In your project directory, run:
npx autoheal
Uses OpenAI's GPT-3.5-turbo or GPT-4 APIs. Requires an OpenAI API key.
You can press [Enter] during the run to pause the process and provide a hint to better guide autoheal.
⚠️ CAUTION
autoheal will modify files in your project. Be sure to commit any unsaved changes before running.
autoheal will run tests with file modifications made by GPT. It may not be wise to run it if your test suite has potentially destructive side effects (e.g. modifying a database or connecting to remote services).
How well does it work?
This project is still very experimental and may not always produce good results, so run with caution. The following factors can influence the effectiveness of autoheal:
Nature of the bug or feature
Simpler bugs or features that can be resolved with changes to a single file have the most success.
Quality of the tests and test failure output
Test failures that provide enough information (diffs, stack traces, etc.) to determine possible paths to a fix give the best results. Running tests in a mode that outputs only failing tests may improve results.
Structure and size of the project
Projects with smaller, well-named files have better results. autoheal's strategy is limited by OpenAI's token limit, so it infers details from file names.
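To illustrate why small, well-named files help under a token limit, here is a rough sketch of choosing which files to send to the model. It is not autoheal's actual strategy: `selectFilesWithinBudget` and `estimateTokens` are hypothetical helpers, the four-characters-per-token estimate is a crude assumption, and matching file names against the failure output is just one plausible heuristic.

```typescript
interface SourceFile {
  path: string;
  content: string;
}

// Assumption: roughly 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Prefer files whose base name appears in the failure output, then add files
// in order for as long as they still fit inside the token budget.
function selectFilesWithinBudget(
  files: SourceFile[],
  failureOutput: string,
  tokenBudget: number
): SourceFile[] {
  const output = failureOutput.toLowerCase();
  const scored = files.map((file) => {
    const fileName = file.path.split("/").pop() ?? file.path;
    const baseName = fileName.replace(/\.[^.]+$/, "").toLowerCase();
    const score = baseName.length > 0 && output.includes(baseName) ? 1 : 0;
    return { file, score };
  });
  scored.sort((a, b) => b.score - a.score); // files mentioned in the output come first

  const selected: SourceFile[] = [];
  let used = 0;
  for (const { file } of scored) {
    const cost = estimateTokens(file.content);
    if (used + cost > tokenBudget) continue; // skip files that would overflow the budget
    selected.push(file);
    used += cost;
  }
  return selected;
}
```

With a heuristic like this, a large file can crowd out everything else or be skipped entirely, and a vaguely named file is never prioritised, which is consistent with the observation that smaller, well-named files give better results.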
Hints provided
You can provide a freeform hint to autoheal to provide more specific details (e.g., specific files, or possible ways to fix the bug). This can be useful when the test failure output is not enough to determine a fix.
Model used
Using GPT-4 is much more reliable than GPT-3.5-turbo because it generally produces better results and has a larger token limit. I do not have access, but suspect OpenAI's 32k token model will enable much more effective strategies in the near future.
Test Driven Development + AI workflow
GPT-4 is very capable at writing code. However, it can be challenging to describe the specifics of the software you want to GPT-4, and to verify that the result behaves as intended without subtle bugs. Automated tests can serve both as a precise description of the software's specification and as an automatic check of the intended functionality. TDD can therefore be used to steer GPT-4's development power more precisely.
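As a small illustration of tests acting as a specification, the sketch below pins down the behaviour of a hypothetical `applyDiscount` function, including rounding and input validation, which are details that are easy to leave vague in a prose description. The function, the `expectEqual` helper, and `runSpec` are made up for this example, and a real project would use its own test framework instead of the hand-rolled helper.

```typescript
// Minimal assertion helper so the sketch needs no test framework.
// The label and both values appear in the error, giving useful failure output.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Hypothetical function under test: applies a percentage discount to a total in cents.
function applyDiscount(totalCents: number, percent: number): number {
  if (percent < 0 || percent > 100) {
    throw new RangeError("percent must be between 0 and 100");
  }
  return Math.round(totalCents * (1 - percent / 100));
}

// The specification, written as tests.
function runSpec(): void {
  expectEqual(applyDiscount(1000, 10), 900, "10% off 1000 cents");
  expectEqual(applyDiscount(999, 50), 500, "half-cent results round up");
  expectEqual(applyDiscount(1000, 0), 1000, "0% leaves the total unchanged");
  expectEqual(applyDiscount(1000, 100), 0, "100% reduces the total to zero");

  let rejected = false;
  try {
    applyDiscount(1000, 150);
  } catch (error) {
    rejected = error instanceof RangeError;
  }
  expectEqual(rejected, true, "percent above 100 is rejected");
}

runSpec();
```

In a TDD workflow the tests would be written first and the body of `applyDiscount` left for the model to produce. Each failing assertion then tells it exactly which part of the specification is not yet met.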