@dk1a/solidity-stringutils
v0.3.3
Published
StrSlice & Slice library for Solidity
Downloads
30
Maintainers
Readme
StrSlice & Slice library for Solidity
- Types: StrSlice for strings, Slice for bytes, StrChar for characters
- Gas efficient
- Versioned releases, available for both foundry and hardhat
- Simple imports, you only need e.g.
StrSlice
andtoSlice
StrSlice
enforces UTF-8 character boundaries;StrChar
validates character encoding- Clean, well-documented and thoroughly-tested source code
- Optional PRBTest extension with assertions like
assertContains
andassertLt
for both slices and nativebytes
,string
Slice
andStrSlice
are value types, not structs- Low-level functions like memchr, memcmp, memmove etc
Install
Node
yarn add @dk1a/solidity-stringutils
Forge
forge install --no-commit dk1a/solidity-stringutils
StrSlice
import { StrSlice, toSlice } from "@dk1a/solidity-stringutils/src/StrSlice.sol";
using { toSlice } for string;
/// @dev Returns the content of brackets, or empty string if not found
function extractFromBrackets(string memory stuffInBrackets) pure returns (StrSlice extracted) {
StrSlice s = stuffInBrackets.toSlice();
bool found;
(found, , s) = s.splitOnce(toSlice("("));
if (!found) return toSlice("");
(found, s, ) = s.rsplitOnce(toSlice(")"));
if (!found) return toSlice("");
return s;
}
/*
assertEq(
extractFromBrackets("((1 + 2) + 3) + 4"),
toSlice("(1 + 2) + 3")
);
*/
See ExamplesTest.
Internally StrSlice
uses Slice
and extends it with logic for multibyte UTF-8 where necessary.
| Method | Description |
| ---------------- | ------------------------------------------------ |
| len
| length in bytes |
| isEmpty
| true if len == 0 |
| toString
| copy slice contents to a new string |
| keccak
| equal to keccak256(s.toString())
, but cheaper |
concatenate
| add
| Concatenate 2 slices into a new string |
| join
| Join slice array on self
as separator |
compare
| cmp
| 0 for eq, < 0 for lt, > 0 for gt |
| eq
,ne
| ==, != (more efficient than cmp) |
| lt
,lte
| <, <= |
| gt
,gte
| >, >= |
index
| isCharBoundary
| true if given index is an allowed boundary |
| get
| get 1 UTF-8 character at given index |
| splitAt
| (slice[:index], slice[index:]) |
| getSubslice
| slice[start:end] |
search
| find
| index of the start of the first match |
| rfind
| index of the start of the last match |
| | return type(uint256).max
for no matches |
| contains
| true if a match is found |
| startsWith
| true if starts with pattern |
| endsWith
| true if ends with pattern |
modify
| stripPrefix
| returns subslice without the prefix |
| stripSuffix
| returns subslice without the suffix |
| splitOnce
| split into 2 subslices on the first match |
| rsplitOnce
| split into 2 subslices on the last match |
| replacen
| experimental replace n
matches |
| | replacen requires 0 < pattern.len() <= to.len()|
iterate
| chars
| character iterator over the slice |
ascii
| isAscii
| true if all chars are ASCII |
dangerous
| asSlice
| get underlying Slice |
| ptr
| get memory pointer |
Indexes are in bytes, not characters. Indexing methods revert if isCharBoundary
is false.
StrCharsIter
Returned by chars
method of StrSlice
import { StrSlice, toSlice, StrCharsIter } from "@dk1a/solidity-stringutils/src/StrSlice.sol";
using { toSlice } for string;
/// @dev Returns a StrSlice of `str` with the 2 first UTF-8 characters removed
/// reverts on invalid UTF8
function removeFirstTwoChars(string memory str) pure returns (StrSlice) {
StrCharsIter memory chars = str.toSlice().chars();
for (uint256 i; i < 2; i++) {
if (chars.isEmpty()) break;
chars.next();
}
return chars.asStr();
}
/*
assertEq(removeFirstTwoChars(unicode"📎!こんにちは"), unicode"こんにちは");
*/
| Method | Description |
| ---------------- | ------------------------------------------------ |
| asStr
| get underlying StrSlice of the remainder |
| len
| remainder length in bytes |
| isEmpty
| true if len == 0 |
| next
| advance the iterator, return the next StrChar |
| nextBack
| advance from the back, return the next StrChar |
| count
| returns the number of UTF-8 characters |
| validateUtf8
| returns true if the sequence is valid UTF-8 |
dangerous
| unsafeNext
| advance unsafely, return the next StrChar |
| unsafeCount
| unsafely count chars, read the source for caveats|
| ptr
| get memory pointer |
count
, validateUtf8
, unsafeCount
consume the iterator in O(n).
Safe methods revert on an invalid UTF-8 byte sequence.
unsafeNext
does NOT check if the iterator is empty, may underflow! Does not revert on invalid UTF-8. If returned StrChar
is invalid, it will have length 0. Otherwise length 1-4.
Internally next
, unsafeNext
, count
all use _nextRaw
. It's very efficient, but very unsafe and complicated. Read the source and import it separately if you need it.
StrChar
Represents a single UTF-8 encoded character. Internally it's bytes32 with leading byte at MSB.
It's returned by some methods of StrSlice
and StrCharsIter
.
| Method | Description |
| ---------------- | ------------------------------------------------ |
| len
| character length in bytes |
| toBytes32
| returns the underlying bytes32
value |
| toString
| copy the character to a new string |
| toCodePoint
| returns the unicode code point (ord
in python) |
| cmp
| 0 for eq, < 0 for lt, > 0 for gt |
| eq
,ne
| ==, != |
| lt
,lte
| <, <= |
| gt
,gte
| >, >= |
| isValidUtf8
| usually true |
| isAscii
| true if the char is ASCII |
Import StrChar__
(static function lib) to use StrChar__.fromCodePoint
for code point to StrChar
conversion.
len
can return 0
only for invalid UTF-8 characters. But some invalid chars may have non-zero len! (use isValidUtf8
to check validity). Note that 0x00
is a valid 1-byte UTF-8 character, its len is 1.
isValidUtf8
can be false if the character was formed with an unsafe method (fromUnchecked, wrap).
Slice
import { Slice, toSlice } from "@dk1a/solidity-stringutils/src/Slice.sol";
using { toSlice } for bytes;
function findZeroByte(bytes memory b) pure returns (uint256 index) {
return b.toSlice().find(
bytes(hex"00").toSlice()
);
}
See using {...} for Slice global
in the source for a function summary. Many are shared between Slice
and StrSlice
, but there are differences.
Internally Slice has very minimal assembly, instead using memcpy
, memchr
, memcmp
and others; if you need the low-level functions, see src/utils/
.
Assertions (PRBTest extension)
import { PRBTest } from "@prb/test/src/PRBTest.sol";
import { Assertions } from "@dk1a/solidity-stringutils/src/test/Assertions.sol";
contract StrSliceTest is PRBTest, Assertions {
function testContains() public {
bytes memory b1 = "12345";
bytes memory b2 = "3";
assertContains(b1, b2);
}
function testLt() public {
string memory s1 = "123";
string memory s2 = "124";
assertLt(s1, s2);
}
}
You can completely ignore slices if all you want is e.g. assertContains
for native bytes
/string
.
Acknowledgements
- Arachnid/solidity-stringutils - I basically wanted to make an updated version of solidity-stringutils
- rust - most similarities are in names and general structure; the implementation can't really be similar (solidity doesn't even have generics)
- paulrberg/prb-math - good template for solidity data structure libraries with
using {...} for ... global
- brockelmore/memmove - good assembly memory management examples