print-nonascii
v0.0.3
UNIX CLI that prints lines that contain non-ASCII characters.
Contents
- print-nonascii: print lines that contain non-ASCII characters.
- Examples
- Installation
- Usage
- License
- Changelog
print-nonascii: print lines that contain non-ASCII characters.
print-nonascii is a Unix CLI that locates lines in text files or stdin input that contain non-ASCII characters, which is helpful when diagnosing character encoding problems.
Lines can be printed as-is and/or using abstract representations of non-ASCII characters in one of several formats; namely:
- -v, --caret ... the same representation cat -v uses, based on caret notation.
- --bash ... per-byte two-digit hex. escape sequences such as \xc3
- --psh ... PowerShell Unicode escape sequences such as `u{20ac} for €
Note: --psh only works correctly with properly UTF-8-encoded input.
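To make the three notations concrete, the following sketch takes the line twö used in the Examples section and derives each form with standard tools only (printf, od, cat -v and shell arithmetic) rather than with print-nonascii itself. It illustrates the notations under those assumptions and says nothing about how the tool is implemented.

```shell
# "twö" as UTF-8 bytes. Octal escapes keep this portable across shells;
# in bash, printf 'tw\xc3\xb6' produces the same bytes.
line=$(printf 'tw\303\266')

# Per-byte hex view, the basis of the --bash notation (\xc3\xb6 for "ö").
hex=$(printf '%s' "$line" | od -An -tx1 | tr -d ' \n')

# Caret notation as cat -v prints it; LC_ALL=C forces byte-wise handling.
caret=$(printf '%s\n' "$line" | LC_ALL=C cat -v)

# The code point used by the --psh notation, decoded from the two-byte
# UTF-8 sequence: 5 payload bits from the lead byte, 6 from the trail byte.
codepoint=$(printf '%x' $(( ((0xc3 & 0x1f) << 6) | (0xb6 & 0x3f) )))

printf 'hex bytes : %s\n' "$hex"
printf 'caret     : %s\n' "$caret"
printf 'code point: %s\n' "$codepoint"
```

The decoded code point, f6, is the value that appears inside the braces of the PowerShell escape sequence for ö.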
Line numbers can be prepended on request, and output for multiple input files is by default preceded with headers identifying each input file.
Caveat: For now, no automated tests are run before releases.
Examples
# Create a test file with 1 line containing a non-ASCII character.
$ cat <<'EOF' > /tmp/test.txt
one
twö
three
EOF
# Print only lines that have non-ASCII characters, as-is.
$ print-nonascii /tmp/test.txt
twö
# Print only lines that have non-ASCII characters, with line numbers:
$ print-nonascii -n /tmp/test.txt
2:twö
# Print only lines that have non-ASCII characters, using PowerShell
# Unicode escape-sequence notation (--psh), preceded by the
# line as-is (--raw).
# The Unicode code point of character "ö" is U+00F6:
$ print-nonascii --psh --raw /tmp/test.txt
twö
tw`u{f6}
# Ditto, but with per-byte Bash escape sequences:
$ print-nonascii --bash --raw /tmp/test.txt
twö
tw\xc3\xb6
# Simulate input from multiple files by specifying the same file
# twice, so as to show the headers identifying each input file
# (suppress with -b).
# Note that each header line (invisibly) starts with control
# character U+0001, so as to allow more predictable
# identification of header lines in the output.
$ print-nonascii -n /tmp/test.txt /tmp/test.txt
### /tmp/test.txt
2:twö
### /tmp/test.txt
2:twö
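Because header lines start with U+0001, a script that consumes the output can tell them apart from matching lines without guessing. The sketch below works on simulated output produced with printf, not on a real run of the tool. It assumes a header consists of U+0001, the prefix ###, a tab (as described in the changelog) and the filename. The second filename and its line are made up for illustration.

```shell
# Simulated `print-nonascii -n file1 file2` output (assumed layout:
# U+0001, "###", a tab, then the filename; matching lines as "<n>:<text>").
output=$(printf '\001###\t/tmp/test.txt\n2:twö\n\001###\t/tmp/other.txt\n7:naïve\n')

# Header lines only: print the filename that follows the tab.
files=$(printf '%s\n' "$output" | awk -F '\t' 'substr($0, 1, 1) == "\001" { print $2 }')

# Matching lines only: everything that does not start with U+0001.
matches=$(printf '%s\n' "$output" | awk 'substr($0, 1, 1) != "\001"')

# Split the first matching line into line number and text at the first ":".
first=$(printf '%s\n' "$matches" | head -n 1)
number=${first%%:*}
text=${first#*:}

printf '%s\n' "$files"
printf 'line %s: %s\n' "$number" "$text"
```

Splitting at the first colon only keeps any further colons in the line's text intact, which matches the documented "<line-number>:" prefix format.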
Installation
Prerequisites
- When installing from the npm registry: macOS and Linux
- When installing manually: any Unix platform with bash that also has perl installed.
Installation from the npm registry
With Node.js installed, install the package as follows:
[sudo] npm install print-nonascii -g
Note:
- Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash
- Whether you need sudo depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
- The -g ensures global installation and is needed to put print-nonascii in your system's $PATH.
Manual installation
- Download the CLI as print-nonascii.
- Make it executable with chmod +x print-nonascii.
- Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
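The last two steps can be sketched as a small shell function. This assumes the file has already been downloaded (the download location is not reproduced here). The helper name install_cli and its arguments are hypothetical, and the demonstration uses a scratch directory with a placeholder file so that it runs without sudo.

```shell
# Hypothetical helper: make a downloaded copy executable and symlink it
# into a directory that is (or will be) on $PATH.
install_cli() {
  src=$1
  dest_dir=$2
  [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
  chmod +x "$src" || return 1
  mkdir -p "$dest_dir" || return 1
  # Use an absolute target so the symlink works from any directory.
  src_abs=$(cd "$(dirname "$src")" && pwd)/$(basename "$src")
  ln -sf "$src_abs" "$dest_dir/print-nonascii"
}

# Demonstration with a placeholder file in a scratch directory; for a real
# install, pass the downloaded file and a folder such as /usr/local/bin.
scratch=$(mktemp -d)
printf '#!/bin/sh\necho placeholder\n' > "$scratch/print-nonascii"
install_cli "$scratch/print-nonascii" "$scratch/bin"
"$scratch/bin/print-nonascii"
rm -rf "$scratch"
```

Symlinking rather than moving keeps the downloaded file where it is, which makes it easy to replace later; writing to a system folder typically requires sudo.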
Usage
Find concise usage information below; for complete documentation, read the manual online, or, once installed, run man print-nonascii (print-nonascii --man if installed manually).
$ print-nonascii --help
Prints lines that contain non-ASCII characters.
print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
print-nonascii -q [file ...]
--<mode> prints abstract representations of non-ASCII chars.; one of:
--caret, -v ... use caret notation, as cat -v would.
--bash ... represent non-ASCII bytes as \xhh
--psh ... (PowerShell) represent non-ASCII Unicode characters as
Unicode escape sequences: <backtick>u{h...}
-r, --raw ... with --<mode>, print each matching line as-is too, first.
-n, --line-number ... prefix the output lines with their line number from
the original file, using format "<line-number>:" - decimal line numbers,
no padding, no space before or after the ":"
-b, --bare ... suppress per-input-filename headers
-q ... quiet mode: produce no output; signal presence of non-ASCII chars.
with exit code 0; exit code 100 signals that there are none.
Standard options: --help, --man, --version, --home
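The exit codes of quiet mode lend themselves to scripting. The sketch below branches on the documented codes 0 and 100. So that it runs without the tool installed, it uses check_nonascii, a hypothetical stand-in built on tr and wc that follows the same exit-code contract; in real use, print-nonascii -q would be called in its place. Treating any other exit code as an error is an assumption, since the help text does not specify it.

```shell
# Hypothetical stand-in that mimics the documented -q contract
# (0 = non-ASCII present, 100 = none); in real use, call
# `print-nonascii -q "$1"` here instead.
check_nonascii() {
  [ -r "$1" ] || return 1
  # Delete all ASCII bytes and count what is left.
  count=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
  if [ "$count" -gt 0 ]; then return 0; else return 100; fi
}

# Branch on the exit code; anything other than 0 or 100 is treated as an
# error here, which is an assumption.
classify() {
  status=0
  check_nonascii "$1" || status=$?   # capturing this way is safe under set -e
  case $status in
    0)   echo "non-ASCII found" ;;
    100) echo "ASCII only" ;;
    *)   echo "error" ;;
  esac
}

tmp=$(mktemp -d)
printf 'one\ntw\303\266\nthree\n' > "$tmp/mixed.txt"
printf 'one\ntwo\n' > "$tmp/plain.txt"
classify "$tmp/mixed.txt"
classify "$tmp/plain.txt"
rm -rf "$tmp"
```

Note that the codes are inverted relative to what one might expect from a linter: 0 means non-ASCII characters were found, so a check that should fail on non-ASCII input has to negate the result explicitly.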
License
Copyright (c) 2017 Michael Klement (http://same2u.net), released under the MIT license.
Acknowledgements
This project gratefully depends on the following open-source components, according to the terms of their respective licenses.
npm dependencies below have an optional suffix denoting the type of dependency: the absence of a suffix denotes a required run-time dependency; (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.
npm dependencies
Changelog
Versioning complies with semantic versioning (semver).
v0.0.3 (2017-09-11):
- [enhancement] Header lines are now only printed for input files that produce at least 1 output line.
v0.0.2 (2017-09-10):
- [fix] Header line is no longer printed twice when --<mode> is combined with --raw.
- Header line now uses a tab char. to separate prefix ### from the filename.
v0.0.1 (2017-09-10):
- Initial release.