pdb-images
v2.5.0
Generates images from mmCIF/BCIF files
PDBImages
PDBImages is a command-line tool, based on Mol*, for generating images of macromolecular structures from mmCIF or binary CIF structure files.
Installing as a command-line tool
PDBImages is available in the npm registry. You can install it globally on your machine (requires Node.js >= 18):
npm install -g pdb-images
Usage
NOTE: The following examples assume you installed PDBImages globally with npm install -g pdb-images. If you installed it locally in the current directory (npm install pdb-images), use npx pdb-images instead of pdb-images. If you cloned the git repository and built it, use node ./lib/cli/pdb-images.js instead of pdb-images.
Print help:
pdb-images --help
Generate all images for PDB entry 1ad5 and save them in directory data/output_1ad5/, with default settings:
pdb-images 1ad5 data/output_1ad5/
Another example, with all command line arguments given:
pdb-images 1hda data/output_1hda/ \
--input test_data/structures/1hda.cif \
--input-public https://www.ebi.ac.uk/pdbe/entry-files/download/1hda.bcif \
--mode pdb \
--api-url https://www.ebi.ac.uk/pdbe/api \
--api-retry \
--no-api \
--size 500x500 300x200 \
--render-each-size \
--type entry assembly \
--view front \
--opaque-background \
--no-axes \
--show-hydrogens \
--show-branched-sticks \
--ensemble-shades \
--allow-lowest-quality \
--date 2023-04-20 \
--clear \
--log DEBUG
Input
Input is a structure file in mmCIF (.cif) or binary CIF (.bcif) format. The input file can also be compressed with GZIP (.cif.gz, .bcif.gz). If the --input option is not given, the input file will be retrieved from a public source (https://www.ebi.ac.uk/pdbe/entry-files/download/{id}.bcif for PDB mode, https://alphafold.ebi.ac.uk/files/{id}.cif for AlphaFold mode). For this to work in AlphaFold mode, the user has to specify the full identifier of a model in AlphaFold DB, e.g. "AF-Q5VSL9-F1-model_v4", not only "Q5VSL9". There is a persisting issue with .bcif files provided by AlphaFold DB (this might be fixed in the future), whereas .cif files are processed without problems; therefore the default public source is set to .cif for AlphaFold mode.
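The default public sources described above can be sketched as a small function. This is only an illustration of the URL patterns quoted in the text; the names Mode and defaultInputUrl are hypothetical and are not part of the pdb-images API.

```typescript
// Hypothetical helper illustrating the default public input URLs.
// Not part of the pdb-images package.
type Mode = 'pdb' | 'alphafold';

function defaultInputUrl(mode: Mode, id: string): string {
    if (mode === 'pdb') {
        // PDB mode: binary CIF from PDBe
        return `https://www.ebi.ac.uk/pdbe/entry-files/download/${id}.bcif`;
    }
    // AlphaFold mode: text CIF; id must be the full model identifier,
    // e.g. "AF-Q5VSL9-F1-model_v4"
    return `https://alphafold.ebi.ac.uk/files/${id}.cif`;
}
```

For example, defaultInputUrl('alphafold', 'AF-Q5VSL9-F1-model_v4') yields the .cif URL of that model, while passing only 'Q5VSL9' would produce a URL that does not match the AlphaFold DB naming.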
Supplementary input data will be retrieved from the PDBe API. The default API URL is https://www.ebi.ac.uk/pdbe/api but can be changed by the --api-url option. The URL can use the http:, https:, or file: protocol; using the file: protocol allows the user to "plug in" custom data from a local directory, e.g. --api-url 'file://path-to-this-repository/test_data/api'. When using this approach, the organization of the files in the directory and the format of these files must imitate the PDBe API endpoints; see the test_data/api/ directory for a demonstration. If the program cannot find a specific file in the directory, it will print a warning and proceed as if the API returned an empty JSON response ({}).
Overview of accessed API endpoints (each will be prefixed by the API URL, and {id} will be replaced by the entry ID, i.e. the first command line argument):
- /pdb/entry/molecules/{id} – for entity names in the image captions (not essential)
- /pdb/entry/summary/{id} – for preferred assembly information (not essential)
- /pdb/entry/modified_AA_or_NA/{id} – for modified residue data (essential for modres images)
- /mappings/{id}, /nucleic_mappings/{id} – for SIFTS domain mappings (essential for domain images)
- /validation/residuewise_outlier_summary/entry/{id} – for validation report data (essential for validation images)
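To make the prefixing and {id} substitution concrete, here is a minimal sketch that expands the endpoint list above into full URLs. The names ENDPOINTS and endpointUrls are hypothetical, and the trimming of trailing slashes from the API URL is an assumption made for this example, not documented behavior of the tool.

```typescript
// Endpoint paths as listed in the overview above.
const ENDPOINTS: string[] = [
    '/pdb/entry/molecules/{id}',
    '/pdb/entry/summary/{id}',
    '/pdb/entry/modified_AA_or_NA/{id}',
    '/mappings/{id}',
    '/nucleic_mappings/{id}',
    '/validation/residuewise_outlier_summary/entry/{id}',
];

// Hypothetical helper: prefix each endpoint with the API URL and fill in the entry ID.
function endpointUrls(apiUrl: string, id: string): string[] {
    const base = apiUrl.replace(/\/+$/, ''); // assumption: ignore trailing slashes
    return ENDPOINTS.map(endpoint => base + endpoint.replace('{id}', id));
}
```

With a file: URL, the same expansion shows which files a local directory would need to provide in order to imitate the API.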
With the --no-api option, the API will not be used at all. Running without the API will affect the program's behavior as follows:
- the image types that vitally depend on the API data (i.e. domain, modres, validation) will not be generated;
- some features can behave slightly differently (entity names for captions will be retrieved from the structure file instead of the API data; entity images will be rendered using the first assembly instead of the preferred assembly);
- the final self-check, whether all expected images have been generated, will be skipped.
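The first effect in the list above can be sketched as a simple filter over the requested image types. The names API_DEPENDENT_TYPES and typesWithoutApi are hypothetical and only illustrate the rule stated in the text.

```typescript
// Image types that, according to the text above, cannot be generated without API data.
const API_DEPENDENT_TYPES: string[] = ['domain', 'modres', 'validation'];

// Hypothetical helper: which of the requested types remain when --no-api is used.
function typesWithoutApi(requested: string[]): string[] {
    return requested.filter(type => API_DEPENDENT_TYPES.indexOf(type) === -1);
}
```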
The legacy PDB file format is not directly supported by pdb-images. For convenience, this package provides a script for conversion of PDB files to mmCIF, which can then be passed to pdb-images. However, correct behavior with the converted files cannot be guaranteed, as the internal logic of the PDB format is fundamentally different from mmCIF, and this conversion should not be relied on. Use original mmCIF files whenever possible. Usage:
pdb2cif data/1ad5.pdb data/1ad5.cif
Output
Image files
The program creates images of several types. Each scene can be rendered in different views (front, side, top; --view option) and in different resolutions (--size option). Besides the rendered images in PNG format, the program also saves .molj files (Mol* plugin states, aka snapshots, which can be loaded in Mol*) and .caption.json files (image captions).
(Names of the individual files may be a bit confusing, as they were inherited from an older image generation process. See section Generated image types for explanation of the filenames.)
Summary files
After generating all images, two summary files are created:
- {pdb}_filelist contains the list of created images
- {pdb}.json contains the structured list of created images, including their captions and some other metadata.
These summary files contain filenames without suffixes, e.g. 1ad5_deposited_chain_front instead of the full filename 1ad5_deposited_chain_front_image-800x800.png. To get full filenames, you must combine the filenames in the "image" sections and the suffixes in the "image_suffix" section of the JSON summary file (e.g. 1ad5_deposited_chain_front + _image-800x800.png -> 1ad5_deposited_chain_front_image-800x800.png).
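The combination of filenames and suffixes described above amounts to a simple concatenation of every filename with every suffix. The sketch below assumes that the filenames and the suffixes have already been extracted from the JSON summary into two flat string arrays; the exact JSON layout is not shown here, and the name fullFilenames is hypothetical.

```typescript
// Hypothetical helper: combine suffix-less filenames with all suffixes.
// Assumes both inputs were already read from the JSON summary file.
function fullFilenames(images: string[], suffixes: string[]): string[] {
    const result: string[] = [];
    for (const image of images) {
        for (const suffix of suffixes) {
            result.push(image + suffix);
        }
    }
    return result;
}
```

For instance, the filename 1ad5_deposited_chain_front combined with the suffix _image-800x800.png gives 1ad5_deposited_chain_front_image-800x800.png, as in the example above.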
If the output directory contains older files from previous runs, these will also be included in the summary files (run with --clear to remove any older files instead). If you only want to update the summary files based on the current contents of the output directory without generating any new images, run with --type (without specifying any type).
After creating all output files, the program will perform a self-check, i.e. it will compare the expected list of output files (based purely on API data, agnostic to the structure file) with the actual list of generated output files. If any expected file is missing, the program will print an error message, save the expected file list to {id}_expected_files.txt, and terminate with a non-zero exit code. This self-check is skipped when using --no-api.
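Conceptually, the self-check is a set difference between the expected and the actual file lists. The following is a minimal sketch of that idea, not the program's actual implementation; the name missingFiles is hypothetical.

```typescript
// Hypothetical helper: expected files that are absent from the actual output.
function missingFiles(expected: string[], actual: string[]): string[] {
    return expected.filter(name => actual.indexOf(name) === -1);
}
```

A non-empty result corresponds to the situation in which the program reports an error and exits with a non-zero code.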
Generated image types
PDBImages generates many types of images. By default, it will create all image types that make sense for the selected mode (pdb/alphafold) and entry. Alternatively, the user can select a subset of image types by the --type option. These are all the available types:
- entry – Create images of the whole deposited structure, colored by chains and colored by entities (i.e. chemically distinct molecules).
  - –> {pdb}_deposited_chain_{view}_image-{size}.png
  - –> {pdb}_deposited_chemically_distinct_molecules_{view}_image-{size}.png
- assembly – For each assembly listed in the mmCIF file, create images of the whole assembly, colored by chains and colored by entities.
  - –> {pdb}_assembly_{assembly}_chain_{view}_image-{size}.png
  - –> {pdb}_assembly_{assembly}_chemically_distinct_molecules_{view}_image-{size}.png
- entity – For each distinct entity, create an image of the preferred assembly with this entity highlighted. This excludes the water entity. If an entity is not present in the preferred assembly, the program will instead use the first assembly where this entity is present (e.g. entity 5 in 7nys). If an entity is not present in any assembly, the deposited model will be used instead (e.g. entity 3 in 6ml1).
  - –> {pdb}_entity_{entity}_{view}_image-{size}.png
- domain – Create images for SIFTS mappings (CATH, SCOP, Pfam, Rfam). Namely, for each combination of SIFTS family and entity, select a chain belonging to that entity and create an image of the chain with highlighted SIFTS domain(s). If there are domains from the same family in different entities, process each of them separately. If there are multiple domains from the same family in the same entity but in different chains, process just one of the chains. If there are multiple domains from the same family within one chain, render this chain with each domain highlighted in a different color (choose the chain with the most domains in such a case). Requires API.
  - –> {pdb}_{entity}_{chain}_{source}_{family}_image-{size}.png
- ligand – For each distinct non-polymer entity in the structure (with the exception of water), create an image of this molecule highlighted plus its surroundings. If there are multiple instances of the same entity, only process one of them.
  - –> {pdb}_ligand_{ligand}_image-{size}.png
- modres – For each distinct modified residue in the structure, create an image of the preferred assembly with all instances of this modified residue highlighted. Requires API.
  - –> {pdb}_modres_{modres}_{view}_image-{size}.png
- bfactor – Create an image of the deposited structure in putty representation with color-coded B-factors. Skip if the structure is not from a diffraction method (thus B-factors are not available).
  - –> {pdb}_bfactor_image-{size}.png
- validation – Create an image of the deposited structure with color-coded validation data. Requires API.
  - –> {pdb}_validation_geometry_deposited_image-{size}.png
- plddt – Create an image of the deposited structure with color-coded pLDDT values. This is only for --mode alphafold.
  - –> {pdb}_plddt_{view}_image-{size}.png
- all – A shortcut to create all meaningful image types (i.e. all but plddt in pdb mode, plddt in alphafold mode).
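The filename patterns listed above use placeholders such as {pdb}, {view}, and {size}. As an illustration of how a pattern maps to a concrete filename, the sketch below fills the placeholders from a dictionary. The name fillTemplate is hypothetical; this only demonstrates the naming scheme and is not how the program itself builds filenames.

```typescript
// Hypothetical helper: replace {placeholder} tokens with the given values.
// Placeholders without a value are left unchanged.
function fillTemplate(template: string, values: { [key: string]: string }): string {
    return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) =>
        Object.prototype.hasOwnProperty.call(values, key) ? values[key] : placeholder);
}
```

For example, filling {pdb}_deposited_chain_{view}_image-{size}.png with pdb = 1ad5, view = front, and size = 800x800 gives 1ad5_deposited_chain_front_image-800x800.png.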
By default, some image types are rendered in three views (front, side, top view) with axis arrows shown in the bottom left corner, while other image types are only rendered in front view without axis arrows. This can be changed by the --view and --no-axes options.
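The view selection can be sketched as follows, based on the description of the --view option in the command-line overview (auto creates all views only for the entry, assembly, entity, modres, and plddt types). The names ViewOption, MULTI_VIEW_TYPES, and viewsFor are hypothetical; this is an illustration of the documented rule, not the program's code.

```typescript
type ViewOption = 'front' | 'all' | 'auto';

// Image types that get all three views when --view auto is used (per the --view help text).
const MULTI_VIEW_TYPES: string[] = ['entry', 'assembly', 'entity', 'modres', 'plddt'];

// Hypothetical helper: which views are rendered for a given image type.
function viewsFor(imageType: string, option: ViewOption): string[] {
    const allViews = option === 'all'
        || (option === 'auto' && MULTI_VIEW_TYPES.indexOf(imageType) !== -1);
    return allViews ? ['front', 'side', 'top'] : ['front'];
}
```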
By default, the images are rendered in one resolution, 800x800. This can be changed by the --size option. If multiple sizes are provided (e.g. --size 100x100 800x800 1600x1600), only the largest size (measured by area) will be rendered and the others will be obtained by resizing (use --render-each-size to render each size separately).
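Choosing the size to render, "largest by area", can be sketched like this. The Size shape mirrors the { width, height } objects used when calling the package from TypeScript; the names parseSize and largestSize are hypothetical, and the strict WIDTHxHEIGHT parsing is an assumption made for this example.

```typescript
interface Size { width: number; height: number }

// Hypothetical helper: parse a string like "800x800" into a Size.
function parseSize(text: string): Size {
    const match = /^(\d+)x(\d+)$/.exec(text);
    if (!match) throw new Error(`Invalid size: ${text}`);
    return { width: Number(match[1]), height: Number(match[2]) };
}

// Hypothetical helper: pick the size with the largest area (the first one wins on ties).
function largestSize(sizes: Size[]): Size {
    if (sizes.length === 0) throw new Error('No sizes given');
    return sizes.reduce((best, size) =>
        size.width * size.height > best.width * best.height ? size : best);
}
```

With the sizes 100x100, 800x800, and 1600x1600, this selects 1600x1600 as the size to render; the other two would be obtained by resizing.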
If you use --size without any value, no PNG images will be rendered but captions (.caption.json) and state files (.molj) will still be created.
Overview of the command-line arguments
positional arguments:
  entry_id              Entry identifier (PDB ID or AlphaFoldDB ID).
  output_dir            Output directory.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         Print version info and exit.
  --input INPUT         Input structure file path or URL (.cif, .bcif,
                        .cif.gz, .bcif.gz).
  --input-public INPUT_PUBLIC
                        Input structure URL to use in saved Mol* states (.molj
                        files) (cif or bcif format).
  --mode {pdb,alphafold}
                        Mode.
  --api-url API_URL     PDBe API URL (can use http:, https:, or file: protocol).
                        Default: https://www.ebi.ac.uk/pdbe/api.
  --api-retry           Retry any failed API call up to 5 times, waiting
                        random time (up to 30 seconds) before each retry.
  --no-api              Do not use PDBe API at all (some images will be
                        skipped, some entity names will be different in
                        captions, etc.).
  --size [SIZE ...]     One or more output image sizes, e.g. 800x800 200x200.
                        Default: 800x800. Only the largest size is rendered,
                        others are obtained by resizing unless
                        --render_each_size is used. Use without any value to
                        disable image rendering (only create captions and MOLJ
                        files).
  --render-each-size    Render image for each size listed in --size, instead
                        of rendering only the first size and resampling to the
                        other sizes.
  --type [{entry,assembly,entity,domain,ligand,modres,bfactor,validation,plddt,all} ...]
                        One or more image types to be created. Use "all" as a
                        shortcut for all types. See README.md for details on
                        image types. Default: all. Use without any value to
                        skip all types (only create summary files from
                        existing outputs).
  --view {front,all,auto}
                        Select which views should be created for each image
                        type (front view / all views (front, side, top) / auto
                        (creates all views only for these image types: entry,
                        assembly, entity, modres, plddt)). Default: auto.
  --opaque-background   Render opaque background in images (default:
                        transparent background).
  --no-axes             Do not render axis indicators aka PCA arrows (default:
                        render axes when rendering the same scene from
                        multiple view angles (front, side, top)).
  --show-hydrogens      Show hydrogen atoms in ball-and-stick visuals
                        (default: always ignore hydrogen atoms).
  --show-branched-sticks
                        Show semi-transparent ball-and-stick visuals for
                        branched entities (i.e. carbohydrates) in addition to
                        the default 3D-SNFG visuals.
  --ensemble-shades     Show individual models within an ensemble in different
                        shades of the base color (lighter and darker),
                        default: use the same colors for all models.
  --allow-lowest-quality
                        Allow any quality level for visuals, including
                        "lowest", which is really ugly (default: allow only
                        "lower" quality level and better).
  --force-bfactor       Force outputting "bfactor" image type even if the structure is
                        not from X-ray (this might be necessary for custom mmCIF files
                        with missing information about experimental methods).
  --date DATE           Date to use as "last_modification" in the caption JSON
                        (default: today's date formatted as YYYY-MM-DD).
  --clear               Remove all contents of the output directory before
                        running.
  --log {ALL,TRACE,DEBUG,INFO,WARN,ERROR,FATAL,MARK,OFF}
                        Set logging level. Default: INFO.
Run in Docker
NOTE: Docker image for PDBImages uses Xvfb, which results in much worse performance compared to running it directly on a machine with GPU (see FAQ).
Get image from repository and run
docker run -v ~/data/output_1ad5:/out pdbegroup/pdb-images 1ad5 /out
Build and run
docker build . -t pdb-images # if you run it on the same architecture as build
docker build . -t pdb-images --platform linux/amd64 # if you need it for a different architecture
docker run -v ~/data/output_1ad5:/out pdb-images 1ad5 /out
Run in Singularity
singularity build ./pdb-images docker://pdbegroup/pdb-images
singularity run --env XVFB_DIR=~/data/xvfb ./pdb-images 1ad5 ~/data/output_1ad5
It is important to set the XVFB_DIR variable to an existing mounted directory (use --bind if paths are not mounted automatically). When running multiple jobs in parallel, set a separate XVFB_DIR for each job.
Including as a dependency
PDBImages is available in the npm registry. You can add it as a dependency to your own package (requires Node.js >= 18):
npm install pdb-images
Then you can call the asynchronous main function (and others) in your code. This example shows how to call main from TypeScript code:
import { createArgs } from 'pdb-images/lib/args';
import { main } from 'pdb-images/lib/main';
main(createArgs('1ad5', 'data/output_1ad5/', { size: [{ width: 1600, height: 1200 }], view: 'front', clear: true }));
In the TypeScript configuration (tsconfig.json), use "module": "CommonJS".
Development
Install dependencies
npm install
Requires Node.js >= 18. See FAQ if installation fails on the gl package.
Build
rm -rf ./lib/ # For a clean build
npm run build
Build automatically on file save:
npm run watch
Test
npm run lint
npm run jest
Release
To release a new version of this package:
- Change version in package.json
- Change version in src/main.ts (export const VERSION = ...)
- Run tests (will check if the versions match)
- Update CHANGELOG.md
- Commit and push to main branch (use the version as the commit message, e.g. 2.0.0)
- Create a git tag using semantic versioning (e.g. 2.0.0); do not start the tag with "v" (e.g. v2.0.0)
- GitHub workflow will automatically publish npm package (https://www.npmjs.com/package/pdb-images)
- GitHub workflow will automatically publish Docker images (https://hub.docker.com/r/pdbegroup/pdb-images and dockerhub.ebi.ac.uk/pdbe/packages/pdb-images)
Citing
If you found PDBImages helpful, please cite:
Midlik A, Nair S, Anyango S, Deshpande M, Sehnal D, Varadi M, Velankar S (2023) PDBImages: a command-line tool for automated macromolecular structure visualization. Bioinformatics, 39(12), btad744. https://doi.org/10.1093/bioinformatics/btad744
FAQ
- npm install fails on the gl package, printing something like:
... npm ERR! gyp ERR! not ok ...

This is probably because some dependencies needed to build the gl package are missing and/or Python path is not set correctly. Try this:
sudo apt-get install -y build-essential libxi-dev libglu1-mesa-dev libglew-dev pkg-config
export NODE_GYP_FORCE_PYTHON=$(which python3)
or follow instructions here: https://www.npmjs.com/package/gl#system-dependencies

- Installation completed successfully and running pdb-images --help works fine, but trying to run image generation gives an error like this:
var ext = gl.getExtension('ANGLE_instanced_arrays');
TypeError: Cannot read properties of null (reading 'getExtension')

This will be thrown when X server is not available on the machine, which is a common situation in large computing infrastructures or cloud environments.
The easiest solution is to use the Xvfb X server:
sudo apt-get install xvfb
xvfb-run --auto-servernum pdb-images 1ad5 data/output_1ad5/
This approach is used for the GitHub testing workflow (sudo apt-get install xvfb && xvfb-run --auto-servernum npm run jest). It is also used in the enclosed Dockerfile.
The downside of this approach is that Xvfb is a purely software implementation and cannot use GPU (this information cannot be found in any official source but a bunch of people on StackOverflow say so), thus not allowing the full performance potential of PDBImages.

- Installation completed successfully and running pdb-images --help works fine, but trying to run image generation gives an error like this:
ReferenceError: fetch is not defined

This is probably because you are using an older version of Node.js. Version 18 or higher is required to run PDBImages.
When you update Node.js, make sure to uninstall the PDBImages package and then install it again:
npm uninstall -g pdb-images
npm install -g pdb-images
(use -g only if you install globally)