GROS visualization proxy and site
This repository contains configuration files and static web files for the visualization hub of the GROS project. The proxy is intended to provide access to a Ghost blog and Discourse forum if they are deployed within the environment, as well as to the visualization reports for the organization, which were generated using branch-based builds on a Jenkins instance.
The repository needs to be configured to work within a specific environment. Two stages of reverse proxies are used to safely provide access to the GROS webservers (separate Docker instances, a Jenkins setup and/or direct access). The reverse proxies are as follows:
- Caddy for transparent proxy access from a BigBoat dashboard. Several subinstances handle specific domain names. This is only used in an environment where access has to be routed through several VLANs.
- NGINX for proxy access and/or static file hosting from a central server listening on specific ports. It can also use Host-based proxying. Complete, direct hosting is possible but may require additional, separate server configuration.
- Apache as an alternative to NGINX, for proxy access and/or static file hosting from a central server listening on specific ports. It can also use Host-based proxying. Complete, direct hosting is possible but may require additional, separate server configuration.
Many of the configuration files and web files are Mustache templates, so that configuration items and helper functions can adjust them to other environments.
Additionally, this repository contains a Shell script goaccess-report.sh that can be run periodically to generate a server statistics report. It requires an installation of GoAccess, which analyzes logs and creates an analytics dashboard.
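If the script is run periodically from a scheduler such as cron, an entry along the following lines could be used; the installation path and nightly schedule are assumptions, and any arguments the script may expect are not shown:

    # Hypothetical crontab entry: regenerate the GoAccess statistics report
    # every night at 02:00. The checkout path is an example; adjust as needed.
    0 2 * * * /srv/visualization-site/goaccess-report.sh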
Separate documentation exists for more details on how the second proxy layer works.
Dependencies
While the build of the visualization site may work with simply having Docker installed, the test and production environments will need a more full-fledged installation of other dependencies.
For the tests, a Jenkins installation is assumed with proper OpenJDK Java 8+ and Python 3.6+ (including distutils and virtualenv). The agent that performs the tests must have Docker Compose V2. In addition, tools such as Bash, Git, jq, awk, sed, grep and xargs must be available and GNU-like (POSIX-compliant versions may not be enough). A SonarQube Scanner must be registered in Jenkins. The server agent that performs the publishing must have Docker Compose V2, Node.js, curl and rsync as well. Details for configuring Jenkins servers and agents are outside the scope of this documentation, although some details may be available in other GROS documentation. The integration tests may be run outside a Jenkins job, but support for this is limited and requires passing additional environment variables.
The deployment assumes an NGINX or Apache service and potentially a Docker Compose installation, which can be set up using (some of) the configuration files that are built from the templates in this repository.
Tests
This repository contains integration tests for the visualizations. The entire visualization proxy layer setup is emulated within a docker-compose network, with each deployed visualization within its own Docker instance running its Node.js development web service. A Caddy proxy pretends to be the Jenkins server with access to the HTML reports and artifacts.
The tests themselves are run with a Selenium instance using a Chromium browser, controlled by a Python runner. The tests are Python unit tests which instruct the Selenium instance to navigate to pages and perform actions such as clicking. The unit tests then check whether certain elements are visible, contain certain text, or have other properties; the test passes if this is the case.
The visualizations use their test builds which contain coverage tracking, which is extracted and combined after the test completes. In addition to coverage, Sonar scans check for code smells, and a dependency check searches for vulnerabilities.
At the end of each test, more actions take place. Logs are stored and a screenshot of the page is taken, so that there is a visual reference of the state of the page in case a test fails. Finally, an accessibility testing engine is used to verify whether the contents of the page conform to various WCAG rules. All of these results are also combined into a report.
Configuration
The visualization dashboard, connections to other servers, as well as common elements such as a navigation bar, are all configured in this repository. The main configuration point is usually the config.json file, which can be copied from lib/config.json to the root of the repository in order to make local adjustments. The following configuration items (all strings, unless noted otherwise) are known:
- visualization_url: The URL to the visualizations. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the visualizations, where the dashboard is found and every visualization (except predictions) has sub-paths below it.
- prediction_url: The URL to the prediction site. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the predictions (without any specific paths below it).
- base_url: The absolute URL to the base website where other static resources are located. This is used to link to the base website in the navigation bar and to make certain URLs absolute, such as schema identifiers and canonical URLs. If it is left empty, then some of those URLs will not have a protocol and domain name, or use the current location as base.
- blog_url: The URL to the blog. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the blog. If this is empty, then no blog is available.
- discussion_url: The URL to the discussion forum. This may include a protocol and domain name, but does not need to in case all the resources are hosted on the same domain. The remainder is a path to the root of the forum. If this is empty, then no forum is available.
- download_url: The URL to default download links on the dashboard. This can be overwritten by specific download URLs by the visualizations, and an altered value may work with the redirections provided by the NGINX or Apache proxy, so care should be taken when adjusting this. This may be helpful when direct access to downloads is available. If this is empty, then no download links are shown on the dashboard.
- jira_url: URL that is used in some navigation bars to link to a Jira instance.
- blog_host: Domain name of an internal server where the blog is hosted.
- blog_server: Domain name of an external server that provides access to the blog. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.
- discussion_host: Domain name of an internal server where the discussion forum is hosted.
- discussion_server: Domain name of an external server that provides access to the discussion forum. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.
- visualization_server: Domain name of an external server that provides access to the visualization dashboard and every visualization (except predictions). This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.
- www_server: Domain name of an external server that listens on a "www" address. This server redirects to the visualization server. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the redirection service available.
- prediction_server: Domain name of an external server that provides access to the predictions. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the server available.
- hub_organizations (array): When multiple organizations are hosted in the same environment, an array of objects containing organizations, navigation data and branch names can be added to make them available across builds. For tests, they provide a build and serve dummy data for visualizations on multiple branches. The test routes only consider the visualization-site and prediction-site as keys in each object, and their values should provide a branch name that the visualization_branch and prediction_branch may respectively point to for NGINX testing, or the branch names generated using hub_mapping for Apache testing. The organizations objects in this array also define how items in a navigation bar dropdown should appear, with titles, content/alternative text locales, images (plus width/height/style attributes) and external URLs. Finally, a key visualizations may have an array value with the visualizations that the organization's hub hosts.
- hub_regex: When multiple organizations are hosted in the same environment, a regular expression can be used to match the organization name, which must occur at the start of the path, and place the matched parts into variables for later use in rewrite rules of NGINX or Apache. Use (?<groupname>...) for capturing matches for compatibility across proxy servers. To avoid renumbering issues, this regular expression should not contain unnamed match captures (...).
- hub_mapping (object): Groups of environment variable names in which to replace the matched substrings from the regular expression from hub_regex. Only used for Apache, when hosting multiple organizations in the same environment. Group keys can be "hub", "visualization" and "prediction", corresponding to the visualization-site itself, the visualizations and the prediction-site, respectively. Each variable within the group has an object with "input", "default", and "output" keys, which are used to build a RewriteMap. The input and default can refer to other variables with dollar signs. The output mapping cannot contain spaces or empty strings in either the keys or values. In addition to the hub_regex variables, the "prediction" group may also transform a branch_organization variable to map paths to organizations in (combined) prediction setups. The mapping is also used for determining the available prediction path setups on the server for the OpenAPI specification, and for copying the published visualizations on master branches to the proper default organizations. In NGINX, one can instead alter variables captured from regular expressions using hub_branch, visualization_branch and prediction_branch.
- branch_maps_path: Filesystem path where the rewrite maps for the Apache configuration are expected to reside. The map files are written to httpd/maps in this repository, but may be mapped to another absolute path. This affects the path within the Docker instance during the docker-compose tests, but can also be helpful when using (portions of) the configuration on an Apache server.
- hub_redirect: When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path using hub_regex can be used in a rewrite that redirects to another URL. It is assumed that this configuration value produces an absolute URL. For NGINX, use dollar-sign variables; for Apache, use environment variables named with hub_mapping with %{...} syntax.
- hub_branch: Inject some processing steps in the NGINX configuration for the visualizations hub. This should at least determine the branch of a Jenkins build to use for the visualization site. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path using hub_regex can be used in further processing.
- visualization_branch: Inject some processing steps in the NGINX configuration for a visualization. This should at least determine the branch of a Jenkins build to use for a visualization. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path using hub_regex can be used in further processing.
- prediction_branch: Inject some processing steps in the NGINX configuration for the prediction site. This should at least determine the branch of a Jenkins build to use for the visualization or prediction site. When multiple organizations are hosted in the same environment, variables from a matched organization at the start of the path using hub_regex can be used in further processing.
- jenkins_host: Domain name of an internal server where the Jenkins build system is hosted.
- jenkins_path: Path that the Jenkins build system is hosted below. This is useful in situations where Jenkins is hosted on the same domain as other resources and is therefore within a path rather than directly at the root of a domain.
- jenkins_direct: Filesystem path of a location where a copy of the visualizations and predictions is available. If this is not empty, the structure within the path must follow that which the copy.sh script makes, and supported requests are internally rewritten to this path. If this is empty, then requests for visualizations and predictions are proxied to the Jenkins server.
- jenkins_direct_url: URL through which the Jenkins server is available from the location of the build (a Jenkins node with the 'publish' tag, most likely the server itself), when the visualizations and predictions are copied.
- jenkins_direct_cert: Filesystem path to a certificate used for validating a HTTPS connection to the Jenkins server, when the visualizations and predictions are copied (for branch information).
- jenkins_api_token: Encrypted token that can be used for basic authorization against the Jenkins API for at least branch details of builds.
- files_host: Domain name of an internal server where an ownCloud instance is hosted.
- files_share_id: Identifier of a published share on an ownCloud instance with files that are made available in addition to the prediction resources. If this is an empty string, then the paths to the files list and specific files are not passed through to ownCloud from the NGINX or Apache proxy.
- control_host: Domain name of an internal server where secure resources are hosted, including encryption services and access control checks. This domain must be accessible through HTTPS from the NGINX or Apache proxy. This can be set to an empty string to disable proxy connections to endpoints for encrypting names of developers and checking project group access, as used in some visualizations. An empty string also makes the Caddy proxy configuration unusable, so only leave this empty if a portion of the configuration is used.
- websocket_server: Domain name of an external server where a WebSocket for real-time updates of access log analytics is hosted. This name should point (possibly via a Caddy proxy) toward the NGINX or Apache proxy that makes the WebSocket service available (via the GoAccess script).
- proxy_nginx (boolean): Whether to use NGINX to provide access to the other servers or host the direct files. If set to false, use Apache HTTP Server instead. Affects the tests as well as which configuration is generated.
- proxy_range: CIDR range of trusted IP addresses that may host the first layer of proxies in front of the NGINX or Apache proxy, for example the Caddy proxies. Requests from these addresses may provide headers with the real IP address of the original request, which are used instead of the proxy's IP address.
- proxy_port_in_redirect (boolean): Whether to make the NGINX or Apache proxy specify the port number when a redirect is generated. If a Caddy proxy is in front of it, then this is most likely not wanted. Similarly, if only portions of the configuration files are used, then this is also not useful to enable.
- auth_cert: Filesystem path to a certificate used for validating the HTTPS connection to the control_host. If a $SERVER_CERTIFICATE environment variable is not set, then this is also used as the path on the Docker host machine during the docker-compose tests to provide this path within the Docker instance.
- allow_range (array): CIDR ranges of IP addresses that are allowed to access the access log analytics.
- goaccess_path: Filesystem path to the location where the GoAccess interface is hosted. The directory should contain an analytics subpath which holds the actual web interface. This can be set to an empty string to disable the GoAccess endpoint.
- goaccess_log_path: Filesystem path to the log file location. Access logs stored in this path (including rotated and possibly GZip-encoded logs) are followed.
- swagger_openapi_url: URL prefix to use for the OpenAPI specification files that are made available in the Swagger UI. The default value of ./ works well for the Docker instance, but if the OpenAPI files are meant to be stored elsewhere then another path is possible. For production builds of a static file hosting setup with jenkins_direct, this value is set to the absolute path of the direct hosting root, and additional OpenAPI files rather than just the prediction API are made available.
- swagger_validator_url: URL to use to connect to the validator. The default value of /validator works well for the Docker compose network. If it is set to an empty string, then the online validator is used. Can be set to none to disable the validator. For production builds of a static file hosting setup with jenkins_direct, we use the online validator.
Configuration items that have keys ending in _url may be processed to direct toward an organization-specific path, in case multiple organizations are hosted in the same environment. The value is searched for the substring $organization, possibly after slashes. These can be replaced with the actual organization that the build is for. In some cases, it is removed only to allow NGINX or Apache rules to add a supported variant of it in front of the path using hub_regex, and in other cases, it may be preserved to be adjusted at a later moment within the visualization or hub. Specific environment variables determine how the substring $organization is replaced.
Note that configuration items that have keys ending in _host or _server may be set to "fake" values when only a portion of the final NGINX or Apache configuration is actually used, for example when some resources are not made available. The configuration values should still be set to valid domain names so that they can be used within the docker-compose network during the tests.
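As a sketch, a local configuration could start from the packaged defaults and adjust a few items, for example with jq (which the test dependencies already require); all values below are hypothetical examples rather than shipped defaults:

    # Copy the packaged defaults to the repository root for local adjustments.
    cp lib/config.json config.json

    # Adjust a few items; the URLs and domain names are hypothetical examples.
    jq '.visualization_url = "/visualizations/" |
        .prediction_url = "/prediction/" |
        .visualization_server = "visualization.example.org" |
        .jenkins_host = "jenkins.internal.example" |
        .proxy_nginx = true' config.json > config.json.tmp && mv config.json.tmp config.json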
Environment variables
Several environment variables may be used when calling npm run or the Shell scripts, and may affect various contexts, such as resource paths in the HTML pages, the JavaScript-based navigation bar, the proxy server configuration and the test environment. Boolean environment variables must be set to true in order to take effect.
- $VISUALIZATION_ORGANIZATION: Determines the organization to use within the URLs defined in the configuration.
- $VISUALIZATION_COMBINED (boolean): Determines whether to replace the organization within the URL with /combined.
- $NAVBAR_SCOPE: Determines the navigation bar override file to use. If the file navbar.$NAVBAR_SCOPE.js exists, then it is loaded in place of the generated navbar.json file. See the Files section for more details.
- $VISUALIZATION_ANONYMIZED (boolean): Determines whether to show a notification on the HTML page indicating that the visualizations contain anonymized data and that some functionality is limited or missing.
- $VISUALIZATION_SITE_CONFIGURATION: Path to the main configuration file, defaulting to config.json. If this file does not exist, then lib/config.json is used.
- $PREDICTION_CONFIGURATION: Path to the configuration file of the prediction-site visualization, which is then used during tests. By default, the prediction's default configuration is used, which may be incompatible in some complex combined/organization setups. Other visualizations are tested with default configuration.
- $VISUALIZATION_NAMES: Optional space-separated list of repository names of visualizations (excluding visualization-site itself) that should be included in the navigation bar, proxy configuration, test environment setup, and publishing phase. By default, all visualizations listed in the file visualizations.json are considered.
- $VISUALIZATION_MAX_SECONDS (integer): Number of seconds to wait until all visualizations are compiled and made available during the test setup. By default, 60 seconds is allocated, which may be too short for some nodes.
- $REPO_ROOT: Directory to store the Git repositories of the visualizations during the test setup. Relative to the current directory. By default, repos is created as a subdirectory. If another directory is used, then existing clones are left as-is with no up-to-dateness checks, and dependencies are backed up so that they do not interfere with the test setup.
- $SERVER_CERTIFICATE: Path to the HTTPS certificate to use in the test setup when requesting upstream resources for encryption and access checks. By default, this is the auth_cert from the configuration, but this can be set to use a different path at the host device than in the Docker test setup.
- $PUBLISH_PRODUCTION (boolean): Indicates whether to perform a copy for potential publication from the current feature branch instead of only doing so when on a main branch. Note that main branches are skipped in this case, so the publication may not have organization-specific paths.
- $JENKINS_HOME: Provided by Jenkins and used by the copy.sh script as the root from which to collect artifacts and visualization HTML reports for publishing.
- $BRANCH_NAME: Provided by Jenkins Multibranch Pipeline and used by the test environment in order to separate Docker resources when building dependencies for visualizations under test.
- $BUILD_NUMBER, $BUILD_TAG, $BUILD_URL, $NODE_NAME: Provided by Jenkins and used by the test environment in order to track build context for reporting and naming purposes.
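As an illustration, a production build for one organization could be started as follows; the organization name and navigation bar scope are hypothetical:

    # Hypothetical example: build a production bundle for one organization,
    # with a navigation bar override and an anonymized-data notice enabled.
    VISUALIZATION_ORGANIZATION=ExampleOrg \
    NAVBAR_SCOPE=example \
    VISUALIZATION_ANONYMIZED=true \
    npm run production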
Files
Whereas the configuration file, usually located at config.json, likely needs to be copied (from lib/config.json) and changed, there are other files within the repository that can be modified to adjust what is available on the visualization site. We briefly introduce these files here; adjustments to them should be considered more like code changes.
The navigation bar is configured in navbar.json.mustache, with optional contextual overrides in navbar.$NAVBAR_SCOPE.js. The format is defined in the @gros/visualization-ui package for the Navbar class, with the addition that URLs can have $organization substrings replaced, and various Mustache operations can take place to fill the navigation bar with menus/links defined elsewhere, such as visualizations and organizations. navbar.$NAVBAR_SCOPE.js should, if used, refer to navbar.json and use simple JavaScript operations to augment it, though the use of this file may not be necessary when specifics for an organization can be set by changing other files, such as config.json and visualizations.json.
The available visualizations are configured in visualizations.json. The structure is based on the layout of the dashboard, but the items defined in it also determine which visualizations are actually made available in the navigation bar, the proxy server configuration, as well as within the tests. The JSON object is used as a Mustache structure within the index template and the navigation bar, so it may contain Mustache items to refer to certain configuration items, such as URLs.
Localization of the visualization site is in lib/locales.json. The messages in it are used when referred from JavaScript code using the Locales class from the @gros/visualization-ui package, or when using the data-message attribute within the HTML templates.
Running
The visualization can be built using Node.js and npm by running npm install and then either npm run watch to start a development server that also refreshes browsers upon code changes, or npm run production to create a minimized bundle. The resulting HTML, CSS and JavaScript is made available in the www directory.
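Concretely, the commands described above amount to the following:

    # Install the dependencies.
    npm install

    # Start a development server that refreshes browsers upon code changes...
    npm run watch

    # ...or create a minimized production bundle in the www directory.
    npm run production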
As mentioned in the dependencies, this repository also contains a Dockerfile specification for a Docker image that can perform the installation of the app and dependencies, which allows building the visualization site inside a container, removing the need for a global Node.js installation. The Jenkinsfile contains appropriate steps for a Jenkins CI deployment, including the tests, visualization building and publishing.
During the build, the JavaScript that constructs the common navigation bar is separated from the main JavaScript bundle into vendor.js. The visualizations and prediction site also refer to this file to display the navigation bar.
A non-static production environment can make use of the generated proxy server configuration in order to deploy the reverse proxy layer(s) that allow access to all the visualizations and other resources. Optionally, the caddy docker compose file can be set up. Next, either the main nginx.conf plus the files generated in the nginx directory can be supplied to the NGINX service, or likewise httpd.conf and the generated files in the httpd directory for the Apache service. Alternatively, a subset of these files may be used, for example to host everything under one domain, with additional local configuration.
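For instance, a deployment onto an existing NGINX installation could copy the generated files into place and validate them before reloading; the destination paths below are common defaults and an assumption, as the actual layout depends on the server:

    # Hypothetical placement of the generated configuration on an NGINX host;
    # adjust the destination paths to the actual NGINX installation layout.
    cp nginx.conf /etc/nginx/nginx.conf
    cp -r nginx /etc/nginx/nginx

    # Check the configuration and reload the running service.
    nginx -t && systemctl reload nginx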
Documentation based on JSON schema specification files from various GROS repositories can be built with the doc directory. This requires installing some Python dependencies, preferably after first setting up a virtualenv, then running pip install -r doc/requirements.txt. The file doc/source/conf.py contains configuration for the documentation, including paths or URLs to the schema files, which can be altered to use local paths. When run on the Jenkins server to create static documentation, these paths may be defined through the doc.sh script, which locates the JSON schemas as part of archives and modules available in the current workspace and beyond.
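For a local documentation build, the steps could look roughly like this; the doc/source/conf.py layout suggests a Sphinx-based setup, but the exact build invocation is an assumption:

    # Create an isolated environment and install the documentation dependencies.
    virtualenv venv
    . venv/bin/activate
    pip install -r doc/requirements.txt

    # Assuming a Sphinx project rooted at doc/source, build HTML output.
    sphinx-build doc/source doc/build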
A static production environment can make use of the result of the Bash script copy.sh when run on the Jenkins server, in order to create a document root directory with the organizational hubs, all the visualizations, JSON schemas, OpenAPI specifications and Swagger UI available for publishing on a static file hosting server. This server may make use of the NGINX or Apache configurations, which in their compiled form properly route the prediction site and error pages, among others. Like other parts, the copy.sh script requires specific configuration for mapping hub paths, selecting organizations, producing paths and URLs, and accessing Jenkins for the publishable visualizations, archived files and prediction branches.