avro-schema-validator
v0.3.0
Published
A tool to infer AVRO schema's from JSON messages, and to validate it.
Downloads
124
Maintainers
Readme
AVRO-schemas
A repository to maintain AVRO schema's for use in the DRIVER+ test-bed.
Introduction
The test-bed is a message-based system, i.e. its functionality is based on exchanging messages between systems. Therefore, messages become a kind of contract between information producers and consumers. This is different from an API (Application Programmer Interface) based system such as REST, where the API is the contract.
Since messages are so important, the test-bed requires you to define an AVRO-based schema for each message that you want to send. These schema's will be stored in the test-bed's schema registry. As multiple producers may use the same topic, each message is a so-called KeyedMessage, where the key is set to the producer's ID, and the value is the actual message. In this way, we can easily filter messages based on producer's ID.
Registering a new message (schema)
In order to register a new message, therefore, we need two schema's, one for the key and one for the value. By convention, these schema's are named TOPIC_NAME-key
and TOPIC_NAME-value
, and need to be registered before actually sending messages with them. Fortunately, it is no problem to register them multiple times: in case the schema is already present, it will simply be ignored. Additionally, please note that all messages use the same key schema, based on EDXL DE, which can be found here. So you just need to copy it over to your TOPIC_NAME-key
file.
To register a new schema pair, you have several possibilities:
- Use the schema registry-ui by registering both schema's manually. The subject is
TOPIC_NAME-key
andTOPIC_NAME-value
, respectively. Please note that I've experienced schema rejections using the UI, which were accepted by the schema registry. - Use the postj utility.
- Use your adapter, which may contain functionality for it too.
Known schema's
In this repository, we store the known schema's, divided in two main folders:
core
: Contains the core schema's for use in the test-bed, e.g. for logging, configuration and heartbeat. These should always be registered in the schema registry. Also note that core schema's should be prefixed with 'connect-status-' in order to show as system topics. See this issue here.other
: Contains schema's that may be useful during a trial, depending on the systems that are connected.
Each folder is subdivided into folders per topic. The core
topics are preceded by an underscore by convention, as to indicate that we are dealing with system topics.
Conventions
- All messages are Keyed message, i.e. they contain a key (typically, the id of the service as string), and a value.
- Both key and value are encoded using the schema's, named TOPIC-key and TOPIC-value, respectively. Otherwise, the Kafka-REST service does not handle them correctly (for now).
- Type names are written in PascalCase with a leading capital
- Field names are written in camelCase with a small cap
- Default domain is
eu.driver.model.YYY
, where YYY is either core for the core messages, or something else otherwise.
Observations creating an AVRO schema
The AVRO schema is specified here. When creating a new schema from scratch based on an XML schema definition (XSD), most conversions are quite straightforward. However, there were some small issues that I observed while converting the CAP XML Schema definition to a CAP AVRO schema:
- The CAP XSD contains an
any
element, meaning that you can put anything you like in the CAP message. Besides the fact that I don't consider this a good idea, I also do not know how to encode this in AVRO. So for now, I have ignored it. - The CAP schema uses
xs:dateTime
, which is basically a string that can be validated using a regular expression pattern. In AVRO, such functionality seems to be missing, and I represented it using a string. Alternatively, we could use a LogicalType for this, but those must be defined in each adapter. - The CAP schema uses
minOccors = "0"
(optional element). I've converted this to an AVRO UnionType, for example when the type isxs:string
, it becomestype: ["null", "string"], default: null
. - The CAP schema uses
maxOccors = "unbounded"
(array). I've converted this to an AVRO UnionType, for example when the type isxs:string
, it becomestype: ["null", "string", { type: "array", items: "string" }], default: null
. So the element is optional (type is null and default is null), a simple string, or a string array. - The CAP schema contains many types: each type has been converted to an AVRO
enum
with symbols (e.g. forsimpleType
)), orrecord
(e.g. forcomplexType
). As a consequence, the AVRO schema may contain many types, in which case we need to specify the actual (top) type that we will use for validating/encoding/decoding messages.
Additional tooling
This repository also contains a simple JavaScript tool, which can be used to validate a schema against a JSON or XML message, and which can infer a schema too. You can install it using:
npm i -g avro-schema-validator
In case this does not work due to permission issues, you can also run it from a Linux VM or Docker (docker pull node
)
# After installing node.js using the package manager
$ sudo npm i -g avro-schema-validator --unsafe-perm=true --allow-root
avro-schema-validator, v0.2.0
MIT license.
A tool to infer AVRO schema's from JSON messages, and to validate it.
Use avro-schema-validator to infer an AVRO schema based on JSON or XML input,
or validate a JSON message against a schema.
In some cases, a valid JSON message may be considered invalid when wrapped
unions are used, e.g. when you have a property 'content', whose type is
['int', 'float'], in JSON you would need to wrap its value in order to
distinguish between an integer and a float. In case normal parsing fails,
it retries the validation using the -w option (wrapped).
Options
-h, --help Show help text
-f, --file string JSON or XML file that must be validated against a schema, or for
which we want to infer a schema.
-s, --schema string The schema file which is used to validate the JSON or XML file.
-o, --output string Override the default schema file name.
-w, --wrapped If set, use wrapped union types.
Examples
01. Infer a cap.avsc schema from the $ avro-schema-validator cap.json
cap.json file.
02. Infer a schema, and specify the output $ avro-schema-validator cap.json -o
file. mySchema.avsc
03. Validate a cap.json against the $ avro-schema-validator cap.json -s
cap.avsc schema. cap.avsc
04. Validate a wrapped cap.json against the $ avro-schema-validator -w cap.json -s
cap.avsc schema. cap.avsc