![npm version](https://img.shields.io/npm/v/@charlescol/schema-manager) ![Build Status](https://github.com/charlescol/schema-manager/actions/workflows/npm-publish.yml/badge.svg) ![npm downloads](https://img.shields.io/npm/dm/@charlescol/schema-manager)
Schema Manager
Automating Schema Versioning and Dependency Management
Why Schema Manager?
- Centralized Schema Management: Unify all your schema files (Avro, Protobuf, JSON) in one place.
- Automated Registration and Versioning: Automatically handles dependencies and publishes schemas in the correct order.
- Flexible and Configurable: Works with Confluent Schema Registry and is extendable to other registries.
- Quick Start Demo: Try the example here in just a few minutes.
Give it a try to see how Schema Manager simplifies schema management for distributed services!
Introduction
In modern microservices architectures, separating concerns is critical for scalability and maintainability. Managing schema files (e.g., Avro, Protobuf, JSON) across services can become complex and error-prone, especially when each microservice is responsible for publishing schemas to a schema registry (e.g., Confluent Schema Registry).
Schema Manager solves this by centralizing schema management and delegation. Instead of allowing microservices to handle schema publication directly, Schema Manager automates the versioning, dependency resolution, and registration of schemas in a centralized repository. This approach keeps microservices lightweight, while Schema Manager handles all the complexity of schema registration and lifecycle management.
It is important to note that Schema Manager is not intended to replace the schema registry. Instead, it acts as a management layer that centralizes schemas in a single repository and automates their publication to the target registry.
By enforcing this separation of concerns, Schema Manager simplifies schema management across services, making the system more scalable, consistent, and reliable.
An example of integration for Schema Manager in managing all the schemas in a Kafka-oriented application involving multiple microservices can be found in the Example of Integration with Schema Manager section.
Quick Start
Try the example here in just a few minutes
git clone https://github.com/charlescol/schema-manager-example.git
The Schema Manager is distributed via NPM:
npm install @charlescol/schema-manager
After installation, organize your schema files in versioned directories and create a versions.json file to map versions. Then run Schema Manager to automatically register your schemas with the schema registry.
Key Features
- Centralized Schema Management: Maintain all schema files (e.g., .proto, .avsc) in one repository, with versioning handled through a structured approach.
- Automated Registration: Automatically register schema files in the schema registry in the correct order based on their dependencies.
- Version Control: Use `versions.json` files to manage schema versions and ensure consistency across services.
- Dependency Resolution: Schema Manager parses the schema files to detect import and package statements, building a dependency graph. It automatically resolves dependencies using topological sorting so that schemas are registered in the correct order, with dependencies registered before the schemas that rely on them (a minimal sketch of this ordering idea follows this list).
- Configurable for Confluent Schema Registry: Seamless integration with Confluent Schema Registry, easily extendable to other registries.
- Error Handling and Logging: Includes error handling for unresolved imports, cyclic dependencies, and failed schema registrations, and logs detailed error messages to help you quickly identify and fix issues.
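To illustrate the dependency-resolution step, here is a minimal, self-contained sketch of a topological sort over a schema dependency map. It is illustrative only and is not Schema Manager's internal implementation; the file names are taken from the scenario later in this README.

```typescript
// Illustrative only: a Kahn-style topological sort over a dependency map
// (schema file -> files it depends on). Dependencies come out first, so they
// can be registered before the schemas that import them.
function topologicalOrder(dependencies: Map<string, string[]>): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const [file, deps] of dependencies) {
    if (!inDegree.has(file)) inDegree.set(file, 0);
    for (const dep of deps) {
      if (!inDegree.has(dep)) inDegree.set(dep, 0);
      inDegree.set(file, (inDegree.get(file) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), file]);
    }
  }

  // Start with schemas that depend on nothing; they can be registered first.
  const queue = [...inDegree].filter(([, degree]) => degree === 0).map(([file]) => file);
  const order: string[] = [];

  while (queue.length > 0) {
    const file = queue.shift()!;
    order.push(file);
    for (const next of dependents.get(file) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Any file left with unmet dependencies indicates a cycle.
  if (order.length !== inDegree.size) throw new Error('Cyclic dependency detected');
  return order;
}

// Example: model.proto depends on data.proto, which depends on entity.proto.
console.log(
  topologicalOrder(
    new Map([
      ['model.proto', ['data.proto']],
      ['data.proto', ['entity.proto']],
      ['entity.proto', []],
    ]),
  ),
); // -> ['entity.proto', 'data.proto', 'model.proto']
```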
Available Parsers
| Parser | Class Name | Supported Formats | Description |
| --- | --- | --- | --- |
| Avro | `AvroParser` | `.avro`, `.avsc` | Parses Avro schema files, supports extracting dependencies from `.avro` and `.avsc` files. |
| Protobuf | `ProtobufParser` | `.proto` | Parses Protobuf schema files, identifies package names and imports to resolve dependencies. |
Please refer to the Parser Documentation for more details on how to create a parser.
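As a small illustration of picking a parser, the sketch below assumes that `AvroParser` is exported from the package entry point alongside `ProtobufParser` (only `ProtobufParser` appears in the usage example later in this README):

```typescript
import { AvroParser, ProtobufParser } from '@charlescol/schema-manager';

// Illustrative helper: pick a parser for the schema format a repository manages.
// The format switch is a local convention, not a library feature.
function parserFor(format: 'protobuf' | 'avro') {
  return format === 'protobuf' ? new ProtobufParser() : new AvroParser();
}

const parser = parserFor('protobuf'); // ProtobufParser instance for .proto schemas
```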
Available Registries
| Registry | Class Name | Supported Registries |
| --- | --- | --- |
| Confluent Schema Registry | `ConfluentRegistry` | Confluent Kafka |
Please refer to the Registry Documentation for more details on how to create a registry.
Manager Parameters
Below are the parameters that can be passed to the `Manager` constructor.
| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| `schemaRegistry` | `AbstractRegistry` | The schema registry to use for registering schemas. | Yes |
| `parser` | `AbstractParser` | The parser to use for parsing schema files. | Yes |
| `dependencyResolutionMode` | `DependencyResolutionMode` | The dependency resolution mode to use. Defaults to `IMPLICIT`. Set to `EXPLICIT` to enforce each file's inclusion in only one version. | No |
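As a hedged sketch of how `dependencyResolutionMode` might be set: the example below assumes that a `DependencyResolutionMode` enum is exported from the package entry point (only the parameter name and its `IMPLICIT`/`EXPLICIT` values are documented above), and uses a placeholder registry URL.

```typescript
import {
  ConfluentRegistry,
  DependencyResolutionMode, // assumed export; only the IMPLICIT/EXPLICIT values are documented above
  Manager,
  ProtobufParser,
} from '@charlescol/schema-manager';

// Sketch: enforce that each schema file may appear in only one version (EXPLICIT mode).
const manager = new Manager({
  schemaRegistry: new ConfluentRegistry({ schemaRegistryUrl: 'http://localhost:8081' }), // placeholder URL
  parser: new ProtobufParser(),
  dependencyResolutionMode: DependencyResolutionMode.EXPLICIT,
});
```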
Registry Parameters
Below are the parameters that can be passed to the `AbstractRegistry` constructor.
| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| `schemaRegistryUrl` | `string` | The URL of the schema registry. | Yes |
| `headers` | `Record<string, unknown>` | Additional headers to include in requests. | No |
| `body` | `Record<string, unknown>` | Additional body parameters to include in requests. | No |
| `queryParams` | `Record<string, unknown>` | Additional query parameters to include in requests. | No |
How It Works
Organize Your Schemas:
- Place your files in versioned directories (e.g., v1, v2) and map them in `versions.json`. Please consider the formatting considerations when importing files for Protobuf schemas.
Run Schema Manager:
- Schema Manager parses .proto files to detect import and package statements. Dependencies are resolved using topological sorting to ensure schemas are registered in the correct order.
Automated Registration:
- Once dependencies are resolved, Schema Manager registers the schemas with the schema registry in the correct order.
Scenario Example
Consider a system managing a set of Protobuf schemas for an event-driven architecture. Each schema has multiple versions and dependencies.
Note: Multiple examples, including those mentioned here with different schema types (Protobuf, Avro, etc.), are located within the repository at `./examples`. These examples showcase various use cases and help demonstrate how Schema Manager resolves dependencies and registers schemas for different schema formats. You can explore these to better understand how to set up your own schema management workflow.
In this example, we have two topics, "topic1" and "topic2", as well as a common namespace. Each topic directory contains one or more versions of schema files, with version mapping handled through versions.json files.
File Structure:
example-schemas/
├── topic1/
│ ├── v1/
│ │ ├── data.proto # Schema for v1 data of topic1
│ │ └── model.proto # Schema for v1 model (depends on topic1/v1/data.proto)
│ ├── v2/
│ │ └── data.proto # Schema for v2 data (depends on ./common/v1/entity.proto)
│ └── versions.json # Version mapping for topic1 (v1 and v2)
├── topic2/
│ ├── v1/
│ │ └── data2.proto # Schema for v1 data2 of topic2 (depends on ./common/v1/entity.proto)
│ └── versions.json # Version mapping for topic2 (v1)
├── common/
│ ├── v1/
│ │ └── entity.proto # Schema for test entity
It is important to note that the schema manager handles implicit file imports. This means that two different schemas will be published for model.proto (topic1/v1 and topic1/v2) and two different schemas will be published for entity.proto (topic1/v2 and topic2/v1), even though each pair references the same file.
Note: You can force a file to be published only once by using the EXPLICIT mode; in this example, that would raise an error because the model.proto and entity.proto files are imported twice.
In our case, adding a versions.json file to the common directory is not necessary, because the schemas in common are not meant to be registered as separate subjects in the schema registry. Instead, they serve as shared dependencies that are imported by other schemas located in the service-specific directories.
versions.json for topic1:
{
"v1": {
"data": "v1/data.proto",
"model": "v1/model.proto"
},
"v2": {
"data": "v2/data.proto",
"model": "v1/model.proto",
"entity": "../common/v1/entity.proto"
}
}
versions.json for topic2:
{
"v1": {
"data2": "v1/data2.proto",
"entity": "../common/v1/entity.proto"
}
}
Schema Manager supports both standard version numbers (e.g., v1, v2) and custom version strings (e.g., v1.0, alpha, beta). This allows flexibility in version naming, while Schema Manager handles versioning and dependency resolution across topics and versions.
Note: A single versions.json could have been used to manage all the topics in a centralized way. Additionally, Schema Manager supports multiple versions.json files within the same directory.
Note 2: The schema name must be unique across a given version of a topic (this includes all the dependencies referenced in the versions.json file for that version).
Schema Registration Order:
- Step 1: `entity.proto` in `topic1/v2` is registered because `data.proto` depends on it.
- Step 2: `entity.proto` in `topic2/v1` is registered because `data2.proto` depends on it.
- Step 3: `data.proto` in `topic1/v1` is registered because `model.proto` depends on it.
- Step 4: `data.proto` from `topic1/v2` is registered because `model.proto` depends on it.
- Step 5: Once `data.proto` in `topic1/v1` is registered, `model.proto` from `topic1/v1` can be registered.
- Step 6: Finally, `model.proto` from `topic1/v2`, which depends on `data.proto` from `v2`, is registered.
Usage Example
import { ConfluentRegistry, Manager, ProtobufParser } from '@charlescol/schema-manager';
import * as path from 'path';

const baseDirectory = path.resolve(__dirname, '../schemas'); // Path to the directory containing your schemas
const SCHEMA_REGISTRY_URL = 'http://localhost:8081'; // Example value: point this at your schema registry

const registry = new ConfluentRegistry({
  schemaRegistryUrl: SCHEMA_REGISTRY_URL,
  // The part below is optional; it overrides queries to the schema registry
  body: {
    compatibilityGroup: 'application.major.version',
  },
  queryParams: {
    normalize: true,
  },
  headers: {
    'Content-Type': 'application/vnd.schemaregistry.v1+json', // Default value is application/json
  },
});

async function main() {
  // Create a manager and load all schemas
  const manager = new Manager({
    schemaRegistry: registry,
    parser: new ProtobufParser(),
  });
  await manager.loadAll(baseDirectory, subjectBuilder);
}

// Function you provide, used to build the subject for each schema file.
// This is an example implementation; you can customize it based on your own versioning and naming rules.
function subjectBuilder(fullVersionPath: string, filepath: string): string {
  // Extract topic and version
  const [topic, version] = fullVersionPath.split('/');
  // Extract the filename without extension
  const filename = filepath.split('/').pop()?.split('.')[0] || '';
  // Return the constructed subject
  return `${topic}.${filename}.${version}`;
}

main().catch((error) => {
  console.error('Error registering schemas:', error);
  process.exit(1);
});
The `subjectBuilder` function is responsible for generating the subject name for each schema that is registered in the schema registry. The subject is a unique identifier used by the registry to track schema versions and manage updates. The function takes two parameters:
- fullVersionPath (`string`): The path to the versions.json directory combined with the version name (a key in the JSON), in the form {pathToVersion}/{versionName} (e.g., `topic1/v1`).
- filepath (`string`): The relative file path of the schema file (e.g., `example-schemas/v1/model.proto`).

The function above generates the following subject names for topic1 (`fullVersionPath`, `filepath` → generated subject):
- `topic1/v1`, `topic1/v1/data.proto` → `topic1.data.v1`
- `topic1/v1`, `topic1/v1/model.proto` → `topic1.model.v1`
- `topic1/v2`, `topic1/v2/data.proto` → `topic1.data.v2`
- `topic1/v2`, `topic1/v1/model.proto` → `topic1.model.v2`
- `topic1/v2`, `common/v1/entity.proto` → `topic1.entity.v2`
If the file above is saved as `publish-schemas.ts`, you can run the following command to compile and execute it:
tsc && node dist/publish-schemas.js
Formatting considerations for schemas
In Protobuf files, the import statement should reference only the file name and not the full file path (for example, `import "entity.proto";` rather than `import "common/v1/entity.proto";`). This is because dependency resolution is managed within the versions.json file, which allows the schema manager to dynamically assign the correct versioned dependencies for each import.
The schema manager supports an implicit import mechanism, enabling the same file to be imported in multiple versions without conflict. This flexibility allows each version of a schema to maintain its own set of dependencies, even if those dependencies differ across versions.
For instance, if a file is used in multiple schema versions with different dependencies in each, the import must not rely on a static dependency path. Instead, each version will resolve dependencies according to its specific versions.json configuration.
Future Plans and Roadmap
We plan to extend Schema Manager to support:
- Support for other schema registries beyond Confluent Schema Registry.
- Addition of a broader set of supported formats, alongside the existing parsers.
- A command-line interface (CLI) to manage schemas and visualize dependencies more easily.
Example of Integration with Schema Manager
A common use case for Schema Manager is managing all the schemas in a Kafka-oriented application involving multiple microservices.
By using a centralized schema registry, you eliminate the need for each microservice to manage schemas independently or duplicate schema code across the services. Instead, each microservice only retrieves the schemas it needs from the centralized registry.
The schemas are stored in a dedicated, centralized repository, which includes Schema Manager as an NPM dependency. A small script (like the one provided in the Scenario Example section) is used to automatically register and update the schemas. Whenever a change is detected in the `versions.json` file within the schema directory, this can trigger a new build and schema registration, typically through a CI/CD pipeline.
Workflow Example:
- Centralized Schema Management: The schema repository is versioned and stored in a central repository. Any changes to the schemas (tracked in `versions.json`) will trigger a new schema build and registration in Confluent Schema Registry.
- Microservice Schema Consumption: Each microservice maintains a reference to the schemas it uses. For example, a `schemas.json` file located at the root of each microservice contains a list of schema subjects used by that service.
- Schema Retrieval and Code Generation: The `schemas.json` file is used to retrieve the latest version of each schema from the Confluent Schema Registry. The schema code is then generated and can be used for development purposes (see the sketch after this list).
- Serialization: Tools like `kafka-protobuf-serializer` (for Java) or `@kafkajs/confluent-schema-registry` (for JavaScript) can be used to ensure that the data is serialized using the latest version of the schema, as retrieved from the registry, regardless of the entity generated in the previous step.
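For the schema-retrieval step, a minimal sketch is shown below. It assumes a `schemas.json` shaped as `{ "subjects": ["topic1.data.v1", ...] }` and a local registry URL (both are illustrative assumptions); it uses Confluent Schema Registry's standard `GET /subjects/{subject}/versions/latest` endpoint and Node 18+'s global `fetch`.

```typescript
import { promises as fs } from 'fs';

// Illustrative sketch: read the subjects a microservice declares in schemas.json
// and fetch the latest registered version of each from Confluent Schema Registry.
// The schemas.json shape ({ "subjects": [...] }) and the registry URL are assumptions.
const SCHEMA_REGISTRY_URL = 'http://localhost:8081';

async function fetchLatestSchemas(schemasJsonPath: string): Promise<Record<string, string>> {
  const { subjects } = JSON.parse(await fs.readFile(schemasJsonPath, 'utf-8')) as {
    subjects: string[];
  };

  const schemas: Record<string, string> = {};
  for (const subject of subjects) {
    // Standard Confluent Schema Registry endpoint for the latest version of a subject.
    const response = await fetch(`${SCHEMA_REGISTRY_URL}/subjects/${subject}/versions/latest`);
    if (!response.ok) throw new Error(`Failed to fetch ${subject}: ${response.status}`);
    const { schema } = (await response.json()) as { schema: string };
    schemas[subject] = schema; // raw schema text, ready for code generation
  }
  return schemas;
}

fetchLatestSchemas('./schemas.json')
  .then((schemas) => console.log('Retrieved subjects:', Object.keys(schemas)))
  .catch((error) => {
    console.error('Schema retrieval failed:', error);
    process.exit(1);
  });
```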
Benefits:
- Simplified Schema Management: All schemas are managed in one place, avoiding duplication and inconsistencies across services.
- Automation and Consistency: CI/CD integration ensures that schema updates are automatically built and registered.
- Versioning and Compatibility: Each microservice always has access to the latest version of the schemas, while schema changes can be version-controlled and managed centrally.
Contributing
Contributions are welcome! If you have any suggestions or improvements, please open an issue or submit a pull request. To contribute to this project, please refer to the Contributing Guide and the how-to directory.
License
This project is licensed under the MIT License – see the LICENSE file for more details.