rechtspraak-nl

v0.9.5

Utility functions for consuming Rechtspraak.nl open data for Dutch court judgments and generating well-formed linked data

Downloads

108

Rechtspraak.js

This library sanitizes and formalizes the data model for Dutch court judgments published by Rechtspraak.nl. The process is performed on all documents publicly offered by Rechtspraak.nl, and the results are published as a linked data graph (well-formed JSON-LD with a JSON Schema).

Why?

Rechtspraak.nl publishes information about a lot of Dutch court judgments with a rich collection of metadata. Sadly, the data model is ill-described and rife with syntactical errors. Rechtspraak.nl provides no schema for its documents other than an incomplete PDF in natural language and a lot of RDF fields are invalid. It's hard to know what to expect when downloading a document, especially for some of the more esoteric metadata fields.

The purpose of this project is to formalize the data model of Rechtspraak.nl. I have done this by analyzing all existing documents (~2 million) on Rechtspraak.nl to generate a JSON Schema and TypeScript typings for the metadata associated with the court judgments. I have also corrected some common errors in the source files (mostly improperly encoded URIs) and generated valid JSON-LD (which is compatible with RDF) from them.

This work is a tangible step towards machine-readable legal data: it makes these documents easier to process automatically, and therefore easier to find and to mine.

Data

A dump of the sanitized metadata is available at https://rechtspraak.lawreader.nl/_all. This URL loads the complete knowledge graph of Rechtspraak.nl and returns a JSON document of about 20 gigabytes. The document has two fields:

  1. @context, which provides the URI mappings for the concepts, and
  2. @graph, which is an array filled with the actual data (see JSON-LD specification for more information)
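
The two-field layout described above can be written down as a TypeScript type with a small runtime check. This is only a sketch: the names `JsonLdDump`, `isJsonLdDump` and `sample` are hypothetical and not part of this library, and the sketch assumes `@context` is a plain object of URI mappings.

```typescript
// Hypothetical type: only the "@context" and "@graph" keys come from the README.
interface JsonLdDump {
  "@context": Record<string, unknown>; // URI mappings for the concepts (assumed to be an object)
  "@graph": Array<Record<string, unknown>>; // the actual data nodes
}

// Runtime check that a parsed JSON value has the two expected top-level fields.
function isJsonLdDump(value: unknown): value is JsonLdDump {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const context = candidate["@context"];
  return (
    typeof context === "object" &&
    context !== null &&
    Array.isArray(candidate["@graph"])
  );
}

// Tiny made-up document, for illustration only (not real data from the dump).
const sample: unknown = JSON.parse(
  '{"@context": {"dcterms": "http://purl.org/dc/terms/"}, "@graph": [{"@id": "ECLI:NL:CBB:2015:5"}]}'
);
```

A check like this is only practical for small subsets; the full dump is too large to parse in one go.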

I recommend using a streaming JSON parser like Oboe.js to consume the data.

To access subsets of the knowledge graph, you can use most of the CouchDB views API. For example, https://rechtspraak.lawreader.nl/_all?limit=100&skip=50 limits your request to 100 docs after skipping the first 50. You can also use startkey to paginate faster: _all?startkey="ECLI:NL:CBB:2015:5"&limit=50 fetches the first 50 docs starting at ECLI:NL:CBB:2015:5. Documents are ordered alphabetically by their ids (European Case Law Identifiers, see https://en.wikipedia.org/wiki/European_Case_Law_Identifier).
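
As a rough sketch of how such request URLs could be assembled, the helper below builds a query string from `limit`, `skip` and `startkey`. The names `PageOptions`, `DUMP_URL` and `buildPageUrl` are hypothetical and not part of this library. The sketch assumes, as in the example above, that `startkey` must be sent as a JSON-encoded string (hence the double quotes), which it then percent-encodes.

```typescript
// Hypothetical option names mirroring the CouchDB query parameters mentioned above.
interface PageOptions {
  limit?: number;
  skip?: number;
  startkey?: string; // a document id, e.g. an ECLI
}

const DUMP_URL = "https://rechtspraak.lawreader.nl/_all";

function buildPageUrl(options: PageOptions = {}): string {
  const params: string[] = [];
  if (options.startkey !== undefined) {
    // JSON.stringify adds the surrounding double quotes; encodeURIComponent escapes them.
    params.push("startkey=" + encodeURIComponent(JSON.stringify(options.startkey)));
  }
  if (options.limit !== undefined) params.push("limit=" + options.limit);
  if (options.skip !== undefined) params.push("skip=" + options.skip);
  return params.length > 0 ? DUMP_URL + "?" + params.join("&") : DUMP_URL;
}

// Usage: the two requests from the text.
const firstPage = buildPageUrl({ limit: 100, skip: 50 });
const fromKey = buildPageUrl({ startkey: "ECLI:NL:CBB:2015:5", limit: 50 });
```

To walk through the whole graph, a common CouchDB pattern is to reuse the id of the last document on a page as the `startkey` of the next request.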

I try to stick to the vocabularies used in the source documents (dcterms, and some from the Dutch government), and also introduce relevant fields from schema.org. I've invented my own URIs where appropriate. In time I'm planning to make all of my own URIs resolvable as well.

Tip: use a tool like JSON-LD playground to visualise the data.

Types

The code is written in TypeScript; the compiled project ships .d.ts typing files alongside the JavaScript code.

JSON Schema

~~I'm still working on converting the TypeScript interface to JSON Schema~~ (for the impatient, look at the source files that generate the JSON-LD document)

Rechtspraak.nl metadata gotchas

Here is a list of some of the syntactical errors I encountered in the data offering for Rechtspraak.nl, which are sanitized in this work.

  • Some dcterms:type triples don't have a resourceIdentifier, e.g. ECLI:NL:RBMNE:2016:1637: <dcterms:type rdf:language="nl" resourceIdentifier="">Uitspraak</dcterms:type>
  • Some docs are missing .nl in the URI, e.g. ECLI:NL:CBB:2002:AD9059: psi:type="http://psi.rechtspraak/conclusie"
  • Many URIs aren't encoded properly, most notably the "gevolg" URIs, e.g. http://psi.rechtspraak.nl/gevolg#(Gedeeltelijke) vernietiging en zelf afgedaan. According to the URI specification (RFC 3986), spaces are not allowed in URIs.
    • This also applies to some references, e.g. in http://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:HR:1992:AA2957: 1.0:v:BWB:BWBV0001506&artikel=7 (oud)&g=1992-12-23
    • Most dramatically, the URI http://psi.rechtspraak.nl/procedure#&#xA;tussenbeschikking&#xA contains line feeds (see ECLI:NL:RBMNE:2016:1780)
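
To make the URI problems above concrete, here is a minimal sketch of the kind of clean-up they call for. It is not the code this library actually uses: `sanitizeUri` is a hypothetical helper that only handles the three cases listed (line feeds, the missing .nl, and unencoded spaces), and it assumes the XML entities have already been decoded into real characters.

```typescript
// Hypothetical helper, for illustration only; not this library's implementation.
function sanitizeUri(raw: string): string {
  return raw
    .replace(/[\r\n\t]/g, "") // drop line feeds and similar control characters
    .trim() // drop leading and trailing spaces
    .replace(/^http:\/\/psi\.rechtspraak\//, "http://psi.rechtspraak.nl/") // restore the missing .nl
    .replace(/ /g, "%20"); // percent-encode the remaining spaces
}

// Usage with the examples from the list above.
const fixedProcedure = sanitizeUri("http://psi.rechtspraak.nl/procedure#\ntussenbeschikking\n");
const fixedType = sanitizeUri("http://psi.rechtspraak/conclusie");
const fixedGevolg = sanitizeUri(
  "http://psi.rechtspraak.nl/gevolg#(Gedeeltelijke) vernietiging en zelf afgedaan"
);
```

Spaces are replaced one by one instead of running the whole string through `encodeURI`, so that any percent-escapes already present in a URI are not encoded a second time.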

Some issues derived from an earlier report:

  • In general, the W3C RDF validator crashes on input documents

  • The subject of a triple is not always clear. There are two dcterms:modified properties described, and it is unclear which one refers to the date on which the document was modified and which one to the date on which the metadata was modified.

  • Values are usually not typed, for example in the case of dates.

  • Resource identifiers are not always used when they easily could be. An example is the dcterms:coverage property. This might not seem important, such as in the case of dcterms:accessRights, which is fixed to the string literal public. But RDF processors typically do not treat two equal string literals as the same concept: URIs are used for that. (Also, properties in the Dublin Core normally define a range, which usually implies URIs.)

  • There are some ECLI identifiers that turn up when searching for documents that have a body, but actually do not have a body. Encountered are:

  • Property-specific issues:

    • dcterms:references prefixes the resourceIdentifier attribute with the namespace of the corpus that the referent is in. This is not properly formed RDF.
    • dcterms:subject: when a judgment is about multiple fields, a resource identifier is given that contains both subjects concatenated. An example is http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_socialezekerheidsrecht. It makes more sense to have one URI for 'bestuursrecht' and one URI for 'socialezekerheidsrecht'.
    • psi:zaaknummer doesn't seem to split lists of identifiers correctly. A string like 97/8236 TW, 97/8241 TW is probably two case numbers, not one.
  • The XML defines a prefix that refers to the relative URI bwb-dl. Prefixing to relative URIs is a practice that has been deprecated by W3C.
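
The dcterms:subject and psi:zaaknummer issues above both come down to splitting a combined value into its parts. The sketch below shows one way to do that. The helper names `splitCaseNumbers` and `splitSubjectUri` are hypothetical and not part of this library, and the sketch assumes that a comma always separates case numbers and that an underscore in the URI fragment always separates subjects.

```typescript
// Hypothetical helpers, for illustration only; not this library's implementation.

// Split a psi:zaaknummer string into individual case numbers (assumes "," is the separator).
function splitCaseNumbers(raw: string): string[] {
  return raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Split a concatenated dcterms:subject URI into one URI per subject
// (assumes "_" in the fragment never occurs inside a single subject name).
function splitSubjectUri(uri: string): string[] {
  const hashIndex = uri.indexOf("#");
  if (hashIndex === -1) return [uri]; // no fragment, nothing to split
  const base = uri.slice(0, hashIndex + 1);
  return uri
    .slice(hashIndex + 1)
    .split("_")
    .filter((subject) => subject.length > 0)
    .map((subject) => base + subject);
}

// Usage with the examples from the list above.
const caseNumbers = splitCaseNumbers("97/8236 TW, 97/8241 TW");
const subjects = splitSubjectUri(
  "http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_socialezekerheidsrecht"
);
```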

License

GPL v3. Note that this is a viral open source license. If you create derivatives, you must publish your code under compatible license terms. Please support free software.

Contact

Inquiries go to Maarten Trompper.