Valdir Stumm Junior
March 5, 2015

Skinfer: A tool for inferring JSON schemas

Imagine that you have a lot of samples of a certain kind of data in JSON format. Maybe you want to get a better feel for it: which fields appear in every record, which appear only in some, and what their types are. In other words, you want to know the schema of the data that you have.

We'd like to introduce Skinfer, a tool that we built for inferring a schema from samples in JSON format. Skinfer takes a list of JSON samples and gives you a single JSON schema that describes all of them. (For more information about JSON Schema, we recommend the online book Understanding JSON Schema.)

Install Skinfer with pip install skinfer, then generate a schema by running the schema_inferer command and passing it your JSON samples. The input can be either a single JSON lines file containing all the samples or a list of JSON files given on the command line.

Here is an example of usage with a simple input:

    $ cat samples.json
    {"name": "Claudio", "age": 29}
    {"name": "Roberto", "surname": "Gomez", "age": 72}
    $ schema_inferer --jsonlines samples.json
    {
        "$schema": "http://json-schema.org/draft-04/schema",
        "required": [
            "age",
            "name"
        ],
        "type": "object",
        "properties": {
            "age": {
                "type": "number"
            },
            "surname": {
                "type": "string"
            },
            "name": {
                "type": "string"
            }
        }
    }
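To get an intuition for what the inference in this example involves, here is a minimal Python sketch for flat objects only. It is not Skinfer's actual implementation: the names json_type and infer_flat_schema are hypothetical, and the sketch assumes that a field is "required" when it appears in every sample and that each field's type is the set of types seen for it.

```python
import json


def json_type(value):
    """Return the JSON Schema type name for a value produced by json.loads."""
    # bool must be tested before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def infer_flat_schema(samples):
    """Toy inference for a list of flat JSON objects (hypothetical helper)."""
    seen_types = {}   # field name -> set of type names observed
    required = None   # fields present in every sample seen so far
    for sample in samples:
        keys = set(sample)
        required = keys if required is None else required & keys
        for key, value in sample.items():
            seen_types.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, types in seen_types.items():
        names = sorted(types)
        # A single observed type is written as a string, several as a list.
        properties[key] = {"type": names[0] if len(names) == 1 else names}

    schema = {
        "$schema": "http://json-schema.org/draft-04/schema",
        "type": "object",
        "properties": properties,
    }
    if required:
        # Draft 4 does not allow an empty "required" list, so omit it if empty.
        schema["required"] = sorted(required)
    return schema


lines = [
    '{"name": "Claudio", "age": 29}',
    '{"name": "Roberto", "surname": "Gomez", "age": 72}',
]
schema = infer_flat_schema([json.loads(line) for line in lines])
print(json.dumps(schema, indent=4))
```

For the two samples above, the sketch marks age and name as required and leaves surname optional, which matches the schema printed by schema_inferer. Nested objects and arrays of objects would need a recursive version.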

Once you've generated a schema for your data, you can:

  1. Run it against other samples to see if they share the same schema
  2. Share it with anyone who wants to know the structure of your data
  3. Complement it manually, adding descriptions for the fields
  4. Use a tool like docson to generate a nice page documenting the schema of your data (see example here)
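As an illustration of the first use, the sketch below checks new samples against a schema of the shape shown earlier, using only the Python standard library. It covers just the "required" and "type" keywords on flat objects, and find_problems is a hypothetical helper name; for real data you would more likely rely on a complete JSON Schema validator.

```python
# Checks for the primitive JSON Schema types, applied to json.loads output.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}


def find_problems(sample, schema):
    """Return a list of human-readable mismatches (hypothetical helper)."""
    problems = []
    for key in schema.get("required", []):
        if key not in sample:
            problems.append("missing required field: " + key)
    for key, rule in schema.get("properties", {}).items():
        if key not in sample or "type" not in rule:
            continue
        expected = rule["type"]
        # "type" may be a single name or a list of allowed names.
        allowed = expected if isinstance(expected, list) else [expected]
        if not any(TYPE_CHECKS[name](sample[key]) for name in allowed):
            problems.append("wrong type for field: " + key)
    return problems


# The schema inferred in the example above, written as a Python dict.
person_schema = {
    "type": "object",
    "required": ["age", "name"],
    "properties": {
        "age": {"type": "number"},
        "surname": {"type": "string"},
        "name": {"type": "string"},
    },
}

print(find_problems({"name": "Ana", "age": 31}, person_schema))
print(find_problems({"name": "Ana"}, person_schema))
print(find_problems({"name": "Ana", "age": "31"}, person_schema))
```

An empty list means the sample fits the schema as far as this simplified check can tell.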

Another interesting feature of Skinfer is that it can merge a list of schemas, giving you a new schema that describes the samples covered by all of the schemas you passed in. For this, use the json_schema_merger command, passing it a list of schemas.

This is useful because you can keep updating a schema after you've generated it: when new samples arrive, infer a schema for them and merge it with the one you already have.
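To make the idea of merging concrete, here is a small Python sketch for flat object schemas. It is an assumption about what a merge needs to do, not a description of how json_schema_merger works internally: a field stays required only if both schemas require it, and each property accepts the union of the types allowed by either schema. The name merge_flat_schemas is hypothetical.

```python
def merge_flat_schemas(first, second):
    """Toy merge of two flat object schemas (hypothetical helper)."""
    # A field is required in the result only if both inputs require it.
    required = sorted(
        set(first.get("required", [])) & set(second.get("required", []))
    )

    # Collect every type allowed for each property by either schema.
    allowed = {}
    for schema in (first, second):
        for key, rule in schema.get("properties", {}).items():
            types = rule.get("type", [])
            types = [types] if isinstance(types, str) else list(types)
            allowed.setdefault(key, set()).update(types)

    properties = {}
    for key, types in allowed.items():
        names = sorted(types)
        if not names:
            properties[key] = {}  # no type constraint in either schema
        elif len(names) == 1:
            properties[key] = {"type": names[0]}
        else:
            properties[key] = {"type": names}

    merged = {"type": "object", "properties": properties}
    if required:
        merged["required"] = required
    return merged


old_schema = {
    "type": "object",
    "required": ["age", "name"],
    "properties": {"age": {"type": "number"}, "name": {"type": "string"}},
}
new_schema = {
    "type": "object",
    "required": ["name", "surname"],
    "properties": {
        "age": {"type": "string"},
        "name": {"type": "string"},
        "surname": {"type": "string"},
    },
}
merged = merge_flat_schemas(old_schema, new_schema)
print(merged)
```

In this example only name remains required, and age accepts both numbers and strings, so every sample that was valid under either input schema is still valid under the merged one.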

Feel free to dive into the code and explore the docs, and please file any issues you run into on GitHub. 🙂