An efficient drop-in replacement for JSON.

Overview

JCOF: JSON-like Compact Object Format

A more efficient way to represent JSON-style objects.

Status

This format isn't nailed down yet. Most changes will likely be additive, such that existing JCOF documents will remain valid, but nothing is guaranteed. Use at your own risk. In its current form, JCOF is suitable for closed systems where one party controls every producer and consumer and where every implementation can be updated at once.

About

JCOF tries to be a drop-in replacement for JSON, with most of the same semantics, but with a much more compact representation of objects. The main way it does this is to introduce a string table at the beginning of the document, and then replace repeated strings with indexes into that string table. It also employs a few extra tricks to make documents as small as possible without losing the most important benefits of JSON. Most importantly, it remains a text-based, schemaless format.

The following JSON object:

{
	"people": [
		{"first-name": "Bob", "age": 32, "occupation": "Plumber", "full-time": true},
		{"first-name": "Alice", "age": 28, "occupation": "Programmer", "full-time": true},
		{"first-name": "Bernard", "age": 36, "occupation": null, "full-time": null},
		{"first-name": "El", "age": 57, "occupation": "Programmer", "full-time": false}
	]
}

could be represented as the following JCOF document (split over two lines here for readability; the actual document contains no line break):

Programmer;"age""first-name""full-time""occupation";
{"people"[(0,iw"Bob"b"Plumber")(0,is"Alice"b,s0)(0,iA"Bernard"n,n)(0,iV"El"B,s0)]}

Minimized, the JSON is 299 bytes, with 71.5 bytes on average per person object. The JCOF is 134 bytes, with only 17.5 bytes per person object; that's 0.45x the size in total, and about 0.24x the size per person object. The reason the JCOF is so much smaller is threefold:

  1. It has a string table, so that strings which occur multiple times only have to be included in the JCOF document once. In this example object, the only duplicated string is "Programmer".
  2. It has an object shapes table, so that object shapes which occur multiple times only have to have their keys encoded once. In this example object, the only duplicated object shape is {"age", "first-name", "full-time", "occupation"}.
  3. It has more compact encodings for various values and syntax. Large integers can be encoded as base 62 rather than base 10, booleans and null are encoded using single characters, and separator characters can be skipped where that results in an unambiguous document.
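
The size figures above can be reproduced with a few lines of JavaScript. This is an illustrative check, not part of the test suite. It measures string length, which equals the byte count here because both documents are pure ASCII. It also takes the JCOF text exactly as printed above, joined onto one line.

```javascript
// The example object from above, with keys in the same order.
const data = {
  people: [
    { "first-name": "Bob", age: 32, occupation: "Plumber", "full-time": true },
    { "first-name": "Alice", age: 28, occupation: "Programmer", "full-time": true },
    { "first-name": "Bernard", age: 36, occupation: null, "full-time": null },
    { "first-name": "El", age: 57, occupation: "Programmer", "full-time": false },
  ],
};

// The JCOF encoding from above, joined into a single line.
const jcofText =
  'Programmer;"age""first-name""full-time""occupation";' +
  '{"people"[(0,iw"Bob"b"Plumber")(0,is"Alice"b,s0)(0,iA"Bernard"n,n)(0,iV"El"B,s0)]}';

const jsonText = JSON.stringify(data); // minimized JSON

// Per-person cost: size of the array contents divided by the number of people.
const perPersonJson = (jsonText.length - '{"people":[]}'.length) / data.people.length;
const jcofBody = jcofText.slice(jcofText.indexOf("{")); // drop the two tables
const perPersonJcof = (jcofBody.length - '{"people"[]}'.length) / data.people.length;

console.log(jsonText.length, jcofText.length); // 299 134
console.log((jcofText.length / jsonText.length).toFixed(3)); // 0.448
console.log(perPersonJson, perPersonJcof); // 71.5 17.5
console.log((perPersonJcof / perPersonJson).toFixed(2)); // 0.24
```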

Rationale

I was making a JSON-based serialization format for a game I was working on, but found myself making trade-offs between space efficiency and descriptive key names, so I decided to make a format which makes that a non-issue. I then kept iterating on it until I had what I call JCOF today.

In most cases, you would use plain JSON, or if size is a concern, you would use gzipped JSON. But there are times when size is a concern and you can't reasonably use gzip; for example, gzipping stuff from JavaScript in the browser is inconvenient until TextEncoderStream is supported in Firefox. Having a smaller uncompressed encoding can also be an advantage in some cases even where gzip is used, and I've observed significant reductions in size between compressed JSON and compressed JCOF in certain cases.

I'm publishing it because other people may find it useful too. If you don't find it useful, feel free to disregard it.

Reference implementations

The only reference implementation currently is the JavaScript one, in implementations/javascript/jcof.js. It's published on NPM here: https://www.npmjs.com/package/jcof

Benchmarks

These are the sizes of various documents in JSON compared to JCOF (from the test suite):

tiny.json:
  JSON: 299 bytes
  JCOF: 134 bytes (0.448x)
circuitsim.json:
  JSON: 8315 bytes
  JCOF: 2093 bytes (0.252x)
pokemon.json:
  JSON: 219635 bytes
  JCOF: 39650 bytes (0.181x)
pokedex.json:
  JSON: 56812 bytes
  JCOF: 23132 bytes (0.407x)
madrid.json:
  JSON: 37960 bytes
  JCOF: 11923 bytes (0.314x)
meteorites.json:
  JSON: 244920 bytes
  JCOF: 87028 bytes (0.355x)
comets.json:
  JSON: 51949 bytes
  JCOF: 37480 bytes (0.721x)

The format

Here's the grammar which describes JCOF:

grammar ::= string-table ';' object-shape-table ';' value

string-table ::= (string (','? string)*)?
string ::= plain-string | json-string
plain-string ::= [a-zA-Z0-9]+
json-string ::= [https://datatracker.ietf.org/doc/html/rfc8259#section-7]

object-shape-table ::= (object-shape (',' object-shape)*)?
object-shape ::= object-key (':'? object-key)*
object-key ::= base62 | json-string
base62 ::= [0-9a-zA-Z]+

value ::=
  array-value |
  object-value |
  number-value |
  string-value |
  bool-value |
  null-value

array-value ::= '[' (value (','? value)*)? ']'
object-value ::= shaped-object-value | keyed-object-value
shaped-object-value ::= '(' base62 (','? value)* ')'
keyed-object-value ::= '{' (key-value-pair (','? key-value-pair)*)? '}'
key-value-pair ::= object-key ':'? value
number-value ::= 'i' base62 | 'I' base62 | 'finf' | 'fInf' | 'fnan' | float-value
float-value ::= '-'? [0-9]+ ('.' [0-9]+)? (('e' | 'E') ('-' | '+')? [0-9]+)?
string-value ::= 's' base62 | json-string
bool-value ::= 'b' | 'B'
null-value ::= 'n'

See the bottom of the readme for a railroad diagram.

In addition to the grammar, you should know the following:

Many separators are optional

The grammar contains optional separators (','?, ':'?). These separators can be skipped if either the character before or the character after is any of the following: [, ], {, }, (, ), ,, : or ". This saves a bunch of bytes. JCOF generators can choose to always emit separators, but parsers must accept JCOF documents with missing separators.
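
To make the rule concrete, here is a small JavaScript sketch of how a generator might decide where a separator can be dropped. `canSkipSeparator` and `joinTokens` are hypothetical helper names, not part of the jcof package. The sketch assumes the caller has already produced the individual tokens and only joins tokens that sit at optional-separator positions in the grammar.

```javascript
// Characters that allow an adjacent optional separator to be skipped.
const DELIMITERS = new Set(["[", "]", "{", "}", "(", ")", ",", ":", '"']);

// `before` is the last character already written; `after` is the first
// character of the next token.
function canSkipSeparator(before, after) {
  return DELIMITERS.has(before) || DELIMITERS.has(after);
}

// Join tokens that sit at optional-separator positions in the grammar,
// inserting `separator` only where it is required.
function joinTokens(tokens, separator) {
  let out = "";
  for (const tok of tokens) {
    if (out.length > 0 && !canSkipSeparator(out[out.length - 1], tok[0])) {
      out += separator;
    }
    out += tok;
  }
  return out;
}
```

Applied to the tokens of the first person object in the example, `joinTokens(["0", "iw", '"Bob"', "b", '"Plumber"'], ",")` yields `0,iw"Bob"b"Plumber"`. The comma between `0` and `iw` has to stay, because neither neighbouring character is in the list.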

The string table

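All JCOF documents start with a string table, which is a list of strings separated by an optional ,. Each entry is either a plain alphanumeric string or a JSON string literal, and keys and string values elsewhere in the document can refer to an entry by its base62 index.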
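
The format doesn't prescribe which strings an encoder should put in the table. One plausible heuristic, sketched below in JavaScript with hypothetical helper names, is to intern only string values that occur at least twice, most frequent first so the most common strings get the lowest indexes. Each entry is written as a plain string when it matches [a-zA-Z0-9]+. Object keys are not counted here, to keep the sketch short.

```javascript
// Hypothetical helpers sketching one way to build a JCOF string table.
// Heuristic (an assumption): intern string values that occur at least twice,
// most frequent first. Object keys are not counted here.
function collectStrings(value, counts = new Map()) {
  if (typeof value === "string") {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  } else if (Array.isArray(value)) {
    for (const item of value) collectStrings(item, counts);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectStrings(item, counts);
  }
  return counts;
}

function buildStringTable(value) {
  return [...collectStrings(value).entries()]
    .filter(([, n]) => n >= 2)
    .sort((a, b) => b[1] - a[1]) // stable sort: ties keep first-seen order
    .map(([s]) => s);
}

// Plain strings ([a-zA-Z0-9]+) are written bare, everything else as a JSON
// string. A ',' is only needed when two plain strings would touch.
function serializeStringTable(strings) {
  let out = "";
  for (const s of strings) {
    const plain = /^[a-zA-Z0-9]+$/.test(s);
    if (plain && out.length > 0 && !out.endsWith('"')) out += ",";
    out += plain ? s : JSON.stringify(s);
  }
  return out;
}
```

For the example object at the top of this README, this produces the single entry Programmer, which matches the string table in the example document.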

The object shapes table

An "object shape" is defined as a list of keys. If you have a bunch of objects with the same keys, it's usually advantageous to define that set of keys once in the object shapes table and encode the objects with the shaped object syntax. An object shape is a list of object keys (base62 string table indexes or JSON strings) optionally separated by :, and the object shape table is a list of object shapes (non-optionally) separated by ,.
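
As a rough illustration of how an encoder might pick shapes, here is a JavaScript sketch with hypothetical helper names. Its assumptions are not taken from the reference encoder. It treats a shape as the object's keys in sorted order, which matches the alphabetical order in the example. It only puts shapes used by two or more objects in the table. It also always writes keys as JSON strings rather than string table indexes.

```javascript
// Hypothetical helpers sketching how an encoder might pick object shapes.
// Assumptions: a shape is the object's keys in sorted order, and only shapes
// shared by at least two objects go into the table.
function shapeKey(obj) {
  return JSON.stringify(Object.keys(obj).sort());
}

function collectShapes(value, counts = new Map()) {
  if (Array.isArray(value)) {
    for (const item of value) collectShapes(item, counts);
  } else if (value !== null && typeof value === "object") {
    const key = shapeKey(value);
    counts.set(key, (counts.get(key) ?? 0) + 1);
    for (const item of Object.values(value)) collectShapes(item, counts);
  }
  return counts;
}

function buildShapeTable(value) {
  return [...collectShapes(value).entries()]
    .filter(([key, n]) => n >= 2 && key !== "[]") // a shape needs at least one key
    .map(([key]) => JSON.parse(key));
}

// Keys are written as JSON strings, so the optional ':' between them can be
// skipped; shapes are always separated by ','.
function serializeShapeTable(shapes) {
  return shapes.map((keys) => keys.map((k) => JSON.stringify(k)).join("")).join(",");
}
```

On the example object, this finds the single shape age, first-name, full-time, occupation and serializes it as the shapes table shown in the example document. The outer object with the "people" key occurs only once, so it stays a keyed object.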

Base62

Base62 encoding just refers to writing integer numbers in base 62 rather than base 10. This lets us use 0-9, a-z and A-Z as digits. The characters from 0 to 9 represent 0-9, the characters a to z represent 10-35, and the characters A to Z represent 36-61.
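
Here is a minimal JavaScript sketch of the digit mapping above. The function names are hypothetical and are not necessarily what jcof.js exports. `encodeInt` also assumes that zero is written with the i prefix.

```javascript
// Digit alphabet as described above: 0-9 = 0-9, a-z = 10-35, A-Z = 36-61.
const BASE62_DIGITS =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeBase62(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  let out = "";
  do {
    // Digits come out least significant first, so prepend each one.
    out = BASE62_DIGITS[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out;
}

function decodeBase62(s) {
  if (s.length === 0) throw new SyntaxError("empty base62 number");
  let n = 0;
  for (const ch of s) {
    const digit = BASE62_DIGITS.indexOf(ch);
    if (digit < 0) throw new SyntaxError(`invalid base62 digit: ${ch}`);
    n = n * 62 + digit;
  }
  return n;
}

// Integer values: "i" for non-negative, "I" for negative (zero assumed "i0").
function encodeInt(n) {
  return n < 0 ? "I" + encodeBase62(-n) : "i" + encodeBase62(n);
}
```

For example, `encodeInt(32)` gives `iw` and `encodeInt(57)` gives `iV`, which match the ages of Bob and El in the example document.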

Values

A value can be:

  • An array literal: [, followed by 0 or more values, followed by ]
  • A shaped object literal: (, followed by an object shape index, followed by one value per key in the shape (in the shape's key order), followed by )
    • The object shape index is a base62-encoded index into the object shapes table
  • An object literal: {, followed by 0 or more key-value pairs, followed by }
    • A key-value pair is an object key (a base62 index into the string table, or a JSON string literal), followed by an optional :, followed by a value
  • A string reference: s followed by a base62 index into the string table
  • A JSON string literal
  • A number literal:
    • i followed by a base62 number: a non-negative integer
    • I followed by a base62 number: a negative integer
    • A floating point number written in decimal, with an optional fractional part and an optional exponent part (the grammar also defines the special values finf, fInf and fnan)
  • A bool literal: b for true, B for false
  • A null literal: n
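
To tie the grammar and the list of values together, here is a compact recursive-descent decoder sketch in JavaScript. It is not the reference implementation in implementations/javascript/jcof.js, and `decodeJcof` is a hypothetical name. The sketch assumes that documents contain no whitespace and that a shaped object holds exactly one value per shape key. It skips the special floats finf, fInf and fnan, because their meanings aren't spelled out here. It is also lenient in places; for example, it accepts a trailing , inside arrays.

```javascript
// Decoder sketch for the grammar above (hypothetical, not jcof.js).
// Assumptions: no whitespace inside documents, one value per shape key in
// order, and the special floats finf/fInf/fnan are not handled.
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const CONSTANTS = { b: true, B: false, n: null };

function decodeJcof(text) {
  let pos = 0;
  const peek = () => text[pos];
  const fail = (msg) => { throw new SyntaxError(`${msg} at offset ${pos}`); };
  // Match a sticky regex at the current position; advance past it on success.
  const matchAt = (re) => {
    re.lastIndex = pos;
    const m = re.exec(text);
    if (m !== null) pos = re.lastIndex;
    return m === null ? null : m[0];
  };
  const base62 = () => {
    const digits = matchAt(/[0-9a-zA-Z]+/y) ?? fail("expected base62 number");
    return [...digits].reduce((n, ch) => n * 62 + DIGITS.indexOf(ch), 0);
  };
  // Find the JSON string literal's extent, then let JSON.parse unescape it.
  const jsonString = () => JSON.parse(matchAt(/"(?:[^"\\]|\\.)*"/y) ?? fail("expected string"));
  // String table: plain or JSON strings, optionally separated by ','.
  const strings = [];
  while (peek() !== ";") {
    strings.push(peek() === '"' ? jsonString()
      : (matchAt(/[a-zA-Z0-9]+/y) ?? fail("expected string")));
    if (peek() === ",") pos++;
  }
  pos++;
  const stringAt = (i) => (i < strings.length ? strings[i] : fail("bad string index"));
  const key = () => (peek() === '"' ? jsonString() : stringAt(base62()));
  // Object shapes table: keys optionally separated by ':', shapes by ','.
  const shapes = [];
  while (peek() !== ";") {
    const shape = [key()];
    while (peek() !== "," && peek() !== ";") {
      if (peek() === ":") pos++;
      shape.push(key());
    }
    if (peek() === ",") pos++;
    shapes.push(shape);
  }
  pos++;
  function value() {
    const c = peek();
    if (c === "[") {
      pos++;
      const arr = [];
      while (peek() !== "]") {
        arr.push(value());
        if (peek() === ",") pos++;
      }
      pos++;
      return arr;
    }
    if (c === "(") {
      pos++;
      const shape = shapes[base62()] ?? fail("bad shape index");
      const obj = {};
      for (const k of shape) {
        if (peek() === ",") pos++;
        obj[k] = value();
      }
      if (peek() !== ")") fail("expected ')'");
      pos++;
      return obj;
    }
    if (c === "{") {
      pos++;
      const obj = {};
      while (peek() !== "}") {
        const k = key();
        if (peek() === ":") pos++;
        obj[k] = value();
        if (peek() === ",") pos++;
      }
      pos++;
      return obj;
    }
    if (c === "i" || c === "I" || c === "s") {
      pos++;
      const n = base62();
      return c === "i" ? n : c === "I" ? -n : stringAt(n);
    }
    if (c in CONSTANTS) { pos++; return CONSTANTS[c]; }
    if (c === '"') return jsonString();
    const num = matchAt(/-?[0-9]+(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?/y) ?? fail("unexpected character");
    return Number(num);
  }
  const result = value();
  if (pos !== text.length) fail("trailing characters");
  return result;
}
```

Decoding the example document from the top of this README, joined onto one line, returns the four person objects. Each shaped object's keys come out in the order of its shape: age, first-name, full-time, occupation.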

Railroad diagram

[Railroad diagram of the grammar, generated with bnf-railroad-generator]

Comments
  • There is something wrong with the Base 62 section of Readme

    Hi mortie,

    Thanks for sharing this repo with us. I was reading the Readme.md file and the Base 62 section seems to be wrong regarding the number representation.

    Cheers, Vitaliano

    opened by legionaryu 2
  • discussion: why not just JSON with some conventions ?

    The rationale of JSON is to offer a trade-off between human readability (view and modify in a text editor) and machine processability (easy parser, basic typing, reasonably fast).

    JCOF makes very little progress on the right (more clever encoding) but a huge regression on the left (unreadable).

    Here the use case seems very oriented toward tabular data. There are text formats like TSV/CSV for that.

    If the point is to reduce the overhead of JSON when encoding tabular data, it can be done as below:

    { "people" :
      { "_headers" : ["age","first-name","full-time","occupation"],
        "_records" : [
           [32,"Bob",true,"Plumber"]
          ,[28,"Alice",true,"Programmer"]
          ,[36,"Bernard",null,null]
          ,[57,"El",false,"Programmer"]
        ]}}
    

    Thanks for the experimentation, data encoding is always interesting

    opened by setop 0
  • Why not Protobuf?

    If I'm going to give up readability in favour of efficiency, I might as well use a community standard https://github.com/protobufjs/protobuf.js. In which cases would JCOF be preferable?

    opened by nskazki 3
  • Streaming format

    It's trivial to extend this format to support streaming. Here is a sketch of a spec for it.

    Suggested BNF additions:

    grammar ::= string-table ';' object-shape-table ';' ows value
    streaming-grammar ::= string-table ';' object-shape-table ';' ows (value ';' ows)*
    
    ows ::= *( %x20 / %x0A / %x0D )
    

    Suggested specification phrasing:

    The streaming format is recognized by finding a semicolon after the first complete JCOF value. Parsers not instructed to expect an object stream SHOULD recognize this situation and emit an error.

    Implementations using the streaming grammar SHOULD emit a newline after the object shape table and after every emitted object (at the "ows" production in the BNF). This enables compatibility with existing line-streaming programming styles.

    opened by riking 1
  • Establish a fuzzing harness to demonstrate parser robustness

    Running a fuzzer is a basic quality-of-implementation task for any parser that wants to be widely used. Because you have a canonical reference format, you can easily implement round-trip verification fuzzing.

    // We can round-trip any valid JSON.
    // check_equivalence and fail are placeholders supplied by the fuzzing harness.
    function fuzzTargetA(payload) {
      let expected;
      try { expected = JSON.parse(payload); } catch { return; } // not JSON: skip input
      const result = jcof.decode(jcof.encode(expected));
      if (!check_equivalence(expected, result)) { fail(); }
    }
    
    // We can encode anything we successfully decode, and it decodes again without errors.
    function fuzzTargetB(payload) {
      let expected;
      try { expected = jcof.decode(payload); } catch { return; } // not JCOF: skip input
      const result = jcof.decode(jcof.encode(expected));
      if (!check_equivalence(expected, result)) { fail(); }
    }
    
    opened by riking 0
  • Discussion: Float encoding

    At the risk of causing bikeshedding, I wanted to raise a point of discussion regarding how floats are encoded -- given that integers are encoded in Base62 and therefore take up less space than their decimal form, it seems a bit odd that floats are left as-is.

    My first idea was that they could also be encoded in a similar compacted form, such as base36, which JavaScript can produce natively with .toString(36). But that isn't particularly great if the float's magnitude is not close to 0, because the exponent e+n notation can't be used unambiguously (e is itself a base36 digit), and parsing them back from such a format probably isn't easy (definitely not in JavaScript).

    I'd love to hear more about the rationale behind this decision (if any)! All in all, for the goals that it sets out to accomplish, JCOF is really neat.

    (Edit: this did end up causing bikeshedding, especially in the other issues; sorry about that.)

    opened by paulsnar 11
Owner
Martin Dørum