JavaScript Database (JSDB)

A zero-dependency, transparent, in-memory, streaming write-on-update JavaScript database for the Small Web that persists to a JavaScript transaction log.

Use case

A small and simple data layer for basic persistence and querying. Built for use in Small Web places and used in Site.js and Place.

This is not for you to farm people for their data. Surveillance capitalists can jog on now.

Features

Transparent: if you know how to work with arrays and objects and call methods in JavaScript, you already know how to use JSDB. It’s not called JavaScript Database for nothing.
Automatic: it just works. No configuration.
100% code coverage: meticulously tested. Note that this does not mean it is bug free ;)

Limitations

Small Data: this is for small data, not Big Data™.
For Node.js: will not work in the browser. (Although data tables are plain ECMAScript Modules (ESM; es6 modules) and can be loaded in the browser.)
Runs on untrusted nodes: this is for data kept on untrusted nodes (servers). Use it judiciously if you must for public data, configuration data, etc. If you want to store personal data or model human communication, consider end-to-end encrypted and peer-to-peer replicating data structures instead to protect privacy and freedom of speech. Keep an eye on the work taking place around the Hypercore Protocol.
In-memory: all data is kept in memory and, without tweaks, cannot exceed 1.4GB in size. While JSDB will work with large datasets, that’s not its primary purpose and it’s definitely not here to help you farm people for their data, so please don’t use it for that. (If that’s what you want, quite literally every other database out there is for your use case so please use one of those instead.)
Streaming writes on update: writes are streamed to disk to an append-only transaction log as JavaScript statements and are both quick (in the single-digit milliseconds region on a development laptop with an SSD) and as safe as we can make them (synchronous at the kernel level).
No schema, no migrations: again, this is meant to be a very simple persistence, query, and observation layer for local server-side data. If you want schemas and migrations, take a look at nearly every other database out there.

Note: the limitations are also features, not bugs. This is a focused tool for a specific purpose. While feature requests are welcome, I do not foresee extending its application scope.

Like this? Fund us!

Small Technology Foundation is a tiny, independent not-for-profit.

We exist in part thanks to patronage by people like you. If you share our vision and want to support our work, please become a patron or donate to us today and help us continue to exist.

Installation

npm i github:small-tech/jsdb

Usage

Here’s a quick example to whet your appetite:

import JSDB from '@small-tech/jsdb'

// Create your database in the db folder.
// (This is where your JSDF files – “tables” – will be saved.)
//
const db = JSDB.open('db')

// Create db/people.js table with some initial data if it
// doesn’t already exist.
if (!db.people) {
  db.people = [
    {name: 'Aral', age: 43},
    {name: 'Laura', age: 34}
  ]

  // Correct Laura’s age. (This will automatically update db/people.js)
  db.people[1].age = 33

  // Add Oskar to the family. (This will automatically update db/people.js)
  db.people.push({name: 'Oskar', age: 8})

  // Update Oskar’s name to use his nickname. (This will automatically update db/people.js)
  db.people[2].name = 'Osky'
}

After running the above script, take a look at the resulting database table in the ./db/people.js file.

(Note: all examples assume that your Node.js project has "type": "module" set in its package.json file and uses ES modules. Adapt accordingly if you’re using CommonJS. Note that as of version 2.0.0, JSDF files are output in ESM, not CommonJS/UMD format.)

JavaScript Data Format (JSDF)

JSDB tables are written into JavaScript Data Format (JSDF) files. A JSDF file is a plain JavaScript file in the form of an ECMAScript Module (ESM; es6 module) that comprises an append-only transaction log which creates the table in memory. For our example, it looks like this:

export const _ = [ { name: `Aral`, age: 43 }, { name: `Laura`, age: 34 } ];
_[1]['age'] = 33;
_[2] = { name: `Oskar`, age: 8 };
_[2]['name'] = `Osky`;

It’s just JavaScript!

A JSDF file is just JavaScript. Specifically, it is an ECMAScript Module (ESM; es6 module).

The first line is a single assignment/export of all the data that existed in the table when it was created or last loaded.

Any changes to the table made during the last session that it was open are written, one statement per line, starting with the second line.
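Because a JSDF file is only a declaration followed by mutation statements, replaying it amounts to running those statements in order. The sketch below is an illustration, not how JSDB itself loads tables: it drops the export keyword and uses new Function() (an assumption made so the example fits in a single script) to evaluate the four lines from the example and return the resulting table. The replay() helper is hypothetical.

```javascript
// The four statements from the example table, minus the `export` keyword
// (which is only valid at the top level of a module).
const statements = [
  'const _ = [ { name: `Aral`, age: 43 }, { name: `Laura`, age: 34 } ];',
  "_[1]['age'] = 33;",
  '_[2] = { name: `Oskar`, age: 8 };',
  "_[2]['name'] = `Osky`;"
]

// Hypothetical helper: run the statements in order and hand back the table.
function replay (lines) {
  return new Function(`${lines.join('\n')}\nreturn _`)()
}

const people = replay(statements)

// The table ends up with three records: Laura's age corrected to 33 and
// the third record renamed to Osky.
console.log(people)
```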

Since a JSDF file is a plain ECMAScript Module, you can simply import it as a module in Node.js or even load it in the browser using a script tag with type="module".

For example, create an index.html file with the following content in the same folder as the other script and serve it locally using Site.js and you will see the data printed out in your browser:

<h1>People</h1>
<ul id='people'></ul>

<script type="module">
  import { _ as people } from '/db/people.js'

  const peopleList = document.getElementById('people')

  people.forEach(person => {
    const li = document.createElement('li')
    li.innerText = `${person.name} (${person.age} years old)`
    peopleList.appendChild(li)
  })
</script>

Note: This is version 2.0 of the JSDF format. Version 1.0 of the format was used in the earlier (CommonJS) version of JSDB and contained a UMD-style declaration. Please use the jsdf-1.0 branch if that’s what you’d prefer but that branch will see no further development. Migrating from version 1.0 to 2.0 is simple but is not handled automatically for you by JSDB for performance reasons. For a basic example, see examples/jsdf-version-1.0-to-version-2.0-migration.

Supported and unsupported data types.

Just because it’s JavaScript, it doesn’t mean that you can throw anything into JSDB and expect it to work.

Supported data types

Number
Boolean
String
Object
Array
Date
Symbol
Custom data types (see below).

Additionally, null and undefined values will be persisted as-is.

Security note regarding strings

Strings are automatically sanitised to escape backticks, backslashes, and template placeholder tokens to avoid arbitrary code execution via JavaScript injection attacks.
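To make the idea concrete, here is a minimal sketch of that kind of escaping. It is not JSDB’s actual code: the escapeForTemplateLiteral() helper is hypothetical and only shows why the three escapes together keep a string inside the template literal it is written into.

```javascript
// Hypothetical helper: escape the three things that can break out of a
// template literal. Backslashes go first so that the backslashes added by
// the later steps are not escaped a second time.
function escapeForTemplateLiteral (value) {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/`/g, '\\`')
    .replace(/\$\{/g, '\\${')
}

// A string that tries to close the literal and smuggle in code.
const hostile = 'Bobby`; console.log(${1 + 1}); `\\'

// Write the string out as a template literal and evaluate it again.
const statement = `return \`${escapeForTemplateLiteral(hostile)}\``
const roundTripped = new Function(statement)()

// The evaluated literal is the original string, character for character.
console.log(roundTripped === hostile) // true
```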

The relevant areas in the codebase are linked to below.

If you notice anything we’ve overlooked or if you have suggestions for improvements, please open an issue.

Custom data types

Custom data types (instances of your own classes) are also supported.

During serialisation, class information for custom data types will be persisted.

During deserialisation, if the class in question exists in memory, your object will be correctly initialised as an instance of that class. If the class does not exist in memory, your object will be initialised as a plain JavaScript object.

e.g.,

import JSDB from '@small-tech/jsdb'

class Person {
  constructor (name = 'Jane Doe') {
    this.name = name
  }
  introduceYourself () {
    console.log(`Hello, I’m ${this.name}.`)
  }
}

const db = JSDB.open('db')

// Initialise the people table if it doesn’t already exist.
if (!db.people) {
  db.people = [
    new Person('Aral'),
    new Person('Laura')
  ]
}

// Will always print out “Hello, I’m Laura.”
// (On the first run and on subsequent runs when the objects are loaded from disk.)
db.people[1].introduceYourself()

If you look in the created db/people.js file, this time you’ll see:

export const _ = [ Object.create(typeof Person === 'function' ? Person.prototype : {}, Object.getOwnPropertyDescriptors({ name: `Aral` })), Object.create(typeof Person === 'function' ? Person.prototype : {}, Object.getOwnPropertyDescriptors({ name: `Laura` })) ];

If you were to load the database in an environment where the Person class does not exist, you will get a regular object back.

To test this, you can run the following code:

import JSDB from '@small-tech/jsdb'
const db = JSDB.open('db')

// Prints out { name: 'Laura' }
console.log(db.people[1])

You can find these examples in the examples/custom-data-types folder of the source code.
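If you are curious why the Object.create() expression in the JSDF file gives you either a class instance or a plain object, the following standalone sketch isolates it. The revive() helper is hypothetical and takes the class as an argument; the JSDF line itself uses typeof directly on the class name, which is safe even when that name was never declared.

```javascript
class Person {
  constructor (name = 'Jane Doe') {
    this.name = name
  }
  introduceYourself () {
    return `Hello, I’m ${this.name}.`
  }
}

// Hypothetical helper mirroring the expression in the JSDF line: use the
// class prototype if the class is available, otherwise a plain object.
function revive (Class, properties) {
  const prototype = typeof Class === 'function' ? Class.prototype : {}
  return Object.create(prototype, Object.getOwnPropertyDescriptors(properties))
}

const withClass = revive(Person, { name: 'Laura' })
const withoutClass = revive(undefined, { name: 'Laura' })

console.log(withClass instanceof Person)           // true
console.log(withClass.introduceYourself())         // Hello, I’m Laura.
console.log(withoutClass instanceof Person)        // false
console.log(typeof withoutClass.introduceYourself) // undefined
```

Note that Object.create() does not run the constructor; the properties are copied onto the new object as they were saved.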

Unsupported data types

If you try to add an instance of an unsupported data type to a JSDB table, you will get a TypeError.

The following data types are currently unsupported but might be supported in the future:

Map (and WeakMap)
Set (and WeakSet)
Binary collections (ArrayBuffer, Float32Array, Float64Array, Int8Array, Int16Array, Int32Array, TypedArray, Uint8Array, Uint16Array, Uint32Array, and Uint8ClampedArray)

The following intrinsic objects are not supported as they don’t make sense to support:

Intrinsic objects (DataView, Function, Generator, Promise, Proxy, RegExp)
Error types (Error, EvalError, RangeError, ReferenceError, SyntaxError, TypeError, and URIError)

Important security note

JSDF is not a data exchange format.

Since JSDF is made up of JavaScript code that is evaluated at run time, you must only load JSDF files from domains that you own and control and have a secure connection to.

Do not load in JSDF files from third parties.

If you need a data exchange format, use JSON.

Rule of thumb:

JSON is a terrible format for a database but a great format for data exchange.
JSDF is a terrible format for data exchange but a great format for a JavaScript database.
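The difference is easy to demonstrate. In the sketch below, untrustedTable is a made-up stand-in for the contents of a table from a third party, and new Function() stands in for importing it (an assumption so that the example runs as a single script). Evaluating the table runs every statement in it, whereas JSON.parse() refuses anything that is not pure data.

```javascript
// A stand-in for a table from an untrusted source. (`export` is omitted
// so that the text can be evaluated with new Function().)
const untrustedTable = [
  'const _ = [ { name: `Mallory` } ];',
  'globalThis.sideEffectRan = true;'
].join('\n')

// Evaluating the table runs every statement in it, not just the data.
const table = new Function(`${untrustedTable}\nreturn _`)()
console.log(table[0].name, globalThis.sideEffectRan) // Mallory true

// JSON.parse(), by contrast, rejects anything that is not pure data.
let jsonRejected = false
try {
  JSON.parse(untrustedTable)
} catch (error) {
  jsonRejected = error instanceof SyntaxError
}
console.log(jsonRejected) // true
```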

JavaScript Query Language (JSQL)

In the browser-based example, above, you loaded the data in directly. When you do that, of course, you are not running it inside JSDB so you cannot update the data or use the JavaScript Query Language (JSQL) to query it.

To test out JSQL, open a Node.js command-line interface (run node) from the directory that your scripts are in and enter the following commands:

import JSDB from '@small-tech/jsdb'

// This will load test database with the people table we created earlier.
const db = JSDB.open('db')

// Let’s carry out a query that should find us Osky.
console.log(db.people.where('age').isLessThan(21).get())

Note that you can only run queries on arrays. Attempting to run them on plain or custom objects (that are not subclasses of Array) will result in a TypeError. Furthermore, queries only make sense when used on arrays of objects. Running a query on an array of simple data types will not throw an error but will return an empty result set.

For details, see the JSQL Reference section.

Compaction

When you load in a JSDB table, by default JSDB will compact the JSDF file.

Compaction is important for two reasons; during compaction:

Deleted data is actually deleted from disk. (Privacy.)
Old versions of updated data are actually removed. (Again, privacy.)

Compaction may thus also reduce the size of your tables.

Compaction is a relatively fast process but it does get uniformly slower as the size of your database grows (it has O(N) time complexity as the whole database is recreated).
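As a rough mental model (not JSDB’s actual implementation), compaction can be pictured as replaying the log and then writing the final state back out as a single statement. In this sketch the compact() helper is hypothetical, export is omitted, and JSON.stringify() stands in for JSDB’s own serialiser.

```javascript
// A small log in which one value is updated and one property is deleted.
const log = [
  'const _ = [ { name: `Aral`, age: 43 }, { name: `Laura`, age: 34 } ];',
  "_[1]['age'] = 33;",
  "delete _[0]['age'];"
]

// Hypothetical helper: replay the log, then serialise the resulting state
// as one statement. Every record is visited, hence the O(N) cost.
function compact (statements) {
  const table = new Function(`${statements.join('\n')}\nreturn _`)()
  return [`const _ = ${JSON.stringify(table)};`]
}

const compacted = compact(log)

// One line remains. The deleted age (43) and the superseded age (34)
// no longer appear anywhere in it.
console.log(compacted)
```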

You do have the option to override the default behaviour and keep all history. You might want to do this, for example, if you’re creating a web app that lets you create a drawing and you want to play the drawing back stroke by stroke, etc.

Now that you’ve loaded the file back, look at the ./db/people.js JSDF file again to see how it looks after compaction:

export const _ = [ { name: `Aral`, age: 43 }, { name: `Laura`, age: 33 }, { name: `Osky`, age: 8 } ];

Ah, that is neater. Laura’s record is created with the correct age and Oskar’s name is set to its final value from the outset. And it all happens on the first line, in a single assignment. Any new changes will, just as before, be added starting with the second line.

(You can find these examples in the examples/basic folder of the source code.)

Closing a database

Your database tables will be automatically closed if you exit your script. However, there might be times when you want to manually close a database (for example, to reopen it with different settings). In that case, you can call the asynchronous close() method on the database proxy.

Here’s what you’d do to close the database in the above example:

async function main () {
  // … 🠑 the earlier code from the example, above.

  await db.close()

  // The database and all of its tables are now closed.
  // It is now safe (and allowed) to reopen it.
}

main()

Working with JSON

As mentioned earlier, JSDB writes out its tables as append-only logs of JavaScript statements in what we call JavaScript Data Format (JSDF). This is not the same as JavaScript Object Notation (JSON).

JSON is not a good format for a database but it is excellent – not to mention ubiquitous – for its original use case of data exchange. You can easily find or export datasets in JSON format. And using them in JSDB is effortless. Here’s an example that you can find in the examples/json folder of the source code:

Given a JSON data file of spoken languages by country in the following format:

[
  {
    "country": "Aruba",
    "languages": [
      "Dutch",
      "English",
      "Papiamento",
      "Spanish"
    ]
  },
  {
    "etc.": "…"
  }
]

The following code will load in the file, populate a JSDB table with it, and perform a query on it:

import fs from 'fs'
import JSDB from '@small-tech/jsdb'

const db = JSDB.open('db')

// If the data has not been populated yet, populate it.
if (!db.countries) {
  const countries = JSON.parse(fs.readFileSync('./countries.json', 'utf-8'))
  db.countries = countries
}

// Query the data.
const countriesThatSpeakKurdish = db.countries.where('languages').includes('Kurdish').get()

console.log(countriesThatSpeakKurdish)

When you run it, you should see the following result:

[
  {
    country: 'Iran',
    languages: [
      'Arabic',    'Azerbaijani',
      'Bakhtyari', 'Balochi',
      'Gilaki',    'Kurdish',
      'Luri',      'Mazandarani',
      'Persian',   'Turkmenian'
    ]
  },
  {
    country: 'Iraq',
    languages: [ 'Arabic', 'Assyrian', 'Azerbaijani', 'Kurdish', 'Persian' ]
  },
  { country: 'Syria', languages: [ 'Arabic', 'Kurdish' ] },
  { country: 'Turkey', languages: [ 'Arabic', 'Kurdish', 'Turkish' ] }
]

The code for this example is in the examples/json folder of the source code.

Dispelling the magic and pointing out a couple of gotchas

Here are a couple of facts to dispel the magic behind what’s going on:

What we call a database in JSDB is just a regular directory on your file system.
Inside that directory, you can have zero or more tables.
A table is a JSDF file.
A JSDF file is an ECMAScript Module (ESM; es6 module) that exports a root data structure (either an object or an array) that may or may not contain data and a sequence of JavaScript statements that mutate it. It is an append-only transaction log that is compacted at load. JSDF files are valid JavaScript files and should import and run correctly under any JavaScript interpreter that supports ESM.
When you open a database, you get a Proxy instance back, not an instance of JSDB.
Similarly, when you reference a table or the data within it, you are referencing proxy objects, not the table instance or the data itself.

How the sausage is made

When you open a database, JSDB loads in any .js files it can find in your database directory. Doing so creates the data structures defined in those files in memory. Alongside, JSDB also creates a structure of proxies that mirrors the data structure and traps (captures) calls to get, set, or delete values. Every time you set or delete a value, the corresponding JavaScript statement is appended to your table on disk.
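The following is a minimal sketch of that idea, not JSDB’s implementation. The observe() helper is hypothetical, the statements are collected in an array instead of being appended to a file, and JSON.stringify() stands in for JSDB’s serialiser.

```javascript
// Hypothetical helper: wrap `target` in a proxy that records every set and
// delete as a JavaScript statement in `log`. `path` is the expression that
// reaches `target` from the table root (`_`).
function observe (target, path, log) {
  return new Proxy(target, {
    get (object, key) {
      const value = object[key]
      const isNested = typeof value === 'object' && value !== null
      // Wrap nested objects on the way out so changes to them are seen too.
      return isNested ? observe(value, `${path}[${JSON.stringify(key)}]`, log) : value
    },
    set (object, key, value) {
      object[key] = value
      // Array length changes follow from the element writes; skip them.
      if (!(Array.isArray(object) && key === 'length')) {
        log.push(`${path}[${JSON.stringify(key)}] = ${JSON.stringify(value)};`)
      }
      return true
    },
    deleteProperty (object, key) {
      delete object[key]
      log.push(`delete ${path}[${JSON.stringify(key)}];`)
      return true
    }
  })
}

const log = []
const people = observe([{ name: 'Aral', age: 43 }, { name: 'Laura', age: 34 }], '_', log)

people[1].age = 33
people.push({ name: 'Oskar', age: 8 })
delete people[0].age

// Prints one statement per change, in the order the changes were made.
console.log(log.join('\n'))
```

Note how push() needs no special handling: it ends up in the set trap like any other assignment.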

By calling the where() or whereIsTrue() methods, you start a query. Queries help you search for specific bits of data. They are implemented using the get traps in the proxy.

Gotchas and limitations

Given that a core goal for JSDB is to be transparent, you will mostly feel like you’re working with regular JavaScript collections (objects and arrays) instead of a database. That said, there are a couple of gotchas and limitations that arise from the use of proxies and the impedance mismatch between synchronous data manipulation in JavaScript and the asynchronous nature of file handling:

You can only have one copy of a database open at one time. Given that tables are append-only logs, having multiple streams writing to them would corrupt your tables. The JSDB class enforces this by forcing you to use the open() factory method to create or load in your databases.

You cannot reassign a value to your tables without first deleting them. Since assignment is a synchronous action and since we cannot safely replace the existing table on disk with a different one synchronously, you must first call the asynchronous delete() method on a table instance before assigning a new value for it on the database, thereby creating a new table.

async function main () {
  // … 🠑 the earlier code from the example, above.

  await db.people.delete()

  // The people table is now deleted and we can recreate it.

  // This is OK.
  db.people = [
    {name: 'Ed Snowden', age: 37}
  ]

  // This is NOT OK.
  try {
    db.people = [
      {name: 'Someone else', age: 100}
    ]
  } catch (error) {
    console.log('This throws as we haven’t deleted the table first.')
  }
}

main()

There are certain reserved words you cannot use in your data. This is a trade-off between usability and polluting the mirrored proxy structure. JSDB strives to keep reserved words to a minimum.

This is the full list:

	Reserved words
As table name	close
Property names in data	where, whereIsTrue, addListener, removeListener, delete, __table__

Note: You can use the __table__ property from any level of your data to get a reference to the table instance (JSTable instance) that it belongs to. This is mostly for internal use but it’s there if you need it.

Table events

You can listen for the following events on tables:

Event name	Description
persist	The table has been persisted to disk.
delete	The table has been deleted from disk.

Example

The following handler will get called whenever a change is persisted to disk for the people table:

db.people.addListener('persist', (table, change) => {
  console.log(`Table ${table.tableName} persisted change ${change.replace('\n', '')} to disk.`)
})

JSQL Reference

The examples in the reference all use the following random dataset. Note, I know nothing about cars, the tags are also arbitrary. Don’t @ me ;)

const cars = [
  { make: "Subaru", model: "Loyale", year: 1991, colour: "Fuscia", tags: ['fun', 'sporty'] },
  { make: "Chevrolet", model: "Suburban 1500", year: 2004, colour: "Turquoise", tags: ['regal', 'expensive'] },
  { make: "Honda", model: "Element", year: 2004, colour: "Orange", tags: ['fun', 'affordable'] },
  { make: "Subaru", model: "Impreza", year: 2011, colour: "Crimson", tags: ['sporty', 'expensive']},
  { make: "Hyundai", model: "Santa Fe", year: 2009, colour: "Turquoise", tags: ['sensible', 'affordable'] },
  { make: "Toyota", model: "Avalon", year: 2005, colour: "Khaki", tags: ['fun', 'affordable']},
  { make: "Mercedes-Benz", model: "600SEL", year: 1992, colour: "Crimson", tags: ['regal', 'expensive', 'fun']},
  { make: "Jaguar", model: "XJ Series", year: 2004, colour: "Red", tags: ['fun', 'expensive', 'sporty']},
  { make: "Isuzu", model: "Hombre Space", year: 2000, colour: "Yellow", tags: ['sporty']},
  { make: "Lexus", model: "LX", year: 1997, colour: "Indigo", tags: ['regal', 'expensive', 'AMAZING'] }
]

Starting a query (the `where()` method)

const carsMadeIn1991 = db.cars.where('year').is(1991).get()

The where() method starts a query.

You call it on a table reference. It takes a property name (string) as its only argument and returns a query instance.

On the returned query instance, you can call various operators like is() or startsWith().

Finally, to invoke the query you use one of the invocation methods: get(), getFirst(), or getLast().

The anatomy of a query.

Idiomatically, we chain the operator and invocation calls to the where call and write our queries out in a single line as shown above. However, you can split the three parts up, should you so wish. Here’s such an example, for academic purposes.

This starts the query and returns an incomplete query object:

const incompleteCarYearQuery = db.cars.where('year')

Once you call an operator on a query, it is considered complete:

const completeCarYearQuery = incompleteCarYearQuery.is(1991)

To execute a completed query, you can use one of the invocation methods: get(), getFirst(), or getLast().

Note that get() returns an array of results (which might be an empty array) while getFirst() and getLast() return a single result (which may be undefined).

const resultOfCarYearQuery = completeCarYearQuery.get()

Here are the three parts of a query shown together:

const incompleteCarYearQuery = db.cars.where('year')
const completeCarYearQuery = incompleteCarYearQuery.is(1991)
const resultOfCarYearQuery = completeCarYearQuery.get()

Again, idiomatically, we chain the operator and invocation calls to the where() call and write our queries out in a single line like this:

const carsMadeIn1991 = db.cars.where('year').is(1991).get()
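If it helps to see the three stages as plain objects, here is a small sketch. It is not how JSDB implements queries: the standalone where() function and the closures it returns are hypothetical, and only two operators are included.

```javascript
// Hypothetical stand-in for a table's where() method, working on a plain array.
function where (records, property) {
  // Complete query: nothing runs until an invocation method is called.
  const complete = predicate => {
    const run = () => records.filter(record => predicate(record[property]))
    return {
      get: () => run(),
      getFirst: () => run()[0],
      getLast: () => {
        const results = run()
        return results[results.length - 1]
      }
    }
  }

  // Incomplete query: it knows the property but has no operator yet.
  return {
    is: value => complete(actual => actual === value),
    isLessThan: value => complete(actual => actual < value)
  }
}

const cars = [
  { make: 'Subaru', model: 'Loyale', year: 1991 },
  { make: 'Honda', model: 'Element', year: 2004 },
  { make: 'Jaguar', model: 'XJ Series', year: 2004 }
]

const incompleteQuery = where(cars, 'year')
const completeQuery = incompleteQuery.is(2004)

console.log(completeQuery.get().length)             // 2
console.log(completeQuery.getFirst().make)          // Honda
console.log(completeQuery.getLast().make)           // Jaguar
console.log(incompleteQuery.is(1900).get())         // []
console.log(incompleteQuery.is(1900).getFirst())    // undefined
```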

Connectives (`and()` and `or()`)

You can chain conditions onto a query using the connectives and() and or(). Using a connective transforms a completed query back into an incomplete query awaiting an operator. e.g.,

const veryOldOrOrangeCars = db.cars.where('year').isLessThan(2000).or('colour').is('Orange').get()

Example

const carsThatAreFunAndSporty = db.cars.where('tags').includes('fun').and('tags').includes('sporty').get()

Result

[
  { make: "Subaru", model: "Loyale", year: 1991, colour: "Fuscia", tags: ['fun', 'sporty'] },
  { make: "Jaguar", model: "XJ Series", year: 2004, colour: "Red", tags: ['fun', 'expensive', 'sporty']},
]
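For comparison, here is what the two connective queries above mean, written as ordinary Array.prototype.filter() calls on a plain copy of the data set. This only makes the meaning of a single and() or or() concrete; how longer chains that mix the two are grouped is not covered by this sketch.

```javascript
const cars = [
  { make: 'Subaru', model: 'Loyale', year: 1991, colour: 'Fuscia', tags: ['fun', 'sporty'] },
  { make: 'Chevrolet', model: 'Suburban 1500', year: 2004, colour: 'Turquoise', tags: ['regal', 'expensive'] },
  { make: 'Honda', model: 'Element', year: 2004, colour: 'Orange', tags: ['fun', 'affordable'] },
  { make: 'Subaru', model: 'Impreza', year: 2011, colour: 'Crimson', tags: ['sporty', 'expensive'] },
  { make: 'Hyundai', model: 'Santa Fe', year: 2009, colour: 'Turquoise', tags: ['sensible', 'affordable'] },
  { make: 'Toyota', model: 'Avalon', year: 2005, colour: 'Khaki', tags: ['fun', 'affordable'] },
  { make: 'Mercedes-Benz', model: '600SEL', year: 1992, colour: 'Crimson', tags: ['regal', 'expensive', 'fun'] },
  { make: 'Jaguar', model: 'XJ Series', year: 2004, colour: 'Red', tags: ['fun', 'expensive', 'sporty'] },
  { make: 'Isuzu', model: 'Hombre Space', year: 2000, colour: 'Yellow', tags: ['sporty'] },
  { make: 'Lexus', model: 'LX', year: 1997, colour: 'Indigo', tags: ['regal', 'expensive', 'AMAZING'] }
]

// where('tags').includes('fun').and('tags').includes('sporty')
const funAndSporty = cars.filter(car => car.tags.includes('fun') && car.tags.includes('sporty'))

// where('year').isLessThan(2000).or('colour').is('Orange')
const veryOldOrOrange = cars.filter(car => car.year < 2000 || car.colour === 'Orange')

console.log(funAndSporty.map(car => car.model))    // [ 'Loyale', 'XJ Series' ]
console.log(veryOldOrOrange.map(car => car.model)) // [ 'Loyale', 'Element', '600SEL', 'LX' ]
```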

Custom queries (`whereIsTrue()`)

For more complex queries – for example, if you need to include parenthetical grouping – you can compose your JSQL by hand. To do so, you call the whereIsTrue() method on a table instead of the where() method and you pass it a full JSQL query string. A completed query is returned.

When writing your custom JSQL query, prefix property names with valueOf. (including the dot), as in valueOf.tags.

Note that custom queries are inherently less safe as you are responsible for sanitising input at the application level to avoid leaking sensitive data. (Basic sanitisation to avoid arbitrary code execution is handled for you by JSDB.) Make sure you read through the Security considerations with queries section if you’re going to use custom queries.

Example

const customQueryResult = db.cars.whereIsTrue(`(valueOf.tags.includes('fun') && valueOf.tags.includes('affordable')) || (valueOf.tags.includes('regal') && valueOf.tags.includes('expensive'))`).get()

Result

[
  { make: 'Chevrolet', model: 'Suburban 1500', year: 2004, colour: 'Turquoise', tags: [ 'regal', 'expensive' ] },
  { make: 'Honda', model: 'Element', year: 2004, colour: 'Orange', tags: [ 'fun', 'affordable' ] },
  { make: 'Toyota', model: 'Avalon', year: 2005, colour: 'Khaki', tags: [ 'fun', 'affordable' ] },
  { make: 'Mercedes-Benz', model: '600SEL', year: 1992, colour: 'Crimson', tags: [ 'regal', 'expensive', 'fun' ] },
  { make: 'Lexus', model: 'LX', year: 1997, colour: 'Indigo', tags: [ 'regal', 'expensive', 'AMAZING' ] }
]
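One way to picture the valueOf. prefix (an assumption about the mechanism, offered only as a mental model) is that the query text becomes the body of a function whose single parameter is named valueOf and receives each record in turn. The standalone whereIsTrue() helper below is hypothetical, works on a trimmed copy of the data set, and leaves out the sanitisation that JSDB performs.

```javascript
// The data set from above, trimmed to the properties the query uses.
const cars = [
  { make: 'Subaru', tags: ['fun', 'sporty'] },
  { make: 'Chevrolet', tags: ['regal', 'expensive'] },
  { make: 'Honda', tags: ['fun', 'affordable'] },
  { make: 'Subaru', tags: ['sporty', 'expensive'] },
  { make: 'Hyundai', tags: ['sensible', 'affordable'] },
  { make: 'Toyota', tags: ['fun', 'affordable'] },
  { make: 'Mercedes-Benz', tags: ['regal', 'expensive', 'fun'] },
  { make: 'Jaguar', tags: ['fun', 'expensive', 'sporty'] },
  { make: 'Isuzu', tags: ['sporty'] },
  { make: 'Lexus', tags: ['regal', 'expensive', 'AMAZING'] }
]

// Hypothetical helper: compile the query text into a predicate whose only
// parameter is named valueOf, then filter the records with it.
// (No sanitisation here: never feed this untrusted input.)
function whereIsTrue (records, query) {
  const predicate = new Function('valueOf', `return (${query})`)
  return records.filter(record => predicate(record))
}

const query = `(valueOf.tags.includes('fun') && valueOf.tags.includes('affordable')) || (valueOf.tags.includes('regal') && valueOf.tags.includes('expensive'))`

const result = whereIsTrue(cars, query)

console.log(result.map(car => car.make))
// [ 'Chevrolet', 'Honda', 'Toyota', 'Mercedes-Benz', 'Lexus' ]
```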

Relational operators

is(), isEqualTo(), equals()
isNot(), doesNotEqual()
isGreaterThan()
isGreaterThanOrEqualTo()
isLessThan()
isLessThanOrEqualTo()

Note: operators listed on the same line are aliases and may be used interchangeably (e.g., isNot() and doesNotEqual()).

Example (is)

const carWhereYearIs1991 = db.cars.where('year').is(1991).getFirst()

Result (is)

{ make: "Subaru", model: "Loyale", year: 1991, colour: "Fuscia", tags: ['fun', 'sporty'] }

Example (isNot)

const carsWhereYearIsNot1991 = db.cars.where('year').isNot(1991).get()

Result (isNot)

[
  { make: "Chevrolet", model: "Suburban 1500", year: 2004, colour: "Turquoise", tags: ['regal', 'expensive'] },
  { make: "Honda", model: "Element", year: 2004, colour: "Orange", tags: ['fun', 'affordable'] },
  { make: "Subaru", model: "Impreza", year: 2011, colour: "Crimson", tags: ['sporty', 'expensive']},
  { make: "Hyundai", model: "Santa Fe", year: 2009, colour: "Turquoise", tags: ['sensible', 'affordable'] },
  { make: "Toyota", model: "Avalon", year: 2005, colour: "Khaki", tags: ['fun', 'affordable'] },
  { make: "Mercedes-Benz", model: "600SEL", year: 1992, colour: "Crimson", tags: ['regal', 'expensive', 'fun'] },
  { make: "Jaguar", model: "XJ Series", year: 2004, colour: "Red", tags: ['fun', 'expensive', 'sporty'] },
  { make: "Isuzu", model: "Hombre Space", year: 2000, colour: "Yellow", tags: ['sporty'] },
  { make: "Lexus", model: "LX", year: 1997, colour: "Indigo", tags: ['regal', 'expensive', 'AMAZING'] }
]

Note how getFirst() returns the first item (in this case, an object) whereas get() returns the whole array of results.

The other relational operators work the same way and as expected.

String subset comparison operators

startsWith()
endsWith()
includes()
startsWithCaseInsensitive()
endsWithCaseInsensitive()
includesCaseInsensitive()

The string subset comparison operators carry out case sensitive string subset comparisons. They also have case insensitive versions that you can use.

Example (`includes()` and `includesCaseInsensitive()`)

const result1 = db.cars.where('make').includes('su').get()
const result2 = db.cars.where('make').includes('SU').get()
const result3 = db.cars.where('make').includesCaseInsensitive('SU').get()

Result 1

[
  { make: "Isuzu", model: "Hombre Space", year: 2000, colour: "Yellow", tags: ['sporty']}
]

Since includes() is case sensitive, the string 'su' matches only the make Isuzu.

Result 2

[]

Again, since includes() is case sensitive, the string 'SU' doesn’t match the make of any of the entries.

Result 3

[
  { make: "Subaru", model: "Loyale", year: 1991, colour: "Fuscia", tags: ['fun', 'sporty'] },
  { make: "Subaru", model: "Impreza", year: 2011, colour: "Crimson", tags: ['sporty', 'expensive'] },
  { make: "Isuzu", model: "Hombre Space", year: 2000, colour: "Yellow", tags: ['sporty'] }
]

Here, includesCaseInsensitive('SU') matches both the Subaru and Isuzu makes due to the case-insensitive string comparison.
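If you want to double-check which makes a given fragment should match, the same comparisons can be written with String.prototype.includes() on a plain list of the makes in the data set. This assumes the operators behave like their plain JavaScript counterparts; the two helper functions are hypothetical.

```javascript
// The make of every car in the data set, in order.
const makes = [
  'Subaru', 'Chevrolet', 'Honda', 'Subaru', 'Hyundai',
  'Toyota', 'Mercedes-Benz', 'Jaguar', 'Isuzu', 'Lexus'
]

// Case-sensitive comparison.
const matchingMakes = fragment =>
  makes.filter(make => make.includes(fragment))

// Case-insensitive comparison: lower-case both sides first.
const matchingMakesCaseInsensitive = fragment =>
  makes.filter(make => make.toLowerCase().includes(fragment.toLowerCase()))

console.log(matchingMakes('su'))                // [ 'Isuzu' ]
console.log(matchingMakes('SU'))                // []
console.log(matchingMakesCaseInsensitive('SU')) // [ 'Subaru', 'Subaru', 'Isuzu' ]
```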

Array inclusion check operator

includes()

The includes() array inclusion check operator can also be used to check for the existence of an object (or scalar value) in an array.

Note that the includesCaseInsensitive() string operator cannot be used for this purpose and will throw an error if you try.

Example (`includes()` array inclusion check):

const carsThatAreRegal = db.cars.where('tags').includes('regal').get()

Result (`includes()` array inclusion check)

[
  { make: "Chevrolet", model: "Suburban 1500", year: 2004, colour: "Turquoise", tags: ['regal', 'expensive'] },
  { make: "Mercedes-Benz", model: "600SEL", year: 1992, colour: "Crimson", tags: ['regal', 'expensive', 'fun']},
  { make: "Lexus", model: "LX", year: 1997, colour: "Indigo", tags: ['regal', 'expensive', 'AMAZING'] }
]

Security considerations with queries

JSDB (as of version 1.1.0) attempts to carry out basic sanitisation of your queries for you to avoid Little Bobby Tables.

That said, you should still sanitise your queries at the application level, if you’re using custom queries via whereIsTrue(). Basic sanitisation will protect you from arbitrary code execution but it will not protect you from, for example, someone passing || valueOf.admin === true to attempt to access private information. You should be vigilant in your sanitisation when using whereIsTrue() and stick to using where() whenever possible.

The current sanitisation strategy is two-fold and is executed at time of query execution:

1. Remove dangerous characters (statement terminators, etc.):
   - Semi-colon (;)
   - Backslash (\)
   - Backtick (`)
   - Plus sign (+)
   - Dollar sign ($)
   - Curly brackets ({})
   Reasoning: remove symbols that could be used to create valid code so that if our sieve (see below) doesn’t catch an attempt, the code will throw an error when executed, which we can catch and handle.
2. Use a sieve to remove expected input. If our sieve contains any leftover material, we immediately return an empty result set without executing the query.

During query execution, if the query throws (due to an injection attempt that was neutralised at Step 1 but made it through the sieve), we simply catch the error and return an empty result set.
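Here is a rough sketch of the character-removal step and the catch-and-return-empty behaviour. It is not JSDB’s code: both helpers are hypothetical, the sieve is left out entirely, and new Function() is an assumed stand-in for however the query is actually evaluated.

```javascript
// Character removal as listed above: drop ; \ ` + $ { and } from the query.
function stripDangerousCharacters (query) {
  return query.replace(/[;\\`+${}]/g, '')
}

// Hypothetical evaluator: a query that fails to compile or run yields an
// empty result set instead of an error. (The sieve is not sketched here.)
function runQuery (records, query) {
  try {
    const predicate = new Function('valueOf', `return (${stripDangerousCharacters(query)})`)
    return records.filter(record => predicate(record))
  } catch (error) {
    return []
  }
}

const people = [
  { name: 'Aral', age: 43 },
  { name: 'Osky', age: 8 }
]

// A well-formed query runs as expected.
const minors = runQuery(people, 'valueOf.age < 21')

// With the semi-colon removed, the injected statement is a syntax error,
// so the attempt ends in an empty result set.
const injected = runQuery(people, 'valueOf.age < 21; process.exit()')

// Valid expressions still get through: this one returns every record,
// which is why application-level sanitisation is still needed.
const everyone = runQuery(people, 'valueOf.age < 21 || true')

console.log(minors.length, injected.length, everyone.length) // 1 0 2
```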

The relevant areas in the codebase are linked to below.

If you notice anything we’ve overlooked or if you have suggestions for improvements, please open an issue.

Performance characteristics

Reads and writes both have O(1) time complexity.
Reads are fast (they take a fraction of a millisecond and are about an order of magnitude slower than direct memory reads).
Writes are fast (on the order of a couple of milliseconds in tests on a dev machine).
Initial table load time and full table write/compaction times are O(N) and increase linearly as your table size grows.

Suggested limits

Break up your database into multiple tables whenever possible.
Keep your table sizes under 100MB.

Hard limits

Your database size is limited by available memory.
If your database size is larger than ~1.3GB, you should start your node process with a larger heap size than the default (~1.4GB). E.g., to set aside 8GB of heap space:

node --max-old-space-size=8192 why-is-my-database-so-large-i-hope-im-not-doing-anything-shady.js

Memory Usage

The reason JSDB is fast is because it keeps the whole database in memory. Also, to provide a transparent persistence and query API, it maintains a parallel object structure of proxies. This means that the amount of memory used will be multiples of the size of your database on disk and exhibits O(N) memory complexity.

Initial load time and full table write/compaction both exhibit O(N) time complexity.

For example, here’s just one sample from a development laptop using the simple performance example in the examples/performance folder of the source code, which creates random records that are around 2KB in size each:

Number of records	Table size on disk	Memory used	Initial load time	Full table write/compaction time
1,000	2.5MB	15.8MB	85ms	45ms
10,000	25MB	121.4MB	845ms	400ms
100,000	250MB	1.2GB	11 seconds	4.9 seconds

(The baseline app used about 14.6MB without any table in memory. The memory used column subtracts that from the total reported memory so as not to skew the smaller dataset results.)

Note: For tables > 500MB, compaction is turned off and a line-by-line streaming load strategy is implemented. If you foresee your tables being this large, you (a) are probably doing something nasty (and won’t mind me pointing it out if you’re not) and (b) should turn off compaction from the start for best performance. Keeping compaction off from the start will decrease initial table load times. Again, don’t use this to invade people’s privacy or profile them.

Development

Please open an issue before starting to work on pull requests.

Testing

Clone this repository.
npm i
npm test

For code coverage, run npm run coverage.

Note: lib/LineByLine.js is excluded from coverage as it is the inlined version of n-readlines. The tests for it can be found as part of that library.

Also, as JSDB has no runtime dependencies, you only have to run npm i if you want to run the tests or make a distribution build.

Building

You can now build a 32KB distribution version of the module:

npm run build

Find the distribution build in dist/index.js.

To run the tests on the distribution build, use npm run test-dist.

Ideas for post 2.0.0.

Implement transactions.
╰─ Ensure 100% code coverage for transactions.
╰─ Document transactions.
╰─ Add transaction example.
Implement indices.
╰─ Ensure 100% code coverage for indices.
╰─ Document indices.
╰─ Add indices example.

Related projects, inspiration, etc.

Like this? Fund us!

Small Technology Foundation is a tiny, independent not-for-profit.

We exist in part thanks to patronage by people like you. If you share our vision and want to support our work, please become a patron or donate to us today and help us continue to exist.

Copyright

Append-only means no row delete?

I like jsdb! So thanks :-)

I couldn't find in the docs or source how to delete a row in an array/table. Is that a feature or am I looking in the wrong place? I see updating works but I'm not sure how to delete a row.

I could resort to deleting the table and replacing it with the table-without-that-row but that seems brutish :-)

Thanks again!

opened by nielsbom 4

Serialization error when object property name starts with @

Assuming a JSDB table called local_people has been initialized, the following code:

db.local_people.push({
  id: id,
  actor: {
    "@context": ["https://www.w3.org/ns/activitystreams"]
  }
});

results in creating the following local_people.js:

globalThis._ = [  ];
(function () { if (typeof define === 'function' && define.amd) { define([], globalThis._); } else if (typeof module === 'object' && module.exports) { module.exports = globalThis._ } else { globalThis.local_people = globalThis._ } })();
_[0] = { id: `REDACTED`, actor: { @context: [ `https://www.w3.org/ns/activitystreams` ] } };

Upon stopping the Site.js server and restarting it, the data fails to reload into memory:

_[0] = { id: `REDACTED`, actor: { @context: [ `https://www.w3.org/ns/activitystreams` ] } };
                                  ^

SyntaxError: Invalid or unexpected token
    at wrapSafe (internal/modules/cjs/loader.js:1070:16)
    at Module._compile (internal/modules/cjs/loader.js:1120:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1176:10)
    at Module.load (internal/modules/cjs/loader.js:1000:32)
    at Function.Module._load (internal/modules/cjs/loader.js:899:14)
    at Module.require (internal/modules/cjs/loader.js:1042:19)
    at require (internal/modules/cjs/helpers.js:77:18)
    at JSTable.load (/usr/local/bin/node_modules/@small-tech/jsdb/lib/JSTable.js:161:20)
    at new JSTable (/usr/local/bin/node_modules/@small-tech/jsdb/lib/JSTable.js:58:12)
    at /usr/local/bin/node_modules/@small-tech/jsdb/lib/JSDB.js:113:30
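
One way to avoid this class of error is to quote any property key that is not a plain identifier when the statement is written out. The following is a minimal sketch under that assumption; serialiseKey and serialise are hypothetical helpers written for illustration and are not taken from the JSDB source.

```javascript
// Emit a property key so that it is always valid inside an object literal:
// plain ASCII identifiers are written as-is, anything else is quoted.
const identifier = /^[A-Za-z_$][A-Za-z0-9_$]*$/

function serialiseKey (key) {
  return identifier.test(key) ? key : JSON.stringify(key)
}

// Recursively serialise arrays, objects and primitives to JavaScript source.
function serialise (value) {
  if (Array.isArray(value)) return `[ ${value.map(serialise).join(', ')} ]`
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value)
      .map(([key, inner]) => `${serialiseKey(key)}: ${serialise(inner)}`)
    return `{ ${entries.join(', ')} }`
  }
  return JSON.stringify(value)
}

const row = { id: 'abc', actor: { '@context': ['https://www.w3.org/ns/activitystreams'] } }
const line = `_[0] = ${serialise(row)};`
// line is: _[0] = { id: "abc", actor: { "@context": [ "https://www.w3.org/ns/activitystreams" ] } };
```

With the key quoted, the statement parses again when the table file is evaluated.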

opened by DJSundog 4

Benchmark

Hi,

Congratulations, it's a great project. Do you plan to make a benchmark where you compare the speed of jsdb with others (e.g., nedb, lokijs, dexie, etc.)?
out of scope

opened by icebob 3

Save to local storage?

For a browser-only use of JSDB, how can I use JSDB without a server and store to localStorage?

Further to this, can I still use a server but implement a "local-storage-first" solution that allows offline use?

opened by johnoscott 1

Enable storage and retrieval of custom and intrinsic objects

Use case

Currently, a JavaScript Data Format file stores data in transactions that atomically correspond to a single update of the in-memory data graph. It does so by persisting primitive value sets as they are and by serialising the set action of complex values using JSON.stringify().

This is perfectly fine for arrays and plain objects but does not work for custom objects (class instances) or for intrinsic objects like Date instances.

Example:

const JSDB = require('@small-tech/jsdb')

class Person {
  constructor (name = 'Jane Doe') {
    this.name = name
  }
  introduceYourself () {
    console.log(`Hello, I’m ${this.name}.`)
  }
}

const db = JSDB.open('db')

// Initialise the people table if it doesn’t already exist.
if (!db.people) {
  db.people = [
    new Person('Aral'),
    new Person('Laura')
  ]
}

The first time you run this, since the Person instances are in memory as created (not as loaded from the database), you will be able to call the introduceYourself() method:

db.people.where('name').is('Laura').getFirst().introduceYourself()

// Outputs: Hello, I’m Laura.

However, on subsequent runs, you will get an error:

TypeError: db.people.where(...).is(...).getFirst(...).introduceYourself is not a function

This is because the loaded-in objects are plain objects, not instances of the Person class. They are currently stored as below:

_[0] = JSON.parse(`{"name":"Aral"}`);
_[1] = JSON.parse(`{"name":"Laura"}`);

Proposed solution

During the set handler of the data proxy, we can check whether an object is a plain object (obj.constructor.name === 'Object') or a custom one (all others, except for arrays, etc. See notes on intrinsic objects at end) and write out the code to recreate it (if the class exists) when the table is loaded back in. The statements are written out in a single statement/transaction. If this is not possible via chaining, etc., it should be implemented as an IIFE:

e.g., for the example above, the relevant transactions in the table would be:

_[0] = Object.create(typeof Person === 'function' ? Person.prototype : {}, Object.getOwnPropertyDescriptors(JSON.parse(`{"name":"Aral"}`)));
_[1] = Object.create(typeof Person === 'function' ? Person.prototype : {}, Object.getOwnPropertyDescriptors(JSON.parse(`{"name":"Laura"}`)));

Update: Note, the above will not work with transactions due to the JSON serialisation of the own properties. Instead, we must create a bare instance first and then populate its properties recursively as we do with regular objects and arrays, etc.
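
The revised approach from the update (create a bare instance, then populate it) might look roughly like the sketch below. It assumes the class is in scope when the table is loaded; revive and its scope parameter are hypothetical names used only for illustration, and introduceYourself returns its greeting here so the result is easy to inspect.

```javascript
class Person {
  constructor (name = 'Jane Doe') {
    this.name = name
  }
  introduceYourself () {
    return `Hello, I’m ${this.name}.`
  }
}

// Hypothetical helper: create a bare instance (without running the
// constructor) and then populate its own properties one by one, the way
// individual property sets could be replayed from the log.
// Falls back to a plain object if the class is not available.
function revive (className, properties, scope) {
  const Class = scope[className]
  const prototype = typeof Class === 'function' ? Class.prototype : Object.prototype
  const instance = Object.create(prototype)
  for (const [key, value] of Object.entries(properties)) {
    instance[key] = value
  }
  return instance
}

const laura = revive('Person', { name: 'Laura' }, { Person })
const unknown = revive('Missing', { name: 'Aral' }, { Person })
```

Here laura is a Person again and can introduce herself, while unknown degrades to a plain object that still carries its data.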

Prerequisites

Transactions

Other effects

I was considering implementing fast compaction for smaller tables (i.e., ideally under ~65MB, where string handling begins to slow down, or 1GB, the upper limit of string size) where the whole table would be compacted using a synchronous JSON.stringify() into a single serialised JSON string as part of a single JSON.parse() statement. This would provide an orders of magnitude speed increase in compaction for smaller tables over what we do now, which is to replay and persist the in-memory object graph.

However, if we implement support for custom objects, we won’t be able to implement this, because a plain JSON round trip discards the class (prototype) information that custom objects need.
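
The single-statement compaction idea can be sketched as follows. This is an illustration under stated assumptions, not JSDB’s implementation: compactToStatement and loadStatement are hypothetical names, and the JSON text is embedded as a quoted string literal (via a second JSON.stringify() call) rather than a backtick template, so that escaping is handled for us.

```javascript
// Serialise a whole table as one JavaScript statement of the form
//   _ = JSON.parse("…");
// The inner JSON.stringify() produces the JSON text; the outer one turns
// that text into a safely escaped, single-line JavaScript string literal.
function compactToStatement (table) {
  const json = JSON.stringify(table)
  return `_ = JSON.parse(${JSON.stringify(json)});`
}

// Load a compacted statement back into memory by evaluating it.
// (Only ever do this with files you wrote yourself.)
function loadStatement (statement) {
  return new Function(`let _; ${statement} return _;`)()
}

const people = [{ name: 'Aral' }, { name: 'Laura', bio: 'line one\nline two' }]
const statement = compactToStatement(people)
const restored = loadStatement(statement)
```

Everything in restored comes back as plain objects and arrays, which is exactly why this shortcut conflicts with custom object support.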

Note that some intrinsic objects, like Date, will require special casing.

e.g., We can detect and implement support for Date like this:

_[0] = new Date('<date string from (new Date()).toJSON()>')

I need to do more research into other collections, etc., like Map, Set, TypedArray, ArrayBuffer, etc.
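
As a rough sketch of the special-casing described above, the set handler could first classify a value and then emit a matching statement. kindOf and dateStatement are hypothetical helpers; the classifier checks the prototype rather than constructor.name, which is a swapped-in technique and not what the proposal specifies.

```javascript
// Hypothetical classifier: decide how a value would need to be persisted.
function kindOf (value) {
  if (value === null || typeof value !== 'object') return 'primitive'
  if (Array.isArray(value)) return 'array'
  if (value instanceof Date) return 'date'
  const prototype = Object.getPrototypeOf(value)
  if (prototype === Object.prototype || prototype === null) return 'plain'
  return 'custom'
}

// Emit a statement that recreates a Date when the table is loaded.
function dateStatement (index, date) {
  return `_[${index}] = new Date(${JSON.stringify(date.toJSON())});`
}

const statement = dateStatement(0, new Date('2020-10-19T12:00:00.000Z'))
// statement is: _[0] = new Date("2020-10-19T12:00:00.000Z");
```

Map, Set, typed arrays and similar would each need their own branch in a classifier like this; they are left out because the research on them is still open.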

Mirrors: internal issue #7
opened by aral 1

Multiline strings cause crash on read

Reproduction

import JSDB from '@small-tech/jsdb'

const test = JSDB.open('test')

test.table = {
  s: `a
  multiline
  string`
}

console.log(test.table)

What happens

First run: OK (no bug in creation). Output:

   💾    ❨JSDB❩ No database found at /home/aral/sandbox/jsdb-multiline-text-test/test; creating it.
   💾    ❨JSDB❩ Creating and persisting table table…
   💾    ❨JSDB❩  ╰─ Created and persisted table in 1.224 ms.
   💾    ❨JSDB❩ Table table initialised.
{ s: 'a\n  multiline\n  string' }

Second run: crash with error:

   💾    ❨JSDB❩ Loading table table…
   💾    ❨JSDB❩  ╰─ Loading table synchronously.
undefined:1
_ = { 's': `a
             

SyntaxError: Unexpected end of input
    at JSTable.load (file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSTable.js:164:12)
    at new JSTable (file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSTable.js:57:12)
    at file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSDB.js:112:30
    at Array.forEach (<anonymous>)
    at JSDB.loadTables (file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSDB.js:109:61)
    at new JSDB (file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSDB.js:82:12)
    at Function.open (file:///home/aral/sandbox/jsdb-multiline-text-test/node_modules/@small-tech/jsdb/lib/JSDB.js:43:38)
    at file:///home/aral/sandbox/jsdb-multiline-text-test/index.js:3:19
    at ModuleJob.run (internal/modules/esm/module_job.js:169:25)
    at async Loader.import (internal/modules/esm/loader.js:177:24)
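
The error output above suggests that the first physical line of the statement was evaluated on its own. The sketch below reproduces that situation without JSDB and shows one possible remedy, escaping strings with JSON.stringify() so each statement stays on one line; this is an assumption about a workable fix, not a description of how JSDB resolved it.

```javascript
const value = { s: 'a\n  multiline\n  string' }

// Naive serialisation: wrapping the raw string in backticks leaves real
// newlines in the output, so one statement spans several lines.
const naive = `_ = { 's': \`${value.s}\` };`

// Escaped serialisation: JSON.stringify() writes each newline as the two
// characters \ and n, so the statement stays on a single line.
const escaped = `_ = { 's': ${JSON.stringify(value.s)} };`

const naiveLines = naive.split('\n').length     // 3
const escapedLines = escaped.split('\n').length // 1

// Evaluating the single-line statement restores the original string.
const restored = new Function(`let _; ${escaped} return _;`)()
```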

opened by aral 0

Isn't using .js files as a DB on servers quite dangerous and prone to causing vulnerabilities?

It all sounds nice to store data directly in .js files; however, isn't this quite dangerous when you think about it?

The problem

As soon as you use it for sensitive data of some kind (which almost always happens when you use a database, since even aggregations of seemingly public data can be sensitive), it gets somewhat critical. (And BTW, you talk about privacy in your Readme, so clearly you consider the fact that there could be private data in the DBs.) The technical issue is simple: by default, when .js files are placed on a webserver, anyone can download them, as .js files are of course intended to be served to browsers, not used like databases.

This is not at all theoretical. After all, data leaks are very common, and simply downloading files from well-known locations on random webservers is shockingly common, so much so that tools exist to automate it. See also the list of breaches and linked news articles here, which all have that in common. People forget .git directories, saved nano/vim backup copies of files, database dumps, etc., so it is an easy mistake to make. See, e.g., the talk also linked there. I could go on and talk about a vulnerability in a project I am affiliated with, where we ended up using a config.ini.php file instead of config.ini, because the former is guaranteed to be parsed by most web servers by default and thus won't expose sensitive data. (That's also why most public PHP-based CMSes do it like this…)

Solution?

It may be obvious to you that you should not use the project like this and should keep the database files private, but stating the obvious is always a good idea when it comes to security. (Keeping .ini files private is also obvious, yet they still get exposed through accidental misconfiguration, etc.) As such, at least document that the files should be placed in a directory that is not accessible by any webserver. Maybe you can also do some further hardening with user permissions, SELinux or webserver configuration. E.g., an arguably ugly workaround would be to save your files as database.js.php, which makes them inaccessible if they are on a PHP-compatible webserver. Otherwise, a different file extension that webservers do not serve, such as .jsdb, may be used, though of course the protective effect depends on how webservers treat such an extension by default. (After all, remember people are lazy… manually adjusting webserver configs is likely not being done. 😉)

Also, of course, people may keep this in mind when working with JSDB directly, or at least when they know they are working with it. However, when it is used inside third-party projects like your Site.js, that risk can soon be forgotten.

opened by rugk 1

Feature request: update row

Use case:

If we assume a JSDB table initialized as

db.cars = [
  { make: "Mazda", model: "Miato", color: "red", year: 1992 },
  { make: "GMC", model: "Timmy", color: "black", year: 1987 },
  { make: "Ford", model: "F5", color: "blue", year: 1987 }
]

I would like to be able to update data in the table, either by querying for row index/indices, like

let editRowIndex = db.cars.where('model').is('Miato').getFirstIndex()
// editRowIndex now equals 0

let manyRowIndices = db.cars.where('year').is(1987).getIndices()
// manyRowIndices now equals [1, 2]

or by having access to an update method, like

let result = db.cars.where('model').is('F5').getFirst().update({ model: 'F150' });
// result is true if data was successfully persisted after merging updated fields into db.cars[2]

let multiresult = db.cars.where('year').isLessThan(2000).get().update({ needsSmogTest: true });
// multiresult is true if all matching rows were successfully persisted

As an alternative to the second version, I'd be fine with having the update(newData) method hanging off the query pre-get (so, db.cars.where('model').is('F5').updateFirst(newData) and the like).
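
For comparison, the behaviour requested above can be approximated with standard array methods. The sketch below uses a plain array rather than a JSDB table, so it says nothing about persistence; on a real table, the assumption would be that these property sets pass through the data proxy like any other update.

```javascript
const cars = [
  { make: 'Mazda', model: 'Miato', color: 'red', year: 1992 },
  { make: 'GMC', model: 'Timmy', color: 'black', year: 1987 },
  { make: 'Ford', model: 'F5', color: 'blue', year: 1987 }
]

// Index of the first matching row (-1 if none).
const editRowIndex = cars.findIndex(car => car.model === 'Miato')

// Indices of all matching rows.
const manyRowIndices = cars.flatMap((car, index) => car.year === 1987 ? [index] : [])

// Merge updated fields into the first matching row.
const f5 = cars.find(car => car.model === 'F5')
if (f5 !== undefined) Object.assign(f5, { model: 'F150' })

// Merge updated fields into every matching row.
cars
  .filter(car => car.year < 2000)
  .forEach(car => Object.assign(car, { needsSmogTest: true }))
```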

Thoughts?

Transactions
A transaction is a series of updates on a table that are either all persisted together or not persisted at all. Transactions give a table atomicity.

In JSDB, we write out our tables in JavaScript Data Format (JSDF), which is a transaction log in pure JavaScript. (A “transaction” in this log, confusingly, refers to a single update, not an atomic batch of updates.)

To implement transactions (atomicity for batch changes) in JSDF, we can wrap multiple changes into a single-line immediately-invoked function expression (IIFE) and add that to our append-only log.

e.g.,

// ↑ other statements
(function () { _[0].name = 'Aral'; _[0].age = 43; })()

Without a transaction, the changes would be written out like this instead:

// ↑ other statements
_[0].name = 'Aral';
_[0].age = 43;
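
A minimal sketch of how such a single-line IIFE could be generated and replayed, assuming the individual update statements are already available as strings. toTransaction is a hypothetical helper, not part of JSDB.

```javascript
// Wrap several update statements into one single-line IIFE so they are
// appended to the log (and later evaluated) as a unit.
function toTransaction (statements) {
  return `(function () { ${statements.join(' ')} })()`
}

const line = toTransaction([`_[0].name = 'Aral';`, `_[0].age = 43;`])
// line is: (function () { _[0].name = 'Aral'; _[0].age = 43; })()

// Replaying the line against an in-memory table applies both changes.
const _ = [{}]
new Function('_', line)(_)
```

Because the whole batch is a single line, a partially written transaction is an incomplete line rather than a valid prefix of updates.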

In keeping with as transparent an API as possible, here is one way we can implement this functionality:

db.people[0].__transaction1__.name = 'Aral'
db.people[0].__transaction1__.age = 43
await db.people.__transaction1__.persist()
console.log('Transaction persisted.')

At this point we can be reasonably certain that transaction1 has persisted to disk.

Note 1: transaction1 is an arbitrary name. You can use any string that’s a valid identifier, but it must start and end with two underscores, and you must use the same identifier for every change that you want to include in your transaction.

Note 2: atomic transactions will only be implemented at the table-level (not the database-level, which would be a far messier undertaking and can be avoided by designing your tables accordingly).

Mirrors internal issue: #1
opened by aral 0