Overview

hyperbeedeebee

A MongoDB-like database built on top of Hyperbee with support for indexing

WIP: There may be breaking changes in the indexing before the v1.0.0 release; don't use this for anything you aren't prepared to migrate in the future.

Based on this design

Usage

npm i --save hyperbeedeebee
const Hyperbee = require('hyperbee')
// This module handles networking and storage of hypercores for you
const SDK = require('hyper-sdk')
const {DB} = require('hyperbeedeebee')

const {Hypercore} = await SDK()

// Initialize a hypercore for loading data
const core = new Hypercore('example')
// Initialize the Hyperbee you want to use for storing data and indexes
const bee = new Hyperbee(core)

// Create a new DB
const db = new DB(bee)

// Open up a collection of documents and insert a new document
const doc = await db.collection('example').insert({
  hello: 'World!'
})

// doc._id gets set to an ObjectId if you don't specify it
console.log(doc)

// Iterate through data as it's loaded (streaming)
// Usually faster and more memory / CPU efficient
for await (let doc of db.collection('example').find({
  clout: {
    $gt: 9000
  },
})) {
  console.log(doc)
}

// Create an index for properties in documents
// This drastically speeds up queries and is necessary for sorting by fields
await db.collection('example').createIndex('createdAt')

// Get all results in an array
// Can skip some results and limit total for pagination
const killbots = await db.collection('example')
  .find({type: 'killbot'})
  .sort('createdAt', -1)
  .skip(30)
  .limit(100)

// Get a single document that matches the query
const eggbert = await db.collection('example').findOne({name: 'Eggbert'})

Data Types

HyperbeeDeeBee uses MongoDB's BSON data types for encoding data. You can import the bson library bundled with HyperbeeDeeBee using the following code:

const { BSON } = require('hyperbeedeebee')

From there you can access any of the following data types:

Binary,
Code,
DBRef,
Decimal128,
Double,
Int32,
Long,
UUID,
Map,
MaxKey,
MinKey,
ObjectId,
BSONRegExp,
BSONSymbol,
Timestamp

TODO:

  • Sketch up API
  • Insert (with BSON encoding)
  • Find all docs
  • Find by _id
  • Find by field eq (no index)
  • Find by array field includes
  • Find by number field $gt/$gte/$lt/$lte
    • Numbers
    • Dates
  • Find using $in operator
  • Find using $all operator
  • Find using $exists operator
  • Index fields
  • Sort by index (with find)
  • Indexed find by field $eq
  • Flatten array for indexes
  • Get field values from index key without getting the doc
  • Find on fields that aren't indexed
  • Indexed find for $exists
  • Indexed find by number field
  • Indexed find for $in
  • Indexed find for $all
  • Hint API (specify index to use)
  • Test if iterators clean up properly
  • More efficient support for $gt/$gte/$lt/$lte indexes
  • More efficient support for $all indexes
  • More efficient support for $in indexes
  • Detect when data isn't available from peers and emit an error of some sort instead of waiting indefinitely.

Important Differences From MongoDB

  • There is a single writer for a hyperbee and multiple readers
  • The indexing means that readers only need to download small subsets of the full dataset (if you index intelligently)
  • No way to do "projections", so keep in mind that you're always downloading the full document to disk
  • Only a subset of the find() API is implemented; there's no Map-Reduce API and no $or/$and since they're difficult to optimize
  • You can only sort by indexed fields; otherwise there would be no difference from loading all the data and sorting it in memory
  • Fully open source under AGPL-3.0 and with mostly MIT dependencies.

Indexing considerations:

Indexes are super important to make your applications snappy and to reduce the overall CPU/Bandwidth/Storage usage of queries.

  • If you do a search by fields that aren't indexed, you'll end up downloading the full collection (this is potentially really slow)
  • The order of fields in the index matters; they're used to create an ordered key based on the values
  • If you want to sort by a field, make sure it's the first field in an index
  • You can have indexed fields before the sorted field if they are only used for $eq operations, because the database can turn them into a prefix to speed up the search.
  • If an index cannot be found to satisfy a sort, the query will fail.
  • If you're using $gt/$lt/$gte/$lte in your query, they will perform best when the same considerations as for sorting are applied.
  • If the fields in the index can be used to rule out a document as a match, you can avoid loading more documents and do fewer overall comparisons on the data.
  • If your field is a Unicode string that contains 0x00 bytes, sorting might break due to the way BSON serializes Unicode strings. Proceed with caution!
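
The field-order rules above can be shown with a toy model. Real index keys are BSON-encoded, but the principle is the same: keys are built by concatenating the indexed field values in order, so an $eq field placed before the sort field becomes a fixed key prefix, and entries under that prefix are already sorted. The key format below is invented for illustration only.

```javascript
// Toy illustration of why index field order matters. Real index keys are
// BSON-encoded; here we fake it with zero-padded strings so that plain
// lexicographic comparison mimics an ordered B-tree traversal.
function toyKey (fields, doc) {
  // Pad each value to a fixed width so keys compare field-by-field.
  return fields.map(f => String(doc[f]).padStart(8, '0')).join('\x00')
}

const docs = [
  { type: 'killbot', createdAt: 3 },
  { type: 'hugbot', createdAt: 1 },
  { type: 'killbot', createdAt: 2 }
]

// Index on ['type', 'createdAt']: an $eq on `type` becomes a key prefix,
// and within that prefix the entries are already ordered by `createdAt`.
const index = docs
  .map(doc => ({ key: toyKey(['type', 'createdAt'], doc), doc }))
  .sort((a, b) => (a.key < b.key ? -1 : 1))

const prefix = 'killbot'.padStart(8, '0') + '\x00'
const killbots = index.filter(e => e.key.startsWith(prefix)).map(e => e.doc)
console.log(killbots) // matching docs come back already sorted by createdAt
```

Note that an index on `['createdAt', 'type']` could not serve this query the same way: `type` would no longer be a prefix, so every key would have to be scanned.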
Comments
  • Account for BSON encoding lower bits from int64

    int64 data (like Dates, or Longs) is currently encoded with the lower bits first (e.g. 4 lowest bytes, then 4 highest bytes). source

    This is pretty much the exact opposite of what we need.

    This breaks ordered searching in indexes. We've glossed over this by having dates created quickly enough in tests that usually only the lowest of the low bits would change within the span of the test.

    I think we're going to need to ditch BSON for the key generation and jump into custom logic. 🤷

    This'll require a breaking change.

    bug 
    opened by RangerMauve 7
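
The ordering problem described in the issue above comes down to byte order: a B-tree compares keys byte-by-byte, and that only agrees with numeric order when the most significant byte comes first. A small sketch with Node buffers, contrasting little-endian (what BSON writes for int64) with big-endian:

```javascript
// Byte-wise key comparison only matches numeric order when the most
// significant byte comes first (big-endian). BSON writes int64s
// little-endian, which is why index ordering breaks.
function encodeLE (n) {
  const buf = Buffer.alloc(8)
  buf.writeBigUInt64LE(n)
  return buf
}

function encodeBE (n) {
  const buf = Buffer.alloc(8)
  buf.writeBigUInt64BE(n)
  return buf
}

const bigger = 0x00000001_00000000n
const smaller = 0x00000000_ffffffffn

// Little-endian: the low bytes come first, so byte order disagrees
// with numeric order and the "bigger" key sorts first.
console.log(Buffer.compare(encodeLE(bigger), encodeLE(smaller))) // -1 (wrong)

// Big-endian: byte order agrees with numeric order.
console.log(Buffer.compare(encodeBE(bigger), encodeBE(smaller))) // 1 (correct)
```

This is why the fix requires custom key encoding (and a breaking change): any ordered index over int64-backed values such as Dates or Longs needs big-endian bytes in the key.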
  • Add new index key generation code using CBOR

    Fixes #6

    Added a new index version v2.0 type which uses CBOR to generate index keys.

    Had to do some funky stuff to make sure the key prefix is correct for $eq and $gt queries, since CBOR does length prefixing for its lists (not sure why that wasn't an issue for BSON 🤔)

    opened by RangerMauve 2
  • Simple find() receives error

    Simple code gives following error:

    code:

    const doc = await db.collection('example').insert({cid: cid})

    const docs = await db.collection('example').find({cid: cid})

    error:

    (node:12899) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'equals' of undefined
        at compareEq (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:595:30)
        at queryCompare (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:547:17)
        at matchesQuery (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:533:10)
        at processDoc (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:434:15)
        at Cursor.[Symbol.asyncIterator] (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:503:48)
        at processTicksAndRejections (internal/process/task_queues.js:95:5)
        at async Cursor.then (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:383:24)
    (Use `node --trace-warnings ...` to show where the warning was created)
    
    opened by moskalyk 2
  • Blobs support

    Opening this to track the idea.

    This is a feature that could live here or at a higher layer, but one thing I found useful in CTZN was support for blobs which could be attached to records. That way you could share things like profile pics and post attachments under the same "logical db."

    In CTZN, I implemented this in basically the same way hyperdrive does it - I had a second hypercore which contained the blobs, and then included records in the bee which indexed into the blobs. Hyperbee's metadata has a field for a contentFeed which we could use for this.

    The challenges to something like this:

    • It breaks from the standard hyperbee URLs a bit, because - wherever you place the blob pointer record - you want that to be able to resolve to the underlying blob. Something like hyper://key/my-record/1/blobs/1 which outputs the binary.
    • On record deletes, you need to be sure to delete the attached blobs as well, or else they'll accumulate indefinitely.

    An interesting aside -- it would make a lot of sense to gzip blobs when they're stored. Then the API can specify whether to gunzip when the blob is requested. Not only does this save space, but if you don't gunzip server side, the browser can do it on render and you're minimizing bytes over the wire as well as on disk.

    opened by pfrazee 2
  • Multiwrite through multifeed?

    Would it be possible to use https://github.com/kappa-db/multifeed or something similar to add multi-writer support? Would that break any key assumptions in hyperbeedeebee?

    opened by rjmackay 2
  • Unable to sort by index that is not at field position 0

    It appears that if you create a new index with an array of fields, and then try to sort that collection by any field that is not at index 0, the query will break.

    await db.collection('example').createIndex(['createdAt', 'example'])
    
    await db.collection('example').insert({ example: 1, createdAt: new Date() })
    await db.collection('example').insert({ example: 2, createdAt: new Date() })
    await db.collection('example').insert({ example: 3, createdAt: new Date() })
    
    const docs1 = await db.collection('example').find().sort('createdAt', -1) // Works
    const docs2 = await db.collection('example').find().sort('example', -1) // This does not work
    
    opened by hexadecible 1
  • Ability to support Autobee

    Hello Mauve, here is a first draft. I still need to write Autobee-specific test cases. Please have a look when you get the chance and review; I welcome any critiques. Thanks, Jamps

    opened by Jampard 0
  • Support Autobee, second attempt

    Nice. Thank you for looking into this.

    Is the purpose of having this new "base" parameter everywhere to overcome issues with autobee.sub not working correctly? Would it make sense to instead require autobee to have that method implemented correctly so that we don't need to add a special case for it?

    It would be nice if autobee exposed an API similar to Hyperbee so that it could be passed into the hyperbeedeebee constructor without extra effort.

    Also, mind removing the yarn.lock and adding it to the gitignore? Generally lockfiles are needed for applications rather than reusable libraries.

    Hello Mauve, Thanks a lot for the comment. I didn't think it was possible to reach the same API because of the sub function, but I gave it a go and it worked! All the changes are now in the autodeebee.js file (ex SimpleAutobee). I still have more tests to add, specifically for Autobase. I tried it in my app and it works, so it sounds good.

    Cheers, Jamps

    opened by Jampard 3
Owner
Into distributed systems, moving data between peers, and mixed reality. Fediverse: @[email protected]