Overview

hyperbeedeebee

A MongoDB-like database built on top of Hyperbee with support for indexing

WIP: There may be breaking changes in the indexing before the v1.0.0 release; don't use this for anything you aren't prepared to migrate in the future.

Based on this design

Usage

npm i --save hyperbeedeebee
const Hyperbee = require('hyperbee')
// This module handles networking and storage of hypercores for you
const SDK = require('hyper-sdk')
const {DB} = require('hyperbeedeebee')

const {Hypercore} = await SDK()

// Initialize a hypercore for loading data
const core = new Hypercore('example')
// Initialize the Hyperbee you want to use for storing data and indexes
const bee = new Hyperbee(core)

// Create a new DB
const db = new DB(bee)

// Open up a collection of documents and insert a new document
const doc = await db.collection('example').insert({
  hello: 'World!'
})

// doc._id gets set to an ObjectId if you don't specify it
console.log(doc)

// Iterate through data as it's loaded (streaming)
// Usually faster and more memory / CPU efficient
for await (let doc of db.collection('example').find({
  clout: {
    $gt: 9000
  },
})) {
  console.log(doc)
}

// Create an index for properties in documents
// This drastically speeds up queries and is necessary for sorting by fields
await db.collection('example').createIndex('createdAt')

// Get all results in an array
// Can skip some results and limit total for pagination
const killbots = await db.collection('example')
  .find({type: 'killbot'})
  .sort('createdAt', -1)
  .skip(30)
  .limit(100)

// Get a single document that matches the query
const eggbert = await db.collection('example').findOne({name: 'Eggbert'})

Data Types

HyperbeeDeeBee uses MongoDB's BSON data types for encoding data. You can import the bson library bundled with HyperbeeDeeBee using the following code:

const { BSON } = require('hyperbeedeebee')

From there you can access any of the following data types:

Binary,
Code,
DBRef,
Decimal128,
Double,
Int32,
Long,
UUID,
Map,
MaxKey,
MinKey,
ObjectId,
BSONRegExp,
BSONSymbol,
Timestamp

TODO:

  • Sketch up API
  • Insert (with BSON encoding)
  • Find all docs
  • Find by _id
  • Find by field eq (no index)
  • Find by array field includes
  • Find by number field $gt/$gte/$lt/$lte
    • Numbers
    • Dates
  • Find using $in operator
  • Find using $all operator
  • Find using $exists operator
  • Index fields
  • Sort by index (with find)
  • Indexed find by field $eq
  • Flatten array for indexes
  • Get field values from index key without getting the doc
  • Find on fields that aren't indexed
  • Indexed find for $exists
  • Indexed find by number field
  • Indexed find for $in
  • Indexed find for $all
  • Hint API (specify index to use)
  • Test if iterators clean up properly
  • More efficient support for $gt/$gte/$lt/$lte indexes
  • More efficient support for $all indexes
  • More efficient support for $in indexes
  • Detect when data isn't available from peers and emit an error of some sort instead of waiting indefinitely.

Important Differences From MongoDB

  • There is a single writer for a hyperbee and multiple readers
  • The indexing means that readers only need to download small subsets of the full dataset (if you index intelligently)
  • No way to do "projections", so keep in mind that you're always downloading the full document to disk
  • Only a subset of the find() API is implemented; there's no Map-Reduce API and no $or/$and since they're difficult to optimize
  • You can only sort by indexed fields; otherwise there would be no difference from loading all the data and sorting it in memory
  • Fully open source under AGPL-3.0 and with mostly MIT dependencies.

Indexing considerations:

Indexes are super important to make your applications snappy and to reduce the overall CPU/Bandwidth/Storage usage of queries.

  • If you do a search by fields that aren't indexed, you'll end up downloading the full collection (this is potentially really slow)
  • The order of fields in the index matters; they're used to create an ordered key based on the values
  • If you want to sort by a field, make sure it's the first field in an index
  • You can have indexed fields before the sorted field if they are only used for $eq operations, because the database can turn them into a prefix to speed up the search.
  • If an index cannot be found to satisfy a sort, the query will fail.
  • If you're using $gt/$lt/$gte/$lte in your query, they will perform best when the same considerations as for sorting are applied.
  • If the fields in the index can be used to rule out a document as a match, you can avoid loading more documents and do fewer overall comparisons on the data.
  • If your field is a Unicode string that contains 0x00 bytes, sorting might break due to the way BSON serializes Unicode strings. Proceed with caution!
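
The field-order rules above can be shown with a toy model. Real index keys are BSON-encoded, but the principle is the same: keys are built by concatenating the indexed field values in order, so an $eq field placed before the sort field becomes a fixed key prefix, and entries under that prefix are already sorted. The key format below is invented for illustration only.

```javascript
// Toy illustration of why index field order matters. Real index keys are
// BSON-encoded; here we fake it with zero-padded strings so that plain
// lexicographic comparison mimics an ordered B-tree traversal.
function toyKey (fields, doc) {
  // Pad each value to a fixed width so keys compare field-by-field.
  return fields.map(f => String(doc[f]).padStart(8, '0')).join('\x00')
}

const docs = [
  { type: 'killbot', createdAt: 3 },
  { type: 'hugbot', createdAt: 1 },
  { type: 'killbot', createdAt: 2 }
]

// Index on ['type', 'createdAt']: an $eq on `type` becomes a key prefix,
// and within that prefix the entries are already ordered by `createdAt`.
const index = docs
  .map(doc => ({ key: toyKey(['type', 'createdAt'], doc), doc }))
  .sort((a, b) => (a.key < b.key ? -1 : 1))

const prefix = 'killbot'.padStart(8, '0') + '\x00'
const killbots = index.filter(e => e.key.startsWith(prefix)).map(e => e.doc)
console.log(killbots) // matching docs come back already sorted by createdAt
```

Note that an index on `['createdAt', 'type']` could not serve this query the same way: `type` would no longer be a prefix, so every key would have to be scanned.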
Comments
  • Account for BSON encoding lower bits from int64

    int64 data (like Dates, or Longs) is currently encoded with the lower bits first (e.g. 4 lowest bytes, then 4 highest bytes). source

    This is pretty much the exact opposite of what we need.

    This breaks ordered searching in indexes. We've glossed over this by having dates created quickly enough in tests that usually only the lowest of the low bits would change within the span of the test.

    I think we're going to need to ditch BSON for the key generation and jump into custom logic. 🤷

    This'll require a breaking change.

    bug 
    opened by RangerMauve 7
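
The ordering problem described in the issue above comes down to byte order: a B-tree compares keys byte-by-byte, and that only agrees with numeric order when the most significant byte comes first. A small sketch with Node buffers, contrasting little-endian (what BSON writes for int64) with big-endian:

```javascript
// Byte-wise key comparison only matches numeric order when the most
// significant byte comes first (big-endian). BSON writes int64s
// little-endian, which is why index ordering breaks.
function encodeLE (n) {
  const buf = Buffer.alloc(8)
  buf.writeBigUInt64LE(n)
  return buf
}

function encodeBE (n) {
  const buf = Buffer.alloc(8)
  buf.writeBigUInt64BE(n)
  return buf
}

const bigger = 0x00000001_00000000n
const smaller = 0x00000000_ffffffffn

// Little-endian: the low bytes come first, so byte order disagrees
// with numeric order and the "bigger" key sorts first.
console.log(Buffer.compare(encodeLE(bigger), encodeLE(smaller))) // -1 (wrong)

// Big-endian: byte order agrees with numeric order.
console.log(Buffer.compare(encodeBE(bigger), encodeBE(smaller))) // 1 (correct)
```

This is why the fix requires custom key encoding (and a breaking change): any ordered index over int64-backed values such as Dates or Longs needs big-endian bytes in the key.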
  • Add new index key generation code using CBOR

    Fixes #6

    Added a new index version v2.0 type which uses CBOR to generate index keys.

    Had to do some funky stuff to make sure the key prefix is correct for $eq and $gt queries, since CBOR does length prefixing for its lists (not sure why that wasn't an issue for BSON 🤔)

    opened by RangerMauve 2
  • Simple find() receives error

    Simple code gives following error:

    code:

    const doc = await db.collection('example').insert({cid: cid})

    const docs = await db.collection('example').find({cid: cid})

    error:

    (node:12899) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'equals' of undefined
        at compareEq (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:595:30)
        at queryCompare (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:547:17)
        at matchesQuery (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:533:10)
        at processDoc (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:434:15)
        at Cursor.[Symbol.asyncIterator] (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:503:48)
        at processTicksAndRejections (internal/process/task_queues.js:95:5)
        at async Cursor.then (/Users/mgrok/Projects/unicode/reed/threads-server/node_modules/hyperbeedeebee/index.js:383:24)
    (Use `node --trace-warnings ...` to show where the warning was created)
    
    opened by moskalyk 2
  • Blobs support

    Opening this to track the idea.

    This is a feature that could live here or at a higher layer, but one thing I found useful in CTZN was support for blobs which could be attached to records. That way you could share things like profile pics and post attachments under the same "logical db."

    In CTZN, I implemented this in basically the same way hyperdrive does it - I had a second hypercore which contained the blobs, and then included records in the bee which indexed into the blobs. Hyperbee's metadata has a field for a contentFeed which we could use for this.

    The challenges to something like this:

    • It breaks from the standard hyperbee URLs a bit, because - wherever you place the blob pointer record - you want that to be able to resolve to the underlying blob. Something like hyper://key/my-record/1/blobs/1 which outputs the binary.
    • On record deletes, you need to be sure to delete the attached blobs as well, or else they'll accumulate indefinitely.

    An interesting aside -- it would make a lot of sense to gzip blobs when they're stored. Then the API can specify whether to gunzip when the blob is requested. Not only does this save space, but if you don't gunzip server side, the browser can do it on render and you're minimizing bytes over the wire as well as on disk.

    opened by pfrazee 2
  • Multiwrite through multifeed?

    Would it be possible to use https://github.com/kappa-db/multifeed or something similar to add multi-writer support? Would that break any key assumptions in hyperbeedeebee?

    opened by rjmackay 2
  • Unable to sort by index that is not at field position 0

    It appears that if you create a new index with an array of fields, and then try to sort that collection by any field that is not at index 0, the query will break.

    await db.collection('example').createIndex(['createdAt', 'example'])
    
    await db.collection('example').insert({ example: 1, createdAt: new Date() })
    await db.collection('example').insert({ example: 2, createdAt: new Date() })
    await db.collection('example').insert({ example: 3, createdAt: new Date() })
    
    const docs1 = await db.collection('example').find().sort('createdAt', -1) // Works
    const docs2 = await db.collection('example').find().sort('example', -1) // This does not work
    
    opened by hexadecible 1
  • Ability to support Autobee

    Hello Mauve, here is a first draft. I still need to write Autobee-specific test cases. Please have a look when you get the chance and review; I welcome any critiques. Thanks, Jamps

    opened by Jampard 0
  • Support Autobee, second attempt

    Nice. Thank you for looking into this.

    Is the purpose of having this new "base" parameter everywhere to overcome issues with autobee.sub not working correctly? Would it make sense to instead require autobee to have that method implemented correctly so that we don't need to add a special case for it?

    It would be nice if autobee exposed an API similar to Hyperbee so that it could be passed into the hyperbeedeebee constructor without extra effort.

    Also, mind removing the yarn.lock and adding it to the gitignore? Generally lockfiles are needed for applications rather than reusable libraries.

    Hello Mauve, Thanks a lot for the comment. I didn't think it was possible to reach the same API because of the sub function, but I gave it a go and it worked! All the changes are now in the autodeebee.js file (ex SimpleAutobee). I still have more tests to add, specifically for Autobase. I tried it in my app and it works, so it sounds good.

    Cheers, Jamps

    opened by Jampard 3
Owner
Into distributed systems, moving data between peers, and mixed reality. Fediverse: @[email protected]