Node starter kit for semantic-search. Uses Mighty Inference Server with Qdrant vector search.

Overview

Mighty Starter

This project provides a complete, working semantic search application built from Mighty Inference Server, Qdrant vector search, and an example Node.js Express application.

Background

Getting started with vector search or semantic search is an enormous undertaking. It can take weeks or months to assemble various ad-hoc technologies before you have something that can actually run in production. Also, much of this tooling and knowledge is scattered around and it is difficult to know what to look for.

A vector search project involves understanding and tuning at all layers of the following:

  • Content acquisition and structure
  • Base model selection and testing
  • Inference runtime and model conversion
  • Vector search engine choice and config
  • ETL (extract-transform-load) glue
  • Search UI and API
  • Docker composition

This application was created as a starter kit for all of the above, and provides a forkable Docker Compose setup that you can quickly adapt to your own needs.

You can use it to scrape a website's sitemap and have a complete search application up and running in minutes, or use the provided example content from https://outdoors.stackexchange.com (CC BY 4.0).

How to use it

Prerequisites

You'll need Docker and a recent version of Node.js (tested on v16).

There is zero Python required! The entire stack runs on Node and Rust technologies, and is very lightweight and fast.

The project has been tested and works well on Linux and on Intel Macs. Mac M1 support is in development.

Installation

Simply clone this repository, then start the servers with docker compose up (or docker compose up -d to run in detached mode).

Example Outdoors content

With the Docker containers running, you can infer and index the outdoors content by running ./index.sh.
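As a rough illustration of what the indexing step involves, the sketch below shapes already-embedded documents into batched upsert requests for Qdrant's REST API. It is not the code in index.sh or tools/load.js. The collection name "outdoors", the base URL with Qdrant's default port 6333, the document fields (id, title, url), and the batch size are all assumptions made for this example.

```javascript
// Split an array into batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Pair each document with its embedding to form a Qdrant point.
// Qdrant point ids must be unsigned integers or UUIDs.
// The payload fields here are hypothetical.
function toPoints(docs, vectors) {
  if (docs.length !== vectors.length) {
    throw new Error("docs and vectors must have the same length");
  }
  return docs.map((doc, i) => ({
    id: doc.id,
    vector: vectors[i],
    payload: { title: doc.title, url: doc.url },
  }));
}

// Build one PUT request per batch for Qdrant's points upsert endpoint.
// baseUrl is an assumption: Qdrant on its default REST port.
function buildUpsertRequests(collection, points, batchSize = 64, baseUrl = "http://localhost:6333") {
  return chunk(points, batchSize).map((batch) => ({
    url: `${baseUrl}/collections/${encodeURIComponent(collection)}/points?wait=true`,
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ points: batch }),
  }));
}
```

Each returned object can be handed to whichever HTTP client you prefer. Batching keeps individual request bodies small when a site has thousands of pages.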

Index a website from a sitemap!

It's also possible to scrape and index any website that has a sitemap.xml file available. Simply run ./website.sh [name] [https://example.com/sitemap.xml], where [name] is any name you choose and the example sitemap URL is replaced with your own.
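To give a feel for the first step of sitemap-based indexing, here is a minimal sketch that pulls page URLs out of a sitemap.xml document. How website.sh actually does this is not shown in this README, so the function name and the regex-based approach are assumptions for illustration. It does not handle CDATA sections or nested sitemap index files.

```javascript
// Hypothetical helper: extract the page URLs listed in a sitemap.xml string.
function extractSitemapUrls(xml) {
  const urls = [];
  // Each page in a sitemap is listed inside a <loc>...</loc> element.
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  for (const match of xml.matchAll(locPattern)) {
    // Sitemaps are XML, so ampersands in URLs arrive escaped as &amp;.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

const sample = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>`;

console.log(extractSitemapUrls(sample));
```

A production crawler would more likely use a real XML parser, but the idea is the same: collect the URLs, fetch each page, then pass the extracted text on for inference.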

What's inside?

  • Qdrant is used as the vector search engine
  • Mighty Inference Server is used for inference with the sentence-transformers model https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1
  • Node.js and Express form the basic Search UI and API
  • mighty-batch is used for text processing and ETL
  • Some simple scripts (index.sh, website.sh, tools/load.js) orchestrate scraping and loading
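The way these pieces fit together at query time can be sketched as follows: the query text is embedded, and the resulting vector is sent to Qdrant's REST search endpoint. This is not the application's actual code. The collection name "outdoors" and the Qdrant URL (default port 6333) are assumptions, and the embedding call and HTTP client are passed in as functions so the sketch does not guess at Mighty's API.

```javascript
// Assumed for illustration: Qdrant on its default REST port.
const QDRANT_URL = "http://localhost:6333";

// Build the REST request for Qdrant's point search endpoint.
function buildSearchRequest(collection, vector, limit = 10) {
  return {
    url: `${QDRANT_URL}/collections/${encodeURIComponent(collection)}/points/search`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector, limit, with_payload: true }),
  };
}

// Query flow: embed the query text, then ask Qdrant for its nearest neighbours.
// `embed(text)` must resolve to an array of numbers (384 of them for
// multi-qa-MiniLM-L6-cos-v1); `send(request)` must resolve to Qdrant's parsed
// JSON response. Both are injected, hypothetical dependencies.
async function semanticSearch(queryText, { embed, send, collection = "outdoors", limit = 10 }) {
  const vector = await embed(queryText);
  const response = await send(buildSearchRequest(collection, vector, limit));
  // Qdrant wraps the scored points in a `result` array.
  return response.result.map((hit) => ({
    id: hit.id,
    score: hit.score,
    payload: hit.payload,
  }));
}
```

On Node 18 or later, `send` could be a thin wrapper around the built-in fetch; on Node 16 you would need an HTTP client library instead.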

How fast is it?

Once indexed and running, the search is very fast. Requests return on average in about 20 to 50ms on a recent laptop.

Content indexing can take a little while (the bottleneck is usually the crawling of the site). On average you can expect between 200ms and 500ms per page. Inference can be accelerated by using a Mighty cluster and mighty-batch. See this post for more details: https://max.io/blog/encoding-the-federal-register.html
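To turn the per-page figure into a rough total, multiply it by the number of pages in the sitemap. The helper below is only a back-of-the-envelope sketch; it assumes pages are processed one after another at the 200ms to 500ms rate quoted above, and the function name is made up for this example.

```javascript
// Per-page indexing cost quoted above, in milliseconds.
const MS_PER_PAGE = { min: 200, max: 500 };

// Estimate total indexing time for a site, assuming sequential processing.
function estimateIndexingSeconds(pageCount) {
  return {
    minSeconds: (pageCount * MS_PER_PAGE.min) / 1000,
    maxSeconds: (pageCount * MS_PER_PAGE.max) / 1000,
  };
}

console.log(estimateIndexingSeconds(1000)); // { minSeconds: 200, maxSeconds: 500 }
```

So a 1,000-page site would take somewhere between about 3 and 8 minutes under these assumptions.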
