nanosearch

A tiny search engine.

Suitable for in-browser use, nanosearch provides n-gram-based search results.
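The README doesn't spell out how nanosearch builds or scores its n-grams. As a hypothetical illustration of the general idea only, the sketch below splits text into character n-grams (lengths 2 to 3, an arbitrary choice) and scores a document by the fraction of the query's n-grams it contains. `ngrams` and `overlapScore` are made-up names, not part of nanosearch.

```javascript
// Hypothetical sketch: character n-grams of lengths minN..maxN for each word.
// This is NOT nanosearch's tokenizer; it only illustrates n-gram matching.
function ngrams(text, minN = 2, maxN = 3) {
  const grams = new Set();
  // Lowercase and split on anything that isn't a Unicode letter or digit.
  for (const word of text.toLowerCase().split(/[^\p{L}\p{N}]+/u)) {
    for (let n = minN; n <= maxN; n++) {
      for (let i = 0; i + n <= word.length; i++) {
        grams.add(word.slice(i, i + n));
      }
    }
  }
  return grams;
}

// Score = fraction of the query's n-grams that also appear in the document.
function overlapScore(query, doc) {
  const q = ngrams(query);
  const d = ngrams(doc);
  if (q.size === 0) return 0;
  let shared = 0;
  for (const g of q) if (d.has(g)) shared++;
  return shared / q.size;
}

console.log(overlapScore("dogs", "The dog is a 'hot dog'.")); // 0.6
```

Because partial words share n-grams, "dogs" still matches a document that only says "dog", which is what makes this style of search forgiving of plurals and near-misses.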

Quickstart

import { SearchEngine } from '@toastdriven/nanosearch';

// Create a search engine.
const engine = new SearchEngine();

// Index some documents.
// First parameter is the unique document ID, second is the document text.
engine.add("abc", "The dog is a 'hot dog'.");
engine.add("def", "Dogs > Cats");
engine.add("ghi", "the quick brown fox jumps over the lazy dog");
engine.add("jkl", "Am I lazy, or just work smart?");

// Then, you can let the user search on the engine...
let myDogResults = engine.search("my dog");
myDogResults.count(); // 3

for (const res of myDogResults.iterator()) {
  console.log(res.docId); // ex: "def"
  console.log(res.score); // ex: 0.2727272727272727
}

// ...including limiting results (to just one)...
let lazyResults = engine.search("lazy");
let topResult = lazyResults.at(0);
console.log(topResult);

// ...or making pages of ten results!
let dogResults = engine.search("dogs");
let pageOne = dogResults.slice(0, 10);
let pageTwo = dogResults.slice(10, 20);
console.log(pageOne);
console.log(pageTwo);

Installation

$ npm install @toastdriven/nanosearch

Requirements

  • ES6 (or an equivalent transpiler/polyfill)

Tests

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ npm test

Docs

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ ./node_modules/.bin/jsdoc -r -d ~/Desktop/out --package package.json --readme README.md src

License

New BSD

Issues and pull requests
  • Changed searching to return a Results object instead.

    A fundamental problem with the existing API when searching is that you don't get a total result count back. This is fine for auto-complete-style applications (where you're rarely going to be showing more than a handful of results), but makes providing paginated results more difficult.

    This changes the SearchEngine.search API slightly: it drops the previous start & limit arguments and returns a Results object instead of just a sliced array.

    The new Results object's API provides an iterator, the total result count, access by specific offset, and slicing (where the dropped start & limit can be re-applied). This makes larger result sets more palatable to work with and makes it possible to give the user better information.
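A minimal sketch of such a wrapper, assuming results are held as an array of `{ docId, score }` objects already sorted by score. Method names follow the README's Quickstart (`count()`, `at()`, `slice()`, `iterator()`), but the class body and the `page` helper are hypothetical, not nanosearch's code. It shows how the old start & limit pair becomes `slice(start, start + limit)`, and how the total count yields a page count.

```javascript
// Hypothetical sketch of a Results wrapper with the four capabilities described
// above. This is NOT nanosearch's actual implementation.
class Results {
  constructor(hits) {
    // hits: array of { docId, score }, assumed already sorted by descending score.
    this.hits = hits;
  }

  count() {
    return this.hits.length;
  }

  at(offset) {
    // Returns undefined when the offset is out of range.
    return this.hits[offset];
  }

  slice(start, end) {
    return this.hits.slice(start, end);
  }

  *iterator() {
    yield* this.hits;
  }
}

// The dropped `start` & `limit` arguments map onto slice(start, start + limit).
function page(results, pageNumber, pageSize = 10) {
  const start = (pageNumber - 1) * pageSize;
  return results.slice(start, start + pageSize);
}

const results = new Results(
  Array.from({ length: 25 }, (_, i) => ({ docId: `doc${i}`, score: 1 - i / 100 }))
);
console.log(results.count());                 // 25
console.log(Math.ceil(results.count() / 10)); // 3 (pages)
console.log(page(results, 3).length);         // 5
```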

    Unfortunately, it's backward-incompatible. Sucks to have to do a major revision bump so soon, & poor planning on my part.

    opened by toastdriven
  • Renamed the package to `nanosearch`.

    Resolves #8.

    Previously, this package was known as minisearch. However, there's already a long-standing package on NPM with the same name. And we're currently smaller than 500 lines (w/ docs!), so nanosearch it is.

    opened by toastdriven
  • Rename package

    I didn't look before naming the repository, but there's already a MiniSearch out there (https://www.npmjs.com/package/minisearch).

    To avoid confusion, this should be renamed before the initial release.

    opened by toastdriven
  • Added term position support.

    Resolves #1, resolves #2, resolves #3.

    NOT BACKWARD COMPATIBLE! This should be the last backward-incompatible change before the proper 1.0.0 release.

    This paves the way for better scoring, highlighting, etc. In the process of this work, we also gain better Unicode support, as well as index versioning & compatibility checks.

    TODO:

    • [x] Added versioning to the index
    • [x] Rewrite the preprocessor process code, iterating over the document body & generating a list instead of effectively just .toLowerCase().replace().split()
    • [x] Change the preprocessor process api to emit a list of [[word, position], [word, position], ...] values
    • [ ] Change the tokenizer tokenize api to include the position as well (dropped as unneeded)
    • [x] Change the index to store lists of positions instead of just the count
    • [x] Change scoring to check the length of the lists instead of just the count
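A sketch of the data flow in that checklist, under stated assumptions: positions are word offsets (not character offsets), the index is a `Map` of term → (docId → position list), and `preprocess`, `addToIndex` and `termFrequency` are hypothetical names. The last checklist item falls out naturally: term frequency is just the length of a position list.

```javascript
// Hypothetical sketch: a preprocessor that emits [[word, position], ...] and an
// inverted index that stores position lists rather than counts.
function preprocess(text) {
  const out = [];
  let position = 0;
  // Walk the body word by word (Unicode-aware), recording each word's offset.
  for (const match of text.toLowerCase().matchAll(/[\p{L}\p{N}]+/gu)) {
    out.push([match[0], position++]);
  }
  return out;
}

function addToIndex(index, docId, text) {
  for (const [word, position] of preprocess(text)) {
    if (!index.has(word)) index.set(word, new Map());
    const postings = index.get(word);
    if (!postings.has(docId)) postings.set(docId, []);
    postings.get(docId).push(position);
  }
}

// Term frequency is now the length of the position list (0 if absent).
function termFrequency(index, word, docId) {
  return index.get(word)?.get(docId)?.length ?? 0;
}

const index = new Map();
addToIndex(index, "abc", "The dog is a 'hot dog'.");
console.log(index.get("dog").get("abc"));        // [ 1, 5 ]
console.log(termFrequency(index, "dog", "abc")); // 2
```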
    opened by toastdriven
  • More/better tests

    I skimped on this due to time (there's only so much you can do in a single lunch hour!), so right now there are really only some integration-y tests. Proper unit tests for each method of the library would be good.

    opened by toastdriven
  • Store term positions

    A big deficiency with the existing code is that we're currently only storing the number of times a term appears in a document.

    Ideally, we'd be storing a list of the positions of the terms instead. This would allow for better scoring (e.g. "how close are the terms") & better querying (e.g. exact matches).

    Unfortunately, this is non-trivial to implement (& backward-incompatible):

    • [ ] Rewrite the preprocessor process code, iterating over the document body & generating a list instead of effectively just .toLowerCase().replace().split()
    • [ ] Change the preprocessor process api to emit a list of [[word, position], [word, position], ...] values
    • [ ] Change the tokenizer tokenize api to include the position as well
    • [ ] Change the index to store lists of positions instead of just the count
    • [ ] Change scoring to check the length of the lists instead of just the count
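Once position lists are stored, the two motivating features from this issue (term closeness and exact matches) become simple to express. The sketch below assumes word-offset positions for a single document, keyed by term; `matchesPhrase` and `minDistance` are illustrative helpers, not nanosearch APIs.

```javascript
// Hypothetical sketch of what stored positions enable. Input is one document's
// position lists (word offsets) per term, e.g. { quick: [1], brown: [2] }.

// Exact phrase: every term appears immediately after the previous one.
function matchesPhrase(positionsByTerm, terms) {
  const [first, ...rest] = terms;
  const starts = positionsByTerm[first] ?? [];
  return starts.some((start) =>
    rest.every((term, i) => (positionsByTerm[term] ?? []).includes(start + i + 1))
  );
}

// Proximity: the smallest gap between any occurrences of two terms.
function minDistance(positionsA, positionsB) {
  let best = Infinity;
  for (const a of positionsA) {
    for (const b of positionsB) {
      best = Math.min(best, Math.abs(a - b));
    }
  }
  return best;
}

// Positions for "the quick brown fox jumps over the lazy dog".
const doc = { the: [0, 6], quick: [1], brown: [2], fox: [3], lazy: [7], dog: [8] };
console.log(matchesPhrase(doc, ["quick", "brown", "fox"])); // true
console.log(matchesPhrase(doc, ["brown", "quick"]));        // false
console.log(minDistance(doc.the, doc.dog));                 // 2
```

The nested loop in `minDistance` is quadratic; since position lists are naturally sorted, a two-pointer merge would find the same gap in linear time.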
    opened by toastdriven
  • Added benchmarks.

    This is more experimental than something to directly merge.

    This adds an in-browser benchmark that indexes the complete works of Shakespeare from the Folger Shakespeare Library, a 5.2 MB plain-text corpus.

    The results on a recent MacBook look like:

    Starting benchmark...
    Indexing...
    Done indexing 42 documents.
    Elapsed: 1.018 seconds
    
    Searching...
    Searching on "what light"...
    Found 42 results.
    Top 3 Results: lucrece_TXT_FolgerShakespeare.txt, shakespeares-sonnets_TXT_FolgerShakespeare.txt, romeo-and-juliet_TXT_FolgerShakespeare.txt
    
    Searching on "romeo"...
    Found 42 results.
    Top 3 Results: romeo-and-juliet_TXT_FolgerShakespeare.txt, the-comedy-of-errors_TXT_FolgerShakespeare.txt, titus-andronicus_TXT_FolgerShakespeare.txt
    
    Searching on "hamlet"...
    Found 42 results.
    Top 3 Results: hamlet_TXT_FolgerShakespeare.txt, romeo-and-juliet_TXT_FolgerShakespeare.txt, the-merchant-of-venice_TXT_FolgerShakespeare.txt
    
    Searching on "denmark"...
    Found 42 results.
    Top 3 Results: hamlet_TXT_FolgerShakespeare.txt, titus-andronicus_TXT_FolgerShakespeare.txt, pericles_TXT_FolgerShakespeare.txt
    
    Searching on "the fool"...
    Found 42 results.
    Top 3 Results: the-phoenix-and-turtle_TXT_FolgerShakespeare.txt, henry-v_TXT_FolgerShakespeare.txt, twelfth-night_TXT_FolgerShakespeare.txt
    
    Searching on "whether tis nobler to suffer"...
    Found 42 results.
    Top 3 Results: the-phoenix-and-turtle_TXT_FolgerShakespeare.txt, macbeth_TXT_FolgerShakespeare.txt, venus-and-adonis_TXT_FolgerShakespeare.txt
    
    Searching on "shepherdess"...
    Found 42 results.
    Top 3 Results: the-winters-tale_TXT_FolgerShakespeare.txt, the-comedy-of-errors_TXT_FolgerShakespeare.txt, venus-and-adonis_TXT_FolgerShakespeare.txt
    
    Done searching.
    Elapsed: 0.003 seconds (for all queries)
    
    
    Total terms: 5836
    Total terms starting with "a": 370
    First 10 "a" terms: ado, abo, ake, are, arb, ara, a, and, aul, ael
    Index size (as JSON): 18.8 Mb
    

    Every search returns all 42 documents, due to the lack of stop words and the indexing of some tiny/common n-grams. Even so, performance is very decent.
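One common remedy for the "every query matches everything" behavior is to filter terms before they reach the index. This is a sketch only: the stop-word list, the three-character minimum and `filterTerms` are arbitrary assumptions, not nanosearch settings.

```javascript
// Hypothetical sketch: drop common words and very short n-grams before
// indexing so they can't match (almost) every document.
const STOP_WORDS = new Set(["a", "an", "and", "the", "to", "of", "is", "in", "my"]);
const MIN_TERM_LENGTH = 3;

function filterTerms(terms) {
  return terms.filter((t) => t.length >= MIN_TERM_LENGTH && !STOP_WORDS.has(t));
}

console.log(filterTerms(["the", "fool", "a", "ab", "and", "abo"]));
// [ 'fool', 'abo' ]
```

The trade-off: with this filter, a query like "the fool" effectively becomes "fool", and words shorter than the minimum can no longer be searched at all.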

    opened by toastdriven
  • LocalStorage and/or IndexedDB support

    Being able to reuse the index across page refreshes would be ideal, using some kind of local storage within the user's browser.

    LocalStorage is easy, but has tight size limits (typically a few megabytes per origin). IndexedDB allows much larger storage, but I've never poked at it.
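A minimal localStorage sketch, assuming the index can be converted to a plain JSON-serializable object (a `Map` would need converting first) and borrowing the idea of index versioning so a stale format is discarded rather than misread. `saveIndex`, `loadIndex`, `INDEX_VERSION` and the storage key are hypothetical; the functions accept any object with `getItem`/`setItem`, so `window.localStorage` works in a browser and an in-memory stand-in works elsewhere.

```javascript
// Hypothetical sketch: persist a JSON-serializable index through any object
// with the Web Storage getItem/setItem methods (e.g. window.localStorage).
const INDEX_VERSION = 1;

function saveIndex(storage, key, index) {
  const payload = JSON.stringify({ version: INDEX_VERSION, index });
  try {
    storage.setItem(key, payload);
    return true;
  } catch {
    // Browsers throw (typically a QuotaExceededError) once the limit is hit.
    return false;
  }
}

function loadIndex(storage, key) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  const { version, index } = JSON.parse(raw);
  // Discard indexes written in an incompatible format; the caller re-indexes.
  return version === INDEX_VERSION ? index : null;
}

// In-memory stand-in for localStorage so the sketch also runs outside a browser.
const memoryStorage = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};

saveIndex(memoryStorage, "nanosearch-index", { dog: { abc: [1, 5] } });
console.log(loadIndex(memoryStorage, "nanosearch-index")); // { dog: { abc: [ 1, 5 ] } }
```

Because `setItem` throws once the quota is exceeded, `saveIndex` reports failure instead of crashing, leaving the caller free to re-index on the next load or try IndexedDB.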

    opened by toastdriven
Owner
Daniel Lindsley

Related projects

An efficient (and the fastest!) way to search the web privately using Brave Search Engine

Brave Search An efficient (and the fastest) way to search the web privately using Brave Search Engine. Not affiliated with Brave Search. Tested on Chr

Jishan Shaikh 7 Jun 2, 2022
Tiny and powerful JavaScript full-text search engine for browser and Node

MiniSearch MiniSearch is a tiny but powerful in-memory fulltext search engine written in JavaScript. It is respectful of resources, and it can comfort

Luca Ongaro 2k Jan 3, 2023
A tiny search engine.

nanosearch A tiny search engine. Suitable for in-browser use, this provides n-gram based search results. Quickstart import { SearchEngine } from '@toa

Daniel Lindsley 10 Aug 24, 2022
Tesodev-search-app - Personal Search App with React-Hooks

Tesodev-search-app Personal Search App with React-Hooks View on Heroku : [https://tesodev-staff-search-app.herokuapp.com/] Instructions Clone this rep

Rahmi Köse 1 Nov 10, 2022
Instant spotlight like search and actions in your browser with Sugu Search.

Sugu Search Instant spotlight like search and actions in your browser with Sugu Search. Developed by Drew Hutton Grab it today for Firefox and Chrome

Drew Hutton (Yoroshi) 9 Oct 12, 2022
🍭 search-buddy ultra lightweight javascript plugin that can help you create instant search and/or facilitate navigation between pages.

🍭 search-buddy search-buddy is an open‑source ultra lightweight javascript plugin (* <1kb). It can help you create instant search and/or facilitate n

Michael 4 Jun 16, 2022
Node starter kit for semantic-search. Uses Mighty Inference Server with Qdrant vector search.

Mighty Starter This project provides a complete and working semantic search application, using Mighty Inference Server, Qdrant Vector Search, and an e

MAX.IO LLC 8 Oct 18, 2022
Allows users to quickly search highlighted items on Wikipedia. Inspired by the "search Wikipedia" function on the kindle mobile app.

wikipedia-search Allows users to quickly search highlighted items on Wikipedia. Inspired by the "search Wikipedia" function on the kindle mobile app.

Laith Alayassa 18 Aug 15, 2022
A plugin for Obsidian (https://obsidian.md) that adds a button to its search view for copying the Obsidian search URL.

Copy Search URL This plugin adds a button to Obsidian's search view. Clicking it will copy the Obsidian URL for the current search to the clipboard. T

Carlo Zottmann 6 Dec 26, 2022
🟢 Music player app with a modern homepage, fully-fledged music player, search, lyrics, song exploration features, search, popular music around you, worldwide top charts, and much more.

Music-player-app see the project here. 1. Key Features 2. Technologies I've used Key Features: Fully responsive clean UI. Entirely mobile respo

suraj ✨ 3 Nov 16, 2022
A tiny, efficient, fuzzy search that doesn't suck

μFuzzy A tiny, efficient, fuzzy search that doesn't suck Introduction This is my fuzzy. There are many like it, but this one is mine. uFuzzy is

Leon Sorokin 1.9k Dec 25, 2022
Search Engine for YouTuber Ali Abdaal's videos

Ali Abdaal Search Engine This is a personalized search engine for my favorite YouTubers, Ali Abdaal. I used selenium to scrape all his videos, youtube

Hassan El Mghari 24 Oct 14, 2022
Chappe - 🧑‍💻 Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine.

Chappe Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine. Chappe is a Developer Do

Valerian Saliou 146 Jan 1, 2023
MLPleaseHelp is a simple ML resource search engine.

README MLPleaseHelp is a simple ML resource search engine. How To Use You can use this search engine right now at https://jgreenemi.github.io/MLPlease

Joseph Greene 5 Jan 20, 2021
🧑‍💻 Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine.

Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine. Chappe is a Developer Docs buil

Crisp (OSS) 146 Jan 1, 2023
Yu-Gi-Oh! Card Search Engine

Yu-Gi-Oh! Card Search Engine A Yu-Gi-Oh card search engine; results are shown in Brazilian Portuguese (PT-BR). Some cards may not be found due to

Evandro Fadul 11 Apr 14, 2022
Omnisearch is a search engine that "just works".

Omnisearch for Obsidian Omnisearch is a search engine that "just works". Type what you're looking for, and it will instantly show you the most relevan

Simon Cambier 400 Jan 3, 2023