A tiny search engine.

Daniel Lindsley

Last update: Aug 24, 2022

Overview

`nanosearch`

A tiny search engine.

Suitable for in-browser use, this provides n-gram based search results.
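
To make "n-gram based" concrete, here is a minimal sketch of splitting a word into character n-grams. The `ngrams` function and its size bounds are assumptions for illustration only; nanosearch's actual tokenizer may work differently (for example, it might only produce n-grams anchored at the start of each word).

```javascript
// Hypothetical helper, not part of the nanosearch API.
// Returns every contiguous character n-gram of `word` whose length is
// between `minSize` and `maxSize` (inclusive).
function ngrams(word, minSize = 3, maxSize = 6) {
  const grams = [];
  // Array.from splits by code point rather than by UTF-16 unit.
  const chars = Array.from(word.toLowerCase());

  for (let size = minSize; size <= maxSize; size++) {
    for (let start = 0; start + size <= chars.length; start++) {
      grams.push(chars.slice(start, start + size).join(""));
    }
  }

  return grams;
}

console.log(ngrams("lazy", 3, 4)); // [ 'laz', 'azy', 'lazy' ]
```

Indexing fragments like these, rather than only whole words, is what allows a partial query to match a longer word.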

Quickstart

import { SearchEngine } from '@toastdriven/nanosearch';

// Create a search engine.
const engine = new SearchEngine();

// Index some documents.
// First parameter is the unique document ID, second is the document text.
engine.add("abc", "The dog is a 'hot dog'.");
engine.add("def", "Dogs > Cats");
engine.add("ghi", "the quick brown fox jumps over the lazy dog");
engine.add("jkl", "Am I lazy, or just work smart?");

// Then, you can let the user search on the engine...
let myDogResults = engine.search("my dog");
myDogResults.count(); // 3

for(let res of myDogResults.iterator()) {
  console.log(res.docId); // ex: "def"
  console.log(res.score); // ex: 0.2727272727272727
}

// ...including limiting results (to just one)...
let lazyResults = engine.search("lazy");
let topResult = lazyResults.at(0);
console.log(topResult);

// ...or making pages of ten results!
let dogResults = engine.search("dogs");
let pageOne = dogResults.slice(0, 10);
let pageTwo = dogResults.slice(10, 20);
console.log(pageOne);
console.log(pageTwo);

Installation

$ npm install @toastdriven/nanosearch

Requirements

ES6 (or similar translation/polyfill)

Tests

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ npm test

Docs

$ git clone git@github.com:toastdriven/nanosearch.git
$ cd nanosearch
$ npm install
$ ./node_modules/.bin/jsdoc -r -d ~/Desktop/out --package package.json --readme README.md src

License

New BSD

Comments

Changed searching to return a Results object instead.

A fundamental problem with the existing API when searching is that you don't get a total result count back. This is fine for auto-complete-style applications (where you're rarely going to be showing more than a handful of results), but makes providing paginated results more difficult.

This changes the SearchEngine.search API slightly, to drop the previous start & limit arguments, as well as returning a Results object instead of just a sliced array.

The new Results object provides an iterator, the total results count, specific offset access, and slicing (where the dropped start & limit can be re-applied) as its API. This makes dealing with larger result sets more palatable, and offers the ability to give the user better information.
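
As a sketch of how the dropped start & limit map onto the new API, the helper below builds one page from a Results object. It relies only on the `count()` and `slice()` methods shown in the Quickstart. The `paginate` name, its return shape, and the assumption that `slice()` returns an array are illustrative, and the stub object stands in for a real Results instance.

```javascript
// Hypothetical helper: builds one page of results plus paging metadata.
// `results` is anything exposing count() and slice(start, end).
function paginate(results, page, perPage = 10) {
  const total = results.count();
  const totalPages = Math.max(1, Math.ceil(total / perPage));
  // Clamp the requested page into the valid range (pages are 1-based).
  const current = Math.min(Math.max(1, page), totalPages);
  const start = (current - 1) * perPage;

  return {
    items: results.slice(start, start + perPage),
    page: current,
    totalPages,
    total,
  };
}

// Stand-in for a Results object, so the sketch runs without the library.
const hits = Array.from({ length: 23 }, (_, i) => ({ docId: `doc${i}`, score: 1 / (i + 1) }));
const fakeResults = {
  count: () => hits.length,
  slice: (start, end) => hits.slice(start, end),
};

const lastPage = paginate(fakeResults, 3);
console.log(`${lastPage.items.length} of ${lastPage.total} results, page ${lastPage.page}/${lastPage.totalPages}`);
// 3 of 23 results, page 3/3
```

Because the total count is available up front, a UI can render "page 3 of 3" without running the query twice.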

Unfortunately, it's backward-incompatible. Sucks to have to do a major revision bump so soon, & poor planning on my part.

opened by toastdriven 0
Renamed the package to `nanosearch`.

Resolves #8.

Previously, this package was known as minisearch. However, there's already a long-standing package on NPM with the same name. And we're currently smaller than 500 lines (w/ docs!), so nanosearch it is.

opened by toastdriven 0
Rename package

I didn't look before naming the repository, but there's already a MiniSearch out there (https://www.npmjs.com/package/minisearch).

To avoid confusion, this should be renamed before the initial release.

opened by toastdriven 0
Added term position support.

Resolves #1, resolves #2, resolves #3.

NOT BACKWARD COMPATIBLE! This should be the last backward-incompatible change before the proper 1.0.0 release.

This paves the way for better scoring, highlighting, etc. In the process of this work, we also gain better Unicode support, as well as index versioning & compatibility checks.

TODO:

[x] Added versioning to the index

[x] Rewrite the preprocessor process code, iterating over the document body & generating a list instead of effectively just .toLowerCase().replace().split()

[x] Change the preprocessor process api to emit a list of [[word, position], [word, position], ...] values

[ ] ~Change the tokenizer tokenize api to include the position as well~ Unneeded

[x] Change the index to store lists of positions instead of just the count

[x] Change scoring to check the length of the lists instead of just the count
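
The checklist above can be sketched roughly as follows. This is a simplified illustration under assumptions that are not taken from this change: `tokenizeWithPositions` and `PositionalIndex` are hypothetical names, positions are character offsets, and terms are whole words rather than n-grams.

```javascript
// Hypothetical sketch; names and data layout are illustrative only.

// Walk the text once and emit [[word, position], ...], where position is
// the character offset at which the word starts.
function tokenizeWithPositions(text) {
  const tokens = [];
  // \p{L} and \p{N} match letters and digits in any script (needs the `u` flag).
  for (const match of text.matchAll(/[\p{L}\p{N}]+/gu)) {
    tokens.push([match[0].toLowerCase(), match.index]);
  }
  return tokens;
}

class PositionalIndex {
  constructor() {
    // term -> Map(docId -> array of positions)
    this.terms = new Map();
  }

  add(docId, text) {
    for (const [word, position] of tokenizeWithPositions(text)) {
      if (!this.terms.has(word)) this.terms.set(word, new Map());
      const docs = this.terms.get(word);
      if (!docs.has(docId)) docs.set(docId, []);
      docs.get(docId).push(position);
    }
  }

  positions(term, docId) {
    const docs = this.terms.get(term);
    return (docs && docs.get(docId)) || [];
  }

  // The old per-document count is just the length of the position list.
  frequency(term, docId) {
    return this.positions(term, docId).length;
  }
}

const index = new PositionalIndex();
index.add("abc", "The dog is a 'hot dog'.");
console.log(index.positions("dog", "abc")); // [ 4, 18 ]
console.log(index.frequency("dog", "abc")); // 2
```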
opened by toastdriven 0
More/better tests

I cheated out on this due to time (there's only so much you can do in a single lunch hour!), so there's really only some integration-y tests right now. Actual unit tests for each method of the library would be good.

opened by toastdriven 0
Store term positions
A big deficiency with the existing code is that we're currently only storing the number of times a term appears in a document.

Ideally, we'd be storing a list of the positions of the terms instead. This would allow for better scoring (e.g. "how close are the terms") & better querying (e.g. exact matches).

Unfortunately, this is non-trivial to implement (& backward-incompatible):

[ ] Rewrite the preprocessor process code, iterating over the document body & generating a list instead of effectively just .toLowerCase().replace().split()

[ ] Change the preprocessor process api to emit a list of [[word, position], [word, position], ...] values

[ ] Change the tokenizer tokenize api to include the position as well

[ ] Change the index to store lists of positions instead of just the count

[ ] Change scoring to check the length of the lists instead of just the count
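
To illustrate the "how close are the terms" idea mentioned above, here is a minimal sketch that finds the smallest gap between two terms given their sorted position lists. The `minDistance` name and the use of word offsets as positions are assumptions for illustration, not part of nanosearch.

```javascript
// Hypothetical helper: given two ascending lists of positions (one per term,
// both from the same document), return the smallest distance between any
// pair, or Infinity if either term is absent.
function minDistance(positionsA, positionsB) {
  let best = Infinity;
  let i = 0;
  let j = 0;

  // Two-pointer walk: always advance the pointer at the smaller position.
  while (i < positionsA.length && j < positionsB.length) {
    best = Math.min(best, Math.abs(positionsA[i] - positionsB[j]));
    if (positionsA[i] < positionsB[j]) {
      i++;
    } else {
      j++;
    }
  }

  return best;
}

// Word offsets in "the quick brown fox jumps over the lazy dog":
const the = [0, 6];
const dog = [8];
console.log(minDistance(the, dog)); // 2
console.log(minDistance(the, [])); // Infinity
```

A scorer could boost documents where this distance is small. Exact phrase matching would additionally need to check the order of the terms.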
opened by toastdriven 0

Added benchmarks.

This is more experimental than something to directly merge.

Using the complete works of Shakespeare from Folger, this adds an in-browser benchmark. The corpus is 5.2 MB of plain text.

The results on a recent MacBook look like:

Starting benchmark...
Indexing...
Done indexing 42 documents.
Elapsed: 1.018 seconds

Searching...
Searching on "what light"...
Found 42 results.
Top 3 Results: lucrece_TXT_FolgerShakespeare.txt, shakespeares-sonnets_TXT_FolgerShakespeare.txt, romeo-and-juliet_TXT_FolgerShakespeare.txt

Searching on "romeo"...
Found 42 results.
Top 3 Results: romeo-and-juliet_TXT_FolgerShakespeare.txt, the-comedy-of-errors_TXT_FolgerShakespeare.txt, titus-andronicus_TXT_FolgerShakespeare.txt

Searching on "hamlet"...
Found 42 results.
Top 3 Results: hamlet_TXT_FolgerShakespeare.txt, romeo-and-juliet_TXT_FolgerShakespeare.txt, the-merchant-of-venice_TXT_FolgerShakespeare.txt

Searching on "denmark"...
Found 42 results.
Top 3 Results: hamlet_TXT_FolgerShakespeare.txt, titus-andronicus_TXT_FolgerShakespeare.txt, pericles_TXT_FolgerShakespeare.txt

Searching on "the fool"...
Found 42 results.
Top 3 Results: the-phoenix-and-turtle_TXT_FolgerShakespeare.txt, henry-v_TXT_FolgerShakespeare.txt, twelfth-night_TXT_FolgerShakespeare.txt

Searching on "whether tis nobler to suffer"...
Found 42 results.
Top 3 Results: the-phoenix-and-turtle_TXT_FolgerShakespeare.txt, macbeth_TXT_FolgerShakespeare.txt, venus-and-adonis_TXT_FolgerShakespeare.txt

Searching on "shepherdess"...
Found 42 results.
Top 3 Results: the-winters-tale_TXT_FolgerShakespeare.txt, the-comedy-of-errors_TXT_FolgerShakespeare.txt, venus-and-adonis_TXT_FolgerShakespeare.txt

Done searching.
Elapsed: 0.003 seconds (for all queries)


Total terms: 5836
Total terms starting with "a": 370
First 10 "a" terms: ado, abo, ake, are, arb, ara, a, and, aul, ael
Index size (as JSON): 18.8 Mb

Searches are returning all documents (due to a lack of stop words & indexing some tiny/common n-grams). Regardless, performance is very decent.
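
One way to address this, sketched below, is to drop stop words and very short terms before they reach the index. The stop word list, the `minLength` threshold, and the `filterTerms` name are all assumptions for illustration; nanosearch does not necessarily expose such a hook.

```javascript
// Hypothetical filter; the word list and threshold are illustrative.
const STOP_WORDS = new Set(["a", "an", "and", "are", "is", "of", "or", "the", "to"]);

// Keep only terms that are long enough and are not stop words.
function filterTerms(terms, minLength = 3) {
  return terms.filter((term) => term.length >= minLength && !STOP_WORDS.has(term));
}

console.log(filterTerms(["the", "fool", "is", "a", "shepherdess"]));
// [ 'fool', 'shepherdess' ]
```

With fewer ubiquitous terms in the index, a query such as "the fool" should match fewer documents, at the cost of no longer being able to search for the filtered words themselves.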

opened by toastdriven 0

LocalStorage and/or IndexedDB support

Being able to use the index between page refreshes would be ideal. Some kind of local storage within the user's browser.

LocalStorage is easy, but has size restrictions. IndexedDB lacks those size restrictions, but I've never poked at it.
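
A minimal sketch of the LocalStorage option follows. It works against any object with the Web Storage `getItem`/`setItem` interface and treats a failed write (for example, a quota error) as non-fatal. How nanosearch exposes its index data is not covered here, so `indexData` is assumed to be a plain JSON-serializable object, and the function names and storage key are hypothetical.

```javascript
// Hypothetical helpers; `storage` is any Web Storage-like object
// (e.g. window.localStorage in a browser).
const STORAGE_KEY = "nanosearch-index"; // illustrative key name

// Returns true if the index was stored, false if the write failed
// (browsers throw when the storage quota is exceeded).
function saveIndex(storage, indexData) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(indexData));
    return true;
  } catch (err) {
    return false;
  }
}

// Returns the stored index, or null if nothing usable is stored.
function loadIndex(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return null;
  try {
    return JSON.parse(raw);
  } catch (err) {
    return null; // corrupted entry; the caller should rebuild the index
  }
}

// In-memory stand-in so the sketch also runs outside a browser.
const memoryStorage = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};

saveIndex(memoryStorage, { version: 1, terms: { dog: { abc: [4, 18] } } });
console.log(loadIndex(memoryStorage).terms.dog.abc); // [ 4, 18 ]
```

The `false` return from `saveIndex` gives the caller a place to fall back, for instance by rebuilding the index on each page load or by moving to IndexedDB.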

opened by toastdriven 0

Owner

Daniel Lindsley

I program computers, make music, play games, curate animated gifs. Allergic to seriousness. He/Him

GitHub https://toastdriven.github.io/nanosearch/

An efficient (and the fastest!) way to search the web privately using Brave Search Engine

Brave Search An efficient (and the fastest) way to search the web privately using Brave Search Engine. Not affiliated with Brave Search. Tested on Chr

7 Jun 2, 2022

Tiny and powerful JavaScript full-text search engine for browser and Node

MiniSearch MiniSearch is a tiny but powerful in-memory fulltext search engine written in JavaScript. It is respectful of resources, and it can comfort

2k Jan 3, 2023

Tesodev-search-app - Personal Search App with React-Hooks

Tesodev-search-app Personal Search App with React-Hooks View on Heroku : [https://tesodev-staff-search-app.herokuapp.com/] Instructions Clone this rep

1 Nov 10, 2022

Instant spotlight like search and actions in your browser with Sugu Search.

Sugu Search Instant spotlight like search and actions in your browser with Sugu Search. Developed by Drew Hutton Grab it today for Firefox and Chrome

9 Oct 12, 2022

🍭 search-buddy ultra lightweight javascript plugin that can help you create instant search and/or facilitate navigation between pages.

🍭 search-buddy search-buddy is an open‑source ultra lightweight javascript plugin (* <1kb). It can help you create instant search and/or facilitate n

4 Jun 16, 2022

Node starter kit for semantic-search. Uses Mighty Inference Server with Qdrant vector search.

Mighty Starter This project provides a complete and working semantic search application, using Mighty Inference Server, Qdrant Vector Search, and an e

8 Oct 18, 2022

Allows users to quickly search highlighted items on Wikipedia. Inspired by the "search Wikipedia" function on the kindle mobile app.

wikipedia-search Allows users to quickly search highlighted items on Wikipedia. Inspired by the "search Wikipedia" function on the kindle mobile app.

18 Aug 15, 2022

A plugin for Obsidian (https://obsidian.md) that adds a button to its search view for copying the Obsidian search URL.

Copy Search URL This plugin adds a button to Obsidian's search view. Clicking it will copy the Obsidian URL for the current search to the clipboard. T

6 Dec 26, 2022

🟢 Music player app with a modern homepage, fully-fledged music player, search, lyrics, song exploration features, search, popular music around you, worldwide top charts, and much more.

Music-player-app see the project here. 1. Key Features 2. Technologies I've used Key Features: Fully responsive clean UI. Entirely mobile respo

3 Nov 16, 2022

A tiny, efficient, fuzzy search that doesn't suck

▒ μFuzzy A tiny, efficient, fuzzy search that doesn't suck Introduction This is my fuzzy ?? . There are many like it, but this one is mine. uFuzzy is

1.9k Dec 25, 2022

Search Engine for YouTuber Ali Abdaal's videos

Ali Abdaal Search Engine This is a personalized search engine for my favorite YouTubers, Ali Abdaal. I used selenium to scrape all his videos, youtube

24 Oct 14, 2022

Chappe - 🧑‍💻 Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine.

Chappe Developer Docs builder. Write guides in Markdown and references in API Blueprint. Comes with a built-in search engine. Chappe is a Developer Do

146 Jan 1, 2023

MLPleaseHelp is a simple ML resource search engine.

README MLPleaseHelp is a simple ML resource search engine. How To Use You can use this search engine right now at https://jgreenemi.github.io/MLPlease

5 Jan 20, 2021
