Node.js binding for the RWKV.cpp module

Overview

RWKV.cpp NodeJS bindings

Arguably the easiest way to get RWKV.cpp running on Node.js.

# Install globally
npm install -g rwkv-cpp-node

# This will start the interactive CLI,
# which will guide you through downloading and running the chat model
rwkv-cpp-node

This is not a pure JS solution; it depends on the precompiled RWKV.cpp binaries found here

This currently runs purely on your CPU. While that means you can run it on nearly any machine, it also means you do not get any major speed-up from a GPU (yet)

What is RWKV?

RWKV is an LLM which can switch between "transformer" and "RNN" mode.

This gives it the best of both worlds:

  • Highly scalable training in transformer mode
  • Low overhead when inferring each token in RNN mode

Along with the following benefits:

  • Theoretically infinite context size
  • Embedding support via hidden states

For more details on the math involved, and how this model works on a more technical level, refer to the official project

JS CLI demo

If you just want to give it a spin, the fastest way is to use npm. First perform the setup (it will download the RWKV files into your home directory)

# Install globally
npm install -g rwkv-cpp-node

# First run the setup
rwkv-cpp-node --setup

You can then choose a model to download ...

--setup call detected, starting setup process...
RWKV model will be downloaded into ~/.rwkv/
? Select a RWKV raven model to download:  (Use arrow keys)
❯ RWKV raven 1B5 v11 (Small, Fast) - 2.82 GB 
  RWKV raven 7B v11 (Q8_0) - 8.09 GB 
  RWKV raven 7B v11 (Q8_0, multilingual, performs slightly worse for english) - 8.09 GB 
  RWKV raven 14B v11 (Q8_0) - 15.25 GB 
  RWKV Pile 169M (Q8_0, lacks instruct tuning, use only for testing) - 0.24 GB 

PS: The file size is approximately the amount of storage and RAM your system will need

Subsequently, you can run the interactive chat mode

# Load the interactive chat
rwkv-cpp-node

This starts an interactive shell session, with something like the following

--------------------------------------
Starting RWKV chat mode
--------------------------------------
Loading model from /root/.rwkv/raven_1b5_v11.bin ...
The following is a conversation between Bob the user and Alice the chatbot.
--------------------------------------
? Bob:  Hi
Alice:  How can I help you?
? Bob:  Tell me something interesting about ravens
Alice:  RAVEN. I am most fascinated by the raven because of its incredible rate of survival. Ravens have been observed to live longer than any other bird, rumored to reach over 200 years old. They have the ability to live for over 1,000 years, a remarkable feat. This makes them the odd man out among birds!

PS: RWKV, like all chat models, can and does make things up.

Finally, if you want to run a custom model, or just run the benchmark

# If you want to run with a pre-downloaded model
rwkv-cpp-node --modelPath "<path to the model bin file>"

# If you want to run the "--dragon" prompt benchmark
rwkv-cpp-node --dragon
rwkv-cpp-node --modelPath "<path to the model bin file>" --dragon

JS Lib Setup

Install the node module

npm i rwkv-cpp-node

Download one of the pre-quantized rwkv.cpp weights from Hugging Face (raven is the RWKV pretrained weights, fine-tuned with instruction sets)

Alternatively, you can download one of the raven pretrained weights from the Hugging Face repo and perform your own quantization conversion using the original rwkv.cpp project

JS Usage

const RWKV = require("RWKV-cpp-node");

// Load the module with the pre-quantized cpp weights
const raven = new RWKV("<path-to-your-model-bin-files>");

// Call the completion API
let res = raven.completion("RWKV is a")

// And log, or do something with the result
console.log( res.completion )
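Because the loaded instance is reused across calls, you can also chain completions by feeding the previous output back into the next prompt. This is a minimal sketch using only the string-based call shown above; the prompt text is purely illustrative.

// Continue generating from where the previous completion left off,
// by passing the prompt plus the previous completion as the new prompt
let intro = raven.completion("RWKV is a");
let followUp = raven.completion("RWKV is a" + intro.completion + " In other words,");
console.log(followUp.completion);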

Advanced setup options

// You can set it up with the following parameters using a config object (instead of a string path)
const raven = new RWKV({
	// Path to your cpp weights
	path: "<path-to-your-model-bin-files>",

	// Number of threads to use; this is auto-detected based on your vCPU count
	// if it's not configured
	threads: 8,

	//
	// Cache size for the RWKV state. This helps optimize repeated RWKV calls
	// in use cases such as "conversation", allowing it to skip recomputing the previous chat
	//
	// It is worth noting that the 7B model takes up about 2.64 MB per cached state,
	// meaning you will need at least 264 MB of RAM for a stateCacheSize of 100
	//
	// This defaults to 50
	// Set to false or 0 to disable
	//
	stateCacheSize: 50
});
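If you prefer to pick the thread count yourself instead of relying on the auto-detection, one option is to derive it from Node's own CPU information. This is just a sketch of one possible heuristic, not behaviour provided by the library.

const os = require("os");

// Use all logical CPUs minus one, keeping a core free for the event loop
// (an illustrative heuristic, not a library default)
const raven = new RWKV({
	path: "<path-to-your-model-bin-files>",
	threads: Math.max(1, os.cpus().length - 1),
	stateCacheSize: 50
});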

Completion API options

// Let's perform a completion, with more options
let res = raven.completion({

	// The prompt to use
	prompt: "<prompt str>",

	// Completion default settings
	// See the OpenAI docs for more details on what these do to your output, if you are not familiar with them
	// https://platform.openai.com/docs/api-reference/completions
	max_tokens: 64,
	temperature: 1.0,
	top_p: 1.0,
	stop: [ "\n" ],

	// Streaming callback, called with each new token and the full completion so far
	streamCallback: function(tokenStr, fullCompletionStr) {
		// ....
	},

	// Existing RWKV hidden state, represented as a Float32Array
	// do not use this unless you REALLY KNOW WHAT YOU'RE DOING
	//
	// This will skip the state caching logic 
	initState: (Special Float32Array)
});

// Additionally, if you have a commonly reused instruction prefix, you can preload it
// using either of the following (requires stateCacheSize to not be disabled)
raven.preloadPrompt( "<prompt prefix string>" )
raven.completion({ prompt:"<prompt prefix string>", max_tokens:0 })
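Putting these together, a common pattern is to preload a shared instruction prefix once, then stream each completion as it is generated. The prefix text below is only illustrative; the options used are the ones documented above.

// Preload a commonly reused instruction prefix into the state cache
const prefix = "The following is a conversation between Bob the user and Alice the chatbot.\n";
raven.preloadPrompt(prefix);

// Completions that start with the same prefix can then reuse the cached state,
// while streamCallback prints each token as soon as it is sampled
raven.completion({
	prompt: prefix + "Bob: Hi\nAlice:",
	max_tokens: 64,
	stop: [ "\nBob:" ],
	streamCallback: function(tokenStr, fullCompletionStr) {
		process.stdout.write(tokenStr);
	}
});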

Completion output format

// The following is a sample of the result object format
let resFormat = {
	// Completion generated
	completion: '<generated completion string>',

	// Prompt used
	prompt: '<prompt string used>',

	// Token usage numbers
	usage: {
		promptTokens: 41,
		completionTokens: 64,
		totalTokens: 105,
		// number of tokens in the prompt that were previously cached
		promptTokensCached: 39 
	},

	// Performance statistics of the completion operation
	//
	// The following perf numbers are from a single
	// `Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz`,
	// an old 2014 processor with 28 vCPUs,
	// with the 14B model Q8_0 quantized
	// 
	perf: {
		// Time taken in ms for each segment
		promptTime: 954,
		completionTime: 35907,
		totalTime: 36861,

		// Time taken in ms to process each token at the respective phase
		timePerPrompt: 477, // This excludes cached tokens
		timePerCompletion: 561.046875,
		timePerFullPrompt: 23.26829268292683, // This includes cached tokens (if any)

		// The average tokens per second
		promptPerSecond: 2.0964360587002098, // This excludes cached tokens
		completionPerSecond: 1.7823822652964603,
		fullPromptPerSecond: 42.9769392033543 // This includes cached tokens (if any)
	}
}
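The usage and perf fields can be read straight off the returned object, for example to log simple throughput numbers after a call. A small sketch:

// Log basic token usage and throughput from a completion result
let res = raven.completion("RWKV is a");
console.log(`tokens: ${res.usage.totalTokens} (${res.usage.promptTokensCached} cached in prompt)`);
console.log(`completion speed: ${res.perf.completionPerSecond.toFixed(2)} tokens/sec`);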

What can be improved?

Known issues

  • You need macOS 12 or above

How to run the unit test?

npm run test

Clarification notes

Why didn't you cache the entire prompt?

I intentionally did not cache the last 2 tokens, to avoid sub-optimal results in cases where the tail of the prompt string should have been merged into a single token, which would have impacted the quality of the output.

For example "Hello" represents a single token of 12092

However, if every prompt were blindly cached in full and you performed multiple calls character by character, each subsequent call would continue from the previous cached result in its "incomplete form".

As a result, when you finally call "Hello", it can end up consisting of 5 tokens of 1 character each (i.e. ["H","e","l","l","o"]). This leads to extremely unexpected behaviour in the quality of the model output.

While this example is an extreme case, there are smaller-scale off-by-one cases involving whitespace.
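To make the failure mode concrete, the following hypothetical call sequence shows what blindly caching every prompt in full would do, versus sending the full prompt in one call:

// Hypothetical character-by-character calls: if every call were cached in full,
// each call would extend the previous cached state one character at a time,
// and "Hello" would end up represented as 5 single-character tokens
raven.completion("H");
raven.completion("He");
raven.completion("Hel");
raven.completion("Hell");
raven.completion("Hello"); // would build on the ["H","e","l","l","o"] state

// Sending the full prompt in a single call lets the tokenizer merge "Hello"
// into its single token (12092); holding the last 2 tokens out of the cache
// prevents a partially-typed prompt from poisoning later calls
raven.completion("Hello, tell me something interesting about ravens");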

Designated maintainer

@picocreator is the current maintainer of the project; ping him on the RWKV Discord if you have any questions about this project

Special thanks & references

@saharNooby - original rwkv.cpp implementation

@BlinkDL - for the main rwkv project
