Fast Differentiable Tensor Library in JavaScript and TypeScript with Bun + Flashlight

Overview

A fast differentiable tensor library for research in TypeScript and JavaScript. Built with bun + flashlight. ⚠️ This is experimental software! ⚠️

Quickstart

Install Bun and ArrayFire, then run:

bun install @shumai/shumai

Only macOS and Linux are supported. Linux installs default to GPU computation with CUDA, and macOS to CPU. Detailed install instructions below.

Install is a work in progress: please file an issue if you run into problems.

Why build this?

With Shumai, we hope to make the following easier:

  • Creating datasets
    • JavaScript, with native typed arrays and a JIT compiler, is perfect for twiddling with data before shaping it into big, flat, GPU-compatible arrays (see the sketch after this list).
  • Training small models
    • FFI bindings in Bun are crazy fast (~3ns overhead), so JavaScript gets out of the way when training small models.
  • Advanced/fine-grained training/inference logic
    • Bun uses the JSC JIT compiler, so you can confidently write complex training logic without needing a native C++ implementation.
  • Building applications
    • JavaScript has a huge ecosystem, which facilitates better application development.
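
As a small illustration of the dataset point above, here's one way to pack plain JavaScript objects into a flat, GPU-compatible array (the field names are hypothetical; only sm.tensor and reshape from this README are assumed):

import * as sm from "@shumai/shumai"

// hypothetical rows, e.g. parsed from JSON or a database
const rows = [
  { x: 0.1, y: 0.7 },
  { x: 0.4, y: 0.2 },
]

// flatten into a single typed array...
const flat = new Float32Array(rows.length * 2)
rows.forEach((r, i) => {
  flat[i * 2] = r.x
  flat[i * 2 + 1] = r.y
})

// ...and hand it to the tensor library
const batch = sm.tensor(flat).reshape([rows.length, 2])
console.log(batch.shape) // [2, 2]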

Usage

shumai will always attempt to use an attached GPU or accelerator. CPU computation falls back to the ArrayFire CPU backend, which is not well optimized.

We hope to support the ArrayFire OpenCL backend and other non-ArrayFire tensor backends soon.

If shumai seems unusually slow, please file an issue!

Standard array utilities:

import * as sm from "@shumai/shumai"

// create a 1024 by 1024 tensor, randomly filled with normal distribution
let X = sm.randn([1024, 1024])
let W = sm.identity(1024)
let Y = X.matmul(W)
console.log(Y.shape)
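
Since W is the identity matrix, Y should equal X exactly; a quick sanity check using the comparison and reduction ops from the table of supported operations below (a sketch, assuming they behave as listed):

// prints 1 if every element of Y equals the corresponding element of X
console.log(Y.eq(X).all().toFloat32())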

Conversion to and from JavaScript native arrays:

const data : Float32Array = new Float32Array(128)
for (let i = 0; i < 128; ++i) {
  data[i] = Math.random()
}

const X : Tensor = sm.tensor(data)
const pi = sm.scalar(3.14)
const Y = X.mul(pi)

// tensors can be converted back to native JavaScript
const Y_data = Y.toFloat32Array()

// scalar tensors can be converted to JavaScript numbers
const total : number = X.sum().toFloat32()

Gradients:

const W = sm.randn([128, 128])
W.requires_grad = true

const X = sm.randn([128, 128])
const diff = X.sub(W)
const mse = diff.mul(diff).sum()
mse.backward()

W.grad // this gradient is now populated

// copy W without allowing gradient updates
const Y = W.detach()
Y.sum().backward() // nothing changes
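
The populated gradient can be used for a manual parameter update. A minimal sketch using only the API shown above (shumai may also provide optimizer utilities, which are not assumed here):

const lr = sm.scalar(0.01)

// gradient-descent step: W <- W - lr * dL/dW
// detach() so the update itself is not tracked by autograd
const W_updated = W.detach().sub(W.grad.mul(lr))
W_updated.requires_grad = true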

Some more examples can be found here.

Supported operators can be found here.

Install

The install procedure is a work in progress! If you have any problems building or installing, we would greatly appreciate filed issues. Please tell us about your platform/OS when you do.

Prerequisites:

  • Ensure you have bun installed (https://bun.sh).
  • Install ArrayFire. macOS users should install ArrayFire's CPU backend; Linux users should install the CUDA backend^.
    • macOS --- ArrayFire can easily be installed with Homebrew:
      brew install arrayfire
    • Linux --- instructions can be found here. On Ubuntu, ArrayFire can be installed via package managers (e.g. apt).

Once bun and ArrayFire are installed, install the package and backing libs with bun:

bun install @shumai/shumai

^Linux users can use the CPU backend by swapping the required package.json dependency from @shumai/linux_x64_shumai_flashlight to @shumai/linux_x64_shumai_flashlight_cpu, i.e. running:

sed -i "s/linux_x64_shumai_flashlight/linux_x64_shumai_flashlight_cpu/g" package.json
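
To sanity-check an install, a minimal smoke test helps (smoke.ts is a hypothetical file name; run it with bun smoke.ts):

import * as sm from "@shumai/shumai"

// if the native bindings loaded, this allocates a tensor on the default backend
const t = sm.randn([8, 8])
console.log(t.shape) // [8, 8]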

Building Native Libraries from Source

Note: this is not required when developing TypeScript/JavaScript library components locally.

From-source build instructions for macOS and Linux are below.

This process builds the dependent FFI libraries (libflashlight and libflashlight_binding) and packs them with npm pack to generate a @shumai/shumai_*.tgz package. You can then run npm install $PATH_TO_SOURCE/@shumai/shumai-*.tgz to install the package where you'd like.

Building on macOS from Source

First, install ArrayFire CPU with brew install arrayfire.

Build and install Flashlight:

mkdir -p $HOME/usr/ # installing flashlight here
git clone --recursive --depth 1 https://github.com/flashlight/flashlight.git
cd flashlight
mkdir -p build
cd build
# RelWithDebInfo, or another build type as needed
cmake .. \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DFL_ARRAYFIRE_USE_CPU=ON \
  -DFL_ARRAYFIRE_USE_CUDA=OFF \
  -DFL_BUILD_DISTRIBUTED=OFF \
  -DFL_USE_ONEDNN=OFF \
  -DFL_BUILD_TESTS=OFF \
  -DFL_BUILD_EXAMPLES=OFF \
  -DFL_BUILD_SCRIPTS=OFF \
  -DCMAKE_INSTALL_PREFIX=$HOME/usr/
make -j$(sysctl -n hw.ncpu)  # macOS has no nproc
make install

Build Flashlight bindings for Shumai:

cd shumai
mkdir -p build
cd build
cmake .. -Dflashlight_DIR=$HOME/usr/share/flashlight/cmake/
make -j$(sysctl -n hw.ncpu)

Profiling

On macOS, you can record perf with xcrun xctrace record --template "Time Profiler" --launch $(which bun) train.js.
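
For coarse wall-clock timing without Instruments, a simple loop also works (a sketch; the backend may evaluate lazily, so copying the result back to JS is what forces the work to complete, and the numbers are illustrative only):

import * as sm from "@shumai/shumai"

const A = sm.randn([1024, 1024])
const B = sm.randn([1024, 1024])

const t0 = performance.now()
for (let i = 0; i < 10; ++i) {
  // toFloat32Array() forces materialization of the product
  A.matmul(B).toFloat32Array()
}
const t1 = performance.now()
console.log(`mean matmul + copy: ${((t1 - t0) / 10).toFixed(2)}ms`)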

Building on Linux from Source

First install ArrayFire. The prebuilt Linux package for shumai uses the CUDA backend, but when building from source you can target the CPU backend as well (OpenCL support is coming soon).

Build and install Flashlight:

mkdir -p $HOME/usr/ # installing flashlight here
git clone --recursive --depth 1 https://github.com/flashlight/flashlight.git
cd flashlight
mkdir -p build
cd build
# RelWithDebInfo, or another build type as needed;
# swap USE_CPU/USE_CUDA below to build for CPU instead
cmake .. \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DFL_ARRAYFIRE_USE_CPU=OFF \
  -DFL_ARRAYFIRE_USE_CUDA=ON \
  -DFL_BUILD_DISTRIBUTED=OFF \
  -DFL_USE_ONEDNN=OFF \
  -DFL_BUILD_TESTS=OFF \
  -DFL_BUILD_EXAMPLES=OFF \
  -DFL_BUILD_SCRIPTS=OFF \
  -DCMAKE_INSTALL_PREFIX=$HOME/usr/
make -j$(nproc)
make install

Build bindings for shumai:

mkdir -p build && cd build
# RelWithDebInfo, or another build type as needed
cmake .. \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -Dflashlight_DIR=${FLASHLIGHT_INSTALL_PREFIX}/share/flashlight/cmake \
    -DArrayFire_DIR=${ARRAYFIRE_INSTALL_PREFIX}/share/ArrayFire/cmake  # only needed if ArrayFire was built from source
make -j$(nproc)

Contributing

If you'd like to make changes to the core bindings or ffi, first build from source.

All files ending in *.inl or *_gen.ts are generated. These can be modified by editing scripts/gen_binding.py and running ./scripts/gen_all_binding.sh.

See the CONTRIBUTING file for style guidance and more info on how to help out. 😁

Supported Operations

Some operations are supported as both static functions and methods on existing tensors.
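
For example, add appears in both forms, which are interchangeable:

import * as sm from "@shumai/shumai"

const a = sm.randn([4])
const b = sm.randn([4])

const c1 = sm.add(a, b) // static function
const c2 = a.add(b)     // tensor method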

| Operation | Function | Tensor Method (`t : Tensor`) |
| --- | --- | --- |
| rand | `rand(shape: number[]) : Tensor` | |
| randn | `randn(shape: number[]) : Tensor` | |
| full | `full(shape: number[], val: number) : Tensor` | |
| identity | `identity(dim: number) : Tensor` | |
| arange | `arange(start: number, end: number, step: number = 1) : Tensor` | |
| iota | `iota(dims: number[], tileDims: number[] = [1]) : Tensor` | |
| reshape | `reshape(tensor: Tensor, shape: number[]) : Tensor` | `t.reshape(shape: number[]) : Tensor` |
| transpose | `transpose(tensor: Tensor, axes: number[]) : Tensor` | `t.transpose(axes: number[]) : Tensor` |
| tile | `tile(tensor: Tensor, shape: number[]) : Tensor` | `t.tile(shape: number[]) : Tensor` |
| nonzero | `nonzero(tensor: Tensor) : Tensor` | `t.nonzero() : Tensor` |
| negative | `negative(tensor: Tensor) : Tensor` | `t.negative() : Tensor` |
| logicalNot | `logicalNot(tensor: Tensor) : Tensor` | `t.logicalNot() : Tensor` |
| exp | `exp(tensor: Tensor) : Tensor` | `t.exp() : Tensor` |
| log | `log(tensor: Tensor) : Tensor` | `t.log() : Tensor` |
| log1p | `log1p(tensor: Tensor) : Tensor` | `t.log1p() : Tensor` |
| sin | `sin(tensor: Tensor) : Tensor` | `t.sin() : Tensor` |
| cos | `cos(tensor: Tensor) : Tensor` | `t.cos() : Tensor` |
| sqrt | `sqrt(tensor: Tensor) : Tensor` | `t.sqrt() : Tensor` |
| tanh | `tanh(tensor: Tensor) : Tensor` | `t.tanh() : Tensor` |
| floor | `floor(tensor: Tensor) : Tensor` | `t.floor() : Tensor` |
| ceil | `ceil(tensor: Tensor) : Tensor` | `t.ceil() : Tensor` |
| rint | `rint(tensor: Tensor) : Tensor` | `t.rint() : Tensor` |
| absolute | `absolute(tensor: Tensor) : Tensor` | `t.absolute() : Tensor` |
| abs | `abs(tensor: Tensor) : Tensor` | `t.abs() : Tensor` |
| sigmoid | `sigmoid(tensor: Tensor) : Tensor` | `t.sigmoid() : Tensor` |
| erf | `erf(tensor: Tensor) : Tensor` | `t.erf() : Tensor` |
| flip | `flip(tensor: Tensor, dim: number) : Tensor` | `t.flip(dim: number) : Tensor` |
| clip | `clip(tensor: Tensor, low: Tensor, high: Tensor) : Tensor` | `t.clip(low: Tensor, high: Tensor) : Tensor` |
| roll | `roll(tensor: Tensor, shift: number, axis: number) : Tensor` | `t.roll(shift: number, axis: number) : Tensor` |
| isnan | `isnan(tensor: Tensor) : Tensor` | `t.isnan() : Tensor` |
| isinf | `isinf(tensor: Tensor) : Tensor` | `t.isinf() : Tensor` |
| sign | `sign(tensor: Tensor) : Tensor` | `t.sign() : Tensor` |
| tril | `tril(tensor: Tensor) : Tensor` | `t.tril() : Tensor` |
| triu | `triu(tensor: Tensor) : Tensor` | `t.triu() : Tensor` |
| where | `where(cond: Tensor, x: Tensor, y: Tensor) : Tensor` | `t.where(x: Tensor, y: Tensor) : Tensor` |
| sort | `sort(tensor: Tensor, dim: number) : Tensor` | `t.sort(dim: number) : Tensor` |
| add | `add(tensor: Tensor, other: Tensor) : Tensor` | `t.add(other: Tensor) : Tensor` |
| sub | `sub(tensor: Tensor, other: Tensor) : Tensor` | `t.sub(other: Tensor) : Tensor` |
| mul | `mul(tensor: Tensor, other: Tensor) : Tensor` | `t.mul(other: Tensor) : Tensor` |
| div | `div(tensor: Tensor, other: Tensor) : Tensor` | `t.div(other: Tensor) : Tensor` |
| eq | `eq(tensor: Tensor, other: Tensor) : Tensor` | `t.eq(other: Tensor) : Tensor` |
| neq | `neq(tensor: Tensor, other: Tensor) : Tensor` | `t.neq(other: Tensor) : Tensor` |
| lessThan | `lessThan(tensor: Tensor, other: Tensor) : Tensor` | `t.lessThan(other: Tensor) : Tensor` |
| lessThanEqual | `lessThanEqual(tensor: Tensor, other: Tensor) : Tensor` | `t.lessThanEqual(other: Tensor) : Tensor` |
| greaterThan | `greaterThan(tensor: Tensor, other: Tensor) : Tensor` | `t.greaterThan(other: Tensor) : Tensor` |
| greaterThanEqual | `greaterThanEqual(tensor: Tensor, other: Tensor) : Tensor` | `t.greaterThanEqual(other: Tensor) : Tensor` |
| logicalOr | `logicalOr(tensor: Tensor, other: Tensor) : Tensor` | `t.logicalOr(other: Tensor) : Tensor` |
| logicalAnd | `logicalAnd(tensor: Tensor, other: Tensor) : Tensor` | `t.logicalAnd(other: Tensor) : Tensor` |
| mod | `mod(tensor: Tensor, other: Tensor) : Tensor` | `t.mod(other: Tensor) : Tensor` |
| bitwiseAnd | `bitwiseAnd(tensor: Tensor, other: Tensor) : Tensor` | `t.bitwiseAnd(other: Tensor) : Tensor` |
| bitwiseOr | `bitwiseOr(tensor: Tensor, other: Tensor) : Tensor` | `t.bitwiseOr(other: Tensor) : Tensor` |
| bitwiseXor | `bitwiseXor(tensor: Tensor, other: Tensor) : Tensor` | `t.bitwiseXor(other: Tensor) : Tensor` |
| lShift | `lShift(tensor: Tensor, other: Tensor) : Tensor` | `t.lShift(other: Tensor) : Tensor` |
| rShift | `rShift(tensor: Tensor, other: Tensor) : Tensor` | `t.rShift(other: Tensor) : Tensor` |
| minimum | `minimum(tensor: Tensor, other: Tensor) : Tensor` | `t.minimum(other: Tensor) : Tensor` |
| maximum | `maximum(tensor: Tensor, other: Tensor) : Tensor` | `t.maximum(other: Tensor) : Tensor` |
| power | `power(tensor: Tensor, other: Tensor) : Tensor` | `t.power(other: Tensor) : Tensor` |
| matmul | `matmul(tensor: Tensor, other: Tensor) : Tensor` | `t.matmul(other: Tensor) : Tensor` |
| amin | `amin(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.amin(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| amax | `amax(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.amax(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| argmin | `argmin(tensor: Tensor, axis: number, keep_dims: boolean = false) : Tensor` | `t.argmin(axis: number, keep_dims: boolean = false) : Tensor` |
| argmax | `argmax(tensor: Tensor, axis: number, keep_dims: boolean = false) : Tensor` | `t.argmax(axis: number, keep_dims: boolean = false) : Tensor` |
| sum | `sum(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.sum(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| cumsum | `cumsum(tensor: Tensor, axis: number) : Tensor` | `t.cumsum(axis: number) : Tensor` |
| mean | `mean(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.mean(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| median | `median(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.median(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| var | `var(tensor: Tensor, axes: number[] = [], bias: boolean = false, keep_dims: boolean = false) : Tensor` | `t.var(axes: number[] = [], bias: boolean = false, keep_dims: boolean = false) : Tensor` |
| std | `std(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.std(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| norm | `norm(tensor: Tensor, axes: number[] = [], p: number = 2, keep_dims: boolean = false) : Tensor` | `t.norm(axes: number[] = [], p: number = 2, keep_dims: boolean = false) : Tensor` |
| countNonzero | `countNonzero(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.countNonzero(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| any | `any(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.any(axes: number[] = [], keep_dims: boolean = false) : Tensor` |
| all | `all(tensor: Tensor, axes: number[] = [], keep_dims: boolean = false) : Tensor` | `t.all(axes: number[] = [], keep_dims: boolean = false) : Tensor` |

License

shumai is MIT licensed, as found in the LICENSE file.

Comments
  • Extensible statistics

    Tried to keep scope down, but the changes are pretty considerable and come with tradeoffs. The big downside of leveraging a consistent stats interface across all layers is that distributed training now requires a bit more effort to process the HTTP results from the remote models. I believe the tradeoffs are well worth it, though, and the updated docs attempt to explain these new patterns, which should enable some pretty incredible and robust stats in the future (features I also plan to contribute).

    Statistics

    graph TD
      OpA(Op A) --> statsA{{"stats A"}};
      OpB(Op B) --> statsA;
      statsA --> LoggerA{{"LoggerConsole A"}};
      LoggerA --> Stdout(("Stdout"));
      OpC(Op C) --> statsA;
      OpD(Op D) --> statsA;
      statsA --> LoggerB("LoggerCustom B");
      LoggerB --> Disk(("Disk"));
    

    Basic usage of gathering statistics is as simple as enabling collection, which reports through the default StatsLoggerConsole:

    import { stats, StatsLoggerConsole, rand, matmul } from '@shumai/shumai'
    
    stats.enabled = true // all ops following will capture stats
    
    // perform ops...
    
    stats.enabled = false // all ops following will no longer capture stats
    

    While the above examples may suffice for simple use cases, if you're looking to capture stats across multiple threads, processes, and/or hosts, StatsLoggerHttp has you covered.

    graph TD
      subgraph Host C
        Processor("LoggerHttp Processor")
        style Processor stroke:#222,stroke-width:4px,stroke-dasharray:5 5
      end
      subgraph Host A
        OpA(Op A) --> statsA{{"stats A"}};
        OpB(Op B) --> statsA;
        statsA --> LoggerA{{"LoggerHttp A"}};
        LoggerA --> Processor;
      end
      subgraph Host B
        OpC(Op C) --> statsB{{"stats B"}};
        OpD(Op D) --> statsB;
        statsB --> LoggerB{{"LoggerHttp B"}};
        LoggerB --> Processor;
      end
    
    import { stats, StatsLoggerHttp } from '@shumai/shumai'
    
    stats.logger = new StatsLoggerHttp({ url: 'http://localhost:4242' })
    

    For more custom needs you can supply your own logger:

    import { StatsLogger, StatsLoggerData } from '@shumai/shumai'
    
    class CustomLogger implements StatsLogger {
      async process(data: StatsLoggerData): Promise<void> {
        const summary = data.collector.getSummary()
        console.log('Collector stats:', summary)
      }
    }
    
    stats.logger = new CustomLogger()
    

    By default stack tracing is disabled, as it adds 50%+ overhead, but it can be enabled via stats.collectStacks = true.

    Scoped Statistics

    If you wish to isolate stats profiling you can do this as well:

    import { collectStats } from '@shumai/shumai'
    
    const scopedStats = collectStats(() => {
      // perform ops...
    }/*, StatsCollectorOptions | StatsLogger */)
    console.log(scopedStats.getSummary())
    
    CLA Signed 
    opened by asilvas 9
  • Making softmax numerically stable

    Modified the softmax function to be numerically stable with large exponents. Method taken from here.
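
    For reference, the usual max-subtraction trick looks roughly like this in shumai ops (a sketch, not this PR's exact code; amax, sub, exp, sum, and div all appear in the supported-operations table, sm.Tensor is assumed to be the exported tensor type, and broadcasting over the reduced axis is assumed):

    import * as sm from "@shumai/shumai"

    const stableSoftmax = (x: sm.Tensor, axis: number) => {
      // subtracting the per-axis max makes every exponent <= 0, so exp cannot overflow
      const e = x.sub(x.amax([axis], true)).exp()
      return e.div(e.sum([axis], true))
    }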

    I am fairly new to autodiff gradient functions, so my implementation of amax may be way off the mark (it certainly looks wrong).

    I originally wrote the code below based on the min/max gradient functions that already exist, but it would not converge on my test model (whereas the current implementation does).

    const mask = ctx.forward_inputs[0]
      .eq(ctx.forward_output)
      .astype(ctx.backward_input.dtype);
    return ctx.backward_input.mul(mask)
    
    CLA Signed 
    opened by joelshepherd 8
  • Transformer encoder

    TransformerPositionalEncoding

    $$ \mathrm{PE}_{i, 2z} = \sin \left( \frac{i}{10000^{2z/d}} \right) $$

    $$ \mathrm{PE}_{i, 2z + 1} = \cos \left( \frac{i}{10000^{2z/d}} \right) $$

    where $i$ is the sequence position, $2z$ and $2z+1$ are the dimensions of the input embedding, and $d$ is the dimensionality of the input embedding.

    The multiplicative factors $\frac{1}{10000^{2z/d}}$ are precomputed during object creation as they are constant for all $i$.

    The full PE is initially precomputed for all $i$ up to 256 (configurable). This is then extended and stored if the module is called with a sequence length larger than the initial value.

    Returns a 2D tensor matching the last two dimensions of the input tensor to TransformerEncoder.
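
    A sketch of that precomputation (positionalEncoding is a hypothetical helper; only sm.tensor and reshape from the README are assumed):

    import * as sm from "@shumai/shumai"

    // PE[i][2z] = sin(i / 10000^(2z/d)), PE[i][2z+1] = cos(i / 10000^(2z/d))
    function positionalEncoding(seqLen: number, d: number) {
      const out = new Float32Array(seqLen * d)
      for (let i = 0; i < seqLen; i++) {
        for (let z = 0; 2 * z < d; z++) {
          const factor = 1 / Math.pow(10000, (2 * z) / d) // constant across positions i
          out[i * d + 2 * z] = Math.sin(i * factor)
          if (2 * z + 1 < d) out[i * d + 2 * z + 1] = Math.cos(i * factor)
        }
      }
      return sm.tensor(out).reshape([seqLen, d])
    }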

    FeedForward

    Simple two-layer fully connected neural network with ReLU activation. This is kept as a private class for now; if we want it to be exported, it should probably live in a separate file.

    TransformerEncoderLayer

    As described in Vaswani et al.

    TransformerEncoder

    The full encoder half of the Transformer, using a Sequential containing an arbitrary number of TransformerEncoderLayers.

    This includes the positional encoding, but does not include any initial embedding of an input sequence into vectors (which would be done separately, e.g. by word2vec).

    CLA Signed 
    opened by yushiyangk 8
  • attempt basic error handling from native code

    Attempts to add a basic implementation of native error handling that works hand in hand with TypeScript error handling. It has room for improvement in terms of additional functionality, but I think this is a good first step at hashing out a native-code error-handling API re: #26.

    CLA Signed 
    opened by cryptodeal 5
  • Add Support for `Float16Array`

    While working on implementing tensor data types, I pretty quickly realized that JS TypedArray doesn't include Float16Array. Some research into solutions revealed an existing library, @petamoriken/float16, which exports a Float16Array, has been actively developed since ~2014, and has recently added support for Bun (@petamoriken/float16 GitHub repo).

    It seems to support Node runtimes (Bun included) as well as browser environments (it seems the lib was created because the authors needed a Float16Array when working with WebGL).

    enhancement 
    opened by cryptodeal 5
  • SegmentationFault running `examples/bench.ts` and other examples

    Running in a vanilla docker container FROM flml/flashlight:cuda-latest.

    bun bench.ts
    10 elements...
    JS create 0 tensor               mean: 25.576us    (min: 19.379us, max: 895.272us)
    native create 0 tensor           mean: 4.402us    (min: 2.48us, max: 348.661us)
    JS create random tensor          mean: 23.681us    (min: 19.279us, max: 190.88us)
    native create random tensor      mean: 12.875us    (min: 7.879us, max: 264.586us)
    1000 elements...
    JS create 0 tensor               mean: 26.09us    (min: 20.109us, max: 519.052us)
    native create 0 tensor           mean: 3.65us    (min: 2.25us, max: 354.776us)
    JS create random tensor          mean: 28.209us    (min: 22.628us, max: 388.019us)
    native create random tensor      mean: 12.913us    (min: 7.789us, max: 447.59us)
    100000 elements...
    
    SegmentationFault at 0x0000000000000000
    
    
    ----- bun meta -----
    Bun v0.1.13 (55bdf268) Linux x64 #1 SMP Wed Aug 24 22:24:20 UTC 2022
    AutoCommand:
    Elapsed: 1630ms | User: 957ms | Sys: 262ms
    RSS: 67.11MB | Peak: 1.74GB | Commit: 67.11MB | Faults: 60
    ----- bun meta -----
    

    I'm able to run the benchmark from flashlight in the same container. The host is Windows 11 with WSL2.

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 515.65.01    Driver Version: 516.94       CUDA Version: 11.7     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:21:00.0  On |                  N/A |
    |  0%   45C    P8    26W / 400W |   2120MiB / 12288MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  On   | 00000000:4B:00.0 Off |                  N/A |
    |  0%   35C    P8    19W / 350W |      0MiB / 12288MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    bug 
    opened by asilvas 5
  • Submit latest version of EvalTuner and clean up commits

    Must have been a bit tired last night because I messed up when merging commits to resolve conflicts; resubmitted with the updated code and cleaned up commit history.

    CLA Signed 
    opened by cryptodeal 5
  • Forced GC during backward optimization slows training up to 60x

    https://github.com/facebookresearch/shumai/blob/main/shumai/tensor/tensor.ts#L416

    tensor.update() invocation counts are very high during the backward pass, and these GC calls are killing performance. There are a number of directions that would greatly improve the situation, but I wasn't sure if you had a plan/pattern in mind.

    opened by asilvas 4
  • Implement `dispose`

    Bun appears to run GC twice: if you uncomment the logged output in destroyTensor (only called at garbage collection), the output is logged multiple times per pointer.

    This causes the test to segfault: on the first GC pass, Bun finds the pointer in alreadyDestroyed and removes it from the set; on the second pass, destroyTensor fails to locate the pointer in alreadyDestroyed (as it was just cleared during the previous run).

    It seems like there might be a bug in Bun; I'm working on a repro to file an issue with Bun, as this is likely blocking.

    Once we resolve the above, this will partially implement #50 (still want to implement some equivalent to TFJS tidy in a separate PR).

    CLA Signed 
    opened by cryptodeal 4
  • Add More Tests

    Supported Operations Tests

    • [ ] rand
    • [ ] randn
    • [ ] full
    • [ ] identity
    • [ ] arange
    • [ ] iota
    • [x] reshape
    • [x] transpose
    • [x] tile
    • [ ] nonzero
    • [x] negative
    • [ ] logicalNot
    • [x] exp
    • [x] log
    • [ ] log1p
    • [x] sin
    • [x] cos
    • [ ] sqrt
    • [ ] tanh
    • [x] floor
    • [x] ceil
    • [ ] rint
    • [ ] absolute
    • [x] abs
    • [x] sigmoid
    • [x] erf
    • [x] flip (1D tensor; additional tests after 100% coverage of basic ops)
    • [ ] clip
    • [ ] roll
    • [x] isnan
    • [x] isinf
    • [x] sign
    • [ ] tril
    • [ ] triu
    • [ ] where
    • [ ] sort
    • [x] add
    • [x] sub
    • [x] mul
    • [x] div
    • [ ] eq
    • [ ] neq
    • [ ] lessThan
    • [ ] lessThanEqual
    • [ ] greaterThan
    • [ ] greaterThanEqual
    • [ ] logicalOr
    • [ ] logicalAnd
    • [ ] mod
    • [ ] bitwiseAnd
    • [ ] bitwiseOr
    • [ ] bitwiseXor
    • [ ] lShift
    • [ ] rShift
    • [x] minimum
    • [x] maximum
    • [ ] power
    • [ ] matmul
    • [ ] amin
    • [ ] amax
    • [ ] argmin
    • [ ] argmax
    • [x] sum
    • [ ] cumsum
    • [x] mean
    • [ ] median
    • [ ] var
    • [ ] std
    • [x] norm
    • [ ] countNonzero
    • [ ] any
    • [ ] all

    Tensor Class Methods & Properties Tests

    • [ ] backward
    • [ ] ndim
    • [x] shape (used in tests)
    • [ ] toString
    • [ ] valueOf (used in tests)
    • [ ] asContiguousTensor
    • [x] copy
    • [ ] detach
    • [x] elements (used in tests)
    • [x] toFloat32Array (tested implicitly in valueOf)
    • [x] toFloat32 (tested implicitly in valueOf)
    CLA Signed 
    opened by cryptodeal 4
  • Add gradient fns for log and abs

    This adds gradient functions for log and abs.

    I have used and tested these locally for cross-entropy and mean absolute error loss functions. If you would like these in your library too, I am happy to upstream them from my experiment repo.

    CLA Signed 
    opened by joelshepherd 3
  • Implement `StandardScaler`; add associated tests

    Implemented BaseScaler abstract class + StandardScaler, which extends the base class. Also added simple unit tests for StandardScaler.

    Fixed a few type errors in shumai/tensor/tensor.ts while working on this.

    CLA Signed 
    opened by cryptodeal 5
  • Examples about training and inference

    I'd like to build a little feed-forward, fully connected thing with just one hidden layer. I looked at the examples, but perhaps the most relevant one, train.ts, doesn't seem to work anymore, as things like sm.module.sequential and sm.optim.Adam no longer seem to exist.

    It would be great to get that example fixed.

    In general, it would also be great to have a simpler, more exhaustive "getting started" example, like a tiny model that learns XOR: one that showcases how to build the network (perhaps with one hidden layer for the sake of demonstration), how to feed it training data, and how to validate it with more data afterwards.

    At the moment I'm a bit stuck: I have the dataset, and I had the network sort of working on top of Brain.js (too slow), but I don't know what Shumai code I should write to recreate the same network and training/testing "pipeline".
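
    For what it's worth, a minimal XOR pipeline can be sketched with just the gradient API from the README above (no layer/optimizer modules are assumed; every op used appears in the supported-operations table, and the hyperparameters are illustrative):

    import * as sm from "@shumai/shumai"

    // XOR dataset; a constant 1 column stands in for a bias term
    const X = sm.tensor(new Float32Array([0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1])).reshape([4, 3])
    const T = sm.tensor(new Float32Array([0, 1, 1, 0])).reshape([4, 1])

    // one hidden layer of width 4
    let W1 = sm.randn([3, 4]); W1.requires_grad = true
    let W2 = sm.randn([4, 1]); W2.requires_grad = true
    const lr = sm.scalar(0.1)

    for (let step = 0; step < 5000; step++) {
      const H = X.matmul(W1).sigmoid()
      const Y = H.matmul(W2).sigmoid()
      const diff = Y.sub(T)
      const loss = diff.mul(diff).sum() // summed squared error
      loss.backward()
      // manual gradient-descent updates, detached from the autograd graph
      W1 = W1.detach().sub(W1.grad.mul(lr)); W1.requires_grad = true
      W2 = W2.detach().sub(W2.grad.mul(lr)); W2.requires_grad = true
    }

    // validation: predictions should approach [0, 1, 1, 0]
    console.log(X.matmul(W1).sigmoid().matmul(W2).sigmoid().toFloat32Array())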

    opened by fabiospampinato 2
  • [tracking] Browser Support

    Great work on shumai! I'm very new to bun specifically and javascript in general, but I love the idea.

    I am trying to import shumai into an html page I'm building and am curious how all the pieces work together.

    I have @shumai/shumai installed via bun and can import it using ES6 syntax into a .js file no problem.

    I run bun bun which generates a node_modules.bun which can be copied into node_modules.js by running ./node_modules.bun > node_modules.js

    I can then import a script module <script type="module" src="node_modules.js">...</script> which seems to work.

    However, as is intended by bun, hashes are exported instead of modules holding the same structure. So importing and using sm doesn't expose the same API, and randn or tensor, for example, aren't available.

    In your time working with bun, have you figured out a supported way to do this? How would you suggest using shumai in a web page?

    I appreciate your help

    enhancement 
    opened by andrewnc 2
  • WebGPU Backend

    This will enable browser support. We'll need to shim some files:

    • [ ] io
    • [ ] network
    • [ ] tensor/ffi

    It doesn't make sense to support anything besides WebGPU at this point. WASM + SIMD is around 15-20x slower on my machine[1]. Although WebGL is more widely supported today, it doesn't have the compute features needed for efficient modern ML (transformers etc) and will likely be a deprecated backend for other frameworks when WebGPU comes online.

    [1]: In chrome canary, with Unsafe webGPU enabled try models here: https://tensorflow.github.io/tfjs/e2e/benchmarks/local-benchmark/index.html

    enhancement help wanted 
    opened by bwasti 1
  • Add floating point operation and byte movement counter to every operation

    Ideally, each operation would have its theoretical peak performance measured. This would help us easily catch "slow" operations or bottlenecks in models during training

    This information could be added to tensor.stats

    enhancement 
    opened by bwasti 0
  • CUDA backend for asynchronous distributed multi-trainer segfaults (race condition)

    When running the distributed test with a CUDA backend, there's a segfault. It can be fixed easily with CUDA_LAUNCH_BLOCKING, but that's not ideal. Below are the commands to repro:

    Server:

    $ bash examples/distributed/serve.sh
    

    Client:

    $ bash examples/distributed/client.sh
    
    bug 
    opened by bwasti 0