Live port of Lark's standalone parser to Javascript

Overview

Lark.js

Generate LALR(1) parsers in Javascript

Lark is a popular parsing toolkit for Python.

This project is a live port of the Lark standalone parser to Javascript.

Lark.js takes a .lark grammar, and from it generates a standalone Javascript parser.

Lark grammars

Lark grammars are written in an augmented EBNF (a textual format), and usually use the .lark extension.

Because they are purely declarative, and don't contain code, they are entirely portable between languages.

It is now possible to use Lark grammars in 3 languages: Python, Javascript, and Julia.

Install Lark.js

Install lark-js with pip (it is distributed as a Python package):

    pip install lark-js --upgrade

Generate a Javascript LALR(1) parser

	lark-js my_grammar.lark -o my_parser.js

For help, run:

	lark-js --help
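
Once generated, the parser file can be loaded like any other Javascript module. The sketch below assumes the generated file exports `get_parser` (the name used in the issue reports further down this page) and that the returned object has a `parse(text)` method. The `loadTree` helper and the stand-in module are hypothetical; they exist only so the snippet runs without a real grammar.

```javascript
// Hypothetical helper: build a parser from a generated module and parse one string.
// `parserModule` stands for the module produced by `lark-js my_grammar.lark -o my_parser.js`.
function loadTree(parserModule, text, options = {}) {
  const parser = parserModule.get_parser(options);
  return parser.parse(text);
}

// Stand-in for the generated module, so the sketch is runnable on its own.
// In a real project this would be: const parserModule = require("./my_parser.js");
const fakeModule = {
  get_parser(options) {
    return {
      parse(text) {
        // Not a real parse: just split on whitespace to return something tree-shaped.
        return { data: "start", children: text.split(/\s+/).filter(Boolean) };
      },
    };
  },
};

const tree = loadTree(fakeModule, "hello world");
console.log(tree.data, tree.children.length);
```

With a real generated parser, only the `require` line changes; the options object is passed straight through to `get_parser`.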

Features

  • LALR(1) parser - Fast and light
  • EBNF grammar
  • Builds a parse-tree (AST) automagically, based on the structure of the grammar
  • Usable in the browser and in Node.js
  • Interactive parsing (step-by-step)
  • Tree utilities (including transformers & visitors)
  • Line & column tracking
  • Standard library of terminals (strings, numbers, names, etc.)
  • Import grammars from Nearley.js

Planned features:

  • Support for Earley

Syntax Highlighting

Lark provides syntax highlighting for its grammar files (*.lark).

Live Port

Usually, ports from one language to another are at risk of falling out of sync as time goes on.

But Lark.js wasn't translated by hand -- 98% of the lines were transpiled directly from Lark's Python code!

That means that future updates to Lark-Python (fixes, features, etc.) will automatically sync to Lark.js.

License

Lark.js uses the MIT license.

Contribute

Lark.js is accepting pull requests. If you would like to help, open an issue or find us on Gitter.

Sponsoring

Lark.js was made possible with the help of a generous donation by Smore ❤️

If you like Lark, and want to see it grow, please consider sponsoring us!

Comments
  • Parser doesn't provide next expected token type according to the grammar

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark

    When I tried to parse the incomplete statement proc2 = GET, the parser did not provide the expected type of the next token in the UnexpectedToken exception that I caught.

    Expected result: Based on the grammar

    statement: VARIABLE "=" command
             | command
             
    // "?" at the beginning will inline command
    ?command: get
            | find
            | disp
            | info
            | apply
            | join
            | sort
            | group
            | load
            | save
            | new
            | merge
    get: "get"i ENTITY_TYPE ("from"i DATASRC)? "where"i STIXPATTERNBODY (starttime endtime)?
    

    it should provide the next expected token type ENTITY_TYPE, so we can do something afterwards.

    bug 
    opened by jillyj 10
  • Incorrect column info for unexpected token exception

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/release/src/kestrel/syntax/kestrel.lark Generated parser: kestrelParser.js.zip

    When parsing this statement var=get, the parser throws the unexpected token exception with

    e.line =1
    e.column=5
    

    However, the column should be 7.

    The column is also wrong for the following test strings:

    • var=get file: e.column is 7, but should be 12.
    • var=get file from: e.column is 14, but should be 17.
    • var=get file from abc: e.column is 19, but should be 21.

    opened by jillyj 9
  • Unnecessary options "_plugins" included in the generated parser

    When I used lark-js to generate the parser from the .lark file, the generated parser file included "_plugins": {}, which resulted in a console error.

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark

    opened by jillyj 8
  • Get into infinite loop if keeping parsing when encountered errors

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark Generated parser: kestrelParser.js.zip

    Code

    // Error callback; returning true tells the parser to continue after an error.
    function handle_errors(e) { return true; }

    let treeData;
    try {
      treeData = parser.parse("proc2 =", null, handle_errors).children[0];
    } catch (e) {
      console.debug("uncaught error:", e);
    }
    

    Expected: can stop parsing and get error info.

    opened by jillyj 6
  • build: allow any v0 lark version after 0.11.1

    Closes #8.

    I know you're running two equivalent packages, lark and lark-parser, but the suggested name over in lark-parser/lark is just lark, so I changed that here too.

    opened by zevisert 5
  • Incorrect line info returned when error occurs

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark Generated parser: kestrelParser.js.zip

    The code below parses the string disp a\nproc2 = get. The first line, disp a, is valid. The second line, proc2 = get, is invalid and should produce an error. However, the exception raised by the parser reports the error on line 1. I checked the parser, and it uses \n as the newline character, so it should report the error on line 2.

    // Error callback; returning true tells the parser to continue after an error.
    function handle_errors(e) { return true; }

    let treeData;
    try {
      treeData = parser.parse("disp a\nproc2 = get", null, handle_errors).children[0];
    } catch (e) {
      console.debug("uncaught error:", e);
    }
    

    Expected: the exception should return correct line number where the error occurs.

    opened by jillyj 4
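
For reference when checking the line and column values in reports like the one above, here is a minimal sketch of how a 1-based line and column can be derived from a 0-based character offset. It is not Lark.js's implementation; `offsetToLineColumn` is a hypothetical helper, and it assumes "\n" is the only line terminator. The convention matches the token dump elsewhere on this page, where start_pos 13 corresponds to column 14.

```javascript
// Convert a 0-based character offset into a 1-based { line, column } pair,
// treating "\n" as the only line terminator.
function offsetToLineColumn(text, offset) {
  let line = 1;
  let lineStart = 0; // offset of the first character of the current line
  for (let i = 0; i < offset && i < text.length; i++) {
    if (text[i] === "\n") {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Offset 7 is the "p" of "proc2", the first character after the newline.
const pos = offsetToLineColumn("disp a\nproc2 = get", 7);
console.log(pos.line, pos.column); // 2 1
```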
  • Keywords are not in the parsed tree

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark

    Based on the grammar:

    statement: VARIABLE "=" command
             | command
             
    // "?" at the beginning will inline command
    ?command: get
            | find
            | disp
            | info
            | apply
            | join
            | sort
            | group
            | load
            | save
            | new
            | merge
    get: "get"i ENTITY_TYPE ("from"i DATASRC)? "where"i STIXPATTERNBODY (starttime endtime)?
    

    from and where are keywords of the get command. However, when I parsed the statement "procs2 = GET process FROM procs WHERE [process:pid = 10578]", the resulting tree, shown below, did not contain the from and where keywords. Are there any options I can pass to the parser to keep those keywords in the parse tree?

    {
      "data": "procs2 = GET process FROM procs WHERE [process:pid = 10578]",
      "children": [
        {
          "type": "VARIABLE",
          "start_pos": 0,
          "value": "procs2",
          "line": 1,
          "column": 1,
          "end_line": 1,
          "end_column": 7,
          "end_pos": 6
        },
        {
          "data": "get",
          "children": [
            {
              "type": "ENTITY_TYPE",
              "start_pos": 13,
              "value": "process",
              "line": 1,
              "column": 14,
              "end_line": 1,
              "end_column": 21,
              "end_pos": 20
            },
            {
              "type": "DATASRC",
              "start_pos": 26,
              "value": "procs",
              "line": 1,
              "column": 27,
              "end_line": 1,
              "end_column": 32,
              "end_pos": 31
            },
            {
              "type": "STIXPATTERNBODY",
              "start_pos": 38,
              "value": "[process:pid = 10578]",
              "line": 1,
              "column": 39,
              "end_line": 1,
              "end_column": 60,
              "end_pos": 59
            }
          ],
          "_meta": {
            "empty": false,
            "line": 1,
            "column": 10,
            "start_pos": 9,
            "container_line": 1,
            "container_column": 10,
            "end_line": 1,
            "end_column": 60,
            "end_pos": 59,
            "container_end_line": 1,
            "container_end_column": 60
          }
        }
      ],
      "_meta": {
        "empty": false,
        "line": 1,
        "column": 1,
        "start_pos": 0,
        "container_line": 1,
        "container_column": 1,
        "end_line": 1,
        "end_column": 60,
        "end_pos": 59,
        "container_end_line": 1,
        "container_end_column": 60
      },
      "type": "statement"
    }
    
    opened by jillyj 4
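
To check which tokens actually end up in a tree like the one dumped above, a small walker is enough. The sketch below assumes the shape seen in that dump: subtrees carry a `children` array, and tokens are leaves with `type` and `value` fields. `collectTokens` is a hypothetical helper, not part of Lark.js, and the sample tree is a trimmed copy with positions omitted. Another report on this page creates its parser with `get_parser({ keep_all_tokens: true })`; in Lark that option keeps tokens that are otherwise filtered out of the tree, so it is a plausible thing to try for this question.

```javascript
// Walk a tree and collect its tokens in document order.
// Assumption: tokens are leaves with `type` and `value`; subtrees have `children`.
function collectTokens(node, out = []) {
  if (Array.isArray(node.children)) {
    for (const child of node.children) collectTokens(child, out);
  } else if (typeof node.value === "string") {
    out.push({ type: node.type, value: node.value });
  }
  return out;
}

// Trimmed-down copy of the tree from the issue (positions and _meta omitted).
const sampleTree = {
  data: "statement",
  children: [
    { type: "VARIABLE", value: "procs2" },
    {
      data: "get",
      children: [
        { type: "ENTITY_TYPE", value: "process" },
        { type: "DATASRC", value: "procs" },
        { type: "STIXPATTERNBODY", value: "[process:pid = 10578]" },
      ],
    },
  ],
};

const tokenTypes = collectTokens(sampleTree).map((t) => t.type);
console.log(tokenTypes.join(" "));
// VARIABLE ENTITY_TYPE DATASRC STIXPATTERNBODY
```

If the keywords were kept, they would show up in this list as additional token entries between the ones above.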
  • Version incompatibility

    lark-js 0.0.7 is incompatible with the newly-released lark 0.12.0.

    This is because poetry follows semver clause 4 when you use the ^ operator to define dependencies, and treats v0.Y.Z as unstable.

    As an example, while people correctly expect that ^1.2.3 is also compatible with 1.3.0, this doesn't apply to packages with a major version of 0. In the case of a version like ^0.Y.Z, the only compatible versions are those which only change Z.

    I'm not sure if you intended on also updating and releasing pypi packages of lark-js when new versions of lark are released, à la the live code thing, but as it stands - I can't use lark 0.12 with lark-js. I see that you're working on a v1 release for lark, which is great, but until then, maybe we can use the inequality >= dependency specifiers?

    opened by zevisert 4
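
The caret rule described in the issue above can be written down in a few lines. This is a sketch of the rule as stated there (bump the left-most non-zero component to get the exclusive upper bound), not Poetry's actual code; all function names are hypothetical, and `^0.11.1` is used purely as an illustrative requirement, since the exact constraint in lark-js 0.0.7 is not quoted here.

```javascript
// Parse "X.Y.Z" into [major, minor, patch] as numbers.
function parseVersion(v) {
  return v.split(".").map(Number);
}

// Compare two parsed versions; negative, zero or positive like a sort comparator.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a caret requirement: bump the left-most non-zero
// component and zero everything to its right.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// True when `version` satisfies the requirement `^base`.
function caretAllows(base, version) {
  const lower = parseVersion(base);
  const candidate = parseVersion(version);
  return (
    compareVersions(candidate, lower) >= 0 &&
    compareVersions(candidate, caretUpperBound(lower)) < 0
  );
}

console.log(caretAllows("1.2.3", "1.3.0"));   // true
console.log(caretAllows("0.11.1", "0.12.0")); // false
console.log(caretAllows("0.11.1", "0.11.3")); // true
```

A `>=` specifier, as suggested in the issue, drops the upper bound entirely, which is why it would admit 0.12.0.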
  • Unable to get the parsed tree data when statement is incomplete

    Generated parser: kestrelParser.zip

    Code

    // Error callback; returning true tells the parser to continue after an error.
    function handle_errors(e) { return true; }

    let treeData;
    try {
      treeData = parser.parse("proc2 =", null, handle_errors).children[0];
    } catch (e) {
      console.debug("uncaught error:", e);
    }
    

    Expected: can get treeData after parsing even if the statement is incomplete.

    opened by jillyj 3
  • fix: compiler errors

    Fix the compiler errors addressed in https://github.com/lark-parser/Lark.js/issues/28. Other changes are to remove the extra spaces which are made automatically by IDE.

    opened by jillyj 3
  • Bundling support (WIP don't merge yet)

    • Added support for bundling to: Browser, Node commonJS, Node ESM
    • Volta support
    • Added files field to minimize bundled files
    • Fix to isMap and is_array; the Node and browser versions are now differentiated by bundling instead of by checking (needed for ESM support)

    TODO

    • [ ] Required final NPM package name
    • [ ] Adding details to package.json such as author (you) and repo (this)
    • [ ] Testing bundles on Node 12 + other bundlers for browser version
    opened by oriSomething 1
  • Error occurs while using the parser generated by Lark.js in React 18.

    Generated parser file (please unzip it first): kestrelParser.js.zip

    We were using React 17.0.2, Next 12.1.0 and the parser can be imported and created successfully.

    const parser = get_parser({ keep_all_tokens: true });
    

    But after upgrading to React 18.2.0 and Next 12.3.2, the generated parser throws an error at runtime (shown in a screenshot that is not reproduced here).

    Do you have any ideas how to solve the issue? Thanks!

    opened by jillyj 0
  • How do I go from an AST tree to a file?

    Hi. Let's say I use https://github.com/crytic/amarna/blob/main/amarna/grammars/cairo.lark and https://github.com/lark-parser/Lark.js to get the AST of a Cairo file. What do I use afterwards to generate a Cairo file from the AST? Thank you.

    opened by machard 4
  • Various errors when using `|` inside of terminals

    I've noticed some errors when using a terminal "production" rule of the form

    T0: T1 | T2 | T3
    

    where all of the given expressions are terminals. These errors only occur in the standalone parser generated by Lark.js; the same grammar will correctly parse an identical string in the python version of lark. I've isolated two hopefully-minimal-enough example cases below.

    This seems to be similar to #21 in that it's related to some Javascript-specific regex foible that gets encountered when agglomerating terminals together via |, but as I'm not super-familiar with the internals of the library I can't be sure. As in #21, replacing VALUE with value everywhere (i.e. replacing the terminal rule with a non-terminal one) causes both of the following examples to parse correctly.

    Example 1

    This grammar:

    ?start: thing
    thing: thing W thing
        | expr
    expr: label W? VALUE
        | VALUE
    label: BARE_WORD W? ":"
    W: /[ \t\n\v\f]/+
    VALUE: NUMBER | BARE_WORD | STRING
    BARE_WORD: /[^\s:\(\)]/+
    STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
    NUMBER: /[0-9]+/
    

    fails with UnexpectedToken when attempting to parse the string "a:b", although running it in the Python version of Lark results in a correct parse.

    Example 2

    This grammar:

    ?start: thing
    thing: label VALUE | VALUE
    label: BARE_WORD W? ":"
    W: /[ \t\n\v\f]/+
    VALUE: NUMBER | BARE_WORD | STRING
    BARE_WORD: /[^\s:\(\)]/+
    STRING: "\"" /((?:\\"|[^\r\n"]))/* "\""
    NUMBER: /[0-9]+/
    

    fails with SyntaxError: Invalid flags supplied to RegExp constructor 'nully' during lexing of the same string "a:b"; the Python version also correctly parses it.

    opened by swwu 0
  • lark-js should advise user when `--start` not passed rather than outputting a cryptic error

    Trying to output a js parser for this matter.lark file fails with the following error:

    $ ~/.local/bin/lark-js matter_grammar.lark -o matter_grammar.js
    ...
    lark.exceptions.GrammarError: Using an undefined rule: NonTerminal('start')
    

    Yet solving this is as simple as passing the correct start symbol with the -s flag; this .lark file uses idl as its start symbol:

    $ ~/.local/bin/lark-js matter_grammar.lark -s idl -o matter_grammar.js
    

    lark-js should advise the user that the .lark file has no start symbol, and that the correct start symbol should be passed via --start, rather than dumping a raw error backtrace.

    opened by turon 0
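
The kind of pre-check requested in the issue above could look roughly like this. It is only a sketch: `definesRule` and `startRuleHint` are hypothetical names, the check is a plain text scan rather than a real grammar load, and it only recognizes `name:` at the start of a line (optionally prefixed by `?` or `!`, optionally followed by a priority such as `.2`). Imported rules and templates are not handled.

```javascript
// Hypothetical pre-check: does the grammar text define a rule with this name?
// Assumes `ruleName` is a plain identifier (letters, digits, underscores).
function definesRule(grammarText, ruleName) {
  const pattern = new RegExp(
    "^[ \\t]*[?!]?" + ruleName + "[ \\t]*(?:\\.\\d+)?[ \\t]*:",
    "m"
  );
  return pattern.test(grammarText);
}

// Returns a friendly hint, or null when the start rule is present.
function startRuleHint(grammarText, start = "start") {
  if (definesRule(grammarText, start)) return null;
  return (
    "Grammar does not define the rule '" + start + "'. " +
    "Pass the grammar's start rule with --start (-s)."
  );
}

// A tiny grammar whose entry rule is `idl`, as in the report above.
const grammar = 'idl: item+\nitem: "x"\n';
console.log(startRuleHint(grammar));        // prints the hint
console.log(startRuleHint(grammar, "idl")); // null
```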
  • SyntaxError: Invalid regular expression returned when parsing

    Grammar file: https://github.com/opencybersecurityalliance/kestrel-lang/blob/develop/src/kestrel/syntax/kestrel.lark Generated parser: kestrelParser.js.zip

    When parsing the statement procs2 = GET process abc, the parser throws an exception that it does not catch itself (shown in a screenshot that is not reproduced here).

    Code

    // Error callback; returning true tells the parser to continue after an error.
    function handle_errors(e) { return true; }

    let treeData;
    try {
      treeData = parser.parse(text, null, handle_errors).children[0];
    } catch (e) {
      console.debug("uncaught error:", e);
    }
    

    Expected: This kind of error can be handled by the parser, so we can get the parsing tree and the error info like Unexpected character or Unexpected Token.

    bug 
    opened by jillyj 11
  • Generated JS parser fails if minified

    I'm including a generated parser.js as part of a large web project. Webpack with the Terser plugin is configured to minify all of the javascript. The default (I believe) behavior is to rename all non-top-level classes to one or two-letter names.

    But serialized data, like this:

      0: {
        name: 'NUMBER',
        pattern: {
          value:
            '(?:(?:(?:[0-9])+(?:e|E)(?:(?:\\+|\\-))?(?:[0-9])+|(?:(?:[0-9])+\\.(?:(?:[0-9])+)?|\\.(?:[0-9])+)(?:(?:e|E)(?:(?:\\+|\\-))?(?:[0-9])+)?)|(?:[0-9])+)',
          flags: [],
          _width: [1, 4294967295],
          __type__: 'PatternRE',
        },
        priority: 0,
        __type__: 'TerminalDef',
      },
    

    includes names of classes as string literals.

    When the classes are renamed, the data can no longer be deserialized and an exception is thrown.

    Can a dictionary be built that maps the serialized type names to classes without relying on the class names themselves?

    opened by cderossi 8
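
The dictionary asked for in the issue above can be sketched as an explicit registry keyed by the serialized `__type__` string. This is an illustration of the idea, not the fix that shipped (the release notes below mention a fix for class renaming in 0.1.3, whose details are not shown here). The two classes are stand-ins, and `TYPE_REGISTRY` and `deserialize` are hypothetical names.

```javascript
// Two stand-in classes; a real generated parser defines its own.
class PatternRE {
  constructor({ value, flags }) {
    this.value = value;
    this.flags = flags;
  }
}

class TerminalDef {
  constructor({ name, pattern, priority }) {
    this.name = name;
    this.pattern = pattern;
    this.priority = priority;
  }
}

// Explicit registry keyed by the serialized `__type__` string. The keys are
// string literals, so they do not depend on the (possibly renamed) class names.
const TYPE_REGISTRY = {
  "PatternRE": PatternRE,
  "TerminalDef": TerminalDef,
};

// Recursively rebuild class instances from plain serialized data.
function deserialize(data) {
  if (Array.isArray(data)) return data.map(deserialize);
  if (data === null || typeof data !== "object") return data;

  const fields = {};
  for (const [key, value] of Object.entries(data)) {
    if (key !== "__type__") fields[key] = deserialize(value);
  }
  const cls = TYPE_REGISTRY[data.__type__];
  return cls ? new cls(fields) : fields;
}

const terminal = deserialize({
  name: "NUMBER",
  pattern: { value: "(?:[0-9])+", flags: [], __type__: "PatternRE" },
  priority: 0,
  __type__: "TerminalDef",
});
console.log(terminal instanceof TerminalDef, terminal.pattern instanceof PatternRE);
// true true
```

The lookup never reads `cls.name`, which is the property a minifier changes when it renames classes.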
Releases(0.1.4)
  • 0.1.4(May 16, 2022)

    What's Changed

    • fix: compiler errors by @jillyj in https://github.com/lark-parser/Lark.js/pull/29

    New Contributors

    • @jillyj made their first contribution in https://github.com/lark-parser/Lark.js/pull/29

    Full Changelog: https://github.com/lark-parser/Lark.js/compare/0.1.3...0.1.4

  • 0.1.3(Apr 19, 2022)

    • Updated to sync with Lark 1.1.2

    • Bugfixes

      • to on_error argument of Lark.parse()
      • to Tree.pretty()
      • to line numbers in UnexpectedToken
      • Better support for re-compiling the generated Javascript.

    What's Changed

    • Added python tests by @erezsh in https://github.com/lark-parser/Lark.js/pull/19
    • Fix for issue #17: Renaming classes breaks deserialization by @erezsh in https://github.com/lark-parser/Lark.js/pull/18
    • Another bugfix for issue #17 by @erezsh in https://github.com/lark-parser/Lark.js/pull/20
    • Improvements taken from Lark 1.1.2 by @erezsh in https://github.com/lark-parser/Lark.js/pull/24
    • Fix docs + minor code details by @erezsh in https://github.com/lark-parser/Lark.js/pull/25
    • Bugfix for on_error (found in issue #22) by @erezsh in https://github.com/lark-parser/Lark.js/pull/26

    Full Changelog: https://github.com/lark-parser/Lark.js/compare/0.1.2...0.1.3

  • 0.1.2(Mar 10, 2022)

    What's Changed

    • Bugfix for UnexpectedToken.expected: fixed isupper() implementation (issue #14) by @erezsh in https://github.com/lark-parser/Lark.js/pull/16

    Full Changelog: https://github.com/lark-parser/Lark.js/compare/0.1.1...0.1.2

  • 0.1.1(Mar 3, 2022)

    • Fixed compatibility with latest lark.

    • Bugfix for UnexpectedToken.expected

    • Added auto-generated setup.py to support 'pip install -e .' (temporary! official support will soon be added to poetry)

    Full Changelog: https://github.com/lark-parser/Lark.js/compare/0.1.0...0.1.1

  • 0.0.7(Sep 7, 2021)

    • lark.js can now compile with the Closure compiler without errors (there are still warnings to solve)
    • import larkjs now allows running the generator from Python, in addition to the shell.
    • Other small fixes
Owner
Lark - Parsing Library & Toolkit