Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework

Overview

unified-latex

Monorepo for @unified-latex packages.

These packages provide a JS/TypeScript interface for creating, manipulating, and printing LaTeX Abstract Syntax Trees (ASTs).

Most of the action lies in the packages/ directory, where you'll find plugins for Unified.js and standalone tools for parsing LaTeX to an Abstract Syntax Tree (AST). Strictly speaking, parsing LaTeX isn't possible since it effectively has no grammar, but unified-latex makes some practical assumptions. It should work on your code, unless you do complicated things like redefine control sequences or embed complicated TeX-style macros.

How it works

unified-latex uses PEG.js to define a PEG grammar for LaTeX. LaTeX source is first parsed with this grammar. Then it is post-processed based on knowledge of special macros (e.g., some macros, like \mathbb, are known to take an argument; such arguments are not detected in the PEG processing stage).
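As a rough illustration of this two-stage idea, the TypeScript sketch below first builds a flat node list and then attaches arguments to macros listed in a small table. Everything in it (the ToyNode type, parseFlat, attachArguments, and the KNOWN_MACROS table) is hypothetical and only handles braced arguments; it is not the unified-latex API, its node shapes, or its PEG grammar.

```typescript
// Toy illustration only: hypothetical types and functions, NOT the unified-latex API.
type ToyNode =
    | { type: "macro"; name: string; args?: ToyNode[][] }
    | { type: "group"; content: ToyNode[] }
    | { type: "string"; value: string };

// Stage 1: a grammar-like pass that knows nothing about macro signatures.
// It only recognizes macros (\name), groups ({...}), and runs of other text.
function parseFlat(src: string): ToyNode[] {
    let pos = 0;
    function parseUntil(close: string | null): ToyNode[] {
        const out: ToyNode[] = [];
        while (pos < src.length) {
            const ch = src.charAt(pos);
            if (ch === close) {
                pos++; // consume the closing brace
                return out;
            }
            if (ch === "\\") {
                const m = /^\\([a-zA-Z]+)/.exec(src.slice(pos));
                const name = m ? m[1] : src.charAt(pos + 1);
                pos += 1 + name.length;
                out.push({ type: "macro", name });
            } else if (ch === "{") {
                pos++;
                out.push({ type: "group", content: parseUntil("}") });
            } else {
                const m = /^[^\\{}]+/.exec(src.slice(pos));
                const value = m ? m[0] : ch;
                pos += value.length;
                out.push({ type: "string", value });
            }
        }
        return out;
    }
    return parseUntil(null);
}

// Stage 2: post-processing with knowledge of "special" macros. The table maps
// a macro name to its number of mandatory arguments (assumed data).
const KNOWN_MACROS: Record<string, number> = { mathbb: 1, frac: 2 };

function attachArguments(nodes: ToyNode[]): ToyNode[] {
    const out: ToyNode[] = [];
    let i = 0;
    while (i < nodes.length) {
        const node = nodes[i++];
        if (node.type === "group") {
            out.push({ type: "group", content: attachArguments(node.content) });
            continue;
        }
        const arity = node.type === "macro" ? (KNOWN_MACROS[node.name] ?? 0) : 0;
        if (node.type !== "macro" || arity === 0) {
            out.push(node);
            continue;
        }
        // Turn the groups that directly follow a known macro into its arguments.
        const args: ToyNode[][] = [];
        while (args.length < arity && i < nodes.length) {
            const next = nodes[i];
            if (next.type !== "group") break;
            args.push(attachArguments(next.content));
            i++;
        }
        out.push({ type: "macro", name: node.name, args });
    }
    return out;
}

// Usage: stage 1 yields six sibling nodes; stage 2 folds the groups into the macros.
const toyFlat = parseFlat("\\mathbb{R} and \\frac{a}{b}");
const toyTree = attachArguments(toyFlat);
console.log(toyFlat.length, toyTree.length); // 6 3
```

The point of the split is that the first pass stays independent of any macro table, so support for a new macro only requires a new table entry rather than a grammar change.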

Development

You should develop in each project's subfolder in the packages/ directory. These packages are set up as npm workspaces.

If you have node.js and npm installed, run

npm install

in this (the root) directory. Then, after first doing a full build as explained below, you may build any particular package. For example:

cd packages/unified-latex
npm install
npm run build

Building

Building is a two-stage process. First, esbuild is used to create bundled packages in the esm and commonjs formats. Second, the TypeScript compiler is used to create the needed type information. All compiled files are stored in the dist/ directory of a workspace.

To build code for all workspaces, run

npm run build -ws

from the root directory.

If TypeScript complains about imports not existing in rootDir, it probably means that there is no TypeScript reference to that particular workspace. (References are how TypeScript divides projects into separate pieces so that it doesn't need to recompile every project.) Add the imported project to the "references" field of the tsconfig.json.

Note that all tsconfig.json files extend tsconfig.build.json, which has some special configuration options to forward imports of @unified-latex/... directly to the correct folder during development.

Testing

Tests in a specific workspace can be run via npx jest in that workspace. Tests for the whole project can be run via npm run tests in the root directory.

Readme Generation and Consistency

README.md files for all workspaces are generated automatically by running

npx esr scripts/build-docs.ts

package.json files can be checked for naming consistency by running

npx esr scripts/package-consistency.ts

Publishing

Version management is done with lerna. Run

npx lerna version

to update the version of all packages. Run

npm run package
npm run publish

to publish all workspaces.

Playground

You can use the Playground to view how LaTeX is parsed and pretty-printed. To run your own version, visit the playground repository and make a local clone. After running npm install, run npm link in your local latex-parser repository. Then run npm link latex-ast-parser in the local playground repository. This will mirror your development version of latex-parser in the playground.

Related Projects

Comments
  • failed to build locally

    // unified-latex-util-parse

    • npm install
    • npm run build

    ../unified-latex-util-match/dist/index.d.ts:3:1127 - error TS2307: Cannot find module '../unified-latex-types/dist' or its corresponding type declarations.

    3 export declare const anyEnvironment: (node: any) => node is import("../unified-latex-types/dist").Environment, anyMacro: (node: any) => node is import("../unified-latex-types/dist").Macro, anyString: (node: any) => node is import("../unified-latex-types/dist").String, argument: (node: any) => node is import("../unified-latex-types/dist").Argument, blankArgument: (node: any) => boolean, comment: (node: any) => node is import("../unified-latex-types/dist").Comment, environment: (node: any, envName?: string | undefined) => node is import("../unified-latex-types/dist").Environment, group: (node: any) => node is import("../unified-latex-types/dist").Group, macro: (node: any, macroName?: string | undefined) => node is import("../unified-latex-types/dist").Macro, math: (node: any) => node is import("../unified-latex-types/dist").InlineMath | import("../unified-latex-types/dist").DisplayMath, parbreak: (node: any) => node is import("../unified-latex-types/dist").Parbreak, string: (node: any, value?: string | undefined) => node is import("../unified-latex-types/dist").String, whitespace: (node: any) => node is import("../unified-latex-types/dist").Whitespace; ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    opened by yunmc 14
  • missing index.cjs files in the dist packages

    The distribution packages on npmjs are missing the index.cjs files. The files are explicitly exported in all the package.json files, e.g.,

    https://github.com/siefkenj/unified-latex/blob/main/packages/unified-latex-util-parse/package.json#L28

    I think the only reason the cjs files aren't in the packages is because

    https://github.com/siefkenj/unified-latex/blob/main/scripts/make-package.mjs#L49

    doesn't include "**/*.cjs".

    That's a guess since I wasn't able to build from source yet; otherwise, I would make a PR.

    What exact version of nodejs and npm are you using for development? I'm using

    ~/unified-latex$ node --version
    v14.19.2
    ~/unified-latex$ npm --version
    8.10.0
    

    on Ubuntu 20.04.

    opened by williamstein 4
  • Error [ERR_REQUIRE_ESM]: Must use import to load ES Module

    Hi I'm trying to get webpack to work with unified-latex.

    Here is my webpack.config.js:

    const nodeExternals = require("webpack-node-externals");
    const path = require("path");
    
    module.exports = {
      name: 'server',
      entry: {
        server: path.resolve(__dirname, 'server/server.tsx'),
      },
      mode: 'production',
      output: {
        path: path.resolve(__dirname, 'dist'),
        filename: '[name].js',
      },
      resolve: {
        extensions: ['.ts', '.tsx'],
      },
      externals: [nodeExternals()],
      target: 'node',
      node: {
        __dirname: false,
      },
      module: {
        rules: [
          {
            test: /\.tsx?$/,
            loader: 'ts-loader',
            options: {
              compilerOptions: {
                "noEmit": false
              }
            },
          },
          {
            test: /\.(sa|sc)ss$/,
            use: [
              {
                loader: "css-loader",
                options: {
                  importLoaders: 2
                }
              },
              {
                loader: "sass-loader"
              }
            ]
          },
          {
            test: /\.svg$/,
            loader: 'url-loader',
            options: {
              limit: 8192,
            }
          },
        ],
      },
    }
    

    Here is my tsconfig.json:

    {
      "compilerOptions": {
        "target": "es5",
        "lib": [
          "dom",
          "dom.iterable",
          "esnext"
        ],
        "allowJs": false,
        "skipLibCheck": true,
        "esModuleInterop": true,
        "allowSyntheticDefaultImports": true,
        "strict": true,
        "forceConsistentCasingInFileNames": true,
        "noFallthroughCasesInSwitch": true,
        "module": "esnext",
        "moduleResolution": "node",
        "resolveJsonModule": true,
        "isolatedModules": true,
        "noEmit": true,
        "jsx": "react-jsx"
      },
      "include": [
        "src"
      ]
    }
    

    After that I try to run:

    node dist/server.js
    

    And I got the following error:

    internal/modules/cjs/loader.js:1102
          throw new ERR_REQUIRE_ESM(filename, parentPath, packageJsonPath);
          ^
    
    Error [ERR_REQUIRE_ESM]: Must use import to load ES Module: /app/node_modules/unified/index.js
    require() of ES modules is not supported.
    require() of /app/node_modules/unified/index.js from /app/node_modules/@unified-latex/unified-latex-util-parse/index.cjs is an ES module file as it is a .js file whose nearest parent package.json contains "type": "module" which defines all .js files in that package scope as ES modules.
    Instead rename index.js to end in .cjs, change the requiring code to use import(), or remove "type": "module" from /app/node_modules/unified/package.json.
    
        at new NodeError (internal/errors.js:322:7)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:1102:13)
        at Module.load (internal/modules/cjs/loader.js:950:32)
        at Function.Module._load (internal/modules/cjs/loader.js:790:12)
        at Module.require (internal/modules/cjs/loader.js:974:19)
        at require (internal/modules/cjs/helpers.js:101:18)
        at Object.<anonymous> (/app/node_modules/@unified-latex/unified-latex-util-parse/index.cjs:39:22)
        at Module._compile (internal/modules/cjs/loader.js:1085:14)
        at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
        at Module.load (internal/modules/cjs/loader.js:950:32) {
      code: 'ERR_REQUIRE_ESM'
    }
    

    It seems that the unified-latex-util-parse package mixes CommonJS and ES module imports. I have tried different approaches, but have not achieved a positive result.

    Also, it seems that "@unified-latex/unified-latex-to-hast": "^1.2.0" is missing its .d.ts files on npm, but I worked around this problem with the following stupid fix:

    types-fix.d.ts:

    declare module "@unified-latex/unified-latex-to-hast" {
        export function convertToHtml(tree: Ast.Node | Ast.Node[]): string;
    };
    

    This can be fixed by switching to version 1.1.0 of @unified-latex/*.

    opened by udovin 3
  • Potential parsing issue

    Hi there, super excited about this!

    I wanted to point out a potential bug. In beamer documents, it's common to use \begin{frame}{title} or \begin{frame}{title}{subtitle}. This doesn't work in the latex-parser-playground.

    This code:

    \documentclass{beamer}
    
    \begin{document}
    
    \begin{frame}{Title}
    
      \begin{itemize}
        Test
      \end{itemize}
    
    \end{frame}
    
    \end{document}
    

    results in this:

    \documentclass{beamer}
    
    \begin{document}
    	\begin{frame}
    		{Title}
    
    		\begin{itemize}
    			Test
    		\end{itemize}
    	\end{frame}
    \end{document}
    

    Notice how the {Title} is on a separate line

    opened by kylebutts 3
  • npm package @unified-latex/unified-latex-util-match contains broken .d.ts

    Its index.d.ts contains a line:

    export declare const anyEnvironment: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Environment, anyMacro: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Macro, anyString: (node: any) => node is import("@unified-latex/unified-latex-types/dist").String, argument: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Argument, blankArgument: (node: any) => boolean, comment: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Comment, environment: (node: any, envName?: string | undefined) => node is import("@unified-latex/unified-latex-types/dist").Environment, group: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Group, macro: (node: any, macroName?: string | undefined) => node is import("@unified-latex/unified-latex-types/dist").Macro, math: (node: any) => node is import("@unified-latex/unified-latex-types/dist").InlineMath | import("@unified-latex/unified-latex-types/dist").DisplayMath, parbreak: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Parbreak, string: (node: any, value?: string | undefined) => node is import("@unified-latex/unified-latex-types/dist").String, whitespace: (node: any) => node is import("@unified-latex/unified-latex-types/dist").Whitespace;
    

    However @unified-latex/unified-latex-types as published in npm doesn't contain a directory dist.

    opened by theseanl 2
  • Question: How to add support of new macros

    I am using this library in the following way:

    import { parse } from "@unified-latex/unified-latex-util-parse";
    import { convertToHtml } from "@unified-latex/unified-latex-to-hast";
    
    const ast = parse(content ?? "");
    const html = convertToHtml(ast);
    

    I want to add custom implementations for some macros:

    <<Some text>> % this should be generated like &laquo;Some text&raquo;
    % It looks like this is a common feature and should be added to this library.
    
    \includegraphics{url} % this should be generated like <img src="PREFIX/url" />
    % It looks like it's a custom implementation, at least for html generation.
    

    How can I add this support? It seems I should add a macro to parse and an html generator to convertToHtml, but the current interface doesn't allow this.

    opened by udovin 2
  • Thank you!

    Sorry for clogging up your issues, but I just quickly wanted to thank you for creating and maintaining this project! I tried to do something similar a couple of months ago here and here, but your implementation seems much more robust! I'll be switching to your version at some point, so thank you!

    opened by tefkah 1
  • Macro default argument support

    I'm trying to write some custom plugins for transforming LaTeX, and so far I really like your work, especially how you provide every required tool via a modular package structure. While experimenting with it, I noticed that there is currently no support for macros with default arguments.

    % input.tex
    \documentclass{article}
    % \newcommand\foo[2][bar]{#1,#2}
    \begin{document}
        \foo[bar]{baz}
        \foo{baz}
    \end{document}
    
    unified-latex input.tex -e "\\newcommand\\foo[2][bar]{#1,#2}"
    

    Then the result will be

    \documentclass{article}
    
    %\newcommand\foo[2][bar]{#1,#2}
    
    \begin{document}
            bar,baz
    
            ,baz
    \end{document}
    

    where the expected output would have two identical lines of bar,baz.

    In unified-latex-util-macros, an xparse-style signature is computed, but it produces o irrespective of the presence of a default argument. I've also noticed that manually passing O{bar} m via \NewDocumentCommand still produces unexpected results, so there must be another issue in the macro expander.

    It'd be nice if this support could be added. Is there any blocker for implementing it? I'm interested in sending a PR.
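The expected behavior described above can be sketched in a few lines of TypeScript. The function expandWithDefault and its parameters are hypothetical and operate on plain strings; this is not how unified-latex-util-macros is implemented, only an illustration of the substitution rule for a \newcommand with one optional argument.

```typescript
// Hypothetical helper for illustration; not part of unified-latex.
// `body` uses #1, #2, ... placeholders, as in \newcommand.
// `optionalDefault` is the value of #1 when the call omits the [..] argument.
function expandWithDefault(
    body: string,
    optionalDefault: string,
    optionalArg: string | null,
    mandatoryArgs: string[]
): string {
    // #1 is the optional argument (or its default); #2... are mandatory.
    const args = [optionalArg ?? optionalDefault, ...mandatoryArgs];
    return body.replace(/#(\d)/g, (match: string, digit: string) => {
        const value = args[Number(digit) - 1];
        // Leave placeholders without a matching argument untouched.
        return value === undefined ? match : value;
    });
}

// \newcommand\foo[2][bar]{#1,#2}
console.log(expandWithDefault("#1,#2", "bar", "bar", ["baz"])); // \foo[bar]{baz} -> bar,baz
console.log(expandWithDefault("#1,#2", "bar", null, ["baz"])); // \foo{baz}      -> bar,baz
```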

    opened by theseanl 1
  • So cool!

    Hi @siefkenj -- just wanted to say thanks for this library, I just started using it to parse some latex documents, working great right now, I am sure I will have some questions as I get into it more.

    I also wanted to introduce myself 👋 and a project I am working on, where there might be a chance to collaborate / coordinate a bit with this project if you all are game!?

    I am working for the @executablebooks project on MyST Markdown, which is a markup language that is gaining traction in the python communities through tools like JupyterBook, Sphinx (as it is based on RST). We are currently working on standardizing some of the AST underlying MyST, initial work is here: https://spec.myst.tools/

    This allows some pretty cool web-based rendering of things like cross-references and all sorts of other citations (e.g. see typography for inline demos). We have also started working on creating various latex templates so that you can take your markup and write it to latex using one of a few hundred journals. We have also just started pushing into JATS export (on every inline demo in the docs), which is used in scientific publishing and archiving.

    All of this is one way at the moment (with the upcoming exception of JATS):

    image

    What would be amazing would be to have some interop with unified-latex to support reading (and probably in the future better/prettier writing) of latex documents.

    I am not quite sure what next steps would be for that, I would be happy to meet and share our project's vision/goals? I am mostly here to show enthusiasm for your project. 🚀 :)

    opened by rowanc1 1
  • Multiple newlines processed as multiple whitespace in math mode

    Parsing

    $x
    
    y$
    

    produces x followed by two whitespaces followed by y. All whitespace should be collapsed (otherwise it can produce a blank line when pretty-printing).
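A minimal sketch of the intended collapsing, assuming a simplified node shape (MathNode and collapseWhitespace are hypothetical names, not the library's real types or functions):

```typescript
// Hypothetical node shape for illustration; not the library's real types.
type MathNode =
    | { type: "whitespace" }
    | { type: "string"; content: string };

// Collapse every run of consecutive whitespace nodes into a single node.
function collapseWhitespace(nodes: MathNode[]): MathNode[] {
    const out: MathNode[] = [];
    for (const node of nodes) {
        const prevIsWhitespace =
            out.length > 0 && out[out.length - 1].type === "whitespace";
        if (node.type === "whitespace" && prevIsWhitespace) {
            continue; // drop the repeated whitespace node
        }
        out.push(node);
    }
    return out;
}

// x, whitespace, whitespace, y  ->  x, whitespace, y
const collapsedMath = collapseWhitespace([
    { type: "string", content: "x" },
    { type: "whitespace" },
    { type: "whitespace" },
    { type: "string", content: "y" },
]);
console.log(collapsedMath.length); // 3
```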

    opened by siefkenj 0
  • Attempt to parse math

    Currently, aside from recognizing groups and environments, math is treated as a simple stream of tokens. As LaTeX is known for its excellent math equation generation, better parsing of math allows better formatting of math expressions, easing maintenance of documentation websites making heavy use of math notation.

    Here is a rough idea on how to implement it, inspired in part by programming language parsers:

    expr_unit "expression unit"
        = num+
        / char
        / whitespace
        / !binary_macro macro
        / begin_group b_expr end_group
    
    power "superscript and subscript"
        = expr_unit
        / power "^" expr_unit
        / power "_" expr_unit
    
    d_expr "delimited expression"
        = power
        / left_delimiter b_expr? right_delimiter
    
    m_expr "multiplication expression"
        = power
        / m_expr power
    
    u_expr "unary plus/minus expression"
        = m_expr
        / [+-] u_expr
    
    b_expr "binary expression"
        = u_expr
        / b_expr binary_operator u_expr
    

    A description of the common constructs in table form:

    | Operator | Description |
    | :------------------------ | :----------------------------------- |
    | Superscript and subscript | Normally ^ and _. |
    | Logical delimitation | Parentheses, brackets, \{, \}, and \left-\right pairs. The delimiter on the left and right can be different. |
    | "Multiplication" | Denoted by the lack of operators between two logical units. 2x is considered a "multiplication" expression, while 2 \times x is not. |
    | Unary operators | +, -, \neg, etc. |
    | Binary operators | +, -, =, \times, \le, etc. |
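One way to read the grammar sketch above is as a small recursive-descent parser, with the left-recursive rules rewritten as loops. The TypeScript below is only an illustration under that assumption: the Expr type and parseMath are hypothetical, the set of binary operators is a guess based on the table, and unary operators and \left-\right delimiters are left out.

```typescript
// Hypothetical AST and parser for illustration only.
type Expr =
    | { kind: "atom"; value: string }
    | { kind: "script"; base: Expr; op: "^" | "_"; arg: Expr }
    | { kind: "mul"; factors: Expr[] }
    | { kind: "binary"; op: string; left: Expr; right: Expr };

function parseMath(src: string): Expr {
    // Tokens: numbers, macros such as \times, or any single non-space character.
    const tokens: string[] = src.match(/\d+|\\[a-zA-Z]+|\S/g) ?? [];
    const BINARY = new Set(["+", "-", "=", "\\times", "\\le"]);
    let pos = 0;

    // expr_unit: a number, character, macro, or braced group
    function unit(): Expr {
        const tok = tokens[pos++];
        if (tok === undefined) throw new Error("unexpected end of input");
        if (tok === "{") {
            const inner = binary();
            if (tokens[pos++] !== "}") throw new Error("expected }");
            return inner;
        }
        return { kind: "atom", value: tok };
    }

    // power: expr_unit (("^" | "_") expr_unit)*, a loop instead of left recursion
    function power(): Expr {
        let base = unit();
        while (tokens[pos] === "^" || tokens[pos] === "_") {
            const op = tokens[pos++] as "^" | "_";
            base = { kind: "script", base, op, arg: unit() };
        }
        return base;
    }

    // m_expr: power+ (juxtaposition, e.g. 2x)
    function mul(): Expr {
        const factors: Expr[] = [power()];
        while (pos < tokens.length && !BINARY.has(tokens[pos]) && tokens[pos] !== "}") {
            factors.push(power());
        }
        return factors.length === 1 ? factors[0] : { kind: "mul", factors };
    }

    // b_expr: m_expr (binary_operator m_expr)*
    function binary(): Expr {
        let left = mul();
        while (pos < tokens.length && BINARY.has(tokens[pos])) {
            const op = tokens[pos++];
            left = { kind: "binary", op, left, right: mul() };
        }
        return left;
    }

    const result = binary();
    if (pos < tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
    return result;
}

console.log(parseMath("2x").kind); // mul
console.log(parseMath("2 \\times x").kind); // binary
```

The loops matter because a PEG rule such as `power = power "^" expr_unit` recurses on itself before consuming any input, so it has to be restated as repetition before a PEG parser generator can handle it.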

    opened by mcendu 1
Owner
Jason Siefken