A probabilistic programming language based on pattern-rewriting

Overview

MJr-compiler

MJr is a probabilistic programming language based on pattern-rewriting, heavily inspired by MarkovJunior by Maxim Gumin. This project provides a compiler for the MJr programming language, which compiles MJr programs to JavaScript, TypeScript or Python 3.

image

Examples

Documentation

See the documentation and implementation notes for more information.

Comments
  • Logical operations on input patterns

    Logical operations on input patterns

    "Input patterns" are used in rewrite rules, observe rules, and some other places to determine matches; conceptually, an input pattern is used as a boolean test on a grid position, where the test is true whereever the pattern matches and false otherwise. It would be interesting to allow logical operations on patterns, e.g. [W..W] and not [.WW.] would match two W symbols with a gap of 2 between them, but only if they are not connected along that gap by W symbols.

    This would expand the expressiveness of the language with, potentially, no performance cost: in the current implementation, each DFA state is mapped to an "accept set" which is a set of patterns matched, so any logical operations could be performed at compile-time when computing these "accept sets". (Care would be needed for patterns like not [W], to ensure the DFA's initial 'zero' state still doesn't match them, otherwise the initial call to gridN_update may not correctly report matches in the empty grid. This should be treated like [.] and not [W].)

    Additionally, this would eliminate the need for some statements to check whether an output pattern is already present in the grid: a rewrite rule such as a -> b would be equivalent to a and not b -> b. (Currently, this check is only eliminated when it is statically determined that a and b are mutually exclusive.)

    enhancement 
    opened by kaya3 1
  • Patterns which capture

    Patterns which capture

    It would be useful to be able to express relationships between an input pattern and an output pattern, using something like "capturing groups" (in regex terminology). Taking the River model as an example, the statement

    one:
        [EB] -> [.E]
        [GB] -> [.G]
    

    could be more succinctly written as something like one: [[EG]B] -> [.\1] where \1 is a placeholder for the symbol matched by the union [EG].

    • Need to decide on syntax.
    • Semantically this could be thought of as syntax sugar for describing a set of rewrite rules, but it would be better if rules with capturing groups are not actually expanded at compile-time, since multiple capturing groups would compile to exponentially-sized output code. Better to match the pattern [[EG]B] as usual and do the capturing at runtime.
    • Need to decide semantics of logical operations and, or and not on patterns with capturing groups.
    • May need to include information about capturing groups in pattern types, and figure out how this could work (if at all) when the output pattern is not a compile-time constant.
    enhancement 
    opened by kaya3 0
  • 3D grids

    3D grids

    At the moment only 2D grids are supported, but it would be good to support 3D grids. This will necessarily require changes to many parts of the compiler:

    • Need to decide if a program with multiple grids must have them all 2D or all 3D, and choose a syntax for specifying whether a program or grid is 2D or 3D. (Perhaps grid {dim=3} ...?)
    • Tokenizer/parser needs to support that syntax and allow for 3D pattern literals. Probably use // for the separator in the z direction.
    • Pattern types and the Pattern class need a depth property.
    • 3D symmetry groups and functions to apply 3D symmetries to patterns.
    • If 2D and 3D grids can coexist, need to decide the semantics of a map statement between grids of different dimensions (or perhaps have a separate project statement?)
    • Position attribute expressions need to allow .z, but only for positions in 3D grids.
    • Grid index expressions in the compiler need to include a z * width * height term. Ideally, the compiler should not emit declarations like atZ = ... for 2D grids, but this adds complexity.
    enhancement 
    opened by kaya3 0
  • Grid- and position-dependent output patterns in `all` statements

    Grid- and position-dependent output patterns in `all` statements

    Currently, all statements do not support output patterns which depend both on the current grid state and the match position. The reason for this is that either the output patterns are computed just once each (so they must be valid for all match positions) or they are computed in the loop which applies rewrites (so the grid state may be different to the one for which they should be evaluated). The best solution is probably to allow a separate "write buffer" for the grid, so that the old grid state is still available.

    Minimal example (playground link):

    grid [BRGU]
    symmetry "none"
    
    # works with 'one' but not 'all'
    all: [B] -> (
        [R] if at.x % 2 == 0
        else [G] if count [R] > 10
        else [U]
    )
    
    bug 
    opened by kaya3 1
  • Alternative matching strategies for some patterns

    Alternative matching strategies for some patterns

    The current implementation uses a pair of DFAs to keep track of all matches of all input patterns in the grid. This is suboptimal for 1x1 patterns, because each 1x1 pattern is matched by many different DFA states, causing duplication of the match handling code; something similar occurs for patterns of height 1.

    Additionally, the DFAs can get quite big when they have to match patterns larger than 4x4; for example the compiled output for the BasicDungeonGrowth model is about 833KB in size, simply because it has a single 5x5 pattern with 8 symmetries. Likewise, the PacMan model compiles to about 872KB, about 75% of which is attributable to just three input patterns. This is also a performance issue, since the time required to update the DFA states scales with the size of the largest input pattern, even if only one cell in the grid is changed.

    Some better strategies:

    • For 1x1 input patterns, there is no need for DFAs at all; matching them is trivial.
    • For other patterns of height 1, matches can be handled directly by the "row DFA", reducing the number of patterns that the "col DFA" needs to recognise.
    • For large input patterns, revert to either a direct search or something like the algorithm used in the original MarkovJunior project (source link).

    In the latter case, control-flow analysis may allow such input patterns to be ignored in parts of the program where matches will no longer be used.

    enhancement 
    opened by kaya3 2
  • Alphabet reduction

    Alphabet reduction

    For some programs it is in principle possible to statically determine (by control-flow analysis) that some alphabet symbols can never occur simultaneously in the same grid state. For example, in the River model, the symbols W and R can never coexist with G and E. This means that, if the program is not animated and the grid state is never logged (so that only the final grid state is ever observed), then the program would have the same observable behaviour if W and R were removed from the alphabet and replaced with G and E in all patterns, as in the RiverOptimised model.

    Reducing the size of the alphabet means that smaller DFAs are emitted by the compiler, and this is significant because the DFA makes up a large proportion of the generated code. For the River example, the compiled output for the optimised model is about 30% smaller than for the unoptimised model, or 35% when the compiled output is minified.

    enhancement 
    opened by kaya3 0
  • Samplers for rules with few matches

    Samplers for rules with few matches

    A "sampler" is a data structure containing the current matches for some set of input patterns. In the current implementation, a sampler is implemented as a pair of arrays (one for the matches, one for index lookups) with sufficient capacity in case every pattern simultaneously matches at every position. This means that no memory allocation ever has to be done in the main loop, but it also wastes memory, particularly when the grids are large and some patterns will only ever have a small number of matches, and this may also impact performance due to cache misses.

    Some examples where this optimisation would be relevant:

    • In the BasicSnake model, the patterns [RBW] and [RBD] can have at most three simultaneous matches each, and [PGG] can have at most one.
    • In the MazesAndLakes model, the patterns [RBB] and [RWW] can have at most 3 * LAND_SEEDS and LAND_SEEDS simultaneous matches respectively.

    If a set of patterns will only ever have one match at a time, then the sampler can be replaced with a single variable for the position of the current match (or -1 if there is no match). Otherwise, if the number of matches will only ever be small, then the sampler can be implemented as a single unsorted array, and old matches can be removed by linear search.

    The main problem is knowing when this optimisation is possible. The simplest solution would be to let the programmer provide hints for which patterns can't have many matches; or it could be detected using control-flow analysis for grid symbols.

    enhancement 
    opened by kaya3 0
Owner
Andrew Kay
Andrew Kay
When a person that doesn't know how to create a programming language tries to create a programming language

Kochanowski Online Spróbuj Kochanowskiego bez konfiguracji projektu! https://mmusielik.xyz/projects/kochanowski Instalacja Stwórz nowy projekt przez n

Maciej Musielik 18 Dec 4, 2022
Cookbook Method is the process of learning a programming language by building up a repository of small programs that implement specific programming concepts.

CookBook - Hacktoberfest Find the book you want to read next! PRESENTED BY What is CookBook? A cookbook in the programming context is collection of ti

GDSC-NITH 16 Nov 17, 2022
Write "hello world" in your native language, code "hello world" in your favorite programming language!

Hello World, All languages! ?? ?? Write "hello world" in your native language, code "hello world" in your favorite language! #hacktoberfest2022 How to

Carolina Calixto 6 Dec 13, 2022
A Flow-based programming language for universal applications.

Hlang A Flow-based programming language for universal applications. Hlang aims to make programming easier, faster and more comfortable. It avoids codi

HSET 5 Dec 25, 2022
⚡ Extremely fast online playground for every programming language.

Riju Riju is a very fast online playground for every programming language. In less than a second, you can start playing with a Python interpreter or c

Radon Rosborough 845 Dec 28, 2022
Programming language makes by hobby <3

Joule Programming language makes by hobby <3 About It's a compiled programming language written in JavaScript :/. Codes compile into Assembly code tha

null 6 Feb 17, 2022
VSCode extension for the rickroll-lang programming language (incomplete)

Rickroll-Lang VSCode Extension The Rick Roll programming language is a rickroll based, process oriented, dynamic, strong, esoteric programming languag

Siddhesh Zantye 6 Oct 10, 2022
A programming language (WIP)

Umo programming language Concept Effect system (not implemented) Subtyping (not implemented) Opt-in shared mutability (not implemented) Gradual typing

Masaki Hara 15 Feb 25, 2022
Jaksel Script, Programming language very modern and Indonesian style

Jaksel Script Jaksel Script is a new programming language, very modern, easy to learn, using Indonesia-slang language. No programming experience requi

Rio Chandra 823 Jan 3, 2023
One-stop Go+ programming language environment manager

InGop (Go+ language) One-stop Go+ programming language environment manager How do I learn the Go+ programming language quickly? Start with installatio

uiuing 15 Nov 21, 2022
An enhanced VSCode extension for the Move programming language.

Move Analyzer Plus Provides language support for the Move programming language. Install on the VSCode Extension Marketplace: Move Analyzer Plus on the

The Moving Company 10 Aug 12, 2022
Simple JSON parse/stringify for the Wren programming language

wren-json Simple JSON parse/stringify for the Wren programming language. Parses strict json and relaxed json Comments Unquoted keys Trailing commas St

.ruby 8 May 18, 2022
The interpretation implementation implemented programming language built for fun. I'm currently boring in full stack web development. So, I crafted this one LoL. 👻

What's Wuttyi? Everything is expression ?? I just developed this tiny programming language because of boring in higher level programming construct. Mo

Aung Myat Moe 9 Dec 13, 2022
Simple Math Programming Language (SMPL) for the Web

Simple Math Programming Language (SMPL) for the Web SMPL is a math-oriented programming language that can be interpreted in the browser. Its primary u

mathe:buddy / TH Köln 6 Dec 15, 2022
A type programming language which compiles to and interops with type-level TypeScript

Prakaar Prakaar (hindi for "type") is a type programming language which compiles to and interops with type-level TypeScript. Prakaar itself is also a

Devansh Jethmalani 17 Sep 21, 2022
This is a boilerplate for creating your own languages for various use cases. You can even create your own programming language from scratch!

Bootstrap compiler This is a bootstrap compiler that can be used to compile to compiler written in the target language. You can write a compiler in th

Kaan 3 Nov 14, 2022
This is a project that is in partial fulfillment of our CSCI 318 - Programming Language Concepts class of the Fall 2022 semester in New York Institute of Technology

StreetEasier A hub to search for apartments and roommate matching This project was bootstrapped with Create React App. Want to Test Yourself? 1.) Clon

Kennette James Maddela 3 Dec 5, 2022
The repository shows the compiler (simulator) of the Little Man Computer, which also contains some programs in the LMC programming language for implementing different functions.

Little Man Computer The repository shows the compiler (simulator) of the Little Man Computer, which also contains some programs in the LMC programming

Cow Cheng 2 Nov 17, 2022