Modest natural-language processing

Overview
compromise
modest natural language processing
npm install compromise

isn't it weird how we can write text, but not parse it?
- and how we can't get the information back out?

it's like we've agreed that
text is a dead-end.
and the knowledge in it
should not really be used.

compromise tries its best to parse text.
it is small, quick, and often good-enough.
it is not as smart as you'd think.

.match():

interpret and match text:

let doc = nlp(entireNovel)
doc.match('the #Adjective of times').text()
// "the blurst of times?"
if (doc.has('simon says #Verb') === false) {
  return null
}

.verbs():

conjugate and negate verbs in any tense:

let doc = nlp('she sells seashells by the seashore.')
doc.verbs().toPastTense()
doc.text()
// 'she sold seashells by the seashore.'

.nouns():

play between plural, singular and possessive forms:

let doc = nlp('the purple dinosaur')
doc.nouns().toPlural()
doc.text()
// 'the purple dinosaurs'

.numbers():

interpret plain-text numbers:

nlp.extend(require('compromise-numbers'))

let doc = nlp('ninety five thousand and fifty two')
doc.numbers().add(2)
doc.text()
// 'ninety five thousand and fifty four'

.topics():

names/places/orgs, tldr:

let doc = nlp(buddyHolly)
doc.people().if('mary').json()
// [{text:'Mary Tyler Moore'}]

let doc = nlp(freshPrince)
doc.places().first().text()
// 'West Philadelphia'

doc = nlp('the opera about richard nixon visiting china')
doc.topics().json()
// [
//   { text: 'richard nixon' },
//   { text: 'china' }
// ]

.contractions():

handle implicit terms:

let doc = nlp("we're not gonna take it, no we ain't gonna take it.")

// match an implicit term
doc.has('going') // true

// transform
doc.contractions().expand()
doc.text()
// 'we are not going to take it, no we are not going to take it.'

Use it on the client-side:

<script src="https://unpkg.com/compromise"></script>
<script src="https://unpkg.com/compromise-numbers"></script>
<script>
  nlp.extend(compromiseNumbers)

  var doc = nlp('two bottles of beer')
  doc.numbers().minus(1)
  document.body.innerHTML = doc.text()
  // 'one bottle of beer'
</script>

as an es-module:

import nlp from 'compromise'

var doc = nlp('London is calling')
doc.verbs().toNegative()
// 'London is not calling'

compromise is 180kb (minified).

it's pretty fast. It can run on keypress.

it works mainly by conjugating all forms of a basic word list.

The final lexicon is ~14,000 words.

you can read more about how it works, here. it's weird.

.extend():

decide how words get interpreted:

let myWords = {
  kermit: 'FirstName',
  fozzie: 'FirstName',
}
let doc = nlp(muppetText, myWords)

or make heavier changes with a compromise-plugin.

const nlp = require('compromise')

nlp.extend((Doc, world) => {
  // add new tags
  world.addTags({
    Character: {
      isA: 'Person',
      notA: 'Adjective',
    },
  })

  // add or change words in the lexicon
  world.addWords({
    kermit: 'Character',
    gonzo: 'Character',
  })

  // add methods to run after the tagger
  world.postProcess(doc => {
    doc.match('light the lights').tag('#Verb . #Plural')
  })

  // add a whole new method
  Doc.prototype.kermitVoice = function () {
    this.sentences().prepend('well,')
    this.match('i [(am|was)]').prepend('um,')
    return this
  }
})

Docs:

gentle introduction:
Documentation:
Concepts API Plugins
Accuracy Accessors Adjectives
Caching Constructor-methods Dates
Case Contractions Export
Filesize Insert Hash
Internals Json Html
Justification Lists Keypress
Lexicon Loops Ngrams
Match-syntax Match Numbers
Performance Nouns Paragraphs
Plugins Output Scan
Projects Selections Sentences
Tagger Sorting Syllables
Tags Split Pronounce
Tokenization Text Strict
Named-Entities Utils Penn-tags
Whitespace Verbs Typeahead
World data Normalization
Fuzzy-matching Typescript
Talks:
Articles:
Some fun Applications:

API:

Constructor

(these methods are on the nlp object)

  • .tokenize() - parse text without running POS-tagging
  • .extend() - mix in a compromise-plugin
  • .fromJSON() - load a compromise object from .json() result
  • .verbose() - log our decision-making for debugging
  • .version() - current semver version of the library
  • .world() - grab all current linguistic data
  • .parseMatch() - pre-parse any match statements for faster lookups
Utils
  • .all() - return the whole original document ('zoom out')
  • .found [getter] - is this document empty?
  • .parent() - return the previous result
  • .parents() - return all of the previous results
  • .tagger() - (re-)run the part-of-speech tagger on this document
  • .wordCount() - count the # of terms in the document
  • .length [getter] - count the # of characters in the document (string length)
  • .clone() - deep-copy the document, so that no references remain
  • .cache({}) - freeze the current state of the document, for speed-purposes
  • .uncache() - un-freezes the current state of the document, so it may be transformed
Accessors
Match

(all match methods use the match-syntax.)

  • .match('') - return a new Doc, with this one as a parent
  • .not('') - return all results except for this
  • .matchOne('') - return only the first match
  • .if('') - return each current phrase, only if it contains this match ('only')
  • .ifNo('') - Filter-out any current phrases that have this match ('notIf')
  • .has('') - Return a boolean if this match exists
  • .lookBehind('') - search through earlier terms, in the sentence
  • .lookAhead('') - search through following terms, in the sentence
  • .before('') - return all terms before a match, in each phrase
  • .after('') - return all terms after a match, in each phrase
  • .lookup([]) - quick find for an array of string matches
Case
Whitespace
  • .pre('') - add this punctuation or whitespace before each match
  • .post('') - add this punctuation or whitespace after each match
  • .trim() - remove start and end whitespace
  • .hyphenate() - connect words with hyphen, and remove whitespace
  • .dehyphenate() - remove hyphens between words, and set whitespace
  • .toQuotations() - add quotation marks around these matches
  • .toParentheses() - add brackets around these matches
Tag
  • .tag('') - Give all terms the given tag
  • .tagSafe('') - Only apply tag to terms if it is consistent with current tags
  • .unTag('') - Remove this tag from the given terms
  • .canBe('') - return only the terms that can be this tag
Loops
  • .map(fn) - run each phrase through a function, and create a new document
  • .forEach(fn) - run a function on each phrase, as an individual document
  • .filter(fn) - return only the phrases that return true
  • .find(fn) - return a document with only the first phrase that matches
  • .some(fn) - return true or false if there is one matching phrase
  • .random(fn) - sample a subset of the results
Insert
Transform
Output
Selections
Subsets

Plugins:

These are some helpful extensions:

Adjectives

npm install compromise-adjectives

Dates

npm install compromise-dates

Numbers

npm install compromise-numbers

Export

npm install compromise-export

  • .export() - store a parsed document for later use
  • nlp.load() - re-generate a Doc object from .export() results
Html

npm install compromise-html

  • .html({}) - generate sanitized html from the document
Hash

npm install compromise-hash

  • .hash() - generate an md5 hash from the document+tags
  • .isEqual(doc) - compare the hash of two documents for semantic-equality
Keypress

npm install compromise-keypress

Ngrams

npm install compromise-ngrams

Paragraphs

npm install compromise-paragraphs

this plugin creates a wrapper around the default sentence objects.

Sentences

npm install compromise-sentences

Strict-match

npm install compromise-strict

Syllables

npm install compromise-syllables

  • .syllables() - split each term by its typical pronunciation
Penn-tags

npm install compromise-penn-tags


Typescript

we're committed to typescript/deno support, both in main and in the official-plugins:

import nlp from 'compromise'
import ngrams from 'compromise-ngrams'
import numbers from 'compromise-numbers'

const nlpEx = nlp.extend(ngrams).extend(numbers)

nlpEx('This is type safe!').ngrams({ min: 1 })
nlpEx('This is type safe!').numbers()

Partial-builds

or if you don't care about POS-tagging, you can use the tokenize-only build: (90kb!)

<script src="https://unpkg.com/compromise/builds/compromise-tokenize.js"></script>
<script>
  var doc = nlp('No, my son is also named Bort.')

  // you can see the text has no tags
  console.log(doc.has('#Noun')) // false

  // the rest of the api still works
  console.log(doc.has('my .* is .? named /^b[oa]rt/')) // true
</script>

Limitations:

  • slash-support: We currently split slashes up as different words, like we do for hyphens. so things like this don't work: nlp('the koala eats/shoots/leaves').has('koala leaves') //false

  • inter-sentence match: By default, sentences are the top-level abstraction. Inter-sentence, or multi-sentence matches aren't supported without a plugin: nlp("that's it. Back to Winnipeg!").has('it back')//false

  • nested match syntax: the dangerous beauty of regex is that you can recurse indefinitely. Our match syntax is much weaker. Things like this are not (yet) possible: doc.match('(modern (major|minor))? general'). Complex matches must be achieved with successive .match() statements.

  • dependency parsing: Proper sentence transformation requires understanding the syntax tree of a sentence, which we don't currently do. We should! Help wanted with this.

FAQ

    ☂️ Isn't javascript too...

      yeah it is!
      it wasn't built to compete with NLTK, and may not fit every project.
      string processing is synchronous too, and parallelizing node processes is weird.
      See here for information about speed & performance, and here for project motivations

    💃 Can it run on my arduino-watch?

      Only if it's water-proof!
      Read quick start for running compromise in workers, mobile apps, and all sorts of funny environments.

    🌎 Compromise in other Languages?

      we've got work-in-progress forks for German and French, in the same philosophy,
      and we need some help.

    Partial builds?

      we do offer a compromise-tokenize build, which has the POS-tagger pulled-out.
      but otherwise, compromise isn't easily tree-shaken.
      the tagging methods are competitive, and greedy, so it's not recommended to pull things out.
      Note that without full POS-tagging, the contraction-parser won't work perfectly ((spencer's cool) vs. (spencer's house)).
      It's recommended to run the library fully.

See Also:

MIT

Comments
  • "Feauture" request: TypeScript definition file

    A lot of us now-a-days are using Angular 2 with TypeScript, and we would love to see a definition file (.d.ts) for nlp_compromise. I'm willing to volunteer in the making of it, but I wouldn't want to do it alone as nlp_compromise is a pretty large library.

    Thoughts? More volunteers?

    Thanks!

    yesss operations 
    opened by ghost 52
  • performance / why is it running twice ...

    performance / why is it running twice ...

    Hey there, contributing from my fork doesn't make sense because the structure will change to 'only the 3 dictionary files and a factory' soon. However, let me ask some performance questions. Maybe I missed something hidden in the code, but several 'autoclosure' functions run every time a module is required.

    Let's take an example - the conjugation of verbs which is used quite often. I'll use simple console.log to demonstrate it.

    In src/parents/verb/index

    put some logs in the conjugate function

    the.conjugate = function() {
      console.log( 'BEWARE! conjugate is conjugating' );
      verb_conjugate = require('./conjugate');
      var conjugated = verb_conjugate(the.word);
      console.log( 'conjugate result', conjugated );
      return conjugated; //verb_conjugate(the.word);
    }
    

    and in the 'autoclosure' form function

    the.form = (function() {
        console.log( 'BEWARE! the.form is conjugating' );
        verb_conjugate = require('./conjugate');
        // don't choose infinitive if infinitive == present
        var order = [
          'past',
          'present',
          'gerund',
          'infinitive'
        ];
        var forms = verb_conjugate(the.word);
        console.log( 'forms result', forms );
        for (var i = 0; i < order.length; i++) {
            if (forms[order[i]] === the.word) {
                return order[i];
            }
        }
    })()
    

    When I do

    console.log( nlp.verb('last') );
    

    it will conjugate

    and when I do

    console.log( nlp.verb('last').conjugate() );
    

    it will conjugate twice

    opened by redaktor 27
  • new match2 plugin

    new match2 plugin

    Hello,

    I'm building my own match function, probably add it as an extension because it requires a dependency which will increase the code size.

    I have so far figured out that I can use termList to get all the terms to match against and now I'm trying to save the groups. How do I set the groups for a matched term, I need to set multiple by name, number, and the total match as group 0? The code for the match function is a bit hard to follow.

    Also turns out using a subset of termList with buildFrom to build a doc doesn't actually work as expected, it fails when I run text() on the new document.

    yesss 
    opened by kelvinhammond 23
  • Changes in the fork and the pull request ...

    Changes in the fork and the pull request ...

    Hey,

    just committed nearly the last changes to the fork https://github.com/redaktor/nlp_compromise before I could do a pull request.

    I need to eliminate:
    • the 'hardcoded' dups in lexicon generation
    • the last 37/1360(?) tests failing

    The lexicon will be at least 10% smaller then and I really think starting with this structure language dependent contributing can become easy. Just because I saw you were recently active ...

    opened by redaktor 22
  • Add support for HTML

    Add support for HTML

    Hi, It would be useful to be able to use it on HTML pages. The simplest solution would be to teach parser to recognize and skip HTML tags. A more sophisticated solution would be able to recognize block/inline tags and add sentence splits or ignore the tag. This might fail dramatically depending on CSS though.

    hmmm enhancement 
    opened by ershov 18
  • Support named capture groups

    Support named capture groups

    Based on previous discussions. Looking at support that better involved Doc and Phrase, and returns Doc from basic function. Currently calling functions .named() to avoid overlap between Phrase function for easy access vs Phrase object. Seemed cleaner than putting everything into the Doc function.

    • [x] Smaller code
    • [ ] Performance review
    • [x] Merge overlapping named groups into a single document
    • [x] Support multiple groups per phrase
      • [x] Add support
      • [x] Make sure we're always giving terms a group name (or find a better way to do this)
    • [x] Add syntax support for string based matching
      • [x] Choose string syntax
        • [<name>#Foo]
        • [<name> #Foo]
        • [?<name> #Foo]
        • [?<name>#Foo] - No space to match JS proposal ?<name>#Foo
        • ?<name>[#Foo] - Clarity that name effects whole group
        • ?<name>#Foo - Make this entirely separate from capture groups
    • [x] Add function for returning named groups as object
    • [x] Phrase.prototype.names - object containing meta regarding named groups
    • [x] Phrase.named() - returned named phrases uses .names object
    • [x] Phrase.named(target?: string) - returned target named phrases uses .names object
    • [x] Phrase.named(target?: number) - returned target named phrases uses .names object
    • [x] Doc.named() - create document from named phrases
    • [x] Doc.named(target?: string) - create document from target named phrases
    • [x] Doc.named(target?: number) - create document from target named phrases
    • [x] Store named capture group data in Phrase during matchAll
    • [x] Persist .name across Phrases

    Where else does support need to be added?

    Support in tags object syntax checks if capture is a string, instead of boolean this allows the rest of capture related logic to work as normal. Should we create a whole new value instead? i.e. name

    opened by Drache93 17
  • Consistent interface

    Consistent interface

    Just wanted to get your feedback on API consistency. I think we could do a lot more to make the API consistent and intuitive to use. Some examples:

    • to_past, to_present, to_future functions return a mutated wrapped instance but americanize and britishize return just the changed text. Is that by design?
    • Sentence.to_past returns the mutated Sentence instance but same doesn't apply for Text.to_past. (I don't know if there's any way to fix that other than extending Array.prototype)
    • Unlike other wrappers, (new NLP.term(...)).text is a string property rather than a function.

    Also, I think it might make things a lot less error-prone if we can make the entire library immutable (that'd break backwards compatibility, maybe offer an immutability helper?)

    kinda-big yesss 
    opened by creatorrr 17
  • Changes to nlp clone still affects original nlp

    Changes to nlp clone still affects original nlp

    Hi, I'm trying to use multiple instances of nlp, each with different plugins, via nlp.clone().extend(). However, since the world variable is global, any changes affect all nlp instances.

    This can be seen by updating the following test and adding a copy of the first check to the end.

    https://github.com/spencermountain/compromise/blob/51f1f158e6fcab14cdbf14f182b0c41f94e513e7/tests/plugin/addWords.test.js#L6-L8

    test('persistent-lexicon-change', function(t) {
      let nlp2 = nlp.clone()
      let doc = nlp('he is marko')
      t.equal(doc.match('#Place+').length, 0, 'default-no-place')
      t.equal(doc.match('#Person+').length, 1, 'default-one-person')
    
      nlp2.extend((Doc, world) => {
        world.addWords({
          marko: 'Place',
        })
      })
      doc = nlp2('he is marko')
      t.equal(doc.match('#Place+').length, 1, 'now-one-place')
      t.equal(doc.match('#Person+').length, 0, 'now-no-person')
    
     // ....
    
      // Tests fail here - original nlp should not be affected?
      doc = nlp('he is marko')
      t.equal(doc.match('#Place+').length, 0, 'default-no-place')
      t.equal(doc.match('#Person+').length, 1, 'default-one-person')
    
      t.end()
    })
    

    Am I misunderstanding the intention behind the clone() function? Thanks in advance!

    hmmm Discussion 
    opened by Drache93 16
  • [WIP] Add fractions support when parsing value

    [WIP] Add fractions support when parsing value

    It seems to work. Not quite sure how mixed fractions work but they do. The code to parse numbers is kind of hard to understand. Maybe use pegjs? Still have to write tests.

    opened by michaelmesser 14
  • date.parse() in nlp_core

    date.parse() in nlp_core

    I tried all the readme examples, and I got very mixed results. Did something in the library break or am I doing something wrong?

    nlp.text('She sells seashells').to_past()
    // An array containing the object shown below. Object contains text from original string.
    


    nlp.noun("dinosaur").pluralize();
    // This works
    
    nlp.verb("speak").conjugate();
    // This works
    
    nlp.text('She sells seashells').negate()
    // Same issue as above
    
    nlp.sentence('I fed the dog').replace('the [Noun]', 'the cat')
    // Same issue as above
    
    nlp.text("Tony Hawk did a kickflip").people();
    // This works
    
    nlp.person("Tony Hawk").article();
    // "a" instead of "he" 
    
    nlp.value("five hundred and sixty").number;
    // This works
    

    I'm using Chrome Version 49.0.2623.112 (64-bit) on OS X I got the same results in Node.js 5.10.1

    kinda-big 
    opened by prashcr 14
  • Fractions toNumber

    Fractions toNumber

    Hey @spencermountain,

    I started working on this and noticed way too late that you'd already done some similar improvements, when I pulled dev and saw your new fraction tests failing....

    Mine does add support for converting text fractions to numbers and super long weird fractions

    > nlp('two hundred and twelve and one twentieths').values().toNumber().all().out()
    '212.05'
    

    shall I continue to work on this? or maybe you can incorporate into yours.

    opened by Jakeii 13
  • Possible typo on a city name

    Possible typo on a city name

    https://github.com/spencermountain/compromise/blob/e4b2c8d7e5409c112f58085d8b41f225470de700/data/lexicon/places/cities.js#L493

    You may be meaning Reims instead of Reimes? I couldn't find anything about a city named Reimes in Europe.

    bug fixed-on-dev 
    opened by purplnay 1
  • How to fix the error for tokenizing (getting lemmatized version of sentence) in German?

    How to fix the error for tokenizing (getting lemmatized version of sentence) in German?

    Hi! Thank you for your cool library! Getting the error for the code:

    import nlp from "compromise";

    // Set the language to German
    nlp.plugin(nlp.german);

    // Define a function for lemmatization
    const lemmatize = (sentence) => {
      // Tokenize the sentence into individual words
      const tokens = nlp.text(sentence).tokens();
      // Lemmatize each word and return the result as an array
      return tokens.map((token) => token.lemma());
    };

    // Test the function with a German sentence
    const result = lemmatize("Ich bin der Hund, der bellt.");
    console.log(result); // Output: ["ich", "sein", "der", "Hund", ",", "der", "bellen", "."]

    What am I missing?

    opened by lofti198 1
  • Inconsistent tagging of #TextValue,  #Date and #Duration

    Inconsistent tagging of #TextValue, #Date and #Duration

    In below similar text examples, the POS tags for six is often #Value and #TextValue, but not always and month(s) is #Date and #Duration but not always.

    const nlp = require('compromise')
    
    const text = [
      'a six month session',
      'I have agreed a six month session',
      'six months',
      `A six-month session`,
      'I have agreed a six-month session',
    ]
    
    text.forEach((t) => nlp(t).debug())
    

    Outputs:

      ┌─────────
      │ 'a'        - Determiner
      │ 'six'      - Value, TextValue, Cardinal
      │ 'month'    - Date, Noun, Duration
      │ 'session'  - Noun, Singular
    
    
    
      ┌─────────
      │ 'six'      - Value, TextValue, Cardinal
      │ 'months'   - Noun, Plural
    
    
    
      ┌─────────
      │ 'A'        - Determiner
      │ 'six'      - Value, TextValue, Cardinal
      │ 'month'    - Date, Noun, Duration
      │ 'session'  - Noun, Singular
    
    
    
      ┌─────────
      │ 'I'        - Noun, Pronoun
      │ 'have'     - Verb, Auxiliary
      │ 'agreed'   - Verb, PastTense
      │ 'a'        - Determiner
      │ 'six'      - Noun, Singular
      │ 'month'    - Date, Noun, Duration
      │ 'session'  - Noun, Singular
    
    
    
    
      ┌─────────
      │ 'six'      - Value, TextValue, Cardinal
      │ 'months'   - Noun, Plural
    
    
    
      ┌─────────
      │ 'A'        - Determiner
      │ 'six'      - Value, TextValue, Cardinal
      │ 'month'    - Date, Noun, Duration
      │ 'session'  - Noun, Singular
    
    
    
      ┌─────────
      │ 'I'        - Noun, Pronoun
      │ 'have'     - Verb, Auxiliary
      │ 'agreed'   - Verb, PastTense
      │ 'a'        - Determiner
      │ 'six'      - Noun, Singular
      │ 'month'    - Date, Noun, Duration
      │ 'session'  - Noun, Singular
    

    It seems like these should have a more consistent tagging output? I can take a look at improving this in a PR if you give me some pointers on where to configure?

    quick fix tagger 
    opened by thegoatherder 2
  • Equivalent to nltk.corpus stopwords

    Equivalent to nltk.corpus stopwords

    Hi, I'm just learning about the project and it's pretty amazing. I tinkered with NTLK and Gensim before but this is so convenient to explore and embed on a page. Learning with Observable notebooks is also great!

    That being said, I end up with a lot of noise in my selection. I tried a bit of normalize() and remove() with encouraging results. Still, I'm quite surprised that when I search in this repository I don't seem to find stop words.

    This made me wonder, is this the "wrong" way in this context? Is the philosophy of compromise not to rely on such lists?

    PS: I apologize for hijacking issues, but is there a forum/chat/platform for discussions on using compromise that would be a better place? I have other questions, like using .tfidf() on .ngrams(), but I don't want to create noise here.

    Discussion 
    opened by Utopiah 1
  • Match-case in syllable response

    Match-case in syllable response

    currently syllable results are normalized and lowercased. Often people want to hyphenize text inline, and this involves a weird mapping backward. I'm not sure what the best way to do this would be.

    let doc = nlp("Calgary Flames")
    doc.compute('syllables')
    doc.json({syllables:true})
    //["cal", "ga", "ry"]
    doc.text('syllables') // ?
    

    maybe something like .out('syllables')? Maybe it's enough to preserve case, and let users drop it if they want. I dunno!

    hmmm enhancement 
    opened by spencermountain 0
  • non-english parentheses tokenization

    non-english parentheses tokenization

    [email protected]

    Noticed the lib is removing some symbols unexpectedly when running the json methods.


    The same issue happens with foo[bar] or foo{bar}.

    Expected behaviour:

    nlp('foo{bar} foo').json() =>
    [Object {
      text: "foo(bar) foo"
      terms: Array(2) [
        0: Object {text: "foo(bar)", tags: Array(2), pre: "", post: " "}
        1: Object {text: "foo", tags: Array(2), pre: "", post: ""}
      ]
    }]
    
    bug hmmm 
    opened by kant01ne 2
Releases(14.8.1)
  • 14.8.1(Dec 13, 2022)

  • 14.8.0(Nov 25, 2022)

    • [fix] - tagging fixes
    • [new] - add Person .presumedMale(), .presumedFemale() methods
    • [new] - add Pronoun class, .refersTo()
    • [new] - add Noun.references()
    • [new] - .nouns('spencer') shorthand as an if-match
    • [change] - "[do] you .." etc now #QuestionWord
    • [new] - add #Hyphenated tag
    • [fix] - improved Auxiliary verb tagging
    • [update] - dependencies
    Source code(tar.gz)
    Source code(zip)
  • 14.7.1(Nov 11, 2022)

  • 14.7.0(Nov 4, 2022)

    • [new] - match term id
    • [change] - tag text by default on .concat('')
    • [change] - allow modifying term prePunctuation
    • [new] - .wrap() method
    • [new] - .isFull() method
    • [new] - support full notIf matches on sweep
    • [fix] - text params for #953
    • [fix] - nouns().isSingular() missing
    • [change] - one-character w/ dash tokenization #977
    • [change] - allow setting model.one.prePunctuation + postPunctuation
    • [fix] - compromise-paragraphs plugin
    Source code(tar.gz)
    Source code(zip)
  • 14.6.0(Oct 21, 2022)

    • [change] - move internal conjugation methods
    • [update] - github scripts
    • [change] - fixes to .clauses() parser
    • [change] - an asterisk is not a word
    • [new] - @hasColon method
    • [new] - @hasDash supports two dashes
    • [new] - #Passive verb tag
    • [new] - existential #There tag
    • [new] - add tense info to sentence json
    • [fix] - verb tokenization issues
    • [fix] - .replace() issues
    • [update] - dependencies
    Source code(tar.gz)
    Source code(zip)
  • 14.5.2(Oct 16, 2022)

  • 14.5.1(Oct 12, 2022)

  • 14.5.0(Aug 26, 2022)

    • [fix] - possible runtime error in setTag method
    • [change] - make #Honorific always a #Person #951
    • [new] - manually change conjugations/inflections from plugin #949
    • [new] - .adjectives().conjugate() method
    • [update] - dependencies
    Source code(tar.gz)
    Source code(zip)
  • 14.4.5(Aug 10, 2022)

  • 14.4.4(Aug 3, 2022)

  • 14.4.3(Aug 2, 2022)

  • 14.4.2(Jul 29, 2022)

  • 14.4.1(Jul 27, 2022)

    • [change] - improvements to negative-optional match logic - !foo?
    • [change] - support short sentences embedded in quotes+parentheses
    • [change] - faster sentence tokenizer
    • [change] - ° symbol is not punctuation
    • [new] - implement .swap() for comparative/superlative adjectives
    • [fix] - sentence.toFuture() conjugation rules
    • [update] - dependencies
    Source code(tar.gz)
    Source code(zip)
  • 14.4.0(Jul 2, 2022)

    • [change] - support root matches like '{walk}' work without doing .compute('root')
    • [change] - split numbers+units '12km' as contraction - #919
    • [new] - .lazy(txt, match) fast-scan method
    • [fix] - support apostrophes in lexicon #932
    • [fix] - support unTag property in sweep
    • [change] - keep sentence caches, when still valid
    • [change] - alias nlp.compile() to .buildTrie()
    • [fix] - tagging fixes
    • [update] - dependencies plugin-releases: dates, speed, de-compromise
    Source code(tar.gz)
    Source code(zip)
  • 14.3.1(Jun 15, 2022)

  • 14.3.0(Jun 8, 2022)

    • [fix] - unwanted logging in compromise/one
    • [fix] - dependency export path for react-native builds #928
    • [change] - split hyphenated words in match syntax 'foo-bar'
    • [change] - support 4-digit number-ranges (when not a phone number) plugin-releases: dates
    Source code(tar.gz)
    Source code(zip)
  • 14.2.1(Jun 3, 2022)

  • 14.2.0(Jun 1, 2022)

    • [fix] - speed improvements
    • [fix] - bug with fast-or possessive matches
    • [fix] - bug with slow-or end-matches
    • [change] - no-longer attempt 's contractions in compromise/one
    • [new] - flag novel tags in world.one.tagSet
    • [new] - .sweep() and nlp.buildNet() methods
    • [new] - some typescript support in plugins #918
    • [fix] - better unicode support with Unicode property escapes
    • [fix] - problems matching on cached documents
    • [fix] - typescript fixes
    • [fix] - suffix tagging issues
    • [fix] - uncached matches missing in .sweep()
    • [fix] - non-empty results when pointer is first repaired
    • [fix] - nouns().toPlural() fix for #921
    • [fix] - drop deprecated .subst() method internally
    • [new] - some support for .numbers().units() again #919
    Source code(tar.gz)
    Source code(zip)
  • 14.1.2(Apr 27, 2022)

    • [new] - add .harden() .soften() undocumented methods
    • [fix] - support pre-parsed matches in .has() .if() and .not()
    • [fix] - contraction OR match issue
    • [fix] - match-syntax min-max issue
    • [fix] - normalized printout of abbreviations
    • [update] - date plugin release
    • [update] - dependencies
    Source code(tar.gz)
    Source code(zip)
  • 14.1.1(Apr 15, 2022)

  • 14.1.0(Apr 12, 2022)

    • [fix] - client-side export format for plugins
    • [new] - more adjective transformation methods
    • [new] - emoji + emoticon tagger
    • [new] - case-sensitive match option - {caseSensitive:true}
    Source code(tar.gz)
    Source code(zip)
  • 14.0.0(Mar 22, 2022)

    Compromise is a javascript library that can do natural-language-processing tasks in the browser.

    v14 is a big release, a proud re-write. It took a lot of work. Thank you to the many individuals that have helped create it.

    Speed:

    v14 is much faster. Usually 2x faster. You should be able to parse twice as many documents, in the same time.

    Size:

v14 has been split into 3 libraries, so you can choose how much of the library you'd like to use. this is possible by switching to esmodules.

    import nlp from 'compromise/one' // 68kb
    import nlp from 'compromise/two' // 225kb
    import nlp from 'compromise/three' // 275kb
    

    In v14, we are dropping support for IE11 and Node <12.

    Self-repairing pointers:

    we've finally found a quick way to support dynamic pointers to changing word data:

    let doc = nlp('the dog is nice')
    let sub = doc.match('is')
    doc.match('dog').insertBefore('brown')
    console.log(sub.text())
    // 'is'
    

    This works by using a fast-mode index lookup, with id-based error-correction.

    Included plugins:

    compromise-penn-tags, compromise-plugin-scan, and compromise-plugin-typeahead are now included in /one by default, which is great news.

    compromise-plugin-numbers and compromise-plugin-adjectives are included by default in /three

    New languages:

    We now support early versions of French, Spanish, and German

    Measured tagging suggestion:

    a user-given lexicon is less coercive, so adding your own words is less dangerous:

    nlp('Dan Brown', { brown: 'Color' }).has('#Color') //false
    

    Replace wildcards:

    let doc = nlp('i am george and i live in France.')
    doc.replace('i am [#Person+] and i live in [.]', '$0 is from $1')
    doc.text()
    // 'george is from France'
    

    Root-replace:

    .swap() is a way to replace via a root-word, where inflections are automatically handled:

    let doc = nlp('i strolled downtown').compute('root')
    doc.swap('stroll', 'walk')
    doc.text()
    // 'i walked downtown'
    

    New plugin scheme:

    We finally have a .plugin() scheme strong-enough to use internally. v14 is completely constructed via .plugin(). See the plugin documentation for details.

    New plugins:

    See the plugin documentation, as well as our existing compromise-speech and compromise-dates functionality.


    Changelog

    • [breaking] - remove .parent() and .parents() chain - (use .all() instead)
    • [breaking] - remove @titleCase alias (use @isTitleCase)
    • [breaking] - remove '.get()' alias - use '.eq()'
    • [breaking] - remove .json(0) shorthand - use .json()[0]
    • [breaking] - remove .tagger() - use .compute('tagger')
    • [breaking] - remove .export() -> .load() - use .json() -> nlp(json)
    • [breaking] - remove nlp.clone()
    • [breaking] - remove .join() deprecated
    • [breaking] - remove .lists() deprecated
    • [breaking] - remove .segment() deprecated
    • [breaking] - remove .sentences().toParticiple() & .verbs().toParticiple()
    • [breaking] - remove .nouns().toPossessive() & .nouns().hasPlural()
    • [breaking] - remove array support in match methods - (use .match().match() instead)
    • [breaking] - refactor .out('freq') output format - (use .compute('freq').terms().unique().json() instead)
    • [breaking] - change .json() result format for subsets
    • [change] merge re-used capture-group names in one match
    • [change] drop support for undocumented empty '.split()' methods - which used to split the parent
    • [change] subtle changes to .text('fmt') formats
    • [change] @hasContraction is no-longer secretly-greedy. use @hasContraction{2}
    • [change] .and() now does a set 'union' operation of results (no overlaps)
    • [change] bestTag is now .compute('tagRank')
    • [change] .sort() is no longer in-place (it's now immutable)
    • [change] drop undocumented options param to .replaceWith() method
    • [change] add match-group as 2nd param to split methods
    • [change] remove #FutureTense tag - which is not really a thing in English
    • [change] .unique() no-longer mutates parent
    • [change] .normalize() inputs cleanup
    • [change] drop agreement parameters in .numbers() methods
    • [change] - less-magical money parsing - nlp('50 cents').money().get() is no-longer 0.5
    • [change] - .find() does not return undefined on an empty result anymore
    • [change] - fuzzy matches must now be wrapped in tildes, like ~this~
    • [new] .union(), .intersection(), .difference() and .complement() methods
    • [new] .confidence() method - approximate tagging confidence score for arbitrary selections
    • [new] .settle() - remove overlaps in matches
    • [new] .isDoc() - helper-method for comparing two views
    • [new] .none() - helper-method for returning an empty view of the document
    • [new] .toView() method - drop back to a normal Class instance
    • [new] .grow() .growLeft() and .growRight() methods
    • [new] add punctuation match support via pre/post params
    • [new] add ambiguous empty .map() state as 2nd param
  • 13.11.4(Sep 20, 2021)

  • 13.11.3(Jun 21, 2021)

  • 13.11.2(May 5, 2021)

    • [fix] - verbphrase conjugation fixes
    • [fix] - verbphrase tagger fixes
    • [fix] - url tagging regex improvements (thanks Axay!)
    • [update] - dependencies
    • plugin releases: dates
  • 13.11.1(Apr 18, 2021)

  • 13.11.0(Apr 15, 2021)

    • [change] - use babel default build target (drop ie11 polyfill)
    • [change] - don't compile esm build w/ babel anymore
    • [fix] - sentence conjugation fixes
    • [fix] - improvements to phrasal verbs
    • [change] - keep tokenization for some more dashed suffixes like 'snail-like'
    • plugin releases: dates, numbers, sentences
  • 13.10.6(Apr 6, 2021)

  • 13.10.5(Mar 29, 2021)

  • 13.10.4(Mar 19, 2021)

Owner
spencer kelly
freelance javascripter, believer in the internet