general natural language facilities for node

Overview

natural


"Natural" is a general natural language facility for Node.js. It offers a broad range of natural language processing functionality. Documentation is available on GitHub Pages.

License

Copyright (c) 2011, 2012 Chris Umbel, Rob Ellis, Russell Mull

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

WordNet License

This license is available as the file LICENSE in any downloaded version of WordNet. WordNet 3.0 license:

WordNet Release 3.0 This software and database is being provided to you, the LICENSEE, by Princeton University under the following license. By obtaining, using and/or copying this software and database, you agree that you have read, understood, and will comply with these terms and conditions.: Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution. WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved. THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same.

Comments
  • Any plans to support pure JavaScript on the client side?

    I haven't looked deeply enough to know whether the current version can be used client-side as is, but given the training data it needs, I suspect not.

    Is there any plan to make it usable in the browser rather than only through Node?

    Thanks for your efforts. This will be a useful resource in the future.

    Feature Request Browser 
    opened by amitamb 27
  • PoS Tagger for Brazilian Portuguese

    Hi there, I'm using this awesome library in a chatbot project called hubot-natural, and I'm having trouble using the PoS tagger to recognize Brazilian Portuguese. Is there any chance of getting the Brill PoS tagger translated, or is that more complex than just translating? If it helps with funding, I'm willing to pay for its translation.

    Thanks in advance.

    opened by diegodorgam 18
  • log4js causing issues with client-side usage

    I'm trying to use natural on the client side, and I'm getting errors during Webpack compilation like the following:

    Error in ./~/log4js/lib/appenders/clustered.js
    Module not found: 'cluster' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders
    
     @ ./~/log4js/lib/appenders/clustered.js 3:14-32
    
    Error in ./~/log4js/lib/appenders/gelf.js
    Module not found: 'dgram' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders
    
     @ ./~/log4js/lib/appenders/gelf.js 5:12-28
    
    Error in ./~/log4js/lib/appenders/hipchat.js
    Module not found: 'hipchat-notifier' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders
    
    ... and many more like the above
    

    The problem

    This is caused by a dependency on log4js, whose appenders contain many require statements that a browser bundle cannot satisfy: some target Node.js built-in modules, and others target packages that are not declared in log4js's own package.json.

    An example:

    // node_modules/log4js/lib/appenders/clustered.js
    var cluster = require('cluster');
    

    cluster (like dgram) is a Node.js built-in module, so it never appears in log4js's package.json and cannot be resolved in a browser bundle; hipchat-notifier, on the other hand, is a package that is not installed alongside log4js. As a result, when Webpack compiles my app, it throws an error for each of these unresolvable requires.

    The bottom line is: log4js is the problem.

    The solution

    After looking into this, I found that log4js is only used within the Brill POS Tagger module. I recommend that we move away from log4js completely and replace those logging statements with plain console.log or console.warn calls.

    I have already attempted to do this, and can confirm that this will indeed solve the problem.

    I will submit a PR if the community agrees it is beneficial to the project.

    opened by adrianmcli 18
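
A minimal sketch of the proposed replacement, assuming the tagger only needs leveled debug/info/warn/error calls: a dependency-free logger backed by console, so nothing Node-specific ends up in a browser bundle. createLogger, its level list and the injectable sink are hypothetical and are not part of natural or log4js.

```javascript
// Hypothetical console-backed logger to stand in for log4js.
// Levels are ordered; calls below the configured level are dropped.
const LEVELS = ['debug', 'info', 'warn', 'error'];

function createLogger(name, level = 'warn', sink = console) {
  const threshold = LEVELS.indexOf(level);
  if (threshold === -1) throw new Error(`Unknown log level: ${level}`);
  const logger = {};
  for (const [index, method] of LEVELS.entries()) {
    logger[method] = (...args) => {
      if (index < threshold) return; // below the configured level
      // console.debug/info/warn/error exist in both Node.js and browsers.
      sink[method](`[${name}]`, ...args);
    };
  }
  return logger;
}

// Usage: with level 'warn', the debug call is dropped and the warning is printed.
const log = createLogger('BrillPOSTagger', 'warn');
log.debug('loading lexicon');
log.warn('unknown tag, falling back to default');
```

Injecting the sink keeps the logger testable and lets an application silence or redirect output without touching the tagger.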
  • Make webworker-threads a peer-dependency

    I'm running natural in Electron. Natural crashes when imported in Electron because it tries to use webworker-threads, which only works in Node. It works fine when I delete the webworker-threads folder from node_modules, but this is tedious. As far as I know, there is no way to specify in my package.json that natural's optional webworker-threads dependency should not be installed (this seems to be possible only on the command line, with --no-optional).

    One solution could be to make webworker-threads a peer dependency instead of an optional dependency in natural's package.json. Users could then choose to install webworker-threads if they want it.

    Electron also has built-in web workers, so it would be nice if natural could use them. For now, though, simply not using webworker-threads at all would be low-hanging fruit.

    Solution Provided Improvement 
    opened by OoDeLally 16
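
Independently of how the dependency is declared, a library can guard the import itself. A minimal sketch, assuming the only requirement is that natural keeps working without webworker-threads: try to load the optional module and fall back to the main thread if loading throws for any reason (for example, the package is not installed, or its native build cannot load in the current runtime). loadOptional is a hypothetical helper, and the sketch does not use the webworker-threads API itself.

```javascript
// Load an optional dependency; return null if it is missing or fails to load.
// Catching every error (not just MODULE_NOT_FOUND) also covers native modules
// that are installed but cannot be loaded in the current runtime.
function loadOptional(moduleName) {
  try {
    return require(moduleName);
  } catch (err) {
    return null;
  }
}

const threads = loadOptional('webworker-threads');
const mode = threads ? 'webworker-threads' : 'main thread';
console.log(`background work runs on: ${mode}`);
```

Catching every error is a deliberate choice here: a missing package and a binding that fails to load should both degrade to the same fallback.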
  • Error while natural = require('natural')

    Hi

    I get an error when I require natural with var natural = require('natural');

    Error: TypeError: word.trim is not a function at loadDictionary (\node_modules\natural\lib\natural\stemmers\indonesian\stemmer_id.js:136:21)

    In file stemmer_id.js:

    function loadDictionary() {
      var fs = require('fs');
      var dirname = __dirname + "/../../../../data/kata-dasar.txt";
      var fin = fs.readFileSync(dirname).toString().split("\n");
      for (var i in fin) {
        var word = fin[i];
        word = word.trim();
        dictionary.push(word);
      }
    }

    The error is raised at word = word.trim(); the proposed correction is to convert non-string entries first: if (typeof word !== 'string') { word = word.toString(); }

    So that the function becomes

    function loadDictionary() {
      var fs = require('fs');
      var dirname = __dirname + "/../../../../data/kata-dasar.txt";
      var fin = fs.readFileSync(dirname).toString().split("\n");
      for (var i in fin) {
        var word = fin[i];
        if (typeof word !== 'string') {
          word = word.toString();
        }
        word = word.trim();
        dictionary.push(word);
      }
    }

    There is a similar issue in prefix_rules.js at line 36.

    Help/Questions 
    opened by praveshbalhara 15
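
One plausible cause, offered as an assumption rather than a confirmed diagnosis: for...in enumerates every enumerable property, including properties another package has added to Array.prototype, so fin[i] can be a function rather than a line of the file. Iterating with for...of (or a plain index loop) visits only the array's elements. The sketch simulates such a prototype extension with a hypothetical method name, customHelper.

```javascript
// Simulate a third-party package extending Array.prototype (hypothetical name).
Array.prototype.customHelper = function () {};

const lines = 'makan\nminum\n'.split('\n');

// for...in also yields the inherited enumerable key 'customHelper',
// so lines[key] is a function there and calling .trim() on it would throw.
const seenByForIn = [];
for (const key in lines) seenByForIn.push(typeof lines[key]);

// for...of iterates only the array elements.
const dictionary = [];
for (const line of lines) {
  const word = line.trim();
  if (word !== '') dictionary.push(word); // skip the empty trailing line
}

delete Array.prototype.customHelper; // undo the simulation

console.log(seenByForIn); // [ 'string', 'string', 'string', 'function' ]
console.log(dictionary); // [ 'makan', 'minum' ]
```

If this is the cause, converting with toString() avoids the crash but pushes the function's source text into the dictionary, whereas switching the loop removes the stray entry altogether.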
  • Have to check array with longer suffixes first.

    https://github.com/NaturalNode/natural/blob/8071055f103f2d9fead395c5b823e0b247fb13a7/lib/natural/stemmers/porter_stemmer_es.js#L196

    The array above this one (with the longer suffixes) has to be checked first, and only then this one: Array('en', 'es', 'éis', 'emos')

    Solution Provided Improvement 
    opened by dmarman 15
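
The underlying rule is longest-match ordering: if a short suffix is tried before a longer suffix that ends with it, the longer one never gets a chance. A minimal sketch that sorts candidate suffixes by length before testing them. It is simplified (it ignores the RV/R1/R2 regions a Porter-style stemmer checks), and 'aremos' is used only as an illustrative longer suffix ending in 'emos', not as a claim about the contents of the array above line 196.

```javascript
// Strip the longest matching suffix (simplified: no RV/R1/R2 region checks).
function stripLongestSuffix(word, suffixes) {
  // Sort a copy by descending length so 'aremos' is tried before 'emos'.
  const ordered = [...suffixes].sort((a, b) => b.length - a.length);
  for (const suffix of ordered) {
    if (word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Naive first-match version for comparison: the result depends on array order.
function stripFirstSuffix(word, suffixes) {
  const suffix = suffixes.find((s) => word.endsWith(s));
  return suffix ? word.slice(0, -suffix.length) : word;
}

const suffixes = ['en', 'es', 'éis', 'emos', 'aremos'];
console.log(stripFirstSuffix('hablaremos', suffixes)); // → hablar
console.log(stripLongestSuffix('hablaremos', suffixes)); // → habl
```

Sorting once per call is fine for a sketch; a stemmer would keep its suffix lists pre-sorted, which is what checking the longer-suffix array first achieves.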
  • Wordnet DB

    I read in the README that the WordNet database is no longer downloaded automatically, and I also see that there is no dependency on WNdb.

    May I suggest my branch of WNdb? It is smaller because it does not keep the WordNet archive in the repository; instead, it downloads the files directly from Princeton's servers.

    Feature Request Request For Feedback 
    opened by yanickrochon 15
  • Logistic Regression Classifier throws unable to find minimum exception

    The LRC throws this exception for a reason unknown to me. The same code worked yesterday, but today it crashes when calling:

    classifier.train();
    

    Removing a random combination of addDocument() calls seems to help, but the problem does not appear to come from one particular bad document.

    For example:

    classifier.addDocument('London', 'London');
    classifier.addDocument('NewYork', 'NewYork');
    classifier.addDocument('Toronto', 'Toronto');
    

    fails, but it works if I remove London. It also works if I keep London and remove Toronto.

    In total, I add about 100 documents to the training set.

    Any suggestions what went wrong?

    Bug 
    opened by waglik 14
  • Jaro-Winkler Distance Algorithm Match Value Question

    Firstly, a massive thank-you for putting this library together. It's been really helpful implementing some text similarity behaviour in an application I'm currently writing.

    Secondly, I wanted to ask about the match value returned by the JaroWinklerDistance algorithm. From the Wikipedia article, I got the clear impression that comparing two identical strings should return 1. I'm finding, however, that this isn't always the case.

    In my application, I use this functionality to rank place names by how strongly they match an original request. For sydney, this works as expected:

    > natural.JaroWinklerDistance('sydney', 'sydney');
    1
    

    However, when I did a comparison for seddon against seddon, it returns less than 1:

    > natural.JaroWinklerDistance('seddon', 'seddon');
    0.8933333333333334
    

    I went on to do some other smaller tests in the node console to see if I could have the function produce a higher score for two different strings than two that were exactly the same, as realistically, this is the case that I'm worried might occur. After a little bit of playing around I found that I could:

    > natural.JaroWinklerDistance('abc', 'abc');
    0.8666666666666666
    > natural.JaroWinklerDistance('abcd', 'abcd');
    1
    > natural.JaroWinklerDistance('abcd', 'abc');
    0.9416666666666667
    

    Once I grok the algorithm, I'll have a look at forking the code and seeing if I can work out where this is happening, but I thought I'd raise an issue first to see if it was something that someone else knew how to fix quickly and simply.

    Thanks again for your efforts on the library.

    Cheers, Damon.

    opened by DamonOehlman 13
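
For comparison, a sketch of the textbook Jaro-Winkler similarity (not natural's implementation): two characters match when they are equal and at most floor(max(|s1|, |s2|) / 2) - 1 positions apart, out-of-order matches count as half-transpositions, and the Winkler step adds prefix · p · (1 - jaro) for a common prefix of up to 4 characters, with p = 0.1. A string compared with itself has jaro = 1, so the boost adds nothing and the score is exactly 1, which is the behaviour the question expects. The function names are illustrative.

```javascript
// Jaro similarity: share of matching characters, penalized for transpositions.
function jaro(s1, s2) {
  const len1 = s1.length;
  const len2 = s2.length;
  if (len1 === 0 || len2 === 0) return len1 === len2 ? 1 : 0;

  // Equal characters only count as a match within this distance.
  const matchWindow = Math.max(0, Math.floor(Math.max(len1, len2) / 2) - 1);
  const matched1 = new Array(len1).fill(false);
  const matched2 = new Array(len2).fill(false);
  let matches = 0;

  for (let i = 0; i < len1; i++) {
    const start = Math.max(0, i - matchWindow);
    const end = Math.min(len2, i + matchWindow + 1);
    for (let j = start; j < end; j++) {
      if (!matched2[j] && s1[i] === s2[j]) {
        matched1[i] = true;
        matched2[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both sets of matched characters in order; mismatches are half-transpositions.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < len1; i++) {
    if (!matched1[i]) continue;
    while (!matched2[k]) k++;
    if (s1[i] !== s2[k]) halfTranspositions++;
    k++;
  }
  const transpositions = halfTranspositions / 2;

  return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3;
}

// Winkler boost: reward a common prefix of up to 4 characters, scaled by p.
function jaroWinkler(s1, s2, p = 0.1) {
  const j = jaro(s1, s2);
  let prefix = 0;
  while (prefix < 4 && prefix < s1.length && prefix < s2.length && s1[prefix] === s2[prefix]) {
    prefix++;
  }
  return j + prefix * p * (1 - j);
}

console.log(jaroWinkler('seddon', 'seddon')); // 1
console.log(jaroWinkler('abc', 'abc')); // 1
```

With this formulation, a pair of different strings such as 'abcd' and 'abc' always scores below an identical pair, so identical names can be ranked first safely.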
  • LogisticRegressionClassifier: unhandled infinite loop

    If two classes contain exactly the same documents, the train() method blocks indefinitely. The following snippet reproduces the infinite loop:

    let natural = require('natural')
    
    var classifier = new natural.LogisticRegressionClassifier();
    
    classifier.addDocument('test', 'class1');
    classifier.addDocument('test', 'class2');
    
    classifier.train(); // infinite loop
    
    console.log('will not be logged')
    

    However, training works when three or more classes share the same document:

    let natural = require('natural')
    
    var classifier = new natural.LogisticRegressionClassifier();
    
    classifier.addDocument('test', 'class1');
    classifier.addDocument('test', 'class2');
    classifier.addDocument('test', 'class3');
    
    classifier.train();
    
    console.log(classifier.getClassifications('test'))
    
    Bug 
    opened by Finkes 10
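
Until train() handles this case, a caller can detect it up front. A minimal sketch, assuming training documents are first collected as { text, label } pairs and only then passed to addDocument(): it reports every text that has been assigned more than one label. findConflictingDocuments is a hypothetical helper, and it only catches exact duplicate strings, not documents that become identical after tokenization or stemming.

```javascript
// Report texts that were assigned to more than one class label.
function findConflictingDocuments(docs) {
  const labelsByText = new Map();
  for (const { text, label } of docs) {
    if (!labelsByText.has(text)) labelsByText.set(text, new Set());
    labelsByText.get(text).add(label);
  }
  const conflicts = [];
  for (const [text, labels] of labelsByText) {
    if (labels.size > 1) conflicts.push({ text, labels: [...labels] });
  }
  return conflicts;
}

const docs = [
  { text: 'test', label: 'class1' },
  { text: 'test', label: 'class2' },
  { text: 'other', label: 'class1' },
];
const conflicts = findConflictingDocuments(docs);
if (conflicts.length > 0) {
  console.warn('Conflicting training documents:', conflicts);
}
```

The check also flags the three-class case, which trains without hanging; whether to reject, merge or keep such documents is left to the caller.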
  • Release a new version

    The last release on NPM was 11 months ago by @chrisumbel. Since then, more than 10 pull requests have been merged into master. Isn't it about time that someone with the necessary permissions released a new version so we can use everything that has been merged?

    I don't see the point of my last PR being merged if no one is going to use it.

    cc @kkoch986 and @silentrob

    opened by adrianmcli 10
  • Bug report in SentenceTokenizerNew

    SentenceTokenizerNew fails on the following call:

    sentenceTokenizer.tokenize('"All ticketed passengers should now be in the Blue Concourse sleep lounge. Make sure your validation papers are in order. Thank you". The upstairs lounge was not at all grungy.') (quote from "The Jaunt" by Stephen King)

    with the following message:

    {
        "message": "Expected [ \\t\\n\\r.?!] or [)\\]}\"'`’] but \"M\" found.",
        "expected": [
            {
                "type": "class",
                "parts": [
                    " ",
                    "\t",
                    "\n",
                    "\r",
                    ".",
                    "?",
                    "!"
                ],
                "inverted": false,
                "ignoreCase": false
            },
            {
                "type": "class",
                "parts": [
                    ")",
                    "]",
                    "}",
                    "\"",
                    "'",
                    "`",
                    "’"
                ],
                "inverted": false,
                "ignoreCase": false
            }
        ],
        "found": "M",
        "location": {
            "start": {
                "offset": 75,
                "line": 1,
                "column": 76
            },
            "end": {
                "offset": 76,
                "line": 1,
                "column": 77
            }
        },
        "name": "SyntaxError"
    }
    
    Bug 
    opened by Jabher 3
  • NGrams doesn't support words with hyphens and slashes in English

    Some English words contain a hyphen or a slash, for example:

    • image-based
    • text-based
    • links/CTA

    It would be great if Natural could manage these cases.

    const natural = require('natural');
    const text = "links text-based opposed image-based links/CTA’s";
    const NGrams = natural.NGrams;
    const tokenizer = new natural.AggressiveTokenizer();
    NGrams.setTokenizer(tokenizer);
    console.log(NGrams.ngrams(text, 1));
    

    Output: [["links"], ["text"], ["based"], ["opposed"], ["image"], ["based"], ["links"], ["CTA"], ["s"]]

    opened by sam68740 0
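
One way to keep such compounds together is a tokenizer whose token pattern allows internal hyphens, slashes and apostrophes. A minimal sketch, assuming NGrams.setTokenizer accepts any object with a tokenize(text) method, as the AggressiveTokenizer usage in the report suggests (not verified against natural's source). compoundTokenizer and the local ngrams helper are hypothetical and run without natural installed.

```javascript
// Word characters, optionally joined by single hyphens, slashes or apostrophes
// (straight ' or typographic ’), so "image-based" and "links/CTA’s" stay whole.
const COMPOUND_TOKEN = /\w+(?:[-\/'’]\w+)*/g;

const compoundTokenizer = {
  tokenize(text) {
    return text.match(COMPOUND_TOKEN) || [];
  },
};

// Local n-gram helper so the sketch runs without natural installed.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const text = 'links text-based opposed image-based links/CTA’s';
console.log(ngrams(compoundTokenizer.tokenize(text), 1));
```

The trade-off is that a dangling hyphen, as in "pre- and post-processing", is not joined to anything, and slash-separated alternatives such as "and/or" become a single token.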
  • SentenceTokenizer doesn't split on the three-dots symbol

    SentenceTokenizer doesn't split on the ellipsis character (…). Here is an example:

    const { SentenceTokenizer } = require('natural')
    const content = `We’re heading for a catastrophic global temperature rise… Fires are blazing from the Amazon to the Arctic`
    const tokenizer = new SentenceTokenizer()
    tokenizer.tokenize(content) // returns one sentence, while two are expected
    
    opened by satyrius 0
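
For comparison, a minimal regex-based splitter (not natural's tokenizer) that treats the single-character ellipsis (U+2026) as a sentence terminator, allows closing quotes or brackets after the terminator, and requires the next sentence to start with an uppercase letter, optionally after an opening quote. It is deliberately naive: an abbreviation such as "Dr. Smith" would be split. splitSentences is a hypothetical name.

```javascript
// Split at whitespace that follows a terminator (. ? ! or …), optionally followed
// by closing quotes/brackets, and precedes an optional opening quote plus an uppercase letter.
const SENTENCE_BOUNDARY = /(?<=[.?!\u2026]["'”’)\]]*)\s+(?=["'“‘(\[]?\p{Lu})/u;

function splitSentences(text) {
  return text
    .split(SENTENCE_BOUNDARY)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

const content = 'We’re heading for a catastrophic global temperature rise… Fires are blazing from the Amazon to the Arctic';
console.log(splitSentences(content)); // → two sentences, split after the ellipsis
```

Because it never parses quotes as balanced pairs, the same splitter also does not throw on the unbalanced quotation in the "The Jaunt" example from the SentenceTokenizerNew report; it simply splits it into four pieces.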
  • Can we add "y'all" to the normalizer conversion table?

    In my app, I'm finding "y'all" to be a fairly common contraction that isn't handled in normalizer.js: https://github.com/NaturalNode/natural/blob/master/lib/natural/normalizers/normalizer.js#L34

    opened by MeanwhileMedia 3
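
A minimal sketch of a token-level contraction table with the requested entry, assuming contractions are expanded after tokenization. The table shape and normalizeTokens are hypothetical and not taken from normalizer.js. Typographic apostrophes are mapped to straight ones first, so "y’all" is expanded too.

```javascript
// Hypothetical contraction table: lowercase contraction -> replacement tokens.
const CONTRACTIONS = {
  "can't": ['can', 'not'],
  "won't": ['will', 'not'],
  "y'all": ['you', 'all'],
};

function normalizeTokens(tokens) {
  return tokens.flatMap((token) => {
    // Treat the typographic apostrophe (’) like the straight one (').
    const key = token.toLowerCase().replace(/’/g, "'");
    return CONTRACTIONS[key] || [token];
  });
}

console.log(normalizeTokens(["Y'all", 'can’t', 'stop']));
```

Expanded tokens come out lowercase; unmatched tokens keep their original casing.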
  • natural.Tfidf.listTerms works incorrectly for custom-generated tokens (those passed as an array to addDocument(...))

    natural.Tfidf.addDocument accepts either a string or an array of pre-tokenized texts. When a document is added using an array of tokens, listTerms still applies the tokenization to the individual document tokens when computing the tfidf score, resulting in a tfidf score of 0, even though the tf and idf scores are > 0.

    (natural version: ^5.1.11) An example:

    > var natural = require('natural')
    > var tfidf = new natural.TfIdf()
    > tfidf.listTerms(0)
    [
      {
        term: 'domain',
        tf: 1,
        idf: 0.3068528194400547,
        tfidf: 0.3068528194400547
      },
      { term: 'google.com', tf: 1, idf: 0.3068528194400547, tfidf: 0 }
    ]
    

    The second term ('google.com') should have a tfidf score of about 0.3069 (tf × idf = 1 × 0.3068…), but it is 0.

    The fix is simple: in listTerms(...), pass the term to tfidf as an array, i.e. change tfidf: _this.tfidf(term, d) to tfidf: _this.tfidf([term], d) (line 174 of https://github.com/NaturalNode/natural/blob/master/lib/natural/tfidf/tfidf.js).

    Thanks.

    opened by senatet 0
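
The effect can be reproduced in isolation. A minimal sketch, not natural's code: documents are stored as token arrays, a hypothetical tokenize helper splits on non-word characters, and idf is assumed to be 1 + ln(N / (1 + df)), which gives 1 + ln(1/2) = 0.3068528194400547 when N = 1 and df = 1, matching the idf values in the output shown in the report. Passing the stored term as a one-element array skips re-tokenization, which is what the proposed fix does.

```javascript
// Hypothetical tokenizer: lowercase, split on anything that is not a word character.
const tokenize = (text) => text.toLowerCase().match(/\w+/g) || [];

// Each document is stored as an array of tokens.
const documents = [['domain', 'google.com']];

function termFrequency(term, doc) {
  return doc.filter((token) => token === term).length;
}

// Assumed idf formula: 1 + ln(N / (1 + df)).
function inverseDocumentFrequency(term) {
  const df = documents.filter((doc) => doc.includes(term)).length;
  return 1 + Math.log(documents.length / (1 + df));
}

// Strings are tokenized first; arrays are treated as already-tokenized terms.
function tfidf(query, docIndex) {
  const terms = Array.isArray(query) ? query : tokenize(query);
  return terms.reduce(
    (sum, term) => sum + termFrequency(term, documents[docIndex]) * inverseDocumentFrequency(term),
    0
  );
}

console.log(tfidf('google.com', 0)); // 0: re-tokenized into 'google' and 'com'
console.log(tfidf(['google.com'], 0)); // ≈ 0.3069: the stored term is matched as is
```

Plain-word terms such as 'domain' are unaffected because tokenizing them returns the same single token, which is why only the dotted term scored 0.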
Releases(6.1.2)
Retext is a natural language processor powered by plugins part of the unified collective.

retext is a natural language processor powered by plugins part of the unified collective. Intro retext is an ecosystem of plugins for processing natur

retext 2.2k Dec 29, 2022
:robot: Natural language processing with JavaScript

classifier.js ?? An library for natural language processing with JavaScript Table of Contents Instalation Example of use Auto detection of numeric str

Nathan Firmo 90 Dec 12, 2022
the 'natural satellite' subnet manager

deimos the 'natural satellite' subnet manager more just built against a grudge, because a spreadsheet is the worst way to store this kind of informati

Aaron Duce 2 Feb 7, 2022
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identify, and so more

NLP.js If you're looking for the version 3 docs, you can find them here Version 3 "NLP.js" is a general natural language utility for nodejs. Currently

AXA 5.3k Dec 29, 2022
architecture-free neural network library for node.js and the browser

Synaptic Important: Synaptic 2.x is in stage of discussion now! Feel free to participate Synaptic is a javascript neural network library for node.js a

Juan Cazala 6.9k Dec 27, 2022
Machine-learning for Node.js

Limdu.js Limdu is a machine-learning framework for Node.js. It supports multi-label classification, online learning, and real-time classification. The

Erel Segal-Halevi 1k Dec 16, 2022
Run XGBoost model and make predictions in Node.js

XGBoost-Node eXtreme Gradient Boosting Package in Node.js XGBoost-Node is a Node.js interface of XGBoost. XGBoost is a library from DMLC. It is design

暖房 / nuan.io 31 Nov 15, 2022
Machine Learning library for node.js

shaman Machine Learning library for node.js Linear Regression shaman supports both simple linear regression and multiple linear regression. It support

Luc Castera 108 Feb 26, 2021
Powerful Neural Network for Node.js

NeuralN Powerful Neural Network for Node.js NeuralN is a C++ Neural Network library for Node.js with multiple advantages compared to existing solution

TOTEMS::Tech 275 Dec 15, 2022
Bayesian bandit implementation for Node and the browser.

#bayesian-bandit.js This is an adaptation of the Bayesian Bandit code from Probabilistic Programming and Bayesian Methods for Hackers, specifically d3

null 44 Aug 19, 2022
Latent Dirichlet allocation (LDA) topic modeling in javascript for node.js.

LDA Latent Dirichlet allocation (LDA) topic modeling in javascript for node.js. LDA is a machine learning algorithm that extracts topics and their rel

Kory Becker 279 Nov 4, 2022
Simple Javascript implementation of the k-means algorithm, for node.js and the browser

#kMeans.js Simple Javascript implementation of the k-means algorithm, for node.js and the browser ##Installation npm install kmeans-js ##Example (JS)

Emil Bay 44 Aug 19, 2022
FANN (Fast Artificial Neural Network Library) bindings for Node.js

node-fann node-fann is a FANN bindings for Node.js. FANN (Fast Artificial Neural Network Library) is a free open source neural network library, which

Alex Kocharin 186 Oct 31, 2022
Clustering algorithms implemented in Javascript for Node.js and the browser

Clustering.js ####Clustering algorithms implemented in Javascript for Node.js and the browser Examples License Copyright (c) 2013 Emil Bay github@tixz

Emil Bay 29 Aug 19, 2022
This tool allows you to draw up plans for facilities from Foxhole's new Inferno update. It takes power and resource needs into account to help you efficiently design your facilities.

Foxhole Facility Planner This tool allows you to draw up plans for facilities from Foxhole's new Inferno update. It takes power and resource needs int

Brandon Ray 23 Dec 23, 2022
Semantic is a UI component framework based around useful principles from natural language.

Semantic UI Semantic is a UI framework designed for theming. Key Features 50+ UI elements 3000 + CSS variables 3 Levels of variable inheritance (simil

Semantic Org 50.3k Dec 31, 2022
Modest natural-language processing

compromise modest natural language processing npm install compromise by Spencer Kelly and many contributors isn't it weird how we can write text, but

spencer kelly 10.4k Dec 30, 2022