general natural language facilities for node

Last update: Jan 9, 2023

Overview

natural

"Natural" is a general natural language facility for Node.js. It offers a broad range of natural language processing functionality. Documentation is available on GitHub Pages.

License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

WordNet License

This license is available as the file LICENSE in any downloaded version of WordNet. WordNet 3.0 license:

WordNet Release 3.0

This software and database is being provided to you, the LICENSEE, by Princeton University under the following license. By obtaining, using and/or copying this software and database, you agree that you have read, understood, and will comply with these terms and conditions.:

Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution.

WordNet 3.0 Copyright 2006 by Princeton University. All rights reserved.

THIS SOFTWARE AND DATABASE IS PROVIDED "AS IS" AND PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PRINCETON UNIVERSITY MAKES NO REPRESENTATIONS OR WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE LICENSED SOFTWARE, DATABASE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.

The name of Princeton University or Princeton may not be used in advertising or publicity pertaining to distribution of the software and/or database. Title to copyright in this software, database and any associated documentation shall at all times remain with Princeton University and LICENSEE agrees to preserve same.

Comments

Any plans to support pure JavaScript on the client side?

I did not dig deep enough to tell whether the current version can be used on the client side as it is, but given the training data it needs, I guess not.

Is there any plan to make it usable in the browser rather than through Node?

Thanks for your efforts. This will be a useful resource in the future.
Feature Request Browser

opened by amitamb 27
PoS Tagger for Brazilian Portuguese

Hi there, I'm using this awesome lib in a chatbot project called hubot-natural and I'm having trouble using the PoS Tagger feature to recognize Brazilian Portuguese. Is there any chance of having the Brill PoS tagger translated? Or is that something more complex than translating? If it helps with funding, I should say I'm willing to pay for its translation.

Thanks in advance.

opened by diegodorgam 18
log4js causing issues with client-side usage
I'm trying to use natural on the client side and I'm getting errors on (Webpack) compilation like the following:

Error in ./~/log4js/lib/appenders/clustered.js
Module not found: 'cluster' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders
 @ ./~/log4js/lib/appenders/clustered.js 3:14-32

Error in ./~/log4js/lib/appenders/gelf.js
Module not found: 'dgram' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders
 @ ./~/log4js/lib/appenders/gelf.js 5:12-28

Error in ./~/log4js/lib/appenders/hipchat.js
Module not found: 'hipchat-notifier' in /Users/liadrian/Dev/naive-bayes/node_modules/log4js/lib/appenders

... and many more like the above

The problem

This is caused by a dependency on log4js, which for some reason has a lot of require statements in its appenders that are not declared in its own package.json file.

An example:

// node_modules/log4js/lib/appenders/clustered.js
var cluster = require('cluster');

But package.json from the log4js module does not include cluster as a dependency. As a result, when Webpack is trying to compile my app, an error is thrown for each of these delinquent requires (i.e. require statements for modules not listed in its package.json dependency list).

The bottom line is: log4js is the problem.

The solution

After looking into this, I was able to find that log4js is only used within the Brill POS Tagger module. I recommend that we move away from log4js completely and replace those logging statements with normal console.log or console.warn statements instead.

I have already attempted to do this, and can confirm that this will indeed solve the problem.

I will submit a PR if the community agrees it is beneficial to the project.
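
A minimal sketch of what such a replacement could look like, assuming the tagger only needs debug, info, warn and error levels. The `createConsoleLogger` helper and its `enabled` flag are hypothetical names used for illustration; they are not part of natural or log4js.

```javascript
// Hypothetical drop-in logger that relies only on the global console,
// so a bundler has no logging-related Node-only modules to resolve.
function createConsoleLogger(prefix, enabled = true) {
  const emit = (method) => (...args) => {
    if (enabled) {
      console[method](`[${prefix}]`, ...args);
    }
  };
  return {
    debug: emit('log'),
    info: emit('log'),
    warn: emit('warn'),
    error: emit('error')
  };
}

// Usage: stands in for a logger object obtained from a logging library.
const logger = createConsoleLogger('BrillPOSTagger');
logger.warn('no rule matched for token', 'foo');
```

Keeping the same method names as a typical logger object means existing call sites would need little or no change.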
opened by adrianmcli 18
Make webworker-threads a peer-dependency

I'm running natural in Electron. Natural crashes when imported in Electron because it tries to use webworker-threads, which only works on Node. It works fine when I delete the webworker-threads folder from node_modules, but this is tedious. As far as I know, there is no way to specify in my package.json that the webworker-threads optional dependency of natural should not be installed (this seems to exist only on the command line with --no-optional).

One solution would be to declare webworker-threads as a peer dependency instead of an optional dependency in package.json. Users could then choose to include webworker-threads if they want it.

Electron also has built-in web workers, so it would be nice if natural could use them. But for now, simply not using them at all would be low-hanging fruit.
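
On the library side, one common pattern for tolerating a missing optional module is to guard the `require` call. The sketch below is an illustration only: `loadOptionalModule` is a hypothetical helper, it is not how natural loads webworker-threads, and it only helps when loading fails with a catchable error rather than a native crash.

```javascript
// Hypothetical helper: try to load an optional module and fall back to
// null when it is not installed or throws while loading.
function loadOptionalModule(name) {
  try {
    return require(name);
  } catch (err) {
    return null;
  }
}

const threads = loadOptionalModule('webworker-threads');
if (threads === null) {
  console.log('webworker-threads unavailable, running single-threaded');
}
```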
Solution Provided Improvement

opened by OoDeLally 16
Error while natural = require('natural')

Hi

I am getting an error when I require natural as below.

var natural = require('natural');

Error:

TypeError: word.trim is not a function
    at loadDictionary (\node_modules\natural\lib\natural\stemmers\indonesian\stemmer_id.js:136:21)

In file stemmer_id.js:

function loadDictionary(){
  var fs = require('fs');
  var dirname = __dirname + "/../../../../data/kata-dasar.txt";
  var fin = fs.readFileSync(dirname).toString().split("\n");
  for(var i in fin){
    var word = fin[i];
    word = word.trim();
    dictionary.push(word);
  }
}

The error is at:

word = word.trim();

The correction is:

if (typeof word !== 'string') { word = word.toString(); }

So that the function becomes:

function loadDictionary(){
  var fs = require('fs');
  var dirname = __dirname + "/../../../../data/kata-dasar.txt";
  var fin = fs.readFileSync(dirname).toString().split("\n");
  for(var i in fin){
    var word = fin[i];
    if (typeof word !== 'string') { word = word.toString(); }
    word = word.trim();
    dictionary.push(word);
  }
}

Similar issue in file prefix_rules.js at line number 36
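
A possible explanation, offered as an assumption rather than a confirmed diagnosis: `for...in` visits every enumerable key of an array, including extra properties added to the array or to `Array.prototype` by other code, so `fin[i]` may not be a string. The sketch below reproduces that effect with an extra property on a single array and contrasts it with `for...of`; the sample words are placeholders.

```javascript
// An array with an extra enumerable property, similar to what code that
// extends arrays might leave behind (assumed cause, not confirmed).
const lines = ['makan ', ' minum'];
lines.extra = function () {};

const viaForIn = [];
for (const key in lines) {
  viaForIn.push(typeof lines[key]); // also visits the 'extra' key
}

const viaForOf = [];
for (const word of lines) {
  viaForOf.push(word.trim()); // visits indexed elements only
}

console.log(viaForIn); // [ 'string', 'string', 'function' ]
console.log(viaForOf); // [ 'makan', 'minum' ]
```

Iterating with `for...of` (or `forEach`) avoids the non-string values entirely, whereas the `toString()` workaround would push the stringified function into the dictionary.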
Help/Questions

opened by praveshbalhara 15
Have to check array with longer suffixes first.

https://github.com/NaturalNode/natural/blob/8071055f103f2d9fead395c5b823e0b247fb13a7/lib/natural/stemmers/porter_stemmer_es.js#L196

You have to check the upper array first and then this one: Array('en', 'es', 'éis', 'emos')
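
The general idea of trying longer suffixes before shorter ones can be sketched as follows. `findLongestSuffix` is a hypothetical helper, not code from porter_stemmer_es.js, and the longer suffixes in the second list are sample values chosen for illustration.

```javascript
// Return the longest suffix from the list that the word ends with,
// or an empty string when none matches.
function findLongestSuffix(word, suffixes) {
  const byLengthDesc = [...suffixes].sort((a, b) => b.length - a.length);
  return byLengthDesc.find((suffix) => word.endsWith(suffix)) || '';
}

const shortSuffixes = ['en', 'es', 'éis', 'emos'];
const longSuffixes = ['aremos', 'eremos', 'iremos']; // sample values

// Checking only the short list strips too little of the word.
console.log(findLongestSuffix('cantaremos', shortSuffixes)); // emos
// Checking both lists, longest first, finds the full suffix.
console.log(findLongestSuffix('cantaremos', [...shortSuffixes, ...longSuffixes])); // aremos
```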
Solution Provided Improvement

opened by dmarman 15
Wordnet DB

I read in the README that the WordNet database is no longer downloaded automatically, and I also see that there is no dependency on WNdb.

May I suggest my branch of WNdb, though? It is smaller and does not contain the WordNet archive in the repository, but downloads it directly from Princeton's servers.
Feature Request Request For Feedback

opened by yanickrochon 15
Logistic Regression Classifier throws unable to find minimum exception
LRC seems to throw this exception for a reason unknown to me. The same code was working yesterday, but today it crashes when calling:

classifier.train();

Removing a random combination of addDocument() calls seems to help, but it does not seem to be an issue with one particular bad document.

For example:

classifier.addDocument('London', 'London');
classifier.addDocument('NewYork', 'NewYork');
classifier.addDocument('Toronto', 'Toronto');

fails, but it works after removing London. On the other hand, keeping London but removing Toronto works too.

In total I add about 100 documents to the training set.

Any suggestions as to what went wrong?
Bug
opened by waglik 14
Jaro-Winkler Distance Algorithm Match Value Question
Firstly, a massive thank-you for putting this library together. It's been really helpful implementing some text similarity behaviour in an application I'm currently writing.

Secondly, I just wanted to ask a question regarding the match value that is returned from the JaroWinklerDistance algorithm. On reading the wikipedia article I definitely got the impression that a comparison between two strings that are exactly the same should return a result of 1. I'm finding, however, that this isn't the case.

In the application I'm currently writing, I'm using the functionality to order place names in order of their match strength against an original request. In the case of sydney, this works as expected:

> natural.JaroWinklerDistance('sydney', 'sydney');
1

However, when I did a comparison for seddon against seddon, it returns less than 1:

> natural.JaroWinklerDistance('seddon', 'seddon');
0.8933333333333334

I went on to do some other smaller tests in the node console to see if I could have the function produce a higher score for two different strings than two that were exactly the same, as realistically, this is the case that I'm worried might occur. After a little bit of playing around I found that I could:

> natural.JaroWinklerDistance('abc', 'abc');
0.8666666666666666
> natural.JaroWinklerDistance('abcd', 'abcd');
1
> natural.JaroWinklerDistance('abcd', 'abc');
0.9416666666666667

Once I grok the algorithm, I'll have a look at forking the code and seeing if I can work out where this is happening, but I thought I'd raise an issue first to see if it was something that someone else knew how to fix quickly and simply.

Thanks again for your efforts on the library.

Cheers, Damon.
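
For comparison, here is a minimal sketch of the textbook Jaro-Winkler similarity, written from the standard definition rather than taken from natural's source. The function names and the default scaling factor `p = 0.1` are choices made for this illustration. Under this definition identical non-empty strings always score exactly 1.

```javascript
// Jaro similarity: counts characters that match within a sliding window
// and the number of matched characters that appear out of order.
function jaroSimilarity(a, b) {
  if (a.length === 0 && b.length === 0) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array(a.length).fill(false);
  const bMatched = new Array(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const start = Math.max(0, i - window);
    const end = Math.min(b.length - 1, i + window);
    for (let j = start; j <= end; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both strings' matched characters in order and count mismatches.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;
  return (matches / a.length + matches / b.length +
    (matches - transpositions) / matches) / 3;
}

// Winkler adjustment: boost the score for a shared prefix of up to 4 characters.
function jaroWinkler(a, b, p = 0.1) {
  const jaro = jaroSimilarity(a, b);
  const limit = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < limit && a[prefix] === b[prefix]) prefix++;
  return jaro + prefix * p * (1 - jaro);
}

console.log(jaroWinkler('seddon', 'seddon')); // 1
console.log(jaroWinkler('abc', 'abc'));       // 1
```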
opened by DamonOehlman 13

LogisticRegressionClassifier: unhandled infinite loop

If two classes contain exactly the same documents, the train() method will block infinitely. I created the following code snippet which leads to this infinite loop:

let natural = require('natural')

var classifier = new natural.LogisticRegressionClassifier();

classifier.addDocument('test', 'class1');
classifier.addDocument('test', 'class2');

classifier.train(); // infinite loop

console.log('will not be logged')

However, training works when 3 or more classes share the same document:

let natural = require('natural')

var classifier = new natural.LogisticRegressionClassifier();

classifier.addDocument('test', 'class1');
classifier.addDocument('test', 'class2');
classifier.addDocument('test', 'class3');

classifier.train();

console.log(classifier.getClassifications('test'))
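
Until the classifier handles this case, a caller could check the training data before calling train(). The sketch below is a hypothetical pre-training check, not part of natural: `findConflictingDocuments` and the `{ text, label }` sample shape are assumptions made for this illustration.

```javascript
// Hypothetical pre-training check: find texts that were added under more
// than one label, the situation described in the report above.
function findConflictingDocuments(samples) {
  const labelsByText = new Map();
  for (const { text, label } of samples) {
    if (!labelsByText.has(text)) labelsByText.set(text, new Set());
    labelsByText.get(text).add(label);
  }
  return [...labelsByText]
    .filter(([, labels]) => labels.size > 1)
    .map(([text, labels]) => ({ text, labels: [...labels] }));
}

const conflicts = findConflictingDocuments([
  { text: 'test', label: 'class1' },
  { text: 'test', label: 'class2' },
  { text: 'other', label: 'class1' }
]);
console.log(conflicts); // [ { text: 'test', labels: [ 'class1', 'class2' ] } ]
```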

Bug

opened by Finkes 10

Release a new version

The last release on NPM was 11 months ago by @chrisumbel. Since then, there have been more than 10 pull requests merged into master. Isn't it about time that someone (who has the permissions) release a new version so we can use all the new stuff that was merged in?

I don't see the point of my last PR being merged if no one is going to use it.

cc @kkoch986 and @silentrob

opened by adrianmcli 10

Bug report in SentenceTokenizerNew

SentenceTokenizerNew fails on the following call:

sentenceTokenizer.tokenize('"All ticketed passengers should now be in the Blue Concourse sleep lounge. Make sure your validation papers are in order. Thank you". The upstairs lounge was not at all grungy.')

(quote from "The Jaunt" by Stephen King)

with the following message:

{
    "message": "Expected [ \\t\\n\\r.?!] or [)\\]}\"'`’] but \"M\" found.",
    "expected": [
        {
            "type": "class",
            "parts": [
                " ",
                "\t",
                "\n",
                "\r",
                ".",
                "?",
                "!"
            ],
            "inverted": false,
            "ignoreCase": false
        },
        {
            "type": "class",
            "parts": [
                ")",
                "]",
                "}",
                "\"",
                "'",
                "`",
                "’"
            ],
            "inverted": false,
            "ignoreCase": false
        }
    ],
    "found": "M",
    "location": {
        "start": {
            "offset": 75,
            "line": 1,
            "column": 76
        },
        "end": {
            "offset": 76,
            "line": 1,
            "column": 77
        }
    },
    "name": "SyntaxError"
}

Bug

opened by Jabher 3

NGrams doesn't support words with a hyphen or slash in English
There are a few words in English that contain a hyphen or slash.

Example:

image-based

text-based

links/CTA

It would be great if Natural could manage these cases.

const natural = require('natural');

let text = "links text-based opposed image-based links/CTA’s";
var NGrams = natural.NGrams;
const T = natural.AggressiveTokenizer;
const tokenizer = new T();
NGrams.setTokenizer(tokenizer);
console.log(NGrams.ngrams(text, 1));

Output: [["links"], ["text"], ["based"], ["opposed"], ["image"], ["based"], ["links"], ["CTA"], ["s"]]
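
One possible workaround is a tokenizer that keeps hyphens, slashes and apostrophes when they sit between alphanumeric runs. The sketch below is standalone and does not use natural; `tokenizeKeepingJoiners` and `ngrams` are hypothetical names, and the regular expression only covers ASCII letters and digits.

```javascript
// Hypothetical tokenizer: keeps '-', '/', "'" and '’' when they join two
// alphanumeric runs, so 'text-based' stays a single token.
function tokenizeKeepingJoiners(text) {
  return text.match(/[A-Za-z0-9]+(?:[-\/'’][A-Za-z0-9]+)*/g) || [];
}

// Plain n-gram builder over an array of tokens.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const sample = 'links text-based opposed image-based links/CTA’s';
console.log(ngrams(tokenizeKeepingJoiners(sample), 1));
// [ [ 'links' ], [ 'text-based' ], [ 'opposed' ], [ 'image-based' ], [ 'links/CTA’s' ] ]
```

Whether an object wrapping such a function can be handed to NGrams.setTokenizer depends on the interface natural expects from a tokenizer, which is not verified here.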
opened by sam68740 0

SentenceTokenizer doesn't split by three dots symbol

SentenceTokenizer doesn't split on this symbol: …. Here is an example of the content.

const { SentenceTokenizer } = require('natural')
const content = `We’re heading for a catastrophic global temperature rise… Fires are blazing from the Amazon to the Arctic`
const tokenizer = new SentenceTokenizer()
tokenizer.tokenize(content) // returns one sentence, while two are expected
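
As an illustration of the expected behaviour, the sketch below splits on the single-character ellipsis (U+2026) in addition to '.', '!' and '?'. It is a naive standalone splitter written for this example, not natural's tokenizer, and it does not handle abbreviations or quotes; `splitSentences` is a hypothetical name.

```javascript
// Hypothetical splitter: treats '.', '!', '?' and '…' as sentence
// terminators and keeps the terminator with its sentence.
function splitSentences(text) {
  const parts = text.match(/[^.!?…]+(?:[.!?…]+|$)/g) || [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

const ellipsisSample = 'We’re heading for a catastrophic global temperature rise… Fires are blazing from the Amazon to the Arctic';
console.log(splitSentences(ellipsisSample).length); // 2
```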

opened by satyrius 0

Can we add "y'all" to the normalizer conversion table?

In my app I'm finding "y'all" to be a somewhat common contraction that isn't accounted for in normalizer.js: https://github.com/NaturalNode/natural/blob/master/lib/natural/normalizers/normalizer.js#L34

opened by MeanwhileMedia 3
natural.TfIdf.listTerms works incorrectly for custom-generated tokens (those passed as array to addDocument(...))
natural.TfIdf.addDocument accepts either a string or an array of pre-tokenized texts. When a document is added using an array of tokens, listTerms still applies the tokenization to the individual document tokens when computing the tfidf score, resulting in a tfidf score of 0, even though the tf and idf scores are > 0.

(natural version: ^5.1.11) An example:

> var natural = require('natural')
> var tfidf = new natural.TfIdf()
> tfidf.listTerms(0)
[ { term: 'domain', tf: 1, idf: 0.3068528194400547, tfidf: 0.3068528194400547 },
  { term: 'google.com', tf: 1, idf: 0.3068528194400547, tfidf: 0 } ]

The second term should have a tfidf score of 0.306... (1 * 0.3068...), but it is 0.

The fix is simple: update the listTerms(...) function to pass an array in the tfidf: _this.tfidf(term, d) call, changing it to tfidf: _this.tfidf([term], d) (line 174 here: https://github.com/NaturalNode/natural/blob/master/lib/natural/tfidf/tfidf.js ).
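
The effect can be reproduced with simplified stand-ins. The code below is not natural's implementation: `tokenize`, `termFrequency` and `documentCounts` are hypothetical names, and the tokenizer is assumed to split on non-word characters.

```javascript
// Simplified stand-in tokenizer: splits on non-word characters,
// so 'google.com' becomes ['google', 'com'].
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Accepts either a raw string (tokenized here) or pre-tokenized terms.
function termFrequency(terms, counts) {
  const list = Array.isArray(terms) ? terms : tokenize(terms);
  return list.reduce((sum, term) => sum + (counts[term] || 0), 0);
}

// Counts for a document that was added as the token array ['domain', 'google.com'].
const documentCounts = { domain: 1, 'google.com': 1 };

console.log(termFrequency('google.com', documentCounts));   // 0
console.log(termFrequency(['google.com'], documentCounts)); // 1
```

Passing the term as a one-element array skips the second tokenization, which is the change the report proposes for listTerms.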

Thanks.
opened by senatet 0

Releases(6.1.2)

6.1.2 (Jan 3, 2023)

6.1.1 (Jan 2, 2023)

6.1.0 (Jan 2, 2023)

6.0.1 (Dec 30, 2022)

6.0.0 (Dec 29, 2022)
TypeScript types are added to the library to enable use with TypeScript projects.

5.2.4 (Dec 5, 2022)

5.2.3 (Jul 14, 2022)

5.2.2 (Apr 29, 2022)

5.2.1 (Apr 28, 2022)

5.2.0 (Apr 27, 2022)

5.1.13 (Jan 3, 2022)

5.1.12 (Jan 3, 2022)

5.1.11 (Nov 11, 2021)

5.1.10 (Nov 1, 2021)

5.1.1 (Sep 17, 2021)
Updated license

5.1.0 (Aug 30, 2021)

5.0.5 (Aug 23, 2021)
Repairs a bug in one of the index files. Thanks to @aleclarson.

5.0.4 (Jul 26, 2021)

5.0.3 (Apr 1, 2021)

5.0.2 (Mar 30, 2021)

5.0.1 (Mar 28, 2021)

5.0.0 (Mar 26, 2021)

4.0.4 (Mar 25, 2021)

4.0.3 (Mar 24, 2021)
Repair sentence tokenizer for braces and quotes.

4.0.2 (Mar 10, 2021)

4.0.1 (Mar 10, 2021)

4.0.0 (Feb 17, 2021)
The Japanese stemmer now returns a newly created object when imported.

3.0.3 (Feb 17, 2021)

3.0.1 (Feb 8, 2021)
Created index files per module which are then included in the main index file.

3.0.0 (Feb 3, 2021)
This is a major release that focuses on code quality. Code has been polished to conform to Standard JS. In addition, code has been tested with jscpd.

As a consequence, major changes have been made:

All methods that were attached to native types have been removed. For instance, the attach method has been removed from the stemmers and tokenizers.

Some methods in lib/natural/index (the API of natural) have been renamed to camel case.

Documentation has to be updated for these changes. Will do that soon.

Owner

GitHub

Retext is a natural language processor powered by plugins part of the unified collective.

retext is a natural language processor powered by plugins part of the unified collective. Intro retext is an ecosystem of plugins for processing natur

2.2k Dec 29, 2022

:robot: Natural language processing with JavaScript

classifier.js A library for natural language processing with JavaScript Table of Contents Installation Example of use Auto detection of numeric str

90 Dec 12, 2022

the 'natural satellite' subnet manager

deimos the 'natural satellite' subnet manager more just built against a grudge, because a spreadsheet is the worst way to store this kind of informati

2 Feb 7, 2022

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identify, and so more

NLP.js If you're looking for the version 3 docs, you can find them here Version 3 "NLP.js" is a general natural language utility for nodejs. Currently

5.3k Dec 29, 2022

architecture-free neural network library for node.js and the browser

Synaptic Important: Synaptic 2.x is in stage of discussion now! Feel free to participate Synaptic is a javascript neural network library for node.js a

6.9k Dec 27, 2022

Machine-learning for Node.js

Limdu.js Limdu is a machine-learning framework for Node.js. It supports multi-label classification, online learning, and real-time classification. The

1k Dec 16, 2022

Run XGBoost model and make predictions in Node.js

XGBoost-Node eXtreme Gradient Boosting Package in Node.js XGBoost-Node is a Node.js interface of XGBoost. XGBoost is a library from DMLC. It is design

31 Nov 15, 2022

Machine Learning library for node.js

shaman Machine Learning library for node.js Linear Regression shaman supports both simple linear regression and multiple linear regression. It support

108 Feb 26, 2021

Powerful Neural Network for Node.js

NeuralN Powerful Neural Network for Node.js NeuralN is a C++ Neural Network library for Node.js with multiple advantages compared to existing solution

275 Dec 15, 2022

Bayesian bandit implementation for Node and the browser.

#bayesian-bandit.js This is an adaptation of the Bayesian Bandit code from Probabilistic Programming and Bayesian Methods for Hackers, specifically d3

44 Aug 19, 2022

Latent Dirichlet allocation (LDA) topic modeling in javascript for node.js.

LDA Latent Dirichlet allocation (LDA) topic modeling in javascript for node.js. LDA is a machine learning algorithm that extracts topics and their rel

279 Nov 4, 2022

Simple Javascript implementation of the k-means algorithm, for node.js and the browser

#kMeans.js Simple Javascript implementation of the k-means algorithm, for node.js and the browser ##Installation npm install kmeans-js ##Example (JS)

44 Aug 19, 2022

FANN (Fast Artificial Neural Network Library) bindings for Node.js

node-fann node-fann is a FANN bindings for Node.js. FANN (Fast Artificial Neural Network Library) is a free open source neural network library, which

186 Oct 31, 2022

Clustering algorithms implemented in Javascript for Node.js and the browser

29 Aug 19, 2022

This tool allows you to draw up plans for facilities from Foxhole's new Inferno update. It takes power and resource needs into account to help you efficiently design your facilities.

Foxhole Facility Planner This tool allows you to draw up plans for facilities from Foxhole's new Inferno update. It takes power and resource needs int

23 Dec 23, 2022

Semantic is a UI component framework based around useful principles from natural language.

Modest natural-language processing