Find root-affix combinations of English words.

Overview

Find root-affixes of word

Looks up the root-affix combinations of an English word.

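Lookup rules

  1. Words of length 2 or less are returned as-is.
  2. First reduce the word to its base form, undoing plurals, comparatives, past tenses and other inflections.
  3. Then enumerate every possible root-affix combination exhaustively.
  4. Discard incomplete combinations, i.e. those whose parts do not spell the word exactly (combination != word).
  5. Among the remaining combinations, compare lengths and keep the shortest one; if several tie for the minimum length, keep all of them.
  6. Among the shortest combinations, prefer those that contain a single root.
  7. Among the combinations that remain, take the one with the smallest difference between affix length and root length.
  8. Return every combination that satisfies these conditions.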
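The rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `ROOTS`, `AFFIXES` and `LEMMAS` tables are tiny hypothetical stand-ins for the real data, "length" in rule 5 is assumed to mean the number of parts, and rules 6 and 7 are implemented under one possible reading (single-root combinations first, then the smallest affix/root length gap).

```python
from typing import Dict, List

# Hypothetical, tiny tables for illustration only.
ROOTS = {"act", "port", "struct"}
AFFIXES = {"re", "trans", "con", "ion"}
LEMMAS: Dict[str, str] = {"transports": "transport"}


def segmentations(word: str) -> List[List[str]]:
    """Rules 3 and 4: enumerate every sequence of known parts that spells `word` exactly."""
    if not word:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(word) + 1):
        part = word[:end]
        if part in ROOTS or part in AFFIXES:
            for rest in segmentations(word[end:]):
                results.append([part] + rest)
    return results


def length_gap(combo: List[str]) -> int:
    """Rule 7: absolute difference between total affix length and total root length."""
    root_len = sum(len(p) for p in combo if p in ROOTS)
    affix_len = sum(len(p) for p in combo if p not in ROOTS)
    return abs(affix_len - root_len)


def find_root_affixes(word: str) -> List[List[str]]:
    # Rule 1: very short words are returned directly.
    if len(word) <= 2:
        return [[word]]
    # Rule 2: reduce to the base form (assumed here to be a plain table lookup).
    word = LEMMAS.get(word, word)
    combos = segmentations(word)
    if not combos:
        return []
    # Rule 5: keep the combinations with the fewest parts.
    fewest = min(len(c) for c in combos)
    shortest = [c for c in combos if len(c) == fewest]
    # Rule 6: prefer combinations that contain exactly one root.
    single_root = [c for c in shortest if sum(p in ROOTS for p in c) == 1]
    candidates = single_root or shortest
    # Rules 7 and 8: keep every candidate with the smallest affix/root length gap.
    best = min(length_gap(c) for c in candidates)
    return [c for c in candidates if length_gap(c) == best]


print(find_root_affixes("reaction"))
print(find_root_affixes("transports"))
```

With these toy tables, a word such as wolf yields an empty list because none of its parts are known.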

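Problems

  1. Some necessary roots and affixes are missing, so no combination can be found for certain words, e.g. wolf, strong
  2. Inaccurate results
     • The lookup rules may filter out some correct combinations
     • A consequence of problem 1
     • Compound words made of two words, e.g. honeyguide
  3. Multiple combinations
     • A lookup may still return more than one root-affix combination, e.g. agitation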

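Improvements

First, I would like to add stemming (Snowball) and use a stemmer in place of the base-form table. Advantages: it removes a large number of invalid combinations, reduces the amount of exhaustive enumeration, and lowers the uncertainty introduced by the base-form table.

Disadvantages: it is inaccurate. For example, wolves -> wolv where it should be wolf, and went -> went where it should be go. I have not tried other stemmers yet, so I do not know whether NLTK WordNet would do better. A stemmer also loses the link between the original word and its stem, although the base-form table could fill that gap if necessary.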

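Next, compile a list of simplest words. For example, homeworker is made up of the two words home and worker; when looking up its roots and affixes, it would be more appropriate to split it into those two words and look each one up separately.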
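A minimal sketch of that splitting step, assuming a hypothetical `SIMPLE_WORDS` set in place of the real simplest-word list (the function name and the "first split wins" policy are also assumptions made for illustration):

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the simplest-word list.
SIMPLE_WORDS = {"home", "worker", "honey", "guide"}


def split_compound(word: str) -> Optional[Tuple[str, str]]:
    """Return the first (head, tail) split where both halves are known simple words."""
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in SIMPLE_WORDS and tail in SIMPLE_WORDS:
            return head, tail
    return None


print(split_compound("homeworker"))
print(split_compound("honeyguide"))
```

Each half can then be passed to the root-affix lookup on its own; a word with no valid split returns `None` and is looked up as a whole.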

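Third, add notation rules for roots and affixes. For example, ^ means the root or affix may only appear at the start of a word, and $ means it may only appear at the end; see root_affix_rule.csv for the others.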
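One way such position markers could be checked is sketched below. The rule syntax used here (`^re`, `ion$`) is an assumption borrowed from regular-expression anchors; the actual format in root_affix_rule.csv may differ, and `allowed_at` is a hypothetical helper.

```python
def allowed_at(rule: str, word: str, start: int) -> bool:
    """Check whether the part described by `rule` may sit at word[start:]."""
    only_start = rule.startswith("^")  # assumed marker: start of word only
    only_end = rule.endswith("$")      # assumed marker: end of word only
    part = rule.strip("^$")
    end = start + len(part)
    if word[start:end] != part:
        return False
    if only_start and start != 0:
        return False
    if only_end and end != len(word):
        return False
    return True


print(allowed_at("^re", "reaction", 0))   # "re" at the start of the word
print(allowed_at("^re", "store", 3))      # "re" at the end of the word
print(allowed_at("ion$", "reaction", 5))  # "ion" at the end of the word
```

A check like this could be applied while enumerating combinations, so that position-restricted parts are rejected early instead of being filtered afterwards.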

Finally, roots and affixes could be scored. This is a rough, unverified idea, and I do not yet have a good approach to the scoring itself.

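Testing

Project test page: Find root-affixes 🍂 of word

If you come across a wrong or possibly wrong root-affix combination, use the "Bad root-affixes, report to GitHub" link at the bottom right of the page to open an issue on GitHub issues.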

Resources

This project draws on resources from the following open-source projects:

Thanks!

LICENSE

MIT License

Copyright (c) 2022 excing
