Releases: zaibacu/rita-dsl
0.7.0: Merge pull request #109 from zaibacu/support-spacy-v3
0.7.0 (2021-02-02)
Features
- standalone engine will now return a submatches list containing the start and end of each part of a match
  #93
- Partially covered #70
  Allow nested patterns, like:
      num_with_fractions = {NUM, WORD("-")?, IN_LIST(fractions)}
      complex_number = {NUM|PATTERN(num_with_fractions)}
      {PATTERN(complex_number)}->MARK("NUMBER")
- Submatches for rita-rust engine
  #96
- Regex module which allows specifying a word pattern, e.g. REGEX(^a) means the word must start with the letter "a"
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #101
- ORTH module which allows you to specify a case-sensitive entry while the rest of the rules ignore case. Used for acronyms and proper names
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #102
- Additional macro for the tag module, allowing tagging of a specific word or list of words
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #103
- Added names module which allows generating variations of person names
  #105
- spaCy v3 Support
  #109
Fix
- Optimizations for Rust Engine
  - No need for passing text forward and backward, we can calculate from text[start:end]
  - Grouping and sorting logic can be done in binary code
  #88
- Fix NUM parsing bug
  #90
- Switch from (^\s) to \b when doing IN_LIST. Should solve several corner cases
  #91
- Fix floating point number matching
  #92
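To show the kind of corner case a word boundary can solve for list matching (#91), the sketch below compares a whitespace-anchored pattern with a \b-anchored one using Python's re module. The exact patterns rita generates are not shown in these notes, so both patterns here are assumptions made for illustration.

```python
import re

words = ["one", "two"]
alternatives = "|".join(re.escape(w) for w in words)

# Assumed "old style": the word must follow the start of text or whitespace.
whitespace_anchored = re.compile(r"(?:^|\s)(" + alternatives + r")")
# Assumed "new style": the word only needs word boundaries on both sides.
boundary_anchored = re.compile(r"\b(" + alternatives + r")\b")

text = "(one) two"
ws_hits = whitespace_anchored.findall(text)  # misses "one" because "(" precedes it
wb_hits = boundary_anchored.findall(text)    # finds both list items
```

Punctuation directly before a list item (brackets, quotes) is where the two approaches differ.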
0.6.0
0.6.0 (2020-08-29)
Features
- Implemented ability to alias macros, e.g.:
      numbers = {"one", "two", "three"}
      @alias IN_LIST IL
      IL(numbers) -> MARK("NUMBER")
  Now using "IL" will actually call the "IN_LIST" macro.
  #66
- Introduce the TAG element as a module. Needs a new parser for the spaCy translate.
  Allows more flexible matching of detailed part-of-speech tags, like all adjectives or nouns: TAG("^NN|^JJ").
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #81
- Add a new module for a PLURALIZE tag. For a noun or a list of nouns, it will match any singular or plural word.
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #82
- Add a new configuration option implicit_hyphon (default false) for automatically adding hyphen characters (-) to the rules.
  Implemented by: Roland M. Mueller (https://github.com/rolandmueller)
  #84
- Allow passing a custom regex implementation. By default re is used
  #86
- An interface to be able to use the Rust engine.
  In general it's identical to standalone, but differs in one crucial part: all of the rules are compiled into actual binary code, which provides a large performance boost.
  It is proprietary, because there are various caveats: the engine itself is a bit more fragile and needs to be tinkered with to be optimized for a very specific case (e.g. a few long texts with many matches vs. a lot of short texts with few matches).
  #87
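The TAG("^NN|^JJ") example from #81 reads as a regular expression applied to part-of-speech tags. A minimal sketch of that idea, using hand-written (token, tag) pairs instead of a real tagger, could look as follows. The token list and variable names are made up for illustration and this is not how rita or spaCy implement it.

```python
import re

# Hand-tagged tokens with Penn Treebank style tags (illustrative data only).
tagged_tokens = [
    ("The", "DT"),
    ("quick", "JJ"),
    ("foxes", "NNS"),
    ("jump", "VBP"),
    ("high", "RB"),
]

# Same expression as in TAG("^NN|^JJ"): any tag starting with NN or JJ.
tag_pattern = re.compile(r"^NN|^JJ")

selected = [token for token, tag in tagged_tokens if tag_pattern.search(tag)]
```

Anchoring with ^ means NNS, NNP and JJR style tags are covered without listing each one.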
Fix
0.5.0
Features
- Added PREFIX macro which allows attaching a word in front of list items or words
  #47
- Allow passing variables directly when doing compile and compile_string
  #51
- Allow compiling (and later loading) rules using the rita CLI while using the standalone engine (spaCy is already supported)
  #53
- Added ability to import rule files into a rule file. Recursive import is supported as well.
  #55
- Added possibility to define a pattern as a variable and reuse it in other patterns:
  Example:
      ComplexNumber = {NUM+, WORD("/")?, NUM?}
      {PATTERN(ComplexNumber), WORD("inches"), WORD("Height")}->MARK("HEIGHT")
      {PATTERN(ComplexNumber), WORD("inches"), WORD("Width")}->MARK("WIDTH")
Fix
v0.4.0
0.4.0 (2020-01-25)
Features
- Support for deaccent. In general, if an accented version of a word is given, both the deaccented and accented versions will be used to match. To turn it off:
      !CONFIG("deaccent", "N")
  #38
- Added shortcuts module to simplify injecting into spaCy
  #42
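The deaccent behaviour described in #38 (match both the accented and the deaccented form) can be sketched with the standard unicodedata module. The helper names deaccent and variants are hypothetical, and rita's own implementation may differ.

```python
import unicodedata


def deaccent(word):
    """Strip combining marks after Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variants(word):
    """Return the word plus its deaccented form when the two differ."""
    plain = deaccent(word)
    return [word] if plain == word else [word, plain]


accented = variants("caf\u00e9")  # "café" written with an escape for clarity
unaccented = variants("cat")
```

A matcher built on top of this would add every variant to the list of accepted words.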
Fix
- Fix issue regarding spaCy rules with IN_LIST when using case-sensitive mode. It was creating a regex pattern which is not a valid spaCy pattern
  #40
Improve project infrastructure
- Config revamp
  Now there's one global config and a child config created per session (one session = one rule file compilation).
  Imports and variables are stored in this config as well.
- Remove context argument from MACROS, making code cleaner and easier to read
0.2.2
Features up to this point:
- Standalone parser - can use internal regex rather than spaCy if you need to
- Ability to do logical OR in a rule, e.g. {WORD(w1)|WORD(w2),WORD(w3)} would result in two rules: {WORD(w1),WORD(w3)} and {WORD(w2),WORD(w3)}
- Exclude operator: {WORD(w1), WORD(w2)!} would match w1 and anything but w2
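The logical OR expansion described above (one rule with alternatives becoming several plain rules) is essentially a Cartesian product over the rule's slots. A minimal sketch, assuming a rule is represented as a list of slots where each slot lists its alternatives (this representation and the name expand_or are hypothetical, not rita internals):

```python
from itertools import product


def expand_or(rule):
    """Expand a rule with alternatives into a list of alternative-free rules.

    `rule` is a list of slots; each slot is a list of alternative elements.
    """
    return [list(combination) for combination in product(*rule)]


# {WORD(w1)|WORD(w2),WORD(w3)} written as slots with alternatives.
rule = [["WORD(w1)", "WORD(w2)"], ["WORD(w3)"]]
expanded = expand_or(rule)
```

With two alternatives in the first slot and one in the second, the product yields exactly the two rules listed above.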