A command-line tool and library for generating regular expressions from user-provided test cases

grex


Linux Download MacOS Download Windows Download

Table of Contents

  1. What does this tool do?
  2. Do I still need to learn to write regexes then?
  3. Current features
  4. How to install?
    4.1 The command-line tool
    4.2 The library
  5. How to use?
    5.1 The command-line tool
    5.2 The library
    5.3 Examples
  6. How to build?
  7. How does it work?
  8. Do you want to contribute?

1. What does this tool do?

grex is a library as well as a command-line utility that is meant to simplify the often complicated and tedious task of creating regular expressions. It does so by automatically generating a single regular expression from user-provided test cases. The resulting expression is guaranteed to match the test cases which it was generated from.

This project started as a Rust port of the JavaScript tool regexgen written by Devon Govett. Although many further useful features could be added to it, its development apparently ceased several years ago. The plan is now to add these new features to grex, as Rust really shines when it comes to command-line tools. grex offers all the features that regexgen provides, and more.

The philosophy of this project is to generate, by default, the most specific regular expression possible: one that matches exactly the given input and nothing else. With command-line flags (in the CLI tool) or preprocessing methods (in the library), more generalized expressions can be created.

The produced expressions are Perl-compatible regular expressions which are also compatible with the regular expression parser in Rust's regex crate. Other regular expression parsers and the respective libraries of other programming languages have not been tested so far, but they ought to be mostly compatible as well.

2. Do I still need to learn to write regexes then?

Definitely, yes! Using the standard settings, grex produces a regular expression that is guaranteed to match only the test cases given as input and nothing else. This has been verified by property tests. However, if the conversion to shorthand character classes such as \w is enabled, the resulting regex matches a much wider range of strings. Knowing the consequences of this conversion is essential for finding a correct regular expression for your business domain.

grex uses an algorithm that tries to find the shortest possible regex for the given test cases. Very often though, the resulting expression is still longer or more complex than it needs to be. In such cases, a more compact or elegant regex can be created only by hand. Also, every regular expression engine has different built-in optimizations. grex does not know anything about those and therefore cannot optimize its regexes for a specific engine.

So, please learn how to write regular expressions! Currently, the best use case for grex is finding an initial correct regex, which should then be inspected by hand to see whether further optimizations are possible.

3. Current features

  • literals
  • character classes
  • detection of common prefixes and suffixes
  • detection of repeated substrings and conversion to {min,max} quantifier notation
  • alternation using | operator
  • optionality using ? quantifier
  • escaping of non-ASCII characters, with optional conversion of astral code points to surrogate pairs
  • case-sensitive or case-insensitive matching
  • capturing or non-capturing groups
  • fully compliant with the newest Unicode Standard, 13.0
  • fully compatible with regex crate 1.3.5+
  • correctly handles graphemes consisting of multiple Unicode symbols
  • reads input strings from the command-line or from a file
  • optional syntax highlighting for nicer output in supported terminals

4. How to install?

4.1 The command-line tool

You can download the self-contained executable for your platform above and put it in a place of your choice. Alternatively, pre-compiled 64-bit binaries are available in the package managers Scoop (for Windows) and Homebrew (for macOS and Linux).

grex is also hosted on crates.io, the official Rust package registry. If you are a Rust developer and already have the Rust toolchain installed, you can install it by compiling from source with cargo, the Rust package manager. In summary, your installation options are:

( scoop | brew | cargo ) install grex

4.2 The library

In order to use grex as a library, simply add it as a dependency to your Cargo.toml file:

[dependencies]
grex = "1.1.0"

5. How to use?

Detailed explanations of the available settings are provided in the library section. All settings can be freely combined with each other.

5.1 The command-line tool

$ grex -h

grex 1.1.0
© 2019-2020 Peter M. Stahl <[email protected]>
Licensed under the Apache License, Version 2.0
Downloadable from https://crates.io/crates/grex
Source code at https://github.com/pemistahl/grex

grex generates regular expressions from user-provided test cases.

USAGE:
    grex [FLAGS] [OPTIONS] <INPUT>... --file <FILE>

FLAGS:
    -d, --digits             Converts any Unicode decimal digit to \d
    -D, --non-digits         Converts any character which is not a Unicode decimal digit to \D
    -s, --spaces             Converts any Unicode whitespace character to \s
    -S, --non-spaces         Converts any character which is not a Unicode whitespace character to \S
    -w, --words              Converts any Unicode word character to \w
    -W, --non-words          Converts any character which is not a Unicode word character to \W
    -r, --repetitions        Detects repeated non-overlapping substrings and
                             converts them to {min,max} quantifier notation
    -e, --escape             Replaces all non-ASCII characters with unicode escape sequences
        --with-surrogates    Converts astral code points to surrogate pairs if --escape is set
    -i, --ignore-case        Performs case-insensitive matching, letters match both upper and lower case
    -g, --capture-groups     Replaces non-capturing groups by capturing ones
    -c, --colorize           Provides syntax highlighting for the resulting regular expression
    -h, --help               Prints help information
    -v, --version            Prints version information

OPTIONS:
    -f, --file <FILE>                      Reads test cases on separate lines from a file
        --min-repetitions <QUANTITY>       Specifies the minimum quantity of substring repetitions
                                           to be converted if --repetitions is set [default: 1]
        --min-substring-length <LENGTH>    Specifies the minimum length a repeated substring must have
                                           in order to be converted if --repetitions is set [default: 1]

ARGS:
    <INPUT>...    One or more test cases separated by blank space 

5.2 The library

5.2.1 Default settings

Test cases are passed either from a collection via RegExpBuilder::from() or from a file via RegExpBuilder::from_file(). If read from a file, each test case must be on a separate line. Lines may end with either a line feed \n or a carriage return followed by a line feed \r\n.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "aaa"]).build();
assert_eq!(regexp, "^a(?:aa?)?$");
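Because each line of an input file becomes one test case, splitting the file contents has to tolerate both line-ending styles. The following is a minimal sketch using only the standard library, assuming the file has already been read into a string (e.g. with std::fs::read_to_string); it is not grex's own file-reading code, and the helper name parse_test_cases is hypothetical. Rust's str::lines() already strips both \n and \r\n.

```rust
/// Hypothetical helper: splits file contents into one test case per line.
/// `str::lines` treats both "\n" and "\r\n" as line terminators and
/// does not yield an empty entry for a trailing newline.
fn parse_test_cases(contents: &str) -> Vec<&str> {
    contents.lines().collect()
}

fn main() {
    // Mixed Windows and Unix line endings, as allowed above.
    let contents = "a\r\naa\naaa\n";
    let test_cases = parse_test_cases(contents);
    assert_eq!(test_cases, vec!["a", "aa", "aaa"]);
    println!("{:?}", test_cases);
}
```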

5.2.2 Convert to character classes

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["a", "aa", "123"])
    .with_conversion_of(&[Feature::Digit, Feature::Word])
    .build();
assert_eq!(regexp, "^(?:\\d\\d\\d|\\w(?:\\w)?)$");

5.2.3 Convert repeated substrings

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of(&[Feature::Repetition])
    .build();
assert_eq!(regexp, "^(?:a{2}|(?:bc){2}|(?:def){3})$");

By default, grex converts every substring that is at least one character long and is repeated at least once after its first occurrence. You can customize both of these parameters.

In the following example, the test case aa is not converted to a{2} because the repeated substring a has a length of 1, but the minimum substring length has been set to 2.

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of(&[Feature::Repetition])
    .with_minimum_substring_length(2)
    .build();
assert_eq!(regexp, "^(?:aa|(?:bc){2}|(?:def){3})$");

In the next example, the minimum number of repetitions is set to 2, so only the test case defdefdef is converted: it is the only one whose substring is repeated at least twice after its first occurrence.

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["aa", "bcbc", "defdefdef"])
    .with_conversion_of(&[Feature::Repetition])
    .with_minimum_repetitions(2)
    .build();
assert_eq!(regexp, "^(?:bcbc|aa|(?:def){3})$");
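To illustrate how the two parameters interact, here is a simplified, hypothetical sketch using only the standard library. It handles only strings that consist entirely of one repeated unit (grex also finds repetitions inside longer strings and combines the results into alternations), it assumes the unit contains no regex metacharacters, and convert_repetition is an illustrative name, not a grex function.

```rust
/// Hypothetical, simplified sketch of the repetition conversion.
/// `min_repetitions` counts repeats *after* the first occurrence, so "bcbc"
/// (one repeat) is rejected when `min_repetitions` is 2.
fn convert_repetition(s: &str, min_substring_length: usize, min_repetitions: usize) -> String {
    let chars: Vec<char> = s.chars().collect();
    let n = chars.len();
    // Try shorter units first so that "aaaa" becomes "a{4}" rather than "(?:aa){2}".
    for period in 1..=n / 2 {
        if n % period != 0 {
            continue;
        }
        let unit = &chars[..period];
        if !chars.chunks(period).all(|chunk| chunk == unit) {
            continue;
        }
        let count = n / period;
        if period >= min_substring_length && count - 1 >= min_repetitions {
            let unit: String = unit.iter().collect();
            return if period == 1 {
                format!("{}{{{}}}", unit, count)
            } else {
                format!("(?:{}){{{}}}", unit, count)
            };
        }
    }
    // No qualifying repetition: keep the literal string.
    s.to_string()
}

fn main() {
    // Defaults: minimum substring length 1, minimum repetitions 1.
    assert_eq!(convert_repetition("aa", 1, 1), "a{2}");
    assert_eq!(convert_repetition("bcbc", 1, 1), "(?:bc){2}");
    assert_eq!(convert_repetition("defdefdef", 1, 1), "(?:def){3}");
    // Minimum substring length 2: the unit "a" is too short.
    assert_eq!(convert_repetition("aa", 2, 1), "aa");
    // Minimum repetitions 2: only "defdefdef" repeats its unit twice.
    assert_eq!(convert_repetition("bcbc", 1, 2), "bcbc");
    assert_eq!(convert_repetition("defdefdef", 1, 2), "(?:def){3}");
    println!("all repetition checks passed");
}
```

The per-case results mirror the three library examples above; the ordering of the alternation branches in grex's output is not modelled here.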

5.2.4 Escape non-ASCII characters

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(false)
    .build();
assert_eq!(regexp, "^You smell like \\u{1f4a9}\\.$");

Old versions of JavaScript do not support Unicode escape sequences for the astral code planes (range U+010000 to U+10FFFF). To support these symbols in JavaScript regular expressions, they must be converted to surrogate pairs. More information on that matter can be found here.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["You smell like 💩."])
    .with_escaping_of_non_ascii_chars(true)
    .build();
assert_eq!(regexp, "^You smell like \\u{d83d}\\u{dca9}\\.$");
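The difference between the two settings comes down to how an astral code point is written: as a single escape, or as the two UTF-16 code units (high and low surrogate) that encode it. The following hypothetical sketch shows this with the standard library's char::encode_utf16. The name escape_non_ascii is illustrative and not part of the grex API, and unlike grex this sketch does not escape regex metacharacters such as the final period.

```rust
use std::fmt::Write;

/// Hypothetical sketch of non-ASCII escaping in the `\u{...}` style shown above.
/// ASCII characters pass through unchanged.
fn escape_non_ascii(input: &str, use_surrogate_pairs: bool) -> String {
    let mut out = String::new();
    for c in input.chars() {
        if c.is_ascii() {
            out.push(c);
        } else if use_surrogate_pairs && u32::from(c) > 0xFFFF {
            // Astral code point: emit its UTF-16 high and low surrogates.
            let mut buf = [0u16; 2];
            for &unit in c.encode_utf16(&mut buf).iter() {
                write!(out, "\\u{{{:x}}}", unit).unwrap();
            }
        } else {
            // Basic Multilingual Plane, or surrogates disabled: one escape.
            write!(out, "\\u{{{:x}}}", u32::from(c)).unwrap();
        }
    }
    out
}

fn main() {
    let input = "I ♥ ٣ 💩";
    assert_eq!(escape_non_ascii(input, false), r"I \u{2665} \u{663} \u{1f4a9}");
    assert_eq!(escape_non_ascii(input, true), r"I \u{2665} \u{663} \u{d83d}\u{dca9}");
    println!("{}", escape_non_ascii(input, true));
}
```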

5.2.5 Case-insensitive matching

The regular expressions that grex generates are case-sensitive by default. Case-insensitive matching can be enabled like so:

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_conversion_of(&[Feature::CaseInsensitivity])
    .build();
assert_eq!(regexp, "(?i)^big(?:ger)?$");

5.2.6 Capturing Groups

Non-capturing groups are used by default. Extending the previous example, you can switch to capturing groups instead.

use grex::{Feature, RegExpBuilder};

let regexp = RegExpBuilder::from(&["big", "BIGGER"])
    .with_conversion_of(&[Feature::CaseInsensitivity, Feature::CapturingGroup])
    .build();
assert_eq!(regexp, "(?i)^big(ger)?$");

5.2.7 Syntax highlighting

The method with_syntax_highlighting() may only be used if the resulting regular expression is meant to be printed to the console. The regex string representation returned from enabling this setting cannot be fed into the regex crate.

use grex::RegExpBuilder;

let regexp = RegExpBuilder::from(&["a", "aa", "123"])
    .with_syntax_highlighting()
    .build();

5.3 Examples

The following examples show the various supported regex syntax features:

$ grex a b c
^[a-c]$

$ grex a c d e f
^[ac-f]$

$ grex a b x de
^(?:de|[abx])$

$ grex abc bc
^a?bc$

$ grex a b bc
^(?:bc?|a)$

$ grex [a-z]
^\[a\-z\]$

$ grex -r b ba baa baaa
^b(?:a{1,3})?$

$ grex -r b ba baa baaaa
^b(?:a{1,2}|a{4})?$

$ grex y̆ a z
^(?:y̆|[az])$
Note: 
Grapheme y̆ consists of two Unicode symbols:
U+0079 (Latin Small Letter Y)
U+0306 (Combining Breve)

$ grex "I ♥ cake" "I ♥ cookies"
^I ♥ c(?:ookies|ake)$
Note:
Input containing blank space must be 
surrounded by quotation marks.
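The first examples show consecutive characters being merged into ranges ([a-c], [ac-f]) while short runs stay literal ([abx]). The following hypothetical sketch reproduces that behaviour under the assumption, inferred only from these examples, that runs of three or more consecutive code points become a range. The function name char_class is illustrative, and characters that would need escaping inside a class are not handled.

```rust
/// Hypothetical sketch: builds a character class, collapsing runs of at least
/// three consecutive code points into a range. Characters that would need
/// escaping inside a class (such as '-', ']' or '\\') are not handled.
fn char_class(chars: &[char]) -> String {
    let mut sorted = chars.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    let mut out = String::from("[");
    let mut i = 0;
    while i < sorted.len() {
        // Extend j while the code points stay consecutive.
        let mut j = i;
        while j + 1 < sorted.len() && u32::from(sorted[j + 1]) == u32::from(sorted[j]) + 1 {
            j += 1;
        }
        if j - i + 1 >= 3 {
            out.push(sorted[i]);
            out.push('-');
            out.push(sorted[j]);
        } else {
            out.extend(&sorted[i..=j]);
        }
        i = j + 1;
    }
    out.push(']');
    out
}

fn main() {
    assert_eq!(char_class(&['a', 'b', 'c']), "[a-c]");
    assert_eq!(char_class(&['a', 'c', 'd', 'e', 'f']), "[ac-f]");
    assert_eq!(char_class(&['x', 'b', 'a']), "[abx]");
    println!("{}", char_class(&['a', 'c', 'd', 'e', 'f']));
}
```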

The string "I ♥♥♥ 36 and ٣ and 💩💩." serves as input for the following examples using the command-line notation:

$ grex <INPUT>
^I ♥♥♥ 36 and ٣ and 💩💩\.$

$ grex -e <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{1f4a9}\u{1f4a9}\.$

$ grex -e --with-surrogates <INPUT>
^I \u{2665}\u{2665}\u{2665} 36 and \u{663} and \u{d83d}\u{dca9}\u{d83d}\u{dca9}\.$

$ grex -d <INPUT>
^I ♥♥♥ \d\d and \d and 💩💩\.$

$ grex -s <INPUT>
^I\s♥♥♥\s36\sand\s٣\sand\s💩💩\.$

$ grex -w <INPUT>
^\w ♥♥♥ \w\w \w\w\w \w \w\w\w 💩💩\.$

$ grex -D <INPUT>
^\D\D\D\D\D\D36\D\D\D\D\D٣\D\D\D\D\D\D\D\D$

$ grex -S <INPUT>
^\S \S\S\S \S\S \S\S\S \S \S\S\S \S\S\S$

$ grex -dsw <INPUT>
^\w\s♥♥♥\s\d\d\s\w\w\w\s\d\s\w\w\w\s💩💩\.$

$ grex -dswW <INPUT>
^\w\s\W\W\W\s\d\d\s\w\w\w\s\d\s\w\w\w\s\W\W\W$

$ grex -r <INPUT>
^I ♥{3} 36 and ٣ and 💩{2}\.$

$ grex -er <INPUT>
^I \u{2665}{3} 36 and \u{663} and \u{1f4a9}{2}\.$

$ grex -er --with-surrogates <INPUT>
^I \u{2665}{3} 36 and \u{663} and (?:\u{d83d}\u{dca9}){2}\.$

$ grex -dgr <INPUT>
^I ♥{3} \d(\d and ){2}💩{2}\.$

$ grex -rs <INPUT>
^I\s♥{3}\s36\sand\s٣\sand\s💩{2}\.$

$ grex -rw <INPUT>
^\w ♥{3} \w(?:\w \w{3} ){2}💩{2}\.$

$ grex -Dr <INPUT>
^\D{6}36\D{5}٣\D{8}$

$ grex -rS <INPUT>
^\S \S(?:\S{2} ){2}\S{3} \S \S{3} \S{3}$

$ grex -rW <INPUT>
^I\W{5}36\Wand\W٣\Wand\W{4}$

$ grex -drsw <INPUT>
^\w\s♥{3}\s\d(?:\d\s\w{3}\s){2}💩{2}\.$

$ grex -drswW <INPUT>
^\w\s\W{3}\s\d(?:\d\s\w{3}\s){2}\W{3}$
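The shorthand conversions shown above (-d, -s, -w) work character by character, with digits taking precedence over word characters (see the -dsw output). The following hypothetical sketch approximates them; to_shorthand and its boolean parameters are invented for illustration and are not grex's code. Rust's char::is_numeric and char::is_alphanumeric only approximate the Unicode definitions of \d and \w used by the regex crate (is_numeric also accepts characters like ½, and is_alphanumeric rejects the underscore). Unconverted metacharacters are escaped with a backslash, using roughly the set the regex crate treats as meta characters.

```rust
/// Hypothetical sketch: rewrites one test case using shorthand classes,
/// roughly mimicking the -d, -s and -w flags. Precedence: digit, space, word.
fn to_shorthand(input: &str, digits: bool, spaces: bool, words: bool) -> String {
    // Approximately the characters the regex crate treats as meta characters.
    const META: &str = r"\.+*?()|[]{}^$#&-~";
    let mut out = String::from("^");
    for c in input.chars() {
        if digits && c.is_numeric() {
            // Approximation: is_numeric is wider than Unicode decimal digits.
            out.push_str(r"\d");
        } else if spaces && c.is_whitespace() {
            out.push_str(r"\s");
        } else if words && c.is_alphanumeric() {
            // Approximation: \w also includes '_' and combining marks.
            out.push_str(r"\w");
        } else {
            if META.contains(c) {
                out.push('\\');
            }
            out.push(c);
        }
    }
    out.push('$');
    out
}

fn main() {
    let input = "I ♥♥♥ 36 and ٣ and 💩💩.";
    assert_eq!(to_shorthand(input, true, false, false), r"^I ♥♥♥ \d\d and \d and 💩💩\.$");
    assert_eq!(to_shorthand(input, false, true, false), r"^I\s♥♥♥\s36\sand\s٣\sand\s💩💩\.$");
    assert_eq!(to_shorthand(input, false, false, true), r"^\w ♥♥♥ \w\w \w\w\w \w \w\w\w 💩💩\.$");
    assert_eq!(to_shorthand(input, true, true, true), r"^\w\s♥♥♥\s\d\d\s\w\w\w\s\d\s\w\w\w\s💩💩\.$");
    println!("{}", to_shorthand(input, true, true, true));
}
```

For this particular input the approximations give the same results as the -d, -s, -w and -dsw examples; repetition detection and grouping are not modelled.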

6. How to build?

In order to build the source code yourself, you need the stable Rust toolchain installed on your machine so that cargo, the Rust package manager, is available.

git clone https://github.com/pemistahl/grex.git
cd grex
cargo build

The source code is accompanied by an extensive test suite consisting of unit tests, integration tests and property tests. For running the unit and integration tests, simply say:

cargo test

Property tests are disabled by default with the #[ignore] annotation because they are very long-running. They are used for automatically generating test cases for regular expression conversion. If a test case is found that produces a wrong conversion, it is shrunk to the shortest possible test case that still produces a wrong result. This is a very useful tool for finding bugs. If you want to run these tests, say:

cargo test -- --ignored
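The shrinking step described above can be illustrated with a simplified greedy strategy: repeatedly delete one character while the test still fails. This is a hypothetical sketch using only the standard library; real property-testing frameworks use more elaborate shrinking strategies, and the fails predicate here is a stand-in for "the generated regex does not behave as expected for this input".

```rust
/// Hypothetical, simplified shrinker: greedily deletes single characters from a
/// failing input as long as the smaller input still fails.
fn shrink<F: Fn(&str) -> bool>(input: &str, fails: F) -> String {
    let mut current: Vec<char> = input.chars().collect();
    let mut changed = true;
    while changed {
        changed = false;
        for i in 0..current.len() {
            let mut candidate = current.clone();
            candidate.remove(i);
            let candidate_str: String = candidate.iter().collect();
            if fails(&candidate_str) {
                // Keep the smaller failing input and start over.
                current = candidate;
                changed = true;
                break;
            }
        }
    }
    current.into_iter().collect()
}

fn main() {
    // Pretend the bug shows up whenever an input contains both 'x' and 'y'.
    let minimal = shrink("abxcyd", |s| s.contains('x') && s.contains('y'));
    assert_eq!(minimal, "xy");
    println!("minimal failing input: {}", minimal);
}
```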

7. How does it work?

  1. A deterministic finite automaton (DFA) is created from the input strings.

  2. The number of states and transitions between states in the DFA is reduced by applying Hopcroft's DFA minimization algorithm.

  3. The minimized DFA is expressed as a system of linear equations which are solved with Brzozowski's algebraic method, resulting in the final regular expression.
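To make steps 1 and 2 more concrete, here is a minimal, hypothetical sketch using only the Rust standard library; it is not grex's implementation. It builds a prefix-tree DFA from the input strings and then counts the states of the equivalent minimal DFA. Note one substitution: instead of Hopcroft's algorithm it uses Moore-style partition refinement, which is simpler to write down but slower in the worst case; both arrive at the same minimal automaton. Step 3 (Brzozowski's algebraic method) is not shown.

```rust
use std::collections::{BTreeMap, HashMap};

/// A partial DFA: state 0 is the start state, and a missing transition means reject.
struct Dfa {
    transitions: Vec<BTreeMap<char, usize>>,
    accepting: Vec<bool>,
}

impl Dfa {
    /// Step 1 (sketch): a prefix-tree DFA with one path per input string.
    fn from_strings(inputs: &[&str]) -> Self {
        let mut dfa = Dfa { transitions: vec![BTreeMap::new()], accepting: vec![false] };
        for input in inputs {
            let mut state = 0;
            for c in input.chars() {
                let existing = dfa.transitions[state].get(&c).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        let next = dfa.transitions.len();
                        dfa.transitions.push(BTreeMap::new());
                        dfa.accepting.push(false);
                        dfa.transitions[state].insert(c, next);
                        next
                    }
                };
            }
            dfa.accepting[state] = true;
        }
        dfa
    }

    fn accepts(&self, input: &str) -> bool {
        let mut state = 0;
        for c in input.chars() {
            match self.transitions[state].get(&c) {
                Some(&next) => state = next,
                None => return false,
            }
        }
        self.accepting[state]
    }

    /// Step 2 (sketch): number of states of the minimal DFA via Moore-style
    /// partition refinement. Treating a missing transition as its own target is
    /// valid here because every prefix-tree state can still reach an accepting state.
    fn minimized_state_count(&self) -> usize {
        // Initial blocks: non-accepting (0) and accepting (1).
        let mut block: Vec<usize> = self.accepting.iter().map(|&a| usize::from(a)).collect();
        let mut block_count = 0;
        loop {
            // Signature: current block plus the blocks of all transition targets.
            let mut ids: HashMap<(usize, Vec<(char, usize)>), usize> = HashMap::new();
            let refined: Vec<usize> = (0..self.transitions.len())
                .map(|s| {
                    let targets: Vec<(char, usize)> =
                        self.transitions[s].iter().map(|(&c, &t)| (c, block[t])).collect();
                    let next_id = ids.len();
                    *ids.entry((block[s], targets)).or_insert(next_id)
                })
                .collect();
            let refined_count = ids.len();
            block = refined;
            // No block was split in this round, so the partition is stable.
            if refined_count == block_count {
                return block_count;
            }
            block_count = refined_count;
        }
    }
}

fn main() {
    let dfa = Dfa::from_strings(&["ab", "cb"]);
    assert!(dfa.accepts("ab") && dfa.accepts("cb"));
    assert!(!dfa.accepts("a") && !dfa.accepts("bb"));
    // The prefix tree has 5 states; the states after 'a' and 'c' are equivalent,
    // and so are the two accepting leaves, leaving 3.
    assert_eq!(dfa.transitions.len(), 5);
    assert_eq!(dfa.minimized_state_count(), 3);
}
```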

8. Do you want to contribute?

If you want to contribute something to grex even though it is at a very early stage of development, I encourage you to do so. Do you have ideas for cool features? Or have you found any bugs so far? Feel free to open an issue or send a pull request. It's very much appreciated. :-)

Owner
Peter M. Stahl
Computational linguist, Rust enthusiast, green IT advocate
Comments
  • Add option to exclude test cases

    This adds a new option --file-negative, which points to a file containing a list of negative test cases. The resulting regex is guaranteed not to match any of these test cases. This fixes #16.


    To support negation, a second DFA is built of the negative cases, and then subtracted from the positive case DFA, using the standard DFA combination algorithm. To limit the number of nodes generated, combinations of nodes in the two DFAs are visited in depth-first order. Nodes that only occur in the negative match DFA are not visited.

    Because the repetition feature can produce grapheme transitions in the DFA that are variable length, code is added to calculate the overlap of two grapheme ranges.

    The generated graphs can contain 'dead ends', so some code is added to remove those. Some bug fixes for previously unhit corner cases in the recreate_graph function were also necessary. Also, find_next_state now uses the new grapheme overlap function, to prevent it from sometimes creating multiple conflicting edges out of a node.

    As part of this, a bug was fixed that previously caused blank lines in the input to be ignored in the final regex, because the "initial" state could never be considered an accept state.

    I got rid of final_state_indices and moved that information into the node label. I also added descriptive labels to nodes to aid debugging.

    Adds appropriate tests. All pass. Ran through cargo fmt and cargo clippy.

    I haven't written much rust before so please let me know if there are any issues.
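The subtraction described in this PR can be illustrated with a product construction. The following is a minimal, hypothetical sketch using only the standard library, not the code from the PR: it explores pairs of states depth-first, following only transitions of the positive DFA (so states that exist only in the negative DFA are never visited), and marks a pair accepting when the positive state accepts and the negative state does not. Dead ends are not pruned here.

```rust
use std::collections::{BTreeMap, HashMap};

/// A partial DFA: state 0 is the start state; missing transitions reject.
struct Dfa {
    transitions: Vec<BTreeMap<char, usize>>,
    accepting: Vec<bool>,
}

impl Dfa {
    fn empty() -> Self {
        Dfa { transitions: vec![BTreeMap::new()], accepting: vec![false] }
    }

    fn add_state(&mut self) -> usize {
        self.transitions.push(BTreeMap::new());
        self.accepting.push(false);
        self.transitions.len() - 1
    }

    /// Prefix-tree DFA with one path per input string.
    fn from_strings(inputs: &[&str]) -> Self {
        let mut dfa = Dfa::empty();
        for input in inputs {
            let mut state = 0;
            for c in input.chars() {
                let existing = dfa.transitions[state].get(&c).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        let next = dfa.add_state();
                        dfa.transitions[state].insert(c, next);
                        next
                    }
                };
            }
            dfa.accepting[state] = true;
        }
        dfa
    }

    fn accepts(&self, input: &str) -> bool {
        let mut state = 0;
        for c in input.chars() {
            match self.transitions[state].get(&c) {
                Some(&next) => state = next,
                None => return false,
            }
        }
        self.accepting[state]
    }
}

/// Hypothetical sketch: a DFA accepting L(positive) minus L(negative). A product
/// state pairs a positive state with a negative state, or `None` once the
/// negative DFA has no transition left (it can no longer match).
fn difference(positive: &Dfa, negative: &Dfa) -> Dfa {
    let mut result = Dfa::empty();
    let mut index: HashMap<(usize, Option<usize>), usize> = HashMap::new();
    index.insert((0, Some(0)), 0);
    let mut stack = vec![(0, Some(0))];
    while let Some((p, q)) = stack.pop() {
        let id = index[&(p, q)];
        let negative_accepts = q.map_or(false, |q| negative.accepting[q]);
        result.accepting[id] = positive.accepting[p] && !negative_accepts;
        for (&c, &p_next) in &positive.transitions[p] {
            let q_next = q.and_then(|q| negative.transitions[q].get(&c).copied());
            let key = (p_next, q_next);
            let existing = index.get(&key).copied();
            let next_id = match existing {
                Some(next_id) => next_id,
                None => {
                    let next_id = result.add_state();
                    index.insert(key, next_id);
                    stack.push(key); // LIFO stack gives depth-first order
                    next_id
                }
            };
            result.transitions[id].insert(c, next_id);
        }
    }
    result
}

fn main() {
    let positive = Dfa::from_strings(&["a", "ab", "ac"]);
    let negative = Dfa::from_strings(&["ab"]);
    let diff = difference(&positive, &negative);
    assert!(diff.accepts("a") && diff.accepts("ac"));
    assert!(!diff.accepts("ab") && !diff.accepts("b"));
}
```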

  • Problems to consider when making anchors optional

    It seems grex inherited this bug from regexgen: https://github.com/devongovett/regexgen/issues/31

    Repro:

    $ cat input
    AGBHD
    EIBCD
    EGBCD
    FBJBF
    AGBH
    EIBC
    EGBC
    EBC
    FBC
    CD
    F
    C
    ABCD
    EBCD
    FBCD
    
    $ # note the last entry to be matched, i.e. "FBCD"
    
    $ grex --file input
    ^(?:F(?:BJBF)?|(?:E(?:[GI])?BC|(?:FB)?C)D?|A(?:GBHD?|BCD))$
    

    After removing ^ and $ (see #30), this generated pattern does not match "FBCD" despite it being one of the input strings:

    'FBCD'.match(/(?:F(?:BJBF)?|(?:E(?:[GI])?BC|(?:FB)?C)D?|A(?:GBHD?|BCD))/g);
    // → ['F', 'CD']
    

    Here’s what I think the bug is: within the generated pattern, it should never happen that something on the left matches a prefix of something that's further on the right, because then the latter can never match.

    See https://github.com/devongovett/regexgen/issues/31#issuecomment-801380409 for some more details.

  • Add optional CLI feature if using grex as a library

    Hi @pemistahl, first off, thanks for this wonderful library!

    But would it be possible to have an optional CLI feature in Cargo.toml?

    In that way, if I'm using grex as a library, I don't need to get dependencies like structopt included in my project.

  • Grex hash for v1.2.0 release fails to verify in scoop

    Simple as that:

    λ scoop install grex
    Installing 'grex' (1.2.0) [64bit]
    grex-v1.2.0-x86_64-pc-windows-msvc.zip (792,6 KB) [==========================================================================================================================] 100%
    Checking hash of grex-v1.2.0-x86_64-pc-windows-msvc.zip ... ERROR Hash check failed!
    App:         main/grex
    URL:         https://github.com/pemistahl/grex/releases/download/v1.2.0/grex-v1.2.0-x86_64-pc-windows-msvc.zip
    First bytes: 50 4B 03 04 14 00 00 00
    Expected:    d075efdbccb01c8b093b6c5120d064cc5ead534dec483c1a3d43cc4543d940ea
    Actual:      da9c50a4e19cbf7b1c4a001a9252c1a097b8eebbb9ec0bbf3f88bc79030e7d73
    

    I'm creating this issue as this may be an overlooked thing. Another possibility is that I'm facing a MITM attack, which would be the worst-case scenario ;)

  • Make anchors "^" and "$" optional

    Additional options:
    -B, --match-beginning - Match the beginning of the string (prepend ^)
    -E, --match-end - Match the end of the string (append $)
    -X, --match-line - Match the whole string (as a shorthand for -B -E)

    It's the result of the discussion in the issue pemistahl/grex#30. Sorry if some of my modifications look silly. It's my attempt to understand Rust from scratch.

  • Overly complex regex with input containing several common parts

    While building a regex for the various possible formats of Creative Commons' Public Domain Mark (to assist in https://github.com/spdx/license-list-XML/issues/988), I noticed that grex produces a more complex regex than what the input requires.

    Here's what I provided:

    grex \
      "This work is free of known copyright restrictions." \
      "This work (WWW) is free of known copyright restrictions." \
      "This work (by AAA) is free of known copyright restrictions." \
      "This work, identified by CCC, is free of known copyright restrictions." \
      "This work (WWW, by AAA) is free of known copyright restrictions." \
      "This work (WWW), identified by CCC, is free of known copyright restrictions." \
      "This work (WWW, by AAA), identified by CCC, is free of known copyright restrictions." \
      "This work (by AAA), identified by CCC, is free of known copyright restrictions."
    

    The result was (after manually making groups non-capturing):

    ^This work(?:(?: \((?:(?:WWW, b|b)y AAA|WWW)\),|,) identified by CCC, |(?: \((?:(?:WWW, b|b)y AAA|WWW)\) | ))is free of known copyright restrictions\.$
    

    Visualized as a Debuggex diagram:

    [Screenshot of the Debuggex diagram]

    A regex produced by hand to match the same input shows that this could be simplified:

    ^This work(?: \((?:WWW(?:, by AAA)?|by AAA)\))?(?:, identified by CCC,)? is free of known copyright restrictions\.$
    

    Debuggex diagram:

    [Screenshot of the Debuggex diagram]

  • Add feature for disabling capturing groups

    grex produces regular expressions with capturing groups by default. Some users might prefer to create regexes with non-capturing groups instead, so I will add a new library method and a new command-line flag for handling this use case.

  • Optional anchors "^" and "$"

    Added options to suppress anchors:
    -B, --no-match-beginning - Match the beginning of the string (prepend ^)
    -E, --no-match-end - Match the end of the string (append $)
    -X, --no-match-line - Match the whole string (as a shorthand for -B -E)

    This PR is intended to close the issue pemistahl/grex#30 and my previous GH-39 as this one covers the requirements to keep anchors by default.

  • Couldn't compile ndarray

    Hey, I'm on Debian bullseye and cargo install grex won't succeed:

    [lots of errors]
    error: aborting due to 204 previous errors
    
    For more information about this error, try `rustc --explain E0277`.
    error: could not compile `ndarray`.
    

    ndarray version: 0.15.1
    cargo version: 1.47

    Has anyone experienced the same issue?

  • Installation problem

    When installing grex on Debian Linux, I get 365 syntax errors. They seem to be many repetitions of:

    the trait data_traits::RawDataSubst<u128> is not implemented for <S as data_traits::DataOwned>::MaybeUninit
        |
    348 | impl_scalar_lhs_op!(Complex, Ordered, /, Div, div, "division");
        | -------------------------------------------------------------------- in this macro invocation
        |
       ::: /home/greg/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.0/src/data_traits.rs:411:1
        |
    411 | pub unsafe trait DataOwned: Data {
        | -------------------------------- required by data_traits::DataOwned

    I seem to get the same blast whether I run cargo from the command line or in vscode. I installed with:

    $ git clone https://github.com/pemistahl/grex.git
    $ cd grex
    $ cargo build

    and

    $ cargo install grex

    Since I don't see any other complaints, perhaps my distribution is to blame:

    $ uname -a
    Linux debian-dell-desktop 5.10.0-4-amd64 #1 SMP Debian 5.10.19-1 (2021-03-02) x86_64 GNU/Linux

    Creating an empty project with grex as the only dependency also fails. Version 1.1 of grex seems to run fine. I've only been programming Rust for about a year, so I haven't gotten to writing macros yet, but I'll try to dig deeper.

    -- Greg

  • Inserting a character breaks repetition detection (sometimes)

    I have been looking for a way to find repeated substrings. I think I can parse grex results to find repetitions, and given that my strings are rather short, I could then compare group contents to find non-contiguous repetitions.

    I did some quick tests and I may have chanced upon a problem:

    • grex -dsr -c 'heeelooo world lalala lalala foo foo xalxalxal xalxalxal'

      gives ^he{3}lo{3}\sworld(?:\s(?:la){3}){2}(?:\sfo{2}){2}(?:\s(?:xal){3}){2}$

    • grex -dsr -c 'heeelooo world lalala lalala foo foo xalxalxal i xalxalxal'

      gives ^he{3}lo{3}\sworld\s(?:(?:la){3}\s){2}(?:fo{2}\s){2}(?:xal){3}\si\s(?:xal){3}$

    • grex -dsr -c 'heeelooo world lalala k lalala foo foo xalxalxal i xalxalxal'

      gives ^he{3}lo{3}\sworld\slalala\sk\slalala\s(?:fo{2}\s){2}(?:xal){3}\si\s(?:xal){3}$

    In the last probe, neither of the two lalala was detected as repetitious when a k was inserted, although xalxalxal was treated as expected. Any thoughts?

  • Treat diffs as separate groups

    For example:

    
    <iframe src="//player.bilibili.com/player.html?aid=303065226&bvid=BV1dP411n7bc&cid=833485551&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
    <iframe src="//player.bilibili.com/player.html?aid=261233537&bvid=BV1xe411j7EQ&cid=851171461&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
    <iframe src="//player.bilibili.com/player.html?aid=558528772&bvid=BV1Ee4y1r7wX&cid=848823074&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
    
    <iframe src="//player.bilibili.com/player.html?aid=455751094&bvid=BV1U5411s7RU&cid=383073940&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
    

    diff:

    aid=303065226&bvid=BV1dP411n7bc&cid=833485551
    aid=261233537&bvid=BV1xe411j7EQ&cid=851171461
    aid=558528772&bvid=BV1Ee4y1r7wX&cid=848823074
    aid=455751094&bvid=BV1U5411s7RU&cid=383073940
    

    regex:

    aid=([0-9]+)&bvid=([0-9a-zA-Z]+)&cid=([0-9]+)
    

    Current output of grex -f grex.txt -g:

    <iframe src="//player\.bilibili\.com/player\.html\?aid=(((261233537&bvid=BV1xe411j7EQ&cid=85117146|303065226&bvid=BV1dP411n7bc&cid=83348555)1|455751094&bvid=BV1U5411s7RU&cid=383073940)|558528772&bvid=BV1Ee4y1r7wX&cid=848823074)&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>$
    suliveevil@swy-M1 ~ % grex -f grex.txt -r
    ^<iframe src="/{2}player\.(?:bili){2}\.com/player\.html\?aid=(?:(?:45{2}751094&bvid=BV1U541{2}s7RU&cid=383073940|(?:26123{2}537&bvid=BV1xe41{2}j7EQ&cid=851{2}7146|(?:30){2}652{2}6&bvid=BV1dP41{2}n7bc&cid=83{2}485{3})1)|5{2}85287{2}2&bvid=BV1Ee4y1r7wX&cid=848{2}23074)&page=1" scrol{2}ing="no" border="0" frameborder="no" framespacing="0" al{2}owful{2}scre{2}n="true"> </iframe>
    
    [Screenshot of the current output]

    expected output:

    <iframe src="//player\.bilibili\.com/player\.html\?aid=([0-9]+)&bvid=([0-9a-zA-Z]+)&cid=([0-9]+)&page=1" scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
    
    [Screenshot of the expected output]

    https://github.com/pemistahl/grex/issues/48

  • Allow to specify characters that have to be converted to character class

    First of all, thank you for this great tool. When using it, I often need to convert text into more specific character classes, not just non-digits or non-whitespace characters. Is it possible to customize the range of characters to be converted into character classes, like [a-e\d], [①-⑨⒈-⒙], or specific languages such as Chinese and Japanese? For example, if the source text is 我的名字是Tom, I would like to get the regular expression [\u{4e00}-\u{9fa5}]{5}\w{3} instead of \w{8} by specifying the character class [\u{4e00}-\u{9fa5}]. I would also like to specify the maximum and minimum length of repeated substrings. Sometimes I get results like (\w{5}|\w{7,8}|\w{10,17}), but the regular expression I expected is (\w{3,20}). So I hope to be able to specify the minimum and maximum number of repetitions of a substring, or to merge the repetition counts into one interval instead of multiple branches. I think these two points could be specified together, using a format similar to \w{3,20} to specify characters that must be converted into character classes.

  • Provide more installation options

    Judging by the description, I think this is a great tool. But I use Ubuntu and I don't have Homebrew (and I don't want to install it only for this tool), so I cannot try this tool :( I think a good idea would be a wget one-liner install script, a step-by-step description, an apt package, or anything else a typical Ubuntu laptop can handle without installing another package manager.

  • Allow to provide test cases that must not be matched

    Currently, only test cases that must be matched by the generated regular expression can be provided. It would be useful to additionally provide test cases that must not be matched by the generated expression. In combination with shorthand character classes this would allow for more specific and versatile regular expressions.
