```yql
Re2::Grep(pattern:String, options:Struct<...>?) -> (string:String?) -> Bool
Re2::Match(pattern:String, options:Struct<...>?) -> (string:String?) -> Bool
Re2::Capture(pattern:String, options:Struct<...>?) -> (string:String?) -> Struct<_1:String?,foo:String?,...>
Re2::FindAndConsume(pattern:String, options:Struct<...>?) -> (string:String?) -> List<String>
Re2::Replace(pattern:String, options:Struct<...>?) -> (string:String?, replacement:String) -> String?
Re2::Count(pattern:String, options:Struct<...>?) -> (string:String?) -> Uint32
Re2::Options([CaseSensitive:Bool?,DotNl:Bool?,Literal:Bool?,LogErrors:Bool?,LongestMatch:Bool?,MaxMem:Uint64?,NeverCapture:Bool?,NeverNl:Bool?,OneLine:Bool?,PerlClasses:Bool?,PosixSyntax:Bool?,Utf8:Bool?,WordBoundary:Bool?]) -> Struct<CaseSensitive:Bool,DotNl:Bool,Literal:Bool,LogErrors:Bool,LongestMatch:Bool,MaxMem:Uint64,NeverCapture:Bool,NeverNl:Bool,OneLine:Bool,PerlClasses:Bool,PosixSyntax:Bool,Utf8:Bool,WordBoundary:Bool>
```
The Re2 module provides regular expressions based on google::RE2, which supports a wide range of features (see the official documentation).
By default, UTF-8 mode is enabled automatically if the regular expression is a valid UTF-8-encoded string but not a valid ASCII string. To control the re2 library settings manually, pass the result of the Re2::Options function as the second argument to the other module functions, after the regular expression.
{% note warning %}
Double every backslash in a regular expression written as a quoted string: standard string literals in SQL are treated as C-escaped strings. Alternatively, write the regular expression as a raw string, @@regexp@@: backslashes do not need to be doubled in that case.
{% endnote %}
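As a minimal sketch of the two spellings (the pattern and the input value are illustrative only), the following two definitions should describe the same regular expression:

```yql
-- Quoted string: every backslash is doubled.
$escaped = Re2::Match("\\d+-\\d+");
-- Raw string: backslashes are written once.
$raw = Re2::Match(@@\d+-\d+@@);

SELECT
    $escaped("2024-05") AS escaped_match, -- true
    $raw("2024-05") AS raw_match;         -- true
```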
```yql
$value = "xaaxaaxaa";
$options = Re2::Options(false AS CaseSensitive);
$match = Re2::Match("[ax]+\\d");
$grep = Re2::Grep("a.*");
$capture = Re2::Capture(".*(?P<foo>xa?)(a{2,}).*");
$replace = Re2::Replace("x(a+)x");
$count = Re2::Count("a", $options);

SELECT
    $match($value) AS match, -- false
    $grep($value) AS grep, -- true
    $capture($value) AS capture, -- (_0: 'xaaxaaxaa', _1: 'aa', foo: 'x')
    $capture($value)._1 AS capture_member, -- "aa"
    $replace($value, "b\\1z") AS replace, -- "baazaaxaa"
    $count($value) AS count; -- 6
```
Apart from implementation details and regular expression syntax, Re2::Grep and Re2::Match behave the same as the same-name functions in the Pire module. All else being equal and with no specific preferences, we recommend using Pire::Grep or Pire::Match.
You can call the Re2::Grep function by using a REGEXP expression (see the basic expression syntax).
For example, the following two queries are equivalent (also in terms of computing efficiency):

```yql
-- Query 1: explicit Re2::Grep call
$grep = Re2::Grep("b+");
SELECT $grep("aaabccc");

-- Query 2: REGEXP expression
SELECT "aaabccc" REGEXP "b+";
```
Unlike Pire::Capture, Re2::Capture supports multiple and named capturing groups.
The result is a structure whose fields are of the type String?: named capturing groups produce fields with the group's name, unnamed groups produce fields named _1, _2, etc., and the _0 field contains the entire substring matching the regular expression.

For more information about working with structures in YQL, see the section on containers.
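A short sketch of reading individual fields from the result, assuming a hypothetical version-like input; the group names major and minor are chosen only for illustration:

```yql
$capture = Re2::Capture("(?P<major>\\d+)\\.(?P<minor>\\d+)");
$parsed = $capture("2.17");

SELECT
    $parsed._0 AS whole,    -- "2.17"
    $parsed.major AS major, -- "2"
    $parsed.minor AS minor; -- "17"
```

Only named groups are used here, so every field other than _0 carries the name given in the pattern.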
Re2::FindAndConsume searches for all occurrences of the regular expression in the passed text and returns a list of the values captured by the parenthesized part of the regular expression, one per occurrence.
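For instance, a minimal sketch that reuses the input value from the example above (the pattern itself is illustrative):

```yql
$find = Re2::FindAndConsume("x(a+)");

SELECT $find("xaaxaaxaa") AS found; -- ["aa", "aa", "aa"]
```

Each list item holds the text captured by the parentheses, not the whole occurrence, so the leading x is not part of the returned values.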
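Re2::Replace works as follows: all non-overlapping substrings of the input string (the first argument) that match the regular expression are replaced by the replacement string (the second argument). The replacement string can refer to the contents of capturing groups through back-references of the form \\1, \\2, etc. The \\0 back-reference stands for the whole substring that matches the regular expression.

Re2::Count returns the number of non-overlapping substrings of the input string that match the regular expression.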
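A small sketch of the \\0 back-reference in a replacement and of counting matches, using an illustrative pattern and the same input value as the earlier example:

```yql
$replace = Re2::Replace("a+");
$count = Re2::Count("a+");

SELECT
    $replace("xaaxaaxaa", "[\\0]") AS replaced, -- "x[aa]x[aa]x[aa]"
    $count("xaaxaaxaa") AS match_count;         -- 3
```

The replacement is written as a quoted string, so the backslash is doubled just as in the pattern.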
Notes on Re2::Options from the official repository
Parameter | Default | Comments |
---|---|---|
CaseSensitive:Bool? | true | match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode) |
DotNl:Bool? | false | let . match \n |
Literal:Bool? | false | interpret string as literal, not regexp |
LogErrors:Bool? | true | log syntax and execution errors to ERROR |
LongestMatch:Bool? | false | search for longest match, not first match |
MaxMem:Uint64? | - | (see below) approx. max memory footprint of RE2 |
NeverCapture:Bool? | false | parse all parens as non-capturing |
NeverNl:Bool? | false | never match \n, even if it is in regexp |
PosixSyntax:Bool? | false | restrict regexps to POSIX egrep syntax |
Utf8:Bool? | true | text and pattern are UTF-8; otherwise Latin-1 |
The following options are only consulted when PosixSyntax == true. When PosixSyntax == false, these features are always enabled and cannot be turned off; to perform multi-line matching in that case, begin the regexp with (?m). | ||
PerlClasses:Bool? | false | allow Perl's \d \s \w \D \S \W |
WordBoundary:Bool? | false | allow Perl's \b \B (word boundary and not) |
OneLine:Bool? | false | ^ and $ only match beginning and end of text |
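As a hedged sketch of how individual options from the table might be passed, the example below sets CaseSensitive and Literal with the same named-argument form as the earlier example. The patterns and inputs are illustrative, and the expected results follow from the option descriptions above rather than from a recorded run:

```yql
$ci = Re2::Options(false AS CaseSensitive);
$literal = Re2::Options(true AS Literal);

$match_ci = Re2::Match("abc", $ci);
$grep_literal = Re2::Grep("a.b", $literal);

SELECT
    $match_ci("ABC") AS case_insensitive, -- true
    $grep_literal("a.b") AS literal_hit,  -- true
    $grep_literal("axb") AS literal_miss; -- false
```

With Literal enabled, the dot in the pattern is compared as an ordinary character, which is why the second input is not expected to match.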