subquery.md 7.7 KB

Subquery templates

DEFINE SUBQUERY {#define-subquery}

DEFINE SUBQUERY lets you declare a subquery template that is a parameterizable block of several top-level expressions (statements), and then use it repeatedly in the FROM section of SELECT statements or as input data in PROCESS/REDUCE with parameters. As opposed to actions, the subquery template must end with the SELECT/PROCESS/REDUCE statement whose result is the subquery's return value. Keep in mind that you can't use the top-level SELECT/PROCESS/REDUCE statement and the modifying expressions (for example, INSERT) more than once.

After DEFINE SUBQUERY, specify:

  1. A named expression that will be used to access the declared template further in the query.
  2. The round brackets contain a list of named expressions you can use to access parameters inside the subquery template.
  3. AS keyword.
  4. The list of top-level expressions.
  5. END DEFINE acts as a marker of the last expression inside the subquery template.

One or more of the last subquery parameters can be marked with a question as optional: unless they haven't been specified when calling subquery, they will be assigned the NULL value.

{% note info %}

In large queries, you can use separate files for action definition and include them to the main query using EXPORT + IMPORT so that instead of one long text you can have several logical parts that are easier to navigate. An important nuance: the USE my_cluster; directive in the import query doesn't affect behavior of actions declared in other subquery files.

{% endnote %}

Even if the list of parameters in the subquery template definition is empty, when using it in FROM, specify the parentheses (). This may help to limit the scope of named expressions used in only one subquery.

In some cases, instead of DEFINE SUBQUERY it's more convenient to use an equivalent lambda function. In this case, the lambda function must accept, as the first argument, the special object called world that passes dependencies to make certain PRAGMA or COMMIT statements visible at the query template's point of use. Also, make sure to pass this object as the first argument along with the other arguments (if any) to other query templates, if you use them in your lambda function. The return value of the lambda function must have the structure list type (output table) or a list of variants over a tuple of structures (multiple output tables). In the latter case, the following unpacking is usually used at the query template's point of use:

$out1, $out2 = PROCESS $mySubquery($myParam1, $myParam2);
-- next we use $out1 and $out2 as separate tables.

Examples

DEFINE SUBQUERY $hello_world($name, $suffix?) AS
    $name = $name ?? ($suffix ?? "world");
    SELECT "Hello, " || $name || "!";
END DEFINE;

SELECT * FROM $hello_world(NULL); -- Hello, world!
SELECT * FROM $hello_world("John"); -- Hello, John!
SELECT * FROM $hello_world(NULL, "Earth"); -- Hello, Earth!
DEFINE SUBQUERY $dup($x) AS
   SELECT * FROM $x(1) -- apply the passed query template with one argument
   UNION ALL
   SELECT * FROM $x(2); -- ... and with other argument
END DEFINE;

DEFINE SUBQUERY $sub($n) AS
   SELECT $n * 10;
END DEFINE;

SELECT * FROM $dup($sub); -- pass the query template $sub as a parameter
-- Result:
-- 10
-- 20
/* Hide the named expressions $a and $b inside a separate scope */
DEFINE SUBQUERY $clean() AS
   $a = 10;
   $b = $a * $a;
   SELECT $a AS a, $b AS b;
END DEFINE;

SELECT * FROM $clean(); -- a: 10, b: 100
USE my_cluster;

DEFINE SUBQUERY $input() as
    SELECT * FROM `home/yql/tutorial/users`;
END DEFINE;

DEFINE SUBQUERY $myProcess1($nestedQuery, $lambda) AS
    PROCESS $nestedQuery() -- the parentheses () are mandatory here
    USING $lambda(TableRow());
END DEFINE;

$myProcess2 = ($world, $nestedQuery, $lambda) -> {
    -- If you use ListFlatMap or YQL::OrderedFlatMap, you get an Ordered YT Map operation
    return YQL::FlatMap($nestedQuery($world), $lambda);
};

-- With such use, the implementations of $myProcess1 and $myProcess2 are identical
SELECT * FROM $myProcess1($input, ($x) -> { RETURN AsList($x, $x) });
SELECT * FROM $myProcess2($input, ($x) -> { RETURN AsList($x, $x) });
USE my_cluster;

DEFINE SUBQUERY $runPartition($table) AS
    $paritionByAge = ($row) -> {
        $recordType = TypeOf($row);
        $varType = VariantType(TupleType($recordType, $recordType));
        RETURN If($row.age % 2 == 0,
            Variant($row, "0", $varType),
            Variant($row, "1", $varType),
        );
    };

    PROCESS $table USING $paritionByAge(TableRow());
END DEFINE;

-- Unpacking two results
$i, $j = (PROCESS $runPartition("home/yql/tutorial/users"));

SELECT * FROM $i;

SELECT * FROM $j;

Combining the subquery templates: SubqueryExtend, SubqueryUnionAll, SubqueryMerge, SubqueryUnionMerge {#subquery-extend} {#subquery-unionall} {#subquery-merge} {#subquery-unionmerge}

These functions combine the results of one or more subquery templates passed by arguments. The number of parameters in such subquery templates must be the same.

  • SubqueryExtend requires matching of subquery schemas.
  • SubqueryUnionAll follows the same rules as ListUnionAll.
  • SubqueryMerge uses the same constraints as SubqueryExtend and also outputs a sorted result if all subqueries have the same sort order.
  • SubqueryUnionMerge uses the same constraints as SubqueryUnionAll and also outputs a sorted result if all subqueries have the same sort order.

Examples

DEFINE SUBQUERY $sub1() as
    SELECT 1 as x;
END DEFINE;

DEFINE SUBQUERY $sub2() as
    SELECT 2 as x;
END DEFINE;

$s = SubqueryExtend($sub1,$sub2);
PROCESS $s();

Combining subquery templates after substituting list items: SubqueryExtendFor, SubqueryUnionAllFor, SubqueryMergeFor, SubqueryUnionMergeFor {#subquery-extend-for} {#subquery-unionall-for} {#subquery-merge-for} {#subquery-unionmerge-for}

The functions take the following arguments:

  • A non-empty list of values.
  • A subquery template that must have exactly one parameter.

They substitute each item from the list into the subquery template as a parameter and then combine the obtained subqueries.

  • SubqueryExtendFor requires matching of subquery schemas.
  • SubqueryUnionAllFor follows the same rules as ListUnionAll.
  • SubqueryMergeFor uses the same constraints as SubqueryExtendFor and also outputs a sorted result if all subqueries have the same sort order.
  • SubqueryUnionMergeFor uses the same constraints as SubqueryUnionAllFor and also outputs a sorted result if all subqueries have the same sort order.

Examples

DEFINE SUBQUERY $sub($i) as
    SELECT $i as x;
END DEFINE;

$s = SubqueryExtendFor([1,2,3],$sub);
PROCESS $s();

Adding sorting to the SubqueryOrderBy template or indicating the presence of this SubqueryAssumeOrderBy

The functions take the following arguments:

  • A subquery template without parameters.
  • A list of pairs (string indicating the column name and Boolean value: True for sorting in ascending order or False for sorting in descending order).

And they build a new query template without parameters where sorting is performed or a comment on the use of sorting is added to the result. To use the resulting query template, call the PROCESS function, since, when using a SELECT, sorting is ignored.

Examples

DEFINE SUBQUERY $sub() as
   SELECT * FROM (VALUES (1,'c'), (1,'a'), (3,'b')) AS a(x,y);
end define;

$sub2 = SubqueryOrderBy($sub, [('x',false), ('y',true)]);

PROCESS $sub2();