DEFINE SUBQUERY
lets you declare a subquery template that is a parameterizable block of several top-level expressions (statements), and then use it repeatedly in the FROM
section of SELECT statements or as input data in PROCESS/REDUCE with parameters.
As opposed to actions, the subquery template must end with the SELECT
/PROCESS
/REDUCE
statement whose result is the subquery's return value. Keep in mind that you can't use the top-level SELECT
/PROCESS
/REDUCE
statement and the modifying expressions (for example, INSERT
) more than once.
After DEFINE SUBQUERY
, specify:
AS
keyword.END DEFINE
acts as a marker of the last expression inside the subquery template.One or more of the last subquery parameters can be marked with a question as optional: unless they haven't been specified when calling subquery, they will be assigned the NULL
value.
{% note info %}
In large queries, you can use separate files for action definition and include them to the main query using EXPORT + IMPORT so that instead of one long text you can have several logical parts that are easier to navigate. An important nuance: the USE my_cluster;
directive in the import query doesn't affect behavior of actions declared in other subquery files.
{% endnote %}
Even if the list of parameters in the subquery template definition is empty, when using it in FROM
, specify the parentheses ()
. This may help to limit the scope of named expressions used in only one subquery.
In some cases, instead of DEFINE SUBQUERY
it's more convenient to use an equivalent lambda function.
In this case, the lambda function must accept, as the first argument, the special object called world
that passes dependencies to make certain PRAGMA or COMMIT statements visible at the query template's point of use. Also, make sure to pass this object as the first argument along with the other arguments (if any) to other query templates, if you use them in your lambda function.
The return value of the lambda function must have the structure list type (output table) or a list of variants over a tuple of structures (multiple output tables). In the latter case, the following unpacking is usually used at the query template's point of use:
$out1, $out2 = PROCESS $mySubquery($myParam1, $myParam2);
-- next we use $out1 and $out2 as separate tables.
DEFINE SUBQUERY $hello_world($name, $suffix?) AS
$name = $name ?? ($suffix ?? "world");
SELECT "Hello, " || $name || "!";
END DEFINE;
SELECT * FROM $hello_world(NULL); -- Hello, world!
SELECT * FROM $hello_world("John"); -- Hello, John!
SELECT * FROM $hello_world(NULL, "Earth"); -- Hello, Earth!
DEFINE SUBQUERY $dup($x) AS
SELECT * FROM $x(1) -- apply the passed query template with one argument
UNION ALL
SELECT * FROM $x(2); -- ... and with other argument
END DEFINE;
DEFINE SUBQUERY $sub($n) AS
SELECT $n * 10;
END DEFINE;
SELECT * FROM $dup($sub); -- pass the query template $sub as a parameter
-- Result:
-- 10
-- 20
/* Hide the named expressions $a and $b inside a separate scope */
DEFINE SUBQUERY $clean() AS
$a = 10;
$b = $a * $a;
SELECT $a AS a, $b AS b;
END DEFINE;
SELECT * FROM $clean(); -- a: 10, b: 100
USE my_cluster;
DEFINE SUBQUERY $input() as
SELECT * FROM `home/yql/tutorial/users`;
END DEFINE;
DEFINE SUBQUERY $myProcess1($nestedQuery, $lambda) AS
PROCESS $nestedQuery() -- the parentheses () are mandatory here
USING $lambda(TableRow());
END DEFINE;
$myProcess2 = ($world, $nestedQuery, $lambda) -> {
-- If you use ListFlatMap or YQL::OrderedFlatMap, you get an Ordered YT Map operation
return YQL::FlatMap($nestedQuery($world), $lambda);
};
-- With such use, the implementations of $myProcess1 and $myProcess2 are identical
SELECT * FROM $myProcess1($input, ($x) -> { RETURN AsList($x, $x) });
SELECT * FROM $myProcess2($input, ($x) -> { RETURN AsList($x, $x) });
USE my_cluster;
DEFINE SUBQUERY $runPartition($table) AS
$paritionByAge = ($row) -> {
$recordType = TypeOf($row);
$varType = VariantType(TupleType($recordType, $recordType));
RETURN If($row.age % 2 == 0,
Variant($row, "0", $varType),
Variant($row, "1", $varType),
);
};
PROCESS $table USING $paritionByAge(TableRow());
END DEFINE;
-- Unpacking two results
$i, $j = (PROCESS $runPartition("home/yql/tutorial/users"));
SELECT * FROM $i;
SELECT * FROM $j;
These functions combine the results of one or more subquery templates passed by arguments. The number of parameters in such subquery templates must be the same.
SubqueryExtend
requires matching of subquery schemas.SubqueryUnionAll
follows the same rules as ListUnionAll.SubqueryMerge
uses the same constraints as SubqueryExtend
and also outputs a sorted result if all subqueries have the same sort order.SubqueryUnionMerge
uses the same constraints as SubqueryUnionAll
and also outputs a sorted result if all subqueries have the same sort order.DEFINE SUBQUERY $sub1() as
SELECT 1 as x;
END DEFINE;
DEFINE SUBQUERY $sub2() as
SELECT 2 as x;
END DEFINE;
$s = SubqueryExtend($sub1,$sub2);
PROCESS $s();
The functions take the following arguments:
They substitute each item from the list into the subquery template as a parameter and then combine the obtained subqueries.
SubqueryExtendFor
requires matching of subquery schemas.SubqueryUnionAllFor
follows the same rules as ListUnionAll.SubqueryMergeFor
uses the same constraints as SubqueryExtendFor
and also outputs a sorted result if all subqueries have the same sort order.SubqueryUnionMergeFor
uses the same constraints as SubqueryUnionAllFor
and also outputs a sorted result if all subqueries have the same sort order.DEFINE SUBQUERY $sub($i) as
SELECT $i as x;
END DEFINE;
$s = SubqueryExtendFor([1,2,3],$sub);
PROCESS $s();
The functions take the following arguments:
And they build a new query template without parameters where sorting is performed or a comment on the use of sorting is added to the result. To use the resulting query template, call the PROCESS
function, since, when using a SELECT
, sorting is ignored.
DEFINE SUBQUERY $sub() as
SELECT * FROM (VALUES (1,'c'), (1,'a'), (3,'b')) AS a(x,y);
end define;
$sub2 = SubqueryOrderBy($sub, [('x',false), ('y',true)]);
PROCESS $sub2();