Databricks multilevel query on arbitrary condition - databricks-sql

I have top level data 'whatever' comprising many (thousands) of records. Each record has data of the form
structA
structB
...
structN,
varA
varB
...
varN
each structure may also contain other structures and variables e.g. structAA, structAB... varAA, varAB... etc
I would like to query these data and return records meeting a given criteria, for example
SELECT structA.structBC.structNN FROM whatever WHERE structB.structBN.varNN = "value"
Which would return structNN from each record where varNN equals "value"
Databricks will allow me to do something like
SELECT structA FROM whatever as a WHERE a.varN = "value"
but I cant seem to find out if/how one does more levels between SELECT and FROM or after WHERE. I dont particularly want the variable I select-on to be in the structure I return either, so that further complicates things. I suspect the data is accessed as a table which may be part of the problem i.e. I imagine the records as objects but they are columns and rows in some sense. If I cant do this with an SQL query, what can I use in Databricks to achieve the desired result?

it's pretty simple.
First, let's create a data with underlying structure, for example:
data = [{\
'level1a': {\
'level2a': {\
'a':1,\
'b':2\
}\
}\
,'level1b': {\
'level2b': 'some text'
}\
}]
df = spark.createDataFrame(data)
df.createOrReplaceTempView("data_view")
Now, we can select any element of the structure (or the whole or part of the structure):
%sql
select level1a, level1a.level2a, level1a.level2a.a, level1a.level2a.b, level1b, level1b.level2b from data_view;
Here is the result:
EDIT:
The query with condition:
%sql
select level1a, level1a.level2a, level1a.level2a.a, level1a.level2a.b, level1b, level1b.level2b
from data_view
where level1a.level2a.a = 1;

Related

PostgreSQL: json object where keys are unique array elements and values are the count of times they appear in the array

I have an array of strings, some of which may be repeated. I am trying to build a query which returns a single json object where the keys are the distinct values in the array, and the values are the count of times each value appears in the array.
I have built the following query;
WITH items (item) as (SELECT UNNEST(ARRAY['a','b','c','a','a','a','c']))
SELECT json_object_agg(distinct_values, counts) item_counts
FROM (
SELECT
sub2.distinct_values,
count(items.item) counts
FROM (
SELECT DISTINCT items.item AS distinct_values
FROM items
) sub2
JOIN items ON items.item = sub2.distinct_values
GROUP BY sub2.distinct_values, items.item
) sub1
DbFiddle
Which provides the result I'm looking for: { "a" : 4, "b" : 1, "c" : 2 }
However, it feels like there's probably a better / more elegant / less verbose way of achieving the same thing, so I wondered if any one could point me in the right direction.
For context, I would like to use this as part of a bigger more complex query, but I didn't want to complicate the question with irrelevant details. The array of strings is what one column of the query currently returns, and I would like to convert it into this JSON blob. If it's easier and quicker to do it in code then I can, but I wanted to see if there was an easy way to do it in postgres first.
I think a CTE and json_object_agg() is a little bit of a shortcut to get you there?
WITH counter AS (
SELECT UNNEST(ARRAY['a','b','c','a','a','a','c']) AS item, COUNT(*) AS item_count
GROUP BY 1
ORDER BY 1
)
SELECT json_object_agg(item, item_count) FROM counter
Output:
{"a":4,"b":1,"c":2}

Conditional WHERE clause in KDB?

Full Query:
{[tier;company;ccy; startdate; enddate] select Deal_Time, Deal_Date from DEALONLINE_REMOVED where ?[company = `All; 1b; COMPANY = company], ?[tier = `All;; TIER = tier], Deal_Date within(startdate;enddate), Status = `Completed, ?[ccy = `All;1b;CCY_Pair = ccy]}
Particular Query:
where ?[company = `All; 1b; COMPANY = company], ?[tier = `All; 1b; TIER = tier],
What this query is trying to do is to get the viewstate of a dropdown.
If there dropdown selection is "All", that where clause i.e. company or tier is invalidated, and all companies or tiers are shown.
I am unsure if the query above is correct as I am getting weird charts when displaying them on KDB dashboard.
What I would recommend is to restructure your function to make use of the where clause using functional qSQL.
In your case, you need to be able to filter based on certain input, if its "All" then don't filter else filter on that input. Something like this could work.
/Define sample table
DEALONLINE_REMOVED:([]Deal_time:10#.z.p;Deal_Date:10?.z.d;Company:10?`MSFT`AAPL`GOOGL;TIER:10?`1`2`3)
/New function which joins to where clause
{[company;tier]
wc:();
if[not company=`All;wc:wc,enlist (=;`Company;enlist company)];
if[not tier=`All;wc:wc,enlist (=;`TIER;enlist tier)];
?[DEALONLINE_REMOVED;wc;0b;()]
}[`MSFT;`2]
If you replace the input with `All you will see that everything is returned.
The full functional select for your query would be as follows:
whcl:{[tier;company;ccy;startdate;enddate]
wc:(enlist (within;`Deal_Date;(enlist;startdate;enddate))),(enlist (=;`Status;enlist `Completed)),
$[tier=`All;();enlist (=;`TIER;enlist tier)],
$[company=`All;()enlist (=;`COMPANY;enlist company)],
$[ccy=`All;();enlist (=;`CCY_Pair;enlist ccy)];
?[`DEALONLINE_REMOVED;wc;0b;`Deal_Time`Deal_Date!`Deal_Time`Deal_Date]
}
The first part specifies your date range and status = `Completed in the where clause
wc:(enlist (within;`Deal_Date;(enlist;startdate;enddate))),(enlist (=;`Status;enlist `Completed)),
Next each of these conditionals checks for `All for the TIER, COMPANY and CCY_Pair column filtering. It then joins these on to the where clause when a specific TIER, COMPANY or CCY_Pair are specified. (otherwise an empty list is joined on):
$[tier=`All;();enlist (=;`TIER;enlist tier)],
$[company=`All;();enlist (=;`COMPANY;enlist company)],
$[ccy=`All;();enlist (=;`CCY_Pair;enlist ccy)];
Finally, the select statement is called in its functional form as follows, with wc as the where clause:
?[`DEALONLINE_REMOVED;wc;0b;`Deal_Time`Deal_Date!`Deal_Time`Deal_Date]

Add a single field to model using raw SQL

I'm developing an extension for TYPO3 CMS 8.7.8. I'm using query->statement() to select all fields from a single table, plus 1 field from another table. I get a QueryResult with the proper models and I would like to have that 1 extra field added to them. Is that possible?
You can do SQL queries with the ->statement(...) method and in that, use normal JOIN commands
From the documentation
$result = $query->statement('SELECT * FROM tx_sjroffers_domain_model_offer
WHERE title LIKE ? AND organization IN ?', array('%climbing%', array(33,47)));
So you can do JOINs on whatever table you want to (also code from the documentation)
LEFT JOIN tx_blogexample_person
ON tx_blogexample_post.author = tx_blogexample_person.uid
But you will end up with the raw data from the mysql query. If you want to transform it into a object, use the Property Mapper
You can use JOIN in your sql statments like below.
$query = $this->createQuery();
$sql = 'SELECT single.*,another.field_anme AS fields_name
FROM
tx_single_table_name single
JOIN
tx_another_table_name another
ON
single.fields = another.uid
WHERE
O.deleted = 0
AND O.hidden=0
AND O.uid=' . $orderId;
return $query->statement($sql)->execute();

How to use select syntax for group by field which is array in Dynamics AX

I have field Value in table finStatementTrans which is array.
How should I write select syntax with group by and sum by this field?
while select finStatementTable join DataClassParagraph,sum(Value) from finStatementTrans
group by finStatementTrans.DataClassParagraph
where finStatementTable.RecId == finStatementTrans.FinStatementTable_FK
&& finStatementTable.FinStatementTableParent_FK == 5637569094
{
info(strFmt(%1,%2",finStatementTrans.DataClassParagraph,finStatementTrans.Value[1]));
}
Is this correct?
sum(Value[1])
with this I can't compile.
As Aliaksandr Maksimau mentioned in his comment, aggregating array fields is not possible. Aggregations are only supported for integer and real data type fields.
See also X++ data selection and manipulation, paragraph select statements, last sentence.

dynamic remove or add fields in a structure

how can I read consecutive structures from a file, when they have different fields, and create for each of them the appropriate fields (title: value)? I am a beginner. I think it is about dynamic adding new fields while reading i-th structure and dynamic removing the fields from the i-1 structure, which remained empty after reading a structure i. But how am I able to do it not knowing the names of all the fields before? For this I couldn't find example in documentation nor in the forum.
Thanks!
If some fields appear in every object, have them in a common structure that your array has instances of. For the variables fields, make a field "variable" or something in the main structure, and then dynamically assign field names and values within that structure. So for example, your structure might be:
a.name = 'Name1';
a.value = 'Value1';
a.variable.price = 50;
b.name = 'Name2';
b.value = 'Value2';
b.variable.year = 1996;
data(1) = a; data(2) = b;
where every object has fields "name" and "price" and object a has a price field but not a year field, and object b has a year field and no price field.
This will work for the kind of data you want to read in.