I am trying to update a column in a database table with the result of a query.
The query and its output were posted as images.
I think it is impossible to assign a query to a field, so what is the solution for that?
= can be used when we are sure that the subquery returns exactly one value.
When the subquery may return more than one value, we either have to use IN in a comparison to accommodate all values, or use TOP 1 to limit the equality matching to a single value:
UPDATE mascir_fiche SET partner = (SELECT TOP 1 id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee));
With LIMIT (for databases such as PostgreSQL or MySQL, which do not support TOP):
UPDATE mascir_fiche SET partner = (SELECT id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee) LIMIT 1);
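As a rough illustration of the LIMIT variant, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The table and column names follow the statements above; the sample rows are made up, and the ORDER BY id is an assumption added so that the row picked by LIMIT 1 is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hr_employee (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE mascir_fiche (id INTEGER PRIMARY KEY, partner INTEGER);
    -- Hypothetical sample rows: employees 2 and 3 report to employee 1.
    INSERT INTO hr_employee (id, parent_id) VALUES (1, NULL), (2, 1), (3, 1);
    INSERT INTO mascir_fiche (id, partner) VALUES (10, NULL), (11, NULL);
""")

# The subquery can match several employees, so LIMIT 1 reduces it to one value.
# ORDER BY id is an assumption added here to make the chosen row predictable.
conn.execute("""
    UPDATE mascir_fiche
    SET partner = (
        SELECT id FROM hr_employee
        WHERE parent_id IN (SELECT id FROM hr_employee)
        ORDER BY id
        LIMIT 1
    )
""")

partners = [row[0] for row in conn.execute("SELECT partner FROM mascir_fiche ORDER BY id")]
print(partners)
```

Every row receives the same value, because the subquery does not refer to the row being updated. A different value per row would need a subquery that is correlated with mascir_fiche.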
I am streaming data from Kafka into Druid. It is denormalized e-commerce order event data, where the status and a few other fields are updated with each event.
I need to run an aggregate query based on timestamp that considers only the most recent entry for each order.
For example: If data sample is:
{"orderId":"123","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:02:33Z"}
{"orderId":"abc","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:03:33Z"}
{"orderId":"123","status":"Shipped","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-07T02:03:33Z"}
Now, if I want to query all orders that have been stuck in the "Initiated" status for more than 2 days, then for the data above it should only show orderId "abc".
But if I query something like
SELECT orderId, qty, paymentId FROM "order" WHERE status = 'Initiated' AND "timestamp" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
This query will return both orders "123" and "abc", but "123" has a later event that was received after those 2 days, so its earlier events should not be included in the result.
Is there a good, optimized way to handle this kind of scenario in Apache Druid?
One way I was thinking of is to use a separate lookup table that stores orderId and latest status, and to join this lookup with the above aggregation query on orderId and status.
EDIT 1:
This query works, but it joins on the whole table, which can raise a resource limit exception for big datasets:
WITH maxOrderTime (orderId, "__time") AS
(
SELECT orderId, max("__time") FROM inline_data
GROUP BY orderId
)
SELECT inline_data.orderId FROM inline_data
JOIN maxOrderTime
ON inline_data.orderId = maxOrderTime.orderId
AND inline_data."__time" = maxOrderTime."__time"
WHERE inline_data.status='Initiated' and inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
EDIT 2:
Tried with:
SELECT
inline_data.orderID,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderID
HAVING last_status = 1
But it gives this error:
Error: Unknown exception
Error while applying rule DruidQueryRule(AGGREGATE), args
[rel#1853:LogicalAggregate.NONE.,
rel#1863:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"inline_data"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/2021-03-14T09:57:05.000Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"lookup("status",'status_as_number')","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["orderID","v0"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"fbc167be-48fc-4863-b3a8-b8a7c45fb60f"},"descending":false,"granularity":{"type":"all"}},signature={orderID:LONG,
v0:STRING})]
java.lang.RuntimeException
I think this can be done more easily. If you replace the status with a numeric representation, it becomes easier to work with.
First, use an inline lookup to replace the status. See this page for how to define a lookup: https://druid.apache.org/docs/0.20.1/querying/lookups.html
Now we have, for example, these values in a lookup named status_as_number:
Initiated = 1
Shipped = 2
Since we now have a numeric representation, you can simply do a group by query and see the max status number. A query like this would be sufficient:
SELECT
inline_data.orderId,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderId
HAVING last_status = 1
Note: this query is not tested. The HAVING part makes sure that you only see orders which are Initiated.
I hope this solves your problem.
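To make the idea concrete, here is a small sketch that runs the same kind of GROUP BY on the three sample events from the question. It uses Python's sqlite3 module as a stand-in, not Druid: the status_as_number table replaces the Druid lookup, the ts column replaces "__time", and the cutoff value is made up. The cutoff is also applied to the latest event per order in HAVING rather than in WHERE. That is an assumption on my part, so that a later "Shipped" event still counts even if it arrived within the last two days. Like the query above, it assumes that status numbers only increase as an order progresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inline_data (orderId TEXT, status TEXT, ts TEXT);
    INSERT INTO inline_data VALUES
        ('123', 'Initiated', '2021-03-05T01:02:33Z'),
        ('abc', 'Initiated', '2021-03-05T01:03:33Z'),
        ('123', 'Shipped',   '2021-03-07T02:03:33Z');
    -- Stand-in for the Druid lookup named status_as_number.
    CREATE TABLE status_as_number (status TEXT PRIMARY KEY, num INTEGER);
    INSERT INTO status_as_number VALUES ('Initiated', 1), ('Shipped', 2);
""")

# Hypothetical "now minus two days"; ISO-8601 strings in one format compare correctly as text.
cutoff = "2021-03-08T00:00:00Z"

stuck = [row[0] for row in conn.execute("""
    SELECT d.orderId
    FROM inline_data d
    JOIN status_as_number s ON s.status = d.status
    GROUP BY d.orderId
    HAVING MAX(s.num) = 1      -- highest status reached is still Initiated
       AND MAX(d.ts) < ?       -- and the latest event is older than the cutoff
""", (cutoff,))]
print(stuck)
```

With the sample data, order "123" is excluded because its highest status is 2, which leaves only "abc".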
OK, I have a query where I need to omit the result if the first value of an array_agg is 'natural', so I thought I could do this:
select
visitor_id,
array_agg(code
order by session_start) codes_array
from mark_conversion_sessions
where conv_visit_num2 < 2
and max_conv = 1
and (array_agg(code
order by session_start))[1] != 'natural'
group by visitor_id
But when I run this I get the error:
ERROR: aggregate functions are not allowed in WHERE
LINE 31: and (array_agg(code
So is there a way I can reference that array_agg in the where clause?
Thank you
The HAVING clause acts like a WHERE clause on grouped data. Move the criteria that use aggregates into the HAVING clause, e.g.:
select
visitor_id,
array_agg(code order by session_start) codes_array
from mark_conversion_sessions
where
conv_visit_num2 < 2
and max_conv = 1
group by visitor_id
having
(array_agg(code order by session_start))[1] != 'natural'
docs:
https://www.postgresql.org/docs/9.6/static/tutorial-agg.html
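The difference between WHERE and HAVING can be seen in a small sketch using Python's sqlite3 module. SQLite has no array_agg, so this example filters on COUNT(*) instead, which is a different aggregate from the one in the question; the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mark_conversion_sessions (visitor_id INTEGER, code TEXT, session_start TEXT);
    INSERT INTO mark_conversion_sessions VALUES
        (1, 'natural', '2021-01-01'),
        (1, 'paid',    '2021-01-02'),
        (2, 'paid',    '2021-01-01');
""")

# An aggregate in WHERE is rejected, because WHERE filters rows before grouping.
try:
    conn.execute("""
        SELECT visitor_id FROM mark_conversion_sessions
        WHERE COUNT(*) > 1
        GROUP BY visitor_id
    """)
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# The same condition in HAVING is applied to each group after aggregation.
visitors = [row[0] for row in conn.execute("""
    SELECT visitor_id FROM mark_conversion_sessions
    GROUP BY visitor_id
    HAVING COUNT(*) > 1
""")]
print(where_failed, visitors)
```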
I have the following query, which does not work:
UPDATE item
SET popularity= (CASE
WHEN (select SUM(io.quantity) from item i NATURAL JOIN itemorder io GROUP BY io.item_id) > 3 THEN TRUE
ELSE FALSE
END);
Here I want to compare the SUM value for each row of the inner SELECT with 3 and update popularity accordingly. But SQL gives this error:
ERROR: more than one row returned by a subquery used as an expression
I understand that the inner SELECT returns many values, but can somebody help me with how to compare each row? In other words, how do I make a loop?
A subquery used as an expression must return a single row, so correlate it with the row being updated; you are then effectively running one query for each record in the item table.
UPDATE item i
SET popularity = (SELECT SUM(io.quantity) FROM itemorder io
WHERE io.item_id = i.item_id) > 3;
An alternative (which is a PostgreSQL extension) is to use a derived table in a FROM clause.
UPDATE item i2
SET popularity = x.orders > 3
FROM (select io.item_id, SUM(io.quantity) as orders
from item i NATURAL JOIN itemorder io GROUP BY io.item_id)
as x(item_id,orders)
WHERE i2.item_id = x.item_id
Here you keep the single GROUP BY you already had, and join the table being updated to the grouped results.
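For a quick check of the correlated-subquery form, here is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The sample rows are made up, and the COALESCE is an addition that is not in the statements above: without it, an item with no orders gets a NULL sum, and NULL > 3 yields NULL rather than FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, popularity INTEGER);
    CREATE TABLE itemorder (item_id INTEGER, quantity INTEGER);
    INSERT INTO item (item_id) VALUES (1), (2), (3);
    -- Item 1 totals 4, item 2 totals 1, item 3 has no orders.
    INSERT INTO itemorder VALUES (1, 2), (1, 2), (2, 1);
""")

# The subquery is correlated on item.item_id, so it yields one sum per updated row.
# COALESCE is an addition here: without it, items with no orders would get NULL.
conn.execute("""
    UPDATE item
    SET popularity = COALESCE(
        (SELECT SUM(io.quantity) FROM itemorder io WHERE io.item_id = item.item_id),
        0
    ) > 3
""")

popularity = dict(conn.execute("SELECT item_id, popularity FROM item"))
print(popularity)
```

SQLite stores the boolean result as 1 or 0, so only item 1 ends up marked as popular.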
Here is a simple PostgreSQL update returning some data:
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
Is there a way to further use the returned results, as if they came from a select statement? Something like this:
INSERT into stats
(id, completed)
SELECT c.id, TRUE
FROM
(
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
) c
where c.num > 5
Or do I have to save the returned results into my application, then create a new query out of the returned results?
As of version 9.1, you can use an UPDATE ... RETURNING in a "Common Table Expression" ("CTE"), which for most purposes can be thought of as a named sub-query.
So for your purposes, you could use something like this:
WITH update_result AS
(
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
)
INSERT into stats
(id, completed)
SELECT c.id, TRUE
FROM update_result as c
WHERE c.num > 5
If you're using a version of Postgres below 9.1, then I think you will have to grab the result into a variable in some procedural code - either your application, or a database function (probably written in PL/pgSQL).
That syntax won't work (unfortunately; it would be convenient).
Either you run the update and then issue another query, or you do everything in a stored procedure, where you can safely store and handle the query results, so that your application makes just one database call.
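A minimal sketch of the two-statement route, assuming the work is done from application code inside one transaction. It uses Python's sqlite3 module as a stand-in; the jobs table and its active column are hypothetical placeholders for "table" and "condition" in the question. Re-applying the predicate in the second statement only works here because the UPDATE does not change the column the predicate tests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, num INTEGER, active INTEGER);
    CREATE TABLE stats (id INTEGER, completed INTEGER);
    INSERT INTO jobs VALUES (1, 5, 1), (2, 2, 1), (3, 9, 0);
""")

# "with conn" wraps both statements in one transaction: commit on success, rollback on error.
with conn:
    conn.execute("UPDATE jobs SET num = num + 1 WHERE active = 1")
    # Re-apply the same predicate to pick out the rows that were just updated.
    conn.execute("""
        INSERT INTO stats (id, completed)
        SELECT id, 1 FROM jobs WHERE active = 1 AND num > 5
    """)

stats_rows = conn.execute("SELECT id, completed FROM stats").fetchall()
print(stats_rows)
```

Row 3 has num above 5 but was not touched by the update, so it is left out of stats.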