How can I extract returned lastUpdatedTime field from the following N1QL query in Couchbase version 7.0.3 and use it in an update query - nosql

I have the following query:
SELECT b.lastUpdatedTime
FROM `bag` b WHERE b.staticBar='ABC1234511991'
ORDER BY b.lastUpdatedTime DESC LIMIT 1
It returns the following response:
[
{
"lastUpdatedTime": 1672840089805
}
]
I would like to extract the lastUpdatedTime field from the response array so that I can use that value in another query, such as this:
UPDATE `bag` SET updated = true
WHERE staticBar='ABC1234511991'
AND lastUpdatedTime IN
(
SELECT lastUpdatedTime FROM `bag` AS bs
WHERE bs.staticBar='ABC1234511991'
ORDER BY bs.lastUpdatedTime DESC LIMIT 1
)
Right now the update query does not update anything, although matching data is available.
I tried to use the UNNEST and MERGE syntax but failed to achieve my goal.

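Use RAW in the subquery so that it returns an ARRAY of values instead of an ARRAY of objects. Without RAW, each element is an object such as {"lastUpdatedTime": 1672840089805}, which never equals the numeric lastUpdatedTime on the left side of IN.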
CREATE INDEX ix1 ON bag(staticBar, lastUpdatedTime DESC);
UPDATE `bag` SET updated = true
WHERE staticBar='ABC1234511991'
AND lastUpdatedTime IN
(
SELECT RAW lastUpdatedTime FROM `bag` AS bs
WHERE bs.staticBar='ABC1234511991'
ORDER BY bs.lastUpdatedTime DESC LIMIT 1
);
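As a rough analogy in Python (not N1QL itself), the difference between the two subquery shapes can be sketched with plain lists. The variable names here are hypothetical stand-ins for the subquery results, and the value is the one from the question.

```python
# Hypothetical stand-ins for what the two subqueries return.
subquery_objects = [{"lastUpdatedTime": 1672840089805}]  # SELECT lastUpdatedTime ...
subquery_raw = [1672840089805]                           # SELECT RAW lastUpdatedTime ...

doc_value = 1672840089805  # lastUpdatedTime of the document being tested

# A number never equals an object, so the membership test fails.
matches_without_raw = doc_value in subquery_objects
# With RAW the array holds bare values, so the membership test succeeds.
matches_with_raw = doc_value in subquery_raw

print(matches_without_raw, matches_with_raw)
```

This mirrors why the original UPDATE matched no documents while the RAW version does.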
The UPDATE with IN is not optimal: because the right-hand side of IN is a subquery, the predicate on lastUpdatedTime cannot be pushed down to the indexer. A MERGE statement avoids this:
MERGE INTO bag AS m USING (SELECT b.lastUpdatedTime
FROM `bag` b
WHERE b.staticBar='ABC1234511991'
ORDER BY b.lastUpdatedTime
DESC LIMIT 1) AS s
ON s.lastUpdatedTime = m.lastUpdatedTime AND m.staticBar = 'ABC1234511991'
WHEN MATCHED THEN UPDATE SET m.updated = true;

Related

more than one row returned by a subquery used as an expression problem

I am trying to update a column in a database with a query.
Here is the query and its output.
I think it is impossible to assign a query to a field, but what is the solution for that, please?
= can be used only when we are sure that the subquery returns exactly one value.
When the subquery may return more than one value, we have to either use IN to accommodate all values or limit the subquery to a single row (TOP 1 in SQL Server, LIMIT 1 in PostgreSQL or MySQL) so that the equality matches one value:
UPDATE mascir_fiche SET partner = (SELECT TOP 1 id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee));
With LIMIT:
UPDATE mascir_fiche SET partner = (SELECT id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee) LIMIT 1);

Aggregation on updating order data in Druid

I am streaming data from Kafka into Druid. It is de-normalized eCommerce order event data, where the status and a few other fields are updated with every event.
I need to run an aggregate query based on timestamp that considers only the most recent entry for each order.
For example, if the data sample is:
{"orderId":"123","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:02:33Z"}
{"orderId":"abc","status":"Initiated","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-05T01:03:33Z"}
{"orderId":"123","status":"Shipped","items":"item","qty":1,"paymentId":null,"shipmentId":null,"timestamp":"2021-03-07T02:03:33Z"}
Now, if I want to find all orders stuck in the "Initiated" status for more than 2 days, then for the data above only orderId "abc" should be returned.
But if I run a query like
SELECT orderId, qty, paymentId FROM "order" WHERE status = 'Initiated' AND "timestamp" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
it returns both orders "123" and "abc", even though 123 received another event after those 2 days, so its earlier events should not be included in the result.
Is there a good, optimized way to handle this kind of scenario in Apache Druid?
One approach I was considering is a separate lookup table that stores orderId and latest status, joined with the aggregation query above on orderId and status.
EDIT 1:
This query works, but it joins on the whole table, which can raise a resource limit exception on big datasets:
WITH maxOrderTime (orderId, "__time") AS
(
SELECT orderId, max("__time") FROM inline_data
GROUP BY orderId
)
SELECT inline_data.orderId FROM inline_data
JOIN maxOrderTime
ON inline_data.orderId = maxOrderTime.orderId
AND inline_data."__time" = maxOrderTime."__time"
WHERE inline_data.status='Initiated' and inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
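The intended semantics of the join (keep only the latest event per order, then filter on status and age) can be sketched in plain Python on the sample data from the question. This is only an illustration of the logic, not Druid code; the function name `stuck_orders` and the fixed "current" time are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

def parse(ts):
    # The sample timestamps are ISO 8601 with a trailing "Z" (UTC).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

events = [
    {"orderId": "123", "status": "Initiated", "timestamp": "2021-03-05T01:02:33Z"},
    {"orderId": "abc", "status": "Initiated", "timestamp": "2021-03-05T01:03:33Z"},
    {"orderId": "123", "status": "Shipped", "timestamp": "2021-03-07T02:03:33Z"},
]

def stuck_orders(events, now, status="Initiated", max_age=timedelta(days=2)):
    # Step 1: keep only the most recent event per orderId (the maxOrderTime CTE).
    latest = {}
    for event in events:
        current = latest.get(event["orderId"])
        if current is None or parse(event["timestamp"]) > parse(current["timestamp"]):
            latest[event["orderId"]] = event
    # Step 2: filter the latest events on status and age (the outer WHERE).
    cutoff = now - max_age
    return sorted(
        order_id
        for order_id, event in latest.items()
        if event["status"] == status and parse(event["timestamp"]) < cutoff
    )

# Assumed "current" time, fixed so the example is reproducible.
now = datetime(2021, 3, 14, tzinfo=timezone.utc)
result = stuck_orders(events, now)
print(result)
```

Order "123" drops out because its latest event is "Shipped", which is the behaviour the plain WHERE filter fails to provide.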
EDIT 2:
Tried with:
SELECT
inline_data.orderID,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderID
HAVING last_status = 1
But gives this error:
Error: Unknown exception
Error while applying rule DruidQueryRule(AGGREGATE), args
[rel#1853:LogicalAggregate.NONE.,
rel#1863:DruidQueryRel.NONE.[](query={"queryType":"scan","dataSource":{"type":"table","name":"inline_data"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/2021-03-14T09:57:05.000Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"lookup("status",'status_as_number')","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"order":"none","filter":null,"columns":["orderID","v0"],"legacy":false,"context":{"sqlOuterLimit":100,"sqlQueryId":"fbc167be-48fc-4863-b3a8-b8a7c45fb60f"},"descending":false,"granularity":{"type":"all"}},signature={orderID:LONG,
v0:STRING})]
java.lang.RuntimeException
I think this can be done more easily. If you replace the status with a numeric representation, it becomes easier to work with.
First use an inline lookup to replace the status. See this page how to define a lookup: https://druid.apache.org/docs/0.20.1/querying/lookups.html
Now, we have for example these values in a lookup named status_as_number:
Initiated = 1
Shipped = 2
Since we now have a numeric representation, you can simply do a group by query and see the max status number. A query like this would be sufficient:
SELECT
inline_data.orderId,
MAX(LOOKUP(status, 'status_as_number')) as last_status
FROM inline_data
WHERE
inline_data."__time" < TIMESTAMPADD(DAY, -2, CURRENT_TIMESTAMP)
GROUP BY inline_data.orderId
HAVING last_status = 1
Note: this query is not tested. The HAVING part makes sure that you only see orders which are Initiated.
I hope this solves your problem.

If I can't use aggregate in a where clause, how to get results

OK, I have a query where I need to omit the result if the first value of an array_agg is 'natural', so I thought I could do this:
select
visitor_id,
array_agg(code
order by session_start) codes_array
from mark_conversion_sessions
where conv_visit_num2 < 2
and max_conv = 1
and (array_agg(code
order by session_start))[1] != 'natural'
group by visitor_id
But when I run this I get the error:
ERROR: aggregate functions are not allowed in WHERE
LINE 31: and (array_agg(code
So is there a way I can reference that array_agg in the where clause?
Thank you
The HAVING clause acts like a WHERE clause on grouped data: WHERE filters rows before grouping, while HAVING filters groups after the aggregates have been computed. Move the criteria that use aggregates into the HAVING clause, e.g.:
select
visitor_id,
array_agg(code order by session_start) codes_array
from mark_conversion_sessions
where
conv_visit_num2 < 2
and max_conv = 1
group by visitor_id
having
(array_agg(code order by session_start))[1] != 'natural'
docs:
https://www.postgresql.org/docs/9.6/static/tutorial-agg.html
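The WHERE versus HAVING distinction can be tried out with a small, self-contained sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, and COUNT(*) instead of array_agg (which SQLite does not provide); the `sessions` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (visitor_id INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [(1, "natural"), (1, "paid"), (2, "paid"), (3, "natural")],
)

# An aggregate in WHERE is rejected: WHERE is evaluated per row, before grouping.
try:
    conn.execute(
        "SELECT visitor_id FROM sessions WHERE COUNT(*) > 1 GROUP BY visitor_id"
    ).fetchall()
    where_failed = False
except sqlite3.OperationalError:
    where_failed = True

# The same condition in HAVING is evaluated per group, after aggregation.
having_rows = conn.execute(
    "SELECT visitor_id, COUNT(*) FROM sessions "
    "GROUP BY visitor_id HAVING COUNT(*) > 1 ORDER BY visitor_id"
).fetchall()

print(where_failed, having_rows)
```

The same pattern applies to the array_agg condition above: the aggregate expression is only available once the groups exist.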

Compare aggregate SUM function to a number in Postgres

I have the following query, which does not work:
UPDATE item
SET popularity= (CASE
WHEN (select SUM(io.quantity) from item i NATURAL JOIN itemorder io GROUP BY io.item_id) > 3 THEN TRUE
ELSE FALSE
END);
Here I want to compare each SUM value returned by the inner SELECT with 3 and update popularity accordingly. But SQL gives this error:
ERROR: more than one row returned by a subquery used as an expression
I understand that the inner SELECT returns many values, but can somebody help me with how to compare each row? In other words, how to make a loop.
A subquery used as an expression must return a single row, so correlate it with the row being updated; this effectively runs one query for each record in the item table.
UPDATE item i
-- COALESCE makes items without any orders FALSE instead of NULL
SET popularity = COALESCE((SELECT SUM(io.quantity) FROM itemorder io
WHERE io.item_id = i.item_id), 0) > 3;
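A minimal way to check the correlated-subquery form is to run it against a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so booleans come back as 0 and 1; the sample rows are invented, and the COALESCE is an addition so that items without orders get false rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_id INTEGER PRIMARY KEY, popularity INTEGER);
CREATE TABLE itemorder (item_id INTEGER, quantity INTEGER);
INSERT INTO item (item_id) VALUES (1), (2), (3);
INSERT INTO itemorder VALUES (1, 2), (1, 5), (2, 1);
""")

# The subquery is correlated on item_id, so it yields one sum per updated row.
conn.execute("""
UPDATE item
SET popularity = COALESCE(
    (SELECT SUM(io.quantity) FROM itemorder io WHERE io.item_id = item.item_id),
    0) > 3
""")

rows = conn.execute("SELECT item_id, popularity FROM item ORDER BY item_id").fetchall()
print(rows)
```

Item 1 has a total quantity of 7 and becomes popular; item 2 (total 1) and item 3 (no orders) do not.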
An alternative (which is a PostgreSQL extension) is to use a derived table in a FROM clause.
UPDATE item i2
SET popularity = x.orders > 3
FROM (select io.item_id, SUM(io.quantity) as orders
from item i NATURAL JOIN itemorder io GROUP BY io.item_id)
as x(item_id,orders)
WHERE i2.item_id = x.item_id
Here the grouping is done once, as in your original query, and the table being updated is joined to the grouped result. Items with no rows in itemorder are not matched by the join, so they are left unchanged.

Manipulate & use the results of UPDATE .... RETURNING

Here is a simple PostgreSQL update returning some data:
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
Is there a way to further use the returned results, as if they came from a select statement? Something like this:
INSERT into stats
(id, completed)
SELECT c.id, TRUE
FROM
(
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
) c
where c.num > 5
Or do I have to save the returned results into my application, then create a new query out of the returned results?
As of version 9.1, you can use an UPDATE ... RETURNING in a "Common Table Expression" ("CTE"), which for most purposes can be thought of as a named sub-query.
So for your purposes, you could use something like this:
WITH update_result AS
(
UPDATE table set num = num + 1
WHERE condition = true
RETURNING table.id, table.num
)
INSERT into stats
(id, completed)
SELECT c.id, TRUE
FROM update_result as c
WHERE c.num > 5
If you're using a version of Postgres below 9.1, then I think you will have to grab the result into a variable in some procedural code - either your application, or a database function (probably written in PL/pgSQL).
That syntax won't work (unfortunately; it would be convenient).
Either you run the update and then issue a second query, or you do everything in a stored procedure, where you can safely store and handle the query results, so that your application makes only a single database call.