I have a stored function that calculates the status of an entity (this status is NOT an attribute of the entity but needs to be calculated).
Within a SELECT statement, I want to know how many elements have which status.
For now, I got the following:
SELECT my_function(entity_id) AS status,
COUNT(*)
FROM some_table
GROUP BY my_function(entity_id) -- entity_id is a column of some_table
ORDER BY status;
This works but I find it quite ugly to repeat the function call within the GROUP BY statement.
The code above is just an example. The actual function is defined within a package and has several input parameters.
Is it possible to do this without refering to the stored function within the GROUP BY statement?
You can wrap your query in an external one to isolate the function call and use an alias:
select count(*),
status
from (
select s.*,
my_function(entity_id) as status
from some_Table s
)
group by status
order by status
Related
Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries - use a query to create the list of partnames and then use this to "generate" a second query that populates the IN list. So something needs issue these queries and generated the second. This can be some code external to Redshift (lots of options) or a stored procedure in Redshift. This code, no matter where it exists, should understand that Redshift has a max number of columns limit - 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
I have an app that vends a 'code' to users through an api. A code belongs to a pool of codes that when a user hits an endpoint, he/she will get a code from this 'pool'. At the moment there is only 1 'pool' of codes where a code can be vended. That idea is best expressed below in the following sql.
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
)
RETURNING *;
SQL
So basically, when the end point is pinged, I am returning a code that is active and its vended_at field is NULL.
Now what I need to do is to build off of this sql so that a user can get a code from this pool or from a second pool. So for example, lets say that if the user couldn't get a code from this pool (we will call it A represented by the above sql), I need to vend a code from another pool (we will call it B).
I looked up the documentation of postgresql and I think what I want to do is to either 1). Use a UNION somehow to combine pools A and B into one megapool to vend a code or if I can't vend a code through pool A, use postgresql's OR clause to select from pool B.
The problem is that I can't seem to be able to use either of these syntaxes. I've tried something along the lines like this, tweaking it with different variations.
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) UNION (
######## SELECT SOME OTHER STUFF #########
)
RETURNING *;
SQL
or
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
SELECT "codes"."id"
FROM "codes"
INNER JOIN "code_batches" ON "code_batches"."id" = "codes"."code_batch_id"
WHERE "codes"."vended_at" IS NULL
AND "code_batches"."active" = true
ORDER BY "code_batches"."end_at" ASC
FOR UPDATE OF "codes" SKIP LOCKED
LIMIT 1
) OR (
######## SELECT SOME OTHER STUFF USING OR #########
)
RETURNING *;
SQL
So far the syntax is off and I'm starting to wonder if I can even use this approach for what I'm trying to do. I can't determine if my approach is wrong or if maybe I am using UNION, OR, and SUB-SELECTS wrong. Does anyone have any advice I can try to accomplish my goal? Thank you.
####### EDIT ########
To illustrate and make the concept even easier, I essentially want to do this.
<<-SQL
UPDATE codes SET vended_at = NOW()
WHERE id = (
CRITERIA 1
)
OR/UNION
(
CRITERIA 2
)
RETURNING *;
SQL
Use one table to store both pools.
Add a pool_number column to the codes table to indicate which pool the code is in, then just add
ORDER BY pool_number
to your existing query.
I'm researching a dataset.
And I just wonder if there is a way to order like below in 1 query
Select * From MyTable where name ='international%' order by id
Select * From MyTable where name != 'international%' order by id
So first showing all international items, next by names who dont start with international.
My question is not about adding columns to make this work, or use multiple DB's, or a largerTSQL script to clone a DB into a new order.
I just wonder if anything after 'Where or order by' can be tricked to do this.
You can use expressions in the ORDER BY:
Select * From MyTable
order by
CASE
WHEN name like 'international%' THEN 0
ELSE 1
END,
id
(From your narrative, it also sounded like you wanted like, not =, so I changed that too)
Another way (slightly cleaner and a tiny bit faster)
-- Sample Data
DECLARE #mytable TABLE (id INT IDENTITY, [name] VARCHAR(100));
INSERT #mytable([name])
VALUES('international something' ),('ACME'),('international waffles'),('ABC Co.');
-- solution
SELECT t.*
FROM #mytable AS t
ORDER BY -PATINDEX('international%', t.[name]);
Note too that you can add a persisted computed column for -PATINDEX('international%', t.[name]) to speed things up.
I am using following code to insert date by Table Valued Parameter in my SP. Actually it works when one record exists in my TVP but when it has more than one record it raises the following error :
'Violation of Primary key constraint 'PK_ReceivedCash''. Cannot insert duplicate key in object 'Banking.ReceivedCash'. The statement has been terminated.
insert into banking.receivedcash(ReceivedCashID,Date,Time)
select (select isnull(Max(ReceivedCashID),0)+1 from Banking.ReceivedCash),t.Date,t.Time from #TVPCash as t
Your query is indeed flawed if there is more than one row in #TVPCash. The query to retrieve the maximum ReceivedCashID is a constant, which is then used for each row in #TVPCash to insert into Banking.ReceivedCash.
I strongly suggest finding alternatives rather than doing it this way. Multiple users might run this query and retrieve the same maximum. If you insist on keeping the query as it is, try running the following:
insert into banking.receivedcash(
ReceivedCashID,
Date,
Time
)
select
(select isnull(Max(ReceivedCashID),0) from Banking.ReceivedCash)+
ROW_NUMBER() OVER(ORDER BY t.Date,t.Time),
t.Date,
t.Time
from
#TVPCash as t
This uses ROW_NUMBER to count the row number in #TVPCash and adds this to the maximum ReceivedCashID of Banking.ReceivedCash.
I'm getting an error running this query
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY persons.updated_at DESC
I get the error ERROR: column "persons.updated_at" must appear in the GROUP BY clause or be used in an aggregate function LINE 5: ORDER BY persons.updated_at DESC
This works if I remove the date( function from the group by call, however I'm using the date function because i want to group by date, not datetime
any ideas
At the moment it is unclear what you want Postgres to return. You say it should order by persons.updated_at but you do not retrieve that field from the database.
I think, what you want to do is:
SELECT date(updated_at), count(updated_at) as total_count
FROM "persons"
WHERE ("persons"."updated_at" BETWEEN '2012-10-17 00:00:00.000000' AND '2012-11-07 12:25:04.082224')
GROUP BY date(updated_at)
ORDER BY count(updated_at) DESC -- this line changed!
Now you are explicitly telling the DB to sort by the resulting value from the COUNT-aggregate. You could also use: ORDER BY 2 DESC, effectively telling the database to sort by the second column in the resultset. However I highly prefer explicitly stating the column for clarity.
Note that I'm currently unable to test this query, but I do think this should work.
the problem is that, because you are grouping by date(updated_at), the value for updated_at may not be unique, different values of updated_at can return the same value for date(updated_at). You need to tell the database which of the possible values it should use, or alternately use the value returned by the group by, probably one of
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY date(updated_at)
or
SELECT date(updated_at) FROM persons GROUP BY date(updated_at)
ORDER BY min(updated_at)