How to sort by a calculated column in PostgreSQL?

We have a number of fields in our offers table that determine an offer's price. However, one crucial component of the price is an exchange rate fetched from an external API. We still need to sort offers by actual current price.
For example, let's say we have two columns in the offers table: exchange and premium_percentage. The exchange column holds the name of the source for the exchange rate, to which an external request will be made. premium_percentage is set by the user. In this situation, it is impossible to sort offers by current price without knowing the exchange rate, and that may be different depending on what's in the exchange column.
How would one go about this? Is there a way to make Postgres calculate current price and then sort offers by it?

SELECT
    product_id,
    get_current_price(exchange) * (premium_percentage::float/100 + 1) AS price
FROM offers
ORDER BY 2;
Note the ORDER BY 2 to sort by the second ordinal column.
You can instead repeat the expression you want to sort by in the ORDER BY clause. But that can result in the expression being evaluated more than once.
Or you can wrap it all in a subquery so you can name the output columns and refer to them in other clauses.
SELECT product_id, price
FROM
(
    SELECT
        product_id,
        get_current_price(exchange) * (premium_percentage::float/100 + 1)
    FROM offers
) product_prices(product_id, price)
ORDER BY price;
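As a rough, runnable sketch of the subquery approach, the snippet below uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The rates, exchange names and sample offers are made up, and get_current_price is registered here as a plain Python function; in PostgreSQL it would have to exist as a server-side function, which the question does not show.

```python
import sqlite3

# Hypothetical rates standing in for the external exchange-rate API.
RATES = {"exchange_a": 100.0, "exchange_b": 90.0}


def get_current_price(exchange):
    """Assumed lookup; a real version would read rates fetched from the API."""
    return RATES[exchange]


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the same name.
conn.create_function("get_current_price", 1, get_current_price)
conn.execute(
    "CREATE TABLE offers (product_id INTEGER, exchange TEXT, premium_percentage REAL)"
)
conn.executemany(
    "INSERT INTO offers VALUES (?, ?, ?)",
    [(1, "exchange_a", 5.0), (2, "exchange_b", 10.0), (3, "exchange_a", 0.0)],
)

# The subquery names the calculated column so the outer ORDER BY can use it.
rows = conn.execute(
    """
    SELECT product_id, price
    FROM (
        SELECT product_id,
               get_current_price(exchange) * (premium_percentage / 100 + 1) AS price
        FROM offers
    ) AS product_prices
    ORDER BY price
    """
).fetchall()

for product_id, price in rows:
    print(product_id, round(price, 2))
```

The cheapest offer comes first even though its price is never stored in the table.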

Related

How to select max date value while selecting max value

I have the following sample from a table of student results, with dates, for a school entry exam
First student passed exam - This is the most common record found for most students
Second student failed 1st time entry and passed second time based on the date
3rd student had a failed input entry and was corrected based on the Version
I need the results to look like the picture above, so we take the latest date and the highest version into account!
My basic query thus far is
select studentid
,examdate --(Date)
,result -- (charvar)
from StudentEntryExam
How should I approach this issue?
SELECT DISTINCT ON (studentid)
    *
FROM mytable
ORDER BY studentid, examdate DESC, version DESC
DISTINCT ON returns the first record of each ordered group. In this case the groups are the studentids. You have to choose an order that puts the required record first. So you order by studentid first. Then you need the most recent examdate first, which DESC achieves. If two records share the same date, the highest version should come first, again using DESC.
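To make the "first record of each ordered group" idea concrete, here is a small Python sketch that mimics the query on made-up rows (the original picture is not available, so the data is only an assumption shaped like the three cases in the question).

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Made-up sample rows covering the three cases described in the question.
rows = [
    {"studentid": 1, "examdate": date(2019, 1, 10), "version": 1, "result": "pass"},
    {"studentid": 2, "examdate": date(2019, 1, 10), "version": 1, "result": "fail"},
    {"studentid": 2, "examdate": date(2019, 2, 15), "version": 1, "result": "pass"},
    {"studentid": 3, "examdate": date(2019, 1, 10), "version": 1, "result": "fail"},
    {"studentid": 3, "examdate": date(2019, 1, 10), "version": 2, "result": "pass"},
]

# ORDER BY studentid, examdate DESC, version DESC.
# Python's sort is stable, so sort by the minor keys first, then by the major key.
ordered = sorted(rows, key=itemgetter("examdate", "version"), reverse=True)
ordered = sorted(ordered, key=itemgetter("studentid"))

# DISTINCT ON (studentid): keep only the first row of each group.
latest = [next(group) for _, group in groupby(ordered, key=itemgetter("studentid"))]

for row in latest:
    print(row["studentid"], row["examdate"], row["version"], row["result"])
```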

Calculate sum of unique tag values

Sometimes one has to calculate the SUM of unique TAG values in InfluxDB. How can this be done?
For example, I have multiple users who download software. Now I want to find out how many unique users downloaded it.
The following query was tested in Grafana to count unique users while respecting the time filter applied to the database.
To do this, we first apply a subquery that calculates mean values. This results in a table with the value 1 associated with each user:
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
Here count is an integer field that equals 1 each time a user downloads the software.
Then we can calculate the sum of these mean values. This is not cheap on a huge database, but it is a working solution:
SELECT SUM(mean) FROM (
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
)
Please go ahead and propose a better-performing or more native solution; that would be useful for larger databases.
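The arithmetic behind the two-step query can be checked with a few lines of Python. This is only an illustration on made-up points, not InfluxDB code: because every count is 1, each per-user mean is 1, so summing the means counts the distinct usernames.

```python
from collections import defaultdict
from statistics import mean

# Made-up download points: (username tag, count field); count is 1 per download.
points = [("alice", 1), ("bob", 1), ("alice", 1), ("carol", 1), ("bob", 1)]

# Inner query: mean("count") ... GROUP BY "username"
counts_by_user = defaultdict(list)
for username, count in points:
    counts_by_user[username].append(count)
mean_by_user = {user: mean(counts) for user, counts in counts_by_user.items()}

# Outer query: SUM(mean)
unique_users = sum(mean_by_user.values())

print(unique_users)
```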

KDB - Filter List Column Based on Another Column

I'm struggling with eliminating data from my query. I have attached a picture with my data results (data itself is too large and has customer info so I can't include). I have two tables that I'm joining by SKU to show when we enter a SKU into the system and when we sell it. We reuse SKUs based on vendors which isn't the best practice but is currently a necessity. What I'd like to do is eliminate the InvoiceDates where InvoiceDate < TransferDate. So in the InvoiceDate column it would only show the highlighted yellow dates for the first few rows.
Please let me know if you have any questions and thanks for the help!
This would work:
q) update InvoiceDate:{x where x >= y}'[InvoiceDate;TransferDate] from tbl
Explanation:
The query above uses the each-both (') iterator to walk over the InvoiceDate and TransferDate values pairwise (effectively row by row), passes each pair to the lambda as x and y, and keeps the elements of x (InvoiceDate) that are >= y (TransferDate).
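For readers less familiar with q, the same row-by-row filtering might look like this in Python. The SKUs and dates are invented for illustration, since the real data could not be shared.

```python
from datetime import date

# Made-up rows: each SKU has one TransferDate and a list of InvoiceDates.
rows = [
    {
        "SKU": "A1",
        "TransferDate": date(2019, 3, 1),
        "InvoiceDate": [date(2019, 1, 5), date(2019, 3, 1), date(2019, 4, 2)],
    },
    {
        "SKU": "B2",
        "TransferDate": date(2019, 6, 1),
        "InvoiceDate": [date(2019, 2, 1)],
    },
]

# Per row, keep only the invoice dates on or after that row's transfer date,
# mirroring {x where x >= y} applied pairwise to the two columns.
for row in rows:
    row["InvoiceDate"] = [d for d in row["InvoiceDate"] if d >= row["TransferDate"]]

for row in rows:
    print(row["SKU"], row["InvoiceDate"])
```

A row whose invoice dates all precede the transfer date ends up with an empty list rather than being dropped.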
Your question is cut off, but I'm guessing you want to filter on whether a particular date is in your invoiceDate lists. You can do this as follows:
q)select from tbl where in[2019.01.01;] each invoiceDate
If this isn't what you are looking for, please clarify above with an example

kdb/q: use function in a select from partitioned table

I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
It's getting map-reduced over multiple dates so the above query no longer works. I can make the query run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but that does not give the max drawdown over the continuous sequence of trades; it gives the maximum of the daily drawdowns.
I wonder if there's a way to make it work with a single select without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query to pull a lot of various metrics on the trades where max drawdown is just one of them. Using chained select means that I need to break the big query into two queries, map-reduced and non-map-reduced, and then join them back which would make the query look ugly.
Thanks!
A select query runs on each date partition of the database, applies the function to each date's values, and finally aggregates the results depending on the call (user-defined functions behave differently from built-in q functions).
So I don't think you can combine that into one query. But there are ways to make your query more general and reusable for different scenarios.
For example, convert your query to functional form and use variables in that query for the column name and the user function. Put this in a function that accepts a column name and a user function. You can then call this function with different (column; function) pairs. Something like:
runF:{[col;usrfunc] functional_query_uses_col_userfunc }
All this depends on your use case. Also check memory usage, as you'll be pulling a lot of data into memory.
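To see why the per-date result differs from the continuous one, here is a small Python illustration with made-up pnl values for two date partitions. The function is a Python analogue of {max neg x-maxs x}, not q code, and the numbers are assumptions.

```python
from itertools import accumulate


def max_drawdown(pnl):
    """Largest drop below the running maximum, like q's {max neg x-maxs x}."""
    running_max = accumulate(pnl, max)
    return max(peak - value for peak, value in zip(running_max, pnl))


# Made-up pnl series for two date partitions.
day1 = [0, 5, 3, 8]
day2 = [6, 2, 4, 1]

# What the map-reduced query computes: the maximum of the daily drawdowns.
per_day = max(max_drawdown(day1), max_drawdown(day2))

# What is actually wanted: the drawdown over the continuous sequence.
continuous = max_drawdown(day1 + day2)

print(per_day, continuous)
```

The peak of 8 on the first day is invisible to the second day's calculation, so the per-day figure understates the drawdown.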

Greatest n per group with multiple criteria for greatest

I need to select the largest, most recent, or currently active term across a number of schools, with the assumption that it is possible for a school to have multiple concurrent terms (i.e., one term that honors students are registered in, and another for non-honors students). I also need to take the end date into account, as the honors term may have the same start date but may be a year long instead of just a semester, and I want the semester.
Code looks something like this:
SELECT t.school_id, t.term_id, COUNT(s.id) AS size, t.start_date, t.end_date
FROM term t
INNER JOIN students s ON t.term_id = s.term_id
WHERE t.school_id = (some school id)
GROUP BY t.school_id, t.term_id
ORDER BY t.start_date DESC, t.end_date ASC, size DESC LIMIT 1;
This works perfectly to find the largest currently or most recently active term, but I want to be able to eliminate the WHERE t.school_id = (some school id) part.
A standard greatest n per group can easily choose the largest OR most recent term, but I need to select the most recent term that ends soonest with the largest number of students.
Not sure I am interpreting your question correctly. Would be easier if you had supplied table definitions including primary and foreign keys.
If you want the most recent term that ends soonest with the largest number of students per school, this might do it:
SELECT DISTINCT ON (t.school_id)
       t.school_id, t.term_id, s.size, t.start_date, t.end_date
FROM   term t
JOIN  (
    -- count students per term; the alias s is only visible outside this subquery
    SELECT term_id, COUNT(*) AS size
    FROM   students
    GROUP  BY term_id
) s USING (term_id)
ORDER  BY t.school_id, t.start_date DESC, t.end_date, size DESC;
More explanation for DISTINCT ON in this related answer:
Select first row in each GROUP BY group?
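The mixed sort directions are the important part of that query. As a hedged illustration on made-up terms (not the asker's schema or data), the same ordering and "first row per school" selection can be written in Python like this:

```python
from datetime import date
from itertools import groupby

# Made-up rows: (school_id, term_id, size, start_date, end_date)
terms = [
    (1, 10, 30, date(2019, 1, 7), date(2019, 5, 31)),    # semester
    (1, 11, 12, date(2019, 1, 7), date(2019, 12, 20)),   # year-long honors term
    (1, 12, 45, date(2018, 8, 20), date(2018, 12, 14)),  # older, larger term
    (2, 20, 20, date(2019, 1, 14), date(2019, 5, 31)),
    (2, 21, 25, date(2019, 1, 14), date(2019, 5, 31)),   # same dates, more students
]

# ORDER BY school_id, start_date DESC, end_date, size DESC.
# Negating the ordinal date and the size flips those keys to descending.
ordered = sorted(
    terms,
    key=lambda t: (t[0], -t[3].toordinal(), t[4], -t[2]),
)

# DISTINCT ON (school_id): keep the first row per school.
best = [next(group) for _, group in groupby(ordered, key=lambda t: t[0])]

for school_id, term_id, size, start, end in best:
    print(school_id, term_id, size, start, end)
```

Size only acts as a tie-breaker here: the older term with 45 students loses to the more recent semester.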