Issue with using the percentile_cont function in PostgreSQL

This is my table
ID Total
1 2019.21
3 87918.32
2 562900.3
3 982688.98
1 56788.34
2 56792.32
3 909728.23
Now I would like to find the 25th, 50th, 75th, 90th and 100th percentiles of the values (Total) in the table above. Assume the table holds a large amount of data (about 2 million records in the same format). I used the following code:
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY Total) as disc_func
FROM my_table
The error I get:
ERROR: syntax error at or near "("
LINE 3: percentile_disc(0.5) WITHIN GROUP (ORDER BY total...

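You are using a PostgreSQL version older than 9.4, which does not support WITHIN GROUP. Ordered-set aggregates such as percentile_disc and percentile_cont were added in 9.4; compare the aggregate function documentation for 9.4 and 9.3: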
https://www.postgresql.org/docs/9.4/static/functions-aggregate.html
https://www.postgresql.org/docs/9.3/static/functions-aggregate.html
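Until an upgrade is possible, it may help to see what percentile_disc actually computes. The following is a minimal Python sketch, not PostgreSQL's implementation: it follows the documented rule (the first value in sort order whose position equals or exceeds the requested fraction). The function name and the client-side approach are assumptions for illustration; for 2 million rows you would normally want the computation done in the database.

```python
import math


def percentile_disc(values, fraction):
    """Return the first sorted value whose position reaches `fraction`.

    Illustrative helper (not a PostgreSQL API): nearest-rank rule,
    1-based rank = ceil(fraction * n), never lower than 1.
    """
    if not values:
        return None  # an aggregate over zero rows yields NULL
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# The sample Total values from the question.
totals = [2019.21, 87918.32, 562900.3, 982688.98, 56788.34, 56792.32, 909728.23]
for p in (0.25, 0.5, 0.75, 0.9, 1.0):
    print(p, percentile_disc(totals, p))
```

With the seven sample rows, the 50th percentile is the fourth value in sort order, 87918.32.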

Related

PostgreSQL how do I COUNT with a condition?

Can someone please help with a query I am working on for school, using a sample database from the PostgreSQL tutorial? Here is my PostgreSQL query that returns the raw data, which I can export to Excel and put in a pivot table to get the needed counts. The goal is a query that does the counting itself, so I don't have to export to Excel and build the pivot table manually:
SELECT
i.film_id,
r.rental_id
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
ORDER BY film_id, rental_id
;
From the database this gives me a list of films (by film_id) showing each time the film was rented (by rental_id). That query works fine if I am just exporting to Excel. Since we want to avoid that manual process, I need the query itself to count how many times a given film (by film_id) was rented. The results should look something like this (just showing the first five here; the query need not limit the output):
film_id | COUNT of rental_id
1 | 23
2 | 7
3 | 12
4 | 23
5 | 12
Database setup instructions can be found here: LINK
I have tried using COUNTIF and CASE (following other posts here) and I can't get either to work, please help.
Did you try this?
SELECT
i.film_id,
COUNT(1)
FROM
rental as r
INNER JOIN inventory as i ON i.inventory_id = r.inventory_id
GROUP BY i.film_id
ORDER BY film_id;
If the same rental_id can appear more than once in your data, you may want to use COUNT(DISTINCT r.rental_id) instead.
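The difference between the two counts can be seen on a tiny data set. This is a sketch only: it uses an in-memory SQLite database as a stand-in for the PostgreSQL sample database, and the table name rented (representing the result of the rental/inventory join) and the sample rows are made up for illustration.

```python
import sqlite3


def rental_counts(rows):
    """rows: iterable of (film_id, rental_id) pairs.

    Returns [(film_id, row_count, distinct_rentals), ...] ordered by film_id.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rented (film_id INTEGER, rental_id INTEGER)")
    conn.executemany("INSERT INTO rented VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT film_id,
               COUNT(*)                  AS row_count,
               COUNT(DISTINCT rental_id) AS distinct_rentals
        FROM rented
        GROUP BY film_id
        ORDER BY film_id
        """
    ).fetchall()
    conn.close()
    return result


# Film 1 has rental 11 listed twice, so the two counts differ for it.
sample = [(1, 10), (1, 11), (1, 11), (2, 20)]
print(rental_counts(sample))
```

If rental_id is unique per row, as it normally is for a primary key, both counts agree and the plain COUNT is enough.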

'select count' returns "OS reports: No such file or directory"

I get this
"./2017.10.14/:./2017.10.15/tableName. OS reports: No such file or directory"
when I try to do
select count i by date from tableName where date within(.z.d-5;.z.d)
works if I do
select colName from trackFeedStats where date within(.z.d-5;.z.d)
I assume it's one of the columns acting strange ...
UPDATE: The issue seems to occur mainly when using count i by colName1,colName2,colName3
UPDATE: I checked permissions and everything seems fine; the table is in the given partition (2017.10.14), and there are no symlinks
UPDATE: I am looking for suggestions on fixing the database; the query itself is not that important
This can occur for a number of reasons:
The file/directory legitimately isn't there. Have you checked the database directories to see if the table is in each date slice? If not you should look into .Q.chk for filling missing dates.
You can also get this error if you don't have permission to read the files/directories in that database. They could have been written by a different user etc.
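The directory check suggested above can also be done from outside q. The sketch below is only an illustration and assumes a date-partitioned database whose partitions are directories named like 2017.10.14, each containing one subdirectory per table; the function name and layout assumptions are mine, not part of kdb+.

```python
import os
import re

# Partition directories are assumed to be named YYYY.MM.DD.
DATE_DIR = re.compile(r"^\d{4}\.\d{2}\.\d{2}$")


def partitions_missing_table(db_root, table_name):
    """List the date partitions under db_root that have no table_name directory."""
    missing = []
    for entry in sorted(os.listdir(db_root)):
        path = os.path.join(db_root, entry)
        if DATE_DIR.match(entry) and os.path.isdir(path):
            if not os.path.isdir(os.path.join(path, table_name)):
                missing.append(entry)
    return missing
```

Calling partitions_missing_table with the database root and "tableName" returns the date slices where the table is absent; those are the candidates for .Q.chk to fill.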
I think 'count i' could be the culprit here. It behaves strangely when applied against partitioned databases.
Although you have specified a list of dates in your where clause, under the hood, kdb is executing 'count i' against all table partitions in the database.
For reference, .Q.pn maintains partition counts -
q).Q.pn
tabA |
tabB |
but, as demonstrated here, 'count i' will execute against all date partitions.
q)select count i from tabA where date within 2020.10.10 2020.09.12;
q).Q.pn
tabA | 8001 7998 8101 0 0 8002 8102 7940 7999 0 0 0 0 0 0 0 0 0 0 0 0
tabB | ()
For some reason, .Q.pn doesn't update if you add an additional constraint to your where clause.
q)select count i from tabA where date within 2020.10.10 2020.09.12,price>0;
q).Q.pn
tabA |
tabB |
and the query will actually run. For example, in this database, there are some empty partitions for tabB.
q)select count i from tabB where date within .z.d + -7 0
'./2020.09.30/tabB. OS reports: No such file or directory
but, if we add some other constraint to the where clause, the query will run as we want it to.
q)select count i from tabB where date within .z.d + -7 0,not null sym
x
-----
16948
Running 'count' against a column will work.
q)select count sym from tabB where date within .z.d + -7 0
x
-----
16948
As a workaround, you could update your query to perform the 'count' against some key column, if one exists. Alternatively, you could fill in the missing partitions, by running .Q.chk - https://code.kx.com/q/ref/dotq/#qchk-fill-hdb

Filter a value relevant to the maximum field

Here is my detail field with Order number and Amount.
Order Number Amount
2 3450
4 2300
8 4500
3 5100
Here the latest order is the one with the maximum order number. I need to show only that order in the report, as follows, and not the other records. So I need to pick the maximum order number and its corresponding amount. Help please.
Order Number Amount
8 4500
There are many ways to solve this; one of them is to use SQL Expression Fields.
Create a new SQL expression field and enter the query below.
DB2 syntax (assuming the columns are named order_number and amount):
SELECT order_number, amount FROM orders ORDER BY order_number DESC FETCH FIRST ROW ONLY
Oracle syntax:
SELECT order_number, amount FROM (
SELECT order_number, amount, ROW_NUMBER() OVER (ORDER BY order_number DESC) AS RowNo FROM orders)
WHERE RowNo < 2
Now drag this field to the detail section.
Note: The first query uses DB2 syntax and the second uses Oracle syntax. Let me know if you are using a different database.
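To check the "sort descending, keep the first row" idea against the sample data, here is a small sketch using an in-memory SQLite database. It is an illustration only: SQLite's LIMIT 1 stands in for DB2's FETCH FIRST ROW ONLY, and the table and column names (orders, order_number, amount) are assumed.

```python
import sqlite3


def latest_order(orders):
    """orders: iterable of (order_number, amount). Returns the row with the highest order_number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_number INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    row = conn.execute(
        "SELECT order_number, amount FROM orders "
        "ORDER BY order_number DESC LIMIT 1"
    ).fetchone()
    conn.close()
    return row


print(latest_order([(2, 3450), (4, 2300), (8, 4500), (3, 5100)]))
```

For the four sample rows this returns order 8 with amount 4500, even though order 3 has the larger amount.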

Postgres Crosstab Dynamic Number of Columns

In Postgres 9.4, I have a table like this:
id extra_col days value
-- --------- --- -----
1 rev 0 4
1 rev 30 5
2 cost 60 6
I want this pivoted result:
id extra_col 0  30 60
-- --------- -- -- --
1  rev       4  5
2  cost            6
This is simple enough with a crosstab, but I have the following requirements:
The day column will be dynamic: sometimes in increments of 1, 2, 3 (days), sometimes 0, 30, 60 (accounting months), and sometimes 360, 720 (accounting years).
The range of days will be dynamic (e.g., 0..500 days versus 1..10 days).
The first two columns are static (id and extra_col).
The return type for all the dynamic columns will remain the same (in this example, integer).
Here are the solutions I've explored, none of which work for me for the following reasons:
Automatically creating pivot table column names in PostgreSQL - requires two trips to the database.
Using crosstab_hash - is not dynamic.
From all the solutions I've explored, it seems the only one that allows this to occur in one trip to the database requires that the same query be run three times. Is there a way to store the query as a CTE within the crosstab function?
SELECT *
FROM
CROSSTAB(
--QUERY--,
$$--RUN QUERY AGAIN TO GET NUMBER OF COLUMNS--$$
)
as ct (
--RUN QUERY AGAIN AND CREATE STRING OF COLUMNS WITH TYPE--
)
Every solution based on built-in functionality needs to know the number of output columns in advance, because the PostgreSQL planner requires it. There is a workaround based on cursors; it is the only way to get a truly dynamic result from Postgres.
The example is relatively long and hard to read (SQL really doesn't support cross-tabulation), so I won't reproduce the code from the blog here: http://okbob.blogspot.cz/2008/08/using-cursors-for-generating-cross.html
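A different route, not the cursor technique from the blog post, is to fetch the rows in their long form in a single trip and pivot them in the client, where the number of columns does not have to be known up front. The sketch below is an assumption-laden illustration in Python; the pivot function and its row layout (id, extra_col, days, value) are hypothetical and simply mirror the table in the question.

```python
def pivot(rows):
    """Pivot (id, extra_col, days, value) rows into one row per (id, extra_col).

    Returns (header, table); missing cells are None.
    """
    # The dynamic columns are discovered from the data itself.
    day_columns = sorted({days for _, _, days, _ in rows})
    cells = {}
    for row_id, extra_col, days, value in rows:
        cells.setdefault((row_id, extra_col), {})[days] = value
    header = ["id", "extra_col"] + [str(d) for d in day_columns]
    table = [
        [row_id, extra_col] + [by_day.get(d) for d in day_columns]
        for (row_id, extra_col), by_day in sorted(cells.items())
    ]
    return header, table


header, table = pivot([(1, "rev", 0, 4), (1, "rev", 30, 5), (2, "cost", 60, 6)])
print(header)
for line in table:
    print(line)
```

This keeps the database query trivial (a plain SELECT of the four columns) at the cost of moving the pivot logic into application code.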

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in SQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000 and LIMIT 10000,10000 at the end of the query for 10000 to 20,000
So, how is this done in DB2? What's the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
I developed this method:
You NEED a table that has a unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
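The stepping logic described above can be sketched as follows. This is an illustration, not DB2 code: it runs against an in-memory SQLite table (LIMIT replaces FETCH FIRST N ROWS ONLY), the table name items and column name field are placeholders, and it assumes the column is unique and that every value is greater than the initial 0.

```python
import sqlite3


def iter_chunks(conn, chunk_size):
    """Yield lists of `field` values, chunk_size at a time, in ascending order."""
    last_val = 0  # assumes all values in the column are greater than 0
    while True:
        rows = conn.execute(
            "SELECT field FROM items WHERE field > ? ORDER BY field LIMIT ?",
            (last_val, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield [r[0] for r in rows]
        # Remember the last value seen; the next chunk starts after it.
        last_val = rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (field INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 8)])
chunks = list(iter_chunks(conn, 3))
print(chunks)
```

Because each query filters on the last value rather than skipping rows, the cost of fetching a chunk does not grow with how far into the table you are, provided the column is indexed.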
@elcool's solution is a smart idea, but you need to know the total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with the row number of the last record needed and {length} with the number of rows needed, calculated as last row - first row + 1.
E.g. if I want rows 10 to 25 (16 rows in total), {last} will be 25 and {length} will be 25-10+1=16.
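The nested ascending/descending/ascending query can be tried out on a small table. The sketch below is an assumption-based illustration: it uses in-memory SQLite, with LIMIT in place of FETCH FIRST ... ROWS ONLY, and the names mytable, myid and fetch_range are placeholders.

```python
import sqlite3


def fetch_range(conn, first_row, last_row):
    """Return myid values for rows first_row..last_row (1-based, inclusive)."""
    length = last_row - first_row + 1
    sql = """
        SELECT * FROM (
            SELECT * FROM (
                SELECT myid FROM mytable ORDER BY myid ASC LIMIT ?
            ) ORDER BY myid DESC LIMIT ?
        ) ORDER BY myid ASC
    """
    # Innermost query: first `last_row` rows; middle: the last `length` of those.
    return [r[0] for r in conn.execute(sql, (last_row, length))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(1, 41)])
rows_10_to_25 = fetch_range(conn, 10, 25)
print(rows_10_to_25)
```

One caveat of this approach: if the table has fewer than {last} rows, the middle query still takes {length} rows from whatever the inner query returned, so the result starts earlier than the requested first row.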
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() R FROM TABLE T
)
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
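The two LIMIT forms can be compared on a small table. This is a sketch against in-memory SQLite, which happens to accept both spellings; whether a given Db2 version or platform accepts them is a separate question that the sketch does not settle. The table t and the helper names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])


def page_long_form(conn, n, m):
    """LIMIT n OFFSET m: skip m rows, return the next n."""
    query = "SELECT id FROM t ORDER BY id LIMIT ? OFFSET ?"
    return [r[0] for r in conn.execute(query, (n, m))]


def page_short_form(conn, n, m):
    """LIMIT m, n: same meaning, but the offset comes first."""
    query = "SELECT id FROM t ORDER BY id LIMIT ?, ?"
    return [r[0] for r in conn.execute(query, (m, n))]


print(page_long_form(conn, 3, 4))
print(page_short_form(conn, 3, 4))
```

Both calls skip four rows and return the next three. Note the swapped argument order in the short form, which is an easy source of off-by-a-page bugs.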
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
There are 2 ways to paginate efficiently on a DB2 table:
1 - The technique using the row_number() function and the OVER clause, which has been presented in another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I have sometimes noticed degraded performance.
2 - The technique using a scrollable cursor. The implementation depends on the language used. This technique seems more robust on big tables.
I presented the 2 techniques, implemented in PHP, during a seminar last year. The slides are available at this link:
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry, but this document is only available in French.
These options are available:
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" feature.
In this case you can open a cursor and, instead of re-issuing a query, FETCH forward and backward.
This works great if your application can hold state, since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL.
You can use the ROWNUM pseudo-column, which does the same as ROW_NUMBER() and is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you lean more towards a MySQL or PostgreSQL dialect.