distinct key word only for one column - postgresql

I'm using postgresql as my database, I'm stuck with getting desired results with a query,
what I have in my table is something like following,
nid date_start date_end
1 20 25
1 20 25
2 23 26
2 23 26
what I want is following
nid date_start date_end
1 20 25
2 23 26
for that I used SELECT DISTINCT nid,date_start,date_end from table_1 but this result duplicate entries, how can I get distinct nid s with corresponding date_start and date_end?
can anyone help me with this?
Thanks a lot!

Based on your sample data and sample output, your query should work fine. I'll assume your sample input/output is not accurate.
If you want to get distinct values of a certain column, along with values from other corresponding columns, then you need to determine WHICH value from the corresponding columns to display (your question and query would otherwise not make sense). For this you need to use aggregates and group by. For example:
SELECT
nid,
MAX(date_start),
MAX(date_end)
FROM
table_1
GROUP BY
nid

That query should work unless you are selecting more columns.
Or maybe you are getting the same nid with a different start and/or end date

Try distinct on:
select distinct on (col1) col1, col2 from table;

DISTINCT can't result in duplicate entries - that's what it does... removed duplicates.
Is your posted data is incorrect? Exactly what are your data and output?

Related

How to aggregate results based on all distinct combinations in a column?

I have a table where I try to aggreate results (Sum) based on all possible combinations of Product_Ids per Order (Order_Id). Anybody that can guide me here?
I'm a bit lost here, but I have tried to group the different combinations but don't manage to get the right results.
I think you would like to group the results by order_id:
select array_agg(distinct product_id), sum(summ) total
from stat
group by order_id
order by total desc;
The function array_agg(distinct product_id) helps to concatenate unique values of product_id grouping values by order_id.
See the demo with all the details.

How to get distinct results with max of a field

I have a query :
select distinct(donorig_cdn),cerhue_num_rfa,max(cerhue_dt) from t_certif_hue
group by donorig_cdn,cerhue_num_rfa
order by donorig_cdn
it returns me some repeated ids with different cerhue_num_rfa
how do i return only one line for the repeated ids with cerhue_num_rfa that matches the max of date (cerhue_dt) .. and have at the end only 10 results instead of 15 ?
Postgres has SELECT DISTINCT ON to the rescue. It only returns the first row found for each value of the given column. So, all you need is an order that ensures the latest entry comes first. No need for grouping.
SELECT DISTINCT ON (donorig_cdn) donorig_cdn,cerhue_num_rfa,cerhue_dt
FROM t_certif_hue
ORDER BY donorig_cdn, cerhue_dt DESC;

Postgres Error while fetching data from the access_log table

Error : column access_log.id must appear in the GROUP BY clause or be used in an aggregate function] when the subquery is used
select
to_char(date_trunc('day',create_time),'DD MON, YYYY') as create_time,
to_char((max(create_time) - min(create_time)),'HH24:mi') as time_spent,
id
from
access_log
group by
user_id, actionlink_id, date_trunc('day',create_time)
access_log.id is unique for each row in that table, so it is very unlikely that you get useful information including that in the query. I believe you intended to do this by user_id instead. However what the error message is telling you the truth, IF you include id in the select clause, it should also occur in the group by clause. read on:
Think of each item in your select clause as falling into 2 types:
aggregating these are the ones with MIN MAX COUNT AVG and similar functions
non-aggregating these are the ones without thse functions, and it is these that determine how rows are formed. Each one of these non-aggregating items should appear in the group by clause. This information in these columns is then used to create the row structure of the final result. For example instead of rows for each time of the day, now rows will be "per day" because you included date_trunc('day',create_time) into the group by clause
select
user_id ---- changed this to user_id
, actionlink_id --- & added this
, to_char(date_trunc('day',create_time),'DD MON, YYYY') as create_time
, to_char((max(create_time) - min(create_time)),'HH24:mi') as time_spent
, MAX(id) as max_id
from
access_log
group by -- all non-aggregating select clause items go here
user_id
, actionlink_id
, date_trunc('day',create_time)

Equivalent of LIMIT for DB2

How do you do LIMIT in DB2 for iSeries?
I have a table with more than 50,000 records and I want to return records 0 to 10,000, and records 10,000 to 20,000.
I know in SQL you write LIMIT 0,10000 at the end of the query for 0 to 10,000 and LIMIT 10000,10000 at the end of the query for 10000 to 20,000
So, how is this done in DB2? Whats the code and syntax?
(full query example is appreciated)
Using FETCH FIRST [n] ROWS ONLY:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db29.doc.perf/db2z_fetchfirstnrows.htm
SELECT LASTNAME, FIRSTNAME, EMPNO, SALARY
FROM EMP
ORDER BY SALARY DESC
FETCH FIRST 20 ROWS ONLY;
To get ranges, you'd have to use ROW_NUMBER() (since v5r4) and use that within the WHERE clause: (stolen from here: http://www.justskins.com/forums/db2-select-how-to-123209.html)
SELECT code, name, address
FROM (
SELECT row_number() OVER ( ORDER BY code ) AS rid, code, name, address
FROM contacts
WHERE name LIKE '%Bob%'
) AS t
WHERE t.rid BETWEEN 20 AND 25;
Developed this method:
You NEED a table that has an unique value that can be ordered.
If you want rows 10,000 to 25,000 and your Table has 40,000 rows, first you need to get the starting point and total rows:
int start = 40000 - 10000;
int total = 25000 - 10000;
And then pass these by code to the query:
SELECT * FROM
(SELECT * FROM schema.mytable
ORDER BY userId DESC fetch first {start} rows only ) AS mini
ORDER BY mini.userId ASC fetch first {total} rows only
Support for OFFSET and LIMIT was recently added to DB2 for i 7.1 and 7.2. You need the following DB PTF group levels to get this support:
SF99702 level 9 for IBM i 7.2
SF99701 level 38 for IBM i 7.1
See here for more information: OFFSET and LIMIT documentation, DB2 for i Enhancement Wiki
Here's the solution I came up with:
select FIELD from TABLE where FIELD > LASTVAL order by FIELD fetch first N rows only;
By initializing LASTVAL to 0 (or '' for a text field), then setting it to the last value in the most recent set of records, this will step through the table in chunks of N records.
#elcool's solution is a smart idea, but you need to know total number of rows (which can even change while you are executing the query!). So I propose a modified version, which unfortunately needs 3 subqueries instead of 2:
select * from (
select * from (
select * from MYLIB.MYTABLE
order by MYID asc
fetch first {last} rows only
) I
order by MYID desc
fetch first {length} rows only
) II
order by MYID asc
where {last} should be replaced with row number of the last record I need and {length} should be replaced with the number of rows I need, calculated as last row - first row + 1.
E.g. if I want rows from 10 to 25 (totally 16 rows), {last} will be 25 and {length} will be 25-10+1=16.
Try this
SELECT * FROM
(
SELECT T.*, ROW_NUMBER() OVER() R FROM TABLE T
)
WHERE R BETWEEN 10000 AND 20000
The LIMIT clause allows you to limit the number of rows returned by the query. The LIMIT clause is an extension of the SELECT statement that has the following syntax:
SELECT select_list
FROM table_name
ORDER BY sort_expression
LIMIT n [OFFSET m];
In this syntax:
n is the number of rows to be returned.
m is the number of rows to skip before returning the n rows.
Another shorter version of LIMIT clause is as follows:
LIMIT m, n;
This syntax means skipping m rows and returning the next n rows from the result set.
A table may store rows in an unspecified order. If you don’t use the ORDER BY clause with the LIMIT clause, the returned rows are also unspecified. Therefore, it is a good practice to always use the ORDER BY clause with the LIMIT clause.
See Db2 LIMIT for more details.
You should also consider the OPTIMIZE FOR n ROWS clause. More details on all of this in the DB2 LUW documentation in the Guidelines for restricting SELECT statements topic:
The OPTIMIZE FOR clause declares the intent to retrieve only a subset of the result or to give priority to retrieving only the first few rows. The optimizer can then choose access plans that minimize the response time for retrieving the first few rows.
There are 2 solutions to paginate efficiently on a DB2 table :
1 - the technique using the function row_number() and the clause OVER which has been presented on another post ("SELECT row_number() OVER ( ORDER BY ... )"). On some big tables, I noticed sometimes a degradation of performances.
2 - the technique using a scrollable cursor. The implementation depends of the language used. That technique seems more robust on big tables.
I presented the 2 techniques implemented in PHP during a seminar next year. The slide is available on this link :
http://gregphplab.com/serendipity/uploads/slides/DB2_PHP_Best_practices.pdf
Sorry but this document is only in french.
Theres these available options:-
DB2 has several strategies to cope with this problem.
You can use the "scrollable cursor" in feature.
In this case you can open a cursor and, instead of re-issuing a query you can FETCH forward and backward.
This works great if your application can hold state since it doesn't require DB2 to rerun the query every time.
You can use the ROW_NUMBER() OLAP function to number rows and then return the subset you want.
This is ANSI SQL
You can use the ROWNUM pseudo columns which does the same as ROW_NUMBER() but is suitable if you have Oracle skills.
You can use LIMIT and OFFSET if you are more leaning to a mySQL or PostgreSQL dialect.

Duplicate values returned with joins

I was wondering if there is a way using TSQL join statement (or any other available option) to only display certain values. I will try and explain exactly what I mean.
My database has tables called Job, consign, dechead, decitem. Job, consign, and dechead will only ever have one line per record but decitem can have multiple records all tied to the dechead with a foreign key. I am writing a query that pulls various values from each table. This is fine with all the tables except decitem. From dechead I need to pull an invoice value and from decitem I need to grab the net wieghts. When the results are returned if dechead has multiple child decitem tables it displays all values from both tables. What I need it to do is only display the dechad values once and then all the decitems values.
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦--¦------¦20.00¦2
3 ¦--¦------¦25.00¦3
Line 1 displays values from dechead and the first line/Join from decitems. Lines 2 and 3 just display values from decitem. If I then export the query to say excel I do not have duplicate values in the first two fileds of lines 2 and 3
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦123¦£2000¦20.00¦2
3 ¦123¦£2000¦25.00¦3
Thanks in advance.
Check out 'group by' for your RDBMS http://msdn.microsoft.com/en-US/library/ms177673%28v=SQL.90%29.aspx
this is a task best left for the application, but if you must do it in sql, try this:
SELECT
CASE
WHEN RowVal=1 THEN dt.col1
ELSE NULL
END as Col1
,CASE
WHEN RowVal=1 THEN dt.col2
ELSE NULL
END as Col2
,dt.Col3
,dt.Col4
FROM (SELECT
col1, col2, col3
,ROW_NUMBER OVER(PARTITION BY Col1 ORDER BY Col1,Col4) AS RowVal
FROM ...rest of your big query here...
) dt
ORDER BY dt.col1,dt.Col4