Grouping data in postgresql - postgresql

If I have a table with multiple entries with same name I want to group only the name, i.e., show as many rows present in table but the name should appear only once and other data should show in multiple columns. i.e., for other rows name should be blank:
table expected result
---------------- ------------------
col1 col2 col1 col2
a 5 a 5
a 6 6
a 8 8
b 3 b 3
b 4 4
I'm using PostgreSQL 9.2.

You could use row_number to determine the first occurrence of each group, and from there, it's just a case away from not displaying it:
SELECT CASE rn WHEN 1 THEN col1 ELSE NULL END, col2
FROM (SELECT col1,
col2,
ROW_NUMBER() OVER (PARTITION BY col1
ORDER BY col2 ASC) AS rn
FROM my_table
ORDER BY col1, col2) t

Firstly I need to say that I do not have experience in PostgreSQL, just some basic SQL knowledge. It is not right to change data in original table itself, what you want is some 'view' of the data. Usually such things are made after data set is returned to client, actually it is a matter how to display the data (representation matter), and it should not be made in SQL query but on client side. But, if you want to bother the server with such things indeed, so I would do following: created copy of the table (it can be a temp table), then cleared values in col1 which are not the first in the subsequent select ordering records by col2. By the way, your table does not have primary key, so you will have a problem to implement that, since you can't identify parent record within the subsequent select.
So, the idea to archive that you need on client side (via a data cursor), just traversing records each by one, has even more points.

Related

How to sort table alphabetically by name initial?

I have a table contains columns 'employeename' and 'id', how can I sort the 'employeename' column following alphabetical order of the names initial?
Say the table is like this now:
employeename rid eid
Dave 1 1
Ben 4 2
Chloe 6 6
I tried the command ORDER BY, it shows what I want but when I query the data again by SELECT, the showed table data is the same as original, indicting ORDER BY does not modify the data, is this correct?
SELECT *
FROM employee
ORDER BY employeename ASC;
I expect the table data to be modified (sorted by names alphabetical order) like this:
employeename rid eid
Ben 4 2
Chloe 6 6
Dave 1 1
the showed table data is the same as original, indicting ORDER BY does not modify the data, is this correct?
Yes, this is correct. A SELECT statement does not change the data in a table. Only UPDATE, DELETE, INSERT or TRUNCATE statements will change the data.
However, your question shows a misconception on how a relational database works.
Rows in a table (of a relational database) are not sorted in any way. You can picture them as balls in a basket.
If you want to display data in a specific sort order, the only (really: the only) way to do that is to use an ORDER BY in your SELECT statement. There is no alternative to that.
Postgres allows to define a VIEW that includes an ORDER BY which might be an acceptable workaround for you:
CREATE VIEW sorted_employee;
AS
SELECT *
FROM employee
ORDER BY employeename ASC;
Then you can simply use
select *
from sorted_employees;
But be aware of the drawbacks. If you run select * from sorted_employees order by id then the data will be sorted twice. Postgres is not smart enough to remove the (useless) order by from the view's definition.
Some related questions:
Default row order in SELECT query - SQL Server 2008 vs SQL 2012
What is the default SQL result sort order with 'select *'?
Is PostgreSQL order fully guaranteed if sorting on a non-unique attribute?
Why do results from a SQL query not come back in the order I expect?

SELECT * except nth column

Is it possible to SELECT * but without n-th column, for example 2nd?
I have some view that have 4 and 5 columns (each has different column names, except for the 2nd column), but I do not want to show the second column.
SELECT * -- how to prevent 2nd column to be selected?
FROM view4
WHERE col2 = 'foo';
SELECT * -- how to prevent 2nd column to be selected?
FROM view5
WHERE col2 = 'foo';
without having to list all the columns (since they all have different column name).
The real answer is that you just can not practically (See LINK). This has been a requested feature for decades and the developers refuse to implement it. The best practice is to mention the column names instead of *. Using * in itself a source of performance penalties though.
However, in case you really need to use it, you might need to select the columns directly from the schema -> check LINK. Or as the below example using two PostgreSQL built-in functions: ARRAY and ARRAY_TO_STRING. The first one transforms a query result into an array, and the second one concatenates array components into a string. List components separator can be specified with the second parameter of the ARRAY_TO_STRING function;
SELECT 'SELECT ' ||
ARRAY_TO_STRING(ARRAY(SELECT COLUMN_NAME::VARCHAR(50)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME='view4' AND
COLUMN_NAME NOT IN ('col2')
ORDER BY ORDINAL_POSITION
), ', ') || ' FROM view4';
where strings are concatenated with the standard operator ||. The COLUMN_NAME data type is information_schema.sql_identifier. This data type requires explicit conversion to CHAR/VARCHAR data type.
But that is not recommended as well, What if you add more columns in the long run but they are not necessarily required for that query?
You would start pulling more column than you need.
What if the select is part of an insert as in
Insert into tableA (col1, col2, col3.. coln) Select everything but 2 columns FROM tableB
The column match will be wrong and your insert will fail.
It's possible but I still recommend writing every needed column for every select written even if nearly every column is required.
Conclusion:
Since you are already using a VIEW, the simplest and most reliable way is to alter you view and mention the column names, excluding your 2nd column..
-- my table with 2 rows and 4 columns
DROP TABLE IF EXISTS t_target_table;
CREATE TEMP TABLE t_target_table as
SELECT 1 as id, 1 as v1 ,2 as v2,3 as v3,4 as v4
UNION ALL
SELECT 2 as id, 5 as v1 ,-6 as v2,7 as v3,8 as v4
;
-- my computation and stuff that i have to messure, any logic could be done here !
DROP TABLE IF EXISTS t_processing;
CREATE TEMP TABLE t_processing as
SELECT *, md5(t_target_table::text) as row_hash, case when v2 < 0 THEN true else false end as has_negative_value_in_v2
FROM t_target_table
;
-- now we want to insert that stuff into the t_target_table
-- this is standard
-- INSERT INTO t_target_table (id, v1, v2, v3, v4) SELECT id, v1, v2, v3, v4 FROM t_processing;
-- this is andvanced ;-)
INSERT INTO t_target_table
-- the following row select only the columns that are pressent in the target table, and ignore the others.
SELECT r.* FROM (SELECT to_jsonb(t_processing) as d FROM t_processing) t JOIN LATERAL jsonb_populate_record(NULL::t_target_table, d) as r ON TRUE
;
-- WARNING : you need a object that represent the target structure, an exclusion of a single column is not possible
For columns col1, col2, col3 and col4 you will need to request
SELECT col1, col3, col4 FROM...
to omit the second column. Requesting
SELECT *
will get you all the columns

duplicate multi column entries postgresql

I have a bunch of data in a postgresql database. I think that two keys should form a unique pair,
so want to enforce that in the database. I try
create unique index key1_key2_idx on table(key1,key2)
but that fails, telling me that I have duplicate entries.
How do I find these duplicate entries so I can delete them?
select key1,key2,count(*)
from table
group by key1,key2
having count(*) > 1
order by 3 desc;
The critical part of the query to determine the duplicates is having count(*) > 1.
There are a whole bunch of neat tricks at the following link, including some examples of removing duplicates: http://postgres.cz/wiki/PostgreSQL_SQL_Tricks
Assuming you only want to delete the duplicates and keep the original, the accepted answer is inaccurate -- it'll delete your originals as well and only keep records that have one entry from the start. This works on 9.x:
SELECT * FROM tblname WHERE ctid IN
(SELECT ctid FROM
(SELECT ctid, ROW_NUMBER() OVER
(partition BY col1, col2, col3 ORDER BY ctid) AS rnum
FROM tblname) t
WHERE t.rnum > 1);
https://wiki.postgresql.org/wiki/Deleting_duplicates

Postgres: Distinct but only for one column

I have a table on pgsql with names (having more than 1 mio. rows), but I have also many duplicates. I select 3 fields: id, name, metadata.
I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this is many steps to save some memory in my PHP script.
But how can I do that so it only gives me a list having no duplicates in names.
For example [1,"Michael Fox","2003-03-03,34,M,4545"] will be returned but not [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important and must be unique in the list everytime I do the select and it must be random.
I tried with GROUP BY name, bu then it expects me to have id and metadata in the GROUP BY as well or in a aggragate function, but I dont want to have them somehow filtered.
Anyone knows how to fetch many columns but do only a distinct on one column?
To do a distinct on only one (or n) column(s):
select distinct on (name)
name, col1, col2
from names
This will return any of the rows containing the name. If you want to control which of the rows will be returned you need to order:
select distinct on (name)
name, col1, col2
from names
order by name, col1
Will return the first row when ordered by col1.
distinct on:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.
Anyone knows how to fetch many columns but do only a distinct on one column?
You want the DISTINCT ON clause.
You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:
SELECT DISTINCT ON (name) fields, id, name, metadata FROM the_table;
This will return an unpredictable (but not "random") set of rows. If you want to make it predictable add an ORDER BY per Clodaldo's answer. If you want to make it truly random, you'll want to ORDER BY random().
To do a distinct on n columns:
select distinct on (col1, col2) col1, col2, col3, col4 from names
SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA
from SOMETABLE
GROUP BY NAME

Duplicate values returned with joins

I was wondering if there is a way using TSQL join statement (or any other available option) to only display certain values. I will try and explain exactly what I mean.
My database has tables called Job, consign, dechead, decitem. Job, consign, and dechead will only ever have one line per record but decitem can have multiple records all tied to the dechead with a foreign key. I am writing a query that pulls various values from each table. This is fine with all the tables except decitem. From dechead I need to pull an invoice value and from decitem I need to grab the net wieghts. When the results are returned if dechead has multiple child decitem tables it displays all values from both tables. What I need it to do is only display the dechad values once and then all the decitems values.
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦--¦------¦20.00¦2
3 ¦--¦------¦25.00¦3
Line 1 displays values from dechead and the first line/Join from decitems. Lines 2 and 3 just display values from decitem. If I then export the query to say excel I do not have duplicate values in the first two fileds of lines 2 and 3
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦123¦£2000¦20.00¦2
3 ¦123¦£2000¦25.00¦3
Thanks in advance.
Check out 'group by' for your RDBMS http://msdn.microsoft.com/en-US/library/ms177673%28v=SQL.90%29.aspx
this is a task best left for the application, but if you must do it in sql, try this:
SELECT
CASE
WHEN RowVal=1 THEN dt.col1
ELSE NULL
END as Col1
,CASE
WHEN RowVal=1 THEN dt.col2
ELSE NULL
END as Col2
,dt.Col3
,dt.Col4
FROM (SELECT
col1, col2, col3
,ROW_NUMBER OVER(PARTITION BY Col1 ORDER BY Col1,Col4) AS RowVal
FROM ...rest of your big query here...
) dt
ORDER BY dt.col1,dt.Col4