PostgreSQL two groups segregated but not ordered only by zero price column - postgresql

I need help with a bit of a crazy single-query goal please that I'm not sure if GROUP BY or sub-SELECT applies to?
The following query:
SELECT id_finish, description, inside_rate, outside_material, id_part, id_metal
FROM parts_finishing AS pf
LEFT JOIN parts_finishing_descriptions AS fd ON (pf.id_description=fd.id);
Returns the results like the following:
+-------------+-------------+------------------+--------------------------------+
| description | inside_rate | outside_material | id_part - id_finish - id_metal |
+-------------+-------------+------------------+--------------------------------+
| Nickle | 0 | 33.44 | 4444-44-44, 5555-55-55 |
+-------------+-------------+------------------+--------------------------------+
| Bend | 11.22 | 0 | 1111-11-11 |
+-------------+-------------+------------------+--------------------------------+
| Pack | 22.33 | 0 | 2222-22-22, 3333-33-33 |
+-------------+-------------+------------------+--------------------------------+
| Zinc | 0 | 44.55 | 6000-66-66 |
+-------------+-------------+------------------+--------------------------------+
I need the results to return in the fashion below but there are catches:
I need to group by either the inside_rate column or the outside_material column but ORDER BY the description column but not ORDER BY or sort them by price (inside_rate and outside_material are the prices). So we know that they belong to a group if inside_rate is 0 or to the other group if outside_material is 0.
I need to ORDER BY the description column desc secondary after they are returned per group.
I need to return a list of parts (composed of three separate columns) for that inside/outside group / price for that finishing.
Stack format fix.
+-------------+-------------+------------------+--------------------------------+
| description | inside_rate | outside_material | id_part - id_finish - id_metal |
+-------------+-------------+------------------+--------------------------------+
| Bend | 11.22 | 0 | 1111-11-11 |
+-------------+-------------+------------------+--------------------------------+
| Pack | 22.33 | 0 | 2222-22-22, 3333-33-33 |
+-------------+-------------+------------------+--------------------------------+
| Nickle | 0 | 33.44 | 4444-44-44, 5555-55-55 |
+-------------+-------------+------------------+--------------------------------+
| Zinc | 0 | 44.55 | 6000-66-66 |
+-------------+-------------+------------------+--------------------------------+
The tables I'm working with and their data types:
Table "public.parts_finishing"
Column | Type | Modifiers
------------------+---------+-------------------------------------------------------------
id | bigint | not null default nextval('parts_finishing_id_seq'::regclass)
id_part | bigint |
id_finish | bigint |
id_metal | bigint |
id_description | bigint |
date | date |
inside_hours_k | numeric |
inside_rate | numeric |
outside_material | numeric |
sort | integer |
Indexes:
"parts_finishing_pkey" PRIMARY KEY, btree (id)
Table "public.parts_finishing_descriptions"
Column | Type | Modifiers
------------+---------+------------------------------------------------------------------
id not null | bigint | default nextval('parts_finishing_descriptions_id_seq'::regclass)
date | date |
description | text |
rate_hour | numeric |
type | text |
Indexes:
"parts_finishing_descriptions_pkey" PRIMARY KEY, btree (id)
The second table's first column is just id. (Why are we still dealing with a 1024 static width layout in 2015?)
I'd make an SQL fiddle though it refuses to load for me regardless of the browser.

Not entirely sure I understand your question. Might look like this:
SELECT pd.description, pf.inside_rate, pf.outside_material
, concat_ws(' - ', pf.id_part::text
, pf.id_finish::text
, pf.id_metal::text) AS id_part_finish_metal
FROM parts_finishing pf
LEFT JOIN parts_finishing_descriptions fd ON pf.id_description = fd.id
ORDER BY (pf.inside_rate = 0) -- 1. sorts group "inside_rate" first
, pd.description DESC NULLS LAST -- 2. possible NULL values last
;

Related

Any way to find and delete almost similar records with SQL?

I have a table in Postgres DB, that has a lot of almost identical rows. For example:
1. 00Zicky_-_San_Pedro_Danilo_Vigorito_Remix
2. 00Zicky_-_San_Pedro__Danilo_Vigorito_Remix__
3. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx__
4. 0101_-_Try_To_Say__Strictlyjaz_Unit_Future_Rmx_
5. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
6. 01_-_Digital_Excitation_-_Brothers_Gonna_Work_it_Out__Piano_Mix__
I think about to writing a little golang script to remove duplicates, but maybe SQL can do it?
Table definition:
\d+ songs
Table "public.songs"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
---------------+-----------------------------+-----------+----------+----------------------------------------+----------+-------------+--------------+-------------
song_id | integer | | not null | nextval('songs_song_id_seq'::regclass) | plain | | |
song_name | character varying(250) | | not null | | extended | | |
fingerprinted | smallint | | | 0 | plain | | |
file_sha1 | bytea | | | | extended | | |
total_hashes | integer | | not null | 0 | plain | | |
date_created | timestamp without time zone | | not null | now() | plain | | |
date_modified | timestamp without time zone | | not null | now() | plain | | |
Indexes:
"pk_songs_song_id" PRIMARY KEY, btree (song_id)
Referenced by:
TABLE "fingerprints" CONSTRAINT "fk_fingerprints_song_id" FOREIGN KEY (song_id) REFERENCES songs(song_id) ON DELETE CASCADE
Access method: heap
Tried several methods to find duplicates, but that methods search only for exact similarity.
There is no operator which is essentially A almost = B. (Well there is full text search, but that seems to be a little excessive here.) If the only difference is the number of - and _ then just get rid of them and compare the the resulting difference. If they are equal, then one is a duplicate. You can use the replace() function to remove them. So something like: (see demo)
delete
from songs s2
where exists ( select null
from songs s1
where s1.song_id < s2.song_id
and replace(replace(s1.name, '_',''),'-','') =
replace(replace(s2.name, '_',''),'-','')
);
If your table is large this will not be fast, but a functional index may help:
create index song_name_idx on songs
(replace(replace(name, '_',''),'-',''));

Postgres query taking long to execute

I am using libpq to connect the Postgres server in c++ code. Postgres server version is 12.10
My table schema is defined below
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------------+----------+-----------+----------+------------+----------+--------------+-------------
event_id | bigint | | not null | | plain | |
event_sec | integer | | not null | | plain | |
event_usec | integer | | not null | | plain | |
event_op | smallint | | not null | | plain | |
rd | bigint | | not null | | plain | |
addr | bigint | | not null | | plain | |
masklen | bigint | | not null | | plain | |
path_id | bigint | | | | plain | |
attribs_tbl_last_id | bigint | | not null | | plain | |
attribs_tbl_next_id | bigint | | not null | | plain | |
bgp_id | bigint | | not null | | plain | |
last_lbl_stk | bytea | | not null | | extended | |
next_lbl_stk | bytea | | not null | | extended | |
last_state | smallint | | | | plain | |
next_state | smallint | | | | plain | |
pkey | integer | | not null | 1654449420 | plain | |
Partition key: LIST (pkey)
Indexes:
"event_pkey" PRIMARY KEY, btree (event_id, pkey)
"event_event_sec_event_usec_idx" btree (event_sec, event_usec)
Partitions: event_spl_1651768781 FOR VALUES IN (1651768781),
event_spl_1652029140 FOR VALUES IN (1652029140),
event_spl_1652633760 FOR VALUES IN (1652633760),
event_spl_1653372439 FOR VALUES IN (1653372439),
event_spl_1653786420 FOR VALUES IN (1653786420),
event_spl_1654449420 FOR VALUES IN (1654449420)
When I execute the following query it takes 1 - 2 milliseconds to execute.
Time is provided as a parameter to function executing this query, it contains epoche seconds and microseconds.
SELECT event_id FROM event WHERE (event_sec > time.seconds) OR ((event_sec=time.seconds) AND (event_usec>=time.useconds) ORDER BY event_sec, event_usec LIMIT 1
This query is executed every 30 seconds on the same client connection (Which is persistent for weeks). This process runs for weeks, but some time same query starts taking more than 10 minutes.
If I restart the process it recreated connection with the server and now execution time again falls back to 1-2 milliseconds. This issue is intermittent, sometimes it triggers after a week of running process and some time after 2 - 3 weeks of running process.
We add a new partition to table every Sunday and write new data in new partition.
I don't know why the performance is inconsistent, there are many possibilities we can't distinguish with the info provided. Like, does the plan change when the performance changes, or does the same plan just perform worse?
But your query is not written to take maximal advantage of the index. In my hands it can use the index for ordering, but then it still needs to read and individually skip over things that fail the WHERE clause until it finds the first one which passes. And due to partitioning, I think it is even worse than that, it has to do this read-and-skip until it finds the first one which passes in each partition.
You could rewrite it to do a tuple comparison, which can use the index to determine both the order, and where to start:
SELECT event_id FROM event
WHERE (event_sec,event_sec) >= (:seconds,:useconds)
ORDER BY event_sec, event_usec LIMIT 1;
Now this might also degrade, or might not, or maybe will degrade but still be so fast that it doesn't matter.

Sorting Issue with Underscore in Postgres

I'm trying to perform sorting on below data but postgres return the wrong sorting result.
Can someone please help me over her. How can I get proper sorting data.
Here I'm write below query to get data,
SELECT * FROM TempTable ORDER BY a_test ASC NULLS FIRST;
and it's return result like below,
| BB001217 |
| BB001217_000010 |
| BB001217_000011 |
| BB001217_00002 |
| BB001217_00003 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_000010 |
| BB001220_000011 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
And I Expected result in below form,
| BB001217 |
| BB001217_00002 |
| BB001217_00003 |
| BB001217_000010 |
| BB001217_000011 |
| BB001218 |
| BB001219 |
| BB001220 |
| BB001220_00002 |
| BB001220_00003 |
| BB001220_00004 |
| BB001220_00005 |
| BB001220_00006 |
| BB001220_000010 |
| BB001220_000011 |
From PostgreSQL v10 on you could use an ICU collation that provides “natural sorting”:
CREATE COLLATION english_natural (
LOCALE = 'en-US-u-kn-true',
PROVIDER = icu
);
SELECT *
FROM TempTable
ORDER BY a_test COLLATE english_natural
ASC NULLS FIRST;
You are storing numbers in a VARCHAR column and the sorting is thus based on character sorting where '10' is considered to be smaller than '2'
You need to split the column into two parts, then convert the second to a number and sort on those two:
SELECT *
FROM temptable
ORDER BY split_part(a_test,'_',1),
nullif(split_part(a_test,'_',2),'')::int ASC NULLS FIRST;
Online example: https://rextester.com/RNU44666

Postgres: return true/false if matches found from inner join?

I have implemented a tagging system in my application, using Postgres 9.6. There are three tables.
Projects
Table "public.project"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------------------------------
id | integer | | not null | nextval('tag_id_seq'::regclass)
name | character varying(255) | | not null |
user_id | integer | | |
Tags
Table "public.tag"
Column | Type | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------------------------------
id | integer | | not null | nextval('tag_id_seq'::regclass)
tag | character varying(255) | | not null |
user_id | integer | | |
is_internal | boolean | | not null | false
Project tags
Column | Type | Collation | Nullable | Default
------------------+-----------------------------+-----------+----------+-----------------------------------------
id | integer | | not null | nextval('project_tag_id_seq'::regclass)
tag_id | integer | | not null |
project_id | integer | | | |
user_id | integer | | not null |
Now I want to get a list of all the projects, annotated with a column that indicates (for a particular tag) whether it has that tag.
So I'd like the results to look like this:
id name has_favorite_tag
1 foo true
2 bar false
3 baz false
This is my query so far:
select project.*, CASE(XXXX) as has_project_tag
from project p
join (select * from project_tag where tag_id=1) pt on p.id=pt.project_id
I know that I want to use CASE to be true when the length of project_tag matches is greater than 0 - but how do I do this?
(In reality the project table has many more fields, of course.)
Here's a possibility (unfiltered for tag_id; add to inner select if necessary):
select project.*, exists(select * from project_tag where id=project.id) as has_project_tag from project;

Postgres group by with aggerate (last comment in a conversation across all conversations)

I want to get the last comment in a conversation between two people.
My table structure as follows:
Table "public.comments"
Column | Type | Modifiers | Storage | Stats target | Description
-------------+-----------------------------+-------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('comments_id_seq'::regclass) | plain | |
body | text | not null | extended | |
target_id | integer | | plain | |
target_type | character varying(255) | | extended | |
created_at | timestamp without time zone | not null | plain | |
updated_at | timestamp without time zone | not null | plain | |
user_id | integer | | plain | |
My Attempt:
SELECT
comments.id,
max(SELECT id comments.created_at),
CASE
WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
END
FROM comments
WHERE
comments.user_id = 1
OR
(comments.target_type = 'User'
AND
comments.target_id = 1)
GROUP BY
CASE
WHEN user_id = 1 THEN CONCAT(user_id,'_',target_id)
WHEN target_id = 1 THEN CONCAT(target_id,'_',user_id)
END
So I figured out how to group the comments but how to order by created_at and get the latest id and information is where I'm stuck.