Speed up query with multiple conditions - postgresql

I have a SQL table of 5,000,000 rows with 5 columns, and I need to run a SELECT with 4 conditions (all on TEXT columns) against it. The table has the columns id, name, street, city, zip, and my SELECT looks like this:
SELECT id FROM register WHERE name=%s AND zip=%s AND city=%s AND street=%s
The problem is that I need to speed this query up: I have to run it 80,000 times, and right now that takes half a day.

The %s placeholders imply that all four of your columns are varchar or text. If so, then the following index might help:
CREATE INDEX idx ON register (name, zip, city, street, id)
The first four columns of the index cover the WHERE clause, and the fifth covers the id column, which is needed for the SELECT clause, so the query can often be answered from the index without visiting the table at all.
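To check that an index like this is actually picked up, you can look at the query plan. Here is a minimal sketch, assuming Python with the built-in sqlite3 module as a stand-in for PostgreSQL (hence `?` placeholders instead of `%s`, and made-up sample values). On PostgreSQL you would run EXPLAIN on the same SELECT instead and look for an index-only scan.

```python
import sqlite3


def plan_for_lookup(conn):
    """Return the query-plan text for the four-condition lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM register "
        "WHERE name = ? AND zip = ? AND city = ? AND street = ?",
        ("Ann", "10115", "Berlin", "Main St"),  # made-up sample values
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE register ("
    "id INTEGER NOT NULL, name TEXT, street TEXT, city TEXT, zip TEXT)"
)
plan_before = plan_for_lookup(conn)  # no usable index yet: full table scan

conn.execute("CREATE INDEX idx ON register (name, zip, city, street, id)")
plan_after = plan_for_lookup(conn)  # lookup served from the index alone

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between engines and versions, so treat the printed text as a hint rather than a guarantee, and measure on the real table.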

Related

Is distinct function deterministic? T-sql

I have a table like the one below. For each distinct combination of UserID and ProductID, will SQL select the product bought from StoreID 1 or 2? Is it deterministic?
My code
SELECT (DISTINCT CONCAT(UserID, ProductID)), Date, StoreID FROM X
This isn't valid syntax. You can have
select [column_list] from X
or you can have
select distinct [column_list] from X
The difference is that the first will return one row for every row in the table while the second will return one row for every unique combination of the column values in your column list.
Adding "distinct" to a statement will reliably produce the same results every time unless the underlying data changes, so in this sense, "distinct" is deterministic. However, it is not a function so the term "deterministic" doesn't really apply.
You may actually want a "group by" clause like the following (in which case you have to actually specify how you want the engine to pick values for columns not in your group):
select
    concat(UserID, ProductID)
    , min(Date)
    , max(StoreID)
from
    X
group by
    concat(UserID, ProductID)
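To see the difference between the two forms on concrete data, here is a small sketch, assuming Python's built-in sqlite3 module and three made-up rows. It groups by the two columns directly rather than by concat(...), which is my simplification and not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE X (UserID INTEGER, ProductID INTEGER, Date TEXT, StoreID INTEGER)"
)
# Made-up sample data: user 1 bought product 10 twice, in two different stores.
conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2020-01-01", 1),
        (1, 10, "2020-01-05", 2),
        (2, 20, "2020-02-01", 1),
    ],
)

# DISTINCT applies to the whole column list, so both purchases of user 1 survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT UserID, ProductID, Date, StoreID FROM X ORDER BY 1, 2, 3, 4"
).fetchall()

# GROUP BY collapses each (UserID, ProductID) pair to one row; the aggregates
# state explicitly which Date and StoreID value to keep.
grouped_rows = conn.execute(
    "SELECT UserID, ProductID, MIN(Date), MAX(StoreID) "
    "FROM X GROUP BY UserID, ProductID ORDER BY UserID, ProductID"
).fetchall()

print(distinct_rows)
print(grouped_rows)
```

Note that MIN(Date) and MAX(StoreID) are computed independently, so the values in one output row can come from different source rows (here the earliest date is paired with store 2).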

Convert certain columns that are recurring to rows in postgres

I have a table in Postgres that has ~50 million rows. I need to convert certain columns to rows.
I need to unpivot the columns that repeat for each individual, so that every individual ends up on its own row, and repeat the non-individual columns against the respective ID -
The following is the output I need -
I'd appreciate any help on this.
50 million rows is not a big deal in Greenplum but returning that many rows to a client is kind of pointless. I'm guessing you want to create a new table for this new output. You are also going to be creating a table that is 2x larger because you are taking a single row and turning it into 2.
create table new_table as
select id, mid_1 as mid, name_1 as name, age_1 as age, location
from your_table
union all
select id, mid_2 as mid, name_2 as name, age_2 as age, location
from your_table
distributed by (id);
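The same UNION ALL unpivot can be tried on a tiny table first. This is a sketch, assuming Python's built-in sqlite3 module and two made-up rows; the DISTRIBUTED BY clause is left out because it is Greenplum-specific and SQLite does not understand it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (id INTEGER, mid_1 INTEGER, name_1 TEXT, age_1 INTEGER,
                         mid_2 INTEGER, name_2 TEXT, age_2 INTEGER, location TEXT);
-- Made-up sample rows, two individuals per id.
INSERT INTO your_table VALUES
    (1, 101, 'Ann', 34, 102, 'Bob', 36, 'Oslo'),
    (2, 201, 'Cy', 51, 202, 'Di', 49, 'Rome');

-- Each source row contributes one output row per individual.
CREATE TABLE new_table AS
SELECT id, mid_1 AS mid, name_1 AS name, age_1 AS age, location FROM your_table
UNION ALL
SELECT id, mid_2 AS mid, name_2 AS name, age_2 AS age, location FROM your_table;
""")

unpivoted = conn.execute(
    "SELECT id, mid, name, age, location FROM new_table ORDER BY id, mid"
).fetchall()

for row in unpivoted:
    print(row)
```

The new table has twice as many rows as the source, with location repeated for both individuals of each id.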

SQL statement that returns exactly one row with columns

I'm having trouble creating a query for the following task: I want to return exactly one row with the columns region_id, region_name, province_name, province_code, country_name, country_code for any given regionid. The database has 3 tables: "countrylist", "provinces" and "regionlist".
countrylist has the following columns: countryid, language code, countryname, countrycode and continentid
provinces: country_code, country_name, province_code, province_name
regionlist: regionid, regiontype.
So I tried writing a query that joins the tables, but I'm not sure if I'm doing it correctly.
exactly one row with columns: region_id, region_name, province_name, province_code, country_name, country_code for any given regionid.
I am not fully aware of the differences between Postgres and MySQL, but I guess you will get the idea at the very least.
One way to do it is to pick out your region with a WHERE condition on regionlist.regionid and join the other tables to it. You can also put a LIMIT on the query to cap the number of rows returned.
Apparently neither provinces nor countrylist has a column in common with regionlist, so I cannot tell what the link between them is. However, once you have 1 row of regionlist, you should have no trouble joining it with the others (if the links are trivial).
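Since the question does not show how regionlist links to the other tables, the following is only a sketch of the shape such a query could take, using Python's built-in sqlite3 module. The columns regionname and province_code on regionlist are hypothetical (invented here to have something to join on), languagecode is my spelling of the "language code" column, and all sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countrylist (countryid INTEGER, languagecode TEXT, countryname TEXT,
                          countrycode TEXT, continentid INTEGER);
CREATE TABLE provinces (country_code TEXT, country_name TEXT,
                        province_code TEXT, province_name TEXT);
-- regionname and province_code are HYPOTHETICAL columns, not in the question.
CREATE TABLE regionlist (regionid INTEGER, regiontype TEXT,
                         regionname TEXT, province_code TEXT);

-- Made-up data: the country appears once per language, so a plain join
-- would return two rows for the same region.
INSERT INTO countrylist VALUES (1, 'en', 'Canada', 'CA', 5), (1, 'fr', 'Canada', 'CA', 5);
INSERT INTO provinces VALUES ('CA', 'Canada', 'ON', 'Ontario');
INSERT INTO regionlist VALUES (42, 'city', 'Toronto', 'ON');
""")


def region_rows(conn, region_id):
    """Fetch at most one joined row for the given regionid."""
    return conn.execute(
        """
        SELECT r.regionid      AS region_id,
               r.regionname    AS region_name,
               p.province_name,
               p.province_code,
               c.countryname   AS country_name,
               c.countrycode   AS country_code
        FROM regionlist r
        JOIN provinces p   ON p.province_code = r.province_code  -- hypothetical link
        JOIN countrylist c ON c.countrycode = p.country_code     -- assumed link
        WHERE r.regionid = ?
        LIMIT 1
        """,
        (region_id,),
    ).fetchall()


result = region_rows(conn, 42)
print(result)
```

LIMIT 1 only guarantees a single row, not a particular one; if the duplicate rows can differ, it is safer to filter on the language code than to rely on the limit.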

PostgreSQL 9.5 Select only non matching records from two tables

I have three tables representing some geographical data:
- one with the actual data,
- one storing the name of the streets,
- one storing the combination between the street number and the street name (table address).
I already have some addresses in my address table. In order to run an INSERT INTO ... SELECT into a fourth table, I am looking for a way to build the SELECT query so that it retrieves only the objects that do not already exist in the address table.
I tried different approaches, including the NOT EXISTS and the id_street IS NULL conditions, but I didn't manage to make it work.
Here is an example : http://rextester.com/KMSW4349
Thanks
You can simply use EXCEPT to remove the rows already in address:
INSERT INTO address(street_number,id_street)
SELECT DISTINCT datas.street_number, street.id_street
FROM datas
LEFT JOIN street USING (street_name)
EXCEPT
SELECT street_number, id_street FROM address;
You could end up with duplicates if there are concurrent data modifications on address.
To avoid that, you'd add a unique constraint and use INSERT ... ON CONFLICT DO NOTHING.
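The unique constraint plus ON CONFLICT DO NOTHING approach might look like the sketch below. It assumes Python's built-in sqlite3 module with a SQLite of version 3.24 or later (the first to support this clause), a guessed schema for the three tables, and made-up sample rows. The `WHERE true` line is a SQLite parsing workaround so that ON is not read as a join condition.

```python
import sqlite3

INSERT_MISSING = """
INSERT INTO address (street_number, id_street)
SELECT DISTINCT datas.street_number, street.id_street
FROM datas
LEFT JOIN street USING (street_name)
WHERE true  -- SQLite-only: keeps ON from being parsed as a join condition
ON CONFLICT DO NOTHING
"""


def insert_missing(conn):
    """Run the insert and return how many rows address holds afterwards."""
    conn.execute(INSERT_MISSING)
    return conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema; the real one is only visible in the linked example.
CREATE TABLE street (id_street INTEGER PRIMARY KEY, street_name TEXT);
CREATE TABLE datas (street_number INTEGER, street_name TEXT);
CREATE TABLE address (street_number INTEGER, id_street INTEGER,
                      UNIQUE (street_number, id_street));
INSERT INTO street VALUES (1, 'Main St'), (2, 'Oak Ave');
INSERT INTO datas VALUES (10, 'Main St'), (12, 'Main St'), (10, 'Main St'), (5, 'Oak Ave');
INSERT INTO address VALUES (10, 1);  -- one address already exists
""")

count_after_first = insert_missing(conn)   # adds the two missing addresses
count_after_second = insert_missing(conn)  # re-running adds nothing
print(count_after_first, count_after_second)
```

Because the constraint rejects duplicates at write time, the statement stays safe to re-run even when another session inserts the same address in between.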
Your subquery is not correct. It has to be correlated with the outer tables:
INSERT INTO address(street_number,id_street)
SELECT DISTINCT street_number, id_street
FROM datas
LEFT JOIN street ON street.street_name=datas.street_name
WHERE NOT EXISTS (SELECT * FROM address a2 WHERE a2.street_number = datas.street_number AND a2.id_street = street.id_street);
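To compare the EXCEPT and NOT EXISTS variants side by side, here is a sketch that runs only their SELECT parts against the same data. It assumes Python's built-in sqlite3 module, a guessed schema, and made-up rows in which every street name has a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schema and made-up sample data.
CREATE TABLE street (id_street INTEGER PRIMARY KEY, street_name TEXT);
CREATE TABLE datas (street_number INTEGER, street_name TEXT);
CREATE TABLE address (street_number INTEGER, id_street INTEGER);
INSERT INTO street VALUES (1, 'Main St'), (2, 'Oak Ave');
INSERT INTO datas VALUES (10, 'Main St'), (12, 'Main St'), (5, 'Oak Ave');
INSERT INTO address VALUES (10, 1);
""")

# Set difference: everything derivable from datas minus what address holds.
except_rows = conn.execute("""
    SELECT DISTINCT datas.street_number, street.id_street
    FROM datas
    LEFT JOIN street USING (street_name)
    EXCEPT
    SELECT street_number, id_street FROM address
    ORDER BY 1, 2
""").fetchall()

# Anti-join: the subquery is correlated with both outer tables.
not_exists_rows = conn.execute("""
    SELECT DISTINCT datas.street_number, street.id_street
    FROM datas
    LEFT JOIN street ON street.street_name = datas.street_name
    WHERE NOT EXISTS (
        SELECT 1 FROM address a2
        WHERE a2.street_number = datas.street_number
          AND a2.id_street = street.id_street
    )
    ORDER BY 1, 2
""").fetchall()

print(except_rows)
print(not_exists_rows)
```

The two agree here, but they can diverge when the LEFT JOIN leaves id_street as NULL: EXCEPT treats two NULLs as the same value, while the `=` comparison inside NOT EXISTS never matches a NULL.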

Use SQL to export Parent/Child rows into a flat file

With an Orders table:
(OrderID, date, customerID, status, etc)
and an OrderDetails table:
(ParentID, itemID, quantity, price, etc)
I would like to create a SQL Query that will export a CSV flat file with Order and OrderDetail rows interspersed. For instance, output might look like this (H and D indicate "Header" and "Detail" respectively.):
"H",2345,"6/1/09",856,"Shipped"
"D",2345,52,1,1.50
"D",2345,92,2,3.25
"D",2345,74,1,9.99
"H",2346,"6/1/09",474,"Shipped"
"D",2346,74,1,9.99
"D",2346,52,1,1.50
Not sure where to even start with this. Any ideas? TIA.
You'll want to take advantage of the fact that an order by clause at the end of a union all applies to the entire result set. Therefore, if you order by the second column (the order ID) ascending and then by the first column (the record type) descending, you'll get each header row followed by its detail rows, because 'H' sorts after 'D'.
Also, make sure that you have the same number of columns in the two queries; otherwise, the union all won't be able to stack them on top of each other. Whether the columns also have to share a data type depends on the engine: some coerce mixed types freely, while others (PostgreSQL, for example) want compatible types in each position, in which case you can cast the mismatched columns to text, which is harmless since you're exporting to CSV anyway. Sometimes you'll have to pad columns with null if you need extra ones, or '' if you don't want the word null in your CSV.
select
    'H',
    OrderID,
    Date,
    CustomerID,
    Status
from
    Orders
union all
select
    'D',
    ParentID,
    ItemID,
    Quantity,
    Price
from
    OrderDetails
order by
    2 asc, 1 desc
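To go from that result set to the actual flat file, something along these lines could work. It is a sketch, assuming Python's built-in sqlite3 and csv modules and the sample data from the question; it writes to an in-memory buffer, where a real export would open a file instead.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER, Date TEXT, CustomerID INTEGER, Status TEXT);
CREATE TABLE OrderDetails (ParentID INTEGER, ItemID INTEGER, Quantity INTEGER, Price REAL);
INSERT INTO Orders VALUES
    (2345, '6/1/09', 856, 'Shipped'),
    (2346, '6/1/09', 474, 'Shipped');
INSERT INTO OrderDetails VALUES
    (2345, 52, 1, 1.50), (2345, 92, 2, 3.25), (2345, 74, 1, 9.99),
    (2346, 74, 1, 9.99), (2346, 52, 1, 1.50);
""")

# Order ID ascending keeps each order together; record type descending
# puts 'H' ahead of 'D' within an order.
rows = conn.execute("""
    SELECT 'H', OrderID, Date, CustomerID, Status FROM Orders
    UNION ALL
    SELECT 'D', ParentID, ItemID, Quantity, Price FROM OrderDetails
    ORDER BY 2 ASC, 1 DESC
""").fetchall()

# QUOTE_NONNUMERIC quotes the text fields and leaves numbers bare.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
csv_text = buffer.getvalue()
record_types = [row[0] for row in rows]

print(csv_text)
```

The order of the detail rows within one order is not pinned down by this ORDER BY; add a third sort key (ItemID, for instance) if that matters for the consumer of the file.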