Insert new records from different table on condition - postgresql

I am trying to add latitude and longitude values to table_1 from table_2 via a JOIN, but I am not getting the correct result.
I have table_1 with columns address, latitude, longitude, where latitude and longitude are blank.
table_2 has columns address, latitude, longitude, name, etc.
In both tables, address holds just a city name.
insert into table_1 (latitude, longitude)
select table_2.latitude, table_2.longitude from table_2 JOIN table_1 ON table_1.address = table_2.address;
Getting this output in table_1:

address  latitude  longitude
city_1   NULL      NULL
city_2   NULL      NULL
city_3   NULL      NULL
NULL     123.12    123.12
NULL     123.12    123.12
NULL     123.12    123.12

Where I am expecting something like this:

address  latitude  longitude
city_1   123.12    123.12
city_2   123.12    123.12
city_3   123.12    123.12

In your case you should use an UPDATE statement instead of INSERT, since the rows in table_1 already exist; you just need to update them:
update table_1
set latitude  = mapped_t2.latitude,
    longitude = mapped_t2.longitude
from (
    select table_2.address, table_2.latitude, table_2.longitude
    from table_2
    join table_1 on table_1.address = table_2.address
) as mapped_t2
where mapped_t2.address = table_1.address
Demo in dbfiddle
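The fix can be sketched end-to-end with Python's built-in sqlite3 (the table and column names mirror the question; a correlated subquery is used instead of Postgres's UPDATE ... FROM, since that form needs SQLite 3.33+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (address TEXT, latitude REAL, longitude REAL);
    CREATE TABLE table_2 (address TEXT, latitude REAL, longitude REAL);
    INSERT INTO table_1 (address) VALUES ('city_1'), ('city_2'), ('city_3');
    INSERT INTO table_2 VALUES ('city_1', 123.12, 123.12),
                               ('city_2', 123.12, 123.12),
                               ('city_3', 123.12, 123.12);
""")

# UPDATE the existing rows instead of INSERTing new (half-empty) ones.
conn.execute("""
    UPDATE table_1
    SET latitude  = (SELECT t2.latitude  FROM table_2 t2
                     WHERE t2.address = table_1.address),
        longitude = (SELECT t2.longitude FROM table_2 t2
                     WHERE t2.address = table_1.address)
    WHERE EXISTS (SELECT 1 FROM table_2 t2
                  WHERE t2.address = table_1.address)
""")

rows = conn.execute("SELECT * FROM table_1 ORDER BY address").fetchall()
print(rows)  # → [('city_1', 123.12, 123.12), ('city_2', 123.12, 123.12), ('city_3', 123.12, 123.12)]
```

Each existing row keeps its address and gains the coordinates from the matching table_2 row, which is exactly the expected output above.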

Related

How to upsert based on values from another table?

I'm trying to UPSERT into postgres DB based on values from another table using PeeWee.
**table1**
pk_t1 int
name
city
country
**table2**
pk_t2 int
name
city
country
comments
INSERT INTO table2 (pk_t2, name, city, country)
SELECT pk_t1, name, city, country
FROM table1
ON CONFLICT (pk_t2) DO UPDATE
SET name = excluded.name, city = excluded.city, country = excluded.country;
But I'm unable to find a suitable peewee example from documents or SO.
Here you go:
q = T1.select()
iq = (T2
      .insert_from(q, fields=[T2.id, T2.name, T2.city, T2.country])
      .on_conflict(conflict_target=[T2.id],
                   preserve=[T2.name, T2.city, T2.country]))
Corresponding SQL peewee generates:
insert into t2 (id, name, city, country)
select t1.id, t1.name, t1.city, t1.country
from t1
on conflict(id) do update set
name=excluded.name,
city=excluded.city,
country=excluded.country
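The generated SQL can be verified with sqlite3, which has supported the same INSERT ... ON CONFLICT DO UPDATE syntax since SQLite 3.24 (sample data is assumed; SQLite needs a WHERE clause on the SELECT to disambiguate the ON keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (pk_t1 INTEGER PRIMARY KEY, name TEXT, city TEXT,
                         country TEXT);
    CREATE TABLE table2 (pk_t2 INTEGER PRIMARY KEY, name TEXT, city TEXT,
                         country TEXT, comments TEXT);
    INSERT INTO table1 VALUES (1, 'Ann', 'Oslo', 'NO'), (2, 'Bob', 'Graz', 'AT');
    -- pre-existing row: will be updated, not duplicated
    INSERT INTO table2 VALUES (1, 'old', 'old', 'old', 'keep me');
""")

conn.execute("""
    INSERT INTO table2 (pk_t2, name, city, country)
    SELECT pk_t1, name, city, country FROM table1
    WHERE true  -- required by SQLite's parser before ON CONFLICT
    ON CONFLICT (pk_t2) DO UPDATE
    SET name = excluded.name, city = excluded.city, country = excluded.country
""")

rows = conn.execute(
    "SELECT pk_t2, name, comments FROM table2 ORDER BY pk_t2").fetchall()
print(rows)  # → [(1, 'Ann', 'keep me'), (2, 'Bob', None)]
```

The conflicting row is updated in place (its comments column survives), while the non-conflicting row is inserted normally.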

Join multiple tables of Amazon Redshift into a single one obtains error: column X is of type boolean but expression is of type character varying

I am trying to join multiple tables of Amazon Redshift into a single table.
One of the initial tables is this one:
create table order_customers(
id int,
email varchar(254),
phone varchar(50),
customer_id int,
order_id int NOT NULL,
ip text,
geoip_location varchar(1024),
logged_in boolean,
PRIMARY KEY (id),
FOREIGN KEY (order_id) REFERENCES orders (id)
);
I am using the command to insert the data into the large table:
INSERT INTO orders_large ( id, showid, created_at, status, status_enum, currency, tax_orders, shipping, discount_orders,
subtotal, total, store_id, payment_method_id, shipping_method_name, shipping_method_id, additional_information,
payment_information, locale, shipping_required_orders, payment_method_type, coupons, payment_notification_id,
recover_token, updated_at, external, shipping_tax, shipping_discount, shipping_discount_decimal,
completed_at, payment_name, shipping_service_id, app_id, fulfillment_status, date_traffic_sources,
landing_url, referral_url, referral_code, utm_campaign, utm_source, utm_term, utm_medium, utm_content,
user_agent, subscription_id_traffic_sources, email, phone, customer_id_order_customers,
ip, geoip_location, logged_in, name, surname, company, address, street_number, city, postal, country, region,
type, taxid, default_, region_format, municipality, latitude, longitude, subscription_id_addresses,
customer_id_addresses, pickup_point_id, taxid_type, sku, qty, price, product_id, weight,
product_option_property_id, discount_order_products, shipping_required_orders_products, brand,
tax_order_products, width, height, length, volume, diameter, package_format)
SELECT o.id, o.showid, o.created_at, o.status, o.status_enum, o.currency, o.tax, o.shipping, o.discount,
o.subtotal, o.total, o.store_id, o.payment_method_id, o.shipping_method_name, o.shipping_method_id, o.additional_information,
o.payment_information, o.locale, o.shipping_required, payment_method_type, coupons, payment_notification_id,
recover_token, o.updated_at, o.external, shipping_tax, o.shipping_discount, shipping_discount_decimal,
completed_at, payment_name, shipping_service_id, o.app_id, o.fulfillment_status, t.date,
t.landing_url, t.referral_url, t.referral_code, t.utm_campaign, t.utm_source, t.utm_term, t.utm_medium, t.utm_content,
t.user_agent, t.subscription_id, oc.email, oc.phone, oc.customer_id, oc.order_id, oc.ip,
oc.geoip_location, oc.logged_in, a.name, a.surname, a.company, a.address, a.street_number, a.city, a.postal,
a.country, a.region, a.type, a.taxid, a.default_, a.region_format, a.municipality, a.latitude, a.longitude, a.order_id,
a.subscription_id, a.customer_id, a.pickup_point_id, a.taxid_type, op.sku, op.qty, op.price, op.product_id,
op.order_id, op.weight, op.product_option_property_id, op.discount, op.shipping_required, op.brand,
op.tax, op.width, op.height, op.length, op.volume, op.diameter, op.package_format
FROM orders o
INNER JOIN traffic_sources t ON o.id = t.order_id
INNER JOIN order_customers oc ON o.id = oc.order_id
INNER JOIN addresses a ON o.id = a.order_id
INNER JOIN order_products op ON o.id = op.order_id;
And I obtain this error message:
ERROR: column "logged_in" is of type boolean but expression is of type character varying Hint: You will need to rewrite or cast the expression.
I tried using DECODE(oc.logged_in, 'false', '0', 'true', '1')::varchar::bool on the oc.logged_in field, but another error message appears:
ERROR: cannot cast type character varying to boolean
The problem was the correspondence between the fields. It worked after removing the fields with "order_id". To cast in Redshift there are two options:
CONVERT ( type, expression )
CAST ( expression AS type ) or expression :: type
Source: https://docs.aws.amazon.com/redshift/latest/dg/r_CAST_function.html#convert-function
As the message says - column "logged_in" is of type boolean
So in your DECODE you need to compare it to boolean values, not strings. Try:
DECODE(oc.logged_in, true, 'true', 'false')
The code above works for my understanding of your issue. Below is test SQL which runs fine on Redshift.
create table oc as (select 1=1 as logged_in union all select 1=0);
select * from oc;
select DECODE(oc.logged_in, true, 'true string', 'false string') as test from oc;
I now expect that the issue is not in using oc.logged_in but rather in orders_large.logged_in and what you are putting into it. What data type is logged_in defined as in orders_large? Boolean, I assume, which should take a boolean value just fine without casting.
Looking at your SQL I see that the number of elements in the INSERT clause doesn't match the number of elements in the SELECT clause. This mismatch causes your SQL to try to put a different (text) value into orders_large.logged_in. Here's a "diff" between the two lists (INSERT on the left / SELECT on the right):
id id
showid showid
created_at created_at
status status
status_enum status_enum
currency currency
tax_orders | tax
shipping shipping
discount_orders | discount
subtotal subtotal
total total
store_id store_id
payment_method_id payment_method_id
shipping_method_name shipping_method_name
shipping_method_id shipping_method_id
additional_information additional_information
payment_information payment_information
locale locale
shipping_required_orders | shipping_required
payment_method_type payment_method_type
coupons coupons
payment_notification_id payment_notification_id
recover_token recover_token
updated_at updated_at
external external
shipping_tax shipping_tax
shipping_discount shipping_discount
shipping_discount_decimal shipping_discount_decimal
completed_at completed_at
payment_name payment_name
shipping_service_id shipping_service_id
app_id app_id
fulfillment_status fulfillment_status
date_traffic_sources | date
landing_url landing_url
referral_url referral_url
referral_code referral_code
utm_campaign utm_campaign
utm_source utm_source
utm_term utm_term
utm_medium utm_medium
utm_content utm_content
user_agent user_agent
subscription_id_traffic_sources | subscription_id
email email
phone phone
customer_id_order_customers | customer_id
> order_id
ip ip
geoip_location geoip_location
logged_in logged_in
name name
surname surname
company company
address address
street_number street_number
city city
postal postal
country country
region region
type type
taxid taxid
default_ default_
region_format region_format
municipality municipality
latitude latitude
longitude longitude
subscription_id_addresses | order_id
customer_id_addresses | subscription_id
> customer_id
pickup_point_id pickup_point_id
taxid_type taxid_type
sku sku
qty qty
price price
product_id product_id
> order_id
weight weight
product_option_property_id product_option_property_id
discount_order_products | discount
shipping_required_orders_products | shipping_required
brand brand
tax_order_products | tax
width width
height height
length length
volume volume
diameter diameter
package_format package_format
As you can see there is an unmatched "order_id" in the SELECT list just a couple of columns before logged_in. You need to fix the column alignment.
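This kind of misalignment can be found mechanically. A small, hypothetical helper using Python's difflib; the column lists below are shortened excerpts from the question, and the suffix list is an assumption about the renaming convention used:

```python
import difflib

# INSERT-list names (left side of the diff above) and SELECT-list names.
insert_cols = ["email", "phone", "customer_id_order_customers",
               "ip", "geoip_location", "logged_in"]
select_cols = ["email", "phone", "customer_id", "order_id",
               "ip", "geoip_location", "logged_in"]

def base(name: str) -> str:
    """Strip rename suffixes so e.g. customer_id_order_customers
    still lines up with customer_id (assumed suffix convention)."""
    for suffix in ("_order_customers", "_orders", "_traffic_sources"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name

diff = list(difflib.ndiff([base(c) for c in insert_cols], select_cols))
for line in diff:
    print(line)
# Lines starting with "+ " are SELECT-only; "+ order_id" is the extra
# oc.order_id that shifts every later value one slot to the left.
```

Running this on the full 80-odd column lists from the question would surface all three unmatched order_id entries at once.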

Postgres: one query with multiple JOINs vs multiple queries

I am working on Postgres 9.6 with PostGIS 2.3, hosted on AWS RDS. I'm trying to optimize some geo-radius queries for data that comes from different tables.
I'm considering two approaches: single query with multiple joins or two separate but simpler queries.
At a high level, and simplifying the structure, my schema is:
CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE addresses (
id bigint NOT NULL,
latitude double precision,
longitude double precision,
line1 character varying NOT NULL,
"position" geography(Point,4326),
CONSTRAINT enforce_srid CHECK ((st_srid("position") = 4326))
);
CREATE INDEX index_addresses_on_position ON addresses USING gist ("position");
CREATE TABLE locations (
id bigint NOT NULL,
uuid uuid DEFAULT uuid_generate_v4() NOT NULL,
address_id bigint NOT NULL
);
CREATE TABLE shops (
id bigint NOT NULL,
name character varying NOT NULL,
location_id bigint NOT NULL
);
CREATE TABLE inventories (
id bigint NOT NULL,
shop_id bigint NOT NULL,
status character varying NOT NULL
);
The addresses table holds the geographical data. The position column is calculated from the lat-lng columns when the rows are inserted or updated.
Each address is associated to one location.
Each address may have many shops, and each shop will have one inventory.
I've omitted them for brevity, but all the tables have the proper foreign key constraints and btree indexes on the reference columns.
The tables have a few hundreds of thousands of rows.
With that in place, my main use case can be satisfied by this single query, which searches for addresses within 1000 meters from a central geographical point (10.0, 10.0) and returns data from all the tables:
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.uuid AS location_uuid,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
JOIN locations l ON l.address_id = a.id
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
This query works, and EXPLAIN ANALYZE shows that it does correctly use the GIST index.
However, I could also split this query in two and manage the intermediate results in the application layer. For example, this works too:
--- only search for the addresses
SELECT
a.id as addr_id,
a.line1 AS addr_line,
a.latitude AS lat,
a.longitude AS lng
FROM addresses a
WHERE ST_DWithin(
a.position, -- the position of each address
ST_SetSRID(ST_Point(10.0, 10.0), 4326), -- the center of the circle
1000, -- radius distance in meters
true
);
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE
l.address_id IN (1, 2, 3, 4, 5) -- potentially thousands of values
;
where the values in l.address_id IN (1, 2, 3, 4, 5) come from the first query.
The query plans for the two split queries look simpler than the first one's, but I wonder if that in itself means that the second solution is better.
I know that inner joins are pretty well optimized, and that a single round-trip to the DB would be preferable.
What about memory usage? Or resource contention on the tables? (e.g. locks)
I (re-)combined your second approach into a single query, using IN (...):
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE l.address_id IN ( --- only search for the addresses
SELECT a.id
FROM addresses a
WHERE ST_DWithin(a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
);
Or, similarly, using EXISTS (...):
--- get the rest of the data
SELECT
s.id AS shop_id,
s.name AS shop_name,
i.status AS inventory_status,
l.id AS location_id,
l.uuid AS location_uuid
FROM locations l
JOIN shops s ON s.location_id = l.id
JOIN inventories i ON i.shop_id = s.id
WHERE EXISTS ( SELECT * --- only search for the addresses
FROM addresses a
WHERE a.id = l.address_id
AND ST_DWithin( a.position, ST_SetSRID(ST_Point(10.0, 10.0), 4326), 1000, true)
);
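The shape of the IN (subquery) rewrite can be sketched with sqlite3, substituting a plain bounding-box predicate for PostGIS's ST_DWithin (sqlite3 has no geography type; tables and data below are a simplified stand-in for the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY,
                            latitude REAL, longitude REAL);
    CREATE TABLE locations (id INTEGER PRIMARY KEY, address_id INTEGER);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, name TEXT,
                        location_id INTEGER);
    INSERT INTO addresses VALUES (1, 10.001, 10.001), (2, 50.0, 50.0);
    INSERT INTO locations VALUES (1, 1), (2, 2);
    INSERT INTO shops VALUES (1, 'near', 1), (2, 'far', 2);
""")

# One round-trip: the geo filter feeds the joins as an IN subquery.
rows = conn.execute("""
    SELECT s.name
    FROM locations l
    JOIN shops s ON s.location_id = l.id
    WHERE l.address_id IN (
        SELECT a.id
        FROM addresses a
        WHERE abs(a.latitude - 10.0)  < 0.01   -- stand-in for ST_DWithin
          AND abs(a.longitude - 10.0) < 0.01)
""").fetchall()
print(rows)  # → [('near',)]
```

Unlike the two-query version, the planner sees the whole problem at once and there is no risk of an IN (...) list with thousands of literal values.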

Returning rows with distinct column value with data jpa named query

Assuming I have a table with 3 columns (ID, Name, City) and I want to use a named query to return rows with a unique city, can it be done?
Are you asking whether it is possible to write a query that returns the cities that appear in exactly one row? That is, in a table of ID/Name/City triplets where the same city may appear in multiple rows with different names.
If so, it would depend on the database engine behind the scenes - but you could try things like:
with candidates (city, num) as (
    select city, count(*) from table
    group by city
)
select city from candidates where num = 1
Or
select t1.city from table t1
where not exists (
select * from table t2
where t2.city = t1.city and t2.id <> t1.id
)
where table is your table with these triplets.
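The CTE variant can be exercised with sqlite3; the table name people below is a stand-in for the reserved word table, and the sample rows are assumed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO people VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Oslo'),
                              (3, 'Cyd', 'Graz');
""")

# Cities that appear in exactly one row.
rows = conn.execute("""
    WITH candidates (city, num) AS (
        SELECT city, count(*) FROM people GROUP BY city
    )
    SELECT city FROM candidates WHERE num = 1
""").fetchall()
print(rows)  # → [('Graz',)]
```

Oslo appears twice so it is filtered out; only the once-seen city survives, matching the NOT EXISTS variant as well.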

Find rows with same values in multiple columns with IDs

I have a relatively big table with a lot of columns and rows.
Among them I have ID, longitude, and latitude.
I would like a list of IDs which have the same coordinates (latitude and longitude), something like this:
ID ¦ latitude ¦ longitude ¦ number
1  ¦    12.12 ¦     34.54 ¦ 1
12 ¦    12.12 ¦     34.54 ¦ 1
52 ¦    12.12 ¦     34.54 ¦ 1
3  ¦    56.08 ¦    -45.87 ¦ 1
67 ¦    56.08 ¦    -45.87 ¦ 1
Thanks
You can either use an EXISTS query:
select *
from the_table t1
where exists (select 1
from the_table t2
where t1.id <> t2.id
and (t1.latitude, t1.longitude) = (t2.latitude, t2.longitude))
order by latitude, longitude;
or a window function:
select *
from (
select t.*,
count(*) over (partition by latitude, longitude) as cnt
from the_table t
) t
where cnt > 1
order by latitude, longitude;
Online example: http://rextester.com/ITKJ70005
Simple solution:
SELECT
t.id, t.latitude, t.longitude, grp.tot
FROM
your_table t INNER JOIN (
SELECT latitude, longitude, count(*) AS tot
FROM your_table
GROUP BY latitude, longitude
HAVING count(*) > 1
) grp ON (t.latitude = grp.latitude AND t.longitude = grp.longitude);
Or to get duplicates for lat/lng:
SELECT
latitude, longitude,
array_agg(id ORDER BY id) AS ids
FROM
place
GROUP BY
latitude, longitude
HAVING
count(*) > 1;
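The window-function answer runs unchanged on sqlite3 (window functions need SQLite 3.25+); the sample coordinates follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (id INTEGER, latitude REAL, longitude REAL);
    INSERT INTO the_table VALUES (1, 12.12, 34.54), (12, 12.12, 34.54),
                                 (52, 12.12, 34.54), (3, 56.08, -45.87),
                                 (67, 56.08, -45.87), (99, 0.0, 0.0);
""")

# Count rows per coordinate pair, then keep only the duplicated pairs.
rows = conn.execute("""
    SELECT id FROM (
        SELECT t.*, count(*) OVER (PARTITION BY latitude, longitude) AS cnt
        FROM the_table t
    ) t
    WHERE cnt > 1
    ORDER BY latitude, longitude, id
""").fetchall()
ids = [r[0] for r in rows]
print(ids)  # → [1, 12, 52, 3, 67]
```

Row 99 has unique coordinates, so it is excluded; the rest are grouped by shared (latitude, longitude), exactly as in the expected output.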