postgres inner JOIN query out of memory - postgresql

I am trying to query a database using pgAdmin3 and I need to join two tables. I am using the following code:
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
But I keep running into this error:
an out of memory error
So I've tried to insert my result into a new table, as follows:
CREATE TABLE new_table AS
SELECT table1.species, table1.trait, table1.value, table1.units, table2.id, table2.family, table2.latitude, table2.longitude, table2.species as speciescheck
FROM table1 INNER JOIN table2
ON table1.species = table2.species
And still got an error:
ERROR: could not extend file "base/17675/43101.15": No space left on device
SQL state: 53100
Hint: Check free disk space.
I am very new at this (it is the first time I have had to deal with PostgreSQL) and I guess I can do something to optimize this query and avoid this type of error. I have no privileges in the database. Can anyone help?
Thanks in advance!
Updated:
Table 1 description
-- Table: table1
-- DROP TABLE table1;
CREATE TABLE table1
(
species character varying(100),
trait character varying(50),
value double precision,
units character varying(50)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table1
OWNER TO postgres;
GRANT ALL ON TABLE table1 TO postgres;
GRANT SELECT ON TABLE table1 TO banco;
-- Index: speciestable1_idx
-- DROP INDEX speciestable1_idx;
CREATE INDEX speciestable1_idx
ON table1
USING btree
(species COLLATE pg_catalog."default");
-- Index: traittype_idx
-- DROP INDEX traittype_idx;
CREATE INDEX traittype_idx
ON table1
USING btree
(trait COLLATE pg_catalog."default");
and table2 as:
-- Table: table2
-- DROP TABLE table2;
CREATE TABLE table2
(
id integer NOT NULL,
family character varying(40),
species character varying(100),
plotarea real,
latitude double precision,
longitude double precision,
source integer,
latlon geometry,
CONSTRAINT table2_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table2
OWNER TO postgres;
GRANT ALL ON TABLE table2 TO postgres;
GRANT SELECT ON TABLE table2 TO banco;
-- Index: latlon_gist
-- DROP INDEX latlon_gist;
CREATE INDEX latlon_gist
ON table2
USING gist
(latlon);
-- Index: species_idx
-- DROP INDEX species_idx;
CREATE INDEX species_idx
ON table2
USING btree
(species COLLATE pg_catalog."default");

You're performing a join between two tables on the column species.
Not sure what's in your data, but if species has far fewer distinct values than the number of rows (e.g. if species is "elephant" or "giraffe" and you're analyzing all animals in Africa), this join will match every elephant row in table1 with every elephant row in table2, producing a near-Cartesian result that can easily exhaust memory and disk.
When joining two tables you usually want to join on a unique or close-to-unique attribute, like id (not sure what id means in your case, but it could be the right column).
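A quick way to confirm this is to look at how many rows share each species value and to estimate how many rows the join would produce. A read-only diagnostic sketch using the table names from the question (no special privileges needed):
-- How many rows per species in each table?
SELECT species, count(*) AS n
FROM table2
GROUP BY species
ORDER BY n DESC
LIMIT 10;
-- Exact number of rows the equality join will produce:
SELECT sum(t1.n * t2.n) AS estimated_join_rows
FROM (SELECT species, count(*) AS n FROM table1 GROUP BY species) t1
JOIN (SELECT species, count(*) AS n FROM table2 GROUP BY species) t2
  ON t1.species = t2.species;
If the estimated row count is huge, the join key is too coarse and the query needs to be restricted or rewritten rather than simply rerun.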

Related

Postgres: change primary key in existing table

I have a situation with two tables where one has a foreign key pointing to the other table. Simplified schema:
CREATE TABLE table1 (
name VARCHAR(64) NOT NULL,
PRIMARY KEY(name)
);
CREATE TABLE table2 (
id SERIAL PRIMARY KEY,
table1_name VARCHAR(64) NOT NULL REFERENCES table1(name)
);
Now I regret using the name column as the primary key in table1 and would like to add an integer serial key instead. Since I already have data in the database I guess I need to do this carefully. My current plan is as follows:
Drop the foreign key constraint on table2(table1_name) with ALTER TABLE table2 DROP CONSTRAINT table2_table1_name_fkey;
Drop the primary key constraint on table1(name) with ALTER TABLE table1 DROP CONSTRAINT name_pkey;.
Add a unique constraint on table1(name) with ALTER TABLE table1 ADD UNIQUE(name);
Add an automatic primary key to table1 with ALTER TABLE table1 ADD COLUMN ID SERIAL PRIMARY KEY;.
Add a new column table1_id to table2 with ALTER TABLE table2 ADD COLUMN table1_id INT;
Update all rows in table2 - so that the new column (which will be promoted to a foreign key) gets the correct value - as inferred by the previous (still present) foreign key table1_name.
I have completed the steps up to and including step 5, but the UPDATE (with JOIN?) required to complete step 6 is beyond my SQL pay grade. My current (Google-based ...) attempt looks like this:
UPDATE
table2
SET
table2.table1_id = t1.id
FROM
table1 t1
LEFT JOIN table2 t2
ON t2.table1_name = t1.name;
You do not need a JOIN in the UPDATE. In PostgreSQL the FROM clause together with the WHERE condition already does the matching, and the column on the left of SET must not be prefixed with the table name:
UPDATE
table2 t2
SET
table1_id = t1.id
FROM
table1 t1
WHERE
t2.table1_name = t1.name;
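Once table1_id is populated, the remaining steps of the plan might look roughly like this; the foreign key constraint name is an illustrative assumption:
-- Make the new column mandatory and point the foreign key at table1(id)
ALTER TABLE table2 ALTER COLUMN table1_id SET NOT NULL;
ALTER TABLE table2
    ADD CONSTRAINT table2_table1_id_fkey
    FOREIGN KEY (table1_id) REFERENCES table1(id);
-- Optionally drop the old text column once nothing else depends on it
ALTER TABLE table2 DROP COLUMN table1_name;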

Postgres - Oracle data type conversion

We have a foreign table that is connecting to Oracle. In Oracle, the columns are:
ticker: VARCHAR2(5)
article_id: NUMBER
In Postgres, we have tried to create the article_id as INTEGER and NUMERIC, but every time we try to query it we get this error:
column "article_id" of foreign table "latest_article_id" cannot be converted to or from Oracle data type
How can we create this foreign table so we can query it? The article_id is a number, so are there additional commands we must use?
We are on Postgres 10.10.
CREATE FOREIGN TABLE latest_article_id
(ticker VARCHAR,
article_id NUMERIC)
SERVER usercomm
OPTIONS ( table '(SELECT article_id, ticker
FROM (SELECT a.article_id, t.ticker,
ROW_NUMBER() OVER (PARTITION BY t.ticker
ORDER BY a.publish_date DESC NULLS LAST) AS rnum
FROM tickers t, article_tickers at, articles a
WHERE t.ticker_id = at.ticker_id
AND at.article_id = a.article_id
AND a.status_id = 6
AND a.pull_flag = ''Y'')
WHERE rnum = 1)');
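With that definition in place, the foreign table can be queried like an ordinary table; a minimal usage sketch (the ticker value is just an illustrative placeholder):
SELECT ticker, article_id
FROM latest_article_id
WHERE ticker = 'ABC';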

I'm getting 'column "my_column" contains null values' when adding a composite primary key

Is it not supposed to delete null values before altering the table? I'm confused...
My query looks roughly like this:
BEGIN;
DELETE FROM my_table
WHERE my_column IS NULL;
ALTER TABLE my_table DROP CONSTRAINT my_table_pk;
ALTER TABLE my_table ADD PRIMARY KEY (id, my_column);
-- this is to repopulate the data afterwards
INSERT INTO my_table (name, other_table_id, my_column)
SELECT
ya.name,
ot.id,
my_column
FROM other_table ot
LEFT JOIN yet_another ya
ON ya.id = ot."fileId"
WHERE NOT EXISTS (
SELECT
1
FROM my_table mt
WHERE ot.id = mt.other_table_id AND ot.my_column = mt.my_column
) AND my_column IS NOT NULL;
COMMIT;
Sorry for the naming.
There are two possible explanations:
A concurrent session inserted a new row with a NULL value between the start of the DELETE and the start of ALTER TABLE.
To avoid that, lock the table in SHARE mode before you DELETE (see the sketch below).
There is a row where id has a NULL value.
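A minimal sketch that addresses both points, using the table and column names from the question:
BEGIN;
-- SHARE mode blocks concurrent INSERT/UPDATE/DELETE until COMMIT,
-- so no new NULL rows can appear between the DELETE and the ALTER TABLE
LOCK TABLE my_table IN SHARE MODE;
DELETE FROM my_table WHERE my_column IS NULL;
-- Rule out the second explanation: the other key column must not contain NULLs either
SELECT count(*) FROM my_table WHERE id IS NULL;
ALTER TABLE my_table DROP CONSTRAINT my_table_pk;
ALTER TABLE my_table ADD PRIMARY KEY (id, my_column);
COMMIT;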

PostgreSQL count other values of ID that have the same value of other column

Let's say we have the following table that stores id of an observation and its address_id. You can create the table with the following code:
drop table if exists schema.pl_address_cnt;
create table schema.pl_address_cnt (
id serial,
address_id int);
insert into schema.pl_address_cnt(address_id) values
(100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
My task is to count, for each id, how many other ids (thus the -1) have the same address_id. I've come up with a solution that turns out to be quite expensive (per EXPLAIN) on the original dataset. I wonder whether my solution can somehow be optimised.
with tmp_table as (select address_id
, count(distinct id) as id_count
from schema.pl_address_cnt
group by address_id
)
select id
, id_count - 1
from schema.pl_address_cnt as pac
left join tmp_table as tt on tt.address_id=pac.address_id;
You can try to omit the CTE and do a self left join on the same address_id but a different id, and then aggregate:
SELECT pac1.id,
count(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
ON pac1.address_id = pac2.address_id
AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id;
For performance you can try indexes on (address_id, id) and (id).
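A sketch of those two indexes (the index names are illustrative):
CREATE INDEX pl_address_cnt_address_id_id_idx
    ON schema.pl_address_cnt (address_id, id);
CREATE INDEX pl_address_cnt_id_idx
    ON schema.pl_address_cnt (id);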

Use COPY FROM command in PostgreSQL to insert in multiple tables

I'm trying to use the performance of the COPY FROM command in PostgreSQL to load all the data for one table from a CSV file (CSV -> table1), and I also need to insert other data into a second table. I will need the primary key of the first table to use as a foreign key in the second table.
Example:
I need to insert 1,000,000 names into table1 and 500,000 names into table2, but every name in table2 references one tuple in table1.
CREATE TABLE table1 (
table1Id bigserial NOT NULL,
Name varchar(100) NULL,
CONSTRAINT table1Id PRIMARY KEY (table1Id)
);
CREATE TABLE table2 (
table2Id bigserial NOT NULL,
Other_name varchar(100) NOT NULL,
table1_table1Id int8 NOT NULL,
CONSTRAINT table2_pk PRIMARY KEY (table2Id)
);
The COPY command does not allow table manipulations while copying data (such as looking up another table to fetch the proper foreign keys to insert). To fill table2 with the ids of the corresponding rows from table1, you need to drop the NOT NULL constraint on that column, COPY the data, and then UPDATE the column separately.
Assuming table1 and table2 can be joined on table1.Name = table2.Other_name, the code is:
Before COPY:
ALTER TABLE table2 ALTER COLUMN table1_table1Id DROP NOT NULL;
After COPY:
UPDATE table2 SET table1_table1Id = table1.table1Id
FROM table1
WHERE table1.Name = table2.Other_name;
ALTER TABLE table2 ALTER COLUMN table1_table1Id SET NOT NULL;
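Putting the whole sequence together, a minimal end-to-end sketch; the file paths and the assumption that each CSV contains a single name column are illustrative:
-- Load the parent rows first
COPY table1 (Name) FROM '/tmp/names.csv' WITH (FORMAT csv);
-- Relax the constraint so the child rows can be loaded without their foreign key
ALTER TABLE table2 ALTER COLUMN table1_table1Id DROP NOT NULL;
COPY table2 (Other_name) FROM '/tmp/other_names.csv' WITH (FORMAT csv);
-- Fill in the foreign key by matching names, then restore the constraint
UPDATE table2 SET table1_table1Id = table1.table1Id
FROM table1
WHERE table1.Name = table2.Other_name;
ALTER TABLE table2 ALTER COLUMN table1_table1Id SET NOT NULL;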