I have a table in Postgres that I would like to copy into from a csv file. I usually do as so:
\copy my_table from '/workdir/some_file.txt' with null as 'NULL' delimiter E'|' csv header;
The problem now, however, is that my_table has one extra column that I would like to fill in manually on copy, with the same value 'b' for every row. Here are my tables:
some_file.txt:
col1 | col2 | col3
0    | 0    | 1
0    | 1    | 3
my_table:
xtra_col | col1 | col2 | col3
a        | 5    | 2    | 5
a        | 6    | 2    | 5
a        | 7    | 2    | 5
Desired my_table after the copy:
xtra_col | col1 | col2 | col3
a        | 5    | 2    | 5
a        | 6    | 2    | 5
a        | 7    | 2    | 5
b        | 0    | 0    | 1
b        | 0    | 1    | 3
Is there a way to specify the constant value 'b' for the column xtra_col in the copy statement itself? If not, how should I approach this problem?
You could set a (temporary) default value for the xtra_col:
ALTER TABLE my_table ALTER COLUMN xtra_col SET DEFAULT 'b';
COPY my_table (col1, col2, col3) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);
ALTER TABLE my_table ALTER COLUMN xtra_col DROP DEFAULT;
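One caveat: while the temporary default is in place, any other session inserting into my_table either blocks on the ALTER TABLE lock or, if the three statements are run separately with autocommit, picks up 'b' as well. A minimal sketch (the same statements as above, just wrapped in one transaction) keeps the temporary default from ever being visible outside this session:

BEGIN;
ALTER TABLE my_table ALTER COLUMN xtra_col SET DEFAULT 'b';
COPY my_table (col1, col2, col3) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);
ALTER TABLE my_table ALTER COLUMN xtra_col DROP DEFAULT;
COMMIT;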
Is there a way to avoid repeating the columns of my_table? The real my_table has 20 columns and I wouldn't want to list all of them.
If my_table has a lot of columns and you wish to avoid having to type out all the column names,
you could dynamically generate the COPY command like this:
SELECT format($$COPY my_table(%s) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);$$
             , string_agg(quote_ident(attname), ',' ORDER BY attnum))
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attname != 'xtra_col'
  AND attnum > 0          -- skip system columns such as ctid and xmin
  AND NOT attisdropped;   -- skip dropped columns
You could then copy-and-paste the generated SQL to run it.
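With the example table above, the generated statement would come out roughly as:

COPY my_table(col1,col2,col3) FROM '/workdir/some_file.txt' WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true);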
Or, for totally hands-free operation, you could create a function to generate the SQL and execute it:
CREATE OR REPLACE FUNCTION test_func(filepath text, xcol text, fillval text)
RETURNS void
LANGUAGE plpgsql
AS $func$
DECLARE sql text;
BEGIN
    -- Temporarily give the extra column a default so COPY fills it in.
    EXECUTE format('ALTER TABLE my_table ALTER COLUMN %I SET DEFAULT %L', xcol, fillval);

    -- Build a COPY statement listing every ordinary column except the extra one.
    SELECT format($$COPY my_table(%s) FROM %L WITH (FORMAT CSV, DELIMITER '|', NULL 'NULL', HEADER true)$$
                 , string_agg(quote_ident(attname), ',' ORDER BY attnum), filepath)
    INTO sql
    FROM pg_attribute
    WHERE attrelid = 'my_table'::regclass
      AND attname != xcol
      AND attnum > 0          -- skip system columns
      AND NOT attisdropped;   -- skip dropped columns

    EXECUTE sql;

    EXECUTE format('ALTER TABLE my_table ALTER COLUMN %I DROP DEFAULT', xcol);
END;
$func$;
SELECT test_func('/workdir/some_file.txt', 'xtra_col', 'b');
This is the SQL I used to test the solution above:
DROP TABLE IF EXISTS test;
CREATE TABLE test (
xtra_col text
, col1 int
, col2 int
, col3 int
);
INSERT INTO test VALUES
('a', 5, 2, 5)
, ('a', 6, 2, 5)
, ('a', 7, 2, 5);
with the contents of /tmp/data being
col1 | col2 | col3
0 | 0 | 1
0 | 1 | 3
Then
SELECT test_func('/tmp/data', 'xtra_col', 'b');
SELECT * FROM test;
results in
+----------+------+------+------+
| xtra_col | col1 | col2 | col3 |
+----------+------+------+------+
| a | 5 | 2 | 5 |
| a | 6 | 2 | 5 |
| a | 7 | 2 | 5 |
| b | 0 | 0 | 1 |
| b | 0 | 1 | 3 |
+----------+------+------+------+
(5 rows)
Regarding the pg.dropped column:
The test_func call does not seem to produce the pg.dropped column, at least on the test table used above:
unutbu=# SELECT *
FROM pg_attribute
WHERE attrelid = 'test'::regclass;
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
| 53393 | tableoid | 26 | 0 | 4 | -7 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmax | 29 | 0 | 4 | -6 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmax | 28 | 0 | 4 | -5 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | cmin | 29 | 0 | 4 | -4 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | xmin | 28 | 0 | 4 | -3 | 0 | -1 | -1 | t | p | i | t | f | | f | t | 0 | 0 | | | |
| 53393 | ctid | 27 | 0 | 6 | -1 | 0 | -1 | -1 | f | p | s | t | f | | f | t | 0 | 0 | | | |
| 53393 | xtra_col | 25 | -1 | -1 | 1 | 0 | -1 | -1 | f | x | i | f | f | | f | t | 0 | 100 | | | |
| 53393 | col1 | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col2 | 23 | -1 | 4 | 3 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
| 53393 | col3 | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | f | f | | f | t | 0 | 0 | | | |
+----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------+
(10 rows)
As far as I know, a pg.dropped entry in pg_attribute is a natural result of how PostgreSQL handles dropping a column: the attribute row is kept with its (still positive) attnum, its attname is rewritten to ........pg.dropped.N........, and attisdropped is set to true. So no fix is necessary.
The attnum > 0 condition in test_func only filters out system columns such as ctid and xmin; it is the NOT attisdropped condition that keeps dropped columns out of the generated list of column names.
My experience with PostgreSQL is limited, so corrections are welcome.
I usually load the file into a temporary table and then insert (or update) from there. In this case:
CREATE TEMP TABLE input (LIKE my_table);
ALTER TABLE input DROP xtra_col;
\copy input from 'some_file.txt' ...
INSERT INTO my_table
SELECT 'b', * FROM input;
The INSERT statement looks tidy, but that only works when the columns you want to exclude sit at either end of my_table. In your (probably simplified) example, xtra_col is at the front, so we can prepend the constant and append the remaining columns with *.
If the arrangement of the CSV file's columns differs from my_table by more than that, you'll need to start typing out column names, as in the sketch below.
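For illustration only, here is a minimal sketch of that explicit form, assuming a hypothetical target table whose extra column sits in the middle (the name target and its layout are made up):

-- hypothetical layout: target(col1, xtra_col, col2, col3); the CSV only supplies col1, col2, col3
CREATE TEMP TABLE input (col1 int, col2 int, col3 int);

\copy input from 'some_file.txt' with null as 'NULL' delimiter E'|' csv header

INSERT INTO target (col1, xtra_col, col2, col3)
SELECT col1, 'b', col2, col3
FROM input;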
I have a table as such (tbl):
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
| 3 | faso | 11 |
+----+------+-----+
And another table in n-to-1 relationship with tbl (tbl2):
+----+-----+
| pk | rel |
+----+-----+
| 0 | 0 |
| 1 | 1 |
| 2 | 0 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 1 |
| 7 | 2 |
+----+-----+
(tbl2.rel -> tbl.pk.)
I would like to select only the rows from tbl which are in relationship with at least n rows from tbl2.
I.e., for n = 2, I want this table:
+----+------+-----+
| pk | attr | val |
+----+------+-----+
| 0 | ohif | 4 |
| 1 | foha | 56 |
| 2 | slns | 2 |
+----+------+-----+
This is the solution I came up with:
SELECT DISTINCT ON (tbl.pk) tbl.*
FROM (
SELECT tbl.pk
FROM tbl
RIGHT OUTER JOIN tbl2 ON tbl2.rel = tbl.pk
GROUP BY tbl.pk
HAVING COUNT(tbl2.*) >= 2 -- n
) AS tbl_candidates
LEFT OUTER JOIN tbl ON tbl_candidates.pk = tbl.pk
Can it be done without selecting the candidates with a subquery and re-joining the table with itself?
I'm on Postgres 10. A standard SQL solution would be better, but a Postgres solution is acceptable.
OK, just join once, as below:
select
t1.pk,
t1.attr,
t1.val
from
tbl t1
join
tbl2 t2 on t1.pk = t2.rel
group by
t1.pk,
t1.attr,
t1.val
having(count(1)>=2) order by t1.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
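As a side note (assuming tbl.pk is the primary key, which the question does not state explicitly), PostgreSQL 9.1+ treats the other columns as functionally dependent on the primary key, so the GROUP BY list can shrink to just the key:

select t1.*
from tbl t1
join tbl2 t2 on t1.pk = t2.rel
group by t1.pk
having count(*) >= 2
order by t1.pk;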
Or join once and use a CTE (WITH clause), as below:
with tmp as (
select rel from tbl2 group by rel having(count(1)>=2)
)
select b.* from tmp t join tbl b on t.rel = b.pk order by b.pk;
pk | attr | val
----+------+-----
0 | ohif | 4
1 | foha | 56
2 | slns | 2
(3 rows)
Is the SQL clearer?
I have this table in my database:
| id | desc |
|-------------|
| 1 | A |
| 2 | B |
| NULL | C |
| 3 | D |
| NULL | D |
| NULL | E |
| 4 | F |
---------------
And I want to transform this table into one that replaces the nulls with consecutive negative ids:
| id | desc |
|-------------|
| 1 | A |
| 2 | B |
| -1 | C |
| 3 | D |
| -2 | D |
| -3 | E |
| 4 | F |
---------------
Does anyone know how I can do this in Hive?
The approach below works:
select coalesce(id, concat('-', ROW_NUMBER() OVER (partition by id))) as id, `desc`
from database_name.table_name;
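One hedge on the query above: with partition by id and no ORDER BY, which NULL row receives -1, -2 or -3 is arbitrary, and mixing the int id with the string produced by concat relies on implicit conversion. A slightly more explicit variant (same idea, with an ordering column and the casts spelled out, assuming ordering by `desc` is acceptable):

select coalesce(cast(id as string),
                concat('-', cast(row_number() over (partition by id order by `desc`) as string))) as id,
       `desc`
from database_name.table_name;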
I'm using SQL Server 2008 R2 and I have a view which returns the following:
+----+-------+-------+-------+-------+-------+-------+
| ID | col1A | col1B | col2A | col2B | col3A | col3B |
+----+-------+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 3 | 5 | 4 | 4 |
| 2 | 1 | 1 | 5 | 5 | 5 | 4 |
| 3 | 3 | 4 | 5 | 5 | 4 | 4 |
| 4 | 1 | 2 | 5 | 5 | 4 | 3 |
| 5 | 1 | 1 | 2 | 2 | 3 | 3 |
+----+-------+-------+-------+-------+-------+-------+
As you can see this view contains column pairs (col1A and col1B), (col2A and col2B), (col3A and col3B).
I need to query this view and find rows where the column pairs contain different values.
So I would be looking to return:
+----+------------+---+-----+
| ID | ColumnType | A | B |
+----+------------+---+-----+
| 1 | Col2 | 3 | 5 |
| 2 | Col3 | 5 | 4 |
| 3 | Col1 | 3 | 4 |
| 4 | Col1 | 1 | 2 |
| 4 | Col3 | 4 | 3 |
+----+------------+---+-----+
I think I need to use UNPIVOT, but I'm not sure how. I'd appreciate any suggestions.
Since you are using SQL Server 2008+, you can use CROSS APPLY to unpivot the pairs of columns, and then you can easily compare the values in A and B to return the rows that don't match:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', Col1A, Col1B),
('Col2', Col2A, Col2B),
('Col3', Col3A, Col3B)
) c (ColumnType, A, B)
where c.A <> c.B;
If you have different datatypes in your columns, then you'll need to convert the data to the same type. You can do this conversion within the VALUES clause:
select t.ID,
c.ColumnType,
c.A,
c.B
from [dbo].[yourview] t
cross apply
(
values
('Col1', cast(Col1A as varchar(50)), Col1B),
('Col2', cast(Col2A as varchar(50)), Col2B),
('Col3', cast(Col3A as varchar(50)), Col3B)
) c (ColumnType, A, B)
where c.A <> c.B
I am working with PostgreSQL 9.3, and I have this:
PARENT_TABLE
ID | NAME
1 | N_A
2 | N_B
3 | N_C
CHILD_TABLE
ID | PARENT_TABLE_ID | KEY | VALUE
1 | 1 | K_A | V_A
2 | 1 | K_B | V_B
3 | 1 | K_C | V_C
5 | 2 | K_A | V_D
6 | 2 | K_C | V_E
7 | 3 | K_A | V_F
8 | 3 | K_B | V_G
9 | 3 | K_C | V_H
Note that I might add K_D to the KEYs; it's completely dynamic.
What I want is a query that returns me the following:
QUERY_TABLE
ID | NAME | K_A | K_B | K_C | others K_...
1 | N_A | V_A | V_B | V_C | ...
2 | N_B | V_D | | V_E | ...
3 | N_C | V_F | V_G | V_H | ...
Is this possible to do? If so, how?
Since there can be values missing, you need the "safe" form of crosstab() with the column names as second parameter:
SELECT * FROM crosstab(
'SELECT p.id, p.name, c.key, c."value"
FROM parent_table p
LEFT JOIN child_table c ON c.parent_table_id = p.id
ORDER BY 1'
,$$VALUES ('K_A'::text), ('K_B'), ('K_C')$$)
AS t (id int, name text, k_a text, k_b text, k_c text); -- use actual data types
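Note that crosstab() comes from the tablefunc extension, so it has to be installed once per database before the query above will run:

CREATE EXTENSION IF NOT EXISTS tablefunc;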
Details in this related answer:
PostgreSQL Crosstab Query
About adding "extra" columns:
Pivot on Multiple Columns using Tablefunc
I have a very weird thing happening, where I noticed that a group by (word) wasn't always grouping by word if that word is a UTF-8 string. In the same query, I get cases where it's been grouped correctly, and cases where it hasn't. I wonder if anybody knows what's up with that?
select *,count(*) over (partition by md5(word)) as k
from (
select word,count(*) as n
from :tmpwl
group by 1
) a order by 1,2 limit 12;
/* gives:
word | n | k
------+---+---
いい | 1 | 1
くず | 1 | 1
ごみ | 1 | 1
さま | 1 | 1
さん | 1 | 1
へま | 1 | 1
まめ | 1 | 1
よく | 1 | 1
ろく | 1 | 1
ネガ | 1 | 2 -- what the heck?
ネガ | 1 | 2
パス | 1 | 1
*/
Note that the following workaround works fine:
select word,n,count(*) over (partition by md5(word)) as k
from (
select md5(word),max(word) as word,count(*) as n
from :tmpwl
group by 1
) a order by 1,2 limit 12;
/* gives:
word | n | k
------+---+---
いい | 1 | 1
くず | 1 | 1
ごみ | 1 | 1
さま | 1 | 1
さん | 1 | 1
へま | 1 | 1
まめ | 1 | 1
よく | 1 | 1
ろく | 1 | 1
ネガ | 2 | 1
パス | 1 | 1
プア | 1 | 1
*/
The version is PostgreSQL 8.2.14 (Greenplum Database 4.0.4.0 build 3 Single-Node Edition) on x86_64-unknown-linux-gnu, compiled by GCC gcc.exe (GCC) 4.1.1 compiled on Nov 30 2010 17:20:26.
The source table :tmpwl:
\d :tmpwl
Table "pg_temp_25149.pdtmp_foo706453357357532"
Column | Type | Modifiers
----------+---------+-----------
baseword | text |
word | text |
value | integer |
lexicon | text |
nalts | bigint |
Distributed by: (word)