How do I split text into multiple fields using Postgresql? - postgresql

I have a table with a column that needs to be split and inserted into a new table. The column is named location and holds data that can look like Detroit, MI, USA;Chicago, IL, USA or be as simple as USA.
Ultimately, I want to insert the data into a new dimension table that looks like:
City    | State | Country
Detroit | MI    | USA
Chicago | IL    | USA
NULL    | NULL  | USA
I came across the string_to_array function and am able to split the larger example (Detroit, MI, USA;Chicago, IL, USA) into two strings: Detroit, MI, USA and Chicago, IL, USA.
Now I'm stumped on how to split those strings again and then insert them. Since the parts of each string are separated by commas, would using string_to_array again work? It doesn't seem to work in SQL Fiddle.
Note: I'm using SQL Fiddle right now since I don't have access to my Redshift table at the moment.
This is for Redshift, which unfortunately is still based on PostgreSQL 8.0.2 and thus does not have the unnest function.

postgres=# select v[1] as city, v[2] as state, v[3] as country
from (select string_to_array(unnest(string_to_array(
'Detroit, MI, USA;Chicago, IL, USA',';')),', ')) s(v); -- ', ' avoids leading blanks
┌─────────┬───────┬─────────┐
│ city    │ state │ country │
╞═════════╪═══════╪═════════╡
│ Detroit │ MI    │ USA     │
│ Chicago │ IL    │ USA     │
└─────────┴───────┴─────────┘
(2 rows)
Tested on PostgreSQL; I am not sure whether it also works on Redshift.
The next query should work on older PostgreSQL versions as well (where unnest is missing, use the _unnest function defined below):
select v[1] as city, v[2] as state, v[3] as country
from (select string_to_array(v, ', ') v
      from unnest(string_to_array(
             'Detroit, MI, USA;Chicago, IL, USA',';')) g(v)) s;
It uses an old PostgreSQL trick: a derived table with a column alias list, so the array can be subscripted by name.
SELECT v[1], v[2] FROM (SELECT string_to_array('1,2',',')) g(v)
A replacement unnest function for versions that lack the built-in one:
CREATE OR REPLACE FUNCTION _unnest(anyarray)
RETURNS SETOF anyelement AS '
BEGIN
FOR i IN array_lower($1,1) .. array_upper($1,1) LOOP
RETURN NEXT $1[i];
END LOOP;
RETURN;
END;
' LANGUAGE plpgsql;
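The queries above read the array from the front, so a location that is just USA would end up in the city column. A minimal sketch of one way around that, assuming stock PostgreSQL 8.4 or later (built-in unnest) and not verified on Redshift: index from the end of the array, relying on out-of-range subscripts yielding NULL. The alias loc and the third sample value are added here for illustration.

```sql
-- Sketch only: index from the end of the array so a bare 'USA' lands in country.
-- Out-of-range array subscripts return NULL in PostgreSQL.
SELECT v[array_upper(v, 1) - 2] AS city,     -- NULL when fewer than 3 parts
       v[array_upper(v, 1) - 1] AS state,    -- NULL when fewer than 2 parts
       v[array_upper(v, 1)]     AS country
FROM  (SELECT string_to_array(loc, ', ') AS v
       FROM   unnest(string_to_array(
                 'Detroit, MI, USA;Chicago, IL, USA;USA', ';')) AS g(loc)) AS s;
```

Prefixing the statement with an INSERT INTO for the dimension table and its (city, state, country) columns would load the rows directly.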

Related

How to concatenate strings of a string field in a PostgreSQL 'WITH RECURSIVE' query?

As a follow-up to the question "How to concatenate strings of a string field in a PostgreSQL 'group by' query?",
I am looking for a way to concatenate the strings of a field within a WITH RECURSIVE query (and NOT using GROUP BY). So for example, I have a table:
ID  COMPANY_ID  EMPLOYEE
1   1           Anna
2   1           Bill
3   2           Carol
4   2           Dave
5   3           Tom
and I wanted to group by company_id, ordered by the count of EMPLOYEE, to get something like:
COMPANY_ID  EMPLOYEE
3           Tom
1           Anna, Bill
2           Carol, Dave
It's simple with GROUP BY:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee) AS employees
FROM tbl
GROUP BY company_id
ORDER BY count(*), company_id;
Sorting in a subquery is typically faster:
SELECT company_id, string_agg(employee, ', ') AS employees
FROM (SELECT company_id, employee FROM tbl ORDER BY 1, 2) t
GROUP BY company_id
ORDER BY count(*), company_id;
As an academic proof of concept, here is an rCTE solution that uses no aggregate or window functions:
WITH RECURSIVE rcte AS (
   (
   SELECT DISTINCT ON (1)
          company_id, employee, ARRAY[employee] AS employees
   FROM   tbl
   ORDER  BY 1, 2
   )
   UNION ALL
   SELECT r.company_id, e.employee, r.employees || e.employee
   FROM   rcte r
   CROSS  JOIN LATERAL (
      SELECT t.employee
      FROM   tbl t
      WHERE  t.company_id = r.company_id
      AND    t.employee > r.employee
      ORDER  BY t.employee
      LIMIT  1
      ) e
   )
SELECT company_id, array_to_string(employees, ', ') AS employees
FROM  (
   SELECT DISTINCT ON (1)
          company_id, cardinality(employees) AS emp_ct, employees
   FROM   rcte
   ORDER  BY 1, 2 DESC
   ) sub
ORDER  BY emp_ct, company_id;
Related:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
Concatenate multiple result rows of one column into one, group by another column
No group by here:
select * from tarded;
┌────┬────────────┬──────────┐
│ id │ company_id │ employee │
├────┼────────────┼──────────┤
│ 1 │ 1 │ Anna │
│ 2 │ 1 │ Bill │
│ 3 │ 2 │ Carol │
│ 4 │ 2 │ Dave │
│ 5 │ 3 │ Tom │
└────┴────────────┴──────────┘
(5 rows)
with recursive firsts as (
  select id, company_id,
         first_value(id) over w as first_id,
         row_number() over w as rn,
         count(1) over (partition by company_id) as ncompany,
         employee
  from tarded
  window w as (partition by company_id
               order by id)
), names as (
  select company_id, id, employee, rn, ncompany
  from firsts
  where id = first_id
  union all
  select p.company_id, c.id, concat(p.employee, ', ', c.employee), c.rn, p.ncompany
  from names p
  join firsts c
    on c.company_id = p.company_id
   and c.rn = p.rn + 1
)
select company_id, employee
from names
where rn = ncompany
order by ncompany, company_id;
┌────────────┬─────────────┐
│ company_id │ employee │
├────────────┼─────────────┤
│ 3 │ Tom │
│ 1 │ Anna, Bill │
│ 2 │ Carol, Dave │
└────────────┴─────────────┘
(3 rows)

Postgres array comparison - find missing elements

I have the table below.
╔════════════════════╦════════════════════╦═════════════╗
║id ║arr1 ║arr2 ║
╠════════════════════╬════════════════════╬═════════════╣
║1 ║{1,2,3,4} ║{2,1,7} ║
║2 ║{0} ║{3,4,5} ║
╚════════════════════╩════════════════════╩═════════════╝
I want to find out the elements which are in arr1 and not in arr2.
Expected output
╔════════════════════╦════════════════════╗
║id ║diff ║
╠════════════════════╬════════════════════╣
║1 ║{3,4} ║
║2 ║{0} ║
╚════════════════════╩════════════════════╝
If I have 2 individual arrays, I can do as follows:
select array_agg(elements)
from (
select unnest(array[0])
except
select unnest(array[3,4,5])
) t (elements)
But I am unable to integrate this code to work by selecting from my table.
Any help would be highly appreciated. Thank you!!
I would write a function for this:
create function array_diff(p_one int[], p_other int[])
  returns int[]
as
$$
  select array_agg(item)
  from (
    select *
    from unnest(p_one) item
    except
    select *
    from unnest(p_other)
  ) t
$$
language sql
stable;
Then you can use it like this:
select id, array_diff(arr1, arr2)
from the_table
A much faster alternative is to install the intarray module and use its - operator:
select id, arr1 - arr2
from the_table
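As a rough sketch of that route, assuming PostgreSQL 9.1 or later (for CREATE EXTENSION) and that the contrib package is installed on the server: intarray works only on integer arrays without NULL elements, and its - operator removes the right array's elements from the left one. The literal arrays below are the first row from the question.

```sql
-- Sketch only: needs sufficient privileges and the contrib modules installed.
CREATE EXTENSION IF NOT EXISTS intarray;

-- "-" from intarray: elements of the left array that are not in the right array.
SELECT ARRAY[1,2,3,4] - ARRAY[2,1,7] AS diff;
```

For the first row of the question this should return {3,4}, matching the expected output.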
Apply EXCEPT per id to the unnested elements, then group by id to aggregate what remains back into an array:
with diff_data as (
select id, unnest(arr1) as data
from test_table
except
select id, unnest(arr2) as data
from test_table
)
select id, array_agg(data order by data) as diff
from diff_data
group by id

How can I get a CIDR from two IPs in PostgreSQL?

In PostgreSQL, I can get the upper and lower boundary of a CIDR-range, like below.
But how can I get the CIDR from two IP addresses (using SQL)?
For example:
input: "192.168.0.0"; "192.168.255.255"
output: "192.168.0.0/16"
SELECT
network
,network::cidr
-- http://technobytz.com/ip-address-data-types-postgresql.html
--,netmask(network::cidr) AS nm
--,~netmask(network::cidr) AS nnm
,host(network::cidr) AS lower
,host(broadcast(network::cidr)) AS upper -- broadcast: last address in the range
,family(network::cidr) as fam -- IPv4, IPv6
,masklen(network::cidr) as masklen
FROM
(
SELECT CAST('192.168.1.1/32' AS varchar(100)) as network
UNION SELECT CAST('192.168.0.0/16' AS varchar(100)) as network
--UNION SELECT CAST('192.168.0.1/16' AS varchar(100)) as network
) AS tempT
I think you are looking for inet_merge:
test=> SELECT inet_merge('192.168.0.0', '192.168.128.255');
┌────────────────┐
│ inet_merge │
├────────────────┤
│ 192.168.0.0/16 │
└────────────────┘
(1 row)
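Note that inet_merge (available from PostgreSQL 9.5) returns the smallest network that contains both addresses, which is not necessarily a network that starts and ends exactly at them. The following sketch, which reuses host() and broadcast() from the question, adds a column (named exact here purely for illustration) that checks whether the two inputs are the true boundaries of the result.

```sql
-- Sketch only: compare the merged network's first and last address with the inputs.
SELECT lo, hi, n AS cidr,
       (host(n) = host(lo) AND host(broadcast(n)) = host(hi)) AS exact
FROM  (SELECT lo, hi, inet_merge(lo, hi) AS n
       FROM  (VALUES ('192.168.0.0'::inet, '192.168.255.255'::inet),
                     ('192.168.0.0'::inet, '192.168.128.255'::inet)) AS r(lo, hi)) AS t;
```

Both rows yield 192.168.0.0/16, but only the first pair covers the whole block, so exact is true for the first row and false for the second.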

comparing each record of a table for some columns

I have a table TEST with records like these:
ID       USERNAME  IPADDRS       CONNTIME  COUNTRY
8238237  XYZ       10.16.199.20  11:00:00  USA
8255237  XYZ       10.16.199.20  11:00:00  UK
485337   ABC       10.16.199.22  12:25:00  UK
8238237  ABC       10.16.199.23  02:45:00  INDIA
I need to compare the records and get the ID of every record whose country is UK and whose USERNAME, IPADDRS and CONNTIME match those of another record.
In other words, USERNAME, IPADDRS and CONNTIME must be equal across the records, and the final filter is country = UK.
So the output for the table above is ID = 8255237.
I'd appreciate your help. Thanks!
Well, SQL is declarative, so you should describe what you want. How about this?
select a.id
from test a
where a.country = 'UK'
and (a.username, a.ipaddrs, a.conntime) in
    (select username, ipaddrs, conntime from test where country <> 'UK')
Basically, you select the ID of each UK record whose (username, ipaddrs, conntime) triplet also appears in a record that is not from the UK. This is basic SQL and should run on any system that supports row-value comparisons. Disclaimer: you might need indexes for performance.
Try this:
SELECT a.ID
FROM (SELECT ID, USERNAME, IPADDRS, CONNTIME, COUNTRY,
             -- sort UK rows last so a UK row gets seq > 1 whenever a non-UK row shares its triplet
             ROW_NUMBER() OVER (PARTITION BY USERNAME, IPADDRS, CONNTIME
                                ORDER BY CASE WHEN COUNTRY = 'UK' THEN 1 ELSE 0 END) AS seq
      FROM TEST) a
WHERE a.COUNTRY = 'UK' AND a.seq > 1;
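To try the idea against the sample rows, here is a self-contained sketch. The column types are assumptions (the question does not state them), and the EXISTS form is swapped in as an alternative for systems that do not accept row-value IN.

```sql
-- Hypothetical setup reproducing the sample rows; column types are assumed.
CREATE TEMP TABLE test (
    id       integer,
    username text,
    ipaddrs  inet,
    conntime time,
    country  text
);

INSERT INTO test VALUES
    (8238237, 'XYZ', '10.16.199.20', '11:00:00', 'USA'),
    (8255237, 'XYZ', '10.16.199.20', '11:00:00', 'UK'),
    (485337,  'ABC', '10.16.199.22', '12:25:00', 'UK'),
    (8238237, 'ABC', '10.16.199.23', '02:45:00', 'INDIA');

-- UK rows that share username, ipaddrs and conntime with a non-UK row.
SELECT a.id
FROM   test a
WHERE  a.country = 'UK'
AND    EXISTS (
         SELECT 1
         FROM   test b
         WHERE  b.username = a.username
         AND    b.ipaddrs  = a.ipaddrs
         AND    b.conntime = a.conntime
         AND    b.country <> 'UK');
```

With the four sample rows this returns the single ID 8255237, as the question expects.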

removing zwnj character from string in postgres during select statement

The UTF-8 encoded string stored in the database ends with a ZWNJ (zero-width non-joiner) character.
Is it possible to remove that character in a SELECT statement? I tried trim(), but it doesn't work.
CREATE TABLE test (x text);
INSERT INTO test VALUES (E'abc');
INSERT INTO test VALUES (E'foo\u200C'); -- U+200C = ZERO WIDTH NON-JOINER
SELECT x, octet_length(x) FROM test;
x │ octet_length
─────┼──────────────
abc │ 3
foo │ 6
(2 rows)
CREATE TABLE test2 AS SELECT replace(x, E'\u200C', '') AS x FROM test;
SELECT x, octet_length(x) FROM test2;
x │ octet_length
─────┼──────────────
abc │ 3
foo │ 3
(2 rows)
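You need to use replace(your_column, E'\u200C', '') instead of trim(): by default, trim() removes only spaces, so the character has to be named explicitly with its Unicode escape.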
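If the character should be stripped only from the end of the string, rtrim with an explicit character list is another option. This is a minimal sketch, assuming a UTF8 database and the test table created in the example above; the column aliases are illustrative.

```sql
-- Sketch only: remove trailing U+200C characters and compare byte lengths.
SELECT x,
       octet_length(x)                    AS len_before,
       octet_length(rtrim(x, E'\u200C')) AS len_after
FROM   test;
```

For the two sample rows, len_after should be 3 in both cases, while a ZWNJ in the middle of a string would be left untouched.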