The UTF-8 encoded string contains a ZWNJ (zero-width non-joiner) at the end and is stored in the database.
Is it possible to remove that character during the SELECT statement? I tried trim(), but it doesn't work.
CREATE TABLE test (x text);
INSERT INTO test VALUES (E'abc');
INSERT INTO test VALUES (E'foo\u200C'); -- U+200C = ZERO WIDTH NON-JOINER
SELECT x, octet_length(x) FROM test;
x │ octet_length
─────┼──────────────
abc │ 3
foo │ 6
(2 rows)
CREATE TABLE test2 AS SELECT replace(x, E'\u200C', '') AS x FROM test;
SELECT x, octet_length(x) FROM test2;
x │ octet_length
─────┼──────────────
abc │ 3
foo │ 3
(2 rows)
You need to use replace(your_column, E'\u200C', '') instead of trim().
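Applied directly in a query (your_table / your_column are placeholder names here), that could look like:
-- strips the zero-width non-joiner on the fly, leaving the stored data unchanged
SELECT replace(your_column, E'\u200C', '') AS your_column
FROM your_table;
Or, to clean the stored values once and for all:
UPDATE your_table
SET your_column = replace(your_column, E'\u200C', '');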
I have a string that looks like 'ab bc 123 cd de ef 232' and I need to split it so it looks like this:
col1  | col2 | col3
ab    | bc   | 123
cd de | ef   | 232
Numbers have to be in the last column, the last string before the numbers has to be in the second column, and all characters before that have to be in the first column.
I am working on PostgreSQL and have no idea how to do that.
step-by-step demo: db<>fiddle
You can use regular expressions to split your strings:
regexp_match(mystring,'^(.+)\s(.+)\s(\d+)\s(.+)\s(.+)\s(\d+)$')
(see how the RegExp works: demo: regex101)
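Against the sample string, the match should come back as one text array holding the six captured groups, roughly:
SELECT regexp_match('ab bc 123 cd de ef 232',
                    '^(.+)\s(.+)\s(\d+)\s(.+)\s(.+)\s(\d+)$');
-- {ab,bc,123,"cd de",ef,232}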
This results in an array of strings you expect. This array can be used to fill your table:
WITH textblocks AS ( -- 1
SELECT
regexp_match(mystring,'^(.+)\s(.+)\s(\d+)\s(.+)\s(.+)\s(\d+)$') AS r
FROM mytable1
)
INSERT INTO mytable2 (col1, col2, col3)
SELECT
r[1], r[2], r[3]
FROM textblocks
UNION -- 2
SELECT
r[4], r[5], r[6]
FROM textblocks
1. Execute the regexp, which splits the original string into a text array.
2. Create two records from the text array and insert them into your table.
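For completeness, a minimal setup this would run against might look like the following (the names mytable1, mystring, mytable2 and col1..col3 are taken from the statement above; everything else is assumed):
CREATE TABLE mytable1 (mystring text);
INSERT INTO mytable1 VALUES ('ab bc 123 cd de ef 232');
CREATE TABLE mytable2 (col1 text, col2 text, col3 text);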
As a follow-up to this question: How to concatenate strings of a string field in a PostgreSQL 'group by' query?
I am looking for a way to concatenate the strings of a field within a WITH RECURSIVE query (and NOT using GROUP BY). So for example, I have a table:
ID  COMPANY_ID  EMPLOYEE
1   1           Anna
2   1           Bill
3   2           Carol
4   2           Dave
5   3           Tom
and I wanted to group by company_id, ordered by the count of EMPLOYEE, to get something like:
COMPANY_ID  EMPLOYEE
3           Tom
1           Anna, Bill
2           Carol, Dave
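(To reproduce this, the sample data could be set up roughly as below; tbl is the table name used by the queries in the first answer.)
CREATE TABLE tbl (id int, company_id int, employee text);
INSERT INTO tbl VALUES
  (1, 1, 'Anna'),
  (2, 1, 'Bill'),
  (3, 2, 'Carol'),
  (4, 2, 'Dave'),
  (5, 3, 'Tom');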
It's simple with GROUP BY:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee) AS employees
FROM tbl
GROUP BY company_id
ORDER BY count(*), company_id;
Sorting in a subquery is typically faster:
SELECT company_id, string_agg(employee, ', ') AS employees
FROM (SELECT company_id, employee FROM tbl ORDER BY 1, 2) t
GROUP BY company_id
ORDER BY count(*), company_id;
As academic proof of concept: an rCTE solution without using any aggregate or window functions:
WITH RECURSIVE rcte AS (
(
SELECT DISTINCT ON (1)
company_id, employee, ARRAY[employee] AS employees
FROM tbl
ORDER BY 1, 2
)
UNION ALL
SELECT r.company_id, e.employee, r.employees || e.employee
FROM rcte r
CROSS JOIN LATERAL (
SELECT t.employee
FROM tbl t
WHERE t.company_id = r.company_id
AND t.employee > r.employee
ORDER BY t.employee
LIMIT 1
) e
)
SELECT company_id, array_to_string(employees, ', ') AS employees
FROM (
SELECT DISTINCT ON (1)
company_id, cardinality(employees) AS emp_ct, employees
FROM rcte
ORDER BY 1, 2 DESC
) sub
ORDER BY emp_ct, company_id;
db<>fiddle here
Related:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
Concatenate multiple result rows of one column into one, group by another column
No group by here:
select * from tarded;
┌────┬────────────┬──────────┐
│ id │ company_id │ employee │
├────┼────────────┼──────────┤
│ 1 │ 1 │ Anna │
│ 2 │ 1 │ Bill │
│ 3 │ 2 │ Carol │
│ 4 │ 2 │ Dave │
│ 5 │ 3 │ Tom │
└────┴────────────┴──────────┘
(5 rows)
with recursive firsts as (
select id, company_id,
first_value(id) over w as first_id,
row_number() over w as rn,
count(1) over (partition by company_id) as ncompany,
employee
from tarded
window w as (partition by company_id
order by id)
), names as (
select company_id, id, employee, rn, ncompany
from firsts
where id = first_id
union all
select p.company_id, c.id, concat(p.employee, ', ', c.employee), c.rn, p.ncompany
from names p
join firsts c
on c.company_id = p.company_id
and c.rn = p.rn + 1
)
select company_id, employee
from names
where rn = ncompany
order by ncompany, company_id;
┌────────────┬─────────────┐
│ company_id │ employee │
├────────────┼─────────────┤
│ 3 │ Tom │
│ 1 │ Anna, Bill │
│ 2 │ Carol, Dave │
└────────────┴─────────────┘
(3 rows)
Recently I've faced a pretty rare filtering case in PostgreSQL.
My question is: how do I filter redundant elements in each group of the grouped table?
For example, we have the following table:
id | group_idx | filter_idx
 1 |         1 | x
 2 |         3 | z
 3 |         3 | x
 4 |         2 | x
 5 |         1 | x
 6 |         3 | x
 7 |         2 | x
 8 |         1 | z
 9 |         2 | z
First, to group the rows:
SELECT group_idx FROM table
GROUP BY group_idx;
But how can I filter out the redundant rows (filter_idx = 'z') from each group after grouping?
P.S. I can't just write it like this, because I need to find the groups first:
SELECT group_idx FROM table
where filter_idx <> 'z';
Thanks.
Assuming that you want to see all groups at all times, even when you filter out all records of some group:
drop table if exists test cascade;
create table test (id integer, group_idx integer, filter_idx character);
insert into test
(id,group_idx,filter_idx)
values
(1,1,'x'),
(2,3,'z'),
(3,3,'x'),
(4,2,'x'),
(5,1,'x'),
(6,3,'x'),
(7,2,'x'),
(8,1,'z'),
(9,2,'z'),
(0,4,'y'); -- added an example of a group that would be discarded using WHERE
Get groups in one query, filter your rows in another, then left join the two.
select groups.group_idx,
string_agg(filtered_rows.filter_idx,',')
from
(select distinct group_idx from test) groups
left join
(select group_idx,filter_idx from test where filter_idx<>'y') filtered_rows
using (group_idx)
group by 1;
-- group_idx | string_agg
-- ----------+------------
--         3 | z,x,x
--         4 |
--         2 | x,x,z
--         1 | x,x,z
-- (4 rows)
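Applied to the filter from the question itself (dropping the filter_idx = 'z' rows while still listing every group), the same pattern would presumably be:
select groups.group_idx,
       string_agg(filtered_rows.filter_idx,',')
from
  (select distinct group_idx from test) groups
  left join
  (select group_idx,filter_idx from test where filter_idx<>'z') filtered_rows
  using (group_idx)
group by 1
order by 1;
-- which should give something like:
-- group_idx | string_agg
-- ----------+------------
--         1 | x,x
--         2 | x,x
--         3 | x,x
--         4 | y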
In PostgreSQL, I can get the upper and lower boundary of a CIDR range, as shown below.
But how can I get the CIDR from two IP addresses (in SQL)?
e.g.
input "192.168.0.0";"192.168.255.255"
output "192.168.0.0/16"
SELECT
network
,network::cidr
-- http://technobytz.com/ip-address-data-types-postgresql.html
--,netmask(network::cidr) AS nm
--,~netmask(network::cidr) AS nnm
,host(network::cidr) AS lower
,host(broadcast(network::cidr)) AS upper -- broadcast: last address in the range
,family(network::cidr) as fam -- IPv4, IPv6
,masklen(network::cidr) as masklen
FROM
(
SELECT CAST('192.168.1.1/32' AS varchar(100)) as network
UNION SELECT CAST('192.168.0.0/16' AS varchar(100)) as network
--UNION SELECT CAST('192.168.0.1/16' AS varchar(100)) as network
) AS tempT
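For reference, that query returns something along these lines (row order may vary; headers simplified):
-- network        | network        | lower       | upper           | fam | masklen
-- 192.168.0.0/16 | 192.168.0.0/16 | 192.168.0.0 | 192.168.255.255 |   4 |      16
-- 192.168.1.1/32 | 192.168.1.1/32 | 192.168.1.1 | 192.168.1.1     |   4 |      32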
I think you are looking for inet_merge:
test=> SELECT inet_merge('192.168.0.0', '192.168.128.255');
┌────────────────┐
│ inet_merge │
├────────────────┤
│ 192.168.0.0/16 │
└────────────────┘
(1 row)
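It behaves the same way with the exact values from the question (note that inet_merge only exists in PostgreSQL 9.4 and later, as far as I recall):
test=> SELECT inet_merge('192.168.0.0', '192.168.255.255');
┌────────────────┐
│   inet_merge   │
├────────────────┤
│ 192.168.0.0/16 │
└────────────────┘
(1 row)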
I have a table with a column that needs to be split and inserted into a new table. The column's name is location and it has data that could look like Detroit, MI, USA;Chicago, IL, USA or as simple as USA.
Ultimately, I want to insert the data into a new dimension table that looks like:
City    | State | Country
Detroit | MI    | USA
Chicago | IL    | USA
NULL    | NULL  | USA
I came across the string_to_array function and am able to split the larger example (Detroit, MI, USA; Chicago, IL, USA) into 2 strings of Detroit, MI, USA and Chicago, IL, USA.
Now I'm stumped on how to split those strings again and then insert them. Since there are two strings separated by a comma, does using string_to_array again work? It doesn't seem to work in Sqlfiddle.
Note: I'm using Sqlfiddle right now since I don't have access to my Redshift table at the moment.
This is for Redshift, which unfortunately is still based on PostgreSQL 8.0.2 and thus does not have the unnest function.
postgres=# select v[1] as city, v[2] as state, v[3] as country
           from (select string_to_array(unnest(string_to_array(
                  'Detroit, MI, USA;Chicago, IL, USA',';')),',')) s(v);
┌─────────┬───────┬─────────┐
│  city   │ state │ country │
╞═════════╪═══════╪═════════╡
│ Detroit │  MI   │  USA    │
│ Chicago │  IL   │  USA    │
└─────────┴───────┴─────────┘
(2 rows)
Tested on Postgres, not sure if it will work on Redshift too.
The next query should work on any Postgres version:
select v[1] as city, v[2] as state, v[3] as country
from (select string_to_array(v, ',') v
      from unnest(string_to_array(
             'Detroit, MI, USA;Chicago, IL, USA',';')) g(v)) s;
It uses an old PostgreSQL trick: a derived table.
SELECT v[1], v[2] FROM (SELECT string_to_array('1,2',',')) g(v)
A replacement for the missing unnest() function:
CREATE OR REPLACE FUNCTION _unnest(anyarray)
RETURNS SETOF anyelement AS '
BEGIN
FOR i IN array_lower($1,1) .. array_upper($1,1) LOOP
RETURN NEXT $1[i];
END LOOP;
RETURN;
END;
' LANGUAGE plpgsql;
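Putting the pieces together, the earlier query could then be written against this helper; this is only a sketch, btrim() is just there to strip the blanks left over after splitting on the comma, and whether Redshift accepts the PL/pgSQL function at all is another question:
select btrim(v[1]) as city, btrim(v[2]) as state, btrim(v[3]) as country
from (select string_to_array(v, ',') as v
      from _unnest(string_to_array(
             'Detroit, MI, USA;Chicago, IL, USA',';')) g(v)) s;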