PostgreSQL, get string after multiple ':' characters

I have a column with the following entries (input):
01:02:02:02
02:01:100:300
128:02:12:02
I need a way to choose the parts I want to display, like (output):
01:02:02
02:01:100
128:02:12
or
01:02
02:01
128:02
I tried solutions suggested in similar questions, without success, like
select substring(column_name, '[^:]*$') from table_name;
How could this work?

To get the first three parts, you can use
SELECT substring(column_name FROM '^(([^:]*:){2}[^:]*)')
FROM table_name;
For the first two parts, omit the {2} (so the colon group matches just once). For the first four parts, make it {3}.
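For example, a quick sketch against the sample values from the question (inlined here via VALUES; substitute your real table):
SELECT substring(column_name FROM '^(([^:]*:){2}[^:]*)') AS first_three
FROM (VALUES ('01:02:02:02'), ('02:01:100:300'), ('128:02:12:02')) AS t(column_name);
-- returns 01:02:02, 02:01:100, 128:02:12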

Try split_part (where you can specify which occurrence you want), e.g.:
t=# with s as (select '128:02:12:02'::text m) select split_part(m,':',1), split_part(m,':',2) from s;
 split_part | split_part
------------+------------
 128        | 02
(1 row)
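To rebuild the "first N parts" output the question asks for, the pieces can be glued back together with concat_ws, e.g. (a sketch reusing the same sample value):
t=# with s as (select '128:02:12:02'::text m) select concat_ws(':', split_part(m,':',1), split_part(m,':',2), split_part(m,':',3)) from s;
 concat_ws
-----------
 128:02:12
(1 row)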

Related

Using a list as replacement for singular patterns in regexp_replace

I have a table that I need to delete random words/characters out of. To do this, I have been using a regexp_replace function with the addition of multiple patterns. An example is below:
select regexp_replace(combined,'\y(NAME|001|CONTAINERS:|MT|COUNT|PCE|KG|PACKAGE)\y','', 'g')
as description, id from export_final;
However, in the full list there are around 70 different patterns that I replace out of the description. As you can imagine, the code is very cluttered. This leads me to my question: is there a way to put these patterns into another table and then use that table to check the descriptions?
Of course. Populate your desired 'other' table with the patterns you need. Then create a CTE that uses the string_agg function to build the regex. Example:
create table exclude_list( pattern_word text);
insert into exclude_list(pattern_word)
values('NAME'),('001'),('CONTAINERS:'),('MT'),('COUNT'),('PCE'),('KG'),('PACKAGE');
with exclude as
( select '\y(' || string_agg(pattern_word,'|') || ')\y' regex from exclude_list )
-- CTE simulates actual table to provide test data
, export_final (id,combined) as (values (0,'This row 001 NAME Main PACKAGE has COUNT 3 units'),(1,'But single package can hold 6 KG'))
select regexp_replace(combined,regex,'', 'g')
as description, id
from export_final cross join exclude;
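One caveat: removing the words leaves the surrounding spaces behind. A second regexp_replace pass can collapse them (a sketch, reusing the exclude CTE and export_final names from the example above):
select regexp_replace(
         regexp_replace(combined, regex, '', 'g'),  -- strip the listed words
         '\s+', ' ', 'g')                           -- collapse leftover runs of whitespace
       as description, id
from export_final cross join exclude;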

Add padding to IP addresses in PostgreSQL SELECT Query?

I've already got a method for Excel, but I want the padding to be done via the query to reduce my effort later in the process.
Excel Example
=TEXT(LEFT(A2,FIND(".",A2,1)-1),"000") & "." & TEXT(MID(A2,FIND(
".",A2,1)+1,FIND(".",A2,FIND(".",A2,1)+1)-FIND(".",A2,1)-1),"000")
& "." & TEXT(MID(A2,FIND(".",A2,FIND(".",A2,1)+1)+1,FIND(".",A2,
FIND(".",A2,FIND(".",A2,1)+1)+1)-FIND(".",A2,FIND(".",A2,1)+1)-1),
"000") & "." & TEXT(RIGHT(A2,LEN(A2)-FIND(".",A2,FIND(".",A2,FIND(
".",A2,1)+1)+1)),"000")
I tried searching the PostgreSQL documentation, but nothing obvious came up for converting to padded values.
I also investigated doing a CAST, as I have done for hostnames using regex.
Hostname CAST Example for PostgreSQL
UPPER(regexp_replace(da.host_name, '([\.][\w\.]+)', '', 'g')) AS hostname
But, I am hitting a roadblock here. Any suggestions?
I'm assuming that by "padding an IP" you want to lpad 0's onto the front of each IP part.
Using regexp_replace you can do the following:
SELECT regexp_replace(
         regexp_replace('19.2.2.2', '([0-9]{1,3})', '00\1', 'g'),  -- prefix every part with 00: 0019.002.002.002
         '(0*)([0-9]{3})', '\2', 'g');                             -- keep only the last three digits of each run: 019.002.002.002
Optionally, if you are on 9.4 or newer, you can get crafty with UNNEST() or REGEXP_SPLIT_TO_TABLE() and the new WITH ORDINALITY keyword to split each IP part (along with the key from the table) out to its own row. Then you can lpad() with 0's and string_agg() it back together, using the ordinal that was preserved by the unnest or regexp_split_to_table():
user=# SELECT * FROM test;
 id |      ip
----+--------------
  1 | 19.16.2.2
  2 | 20.321.123.1
(2 rows)
user=# SELECT id, string_agg(lpad(ip_part, 3, '0'),'.' ORDER BY rn) FROM test t, regexp_split_to_table(t.ip, '\.') WITH ORDINALITY s(ip_part, rn) GROUP BY id;
 id |   string_agg
----+-----------------
  1 | 019.016.002.002
  2 | 020.321.123.001
(2 rows)
Theoretically this would work in older versions since it seems like ordinals are preserved during unnest(), but it feels more like luck and I wouldn't productionize any code that depends on that.
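If you need this on versions before 9.4, or simply want to avoid relying on ordinality, a plain split_part()/lpad() combination works whenever the values always have exactly four parts (a sketch against the test table above):
SELECT id,
       concat_ws('.',
                 lpad(split_part(ip, '.', 1), 3, '0'),
                 lpad(split_part(ip, '.', 2), 3, '0'),
                 lpad(split_part(ip, '.', 3), 3, '0'),
                 lpad(split_part(ip, '.', 4), 3, '0')) AS padded
FROM test;
-- e.g. 19.16.2.2 -> 019.016.002.002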

Inserting commas into values

I have a field, let's call it total_sales where the value it returns is 3621731641
I would like to convert that so it has a thousand separator commas inserted into it. So it would ultimately return as 3,621,731,641
I've looked through the Redshift documentation and have not been able to find anything.
A query similar to the following should work for you:
select to_char(<columnname>,'999,999,999,999') from table1;
Make sure the pattern in the second parameter has enough digits for your maximum value.
It will not give you a $ sign unless you specify 'L' in the second parameter, like below:
select to_char(<columnname>,'L999,999,999,999') from table1;
Money format: select '$'||trim(to_char(1000000000.555,'999G999G999G999.99'))
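A quick sanity check with the literal value from the question (to_char reserves a leading character for the sign, which is why the money example above trims):
select to_char(3621731641, '999,999,999,999');        -- ' 3,621,731,641'
select trim(to_char(3621731641, '999,999,999,999'));  -- '3,621,731,641'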
select to_char(<columnname>,'999,999,999,999') from table1;
or, with G (the locale-aware group separator):
select to_char(<columnname>,'999G999G999G999') from table1;

How to remove my stop words from a string column in PostgreSQL

I have a table with a string column and I want to remove the stop words. I used this query, which seems OK:
SELECT to_tsvector('english', colName) from tblName order by colName asc;
but it does not update the column in the table.
I want to see PostgreSQL's stop words and what the query found, so that if needed I can replace the list with my own file. I also checked this path and could not find the stop word list file; actually, the path does not exist:
$SHAREDIR/tsearch_data/english.stop
There is no function to do that.
You could use something like this (the example is in German; it assumes default_text_search_config points at the german configuration):
SELECT array_to_string(tsvector_to_array(to_tsvector('Hallo, Bill und Susi!')), ' ');
array_to_string
-----------------
bill hallo susi
(1 row)
This removes stop words, but also stems and non-words, and it does not care about word order, so I doubt that the result will make you happy.
If that doesn't fit the bill, you can use regexp_replace like this:
SELECT regexp_replace('Bill and Susi, hand over or die!', '\y(and|or|if)\y', '', 'g');
regexp_replace
-----------------------------
Bill Susi, hand over die!
(1 row)
But that requires that you include your list of stop words in the query string. An improved version would store the stop words in a table.
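For example, along the lines of the exclude_list approach shown earlier (a sketch; the stop_words table and its contents are made up):
create table stop_words(word text);
insert into stop_words values ('and'), ('or'), ('if');
select regexp_replace('Bill and Susi, hand over or die!',
                      '\y(' || string_agg(word, '|') || ')\y',
                      '', 'g')
from stop_words;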
The chosen answer did not match my requirement, but I found a solution for this:
SELECT regexp_replace('Bill and Susi, hand over or die!', '[^ ]*$','');
regexp_replace
-----------------------------
Bill and Susi, hand over or
(1 row)

SqlAlchemy: count of distinct over multiple columns

I can't do:
>>> session.query(
...     func.count(distinct(Hit.ip_address, Hit.user_agent))).first()
TypeError: distinct() takes exactly 1 argument (2 given)
I can do:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))).first()
Which is fine (count of unique users in a 'pageload' db table).
This isn't correct in the general case; e.g. it will give a count of 1 instead of 2 for the following table:
col_a | col_b
------+------
xx    | yy
xxy   | y
Is there any way to generate the following SQL (which is valid in PostgreSQL, at least)?
SELECT count(distinct (col_a, col_b)) FROM my_table;
distinct() accepts more than one argument when appended to the query object:
session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()
It should generate something like:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1
which is even a bit closer to what you wanted.
The exact query can be produced using the tuple_() construct (from sqlalchemy import tuple_):
session.query(
    func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
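For reference, the tuple_() version should emit SQL along these lines (a sketch of the expected statement, not captured output):
SELECT count(DISTINCT (hit.ip_address, hit.user_agent)) AS count_1
FROM hit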
It looks like the standalone sqlalchemy distinct() accepts only one column or expression.
Another way around it is to use group_by and count. This should be more efficient than using concat of two columns: with group by, the database can use indexes if they exist:
session.query(Hit.ip_address, Hit.user_agent).\
group_by(Hit.ip_address, Hit.user_agent).count()
The generated query would still look different from what you asked about:
SELECT count(*) AS count_1
FROM (SELECT hittable.user_agent AS hittable_user_agent, hittable.ip_address AS hittable_ip_address
      FROM hittable GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
You can add a separator character inside the concat in order to make the combination unambiguous. Taking your example as a reference, it would be:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent)))).first()
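To see why the separator matters, compare both concatenations against the rows from the question (a plain-SQL sketch):
WITH my_table(col_a, col_b) AS (VALUES ('xx', 'yy'), ('xxy', 'y'))
SELECT count(DISTINCT col_a || col_b)        AS without_sep,  -- 1: both rows collapse to 'xxyy'
       count(DISTINCT col_a || '-' || col_b) AS with_sep      -- 2: 'xx-yy' vs 'xxy-y'
FROM my_table;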