Sum all ASCII values for every character of a varchar in PostgreSQL

I have a table I want to partition by HASH. The table has a varchar column, which is the key I want to partition on.
Of course, I can't partition by HASH on a varchar, so I will SUM the ASCII values of each character in the varchar instead.
I hope to get some help stitching together a function that takes a varchar parameter and returns the SUM as an INTEGER.
I have tried several variations (some of them commented out); this is how it looks so far:
CREATE OR REPLACE FUNCTION sum_string_ascii_values(theString varchar)
RETURNS INTEGER
LANGUAGE plpgsql
AS
$$
DECLARE
theSum INTEGER;
BEGIN
-- Sum the ASCII values of every single character of the input varchar.
SELECT SUM( val )
FROM LATERAL ( SELECT ASCII( UNNEST( STRING_TO_ARRAY( LOWER(theString), null) ) ) ) AS val
INTO theSum;
--SELECT SUM(val) FROM ASCII( UNNEST( STRING_TO_ARRAY( LOWER(theString), null) ) ) AS val INTO theSUM;
--RETURN SUM( ASCII( UNNEST( STRING_TO_ARRAY( LOWER(theString), null) ) ) );
RETURN theSUM;
END;
$$;
I hope someone will be able to write and explain a solution to this problem.

Instead of using a SELECT to sum the characters, you can loop through the string:
CREATE OR REPLACE FUNCTION sum_string_ascii_values(input text) RETURNS int LANGUAGE plpgsql AS $$
DECLARE
  hash int := 0;
  pos  int := 1;  -- string positions are 1-based
BEGIN
  -- walk over the string character by character and add up the ascii values
  WHILE pos <= length(input) LOOP
    hash := hash + ascii(upper(substr(input, pos, 1)));
    pos  := pos + 1;
  END LOOP;
  RETURN hash;
END;
$$;
Here is a link to a dbfiddle that demonstrates it: https://dbfiddle.uk/yfhpHyT1
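For completeness, the set-based approach the question was aiming at can also be written without a loop, for example as a plain SQL function. This is just a sketch along the lines of the question's attempt (same name and signature as in the question), not part of the answer above:
CREATE OR REPLACE FUNCTION sum_string_ascii_values(theString varchar)
RETURNS integer
LANGUAGE sql
AS
$$
  -- split the string into single characters, take ascii() of each and add them up
  SELECT COALESCE(SUM(ascii(ch)), 0)::int
  FROM unnest(string_to_array(lower(theString), NULL)) AS ch;
$$;
-- SELECT sum_string_ascii_values('abc');  --> 294 (97 + 98 + 99)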

How to use text input as column name(s) in a Postgres function?

I'm working with Postgres and PostGIS, trying to write a function that selects specific columns according to the given argument.
I'm using a WITH statement to build the result table before converting it to bytea for the return value.
The part I need help with is the $4 part. I tried it as demonstrated below, and also as $4::text, but both give me back the text value of the input rather than the column values from the table: if cols = 'name', the query returns the literal string name instead of the actual names stored in the table. I also tried data($4) and got a type error.
The code is like this:
CREATE OR REPLACE FUNCTION select_by_txt(z integer,x integer,y integer, cols text)
RETURNS bytea
LANGUAGE 'plpgsql'
AS $BODY$
declare
res bytea;
begin
WITH bounds AS (
SELECT ST_TileEnvelope(z, x, y) AS geom
),
mvtgeom AS (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, $4
FROM table1 t, bounds
WHERE ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
)
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
INTO res
FROM mvtgeom;
RETURN res;
end;
$BODY$;
Example for calling the function:
select_by_txt(10,32,33,"col1,col2")
The argument cols can hold any number of column names, one or more with no upper limit. The column names inside cols will be checked for validity before the function is called.
Passing multiple column names as a concatenated string for dynamic execution urgently requires defending against SQL injection. I suggest a VARIADIC function parameter instead, with properly quoted identifiers (using quote_ident() in this case):
CREATE OR REPLACE FUNCTION select_by_txt(z int, x int, y int, VARIADIC cols text[] = NULL, OUT res text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
$$
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
FROM (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom%s
FROM table1 t
JOIN (SELECT ST_TileEnvelope($1, $2, $3)) AS bounds(geom)
ON ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
) mvtgeom
$$, (SELECT ', ' || string_agg(quote_ident(col), ', ') FROM unnest(cols) col)
)
INTO res
USING z, x, y;
END
$func$;
db<>fiddle here
The format specifier %I for format() deals with a single identifier. You have to put in more work for multiple identifiers, especially for a variable number of 0-n identifiers. This implementation quotes every single column name and only adds a comma if any column names have been passed at all, so it works for every possible input, even no input at all. Note VARIADIC cols text[] = NULL as the last input parameter, with NULL as default value:
Optional argument in PL/pgSQL function
Related:
quote_ident() does not add quotes to column name "first"
Column names are case sensitive in this context!
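For illustration, the column-list expression used in the function above behaves like this with some made-up column names (and since format() renders a NULL argument of %s as an empty string, nothing is appended when no columns are passed at all):
SELECT ', ' || string_agg(quote_ident(col), ', ')
FROM   unnest(ARRAY['col1', 'My Col', 'evil"name']) AS col;
-->  , col1, "My Col", "evil""name"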
Call for your example (important!):
SELECT select_by_txt(10,32,33,'col1', 'col2');
Alternative syntax:
SELECT select_by_txt(10,32,33, VARIADIC '{col1,col2}');
More revealing call, with a third column name and malicious (though futile) intent:
SELECT select_by_txt(10,32,33,'col1', 'col2', $$col3'); DROP TABLE table1;--$$);
About that odd third column name and SQL injection:
https://www.explainxkcd.com/wiki/index.php/Little_Bobby_Tables
About VARIADIC parameters:
Return rows matching elements of input array in plpgsql function
Pass multiple values in single parameter
Using an OUT parameter for simplicity. That's totally optional. See:
Returning from a function with OUT parameter
What I would not do
If you really, really trust the input to be a properly formatted list of 1 or more valid column names at all times - and you asserted that ...
the names of the columns inside cols will be checked before calling the function that they are valid columns
You could simplify:
CREATE OR REPLACE FUNCTION select_by_txt(z int, x int, y int, cols text, OUT res text)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
$$
SELECT ST_AsMVT(mvtgeom, 'public.select_by_txt')
FROM (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, %s
FROM table1 t
JOIN (SELECT ST_TileEnvelope($1, $2, $3)) AS bounds(geom)
ON ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
) mvtgeom
$$, cols
)
INTO res
USING z, x, y;
END
$func$;
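For the record, this simplified variant is called with a single text parameter holding the ready-made column list, e.g.:
SELECT select_by_txt(10, 32, 33, 'col1,col2');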
(How can you be so sure that the input will always be reliable?)
You would need to use a dynamic query. Note that %I quotes its argument as one single identifier, so this form only works when cols contains exactly one column name; for a list of columns, build the list as shown in the answer above:
CREATE OR REPLACE FUNCTION select_by_txt(z integer,x integer,y integer, cols text)
RETURNS bytea
LANGUAGE 'plpgsql'
AS $BODY$
declare
res bytea;
begin
EXECUTE format('
WITH bounds AS (
SELECT ST_TileEnvelope($1, $2, $3) AS geom
),
mvtgeom AS (
SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.geom) AS geom, %I
FROM table1 t, bounds
WHERE ST_Intersects(t.geom, ST_Transform(bounds.geom, 4326))
)
SELECT ST_AsMVT(mvtgeom, ''public.select_by_txt'')
FROM mvtgeom', cols)
INTO res
USING z,x,y;
RETURN res;
end;
$BODY$;

Best way to remove ordered sequential duplicates in a comma separated list with PostgreSQL

I have a column with data that looks like this in a single field:
"a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b"
Using some sort of regex or SQL function I would like to make it look like this:
"a,b,c,a,b,a,c,a,b"
Essentially I am trying to get rid of repeated values that appear in order but keep the unique changes from one value to another.
My knowledge of regular expressions pretty much ends at removing duplicates. Any help is greatly appreciated!
use regexp:
SELECT regexp_replace('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b', '(\w)(,\1)+', '\1', 'g')
(\w)(,\1)+ matches any word character followed, one or more times, by a comma and that same character again (a backreference); the replacement keeps only the single character.
Fiddle example
RegExr example
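Running that against the sample string from the question returns exactly the expected result:
SELECT regexp_replace('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b', '(\w)(,\1)+', '\1', 'g');
--> a,b,c,a,b,a,c,a,b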
You can convert the elements into rows, check if the previous row is different from the current one and then keep only those where something changed. This can then be aggregated back into a comma-separated list:
select string_agg(ch, ',' order by idx)
from (
select u.ch, u.idx,
coalesce(u.ch <> lag(u.ch) over (order by u.idx), true) as is_change
from unnest(string_to_array('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b', ',')) with ordinality as u(ch, idx)
) t
where is_change
The with ordinality returns the original array index, so that we can sort the elements correctly when aggregating them.
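To see what the inner query produces, here it is run on a short input (output shown in idx order; the first row counts as a change because lag() returns NULL there):
select u.ch, u.idx,
       coalesce(u.ch <> lag(u.ch) over (order by u.idx), true) as is_change
from unnest(string_to_array('a,a,b', ',')) with ordinality as u(ch, idx);
--  ch | idx | is_change
--  a  |   1 | t
--  a  |   2 | f
--  b  |   3 | t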
This can also be put into a function:
create or replace function cleanup(p_input text)
returns text
as
$$
select string_agg(ch, ',' order by idx)
from (
select u.ch, u.idx,
coalesce(u.ch <> lag(u.ch) over (order by u.idx), true) as is_change
from unnest(string_to_array(p_input, ',')) with ordinality as u(ch, idx)
) t
where is_change;
$$
language sql;
Online example
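A quick test with the value from the question returns the desired string:
SELECT cleanup('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b');
--> a,b,c,a,b,a,c,a,b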
My understanding is:
If a character is the same as the previous character, you want to remove it from the string.
So I will use a while loop and an if statement in this case:
--CREATE TABLE TEST (ID VARCHAR(100));
--INSERT INTO TEST VALUES ('a,a,b,b,c,a,b,b,b,a,a,a,a,a,a,c,a,a,b');
DO $$
DECLARE
V_NEWSTRING VARCHAR(100) := '';
V_I INTEGER := 1;
V_LENGTH INTEGER := 0;
V_CURRENT VARCHAR(10) := '';
V_LAST VARCHAR(10) := '';
BEGIN
SELECT LENGTH(ID) FROM TEST INTO V_LENGTH;
WHILE V_I <= V_LENGTH LOOP
    SELECT SUBSTRING(ID, V_I, 1) FROM TEST INTO V_CURRENT;
    IF V_CURRENT <> V_LAST THEN
        V_NEWSTRING = V_NEWSTRING || V_CURRENT || ',';
    END IF;
    V_LAST = V_CURRENT;
    -- step over the value and the comma that follows it (assumes single-character values)
    V_I = V_I + 2;
END LOOP;
raise notice 'Value: %', V_NEWSTRING;
END $$;
Test result (PostgreSQL 9.4): the NOTICE prints Value: a,b,c,a,b,a,c,a,b, (note the trailing comma added by the loop).

In clause in postgres

I need output from a table using an IN clause in PostgreSQL.
I tried to loop over the ids passed in from my code. I did the same thing to update rows dynamically, but for the SELECT I am not getting any values back from the DB.
CREATE OR REPLACE FUNCTION dashboard.rspgetpendingdispatchbyaccountgroupidandbranchid(
IN accountgroupIdCol numeric(8,0),
IN branchidcol character varying
)
RETURNS void
AS
$$
DECLARE
ArrayText text[];
i int;
BEGIN
select string_to_array(branchidcol, ',') into ArrayText;
i := 1;
loop
if i > array_upper(ArrayText, 1) then
exit;
else
SELECT
pd.branchid,pd.totallr,pd.totalarticle,pd.totalweight,
pd.totalamount
FROM dashboard.pendingdispatch AS pd
WHERE
pd.accountgroupid = accountgroupIdCol AND pd.branchid IN(ArrayText[i]::numeric);
i := i + 1;
end if;
END LOOP;
END;
$$ LANGUAGE 'plpgsql' VOLATILE;
There is no need for a loop (or PL/pgSQL actually)
You can use the array directly in the query, e.g.:
where pd.branchid = any (string_to_array(branchidcol, ','));
But your function does not return anything, so obviously you won't get a result.
If you want to return the result of that SELECT query, you need to define the function as returns table (...) and then use return query - or even better make it a SQL function:
CREATE OR REPLACE FUNCTION dashboard.rspgetpendingdispatchbyaccountgroupidandbranchid(
IN accountgroupIdCol numeric(8,0),
IN branchidcol character varying )
RETURNS table(branchid integer, totallr integer, totalarticle integer, totalweight numeric, totalamount integer)
AS
$$
SELECT pd.branchid,pd.totallr,pd.totalarticle,pd.totalweight, pd.totalamount
FROM dashboard.pendingdispatch AS pd
WHERE pd.accountgroupid = accountgroupIdCol
AND pd.branchid = any (string_to_array(branchidcol, ',')::numeric[]);
$$
LANGUAGE sql
VOLATILE;
Note that I guessed the data types for the columns of the query based on their names. You have to adjust the line with returns table (...) to match the data types of the select columns.
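The function can then be called like any other set-returning function; with made-up ids the call looks like this:
SELECT *
FROM dashboard.rspgetpendingdispatchbyaccountgroupidandbranchid(1001, '12,15,27');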

Extract integer value from string column with additional text

I'm converting a BDE (Paradox) query to Firebird (2.5, not 3.x), and I have a very convenient conversion in it:
select TRIM(' 1') as order1, CAST(' 1' AS INTEGER) AS order2 --> 1
select TRIM(' 1 bis') as order1, CAST(' 1 bis' AS INTEGER) AS order2 --> 1
Then ordering by the cast value and then by the trimmed value (ORDER BY order2, order1) provides me with the result I need:
1
1 bis
2 ter
100
101 bis
However, in Firebird casting an invalid integer raises an exception, and I did not find any way around that to get the same result. I think I can tell whether a number is present with something like the expression below, but I couldn't find a way to extract it.
TRIM(' 1 bis') similar to '[ [:ALPHA:]]*[[:DIGIT:]]+[ [:ALPHA:]]*'
[EDIT]
I had to handle cases where text appears before the number, so building on @Arioch'The's trigger, I got this running great:
SET TERM ^ ;
CREATE TRIGGER SET_MYTABLE_INTVALUE FOR MYTABLE ACTIVE
BEFORE UPDATE OR INSERT POSITION 0
AS
DECLARE I INTEGER;
DECLARE S VARCHAR(13);
DECLARE C VARCHAR(1);
DECLARE R VARCHAR(13);
BEGIN
IF (NEW.INTVALUE is not null) THEN EXIT;
S = TRIM( NEW.VALUE );
R = NULL;
I = 1;
WHILE (I <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( S FROM I FOR 1 );
IF ((C >= '0') AND (C <= '9')) THEN LEAVE;
I = I + 1;
END
WHILE (I <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( S FROM I FOR 1 );
IF (C < '0') THEN LEAVE;
IF (C > '9') THEN LEAVE;
IF (C IS NULL) THEN LEAVE;
IF (R IS NULL) THEN R=C; ELSE R = R || C;
I = I + 1;
END
NEW.INTVALUE = CAST(R AS INTEGER);
END^
SET TERM ; ^
Converting such a table, you have to add a special indexed integer column to hold the extracted integer data.
Note that this query, while using a "very convenient conversion", is actually rather bad: you should sort (order) large amounts of data on indexed columns, otherwise you get slow execution and waste a lot of memory/disk on temporary sort tables.
So you have to add an extra indexed integer column and use it in the query.
The next question is how to populate that column.
It would be better to do it once, when you move your entire database and application from BDE to Firebird, and from that point on make your application fill BOTH the varchar and the integer column properly when entering new data rows.
A one-time conversion can then be done by your converter application.
Or you can use a selectable stored procedure that returns the table with such an added column. Or you can write an EXECUTE BLOCK that iterates through the table and updates its rows, calculating the said integer value.
How to SELECT a PROCEDURE in Firebird 2.5
If you need to keep legacy applications that only insert the text column but not the integer column, then I think you have to use a BEFORE UPDATE OR INSERT trigger in Firebird that parses the text column value letter by letter and extracts the integer from it. And then make sure your application never changes that integer column directly.
See a trigger example at Trigger on Update Firebird
PSQL language documentation: https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-psql.html
Whether you write a procedure or a trigger to populate the added indexed integer column, you have to make a simple loop over the characters, copying the string from the first digit up to the first non-digit.
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-functions-scalarfuncs.html#fblangref25-functions-string
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref25-psql-coding.html#fblangref25-psql-declare-variable
Something like this:
CREATE TRIGGER my_trigger FOR my_table
BEFORE UPDATE OR INSERT
AS
DECLARE I integer;
DECLARE S VARCHAR(100);
DECLARE C VARCHAR(100);
DECLARE R VARCHAR(100);
BEGIN
S = TRIM( NEW.MY_TXT_COLUMN );
R = NULL;
I = 1;
-- copy consecutive digits from the start of S into R, stopping at the first non-digit
WHILE (i <= CHAR_LENGTH(S)) DO
BEGIN
C = SUBSTRING( s FROM i FOR 1 );
IF (C < '0') THEN LEAVE;
IF (C > '9') THEN LEAVE;
IF (C IS NULL) THEN LEAVE;
IF (R IS NULL) THEN R=C; ELSE R = R || C;
I = I + 1;
END
NEW.MY_INT_COLUMN = CAST(R AS INTEGER);
END;
In this example your ORDER BY order2, order1 would become
SELECT ..... FROM my_table ORDER BY MY_INT_COLUMN, MY_TXT_COLUMN
Additionally, it seems your column actually contains compound data: an integer index and an optional textual postfix. If so, the data you have is not normalized and the table had better be restructured.
CREATE TABLE my_table (
ORDER_Int INTEGER NOT NULL,
ORDER_PostFix VARCHAR(24) CHECK( ORDER_PostFix = TRIM(ORDER_PostFix) ),
......
ORDER_TXT COMPUTED BY (ORDER_INT || COALESCE( ' ' || ORDER_PostFix, '' )),
PRIMARY KEY (ORDER_Int, ORDER_PostFix )
);
When you move your data from Paradox to Firebird, make your converter application check and split values like "1 bis" into the two new columns.
Your query then becomes something like
SELECT ORDER_TXT, ... FROM my_table ORDER BY ORDER_Int, ORDER_PostFix
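For instance, a legacy value like "1 bis" would then be split and stored like this (hypothetical row, other columns omitted):
INSERT INTO my_table (ORDER_Int, ORDER_PostFix) VALUES (1, 'bis');
-- ORDER_TXT is then computed as '1 bis'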
If you're using FB 2.5 you can use the following:
execute block (txt varchar(100) = :txt )
returns (res integer)
as
declare i integer;
begin
i=1;
while (i<=char_length(:txt)) do begin
if (substring(:txt from i for 1) not similar to '[[:DIGIT:]]')
then txt =replace(:txt,substring(:txt from i for 1),'');
else i=i+1;
end
res = :txt;
suspend;
end
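To try that logic without binding a parameter, the same block can be run with the value inlined as a local variable (a quick test sketch, FB 2.5 syntax, sample value made up):
execute block returns (res integer)
as
  declare variable txt varchar(100) = ' 101 bis';
  declare variable i integer;
begin
  i = 1;
  -- strip every non-digit character; only the digits remain in txt
  while (i <= char_length(txt)) do begin
    if (substring(txt from i for 1) not similar to '[[:DIGIT:]]')
    then txt = replace(txt, substring(txt from i for 1), '');
    else i = i + 1;
  end
  res = txt;  -- '101' is implicitly cast to integer
  suspend;
end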
In FB 3.0 you have a more convenient way to do the same:
select
cast(substring(:txt||'#' similar '%#"[[:DIGIT:]]+#"%' escape '#') as integer)
from rdb$database
-- assuming that the field is varchar(15)
select cast(field as integer) from table;
Worked in firebird version 2.5.

PostgreSQL using variables in FOR Loop

I have two tables:
CREATE TABLE arapply
(
arapply_id serial NOT NULL,
arapply_postdate date,
arapply_source_docnumber text,
arapply_target_docnumber text,
arapply_target_paid numeric
);
CREATE TABLE aropenbal
(
ar_id integer,
doc_number text,
doc_type text,
doc_date date,
base_amount numeric,
paid_amount numeric,
open_balance numeric
);
For each entry in aropenbal, I want to SUM the arapply.arapply_target_paid values where arapply.arapply_source_docnumber = aropenbal.doc_number (if aropenbal.doc_type is C or R) or arapply.arapply_target_docnumber = aropenbal.doc_number (if aropenbal.doc_type is not C or R), and also arapply_postdate <= aropenbal.doc_date. The result should be stored in aropenbal.paid_amount.
I then wish to update aropenbal.open_balance with aropenbal.base_amount + aropenbal.paid_amount.
The function should return the total (SUM) of aropenbal.open_balance.
I'm having problems with the code below. The SELECT statements inside the FOR loop don't work unless I manually assign a value, say '362', in place of r.doc_number. Otherwise, the result is zero.
Seems to be a formatting problem.
Any insights?
CREATE OR REPLACE FUNCTION testit() RETURNS NUMERIC AS
$BODY$
DECLARE
r RECORD;
BEGIN
FOR r IN
SELECT * FROM aropenbal ORDER BY doc_date
LOOP
UPDATE aropenbal SET paid_amount = (
CASE WHEN (doc_type IN ('C', 'R')) THEN
(SELECT COALESCE (SUM (arapply_target_paid)* -1, 0)
FROM arapply
WHERE arapply_source_docnumber = r.doc_number
AND arapply_postdate <= r.doc_date)
ELSE
(SELECT COALESCE(SUM (arapply_target_paid),0)
FROM arapply
WHERE arapply_target_docnumber = r.doc_number
AND arapply_postdate <= r.doc_date)
END) WHERE ar_id = r.ar_id;
UPDATE aropenbal SET open_balance = (base_amount - paid_amount)
WHERE ar_id = r.ar_id;
END LOOP;
RETURN (SELECT SUM(open_balance) FROM aropenbal);
END;
$BODY$
LANGUAGE plpgsql;
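For reference, the per-row sums described in the question can also be computed without a FOR loop, in set-based statements that stay close to the loop logic above (an untested sketch against the table definitions shown; the signs and the base_amount - paid_amount formula follow the function code):
UPDATE aropenbal o
SET paid_amount = CASE WHEN o.doc_type IN ('C', 'R') THEN
        (SELECT COALESCE(SUM(arapply_target_paid) * -1, 0)
         FROM arapply
         WHERE arapply_source_docnumber = o.doc_number
         AND arapply_postdate <= o.doc_date)
    ELSE
        (SELECT COALESCE(SUM(arapply_target_paid), 0)
         FROM arapply
         WHERE arapply_target_docnumber = o.doc_number
         AND arapply_postdate <= o.doc_date)
    END;

UPDATE aropenbal SET open_balance = base_amount - paid_amount;

SELECT SUM(open_balance) FROM aropenbal;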