I have a very large database populated from social media. I'm trying to add a new column that stores a JSON word count for each row, to speed up analytics.
I first created a PostgreSQL function that takes a string array, counts the occurrences of each element, and returns a jsonb value to be inserted. Here is the function:
CREATE
OR REPLACE FUNCTION count_elements (TEXT []) RETURNS JSONB AS $$
DECLARE js JSONB := '{}' ;
DECLARE jjson JSONB ;
BEGIN
SELECT
jsonb_agg (
(
'{"' || i|| '":"' || C || '"}'
) :: JSONB
) INTO jjson
FROM
(
SELECT
i,
COUNT (*) C
FROM
(SELECT UNNEST($1 :: TEXT []) i) i
GROUP BY
i
ORDER BY
C DESC
) foo ; RETURN jjson ;
END ; $$ LANGUAGE plpgsql;
Here is the issue. When running the following query
select count_elements(string_to_array(lower(tweet_text), ' ')),tweet_text from tweet_database
limit 10
I get this error
[Err] ERROR: invalid input syntax for type json
DETAIL: Character with value 0x0a must be escaped.
CONTEXT: JSON data, line 1: {"winning?
SQL statement "SELECT
I tried escaping the column, and then regex replacing some of the items but it hasn't worked yet.
The to_json() function can be used to escape the text:
SELECT
jsonb_agg (
(
'{' || to_json(i) || ':' || C || '}'
) :: JSONB
) INTO jjson
Then
select count_elements(E'{a, a, b, a\nb, a}'::text[]);
results in
[{"a":3}, {"b":1}, {"a\nb":1}]
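If a single JSON object mapping each word to its count is acceptable instead of an array of one-key objects, jsonb_object_agg() handles the escaping by itself. The following is only a sketch under that assumption: the function name count_elements_obj and the parameter name words are made up here, and because jsonb does not preserve key order, the ORDER BY on the count is dropped.

```sql
-- Sketch: count array elements into one jsonb object, e.g. {"a": 3, "b": 1}.
-- count_elements_obj and words are hypothetical names, not from the question.
CREATE OR REPLACE FUNCTION count_elements_obj(words text[])
RETURNS jsonb
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT coalesce(jsonb_object_agg(word, cnt), '{}'::jsonb)
    FROM (
        SELECT word, count(*) AS cnt
        FROM unnest(words) AS t(word)
        WHERE word IS NOT NULL          -- object keys must not be NULL
        GROUP BY word
    ) AS counts;
$$;

-- The element containing a newline is escaped automatically.
SELECT count_elements_obj(E'{a, a, b, a\nb, a}'::text[]);
```

It would be called on the tweet table the same way as the original function, with string_to_array(lower(tweet_text), ' ') as the argument.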
Related
I am writing a PostgreSQL function for some operation.
I am also writing the SQL migration for that function, but it fails with a formatting error because Liquibase does not recognize part of the definition.
Function Liquibase Migration:
CREATE OR REPLACE FUNCTION schema.fncn(trId integer, sts integer, stIds character varying)
RETURNS double precision
LANGUAGE plpgsql
AS '
DECLARE
abc integer;
query CHAR(1500);
xyz integer;
BEGIN
query := ''select sum(t.a)
FROM schema.tbl t
where t.id in(1,2)
and t.status ='' || sts ||
'' and t.status <> 2
and t.tr_id ='' || trId ||
'' and t.sw in('''', ''N'')'';
IF stIds is not null then
query := query || '' AND t.st_id IN ('' || stIds || '')'';
ELSE
END IF;
EXECUTE query INTO abc;
SELECT abc INTO xyz;
RETURN xyz;
END;
'
;
It throws the following error:
Caused by: org.postgresql.util.PSQLException: ERROR: syntax error at or near "N"
Reason: liquibase.exception.DatabaseException: ERROR: syntax error at or near "N"
Any suggestions on what I am missing?
The immediate problem is the nesting of single quotes. To make that easier, use dollar quoting for the function body. You can nest dollar-quoted strings by choosing different delimiters.
To avoid any problems with concatenation of parameters, use parameter place holders in the query and pass the values with the USING clause. That will however require two different execute calls.
I assume stIds is a comma separated string of values. To use that as a (single) placeholder, convert it to an array using string_to_array() - or even better: change the type of the input parameter to text[] and pass an array directly.
The query variable is better defined as text; don't use char. There is also no need to copy the result of the query into a different variable (which, by the way, would be more efficient with xyz := abc; than with a SELECT INTO).
CREATE OR REPLACE FUNCTION schema.fncn(trId integer, sts integer, stIds character varying)
RETURNS double precision
LANGUAGE plpgsql
AS
$body$
DECLARE
abc integer;
query text;
BEGIN
query := $q$ select sum(t.a)
FROM schema.tbl t
where t.id in (1,2)
and t.status = $1
and t.status <> 2
and t.tr_id = $2
and t.sw in ('''', 'N') $q$;
IF stIds is not null then
query := query || $sql$ AND t.st_id = ANY (string_to_array($3, ',')) $sql$;
EXECUTE query INTO abc
using sts, trid, stids;
ELSE
EXECUTE query INTO abc
using sts, trid;
END IF;
RETURN abc;
END;
$body$
;
Note that in the Liquibase change, you must use splitStatements=false in order to run this without errors.
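As a rough sketch of both suggestions (passing an array directly and disabling statement splitting), a Liquibase formatted-SQL changelog could look like the following. The function name fncn_by_array, the unqualified table name tbl, the p_ parameter names, the author and the changeset id are all made up for illustration, and the t.sw condition is left out for brevity. Because the optional filter is written as IS NULL OR, this variant uses plain static SQL instead of EXECUTE, which is a different technique from the dynamic query above.

```sql
--liquibase formatted sql

--changeset example.author:create-fncn-by-array splitStatements:false
-- Sketch only: hypothetical names; p_st_ids is an integer array instead of a
-- comma-separated string, so no dynamic SQL is needed.
CREATE OR REPLACE FUNCTION fncn_by_array(p_tr_id integer, p_sts integer, p_st_ids integer[])
RETURNS double precision
LANGUAGE plpgsql
AS
$body$
DECLARE
    result double precision;
BEGIN
    SELECT sum(t.a)
      INTO result
      FROM tbl t
     WHERE t.id IN (1, 2)
       AND t.status = p_sts
       AND t.status <> 2
       AND t.tr_id = p_tr_id
       -- a NULL array means "no filter on st_id"
       AND (p_st_ids IS NULL OR t.st_id = ANY (p_st_ids));
    RETURN result;
END;
$body$;
```

A call would then pass the ids as an array, for example fncn_by_array(7, 1, ARRAY[10, 20]), or NULL as the last argument to skip the st_id filter.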
Sample code follows. Neither the ALL nor the ANY operator works for me. I need to compare against all the values of the array.
CREATE OR REPLACE FUNCTION public.sample_function(
tt_sample_function text)
RETURNS TABLE (..... )
LANGUAGE 'plpgsql'
COST 100
VOLATILE
ROWS 1000
AS $BODY$
declare
e record;
v_cnt INTEGER:=0;
rec record;
str text;
a_v text [];
BEGIN
FOR rec IN ( SELECT * FROM json_populate_recordset(null::sample_function ,sample_function::json) )
LOOP
a_v:= array_append(a_v, ''''||rec.key || '#~#' || rec.value||'''');
END LOOP;
SELECT MAInfo.userid FROM
(SELECT DISTINCT i.userid,
CASE WHEN (i.settingKey || '#~#' || i.settingvalue) = ALL (a_v)
THEN i.settingKey || '#~#' || 'Y'
ELSE i.settingKey || '#~#' || 'N' END
AS MatchResult
FROM public.sample_table i
WHERE (i.settingKey || '#~#' || i.settingvalue) = ALL (a_v)
GROUP BY i.userid, MatchResult) AS MAInfo
GROUP BY MAInfo.userid
HAVING COUNT(MAInfo.userid) >= 1;
RETURN QUERY (....);
END;
$BODY$;
CREATE TYPE tt_sample_function AS
(
key character varying,
value character varying
)
Inputs are
SELECT public.sample_function(
'[{"key":"devicetype", "value":"TestType"},{"key":"ostype", "value":"TestType"}]'
)
Any suggestions as to why my ALL operator is not working? It always returns false, although it should match all the array elements.
Note: of course the data is there in the table.
You are overcomplicating things. You don't need the FOR loop or the array to do the comparison. You can do that all in a single statement. No need for an extra TYPE or generating an array.
The parameter to the function should be declared as jsonb as you clearly want to pass valid JSON there.
I don't understand what you are trying to achieve with the CASE expression. The WHERE clause only returns rows that match the first condition in the CASE, so the second one will never be reached.
I also don't understand why you have the CASE at all, as you discard the result of that in the outer query completely.
But keeping the original structure as close as possible, I think you can simplify this to a single CREATE TABLE AS statement and get rid of all the array processing.
CREATE OR REPLACE FUNCTION public.sample_function(p_settings jsonb)
RETURNS TABLE (..... )
LANGUAGE plpgsql
AS $BODY$
declare
...
begin
CREATE TEMP TABLE hold_userID AS
SELECT MAInfo.userid
FROM (
-- the distinct is useless as the GROUP BY already does that
SELECT i.userid,
CASE
-- this checks if the parameter contains the settings key/value from sample_table
-- but the WHERE clause already makes sure of that???
WHEN p_settings @> jsonb_build_array(jsonb_build_object('key', i.settingKey, 'value', i.settingvalue))
THEN i.settingKey || '#~#' || 'Y'
ELSE i.settingKey || '#~#' || 'N'
END AS MatchResult
FROM public.sample_table i
WHERE (i.settingKey, i.settingvalue) IN (select t.element ->> 'key' as key,
t.element ->> 'value' as value
from jsonb_array_elements(p_settings) as t(element))
GROUP BY i.userid, MatchResult
) AS MAInfo
GROUP BY MAInfo.userid
HAVING COUNT(MAInfo.userid) >= 1;
return query ...;
end;
$BODY$;
If you want to check whether certain users have all the settings passed to the function, you don't really need a CASE expression, just a proper HAVING condition.
So maybe you want this instead:
CREATE TEMP TABLE hold_userID AS
SELECT i.userid
FROM public.sample_table i
WHERE (i.settingKey, i.settingvalue) IN (select t.element ->> 'key' as key,
t.element ->> 'value' as value
from jsonb_array_elements(p_settings) as t(element))
GROUP BY i.userid
HAVING COUNT(*) = jsonb_array_length(p_settings);
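Or alternatively, if the settings are passed as a single JSON object (for example {"devicetype": "TestType", "ostype": "TestType"}) rather than as an array of key/value objects: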
SELECT i.userid
FROM (
select userid, settingkey as key, settingvalue as value
from public.sample_table
) i
group by i.userid
HAVING jsonb_object_agg(key, value) = p_settings
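To see why the original = ALL comparison never matches and how the HAVING variant behaves, here is a small self-contained sketch. The temp table sample_table_demo and its rows are invented for the demonstration; only the column names follow the question. A single value can never be equal to two different array elements at the same time, so = ALL is false as soon as the array holds more than one distinct entry.

```sql
-- Hypothetical sample data; only the column names follow the question.
CREATE TEMP TABLE sample_table_demo (userid int, settingkey text, settingvalue text);
INSERT INTO sample_table_demo VALUES
    (1, 'devicetype', 'TestType'),
    (1, 'ostype',     'TestType'),
    (2, 'devicetype', 'TestType'),
    (2, 'ostype',     'OtherType');

-- One row's value cannot equal both array elements, so this yields false.
SELECT 'devicetype#~#TestType'
       = ALL (ARRAY['devicetype#~#TestType', 'ostype#~#TestType']) AS all_match;

-- Users that have every key/value pair from the JSON array.
WITH params(p_settings) AS (
    SELECT '[{"key":"devicetype","value":"TestType"},
             {"key":"ostype","value":"TestType"}]'::jsonb
)
SELECT i.userid
FROM sample_table_demo i
CROSS JOIN params p
WHERE (i.settingkey, i.settingvalue) IN (
        SELECT t.element ->> 'key', t.element ->> 'value'
        FROM jsonb_array_elements(p.p_settings) AS t(element))
GROUP BY i.userid, p.p_settings
HAVING count(*) = jsonb_array_length(p.p_settings);
```

With this data only user 1 has both settings. The count comparison assumes that a user has at most one row per key/value pair and that the JSON array contains no duplicates.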
In a PL/pgSQL function, I am creating a view using the EXECUTE statement. The WHERE clause in the view takes some Jenkins job names as input. These job names are passed to the function as a comma-separated string. They are then converted to an array so that they can be used as the argument to ANY in the WHERE clause. See the basic code below:
CREATE OR REPLACE FUNCTION FETCH_ALL_TIME_AGGR_KPIS(jobs VARCHAR)
RETURNS SETOF GenericKPI AS $$
DECLARE
job_names TEXT[];
BEGIN
job_names = string_to_array(jobs,',');
EXECUTE 'CREATE OR REPLACE TEMP VIEW dynamicView AS ' ||
'with pipeline_aggregated_kpis AS (
select
jenkins_build_parent_id,
sum (duration) as duration
from test_all_finished_job_builds_enhanced_view where job_name = ANY (' || array(select quote_ident(unnest(job_names))) || ') and jenkins_build_parent_id is not null
group by jenkins_build_parent_id)
select ' || quote_ident('pipeline-job') || ' as job_name, b1.jenkins_build_id, pipeline_aggregated_kpis.status, pipeline_aggregated_kpis.duration FROM job_builds_enhanced_view b1 INNER JOIN pipeline_aggregated_kpis ON (pipeline_aggregated_kpis.jenkins_build_parent_id = b1.jenkins_build_id)';
RETURN QUERY (select
count(*) as total_executions,
round(avg (duration) FILTER (WHERE status = 'SUCCESS')::numeric,2) as average_duration
from dynamicView);
END
$$
LANGUAGE plpgsql;
The creation of the function is successful but an error message is returned when I try to call the function. See below:
eea_ci_db=> select * from FETCH_ALL_TIME_AGGR_KPIS('integration,test');
ERROR: malformed array literal: ") and jenkins_build_parent_id is not null
group by jenkins_build_parent_id)
select "
LINE 7: ...| array(select quote_ident(unnest(job_names))) || ') and jen...
^
DETAIL: Array value must start with "{" or dimension information.
CONTEXT: PL/pgSQL function fetch_all_time_aggr_kpis(character varying) line 8 at EXECUTE
It seems like there is something going wrong with quotes & the passing of an array of string. I tried all following options with the same result:
where job_name = ANY (' || array(select quote_ident(unnest(job_names))) || ') and jenkins_build_parent_id is not null
or
where job_name = ANY (' || quote_ident(job_names)) || ') and jenkins_build_parent_id is not null
or
where job_name = ANY (' || job_names || ') and jenkins_build_parent_id is not null
Any ideas?
Thank you
There is no need for dynamic SQL at all. There isn't even the need for PL/pgSQL to do this:
CREATE OR REPLACE FUNCTION FETCH_ALL_TIME_AGGR_KPIS(jobs VARCHAR)
RETURNS SETOF GenericKPI
AS
$$
with pipeline_aggregated_kpis AS (
select jenkins_build_parent_id,
sum (duration) as duration
from test_all_finished_job_builds_enhanced_view
where job_name = ANY (string_to_array(jobs,','))
and jenkins_build_parent_id is not null
group by jenkins_build_parent_id
), dynamic_view as (
select 'pipeline-job' as job_name,
b1.jenkins_build_id,
pipeline_aggregated_kpis.status,
pipeline_aggregated_kpis.duration
FROM job_builds_enhanced_view b1
JOIN pipeline_aggregated_kpis
ON pipeline_aggregated_kpis.jenkins_build_parent_id = b1.jenkins_build_id
)
select count(*) as total_executions,
round(avg (duration) FILTER (WHERE status = 'SUCCESS')::numeric,2) as average_duration
from dynamic_view;
$$
language sql;
You could do this with PL/pgSQL as well; you just need to use RETURN QUERY WITH ....
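A minimal sketch of that PL/pgSQL form, reduced to one table so that it runs on its own. The table builds_demo, its rows and the function name kpis_demo are invented here, and RETURNS TABLE stands in for the GenericKPI type, whose definition is not shown in the question.

```sql
-- Hypothetical stand-in for the build view used in the question.
CREATE TEMP TABLE builds_demo (job_name text, duration numeric, status text);
INSERT INTO builds_demo VALUES
    ('integration', 10, 'SUCCESS'),
    ('integration', 20, 'FAILED'),
    ('test',        30, 'SUCCESS'),
    ('other',       99, 'SUCCESS');

CREATE OR REPLACE FUNCTION kpis_demo(jobs varchar)
RETURNS TABLE (total_executions bigint, average_duration numeric)
LANGUAGE plpgsql
AS $$
BEGIN
    -- RETURN QUERY accepts a query that starts with WITH;
    -- the parameter is used directly, no string concatenation needed.
    RETURN QUERY
    WITH picked AS (
        SELECT b.duration, b.status
        FROM builds_demo b
        WHERE b.job_name = ANY (string_to_array(jobs, ','))
    )
    SELECT count(*),
           round(avg(p.duration) FILTER (WHERE p.status = 'SUCCESS'), 2)
    FROM picked p;
END;
$$;

SELECT * FROM kpis_demo('integration,test');
```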
I am trying to create crosstab queries in PostgreSQL such that it automatically generates the crosstab columns instead of hardcoding it. I have written a function that dynamically generates the column list that I need for my crosstab query. The idea is to substitute the result of this function in the crosstab query using dynamic sql.
I know how to do this easily in SQL Server, but my limited knowledge of PostgreSQL is hindering my progress here. I was thinking of storing the result of function that generates the dynamic list of columns into a variable and use that to dynamically build the sql query. It would be great if someone could guide me regarding the same.
-- Table which has to be pivoted
CREATE TABLE test_db
(
kernel_id int,
key int,
value int
);
INSERT INTO test_db VALUES
(1,1,99),
(1,2,78),
(2,1,66),
(3,1,44),
(3,2,55),
(3,3,89);
-- This function dynamically returns the list of columns for crosstab
CREATE FUNCTION test() RETURNS TEXT AS '
DECLARE
key_id int;
text_op TEXT = '' kernel_id int, '';
BEGIN
FOR key_id IN SELECT DISTINCT key FROM test_db ORDER BY key LOOP
text_op := text_op || key_id || '' int , '' ;
END LOOP;
text_op := text_op || '' DUMMY text'';
RETURN text_op;
END;
' LANGUAGE 'plpgsql';
-- This query works. I just need to convert the static list
-- of crosstab columns to be generated dynamically.
SELECT * FROM
crosstab
(
'SELECT kernel_id, key, value FROM test_db ORDER BY 1,2',
'SELECT DISTINCT key FROM test_db ORDER BY 1'
)
AS x (kernel_id int, key1 int, key2 int, key3 int); -- How can I replace ..
-- .. this static list with a dynamically generated list of columns ?
You can use the provided C function crosstab_hash for this.
The manual is not very clear in this respect. It's mentioned at the end of the chapter on crosstab() with two parameters:
You can create predefined functions to avoid having to write out the
result column names and types in each query. See the examples in the
previous section. The underlying C function for this form of crosstab
is named crosstab_hash.
For your example:
CREATE OR REPLACE FUNCTION f_cross_test_db(text, text)
RETURNS TABLE (kernel_id int, key1 int, key2 int, key3 int)
AS '$libdir/tablefunc','crosstab_hash' LANGUAGE C STABLE STRICT;
Call:
SELECT * FROM f_cross_test_db(
'SELECT kernel_id, key, value FROM test_db ORDER BY 1,2'
,'SELECT DISTINCT key FROM test_db ORDER BY 1');
Note that you need to create a distinct crosstab_hash function for every crosstab function with a different return type.
Related:
PostgreSQL row to columns
Your function to generate the column list is rather convoluted and the result is incorrect (int missing after kernel_id). It can be replaced with this SQL query:
SELECT 'kernel_id int, '
|| string_agg(DISTINCT key::text, ' int, ' ORDER BY key::text)
|| ' int, DUMMY text'
FROM test_db;
And it cannot be used dynamically anyway.
@erwin-brandstetter: The return type of the function isn't an issue if you're always returning a JSON type with the converted results.
Here is the function I came up with:
CREATE OR REPLACE FUNCTION report.test(
i_start_date TIMESTAMPTZ,
i_end_date TIMESTAMPTZ,
i_interval INT
) RETURNS TABLE (
tab JSON
) AS $ab$
DECLARE
_key_id TEXT;
_text_op TEXT = '';
_ret JSON;
BEGIN
-- SELECT DISTINCT for query results
FOR _key_id IN
SELECT DISTINCT at_name
FROM report.company_data_date cd
JOIN report.company_data_amount cda ON cd.id = cda.company_data_date_id
JOIN report.amount_types at ON cda.amount_type_id = at.id
WHERE date_start BETWEEN i_start_date AND i_end_date
AND interval_type_id = i_interval
LOOP
-- build function_call with datatype of column
IF char_length(_text_op) > 1 THEN
_text_op := _text_op || ', ' || _key_id || ' NUMERIC(20,2)';
ELSE
_text_op := _text_op || _key_id || ' NUMERIC(20,2)';
END IF;
END LOOP;
-- build query with parameter filters
RETURN QUERY
EXECUTE '
SELECT array_to_json(array_agg(row_to_json(t)))
FROM (
SELECT * FROM crosstab(''SELECT date_start, at.at_name, cda.amount ct
FROM report.company_data_date cd
JOIN report.company_data_amount cda ON cd.id = cda.company_data_date_id
JOIN report.amount_types at ON cda.amount_type_id = at.id
WHERE date_start between $$' || i_start_date::TEXT || '$$ AND $$' || i_end_date::TEXT || '$$
AND interval_type_id = ' || i_interval::TEXT || ' ORDER BY date_start'')
AS ct (date_start timestamptz, ' || _text_op || ')
) t;';
END;
$ab$ LANGUAGE 'plpgsql';
So, when you run it, you get the dynamic results in JSON, and you don't need to know how many values were pivoted:
select * from report.test(now()- '1 week'::interval, now(), 1);
tab
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"date_start":"2015-07-27T08:40:01.277556-04:00","burn_rate":0.00,"monthly_revenue":5800.00,"cash_balance":0.00},{"date_start":"2015-07-27T08:50:02.458868-04:00","burn_rate":34000.00,"monthly_revenue":15800.00,"cash_balance":24000.00}]
(1 row)
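Why the JSON wrapper sidesteps the return-type problem can be seen on a tiny inline example: whatever columns the inner query produces, the outer result is always a single json value. The VALUES list and the column names below are made up for the illustration.

```sql
-- Sketch: any row shape collapses into one json array of objects,
-- so the caller never has to declare the pivoted columns.
SELECT array_to_json(array_agg(row_to_json(t))) AS tab
FROM (
    VALUES (1, 10.50), (2, 20.25)
) AS t(id, amount);
```

Adding or removing a column in the inner query changes the keys inside the JSON objects but not the type of the result.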
Edit: If you have mixed datatypes in your crosstab, you can add logic to look it up for each column with something like this:
SELECT a.attname as column_name, format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_attribute a
JOIN pg_class b ON (a.attrelid = b.oid)
JOIN pg_catalog.pg_namespace n ON n.oid = b.relnamespace
WHERE n.nspname = $$schema_name$$ AND b.relname = $$table_name$$ AND a.attnum > 0 AND NOT a.attisdropped;
I realise this is an older post, but I struggled for a little while with the same issue.
My Problem Statement:
I had a table with multiple values in a field and wanted to create a crosstab query with 40+ column headings per row.
My solution was to create a function that loops through the table column to grab the values I wanted to use as column headings in the crosstab query.
Within this function I could then create the crosstab query. In my use case I stored the crosstab result in a separate table.
E.g.
CREATE OR REPLACE FUNCTION field_values_ct ()
RETURNS VOID AS $$
DECLARE rec RECORD;
DECLARE str text;
BEGIN
str := '"Issue ID" text,';
-- looping to get column heading string
FOR rec IN SELECT DISTINCT field_name
FROM issue_fields
ORDER BY field_name
LOOP
str := str || '"' || rec.field_name || '" text' ||',';
END LOOP;
str:= substring(str, 0, length(str));
EXECUTE 'CREATE EXTENSION IF NOT EXISTS tablefunc;
DROP TABLE IF EXISTS temp_issue_fields;
CREATE TABLE temp_issue_fields AS
SELECT *
FROM crosstab(''select issue_id, field_name, field_value from issue_fields order by 1'',
''SELECT DISTINCT field_name FROM issue_fields ORDER BY 1'')
AS final_result ('|| str ||')';
END;
$$ LANGUAGE plpgsql;
The approach described here worked well for me.
Instead of retrieving the pivot table directly, the easier approach is to let the function generate a SQL query string and then execute that string dynamically on demand.
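A sketch of such a generator for the test_db table from the question might look like this. The function name crosstab_sql_for_test_db and the key1, key2, ... column naming are assumptions; the function only returns the query text, so the tablefunc extension is needed only when that text is actually executed.

```sql
-- Same shape and rows as the question's table, as a temp table for the demo.
CREATE TEMP TABLE test_db (kernel_id int, key int, value int);
INSERT INTO test_db VALUES (1,1,99), (1,2,78), (2,1,66), (3,1,44), (3,2,55), (3,3,89);

-- Hypothetical helper: returns the text of a crosstab query whose column
-- list is derived from the distinct keys currently in test_db.
CREATE OR REPLACE FUNCTION crosstab_sql_for_test_db()
RETURNS text
LANGUAGE sql
STABLE
AS $fn$
    SELECT format(
        'SELECT * FROM crosstab(%L, %L) AS ct (kernel_id int, %s)',
        'SELECT kernel_id, key, value FROM test_db ORDER BY 1, 2',
        'SELECT DISTINCT key FROM test_db ORDER BY 1',
        string_agg(format('%I int', 'key' || k), ', ' ORDER BY k)
    )
    FROM (SELECT DISTINCT key AS k FROM test_db) AS keys;
$fn$;

SELECT crosstab_sql_for_test_db();
```

In psql, the generated statement can then be run directly by ending the call with \gexec instead of a semicolon.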
I need to build a query inside a function using string_agg(), assign it to a variable, and then execute the contents of that variable.
Example:
create or replace function funct1(a int,b varchar)
returns void as
$$
declare
wrclause varchar := '';
sqlq varchar ;
t varchar;
begin
IF (b IS NOT NULL ) THEN
wrclause := 'AND b IN ('|| b || ')';
END IF;
sqlq := string_agg('select *, abcd as "D" from ' ||table_namess,' Union all ') as namess
from tablescollection2 ud
inner join INFORMATION_SCHEMA.Tables so on ud.table_namess = so.Table_name
WHERE cola NOT IN (SELECT cola FROM tablet WHERE colb = || a ||) || wrclause; /* Error occurred here at = || a */
raise info '%',sqlq;
execute sqlq into t;
raise info '%',t;
end;
$$
language plpgsql;
Calling Function:
select funct1(1,'1,2,3');
Error:
ERROR: operator does not exist: || integer
|| is an operator for concatenating two pieces of text. It requires text (or something convertible to text) both before and after the operator, like so:
select 'a' || 'b'
select 'a' || 3
So while these seem to be valid:
wrclause := 'AND b IN ('|| b || ')';
sqlq := string_agg('select *, abcd as "D" from ' ||table_namess,' Union all ') as namess
This is definitely not:
WHERE cola NOT IN (SELECT cola FROM tablet WHERE colb = || a ||) || wrclause;
What were you trying to achieve here?
It looks like you may be trying to construct a query dynamically. You cannot mix free text with SQL and expect Postgres to sort it out; no programming or query language does that.
If that's your intention, you should construct the query string first in its entirety (in a variable), and then call EXECUTE with it to have it interpreted.
Have a look at these:
Postgres Dynamic Query Function
PostgreSQL - dynamic value as table name
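As a simplified illustration of building the whole string first, the sketch below assembles a UNION ALL query with format() and returns it as text. The function name build_union_sql and its parameters are hypothetical, the WHERE clause is reduced to a plain comparison rather than the NOT IN subquery from the question, and the column names abcd, colb and b are taken from the question without knowing the real schema.

```sql
-- Sketch: every piece of SQL text lives inside a string; values and
-- identifiers are spliced in with format() (%I identifier, %L literal).
CREATE OR REPLACE FUNCTION build_union_sql(p_tables text[], p_a int, p_b int[])
RETURNS text
LANGUAGE plpgsql
AS $$
DECLARE
    wrclause text := '';
    sqlq     text;
BEGIN
    IF p_b IS NOT NULL THEN
        wrclause := format(' AND b = ANY (%L::int[])', p_b);
    END IF;

    SELECT string_agg(
               format('SELECT *, abcd AS "D" FROM %I WHERE colb = %L%s',
                      u.t, p_a, wrclause),
               ' UNION ALL ' ORDER BY u.ord)
      INTO sqlq
      FROM unnest(p_tables) WITH ORDINALITY AS u(t, ord);

    RETURN sqlq;   -- the caller can inspect it or pass it to EXECUTE
END;
$$;

SELECT build_union_sql(ARRAY['t1', 't2'], 1, ARRAY[1, 2]);
```

Returning the text instead of executing it makes the generated statement easy to check with RAISE INFO or a plain SELECT before it is run.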
This piece contains the syntax error:
... IN (SELECT cola FROM tablet WHERE colb = || a ||) || ...
PostgreSQL can parse this, but it will then look for a unary prefix (and postfix) || operator, which does not exist by default (such operators can be created, but the error message shows that is not the case here).
Edit:
For example, these are valid (predefined) unary operators on numbers (the postfix ! operator only exists before PostgreSQL 14, which removed postfix operators):
SELECT |/ 25.0, -- prefix, square root, result: 5.0
5 !, -- postfix, factorial, result: 120
@ -5, -- prefix, absolute value, result: 5
@ -5 !; -- mixed, result: 120