increment sequence in PostgreSQL stored procedure - postgresql

How do I advance a sequence exactly once per run of a stored procedure, and how do I then use that value in the WHERE condition of an UPDATE statement?
I already assign the sequence's next value to a variable on each run, but I'm not able to use it in the WHERE condition.
CREATE OR REPLACE FUNCTION ops.mon_connect_easy()
RETURNS void
LANGUAGE plpgsql
AS $function$
declare
_inserted_rows bigint = 0;
run_seq_num bigint = 0;
begin
--assigning the sequence number to the variable
select nextval('ops.mon_connecteasy_seq') into run_seq_num;
-- use it to select the iteration_id. This is where I'm getting stuck:
update t_contract c
set end_date = ce.correct_end_date, status='Active',
orig_end_date =ce.correct_end_date
from ops.t_mon_ConnectEasy ce
where c.contract_id = ce.contract_id
and run_seq_num = ??;

nextval() advances the sequence automatically before returning the resulting value. You don't need anything extra. Just use the function in your query directly:
update t_contract c
set end_date = ce.correct_end_date
, status = 'Active'
, orig_end_date = ce.correct_end_date
from ops.t_mon_ConnectEasy ce
where c.contract_id = ce.contract_id
and iteration_id = nextval('ops.mon_connecteasy_seq');
Be aware that concurrent transactions might also advance the sequence, creating gaps in the sequential numbers.
And I have a nagging suspicion that this might not be the best way to achieve your undisclosed goals.
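If the same number has to be reused across several statements in one run, a minimal sketch of the variable-capture approach (assuming the sequence from the question exists, and that iteration_id lives on ops.t_mon_ConnectEasy; adjust to your schema):
create or replace function ops.mon_connect_easy()
returns void
language plpgsql
as $function$
declare
    run_seq_num bigint := nextval('ops.mon_connecteasy_seq');  -- advance exactly once per run
begin
    update t_contract c
    set end_date      = ce.correct_end_date
      , status        = 'Active'
      , orig_end_date = ce.correct_end_date
    from ops.t_mon_ConnectEasy ce
    where c.contract_id = ce.contract_id
    and ce.iteration_id = run_seq_num;  -- hypothetical match column
end;
$function$;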

Related

How to use dynamic regex to match value in Postgres

SUMMARY: I have two tables I want to derive info from: family_values (family_name, item_regex) and product_ids (product_id), in order to update the property family_name in a third.
The plan is to grab a JSON array from the small family_values table and use the column value item_regex to do a test match against the product_id for every row in product_ids.
MORE DETAILS: I'm importing static data from CSV into a table of orders. But in evaluating cost of goods and market value, I continually need to determine the family from a prefix match of item_regex (from family_values) against the product_id.
On the client this looks like this:
const families = {
FOOBAR: 'Big Ogre',
FOOBA: 'Wood Elf',
FOO: 'Valkyrie'
};
// And to find family, and subsequently COGs and Market Value:
const findFamily = product_id => Object.keys(families).find(f => new RegExp('^' + f).test(product_id));
This is a huge performance hit on the client, so I made a family_values table in PG with a representative row each: family_name, item_regex, cogs, market_value.
Then, product_ids holds a list of only the products the app cares about (out of millions). It is used with a BEFORE INSERT trigger to ignore any CSV entries that aren't in the product_ids view. After that, the product_ids view could arguably be taken out of the equation, because orders, once the read-only data is inserted, has its own matching product_id. It does NOT have family_name, though, so I still have the issue of determining that client-side.
PSEUDO CODE: update the family column of orders with family_name from the family_values match against orders.product_id.
OR update the product_ids table with a new family column and use that with the existing on-insert trigger (currently used to left-pad zeros and normalize data). Now I'm thinking this may be just an update as suggested, but I'm not very good with regex in PG. I'm a PG novice.
PROBLEM: I'm having a hang-up doing what I thought would be like a JS Array find operation. The family_values rows have been sorted on item_regex so that the strictest match is on top and is therefore found first.
For example, with sorting we have:
family_values_array = [
{"family_name": "Big Ogre", "item_regex": "FOOBAR"},
{"family_name": "Wood Elf", "item_regex": "FOOBA"},
{"family_name": "Valkyrie", "item_regex": "FOO"}]
So a product_id starting with FOOBA (but not FOOBAR) would yield the family "Wood Elf".
SOLUTION:
The solution I finally arrived at was simply using concat to build a front-anchored LIKE pattern. It was so simple in the end. The key line I was missing is:
select * into family_value_row from iol.family_values
where lvl3_id = product_row.lvl3_id
and product_row.product_id like concat(item_regex, '%')
order by length(item_regex) desc -- longest (strictest) prefix wins; LIMIT 1 alone is not deterministic
limit 1;
Whole function:
create or replace function iol.populate_families () returns void as $$
declare
product_row record;
family_value_row record;
begin
for product_row in
select product_id, lvl3_id from iol.products
loop
-- family_name is what we want after finding the BEST match for a product_id against item_regex
select * into family_value_row from iol.family_values
where lvl3_id = product_row.lvl3_id and product_row.product_id like concat(item_regex, '%')
order by length(item_regex) desc limit 1;
-- update family_name and value columns
update iol.products set
family_name = family_value_row.family_name,
cog_cents = family_value_row.cog_cents,
market_value_cents = family_value_row.market_value_cents
where product_id = product_row.product_id;
end loop;
end;
$$
LANGUAGE plpgsql;
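Since PostgreSQL guarantees no physical row order, this can also be done as a single set-based UPDATE; a sketch reusing the same tables, with DISTINCT ON picking the longest matching prefix per product:
update iol.products p
set family_name        = fv.family_name,
    cog_cents          = fv.cog_cents,
    market_value_cents = fv.market_value_cents
from (
    select distinct on (pr.product_id)
           pr.product_id, f.family_name, f.cog_cents, f.market_value_cents
    from iol.products pr
    join iol.family_values f
      on f.lvl3_id = pr.lvl3_id
     and pr.product_id like f.item_regex || '%'
    order by pr.product_id, length(f.item_regex) desc  -- strictest (longest) prefix first
) fv
where p.product_id = fv.product_id;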

How do I reduce the cost of set_bit in Postgres?

I am running PostgreSQL 9.6 and am running an experiment on the following table structure:
CREATE TABLE my_bit_varying_test (
id SERIAL PRIMARY KEY,
mr_bit_varying BIT VARYING
);
Just to understand how much performance I could expect if I were resetting bits on 100,000-bit data concurrently, I wrote a small PL/pgSQL block like this:
DO $$
DECLARE
t BIT VARYING(100001) := B'0'; -- 100,001 bits: the B'0' seed plus the 100,000 appended below
idd INT;
BEGIN
FOR I IN 1..100000
LOOP
IF I % 2 = 0 THEN
t := t || B'1';
ELSE
t := t || B'0';
END IF;
END LOOP;
INSERT INTO my_bit_varying_test (mr_bit_varying) VALUES (t) RETURNING id INTO idd;
UPDATE my_bit_varying_test SET mr_bit_varying = set_bit(mr_bit_varying, 100, 1) WHERE id = idd;
UPDATE my_bit_varying_test SET mr_bit_varying = set_bit(mr_bit_varying, 99, 1) WHERE id = idd;
UPDATE my_bit_varying_test SET mr_bit_varying = set_bit(mr_bit_varying, 34587, 1) WHERE id = idd;
UPDATE my_bit_varying_test SET mr_bit_varying = set_bit(mr_bit_varying, 1, 1) WHERE id = idd;
FOR I IN 1..100000
LOOP
IF I % 2 = 0 THEN
UPDATE my_bit_varying_test
SET mr_bit_varying = set_bit(mr_bit_varying, I, 1)
WHERE id = idd;
ELSE
UPDATE my_bit_varying_test
SET mr_bit_varying = set_bit(mr_bit_varying, I, 0)
WHERE id = idd;
END IF;
END LOOP;
END
$$;
When I run the PL/pgSQL though, it takes several minutes to complete, and I've narrowed it down to the for loop that is updating the table. Is it running slowly because of the compression on the BIT VARYING column? Is there any way to improve the performance?
Edit: This is a simulated, simplified example. What this is actually for: I have tens of thousands of jobs running, each of which needs to report back its status, which updates every few seconds.
Now, I could normalize it and have a "run status" table holding all the workers and their statuses, but that would involve storing tens of thousands of rows. So my thought is that I could use a bitmap to store each client's status, and the mask would tell me, in order, which ones had run and which had completed. The front bit would be used as an "error bit", since I don't need to know exactly which client failed, only that a failure exists.
So for example, you might have 5 workers for one job. If they all completed, the status would be "011111", indicating that all workers completed and none of them failed. If the last worker fails, the status would be "111110", indicating that there was an error and all workers completed except the last one.
So, you can see this as a contrived way of handling large numbers of job statuses. Of course I'm up for other ideas, but even if I go that route, for the future, I'd still like to know how to update a variable bit quickly, because well, I'm curious.
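As an aside, reading such a bitmap back is cheap; a hypothetical status query against the test table above (bit 0 as the error bit, one bit per worker after that):
SELECT get_bit(mr_bit_varying, 0) AS error_bit,     -- 1 if any worker failed
       get_bit(mr_bit_varying, 3) AS worker_3_done  -- status of an individual worker
FROM my_bit_varying_test
WHERE id = 1;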
If it is really the TOAST compression that is your problem, you can simply disable it for that column (SET STORAGE applies per column, not per table):
ALTER TABLE my_bit_varying_test ALTER COLUMN mr_bit_varying SET STORAGE EXTERNAL;
You can also replace the second loop with a single statement; one UPDATE is usually far faster than 100,000 separate UPDATEs, each of which writes a complete new row version. Note that the tempting UPDATE ... FROM generate_series(1, 100000) gs(i) join does not work here: when several joined rows match the same target row, PostgreSQL applies only one of them. Since the first loop already builds exactly the desired alternating pattern in t, you can simply write it once, inside the same DO block:
UPDATE my_bit_varying_test
SET mr_bit_varying = t
WHERE id = idd;
Also make sure there is an index on my_bit_varying_test (id); the PRIMARY KEY already provides one here.

How to avoid multiple insert in PostgreSQL

In my query I'm using a for loop. On every iteration, some values have to be inserted into a table at the end of the loop body. This is time consuming because the loop runs over many records. Is there another way to perform the insertion once, after the loop has finished?
for i in 1..10000 loop
    -- coding
    insert into datas.tb values (j, predictednode); -- j and predictednode are variables which change on every iteration
end loop;
Instead of inserting on every iteration, I want the insertion to happen once at the end.
If you show how the variables are calculated it could be possible to build something like this:
insert into datas.tb
select
calculate_j_here,
calculate_predicted_node_here
from generate_series(1, 10000)
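For instance, if the values were simple functions of the loop index (made-up expressions, since the real calculation isn't shown):
insert into datas.tb
select g.i, g.i % 7  -- stand-ins for j and predictednode
from generate_series(1, 10000) g(i);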
One possible solution is to build a large VALUES string. In Java, something like:
StringBuilder buf = new StringBuilder(100000); // big enough?
for (int i = 1; i <= 10000; ++i) {
    buf.append("(")
       .append(j)
       .append(",")
       .append(predicted_node)
       .append("),"); // whatever j and predicted_node are
}
buf.setCharAt(buf.length() - 1, ' '); // kill the last comma
String query = "INSERT INTO datas.tb VALUES " + buf.toString() + ";";
// send the query to the DB, just once
The fact that j and predicted_node appear to be constant has me a little worried, though. Why insert the same constant 10,000 times? (And if these values ever come from user input, a parameterized batch insert via PreparedStatement.addBatch() would be safer than string concatenation.)
Another approach is to do the predicting in a Postgres procedural language, and have the DB itself calculate the value on insert.
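Along those lines, one way to keep the computation in PL/pgSQL but still insert only once (a sketch; the calculations are stand-ins, and datas.tb is assumed to have two matching columns) is to collect the values into arrays and unnest them at the end:
do $$
declare
    js int[] := '{}';
    nodes int[] := '{}';
begin
    for i in 1..10000 loop
        -- collect the per-iteration values instead of inserting them one by one
        js := js || i;             -- stand-in for the computed j
        nodes := nodes || (i % 7); -- stand-in for predictednode
    end loop;
    insert into datas.tb
    select * from unnest(js, nodes);  -- one insert for all 10,000 rows
end
$$;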

How to pass object (table or column) names into a stored procedure

To get the specs out of the way: it is PostgreSQL 9.2, accessed through pgAdmin3, and the table was imported from a shapefile. If any additional details are necessary I will edit and provide them.
I essentially want to update integer values in a column of integer values based on whether string values in some other columns are null or not. I eventually want to port my stored function to run from a Java app via JDBC, but I first want to test it within pgAdmin3. I have very little experience with PostgreSQL and am a little fuzzy on the syntax.
So the parameters I've got somewhat of an idea of needing so far are: name, argmode, argname, argtype, column_name, and lang_name.
I understand that if I'm not returning anything I can either declare RETURNS void or simply not include a RETURN statement. I don't think I need result sets; I just want to replace integer values in a column. I'm not sure how to reference the table I need in order to pass in the columns I want to process.
Here is the code I have cobbled together thus far:
CREATE [OR REPLACE] FUNCTION handle_malformed([[VARIADIC][]
table_name.column_name, table_name.column_name, table_name.column_name, table_name.column_name, table_name.column_name, table_name.column_name, table_name.column_name, table_name.column_name])
BEGIN
LOOP
IF NAME_1 IS NULL THEN
IF LEVEL_DEPT != 0 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 0;
ELSE IF NAME_2 IS NULL THEN
IF LEVEL_DEPT != 1 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 1;
ELSE IF NAME_3 IS NULL THEN
IF LEVEL_DEPT != 2 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 2;
ELSE IF NAME_4 IS NULL THEN
IF LEVEL_DEPT != 3 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 3;
ELSE IF NAME_5 IS NULL THEN
IF LEVEL_DEPT != 4 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 4;
ELSE
IF LEVEL_DEPT != 5 THEN
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 5;
EXCEPTION
END LOOP;
END PROCEDURE;
$$ LANGUAGE plpgsql;
update handle_malformed(AdminBoundaries.NAME_5, AdminBoundaries.NAME_4, AdminBoundaries.NAME_3,
AdminBoundaries.NAME_2, AdminBoundaries.NAME_1, AdminBoundaries.NAME_0, AdminBoundaries.WIKI_URL, AdminBoundaries.LEVEL_DEPT)
The logic between LOOP and END LOOP is the logic I want to accomplish.
My specific question is, how do I use the column names from arguments in the SQL?
Based on the comments I have edited the question to include a question.
As I understand it, you want a value from an argument to be used as an identifier in the query. To do this, use EXECUTE and assemble the query as a string. Note that you cannot parameterize identifiers of this sort, so you must concatenate them in, using quote_ident() to prevent SQL injection inside the stored procedure.
So instead of:
UPDATE AdminBoundaries SET "LEVEL_DEPT" = 1;
use:
EXECUTE $e$ UPDATE AdminBoundaries SET $e$ || quote_ident(LEVEL_DEPT) || $e$ = 1 $e$;
You can use EXECUTE for dynamic checks as well; note that INTO follows the command string, and each literal piece needs its own || (a FROM clause is added here to make the fragment complete):
EXECUTE $e$ SELECT a.$e$ || quote_ident(level_dept) || $e$ IS NOT NULL FROM AdminBoundaries a $e$ INTO my_bool;
I think this answers your question based on the comments. If not, feel free to clarify.
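Putting it together, a minimal sketch of such a function (the function name and parameters are hypothetical; format() with %I for identifiers and %L for literals is an alternative to quote_ident() concatenation):
CREATE OR REPLACE FUNCTION set_level_dept(col_name text, new_value integer)
RETURNS void LANGUAGE plpgsql AS $func$
BEGIN
    -- %I escapes the identifier, guarding against SQL injection
    EXECUTE format('UPDATE AdminBoundaries SET %I = %L', col_name, new_value);
END;
$func$;
-- usage: SELECT set_level_dept('LEVEL_DEPT', 3);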

Trying to create aggregate function in PostgreSQL

I'm trying to create a new aggregate function in PostgreSQL to use instead of the sum() function.
I started my journey in the manual here.
Since I wanted to create a function that takes an array of double precision values, sums them, and then does some additional calculations, I first created that final function. It takes double precision as input and gives double precision as output:
CREATE FUNCTION eeincometax(tax double precision)
RETURNS double precision LANGUAGE plpgsql AS $$
DECLARE
v double precision;
BEGIN
IF tax > 256 THEN
v := 256;
ELSE
v := tax;
END IF;
RETURN v*0.21/0.79;
END;
$$;
Then I wanted to create the aggregate function that takes an array of double precision values and puts out a single double precision value for my previous function to handle.
CREATE AGGREGATE aggregate_ee_income_tax (float8[]) (
sfunc = array_agg
,stype = float8
,initcond = '{}'
,finalfunc = eeincometax);
What I get when I run that command is:
ERROR: function array_agg(double precision, double precision[]) does not exist
I'm somewhat stuck here, because the manual lists array_agg() as an existing function. What am I doing wrong?
Also, when I run:
\da
List of aggregate functions
Schema | Name | Result data type | Argument data types | Description
--------+------+------------------+---------------------+-------------
(0 rows)
Does my installation have no aggregate functions at all? Or does \da only list user-defined ones?
Basically, what I'm trying to understand is:
1) Can I use existing functions to sum up my array values?
2) How can I find out the input and output data types of functions? The docs claim that array_agg() takes any kind of input.
3) What is wrong with my own aggregate function?
Edit 1
To give more information and clearer picture of what I'm trying to achieve:
I have one huge query over several tables which goes something like this:
SELECT sum(tax) ... from (SUBQUERY) as foo group by id
I want to replace that sum function with my own aggregate function so I don't have to do additional calculations in the backend, since they can all be done at the database level.
Edit 2
Accepted Ants's answer. Since the final solution comes from the comments, I post it here for reference:
CREATE AGGREGATE aggregate_ee_income_tax (float8)
(
sfunc = float8pl
,stype = float8
,initcond = '0.0'
,finalfunc = eeincometax
);
array_agg is an aggregate function, not a regular function, so it can't be used as the state-transition function of a new aggregate. What you want is to create an aggregate whose state-transition function is identical to array_agg's, with a custom final function.
Unfortunately, the state-transition function of array_agg is defined in terms of an internal datatype, so it can't be reused. Fortunately, there is an existing function in core that already does what you want:
CREATE AGGREGATE aggregate_ee_income_tax (float8)(
sfunc = array_append,
stype = float8[],
initcond = '{}',
finalfunc = eeincometax);
Also note that you had your types mixed up: you probably want to aggregate a set of floats into an array, not a set of arrays into a float.
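With that float8[] state type, the final function would itself have to accept the array and do the summing, for example (a sketch following the same tax logic as above):
CREATE FUNCTION eeincometax(float8[])
RETURNS float8 LANGUAGE sql AS
$func$
-- sum the collected values, cap at 256, then apply the tax factor
SELECT least(sum(v), 256) * 0.21 / 0.79
FROM unnest($1) AS t(v)
$func$;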
In addition to @Ants' excellent advice:
1.) Your final function could be simplified to:
CREATE FUNCTION eeincometax(float8)
RETURNS float8 LANGUAGE SQL AS
$func$
SELECT (least($1, 256) * 21) / 79
$func$;
2.) It seems like you are dealing with money? In this case I would strongly advise to use the type numeric (preferred) or money for the purpose. Floating point operations are often not precise enough.
3.) The initial condition of the aggregate can simply be just 0:
CREATE AGGREGATE aggregate_ee_income_tax(float8)
(
sfunc = float8pl
,stype = float8
,initcond = 0
,finalfunc = eeincometax
);
4.) In your case (least(sum(tax), 256) * 21) / 79 is probably faster than your custom aggregate. Aggregate functions provided by PostgreSQL are written in C and optimized for performance. I would use that instead.
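For completeness, a quick usage sketch of the final aggregate with made-up numbers:
-- the two rows for id 1 sum to 400, which is capped at 256: 256 * 0.21 / 0.79 ≈ 68.05
SELECT id, aggregate_ee_income_tax(tax) AS income_tax
FROM (VALUES (1, 100.0::float8), (1, 300.0)) AS t(id, tax)
GROUP BY id;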