Array of array in Postgres - postgresql

In a Postgres DB I have a column named field, defined like this:
CREATE TABLE t (
id SERIAL PRIMARY KEY,
field character varying(255)[] DEFAULT ARRAY[]::character varying[]
);
There I store values like:
ID FIELD
1 {{lower,0},{greater,10}}
2 {{something_else,7},{lower,5}}
1 - How can I select the lower/greater value? I'd like a query response like this:
ID LOWER
1 0
2 5
2 - How can I filter by those lower/greater values?
Thanks!

It's pretty awkward to do but this accomplishes it. I use PG 9.3 so I don't know if there are better ways to do this in later versions.
SELECT id, (SELECT field[ss][2] FROM generate_subscripts(field, 1) ss WHERE field[ss][1] = 'lower') AS lower
FROM t;
Basically, for each record, generate the subscripts to use as indexes into the main array to access the subarrays. For each, look for an array where the first item is 'lower'. If found, return the value of the second item.
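The second part of the question (filtering on those values) can reuse the same subscript lookup inside an EXISTS. This is only a sketch: it assumes the second item of every pair is a valid integer string, and the threshold 3 is an arbitrary example value.

```sql
-- Return the rows whose 'lower' entry is below a threshold.
-- Assumption: field[ss][2] always holds an integer string, so the cast is safe.
SELECT id
FROM t
WHERE EXISTS (
    SELECT 1
    FROM generate_subscripts(field, 1) AS ss
    WHERE field[ss][1] = 'lower'
      AND field[ss][2]::integer < 3
);
```

With the sample rows from the question, only id 1 (lower = 0) satisfies the condition.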

Related

PostgreSQL array of data composite update element using where condition

I have a composite type:
CREATE TYPE mydata_t AS
(
user_id integer,
value character(4)
);
I also have a table that uses this composite type as an array of mydata_t:
CREATE TABLE tbl
(
id serial NOT NULL,
data_list mydata_t[],
PRIMARY KEY (id)
);
Here I want to update the mydata_t element in data_list whose user_id is 100000.
But I don't know which array element has that user_id.
So I first have to search for the element whose user_id is 100000, and I don't know how to write that query. In short, I want to update the value of the array element whose user_id is 100000, in the row of tbl whose id is, for example, 1. What should my query be?
Something like this (I know it's wrong!):
UPDATE "tbl" SET "data_list"[i]."value"='YYYY'
WHERE "id"=1 AND EXISTS (SELECT ROW_NUMBER() OVER() AS i
FROM unnest("data_list") "d" WHERE "d"."user_id"=100000 LIMIT 1)
For example, this is my tbl data:
Row1 => id = 1, data = ARRAY[ROW(5,'YYYY'),ROW(6,'YYYY')]
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'YYYY')]
Now I want to update tbl where id is 2 and set the value of the tbl.data element whose user_id is 11 to 'XXXX'.
In fact, the final result of Row2 will be this:
Row2 => id = 2, data = ARRAY[ROW(10,'YYYY'),ROW(11,'XXXX')]
If you know the element's current value, you can use the array_replace() function to make the change:
UPDATE tbl
SET data_list = array_replace(data_list, (11, 'YYYY')::mydata_t, (11, 'XXXX')::mydata_t)
WHERE id = 2
If you do not know the current value, the situation becomes more complex:
UPDATE tbl SET data_list = data_arr
FROM (
-- UPDATE doesn't allow aggregate functions so aggregate here
SELECT array_agg(new_data) AS data_arr
FROM (
-- For the id value, get the data_list values that are NOT modified
SELECT (user_id, value)::mydata_t AS new_data
FROM tbl, unnest(data_list)
WHERE id = 2 AND user_id != 11
UNION ALL
-- Add the updated value (UNION ALL, unlike UNION, keeps duplicate elements)
VALUES ((11, 'XXXX')::mydata_t)
) x
) y
WHERE id = 2
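On PostgreSQL 9.4 or later, a variant that keeps the elements in their original order is possible with unnest ... WITH ORDINALITY. The following is a sketch under that version assumption, not a tested drop-in; the aliases t, d and ord are illustrative names.

```sql
-- Rebuild the array in place, swapping only the element with user_id = 11.
-- Assumption: PostgreSQL 9.4+ (WITH ORDINALITY is not available in 9.3).
UPDATE tbl AS t
SET data_list = (
    SELECT array_agg(
               CASE WHEN d.user_id = 11
                    THEN ROW(d.user_id, 'XXXX')::mydata_t
                    ELSE ROW(d.user_id, d.value)::mydata_t
               END
               ORDER BY d.ord)          -- preserve the original element order
    FROM unnest(t.data_list) WITH ORDINALITY AS d(user_id, value, ord)
)
WHERE t.id = 2
  -- Only touch the row if the element actually exists
  AND EXISTS (SELECT 1 FROM unnest(t.data_list) AS e WHERE e.user_id = 11);
```

The EXISTS guard also skips rows with an empty array, for which array_agg would otherwise return NULL.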
You should keep in mind, though, that a lot of work goes on in the background that cannot be optimised. The array of mydata_t values has to be examined from start to finish, and you cannot use an index for this. Furthermore, an update actually writes a new row version to the underlying file on disk, and if your array has more than a few entries this involves substantial work. It gets even more problematic when your arrays are larger than the page size of your PostgreSQL server, typically 8kB. All of this happens behind the scenes, so it will work, but at a performance penalty.
Even though array_replace sounds as if the changes are made in place (and in memory they indeed are), the UPDATE command writes a completely new tuple to disk. So if you have 4,000 array elements, at least 40kB of data has to be read (8 bytes for the mydata_t type on a typical system x 4,000 = 32kB in a TOAST file, plus the main page of the table, 8kB) and then written back to disk after the update. A real performance killer.
As @klin pointed out, this design may be more trouble than it is worth. Should you make data_list a separate table (as I would do), the update query becomes:
UPDATE data_list SET value = 'XXXX'
WHERE id = 2 AND user_id = 11
This will have MUCH better performance, especially if you add the appropriate indexes. You could then still create a view to publish the data in an aggregated form with a custom type if your business logic so requires.

Delete a value from jsonb array data having no key in postgresql

Table structure is:
CREATE TABLE mine_check.meta
(
sl_no bigserial NOT NULL,
tags jsonb NOT NULL DEFAULT '[]'::jsonb
);
Table looks like
sl.no tags
1 [120,450]
2 [120]
3 [450,980,120]
4 [650]
I need to delete 120 from the tags column, whose array elements have no key.
The examples I found all rely on a key to update or delete.
How should I proceed?
I am afraid that it has to be done the hard way - unnest the JSONB array, select and filter from it and aggregate back into a JSONB array.
select sl_no,
(
select jsonb_agg(e::integer)
from jsonb_array_elements_text(tags) e
where e <> 120::text
) tags
from mine_check.meta;
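To actually remove the value rather than just select the filtered array, the same unnest-and-aggregate idea can be wrapped in an UPDATE. This is a sketch assuming PostgreSQL 9.5 or later (for jsonb_agg); the COALESCE is there because jsonb_agg returns NULL when no elements remain, which would violate the NOT NULL constraint on tags.

```sql
-- Remove the element 120 from every tags array that contains it.
UPDATE mine_check.meta
SET tags = COALESCE(
        (SELECT jsonb_agg(e)
         FROM jsonb_array_elements(tags) AS e
         WHERE e <> '120'::jsonb),
        '[]'::jsonb)                  -- array became empty: store [] instead of NULL
WHERE tags @> '120'::jsonb;           -- only rewrite rows that contain 120
```

With the sample data, row 2 ([120]) ends up as [] and row 4 ([650]) is not touched at all.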

Select rows in postgres table where an array field contains NULL

Our system uses postgres for its database.
We have queries that can select rows from a database table where an array field in the table contains a specific value, e.g.:
Find which employee manages the employee with ID 123.
staff_managed_ids is a postgres array field containing an array of the employees that THIS employee manages.
This query works as expected:
select *
from employees
where 123=any(staff_managed_ids)
We now need to query where an array field contains a postgres NULL. We tried the following query, but it doesn't work:
select *
from employees
where NULL=any(staff_managed_ids)
We know the staff_managed_ids array field contains NULLs from other queries.
Are we using NULL wrongly?
NULL cannot be compared using =: such a comparison yields NULL rather than true or false. To test for NULL, use IS NULL or IS NOT NULL.
To check for nulls, you need to unnest the elements:
select e.*
from employees e
where exists (select *
from unnest(e.staff_managed_ids) as x(staff_id)
where x.staff_id is null);
If all your id values are positive, you could write something like this:
select *
from employees
where (-1 < all(staff_managed_ids)) is null;
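This works because -1 should be less than every value in the array; however, a comparison with NULL makes the whole array comparison evaluate to NULL. Note that the expression is also NULL when the array itself is NULL, so those rows are returned as well.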
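On PostgreSQL 9.5 or later, array_position offers another way to express the same check, because it compares elements with IS NOT DISTINCT FROM semantics and can therefore search for NULL. A minimal sketch, assuming staff_managed_ids is a one-dimensional array:

```sql
-- array_position returns the subscript of the first NULL element,
-- or NULL if the array has no NULL element (or is itself NULL).
SELECT *
FROM employees
WHERE array_position(staff_managed_ids, NULL) IS NOT NULL;
```

Unlike the -1 < all(...) trick, this does not depend on the ids being positive and does not match rows whose array is NULL as a whole.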

Use sum function in calculated column

Is it possible to use a sum function in a calculated column?
If yes, I would like to create a calculated column that calculates the sum of a column in the same table over the rows whose date is earlier than the date of this entry. Is this possible?
And last, would this optimize repeated calls on this value over the exemplified view below?
SELECT ProductGroup, SalesDate, (
SELECT SUM(Sales)
FROM SomeList
WHERE (ProductGroup= KVU.ProductGroup) AND (SalesDate<= KVU.SalesDate)) AS cumulated
FROM SomeList AS KVU
Is it possible to use a sum function in a calculated column?
Yes, it's possible using a scalar-valued function (scalar UDF) for your computed column, but this would be a disaster. Using scalar UDFs for computed columns destroys performance. Adding a scalar UDF that accesses data (which would be required here) makes things even worse.
It sounds to me like you just need a good ol' fashioned index to speed things up. First some sample data:
IF OBJECT_ID('dbo.somelist','U') IS NOT NULL DROP TABLE dbo.somelist;
GO
CREATE TABLE dbo.somelist
(
ProductGroup INT NOT NULL,
[Month] TINYINT NOT NULL CHECK ([Month] <= 12),
Sales DECIMAL(10,2) NOT NULL
);
INSERT dbo.somelist
VALUES (1,1,22),(2,1,45),(2,1,25),(2,1,19),(1,2,100),(1,2,200),(2,2,50.55);
and the correct index:
CREATE NONCLUSTERED INDEX nc_somelist ON dbo.somelist(ProductGroup,[Month])
INCLUDE (Sales);
With this index in place this query would be extremely efficient:
SELECT s.ProductGroup, s.[Month], SUM(s.Sales)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
If you needed to get a COUNT by month & product group you could create an indexed view like so:
CREATE VIEW dbo.vw_somelist WITH SCHEMABINDING AS
SELECT s.ProductGroup, s.[Month], TotalSales = COUNT_BIG(*)
FROM dbo.somelist AS s
GROUP BY s.ProductGroup, s.[Month];
GO
CREATE UNIQUE CLUSTERED INDEX uq_cl__vw_somelist ON dbo.vw_somelist(ProductGroup, [Month]);
Once that indexed view is in place your COUNTs are pre-aggregated. An indexed view can also contain SUM, provided the summed expression is not nullable and the view also includes COUNT_BIG(*).
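For the running total in the question's view itself, a window function is an alternative to the correlated subquery. The sketch below assumes SQL Server 2012 or later and uses the table and column names from the question (SomeList, ProductGroup, SalesDate, Sales).

```sql
-- Cumulative sales per product group, up to and including each SalesDate.
-- RANGE (rather than ROWS) includes rows with the same SalesDate,
-- which matches the SalesDate <= KVU.SalesDate condition in the question.
SELECT ProductGroup,
       SalesDate,
       SUM(Sales) OVER (
           PARTITION BY ProductGroup
           ORDER BY SalesDate
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulated
FROM SomeList;
```

An index on (ProductGroup, SalesDate) that includes Sales, similar to the one shown above, should let this query avoid a separate sort.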

postgresql : search records based on array field value with multiple values

I have a table that has an array field.
CREATE TABLE notifications
(
id integer NOT NULL DEFAULT nextval('notifications_id_seq'::regclass),
title character(100) COLLATE pg_catalog."default" NOT NULL,
tags text[] COLLATE pg_catalog."default",
CONSTRAINT notifications_pkey PRIMARY KEY (id)
)
and the tags field can hold multiple values from
["a","b","c","d"]
Now I want all the records whose tags contain a or d ("a","d").
I can use the PostgreSQL IN operator, but that only searches for a single value. How can I achieve this?
You could use ANY:
SELECT *
FROM notifications
WHERE 'a' = ANY(tags) OR 'b' = ANY(tags);
If the values 'a' and 'b' are static (you only need to check for those 2 values in every query), then you can go with the solution that Lukasz Szozda provided.
But if the values you want to check for are dynamic and differ between queries (sometimes {'a','b'}, sometimes {'b','f','m'}), you can build the intersection of the two arrays and check whether it is non-empty.
For example:
If we have the following table and data:
CREATE TABLE test_table_1(description TEXT, tags TEXT[]);
INSERT INTO test_table_1(description, tags) VALUES
('desc1', array['a','b','c']),
('desc2', array['c','d','e']);
If we want to get all of the rows from test_table_1 that have one of the following tags b, f, or m, we could do it with the following query:
SELECT * FROM test_table_1 tt1
WHERE array_length((SELECT array
(
SELECT UNNEST(tt1.tags)
INTERSECT
SELECT UNNEST(array['b','f','m'])
)), 1) > 0;
In the query above we use array_length to check if the intersection is empty.
Writing the query this way is also useful if you want to add a constraint on the number of matched tags.
For example, if you want all of the rows that have at least 2 tags from the group {'a','b','c'}, you just need to change the condition to array_length(...) > 1.
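When only "at least one tag in common" is needed, the array overlap operator && expresses the same test more directly. A minimal sketch using the test_table_1 data above; the index name is a made-up example.

```sql
-- Rows whose tags share at least one element with the given list.
SELECT *
FROM test_table_1
WHERE tags && ARRAY['b','f','m'];

-- Optional: a GIN index lets PostgreSQL use the index for && lookups.
CREATE INDEX test_table_1_tags_idx ON test_table_1 USING gin (tags);
```

With the sample rows, only desc1 is returned, since it is the only row containing 'b', 'f' or 'm'.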