finding a given item value in an array of a given column - postgresql

I have a table like this:
mTable:
|id | text[] mVals |
|1 | {"a,b"} |
|2 | {"a"} |
|3 | {"b"} |
I'd like a query that returns both rows {a,b} and {b} if I specify only b. However, my attempts don't return the rows having at least one of the specified values; they only return rows whose array is exactly the value specified.
I have tried this:
SELECT mVals
FROM mTable
WHERE ARRAY['a'] && columnarray; -- Returns only {'a'}
Also tried:
SELECT mVals
FROM mTable
WHERE mVals && '{"a"}'; -- Returns only {'a'}
Nothing seems to be working as it should. What could be the issue?

To me it looks like it is working as expected. Recreating your case with
create table test(id int, mvals text[]);
insert into test values(1, '{a}');
insert into test values(2, '{a,b}');
insert into test values(3, '{b}');
A query similar to the 1st one you posted works
SELECT mVals
FROM test
WHERE ARRAY['a'] && mvals;
Results
mvals
-------
{a}
{a,b}
(2 rows)
and with b instead of a
mvals
-------
{a,b}
{b}
(2 rows)
P.S.: you should probably use the contains operator @> to check whether a value (or an array) is contained in another array.
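As a rough sketch of the P.S. above, assuming the test table created in this answer, the contains check could look like the following. The last query is only a guess at what might be going on in the question: if the original row really was stored as {"a,b"}, the double quotes make it a single element rather than two.

```sql
-- Rows whose array contains the element 'b' (contains operator).
SELECT mvals
FROM test
WHERE mvals @> ARRAY['b'];

-- Equivalent check for a single value.
SELECT mvals
FROM test
WHERE 'b' = ANY (mvals);

-- A quoted literal such as '{"a,b"}' is ONE element ("a,b"),
-- whereas '{a,b}' is two elements.
SELECT array_length('{"a,b"}'::text[], 1) AS quoted_len,   -- 1
       array_length('{a,b}'::text[], 1)   AS unquoted_len; -- 2
```

With the three sample rows, both of the first two queries should return {a,b} and {b}.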

How to filter NULLs based on whether a column value has only one record or not, T-SQL

I have a dataset not too dissimilar from this:
| RECORD_VALUE | RECORD_ATTRIBUTE |
| :--- | :--- |
| ABC | NULL |
| DEF | 123 |
| DEF | 456 |
| GHI | NULL |
| GHI | 789 |
From that data, I would like to filter such that the row for record "ABC" is kept, but the row with a "NULL" value in Col_B (RECORD_ATTRIBUTE above) for record "GHI" is removed. Basically, for records that do have a value other than "NULL" in Col_B, I only want the row(s) with values. But for records that only have an associated "NULL" value in Col_B, I want to keep the entire record.
I would appreciate any ideas! Thanks!
There are a couple of ways to skin this cat.
One method is to read the data first to find the Col_A values that have a NOT NULL Col_B, then join back to the table to get the relevant rows.
However, the approach below involves only one read of the data, and then processes that result (which saves doing 2 table reads).
WITH A AS
(SELECT Col_A,
Col_B,
MAX(Col_B) OVER (PARTITION BY Col_A) AS Max_ColB
FROM yourtable
)
SELECT A.Col_A, A.Col_B
FROM A
WHERE (Max_ColB IS NULL)
OR (Max_ColB IS NOT NULL AND Col_B IS NOT NULL);
The above finds the maximum value of Col_B for each value in Col_A, which will be NULL if all the values of Col_B are NULL, else will be an actual value.
Then the main part sorts through those: it reports all rows when Max_ColB is NULL, and only the rows where Col_B is NOT NULL if Max_ColB has a value.
Here is a db<>fiddle with your example data and results. It also includes the CTE component broken out (the WITH... part) showing its results.
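For comparison, the two-read method mentioned at the start of this answer might be sketched as follows. It uses the same placeholder names (yourtable, Col_A, Col_B) as the query above, and the aliases t and o are only illustrative.

```sql
-- Keep every row that has a value in Col_B, plus the NULL rows of
-- groups that have no non-NULL Col_B at all.
SELECT t.Col_A, t.Col_B
FROM yourtable AS t
WHERE t.Col_B IS NOT NULL
   OR NOT EXISTS (
        SELECT 1
        FROM yourtable AS o
        WHERE o.Col_A = t.Col_A
          AND o.Col_B IS NOT NULL
      );
```

This reads the table twice (once in the outer query, once in the subquery), which is the cost the window-function version avoids.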

Create table with for loop postgresql

I have a function test_func() that takes in 1 argument (let's say the argument name is X) and returns a table. Now, I have a list of inputs (from a subquery) that I want to pass into argument X and collect all the results of the calls in a table.
In Python, I would do something like
def collect_results():
    # create empty list
    all_results = []
    for argument in (1, 2, 3):
        result = test_func(argument)
        # Collect the result
        all_results.append(result)
    return all_results
How can I do the same thing in postgresql?
Thank you.
For the sake of example, my test_func(X) takes in 1 argument and spits out a table with 3 columns. The value of col1 is X, col2 is X+1, and col3 is X+2. For example:
select * from test_func(1)
gives
|col1|col2|col3|
----------------
| 1 | 2 | 3 |
----------------
My list of arguments would be results of a subquery, for example:
select * from (values (1), (2)) x
I expect something like:
|col1|col2|col3|
----------------
| 1 | 2 | 3 |
----------------
| 2 | 3 | 4 |
----------------
demo:db<>fiddle
This gives you a result list of all results:
SELECT
mt.my_data as input,
tf.*
FROM
(SELECT * FROM my_table) mt, -- input data from a subquery
test_func(my_data) tf -- calling every data set as argument
In the fiddle the test_func() gets an integer and generates rows (input argument = generated row count). Furthermore, it adds a text column. For all inputs all generated records are unioned into one result set.
You can join your function to the input values:
select f.*
from (
values (1), (2)
) as x(id)
cross join lateral test_func(x.id) as f;
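To try the lateral join end to end, here is a minimal sketch. The function body is an assumption based on the description in the question (col1 = X, col2 = X+1, col3 = X+2), not the asker's real function, and the alias args is made up for illustration.

```sql
-- Hypothetical stand-in for the asker's function.
CREATE FUNCTION test_func(x integer)
RETURNS TABLE (col1 integer, col2 integer, col3 integer)
LANGUAGE sql
AS $$
    SELECT x, x + 1, x + 2;
$$;

-- Call the function once per input row and collect all results.
SELECT f.*
FROM (VALUES (1), (2)) AS args(id)
CROSS JOIN LATERAL test_func(args.id) AS f
ORDER BY f.col1;
```

With this stand-in function the result should be the two rows (1, 2, 3) and (2, 3, 4), matching the expected output in the question.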

PostgreSQL JSONB grouping array values inside a hash

We have a PostgreSQL jsonb column containing hashes which in turn contain arrays of values:
id | hashes
---------------
1 | {"sources"=>["a","b","c"], "ids"=>[1,2,3]}
2 | {"sources"=>["b","c","d","e","e"], "ids"=>[1,2,3]}
What we'd like to do is create a jsonb query which would return
code | count
---------------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
we've been trying something along the lines of
SELECT jsonb_to_recordset(hashes->>'sources')
but that's not working - any help with this hugely appreciated...
The setup (should be a part of the question, note the proper json syntax):
create table a_table (id int, hashes jsonb);
insert into a_table values
(1, '{"sources":["a","b","c"], "ids":[1,2,3]}'),
(2, '{"sources":["b","c","d","e","e"], "ids":[1,2,3]}');
Use the function jsonb_array_elements():
select code, count(code)
from
a_table,
jsonb_array_elements(hashes->'sources') sources(code)
group by 1
order by 1;
code | count
------+-------
"a" | 1
"b" | 2
"c" | 2
"d" | 1
"e" | 2
(5 rows)
SELECT h, count(*)
FROM (
SELECT jsonb_array_elements_text(hashes->'sources') AS h FROM mytable
) sub
GROUP BY h
ORDER BY h;
We finally got this working this way:
SELECT jsonb_array_elements_text(hashes->'sources') as s1,
count(jsonb_array_elements_text(hashes->'sources'))
FROM a_table
GROUP BY s1;
but Klin's solution is more complete and both Klin and Patrick got there quicker than us (thank you both) - so points go to them.
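As a side note on why the original attempt failed, here is a small sketch using the a_table setup from the first answer. The ->> operator returns text, while the jsonb set-returning functions need a jsonb value (which -> returns). Also, jsonb_to_recordset expects an array of objects plus a column definition list, so it does not fit an array of plain strings.

```sql
-- -> keeps the value as jsonb, ->> converts it to text.
SELECT pg_typeof(hashes ->  'sources') AS arrow_type,        -- jsonb
       pg_typeof(hashes ->> 'sources') AS double_arrow_type  -- text
FROM a_table
LIMIT 1;

-- Counting with the _text variant, so codes come back unquoted.
SELECT s.code, count(*) AS n
FROM a_table
CROSS JOIN LATERAL jsonb_array_elements_text(hashes -> 'sources') AS s(code)
GROUP BY s.code
ORDER BY s.code;
```

The column aliases and the name n are illustrative only.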

Complex TSQL MultiRow Insert with OutPut

I have a temp table as follows
DECLARE @InsertedRows TABLE (RevId INT, FooId INT)
I also have two other tables
Foo(FooId INT, MyData NVarchar(20))
Revisions(RevId INT, CreatedTimeStamp DATETIME)
For each row in Foo, I need to a) insert a row into Revisions and b) insert a row into @InsertedRows with the corresponding Id values from Foo and Revisions.
I've tried writing something using the Insert Output Select as follows:
INSERT INTO Revisions (CURRENT_TIMESTAMP)
OUTPUT Inserted.RevId, Foo.FooId INTO @InsertedRows
SELECT FooId From Foo
However, Foo.FooId is not allowed in the Output column list. Also, the Id returned in the SELECT isn't inserted into the table, so that's another issue.
How can I resolve this?
You cannot reference the FROM table in an OUTPUT clause with an INSERT statement. You can only do this with a DELETE, UPDATE, or MERGE statement.
From the MSDN page on the OUTPUT clause (https://msdn.microsoft.com/en-us/library/ms177564.aspx)
from_table_name Is a column prefix that specifies a table included in
the FROM clause of a DELETE, UPDATE, or MERGE statement that is used
to specify the rows to update or delete.
You can use a MERGE statement to accomplish what you are asking.
In the example below, I changed all the tables to table variables so that this can be run as an independent query, and I changed the ID columns to IDENTITY columns that increment differently to illustrate the relationship.
The ON clause (1=0) will always evaluate to NOT MATCHED. This means that all records from the USING clause will be inserted into the target table. Additionally, the source table in the USING clause is available to use in the OUTPUT clause.
DECLARE @Foo TABLE (FooId INT IDENTITY(1,1), MyData NVarchar(20))
DECLARE @Revisions TABLE (RevId INT IDENTITY(100,10), CreatedTimeStamp DATETIME)
DECLARE @InsertedRows TABLE (RevId INT, FooId INT)
INSERT INTO @Foo VALUES ('FooData1'), ('FooData2'), ('FooData3')
MERGE @Revisions AS [Revisions]
USING (SELECT FooId FROM @Foo) AS [Foo]
ON (1=0)
WHEN NOT MATCHED THEN
INSERT (CreatedTimeStamp) VALUES (CURRENT_TIMESTAMP)
OUTPUT INSERTED.RevId, Foo.FooId INTO @InsertedRows;
SELECT * FROM @Foo
SELECT * FROM @Revisions
SELECT * FROM @InsertedRows
Table results from above query
@Foo table
+-------+----------+
| FooId | MyData |
+-------+----------+
| 1 | FooData1 |
| 2 | FooData2 |
| 3 | FooData3 |
+-------+----------+
@Revisions table
+-------+-------------------------+
| RevId | CreatedTimeStamp |
+-------+-------------------------+
| 100 | 2016-03-31 14:48:39.733 |
| 110 | 2016-03-31 14:48:39.733 |
| 120 | 2016-03-31 14:48:39.733 |
+-------+-------------------------+
@InsertedRows table
+-------+-------+
| RevId | FooId |
+-------+-------+
| 100 | 1 |
| 110 | 2 |
| 120 | 3 |
+-------+-------+

Adding the results of two select queries into one table row with PostgreSQL

I am attempting to return the result of two distinct select statements into one row in PostgreSQL. For example, I have two queries each that return the same number of rows:
Select tableid1, tableid2, tableid3 from table1
+----------+----------+----------+
| tableid1 | tableid2 | tableid3 |
+----------+----------+----------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+----------+----------+----------+
Select table2id1, table2id2, table2id3, table2id4 from table2
+-----------+-----------+-----------+-----------+
| table2id1 | table2id2 | table2id3 | table2id4 |
+-----------+-----------+-----------+-----------+
| 7 | 8 | 9 | 15 |
| 10 | 11 | 12 | 19 |
+-----------+-----------+-----------+-----------+
Now I want to concatenate these tables keeping the same number of rows. I do not want to join on any values. The desired result would look like the following:
+----------+----------+----------+-----------+-----------+-----------+-----------+
| tableid1 | tableid2 | tableid3 | table2id1 | table2id2 | table2id3 | table2id4 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
| 1 | 2 | 3 | 7 | 8 | 9 | 15 |
| 4 | 5 | 6 | 10 | 11 | 12 | 19 |
+----------+----------+----------+-----------+-----------+-----------+-----------+
What can I do to the two queries above (select * from table1) and (select * from table2) to return the desired result?
Thanks!
You can use row_number() for the join, but I'm not sure you have any guarantee that the order of the rows will stay the same as in the tables. So it's better to add an ORDER BY inside the over() clause.
with cte1 as (
select
tableid1, tableid2, tableid3, row_number() over() as rn
from table1
), cte2 as (
select
table2id1, table2id2, table2id3, table2id4, row_number() over() as rn
from table2
)
select *
from cte1 as c1
inner join cte2 as c2 on c2.rn = c1.rn
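A variant of the query above with an explicit order in each over() clause might look like the following. Ordering by the first column of each table is purely an assumption for illustration. Which column really defines the pairing depends on your data.

```sql
WITH cte1 AS (
    SELECT tableid1, tableid2, tableid3,
           row_number() OVER (ORDER BY tableid1) AS rn   -- assumed ordering column
    FROM table1
), cte2 AS (
    SELECT table2id1, table2id2, table2id3, table2id4,
           row_number() OVER (ORDER BY table2id1) AS rn  -- assumed ordering column
    FROM table2
)
SELECT c1.tableid1, c1.tableid2, c1.tableid3,
       c2.table2id1, c2.table2id2, c2.table2id3, c2.table2id4
FROM cte1 AS c1
INNER JOIN cte2 AS c2 ON c2.rn = c1.rn
ORDER BY c1.rn;
```

Listing the columns explicitly also keeps the helper column rn out of the result.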
You can't have what you want, as you wrote the question. Your two SELECTs don't have any ORDER BY clause, so the database can return the rows in whatever order it feels like. If it currently matches up, it does so only by accident, and will stop matching up as soon as you UPDATE a row.
You need a key column. Then you need to join on the key column. Anything else is attempting to invent unreliable and unsafe joins without actually using a join.
Frankly, this seems like a pretty dodgy schema. Lots of numbered integer columns like this, and the desire to concatenate them, may be a sign you should be looking at using integer arrays, or using a side-table with a foreign key relationship, instead.
Sample data in case anyone else wants to play:
CREATE TABLE table1(tableid1 integer, tableid2 integer, tableid3 integer);
INSERT INTO table1 VALUES (1,2,3), (4,5,6);
CREATE TABLE table2(table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer);
INSERT INTO table2 VALUES (7,8,9,15), (10,11,12,19);
Depending on what you're actually doing you might really have wanted arrays.
I think you might need to read these two posts:
Join 2 sets based on default order
How keep data don't sort?
which explain that SQL tables just don't have an order. So you cannot fetch them in a particular order.
DO NOT USE THE FOLLOWING CODE, IT IS DANGEROUS AND ONLY INCLUDED AS A PROOF OF CONCEPT:
As it happens you can use a set-returning function hack to very inefficiently do what you want. It's incredibly ugly and completely unsafe without an ORDER BY in the SELECTs, but I'll include it for completeness. I guess.
CREATE OR REPLACE FUNCTION t1() RETURNS SETOF table1 AS $$ SELECT * FROM table1 $$ LANGUAGE sql;
CREATE OR REPLACE FUNCTION t2() RETURNS SETOF table2 AS $$ SELECT * FROM table2 $$ LANGUAGE sql;
SELECT (t1()).*, (t2()).*;
If you use this in any real code then kittens will cry. It'll produce insane and bizarre results if the number of rows in the tables differ and it'll produce the rows in orderings that might seem right at first, but will randomly start coming out wrong later on.
THE SANE WAY is to add a primary key properly, then do a join.
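A minimal sketch of that approach, using hypothetical keyed copies of the sample tables (the names table1_keyed, table2_keyed and the id column are made up for illustration):

```sql
-- Hypothetical variants of the sample tables with an explicit key.
CREATE TABLE table1_keyed (
    id       integer PRIMARY KEY,
    tableid1 integer, tableid2 integer, tableid3 integer
);
CREATE TABLE table2_keyed (
    id        integer PRIMARY KEY REFERENCES table1_keyed (id),
    table2id1 integer, table2id2 integer, table2id3 integer, table2id4 integer
);

INSERT INTO table1_keyed VALUES (1, 1, 2, 3), (2, 4, 5, 6);
INSERT INTO table2_keyed VALUES (1, 7, 8, 9, 15), (2, 10, 11, 12, 19);

-- The pairing is now defined by the key, not by physical row order.
SELECT t1.tableid1, t1.tableid2, t1.tableid3,
       t2.table2id1, t2.table2id2, t2.table2id3, t2.table2id4
FROM table1_keyed AS t1
JOIN table2_keyed AS t2 USING (id)
ORDER BY id;
```

Because the rows are matched on id, the result stays stable after updates, which the orderless approaches above cannot promise.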