Does a String Value Exist in a List of Strings? | Redshift Query - amazon-redshift

I have some interesting data that I'm trying to query, but I cannot get the syntax right. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple of those ids in a single cell. The ids column looks like this (below), with the ids as a single string separated by "|":
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of trying to create a new column does_match I tried to filter within the WHERE clause. However, the issue is that I cannot figure out how to compare all the id values in my temp table against the string blob full of ids in foo.
SELECT
ids
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_id)
Any suggestions would be helpful.
Cheers,

This would work, although I'm not sure about the performance:
SELECT
ids
FROM foo
JOIN temp_id
ON '|' || foo.ids || '|' LIKE '%|' || temp_id.id::varchar || '|%'
You wrap the ids list in a pair of extra separators, so that you can always search for |id|, including for the first and the last number in the list.
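If you also want the false rows from the expected output, here is a minimal sketch (untested) that combines the same wrapping trick with a LEFT JOIN and Redshift's BOOL_OR aggregate:
SELECT f.ids,
       BOOL_OR(t.id IS NOT NULL) AS does_match
FROM foo f
LEFT JOIN temp_id t
  ON '|' || f.ids || '|' LIKE '%|' || t.id::varchar || '|%'
GROUP BY f.ids -- collapses duplicate ids strings, which is fine for this sample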

The following SQL (I know it's a bit of a hack) returns exactly what you expect as output. It's tested with your sample data; I don't know how it would behave on your real data, so try it and let me know.
with seq AS ( -- create a sequence CTE to stand in for Postgres' unnest
select 1 as i union all -- assuming you have at most 10 ids in the ids field,
                        -- feel free to extend this part
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10)
select distinct ids,
case -- since I can't take a max of a boolean field, I used two cases
     -- for 1s and 0s and converted the result to boolean
when max(case
when t.id in (
select split_part(f.ids, '|', seq.i)
from seq
join foo f on seq.i <= REGEXP_COUNT(f.ids, '[|]') + 1 -- [|] so the literal pipe is counted
where split_part(f.ids, '|', seq.i) != '' and foo.ids = f.ids)
then 1
else 0
end) = 1
then true
else false
end as does_match
from temp_id t, foo
group by 1
Please let me know if this works for you!

Related

POSTGRESQL 9.10 - returning the maximum value from JSON arrays

I'm looking for a method to calculate the maximum value in each of the numeric arrays contained in a JSON document, using PostgreSQL.
Simple example:
room, data
1 , '{"history":{"samples":{"101":[5,10,50,20],"102":[10,15,5,5]}}}'
What I'm looking for is the maximum value for a particular "history -> sample" item for a room. In this case it would be 50 for sample 101 and 15 for sample 102, but the real data is larger than this.
Here is an sqlfiddle with some actual data: http://sqlfiddle.com/#!17/2c7a0
Ultimately, I would like to end up with a pivot with the room and samples as columns, holding the maximum value of each array. Is there a fairly simple way to do this with the large number of elements in the arrays? (crosstab or a cross lateral join?) Something like the following, based on the simple example above:
room | 101 | 102 | ... ->
1 | 50 | 15
2 | x | x
etc..
again, see sqlfiddle for sample data
You could use LATERAL and json_array_elements:
SELECT j.id, s2.*
FROM jsonData j
,LATERAL (SELECT (data -> 'history') -> 'data' ) s(c)
,LATERAL ( VALUES(
(SELECT MAX(value::text::decimal(10,2))
FROM json_array_elements((s.c -> '101')::json) x),
(SELECT MAX(value::text::decimal(10,2))
FROM json_array_elements((s.c -> '102')::json) x))
)s2("101","102"); -- add more columns here as needed
DBFiddle Demo
This is not a complete answer, but it may help get you close to what you're looking for:
select key, data->'history'->'data' #> array[key] as vals
from
(select *, jsonb_object_keys(data->'history'->'data') as key
from jsonData) as a
Output: see the fiddle demo.
It's easier if you select only a single room and do all the work on it:
select key, max(val::text::float) from
(
select key, jsonb_array_elements(vals) as val
from
(select key, data->'history'->'data' #> array[key] as vals
from
(select *, jsonb_object_keys(data->'history'->'data') as key
from jsonData) as a)
as b
) as c
group by key
order by 1
Fiddle demo
And if you want to display it in horizontal way instead of vertical, you can use crosstab (tablefunc)
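For example, a hedged sketch of the crosstab version (it assumes the tablefunc extension is available, a jsonData(room, data) layout as in the sample, and that 101 and 102 are the only sample keys):
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
  -- source query: one (row id, category, value) triple per room/sample
  $$ select room, key, max(val::text::float)
     from (select room, key,
                  jsonb_array_elements(data->'history'->'data' #> array[key]) as val
           from (select *, jsonb_object_keys(data->'history'->'data') as key
                 from jsonData) as a) as b
     group by room, key
     order by 1, 2 $$,
  -- category query: one row per pivoted column, in output order
  $$ VALUES ('101'), ('102') $$
) AS ct(room int, "101" float, "102" float);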

How to split a string in a smart way?

The function string_to_array splits strings without keeping quoted substrings together:
# select unnest(string_to_array('one, "two,three"', ','));
unnest
--------
one
"two
three"
(3 rows)
I would like to have a smarter function, like this:
# select unnest(smarter_string_to_array('one, "two,three"', ','));
unnest
--------
one
two,three
(2 rows)
Purpose.
I know that the COPY command does this properly, but I need this feature internally.
I want to parse the text representation of rows of an existing table. Example:
# select * from dataset limit 2;
id | name | state
----+-----------------+--------
1 | Smith, Reginald | Canada
2 | Jones, Susan |
(2 rows)
# select dataset::text from dataset limit 2;
dataset
------------------------------
(1,"Smith, Reginald",Canada)
(2,"Jones, Susan","")
(2 rows)
I want to do it dynamically in a plpgsql function, for different tables. I cannot assume a constant number of columns in a table, nor a particular format of the column values.
There is a nice method to transpose a whole table into a one-column table:
select (json_each_text(row_to_json(t))).value from dataset t;
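With the sample dataset above, this should produce one value per column per row, along these lines (the blank entry is the empty state of row 2):
# select (json_each_text(row_to_json(t))).value from dataset t;
      value
-----------------
 1
 Smith, Reginald
 Canada
 2
 Jones, Susan

(6 rows)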
If the column id is unique then
select id, array_agg(value order by rn) arr from (
select row_number() over() rn, id, value from (
select id, (json_each_text(row_to_json(t))).value from dataset t
) alias
) alias
group by id;
gives you exactly what you want. The extra query level with row_number() is necessary to keep the original order of the columns, and the order by rn inside array_agg guarantees the aggregation order.

Finding exact matches to a requested set of values

Hi, I'm facing a challenge. There is a table progress:
User_id | Assesment_id
-----------------------
1 | Test_1
2 | Test_1
3 | Test_1
1 | Test_2
2 | Test_2
1 | Test_3
3 | Test_3
I need to pull out the user_ids who have completed only Test_1 & Test_2 (i.e. User_id 2). The input parameter would be the list of assessment ids.
Edit:
I want those who have completed all the assessments on the list, but no others.
User 3 did not complete Test_2, and so is excluded.
User 1 completed an extra test, and is also excluded.
Only User 2 has completed exactly those assessments requested.
You don't need a complicated join or even subqueries. Simply use the INTERSECT operator:
select user_id from progress where assessment_id = 'Test_1'
intersect
select user_id from progress where assessment_id = 'Test_2'
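Note that, per the edit ("but no others"), INTERSECT alone still returns User 1, who completed the extra Test_3. One way to subtract users who took anything outside the list is EXCEPT:
select user_id from progress where assessment_id = 'Test_1'
intersect
select user_id from progress where assessment_id = 'Test_2'
except
select user_id from progress where assessment_id not in ('Test_1', 'Test_2')
For the sample data this leaves only User_id 2.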
I interpreted your question to mean that you want users who have completed all of the tests in your assessment list, but not any other tests. I'll use a technique called common table expressions so that you can follow step by step, but it is all one query statement.
Let's say you supply your assessment list as rows in a table called Checktests. We can count those values to find out how many tests are needed.
If we use a LEFT OUTER JOIN, values from the right-side table will be null where there is no match. So the test_matched column will be null if an assessment is not on your list. COUNT() ignores null values, so we can use this to find out how many of the tests taken were on the list, and then compare that to the number of all tests the user took.
with x as
(select count(assessment_id) as tests_needed
from checktests
),
dtl as
(select p.user_id,
p.assessment_id as test_taken,
c.assessment_id as test_matched
from progress p
left join checktests c on p.assessment_id = c.assessment_id
),
y as
(select user_id,
count(test_taken) as all_tests,
count(test_matched) as wanted_tests -- count() ignores nulls
from dtl
group by user_id
)
select user_id
from y
join x on y.wanted_tests = x.tests_needed
where y.wanted_tests = y.all_tests ;
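For the sample data, assuming Checktests is a one-column table (the DDL below is hypothetical), loading the requested list and running the query returns only User 2:
create table checktests (assessment_id varchar(20)); -- assumed layout
insert into checktests (assessment_id) values ('Test_1'), ('Test_2');
-- the query above then returns a single row: user_id = 2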

Postgresql - Basic Arrays and array_agg

As a test, I created this schema:
CREATE TABLE simple_table (client_id int4, order_id int4);
INSERT INTO simple_table (client_id, order_id)
VALUES
(1,2),(1,3),(1,4),(1,6),(1,8),(1,12),(1,16),(1,18),(1,25),(1,32),(1,33),(1,37),(1,43),
(1,56),(1,57),(1,66),(2,2),(2,3),(2,5),(2,7),(2,9),(2,12),(2,17),(2,19),(2,22),(2,30),
(2,33),(2,38),(2,44),(2,56),(2,58),(2,66)
;
Then used array_agg:
SELECT client_id, array_agg(order_id) FROM simple_table GROUP BY client_id;
to create the arrays for client 1 and client 2:
| CLIENT_ID | ARRAY_AGG |
----------------------------------------------------------
| 1 | 2,3,4,6,8,12,16,18,25,32,33,37,43,56,57,66 |
| 2 | 2,3,5,7,9,12,17,19,22,30,33,38,44,56,58,66 |
Now I would like to compare the two rows and identify the values they have in common. I tried the && overlap operator ("have elements in common", e.g. ARRAY[1,4,3] && ARRAY[2,1]) from the PostgreSQL documentation, but I am having problems.
Perhaps I am looking at this wrong. Any help or guidance would be appreciated!
The && operator is a predicate that yields a true or false result, not a list of values.
If you're looking for the list of order_id that exist for both client_id=1 and client_id=2, the query would be:
select order_id from simple_table where client_id in (1,2)
group by order_id having count(*)=2;
That's equivalent to the intersection of the two arrays if you consider these arrays as sets (no duplicates, and the positions of the values are irrelevant), except that you don't need to use arrays at all: simple standard SQL is good enough.
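With the sample data above, that query returns the six orders both clients share:
order_id
----------
2
3
12
33
56
66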
Take a look at the "array_intersect" functions here:
Array Intersect
To see elements that are not common to both arrays:
create or replace function arrxor(anyarray,anyarray) returns anyarray as $$
select ARRAY(
(
select r.elements
from (
(select 1,unnest($1))
union all
(select 2,unnest($2))
) as r (arr, elements)
group by 1
having min(arr) = max(arr)
)
)
$$ language sql strict immutable;
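A quick usage example (note that the order of elements in the result is not guaranteed):
select arrxor(ARRAY[1,2,3], ARRAY[2,3,4]);
 arrxor
--------
 {1,4}
(1 row)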

T-SQL query to find common features

I have this table:
ID Value
------------
1 car
1 moto
2 car
2 moto
3 moto
3 apple
4 gel
4 moto
5 NULL
note that moto is common to all IDs.
I would like to obtain a single row with this result:
car*, moto, apple*, gel*
i.e.
car, apple, gel with an asterisk because they are present but NOT in all IDs
moto without an asterisk because it is COMMON to all IDs
If ID + Value are Unique
SELECT Value,
       CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable)
            THEN '*' ELSE '' END AS Asterisk
FROM MyTable
WHERE Value IS NOT NULL
GROUP BY Value
Note that this won't group everything into a single line. Also note that your premise is slightly off: ID 5 is an ID too, so moto isn't common to all the IDs; it's only common to all the IDs that have at least one value.
If we filter those IDs out as described, the query becomes:
SELECT Value,
       CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL)
            THEN '*' ELSE '' END
FROM MyTable
WHERE Value IS NOT NULL
GROUP BY Value
To "merge" the * with Value, simply replace the , with a +, like:
SELECT Value + CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID) FROM MyTable WHERE Value IS NOT NULL)
                    THEN '*' ELSE '' END AS Value
FROM MyTable
WHERE Value IS NOT NULL
GROUP BY Value
To get everything onto a single line, use one of the techniques from https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ . I'll add that, sadly, T-SQL doesn't have a native method to do it, and all the alternatives are a little ugly :-)
In general, the string aggregation part is quite common on SO (and outside of it) Concatenate row values T-SQL, tsql aggregate string for group by, Implode type function in SQL Server 2000?, How to return multiple values in one column (T-SQL)? and too many others to count :-)
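For completeness, here is a minimal sketch of the classic FOR XML PATH trick those links describe (works on SQL Server 2005+; on 2017+ the built-in STRING_AGG is simpler):
SELECT STUFF(
  (SELECT ', ' + Value +
          CASE WHEN COUNT(*) <> (SELECT COUNT(DISTINCT ID)
                                 FROM MyTable
                                 WHERE Value IS NOT NULL) THEN '*' ELSE '' END
   FROM MyTable
   WHERE Value IS NOT NULL
   GROUP BY Value
   FOR XML PATH('')),
  1, 2, '') AS single_row;
-- yields e.g.: apple*, car*, gel*, moto (order not guaranteed without an ORDER BY)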