PostgreSQL 9.10 - returning the maximum value from a JSON array - postgresql

Looking for a method to calculate the maximum value in numeric arrays contained in a JSON column using PostgreSQL.
Simple example:
room | data
-----+-----------------------------------------------------------------
1    | '{"history":{"samples":{"101":[5,10,50,20],"102":[10,15,5,5]}}}'
What I'm looking for is the maximum value for a particular "history -> samples" item for a room. In this case, it would be 50 for sample 101 and 15 for sample 102, but the real data is larger than this.
Here is an sqlfiddle with some actual data: http://sqlfiddle.com/#!17/2c7a0
Ultimately, I would like to end up with a pivot with the room and samples as columns with the maximum value in that array. Is there a fairly simple way to do this with the large number of elements in the arrays? (crosstab or cross lateral join?) Something like the following based on the simple example from above:
room | 101 | 102 | ...
-----+-----+-----
1    | 50  | 15
2    | x   | x
...
again, see sqlfiddle for sample data

You could use LATERAL and json_array_elements:
SELECT j.id, s2.*
FROM jsonData j
  ,LATERAL (SELECT (data -> 'history') -> 'data') s(c)
  ,LATERAL (VALUES(
      (SELECT MAX(value::text::decimal(10,2))
       FROM json_array_elements((s.c -> '101')::json) x),
      (SELECT MAX(value::text::decimal(10,2))
       FROM json_array_elements((s.c -> '102')::json) x))
  ) s2("101","102"); -- add more columns here as needed
DBFiddle Demo
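Outside the database, the computation the query performs is simply "for each sample key, take the max of its array". A minimal Python sketch of that logic, using the simple example row from the question:

```python
import json

# The simple example row from the question.
data = json.loads('{"history":{"samples":{"101":[5,10,50,20],"102":[10,15,5,5]}}}')

# For each sample key, take the maximum of its numeric array -- the same
# thing MAX(...) over json_array_elements() computes per column above.
maxima = {key: max(values) for key, values in data["history"]["samples"].items()}
print(maxima)  # {'101': 50, '102': 15}
```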

This is not a complete answer, but it may help get you closer to what you're looking for:
select key, data->'history'->'data' #> array[key] as vals
from
  (select *, jsonb_object_keys(data->'history'->'data') as key
   from jsonData) as a
Output:
See fiddle demo
You can select only a single room and do all the work on it, then it's easier:
select key, max(val::text::float)
from (
  select key, jsonb_array_elements(vals) as val
  from (
    select key, data->'history'->'data' #> array[key] as vals
    from (select *, jsonb_object_keys(data->'history'->'data') as key
          from jsonData) as a
  ) as b
) as c
group by key
order by 1
Fiddle demo
output:
And if you want to display it horizontally instead of vertically, you can use crosstab (from the tablefunc extension).

Related

Using TSQL to perform calculations

I've got a table called NewCodes with the following records:
| NewCode | Mapping |
| -------- | -------------- |
| pp1 | [US1] + [US5] |
| qq1 | [US8] - [US9] |
| ww1 | [RE5] + [RE6] + [RE7] |
| zx1 | [KJ1] - [XC4] |
I've got another table called SourceCodes which contains a list of values assigned to all the codes in the Mapping column.
| Code | Value |
| ---- | ----- |
| US1 | 35 |
| US5 | 10 |
| US8 | 20 |
| US9 | 5 |
| RE5 | 7 |
| RE6 | 8 |
| RE7 | 6 |
I am trying to figure out a way of assigning a value to the codes in the NewCode column using the calculations defined in the Mapping column. I currently use SSMS. So, for example, pp1 would get 35 + 10 = 45.
I have no idea how to attempt this and I was wondering if anyone could help.
As long as the expressions don't get much more complicated than the example, this can be done. Specifically:
Only addition and subtraction are performed, or at least there is no concern for order of operations.
The expressions are all well and consistently formed.
All variables exist in SourceCodes. (This could be overcome using a LEFT JOIN and providing a default value like 0.)
Your SQL Server version supports string_split. (Though I used to split with XML back in the day, so this can be overcome.)
The following query does the following:
Split each Mapping into a table of symbols.
Determine the proper order of symbols since string_split is non-deterministic.
Normalize the symbols so the codes in brackets will match what is found in SourceCodes.
Accumulate the result for each new code.
Return the accumulated result in the last row for each new code partition.
The secret sauce in this solution is the use of recursive CTEs to act like for loops. The first instance is used to determine the order of symbols: to find the start index of successive occurrences of the same symbol, the unioned part of the CTE takes the char index from the previous occurrence. The second instance works in a similar fashion to accumulate values, except it relies on the convention that an operator appears on every even row and a code on every odd one.
WITH Symbols AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY NewCode, Symbol ORDER BY Symbol) [SymbolSeqNum]
FROM NewCodes
CROSS APPLY (
SELECT value [Symbol]
FROM string_split( Mapping, ' ')
) x
)
, UnnormalizedOrderedSymbols AS (
-- since string_split is nondeterministic we need a way to restore the order.
SELECT NewCode, Symbol, SymbolSeqNum, CHARINDEX(Symbol, Mapping, 1) SymbolOrderIndex
FROM Symbols
WHERE SymbolSeqNum = 1
UNION ALL
SELECT s.NewCode, s.Symbol, s.SymbolSeqNum, CHARINDEX(s.Symbol, s.Mapping, os.SymbolOrderIndex + 1) SymbolOrderIndex
FROM UnnormalizedOrderedSymbols os
INNER JOIN Symbols s ON s.NewCode = os.NewCode AND s.Symbol = os.Symbol AND s.SymbolSeqNum = os.SymbolSeqNum + 1
)
, NormalizedOrderedSymbols AS (
SELECT NewCode
, CASE SymbolType WHEN 'Code' THEN SUBSTRING(Symbol, 2, LEN(Symbol) - 2) ELSE Symbol END [Symbol]
, SymbolType
, ROW_NUMBER() OVER (PARTITION BY NewCode ORDER BY SymbolOrderIndex) [SymbolOrderIndex]
FROM UnnormalizedOrderedSymbols
CROSS APPLY (
SELECT CASE WHEN Symbol LIKE '[[]%]' THEN 'Code' ELSE 'Operator' END [SymbolType]
) x
)
, RunningTotal AS (
SELECT NewCode, c.Value, SymbolOrderIndex
FROM NormalizedOrderedSymbols o
INNER JOIN SourceCodes c ON c.Code = o.Symbol
WHERE o.SymbolOrderIndex = 1
UNION ALL
SELECT rt.NewCode
, CASE op.Symbol
WHEN '+' THEN rt.Value + c.Value
WHEN '-' THEN rt.Value - c.Value
END
, num.SymbolOrderIndex
FROM RunningTotal rt
INNER JOIN NormalizedOrderedSymbols op ON op.NewCode = rt.NewCode AND op.SymbolOrderIndex = rt.SymbolOrderIndex + 1
INNER JOIN NormalizedOrderedSymbols num ON num.NewCode = rt.NewCode AND num.SymbolOrderIndex = rt.SymbolOrderIndex + 2
INNER JOIN SourceCodes c ON c.Code = num.Symbol
)
SELECT x.NewCode, x.Value
FROM (
SELECT rt.NewCode, rt.Value, ROW_NUMBER() OVER (PARTITION BY rt.NewCode ORDER BY SymbolOrderIndex DESC) rn
FROM RunningTotal rt
) x
WHERE x.rn = 1
ORDER BY NewCode
This obviously is not a very good use of SQL Server, and you're probably better off writing a script to accomplish whatever you're trying to do.
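As the last sentence suggests, a small script is probably the better tool here. A minimal Python sketch of the same left-to-right evaluation, with the question's tables hard-coded for illustration (codes missing from SourceCodes, like KJ1 and XC4, default to 0, mirroring the LEFT JOIN fallback mentioned above):

```python
# Values from the SourceCodes table in the question; .get(code, 0)
# supplies the default for codes that are missing from it.
source_codes = {"US1": 35, "US5": 10, "US8": 20, "US9": 5,
                "RE5": 7, "RE6": 8, "RE7": 6}

mappings = {"pp1": "[US1] + [US5]", "qq1": "[US8] - [US9]",
            "ww1": "[RE5] + [RE6] + [RE7]", "zx1": "[KJ1] - [XC4]"}

def evaluate(mapping):
    # Split on spaces: code, operator, code, operator, code, ...
    # then fold left to right, just like the RunningTotal CTE.
    symbols = mapping.split(" ")
    total = source_codes.get(symbols[0].strip("[]"), 0)
    for op, code in zip(symbols[1::2], symbols[2::2]):
        value = source_codes.get(code.strip("[]"), 0)
        total = total + value if op == "+" else total - value
    return total

results = {code: evaluate(m) for code, m in mappings.items()}
print(results)  # {'pp1': 45, 'qq1': 15, 'ww1': 21, 'zx1': 0}
```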

Does a String Value Exist in a List of Strings | Redshift Query

I have some interesting data, I'm trying to query however I cannot get the syntax correct. I have a temporary table (temp_id), which I've filled with the id values I care about. In this example it is only two ids.
CREATE TEMPORARY TABLE temp_id (id bigint PRIMARY KEY);
INSERT INTO temp_id (id) VALUES ( 1 ), ( 2 );
I have another table in production (let's call it foo) which holds multiple of those ids in a single cell. The ids column looks like this (below), with the ids as a single string separated by "|".
ids
-----------
1|9|3|4|5
6|5|6|9|7
NULL
2|5|6|9|7
9|11|12|99
I want to evaluate each cell in foo.ids and see if any of the ids in it match the ones in my temp_id table.
Expected output
ids |does_match
-----------------------
1|9|3|4|5 |true
6|5|6|9|7 |false
NULL |false
2|5|6|9|7 |true
9|11|12|99 |false
So far I've come up with this, but I can't seem to return anything. Instead of creating a new column does_match, I tried to filter within the WHERE clause. However, the issue is that I cannot figure out how to compare all the id values in my temp table to the string blob full of ids in foo.
SELECT
  ids
FROM foo
WHERE ids = ANY(SELECT LISTAGG(id, ' | ') FROM temp_id)
Any suggestions would be helpful.
Cheers,
This would work, however I'm not sure about performance:
SELECT
  ids
FROM foo
JOIN temp_id
  ON '|'||foo.ids||'|' LIKE '%|'||temp_id.id::varchar||'|%'
You wrap the ids list in a pair of additional separators, so you can always search for |id|, including the first and the last number.
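The wrapping trick is easy to see outside SQL. A minimal Python sketch of the same check, using the sample rows and temp_id values from the question:

```python
def matches(ids, wanted):
    """Wrap the pipe-delimited list in extra separators so every id,
    including the first and the last, can be found as '|id|'."""
    if ids is None:
        return False
    wrapped = "|" + ids + "|"
    return any("|%d|" % w in wrapped for w in wanted)

rows = ["1|9|3|4|5", "6|5|6|9|7", None, "2|5|6|9|7", "9|11|12|99"]
wanted = {1, 2}  # the ids in temp_id
print([matches(r, wanted) for r in rows])
# [True, False, False, True, False]
```

Note how the wrapping prevents false positives: "9|11|12|99" contains the digit 1 but no "|1|" token.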
The following SQL (I know it's a bit of a hack) returns exactly what you expect as output. It's tested with your sample data; I don't know how it would behave on your real data, so try it and let me know.
with seq as ( -- a sequence CTE to emulate postgres' unnest;
              -- assuming you have at most 10 ids in the ids field,
              -- feel free to extend this part
  select 1 as i union all
  select 2 union all
  select 3 union all
  select 4 union all
  select 5 union all
  select 6 union all
  select 7 union all
  select 8 union all
  select 9 union all
  select 10)
select distinct k.ids,
  case -- since I can't do a max on a boolean field, I use 1s and 0s
       -- and convert the result back to boolean
    when max(case
               when t.id in (
                 select split_part(f.ids, '|', seq.i)
                 from seq
                 join foo f on seq.i <= REGEXP_COUNT(f.ids, '\\|') + 1
                 where split_part(f.ids, '|', seq.i) != '' and k.ids = f.ids)
               then 1
               else 0
             end) = 1
    then true
    else false
  end as does_match
from temp_id t, foo k
group by 1
Please let me know if this works for you!

psql/redshift: is there a way to use window functions like FIRST_VALUE in a GROUP BY expression?

motivation: This seems kind of terrible, but I'm trying to write string_agg in Redshift using multiple queries, which will coalesce neighboring rows. My maximum group size isn't that big, so I think the query would only run for a few iterations. I've managed to preprocess my data into a form like:
key | merge index | value
a | 0 | foo
a | 0 | bar
a | 1 | baz
b | 0 | fandangle
In one step, everything with the same (key, merge_index) should be concatenated, so we get:
key | merge index | value
a | 0 | foo, bar
a | 1 | baz
b | 0 | fandangle
I want to use first_value and nth_value in a GROUP BY statement, like so:
SELECT key,
       merge_index,
       FIRST_VALUE(value) || COALESCE((', ' || NTH_VALUE(value, 2)), '')
FROM mytable
GROUP BY key, merge_index;
but, of course, you can't do that because FIRST_VALUE and NTH_VALUE are window functions, not aggregate functions.
question: Why can't I use FIRST_VALUE and friends with GROUP BY?
note: It works functionally to do a SELECT DISTINCT, omit the GROUP BY, and use the relevant OVER (PARTITION BY key, merge_index) windows, but I can't imagine this is efficient if it's trying to deduplicate the entire result table. I also realize I could do more preprocessing and add a column like left_or_right which indicates which side it's trying to merge, and then use a left join. That also doesn't seem too efficient, but maybe it's not bad.
I like David's queries, but he didn't get into why:
Window functions are the last part of the query to be executed, after grouping and ordering. Because of this, a window function always outputs one value per record in the final data set. You can use aggregates inside window functions, but not window functions inside aggregates. To achieve your goal, you need another pass over the data set to aggregate, which is accomplished with a subquery.
Have you tried something like the following? This way you can avoid FIRST_VALUE() and NTH_VALUE() as well as aggregation:
WITH p AS (
SELECT key, merge, value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
SELECT p1.key, p1.merge, p1.value || COALESCE(',' || p2.value, '')
FROM p p1 LEFT JOIN p p2
ON p1.key = p2.key
AND p1.merge = p2.merge
AND p2.rn = 2
WHERE p1.rn = 1
Please see SQL Fiddle demo here. Yes, I did use Postgres 9 for the fiddle; I couldn't get a connection on 8 (but I don't think I'm using any features of 9).
Alternately, you might use the following and avoid a self-join:
WITH p AS (
SELECT key, merge, value
, LEAD(value) OVER ( PARTITION BY key, merge ) AS next_value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
SELECT key, merge, value || COALESCE(',' || next_value, '')
FROM p
WHERE rn = 1
SQL Fiddle here. If you knew in advance how many values you needed to concatenate, you could make multiple calls to LEAD() with increasing offset values (more SQL Fiddle):
WITH p AS (
SELECT key, merge, value
, LEAD(value) OVER ( PARTITION BY key, merge ) AS next_value
, LEAD(value,2) OVER ( PARTITION BY key, merge ) AS n2_value
, LEAD(value,3) OVER ( PARTITION BY key, merge ) AS n3_value
, ROW_NUMBER() OVER ( PARTITION BY key, merge ) AS rn
FROM mytable
)
SELECT key, merge, value || COALESCE(',' || next_value, '')
       || COALESCE(',' || n2_value, '') || COALESCE(',' || n3_value, '')
FROM p
WHERE rn = 1
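All of the variants above are emulating one operation: group by (key, merge_index) and join the values. A minimal Python sketch of that merge, using the rows from the question:

```python
from itertools import groupby

# (key, merge_index, value) rows from the question, already sorted by
# (key, merge_index) -- groupby() requires grouped keys to be adjacent.
rows = [("a", 0, "foo"), ("a", 0, "bar"), ("a", 1, "baz"), ("b", 0, "fandangle")]

# Concatenate values within each (key, merge_index) group, which is what
# the FIRST_VALUE/LEAD constructions do one neighboring pair at a time.
merged = [(key, idx, ", ".join(r[2] for r in group))
          for (key, idx), group in groupby(rows, key=lambda r: (r[0], r[1]))]
print(merged)
# [('a', 0, 'foo, bar'), ('a', 1, 'baz'), ('b', 0, 'fandangle')]
```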

SUM of COUNTs in the same table

I'm doing many counts that I want to show in a table, and in the same table I want to show the sum of all the counts.
Here's what I got (simplified - I got 6 Counts):
SELECT * FROM (
  (SELECT COUNT(*) AS NB_book
   FROM item as a1, metadatavalue as m1, metadatavalue as m2
   WHERE m1.field_id = 64 (because I need that field to exist)
   AND m2.field_id = 66
   AND m2. = book
   AND a1.in_archive = TRUE),
  (SELECT COUNT(*) AS NB_toys
   FROM item as a1, metadatavalue as m1, metadatavalue as m2
   WHERE m1.field_id = 64 (because I need that field to exist)
   AND m2.field_id = 66
   AND m2. = toys
   AND a1.in_archive = TRUE)
)
Now, I want the display to be like:
| NB_book | NB_toys | total_object |
|---------|---------|--------------|
|      12 |      10 |           22 |
You want something along the lines of:
SELECT
sum(CASE WHEN condition_1 THEN 1 END) AS firstcount,
sum(CASE WHEN condition_2 THEN 1 END) AS secondcount,
sum(thecolumn) AS total
FROM ...
Your example query is too vague to construct something usable from, but this'll give you the idea. The conditions above can be any boolean expression.
If you prefer you can use NULLIF instead of CASE WHEN ... THEN ... END. I prefer to stick to the standard CASE.
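The pattern above is just conditional counting: each CASE contributes 1 when its condition holds, and the sums land side by side in one row. The same idea sketched in Python, with a hypothetical item list standing in for the question's tables:

```python
# Hypothetical rows standing in for the joined item/metadatavalue data.
items = [{"type": "book"}] * 12 + [{"type": "toys"}] * 10

# Each sum() counts only the rows matching its condition, like
# sum(CASE WHEN condition THEN 1 END); len() plays the role of the total.
nb_book = sum(1 for i in items if i["type"] == "book")
nb_toys = sum(1 for i in items if i["type"] == "toys")
total = len(items)
print(nb_book, nb_toys, total)  # 12 10 22
```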
It is difficult to figure out what you actually want. You can run completely different queries that each return a one-row result and combine the results like this:
select
(select count(*) from pgbench_accounts) as count1,
(select count(*) from pgbench_tellers) as count2 ;
But perhaps you shouldn't do that. Instead just run each query by itself and use the client, rather than the database engine, to format the results.

How to rank in postgres query

I'm trying to rank a subset of data within a table, but I think I am doing something wrong. I cannot find much information about the rank() feature for postgres; maybe I'm looking in the wrong place. Either way:
I'd like to know the rank of an id that falls within a cluster of a table based on a date. My query is as follows:
select cluster_id,feed_id,pub_date,rank
from (select feed_id,pub_date,cluster_id,rank()
over (order by pub_date asc) from url_info)
as bar where cluster_id = 9876 and feed_id = 1234;
I'm modeling this after the following stackoverflow post: postgres rank
The reason I think I am doing something wrong is that there are only 39 rows in url_info in cluster_id 9876, yet this query ran for 10 minutes and never came back. (I actually re-ran it for quite a while and it returned no results, yet there is a row in cluster 9876 for id 1234.) I'm expecting this to tell me something like "id 1234 was 5th for the criteria given". It will return a relative rank according to my query constraints, correct?
This is postgres 8.4 btw.
By placing the rank() function in the subselect and not specifying a PARTITION BY in the OVER clause or any predicate in that subselect, your query asks for a rank over the entire url_info table ordered by pub_date. This is likely why it ran so long: to rank over all of url_info, Pg must sort the entire table by pub_date, which will take a while if the table is very large.
It appears you want to generate a rank for just the set of records selected by the where clause, in which case, all you need do is eliminate the subselect and the rank function is implicitly over the set of records matching that predicate.
select
cluster_id
,feed_id
,pub_date
,rank() over (order by pub_date asc) as rank
from url_info
where cluster_id = 9876 and feed_id = 1234;
If what you really wanted was the rank within the cluster, regardless of the feed_id, you can rank in a subselect which filters to that cluster:
select ranked.*
from (
select
cluster_id
,feed_id
,pub_date
,rank() over (order by pub_date asc) as rank
from url_info
where cluster_id = 9876
) as ranked
where feed_id = 1234;
Sharing another example of DENSE_RANK() in PostgreSQL: a sample query to find the top 3 students.
Reference taken from this blog:
Create a table with sample data:
CREATE TABLE tbl_Students
(
StudID INT
,StudName CHARACTER VARYING
,TotalMark INT
);
INSERT INTO tbl_Students
VALUES
(1,'Anvesh',88),(2,'Neevan',78)
,(3,'Roy',90),(4,'Mahi',88)
,(5,'Maria',81),(6,'Jenny',90);
Using DENSE_RANK(), Calculate RANK of students:
;WITH cteStud AS
(
SELECT
StudName
,Totalmark
,DENSE_RANK() OVER (ORDER BY TotalMark DESC) AS StudRank
FROM tbl_Students
)
SELECT
StudName
,Totalmark
,StudRank
FROM cteStud
WHERE StudRank <= 3;
The Result:
studname | totalmark | studrank
----------+-----------+----------
Roy | 90 | 1
Jenny | 90 | 1
Anvesh | 88 | 2
Mahi | 88 | 2
Maria | 81 | 3
(5 rows)
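Note how DENSE_RANK assigns consecutive ranks with no gaps after ties, which is why Maria (the third-highest distinct mark) still gets rank 3. A minimal Python sketch reproducing the result above:

```python
# The tbl_Students sample data from the answer.
students = [("Anvesh", 88), ("Neevan", 78), ("Roy", 90),
            ("Mahi", 88), ("Maria", 81), ("Jenny", 90)]

# DENSE_RANK over TotalMark DESC: equal marks share a rank and the next
# distinct mark gets the next consecutive rank (no gaps).
marks = sorted({m for _, m in students}, reverse=True)
rank_of = {m: i + 1 for i, m in enumerate(marks)}

top3 = sorted(((name, m, rank_of[m]) for name, m in students if rank_of[m] <= 3),
              key=lambda r: r[2])
print(top3)
# [('Roy', 90, 1), ('Jenny', 90, 1), ('Anvesh', 88, 2), ('Mahi', 88, 2), ('Maria', 81, 3)]
```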