Postgres: Query Values in nested jsonb-structure with unknown keys - postgresql

I am quite new to working with PostgreSQL.
My goal is to get values from a nested jsonb structure where the last-level keys have so many different names that it is not possible to query them explicitly.
The jsonb structure in each row is as follows:
TABLE_Products
{"products": [{"product1": ["TYPE1"], "product2": ["TYPE2", "TYPE3"], "productN": ["TYPE_N"]}]}
I want to get the values (TYPE1, etc.) assigned to each product key (product1, etc.). The product keys are the unknowns, because there are too many different names to list them explicitly.
So far I have managed to pull out a tuple for each key/value pair at the last level. To illustrate, here are my code and the result for the structure described above.
My Code:
select id, jsonb_each(pro)
from (
    select id, jsonb_array_elements(data #> '{products}') as pro
    from TABLE_Products
    where data is not null
) z
My result:
("product2","[""TYPE2""]")
("product2","[""TYPE3""]")
My questions:
Is there a way to split this tuple into two columns?
Or how can I query the values in an 'unsupervised' way, i.e. without knowing the exact names of product1 ... productN?
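For what it's worth, a minimal sketch of one way to do both at once, using the table and column names from the question: jsonb_each returns key/value pairs, and calling it as a set-returning function in the FROM clause splits each pair into two columns without naming any product key.
select z.id, e.key as product, e.value as types
from (
    select id, jsonb_array_elements(data #> '{products}') as pro
    from TABLE_Products
    where data is not null
) z
cross join lateral jsonb_each(z.pro) as e(key, value);
If one row per TYPE is wanted rather than one row per product, jsonb_array_elements_text(e.value) can be applied in the same lateral fashion.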

SSRS multi value parameter - can't get it to work

First off, this is my first attempt at a multi-select. I've done a lot of searching, but I can't find the answer that works for me.
I have a PostgreSQL query which has bg.revision_key in (_revision_key), which holds the parameter. As a side note, we've named all our parameters in the queries with the underscore, and they all work; they are single-select in SSRS.
In my SSRS report I have a parameter called Revision Key Segment, which is the multi-select parameter. I've ticked Allow multiple values, and in Available Values I have the value field pointing to revision_key in the dataset.
In my dataset parameter options I have the Parameter Value [@revision_key].
In my shared dataset I also have my parameter set to Allow multiple values.
For some reason I can't seem to get the multi-select to work, so I must be missing something somewhere, but I've run out of ideas.
Unlike with SQL Server, when you connect to a database using an ODBC connection, the parameter support is different. You cannot use named parameters and instead have to use the ? syntax.
In order to accommodate multiple values you can concatenate them into a single string and use a like statement to search them. However, this is inefficient. Another approach is to use a function to split the values into an in-line table.
In PostgreSQL you can use an expression like this:
inner join (select CAST(regexp_split_to_table(?, ',') AS int) as filter) as my on my.filter = key_column
Then in the dataset properties, under the parameters tab, use an expression like this to concatenate the values:
=Join(Parameters!Keys.Value, ",")
In other words, the report is concatenating the values into a comma-separated list. The database is splitting them into a table of integers then inner joining on the values.
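Putting those two pieces together, a hypothetical complete dataset query might look like the sketch below (bg, revision_key, and the alias my come from the answer above; the table name billing_group is invented for illustration):
select bg.*
from billing_group bg
inner join (
    -- ? is the positional ODBC parameter bound to the comma-separated string
    select cast(regexp_split_to_table(?, ',') as int) as filter
) as my on my.filter = bg.revision_key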

Count all tables in one instance in kdb

I would like to count all tables in the same instance.
I have not used kdb for a while and I forgot how to make this work.
This is what I got:
tablelist:tables[]
{select count i from x} each tablelist
but I got a type error
Your statement doesn't contain a trailing semicolon (;) at the end of the first line, which will cause an error in an IDE like qpad (assuming you are running it as written).
If you are not running from an IDE, I would check my HDB for any possible missing data and run some sanity checks (e.g. can I select from each of my tables normally? do types match across partitions? Since i is a virtual column representing the row count, non-conforming types in your other columns are probably not the cause, but investigating may yield the right answer).
One way to achieve what you're trying is (using dummy data):
q){flip select counts:count i,tab:1#x from x}each tablelist:tables[]
counts tab
-------------
5469 depth
3150 quotes
3005 trades
Here I select the count for each table, but also add on the name of the table, then flip each result into a dictionary. That gives a list of dictionaries with conforming types and key names, which is in fact a table, hence my result. This way you have a nice way to track what you're actually counting.
Each select query you run is returning a table in the form:
x
-
3
It would be better to use exec as opposed to select to simply return the value of the count, e.g.:
q){exec count i from x} each tables[]
3 2
Your current method would be attempting to return a list of tables, e.g.:
q){select count i from x} each tables[]
+(,`x)!,,3
+(,`x)!,,2
However, the type error makes me think there may be an issue with your tables as this should not error for in-memory tables.
Here's one way
count each `. tables[]
I am using 3.6 2018.05.17 and your expression worked for me. I then changed the select to an exec to return just a list of counts.
q){exec count i from x} each tables[]
The code below gets the count of each table along with the table name.
q)flip (`table;`msgcount)!flip {x, count value x} each tables[]
To get only the count, without the table name:
q){count value x} each tables[]

Querying on multiple LINKMAP items with OrientDB SQL

I have a class that contains a LINKMAP field called links. This class is used recursively to create arbitrary hierarchical groupings (something like the time-series example, but not with the fixed year/month/day structure).
A query like this:
select expand(links['2017'].links['07'].links['15'].links['10'].links) from data where key='AAA'
Returns the actual records contained in the last layer of "links". This works exactly as expected.
But a query like this (note the 10,11 in the second-to-last layer of "links"):
select expand(links['2017'].links['07'].links['15'].links['10','11'].links) from data where key='AAA'
Returns two rows of the last layer of "links" instead:
{"1000":"#23:0","1001":"#24:0","1002":"#23:1"}
{"1003":"#24:1","1004":"#23:2"}
Using unionAll or intersect (with or without UNWIND) results in this single record:
[{"1000":"#23:0","1001":"#24:0","1002":"#23:1"},{"1003":"#24:1","1004":"#23:2"}]
But nothing I've tried (including various attempts at "compound" SELECTs) will get the expand to work as it does with the original example (i.e. return the actual records represented in the last LINKMAP).
Is there a SQL syntax that will achieve this?
Note: Even this (slightly modified) example from the ODB docs does not result in a list of linked records:
select expand(records) from (
    select unionAll(
        years['2017'].links['07'].links['15'].links['10'].links,
        years['2017'].links['07'].links['15'].links['11'].links
    ) as records
    from data where key='AAA'
)
Ref: https://orientdb.com/docs/2.2/Time-series-use-case.html
I'm not sure what you want to achieve, but I think it's worth trying values():
select expand(links['2017'].links['07'].links['15'].links['10','11'].links.values()) from data where key='AAA'

Postgresql full text search on really short documents (filename)

I have a database of filenames in which I'm trying to search using PostgreSQL's full-text search facility. I'm running the search query on a table of filenames; the problem is that the ranking functions are not ranking the results as I'd like them to. For the sake of argument, let's assume the schema looks like this:
create table files (
    id serial primary key,
    filename text,
    filename_ft tsvector
);
The query that I run looks something like this:
select filename, ts_rank(filename_ft, query) as rank
from files, to_tsquery('simple', 'a|b|c') as query
where query @@ filename_ft
order by rank desc limit 5;
This will return the 5 results with the highest rank. However, those search queries are coming from another process, and in most cases the queries have some 'garbage' in them. For instance, a query for 'a xxxx' might be executed, where xxxx is just a bunch of other terms. In most cases this still returns the correct results, because the suffix is simply not in the database.
However, sometimes a query contains some extraneous information that screws with the ranking function. For instance, a query for 'a b c' will return a filename containing the tokens 'b c' as the first result, and an exact match on 'a' as the second result. My guess is that this is because the first result contains a larger percentage of the actual search tokens.
In most cases (if not all) the most important token appears as the first token in the query, so my question is, is there a way to give the tokens in the query a weight?
is there a way to give the tokens in the query a weight?
Yes, there is. See the documentation; search for "weight".
Whether assigning weights is the right choice is another matter. It sounds to me like you really want to exclude some of the data from the inputs to to_tsvector in index creation and searching, so you just don't include that garbage in the index.
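For reference, a minimal sketch of the weight mechanism the documentation describes, using the files table from the question: weights are attached to the document lexemes with setweight(), and ts_rank() then accepts an array of multipliers for the weight classes {D, C, B, A}.
-- label all lexemes in the stored tsvector with weight class A
update files
set filename_ft = setweight(to_tsvector('simple', filename), 'A');

-- rank with explicit multipliers for classes {D, C, B, A}
select filename,
       ts_rank('{0.1, 0.2, 0.4, 1.0}', filename_ft, query) as rank
from files,
     to_tsquery('simple', 'a|b|c') as query
where query @@ filename_ft
order by rank desc limit 5;
Note that this weights document tokens rather than query tokens, which is as close as the built-in ranking gets to what the question asks for.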

what's the utility of array type?

I'm a total newbie with PostgreSQL, but I have good experience with MySQL. I was reading the documentation and discovered that PostgreSQL has an array type. I'm quite confused, since I can't understand in which context this type can be useful within an RDBMS. Why would I choose this type instead of using a classical one-to-many relationship?
Thanks in advance.
I've used them to make working with trees (such as comment threads) easier. You can store the path from the tree's root to a single node in an array; each number in the array is the branch number for that node. Then you can do things like this:
SELECT id, content
FROM nodes
WHERE tree = X
ORDER BY path -- The array is here.
PostgreSQL will compare arrays element by element in the natural fashion, so ORDER BY path will dump the tree in a sensible linear display order; then you check the length of path to figure out a node's depth, and that gives you the indentation to get the rendering right.
The above approach gets you from the database to the rendered page with one pass through the data.
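As a concrete sketch of that depth calculation (nodes, tree, and path are the hypothetical names from the query above):
SELECT id, content, array_length(path, 1) AS depth
FROM nodes
WHERE tree = X
ORDER BY path;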
PostgreSQL also has geometric types, simple key/value types, and supports the construction of various other composite types.
Usually it is better to use traditional association tables but there's nothing wrong with having more tools in your toolbox.
One SO user is using it for what appears to be machine-aided translation. The comments to a follow-up question might be helpful in understanding his approach.
I've been using them successfully to aggregate recursive tree references using triggers.
For instance, suppose you have a tree of categories, and you want to find products in any of the categories (1,2,3) or any of their subcategories.
One way to do it is to use an ugly WITH RECURSIVE statement. Doing so will produce a plan stuffed with merge/hash joins on entire tables and an occasional materialize.
with recursive categories as (
    select id
    from categories
    where id in (1,2,3)
    union all
    ...
)
select products.*
from products
join product2category on ...
join categories on ...
group by products.id, ...
order by ... limit 10;
Another is to pre-aggregate the needed data:
create table categories (
    id int,
    parents int[] -- (array_agg(parent_id) from parents) || id
);

create table products (
    id int,
    categories int[] -- array_agg(category_id) from product2category
);

create index on categories using gin (parents);
create index on products using gin (categories);
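The answer doesn't show the triggers themselves, so here is a minimal sketch of how the parents array might be maintained, under the assumption that each category row also carries a plain parent_id column (the function name categories_set_parents is invented):
create or replace function categories_set_parents() returns trigger as $$
begin
    -- parents = the parent's ancestor array (if any), plus this row's own id
    if new.parent_id is null then
        new.parents := array[new.id];
    else
        new.parents := (select parents from categories where id = new.parent_id) || new.id;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger set_parents before insert on categories
for each row execute procedure categories_set_parents();
With those arrays in place and gin-indexed, the lookup becomes a pair of overlap tests: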
select products.*
from products
where categories && array(
    select id from categories where parents && array[1,2,3]
)
order by ... limit 10;
One issue with the above approach is that row estimates for the && operator are junk. (The selectivity is a stub function that has yet to be written, and results in something like 1/200 rows irrespective of the values in your aggregates.) Put another way, you may very well end up with an index scan where a seq scan would be correct.
To work around it, I increased the statistics target on the gin-indexed column, and I periodically look into pg_stats to extract more appropriate stats. When a cursory look at those stats reveals that using && for the specified values will produce an incorrect plan, I rewrite the applicable occurrences of && with arrayoverlap() (the latter has a stub selectivity of 1/3), e.g.:
select products.*
from products
where arrayoverlap(categories, array(
    select id from categories where arrayoverlap(parents, array[1,2,3])
))
order by ... limit 10;
(The same goes for the <@ operator...)
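For reference, a sketch of that statistics workaround, assuming the products table from above (the target value 1000 is arbitrary):
-- raise the per-column statistics target, then re-sample
alter table products alter column categories set statistics 1000;
analyze products;

-- inspect the element-level stats collected for the array column
select most_common_elems, most_common_elem_freqs
from pg_stats
where tablename = 'products' and attname = 'categories';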