Can a table be used as a stream in PipelineDB?

Let's assume we have a table test_table like this:
create table test_table(x integer);
Is it possible to create a continuous view from this table? Something like this:
create continuous view test_view as select sum(x) as x_sum from test_table;
When I run the above command I get the error:
test=# create continuous view test_view as select sum(x) as x_sum from test_table;
ERROR: continuous queries must include a stream in the FROM clause
LINE 1: ...ous view test_view as select sum(x) as x_sum from test_table...
^
HINT: To include a relation in a continuous query, JOIN it with a stream.
This is the documentation:
Here’s the syntax for creating a continuous view:
CREATE CONTINUOUS VIEW name AS query
where query is a subset of a PostgreSQL SELECT statement:
SELECT [ DISTINCT [ ON ( expression [, ...] ) ] ]
expression [ [ AS ] output_name ] [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW window_name AS ( window_definition ) [, ...] ]
where from_item can be one of:
stream_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
table_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
from_item [ NATURAL ] join_type from_item [ ON join_condition ]
According to this, from_item can also be a table. Is the documentation wrong? And if it is not possible to create a continuous view from a table, is there a way to load the table's current data into some stream?

Jeff from PipelineDB here.
Is there a reason you would want to try to create a continuous view off of a regular table like this? Why not just create a regular view or a materialized view?
PipelineDB is designed to continuously analyze infinite streams of raw data, so that the data doesn't need to be stored in regular tables and then processed in an ad hoc fashion. This use case is the exact opposite of PipelineDB's intended purpose.

To Jeff: a materialized view may become outdated, and a regular view will not give any performance benefit.
So there are reasons to create a CONTINUOUS VIEW from a table, and according to the documentation this should be possible.
So I suppose this can be considered a bug.
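For anyone who ends up here with the same need, a possible workaround is to declare a stream, define the continuous view on the stream, and then replay the table's current rows into it. This is only a rough sketch (the stream name is made up, and the INSERT ... SELECT replay has not been verified against every PipelineDB version):
-- declare a stream with the same shape as the table
CREATE STREAM test_stream (x integer);
-- the continuous view reads from the stream, which satisfies the FROM-clause requirement
CREATE CONTINUOUS VIEW test_view AS
    SELECT sum(x) AS x_sum FROM test_stream;
-- replay the table's existing rows into the stream once; new data is then written to the stream directly
INSERT INTO test_stream (x) SELECT x FROM test_table;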

Map Nested Array to Sink Columns

I'm currently working with a survey API to retrieve results and store them in our data warehouse (SQL database). The results are returned as a JSON object, which includes an array ("submissions") containing each individual's responses. An individual submission contains an array ("answers") with each answer to the questions in the survey.
I would like each submission to be one row in one table.
I will provide some very simple data examples and am just looking for a general way to approach this problem. I certainly am not looking for an entire solution.
The API returns a response like this:
{
  "surveyName": "Sample Survey",
  "count": 2,
  "submissions": [
    {
      "id": 1,
      "created": "2021-01-01T12:00:00.000Z",
      "answers": [
        {
          "question_id": 1,
          "answer": "Yes"
        },
        {
          "question_id": 2,
          "answer": 5
        }
      ]
    },
    {
      "id": 2,
      "created": "2021-01-02T12:00:00.000Z",
      "answers": [
        {
          "question_id": 1,
          "answer": "No"
        },
        {
          "question_id": 2,
          "answer": 4
        }
      ]
    }
  ]
}
Essentially, I want to add a row into a SQL table where the columns are: id, created, answer1, answer2. Within the Sink tab of the Copy Data activity, I cannot figure out how to essentially say, "If question_id = 1, map the answer to column answer1. If question_id = 2, map the answer to column answer2."
Will I likely have to use a Data Flow to handle this sort of mapping? If so, can you think of the general steps included in that type of flow?
For those looking for a similar solution, I'll post the general idea on how I solved this problem, thanks to the suggestion from @mark-kromer-msft.
First of all, the portion of my pipeline where I obtained the JSON files is not included. For that, I had to use an Until loop to paginate through this particular endpoint in order to obtain all submission results. I used a Copy Data activity to create JSON files in blob storage for each page. After that, I created a Data Flow.
I first had to flatten the "submissions" array in order to separate each submission into its own row. I then used a Derived Column to pull each answer out into a separate column.
Here's one example of an Expression:
find(submissions.answers, equals(#item.question_id, '1')).answer
Finally, I just had to create the mapping in the last step (Sink) in order to map my derived columns.
An alternate approach would be to use the native JSON abilities of Azure SQL DB. Use a Stored Proc task, pass the JSON in as a parameter, and shred it in the database using OPENJSON:
-- Submission level
-- INSERT INTO yourTable ( ...
SELECT
s.surveyName,
s.xcount,
s.submissions
FROM OPENJSON( @json )
WITH (
surveyName VARCHAR(50) '$.surveyName',
xcount INT '$.count',
submissions NVARCHAR(MAX) AS JSON
) s
CROSS APPLY OPENJSON( s.submissions ) so;
-- Question level, additional CROSS APPLY and JSON_VALUEs required
-- INSERT INTO yourTable ( ...
SELECT
'b' s,
s.surveyName,
s.xcount,
--s.submissions,
JSON_VALUE ( so.[value], '$.id' ) AS id,
JSON_VALUE ( so.[value], '$.created' ) AS created,
JSON_VALUE ( a.[value], '$.question_id' ) AS question_id,
JSON_VALUE ( a.[value], '$.answer' ) AS answer
FROM OPENJSON( @json )
WITH (
surveyName VARCHAR(50) '$.surveyName',
xcount INT '$.count',
submissions NVARCHAR(MAX) AS JSON
) s
CROSS APPLY OPENJSON( s.submissions ) so
CROSS APPLY OPENJSON( so.[value], '$.answers' ) a;
Results at submission and question level:
Full script with sample JSON here.
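If the end goal is really one row per submission with answer1 and answer2 columns, as in the original question, the same OPENJSON shredding can be pivoted with conditional aggregation. This is only a sketch built on the query above (the fixed question_id values 1 and 2 are assumptions from the sample data):
-- one row per submission; answers pivoted into fixed columns
SELECT
JSON_VALUE ( so.[value], '$.id' ) AS id,
JSON_VALUE ( so.[value], '$.created' ) AS created,
MAX( CASE WHEN JSON_VALUE( a.[value], '$.question_id' ) = '1'
          THEN JSON_VALUE( a.[value], '$.answer' ) END ) AS answer1,
MAX( CASE WHEN JSON_VALUE( a.[value], '$.question_id' ) = '2'
          THEN JSON_VALUE( a.[value], '$.answer' ) END ) AS answer2
FROM OPENJSON( @json )
WITH ( submissions NVARCHAR(MAX) AS JSON ) s
CROSS APPLY OPENJSON( s.submissions ) so
CROSS APPLY OPENJSON( so.[value], '$.answers' ) a
GROUP BY JSON_VALUE( so.[value], '$.id' ), JSON_VALUE( so.[value], '$.created' );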

How to filter data from PostgreSQL where a jsonb array field contains a nested field?

I have a table with a jsonb column, and the documents look like this (simplified):
{
  "a": 1,
  "rg": [
    {
      "rti": 2
    }
  ]
}
I want to filter all the rows that have an 'rg' field where at least one element of the array contains an 'rti' field.
My current solution is
log->>'rg' ilike '%rti%'
Is there another approach? A faster solution probably exists.
Another approach would be applying jsonb_each to the jsonb object and then jsonb_array_elements_text to the value extracted by jsonb_each:
select id, js_value2
from
(
select (js).value as js_value, jsonb_array_elements_text((js).value) as js_value2,id
from
(
select jsonb_each(log) as js, id
from tab
) q
where (js).key = 'rg'
) q2
where js_value2 like '%rti%';
Demo
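Another option worth considering is to test key existence directly with the ? operator instead of casting the array to text, which also avoids false positives when the string 'rti' merely appears inside a value. This is a sketch assuming the same table/column names as above and that 'rg' is always an array when present; on PostgreSQL 12+ jsonb_path_exists(log, '$.rg[*].rti') expresses the same check in one call:
select id
from tab
where log ? 'rg'
  and exists (
    -- unnest the rg array and keep the row if any element has an rti key
    select 1
    from jsonb_array_elements(log -> 'rg') elem
    where elem ? 'rti'
  );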

JSON_QUERY: check which rows have a specific value in their JSON list

I have a table where each row contains a JSON column. Inside the JSON column I have an object containing an array of tags. What I want to do is see which rows in my table have the tag I am searching for.
Here is an example of my data:
Row 1:
Id :xxx
JSON Column:
{
"tags":[
{"name":"blue dragon", weight:0.80},
{"name":"Game", weight:0.90}
]
}
Row 2:
Id : yyy
JSON Column:
{
"tags":[
{"name":"Green dragon", weight:0.70},
{"name":"fantasy", weight:0.80}
]
}
So I want to write a query such that if I search for Green it returns only row 2, and if I search for dragon it returns both rows 1 and 2. How can I do that?
I know I can write this to access my array, but more than that I am clueless :\
I am looking for something like this
Select * from myTable
where JSON_query([JsonColumn], '$.tags[*].name') like '%dragon%'
Update: my final query looks like this:
select DISTINCT t.id, d.[key], d.value
from #t t
cross apply openjson(doc,'$.tags') as d
where json_Value( d.value,'$.name') like '%dragon%'
Something like this:
declare #t table(id int, doc nvarchar(max))
insert into #t(id,doc) values
(1,'
{
"tags":[
{"name":"blue dragon", "weight":"0.80"},
{"name":"Game", "weight":"0.90"}
]
}'),(2,'
{
"tags":[
{"name":"Green dragon", "weight":"0.70"},
{"name":"fantasy", "weight":"0.80"}
]
}')
select t.id, dv.[key], dv.value
from #t t
cross apply openjson(doc,'$.tags') as d
cross apply openjson(d.value) dv
where dv.value like '%dragon%'
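If you only need each matching row once (rather than one row per matching tag), a small variation using EXISTS against the same #t table avoids the need for DISTINCT; this is just a sketch of that idea:
select t.id, t.doc
from #t t
where exists (
    -- keep the row if any tag object has a name matching the search term
    select 1
    from openjson(t.doc, '$.tags') d
    where json_value(d.value, '$.name') like '%dragon%'
);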

postgres - syntax for updating a jsonb array

I'm struggling to find the right syntax for updating an array in a jsonb column in postgres 9.6.6
Given a column "comments", with this example:
[
{
"Comment": "A",
"LastModified": "1527579949"
},
{
"Comment": "B",
"LastModified": "1528579949"
},
{
"Comment": "C",
"LastModified": "1529579949"
}
]
I want to append Z to each comment (giving AZ, BZ, CZ).
I know I need to use something like jsonb_set(comments, '{"Comment"}',
Any hints on finishing this off?
Thanks.
Try:
UPDATE elbat
SET comments = array_to_json(ARRAY(SELECT jsonb_set(x.original_comment,
'{Comment}',
concat('"',
x.original_comment->>'Comment',
'Z"')::jsonb)
FROM (SELECT jsonb_array_elements(elbat.comments) original_comment) x))::jsonb;
It uses jsonb_array_elements() to get the array elements as a set, applies the change to each of them with jsonb_set(), and then transforms the result back into an array and into JSON with array_to_json().
But that's an awful lot of work. Maybe there is a more elegant solution that I didn't find. But since your JSON seems to have a fixed schema anyway, I'd recommend a redesign: do it the relational way, with a simple table for the comments plus a linking table for the objects the comments belong to. The change would have been very easy in such a model.
First, find a query returning the expected result:
select jsonb_agg(value || jsonb_build_object('Comment', value->>'Comment' || 'Z'))
from my_table
cross join jsonb_array_elements(comments);
jsonb_agg
-----------------------------------------------------------------------------------------------------------------------------------------------------
[{"Comment": "AZ", "LastModified": "1527579949"}, {"Comment": "BZ", "LastModified": "1528579949"}, {"Comment": "CZ", "LastModified": "1529579949"}]
(1 row)
Create a simple SQL function based on the above query:
create or replace function update_comments(jsonb)
returns jsonb language sql as $$
select jsonb_agg(value || jsonb_build_object('Comment', value->>'Comment' || 'Z'))
from jsonb_array_elements($1)
$$;
Use the function:
update my_table
set comments = update_comments(comments);
DbFiddle.
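As a usage note, and purely as a sketch against the same my_table/comments names: you may want to restrict the update to rows that actually hold a JSON array, so rows with NULL or non-array comments are left untouched:
update my_table
set comments = update_comments(comments)
-- only rows whose comments column is a jsonb array get rewritten
where jsonb_typeof(comments) = 'array';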

Most efficient way to do a bulk UPDATE with pairs of input

Suppose I want to do a bulk update, setting a new value for each id in a collection. This can easily be done with a sequence of UPDATE queries:
UPDATE foo SET value='foo' WHERE id=1
UPDATE foo SET value='bar' WHERE id=2
UPDATE foo SET value='baz' WHERE id=3
But now suppose I want to do this in bulk. I have a two-dimensional array containing the ids and new values:
[ [ 1, 'foo' ]
[ 2, 'bar' ]
[ 3, 'baz' ] ]
Is there an efficient way to do these three UPDATEs in a single SQL query?
Some solutions I have considered:
A temporary table
CREATE TABLE temp ...;
INSERT INTO temp (id,value) VALUES (....);
UPDATE foo USING temp ...
But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there are still a minimum of three queries.
Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though:
UPDATE foo
SET value = x.value
FROM (
SELECT
split_part(x,',',1)::INT AS id,
split_part(x,',',2)::VARCHAR AS value
FROM (
SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS x
) AS x
WHERE foo.id = x.id
This makes it possible to use a single query, but makes that query ugly and inefficient (especially for mixed and/or complex data types).
Is there a better solution? Or should I resort to multiple UPDATE queries?
Normally you want to batch-update from a table with sufficient index to make the merge easy:
CREATE TEMP TABLE updates_table
( id integer not null primary key
, val varchar
);
INSERT into updates_table(id, val) VALUES
( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
;
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
;
So you should probably populate your updates_table with something like:
INSERT into updates_table(id, val)
SELECT
split_part(x,',',1)::INT AS id,
split_part(x,',',2)::VARCHAR AS value
FROM (
SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS x
;
Remember: an index (or the primary key) on the id field in the updates_table is important. (But for a small set like this one, a hash join will probably be chosen by the optimiser anyway.)
In addition: for updates it is important to avoid rewriting rows with an unchanged value, since these create extra row versions, plus the resulting VACUUM activity after the update is committed:
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
AND (t.value IS NULL OR t.value <> u.val)
;
You can use a CASE conditional expression. Note the WHERE clause: without it, every row whose id is not listed would have its value set to NULL:
UPDATE foo
SET "value" = CASE id
WHEN 1 THEN 'foo'
WHEN 2 THEN 'bar'
WHEN 3 THEN 'baz'
END
WHERE id IN (1, 2, 3);
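For completeness, a single-statement alternative that avoids both the temporary table and the string splitting is to join the target against a VALUES list. This is only a sketch against the same foo table from the question:
-- each (id, value) pair becomes one row of the join source
UPDATE foo AS f
SET value = v.value
FROM ( VALUES
(1, 'foo'),
(2, 'bar'),
(3, 'baz')
) AS v(id, value)
WHERE f.id = v.id;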