I am sending telemetry data to a SQL table made up of several columns, and one of these columns receives a varchar that is actually a JSON array, such as:
'[{"data1": 5,"data2": 12, "data3": 2},{"data1": 7,"data2": 8, "data3": 1},{"data1": 4,"data2": 2, "data3": 11}]'
The length of this array can change, but each element of the array is made of all three keys/values.
I want to integrate this data into a dedicated table whose columns are:
(LandingTableID, Data1, Data2, Data3)
I have developed a trigger that fires on each insert into my landing table to do this, and it works fine. However, I am looking for the best solution to integrate the existing 5 million rows that are already in my landing table.
What would be the best way to proceed here?
As I am using Azure SQL, I cannot use SQL Server Agent (but I can use Azure Functions instead).
Just picking up on Conor's suggestion, you could adapt the trigger code that shreds the JSON so that it shreds the 5 million existing rows into a temp table. Use SELECT INTO to create the temp table (#) on the fly, and use the IDENTITY function to add a row identifier to the temp table. This will allow you to batch up the final INSERT to the main table if you think that's necessary.
Then INSERT the records from the temp table, which are now in tabular format rather than JSON, directly into your main table. This should only take a few seconds, depending on what other processes are running, what tier your database is set at, etc. This way you split the shredding of the JSON and the final load into two distinct operations, reducing the load of each one and making use of tempdb for temporary storage. Something like this:
-- Take the historical JSON and shred it into a temp table as records
SELECT
    IDENTITY( INT, 1, 1 ) AS rowId,
    t.rowId AS originalRowId,   -- key of the landing table row the JSON came from
    j.[key],
    JSON_VALUE ( j.[value], '$.data1' ) AS [data1],
    JSON_VALUE ( j.[value], '$.data2' ) AS [data2],
    JSON_VALUE ( j.[value], '$.data3' ) AS [data3]
INTO #tmp
FROM yourLandingTable t
CROSS APPLY OPENJSON ( t.yourJSON ) j;

-- Now insert the records into the main table in batches
INSERT INTO yourMainTable ( LandingTableID, data1, data2, data3 )
SELECT originalRowId, data1, data2, data3
FROM #tmp
WHERE rowId BETWEEN 1 AND 1000000;

INSERT INTO yourMainTable ( LandingTableID, data1, data2, data3 )
SELECT originalRowId, data1, data2, data3
FROM #tmp
WHERE rowId BETWEEN 1000001 AND 2000000;

-- etc ...

INSERT INTO yourMainTable ( LandingTableID, data1, data2, data3 )
SELECT originalRowId, data1, data2, data3
FROM #tmp
WHERE rowId > 5000000;
I probably would have designed this with Data Lake as the landing zone for the data, followed by some processing (an Azure Function, for example) to turn the JSON into a tabular format, and then the final loads into your SQL tables.
You could orchestrate this with Data Factory.
A trigger may suffer performance issues at those data volumes.
You can use OPENJSON to crack open the JSON array and turn it into rows. There is an example in this doc page that shows the pattern you can use:
https://learn.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-ver15
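For example, here is a minimal sketch of that pattern against the sample payload from the question (the @json variable is just a placeholder; the keys come from the sample above):

DECLARE @json nvarchar(max) =
    N'[{"data1": 5,"data2": 12, "data3": 2},{"data1": 7,"data2": 8, "data3": 1},{"data1": 4,"data2": 2, "data3": 11}]';

-- OPENJSON with an explicit schema returns one row per array element
SELECT data1, data2, data3
FROM OPENJSON ( @json )
WITH (
    data1 int '$.data1',
    data2 int '$.data2',
    data3 int '$.data3'
);

Against the landing table you would use CROSS APPLY OPENJSON ( yourJSONColumn ) on each row instead of the variable.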
Related
I am trying to insert the same row into my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x times? I am looking for the PostgreSQL equivalent of GO [count] in SQL Server... if that exists.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Test it in db<>fiddle.
Idea
Produce a resultset of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values. A specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );

INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM Numbers
    WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
    SELECT 'value_a' AS a
         , 'value_b' AS b
) adhoc;
See it in action in this db fiddle.
Note / Reference
The solution is adapted from here with minor modifications (there are a host of other solutions for generating x consecutive numbers with SQL hierarchical / recursive queries, so the choice of reference is somewhat arbitrary).
I have an application that stores a single XML record broken up into 3 separate rows, I'm assuming due to length limits. The first two rows each max out the storage at 4000 characters, and unfortunately the record doesn't break at the same place each time.
I'm trying to find a way to combine the three rows into a complete XML record that I can then extract data from.
I've tried concatenating the rows but can't find a data type or anything else that will let me pull the three rows into a single readable XML record.
I have several limitations I'm up against: we have select-only access to the DB, and I'm stuck using just SQL, as I don't have enough access to implement any kind of external program to pull the data that is there and manipulate it using something else.
Any ideas would be very appreciated.
Without sample data and desired results, we can only offer a possible approach.
Since you are on SQL Server 2017, you have access to string_agg().
Here I am using ID as the proper sequence.
I should add that try_convert() will return a NULL if the conversion to XML fails.
Example
Declare #YourTable table (ID int,SomeCol varchar(4000))
Insert Into #YourTable values
(1,'<root><name>XYZ Co')
,(2,'mpany</')
,(3,'name></root>')
Select try_convert(xml,string_agg(SomeCol,'') within group (order by ID) )
From @YourTable
Returns
<root>
<name>XYZ Company</name>
</root>
EDIT - SQL Server 2014 option (no string_agg())
Select try_convert(xml,(Select '' + SomeCol
From @YourTable
Order By ID
For XML Path(''), TYPE).value('.', 'varchar(max)')
)
Or Even
Declare @S varchar(max) = ''
Select @S = @S + SomeCol
From @YourTable
Order By ID
Select try_convert(xml,@S)
We have a query in which a list of parameter values is provided in the "IN" clause. Some time back this query failed to execute because the data in the "IN" clause got quite large and the resulting query exceeded Redshift's 16 MB query size limit. We then tried processing the data in batches so as to limit the data and not breach the 16 MB limit.
My question is: what are the factors/pitfalls to keep in mind while supplying such large data to the "IN" clause of a query, and is there an alternative way to deal with such large data in the "IN" clause?
If you have control over how you are generating your code, you could split it up as follows
First, submit code to drop and recreate the filter table:
drop table if exists myfilter;
create table myfilter (filter_text varchar(max));
The second step is to populate the filter table in parts of a suitable size, e.g. 1000 values at a time:
insert into myfilter
values ({{myvalue1}}), ({{myvalue2}}), ({{myvalue3}}); -- etc., up to 1000 values per statement
Repeat the above step until all of your values have been inserted.
Then use that filter table as follows:
select * from master_table
where some_value in (select filter_text from myfilter);
drop table myfilter;
A large IN list is not best practice in itself; it's better to use joins for large lists:
construct a virtual table as a subquery
join your target table to the virtual table
like this:
with
your_list as (
select 'first_value' as search_value
union select 'second_value'
...
)
select ...
from target_table t1
join your_list t2
on t1.col=t2.search_value
I have two tables, stuff and nonsense.
create table stuff(
id serial primary key,
details varchar,
data varchar,
more varchar
);
create table nonsense (
id serial primary key,
data varchar,
more varchar
);
insert into stuff(details) values
('one'),('two'),('three'),('four'),('five'),('six');
insert into nonsense(data,more) values
('apple','accordion'),('banana','banjo'),('cherry','cor anglais');
See http://sqlfiddle.com/#!17/313fb/1
I would like to copy random values from nonsense to stuff. I can do this for a single value using the answer to my previous question: SQL Server Copy Random data from one table to another:
update stuff
set data=(select data from nonsense where stuff.id=stuff.id
order by random() limit 1);
However, I would like to copy more than one value (data and more) from the same row, and the sub query won’t let me do that, of course.
In Microsoft SQL Server, I can use the following:
update s
set data=sq.data, more=sq.more
from stuff s outer apply
(select top 1 * from nonsense where s.id=s.id order by newid()) sq
I have read that PostgreSQL uses something like LEFT JOIN LATERAL instead of OUTER APPLY, but simply substituting doesn't work for me.
How can I update with multiple values from a random row of another table?
As of Postgres 9.5, you can assign multiple columns from a subquery:
update stuff
set (data, more) = (
select data, more
from nonsense
where stuff.id=stuff.id
order by random()
limit 1
);
I'm trying to get the whole result of a query into a variable, so I can loop through it and make inserts.
I don't know if it's possible.
I'm new to Postgres and procedures; any help will be very welcome.
Something like:
declare result (I don't know what kind of data type I should use to get a query);
select into result label, number, desc from data
Thanks in advance!
I think you should read the PostgreSQL documentation about cursors.
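If you really do need to loop over the result row by row, here is a minimal PL/pgSQL sketch, assuming the data and data2 tables from your question (note that "desc" has to be quoted, since DESC is a reserved word):

do $$
declare
    rec record;  -- holds one row of the query result at a time
begin
    for rec in select label, number, "desc" from data loop
        -- per-row work goes here, e.g. an insert into another table
        insert into data2 (label, number, "desc")
        values (rec.label, rec.number, rec."desc");
    end loop;
end $$;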
But if you just want to insert data from one table into another, you can do:
-- "desc" is quoted because DESC is a reserved word
insert into data2 (label, number, "desc")
select label, number, "desc"
from data
If you want to "save" the data from a query, you can also use a temporary table, which you can create with the usual create table or create table as:
create temporary table temp_data as
(
select label, number, "desc"
from data
)
see documentation