JPA - insert into select - copy a large number of records

I would like to copy records with different key values. What is the best way to do so?
In plain SQL I would do:
insert into tableX (x1,x2,x3,x4,x5) select 2, T1.x2, T1.x3, T1.x4, T1.x5 from tableX T1
(x1 is my primary key).
I tried writing this query inside the entity's @NamedQuery, but I got an org.eclipse.persistence.exceptions.JPQLException, and after searching for a way to write it I understand that this SQL cannot be written inside a @NamedQuery - is that correct?
I also tried looping through the object list representing tableX: for every object I did an em.find(), or created a new object, and then inserted it with em.persist() - but that seems inefficient. (When using find I issue a select for each object, so with a list of 2000 records it doesn't make sense to run 2000 selects and then insert each one with a new key value.)
So my question is: what is the best way to copy all of the records?
Also, if I get an exception or something else goes wrong, I would like to roll back so that I don't end up with only part of the records in my database table.
Thanks in advance.

You can use any SQL in JPA through a native query. SQL would be best for this type of insert.
If you need to do anything in Java on the data before inserting it, then you would query the objects, then insert them. Enable batch writing to improve efficiency.
http://java-persistence-performance.blogspot.com/2013/05/batch-writing-and-dynamic-vs.html
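For example, here is a minimal sketch of running that insert-select as a native query in a single transaction (it assumes a resource-local EntityManager named em and the tableX columns from the question; the literal 2 is the new key value from the example):
// Sketch only - adjust to your environment; in a container-managed (JTA)
// setup the transaction would be handled by the container instead.
EntityTransaction tx = em.getTransaction();
try {
    tx.begin();
    int copied = em.createNativeQuery(
            "insert into tableX (x1, x2, x3, x4, x5) "
          + "select 2, T1.x2, T1.x3, T1.x4, T1.x5 from tableX T1")
        .executeUpdate();
    tx.commit();       // the whole copy is committed as one unit
} catch (RuntimeException e) {
    if (tx.isActive()) {
        tx.rollback(); // ...or rolled back entirely if anything goes wrong
    }
    throw e;
}
Because the copy is a single statement inside one transaction, a failure leaves no partial set of records behind. If you do end up persisting objects one by one instead, EclipseLink's eclipselink.jdbc.batch-writing persistence-unit property (for example set to JDBC) is one way to enable the batch writing mentioned above.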

Related

Best performance method for getting records by large collection of IDs

I am writing a query in application code to select all records from a table where a column value is contained in a CSV list. I found a suggestion that the best way to do this was using the ARRAY functionality in PostgreSQL.
I have a table price_mapping and it has a primary key of id and a column customer_id of type bigint.
I want to return all records that have a customer ID in the array I will generate from the CSV.
I tried this:
select * from price_mapping
where ARRAY[customer_id] <@ ARRAY[5,7,10]::bigint[]
(the 5,7,10 part would actually be a CSV list supplied by my app)
But I am not sure that is right. In the application the array could contain tens of thousands of IDs, so I want to make sure I am using the method with the best performance.
Is this the right way in PostgreSQL to retrieve a large collection of records by a predefined set of column values?
Thanks
Generally this is done with the SQL-standard IN operator.
select *
from price_mapping
where customer_id in (5,7,10)
I don't see any reason using ARRAY would be faster. It might be slower given it has to build arrays, though it might have been optimized.
In the past this form performed better:
select *
from price_mapping
where customer_id = ANY(VALUES (5), (7), (10))
But new-ish versions of Postgres should optimize this for you.
Passing in tens of thousands of IDs might run up against a query size limit either in Postgres or your database driver, so you may wish to batch this a few thousand at a time.
As for the best performance, the answer is to not search for tens of thousands of IDs. Find something which relates them together, index that column, and search by that.
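One way to read that last piece of advice, sketched with a hypothetical batch_no column that is not part of the question's schema:
-- If the wanted customers share some attribute, index and filter on that
-- instead of shipping tens of thousands of literal IDs to the server.
CREATE INDEX ON price_mapping (batch_no);

SELECT *
FROM price_mapping
WHERE batch_no = 42;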
If your data is big enough, try this:
Read your CSV using an FDW (foreign data wrapper).
If you need this data often, you might build a materialized view from it, holding only the needed columns, and refresh it whenever a new CSV is created.
Join your table against this foreign table or materialized view.
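A rough sketch of that approach using the built-in file_fdw wrapper (the server name, file path and CSV layout below are placeholders, not values from the question):
-- Assumes the CSV has a header row and a single customer_id column.
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE customer_ids_csv (customer_id bigint)
    SERVER csv_files
    OPTIONS (filename '/path/to/ids.csv', format 'csv', header 'true');

-- Optional: materialize and index it if the same CSV is queried often.
CREATE MATERIALIZED VIEW customer_ids AS
    SELECT DISTINCT customer_id FROM customer_ids_csv;
CREATE INDEX ON customer_ids (customer_id);

SELECT p.*
FROM price_mapping p
JOIN customer_ids c USING (customer_id);

-- When a new CSV replaces the old file:
REFRESH MATERIALIZED VIEW customer_ids;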

Handling the output of jsonb_populate_record

I'm a real beginner when it comes to SQL and I'm currently trying to build a database using postgres. I have a lot of data I want to put into my database in JSON files, but I have trouble converting it into tables. The JSON is nested and contains many variables, but the behavior of jsonb_populate_record allows me to ignore the structure I don't want to deal with right now. So far I have:
CREATE TABLE raw (records JSONB);
COPY raw from '/home/myuser/mydocuments/mydata/data.txt';
create type jsonb_type as (time text, id numeric);
create table test as (
select jsonb_populate_record(null::jsonb_type, raw.records) from raw
);
When running the select statement only (without the create table) the data looks great in the GUI I use (DBeaver). However it does not seem to be an actual table as I cannot run select statements like
select time from test;
or similar. The column in my table 'test' is also called 'jsonb_populate_record(jsonb_type)' in the GUI, so something seems to be going wrong there. I do not know how to fix it; I've read about people using lateral joins with jsonb_populate_record, but due to my limited SQL knowledge I can't understand or replicate what they are doing.
jsonb_populate_record() returns a single column (which is a record).
If you want to get multiple columns, you need to expand the record:
create table test
as
select (jsonb_populate_record(null::jsonb_type, raw.records)).*
from raw;
A "record" is a a data type (that's why you need create type to create one) but one that can contain multiple fields. So if you have a column in a table (or a result) that column in turn contains the fields of that record type. The * then expands the fields in that record.

How to set Ignore Duplicate Key in PostgreSQL at table creation itself

I am creating a table in PostgreSQL 9.5 where id is the primary key. While inserting rows into the table, if anyone tries to insert a duplicate id, I want it to be ignored instead of raising an exception. Is there any way to set this at table creation time so that duplicate entries are ignored?
There are many techniques to resolve the duplicate-insertion issue when writing the insert query itself, e.g. using ON CONFLICT DO NOTHING or a WHERE EXISTS clause. But I want to handle this on the table-creation side, so that the person writing the insert query doesn't need to bother with it at all.
Creating a RULE is one possible solution. Are there other possible solutions? Maybe something like this:
`CREATE TABLE dbo.foo (bar int PRIMARY KEY WITH (FILLFACTOR=90, IGNORE_DUP_KEY = ON))`
Although this exact statement doesn't work on PostgreSQL 9.5 on my machine.
Add a BEFORE INSERT trigger, or a rule ON INSERT ... DO INSTEAD - otherwise it has to be handled by the inserting query. Both solutions require extra resources on each insert.
An alternative is to provide a function that takes the values as arguments and checks for duplicates, so end users call the function instead of writing an INSERT statement.
A WHERE EXISTS sub-query is not atomic, by the way - so you can still get an exception after the check...
ON CONFLICT DO NOTHING, available since 9.5, is still the best solution.
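A sketch of the trigger route against a table like the one in the question (foo with primary key column bar), plus the 9.5 statement-level form; note the EXISTS check in the trigger is not concurrency-safe either:
-- Trigger-based: silently drop rows whose key already exists.
CREATE FUNCTION skip_duplicate_bar() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM foo WHERE bar = NEW.bar) THEN
        RETURN NULL;   -- returning NULL skips this row
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_ignore_duplicates
    BEFORE INSERT ON foo
    FOR EACH ROW EXECUTE PROCEDURE skip_duplicate_bar();

-- Statement-level alternative (9.5+), still the recommended route:
INSERT INTO foo (bar) VALUES (1)
ON CONFLICT (bar) DO NOTHING;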

Insert data from staging table into multiple, related tables?

I'm working on an application that imports data from Access to SQL Server 2008. Currently, I'm using a stored procedure to import the data individually by record. I can't go with a bulk insert or anything like that because the data is inserted into two related tables...I have a bunch of fields that go into the Account table (first name, last name, etc.) and three fields that will each have a record in an Insurance table, linked back to the Account table by the auto-incrementing AccountID that's selected with SCOPE_IDENTITY in the stored procedure.
Performance isn't very good due to the number of round trips to the database from the application. For this and some other reasons I'm planning to instead use a staging table and import the data from there. Reading up on my options for approaching this, a cursor that executes the same insert stored procedure on the data in the staging table would make sense. However it appears that cursors are evil incarnate and should be avoided.
Is there any way to insert data into one table, retrieve the auto-generated IDs, then insert data for the same records into another table using the corresponding ID, in a set-based operation? Or is a cursor my only option here?
Look at the OUTPUT clause. You should be able to add it to your INSERT statement to do what you want.
BTW, if you need to output columns into the second table that weren't inserted into the first one, then use MERGE instead of INSERT (as suggested in the comment to the original question) as its OUTPUT clause supports referencing other columns from the source table(s). Otherwise, keeping it with an INSERT is more straightforward, and it does give you access to the inserted identity column.
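A hedged sketch of the MERGE variant, since mapping each staging row to its new identity is the tricky part (Staging, StagingID, FirstName, Insurance1, InsuranceName and the like below are placeholder names, not your actual schema):
-- MERGE's OUTPUT clause may reference source columns, so each staging row's
-- key can be captured alongside the identity value it produced.
DECLARE @Map TABLE (StagingID int, AccountID int);

MERGE INTO Account AS tgt
USING Staging AS src
    ON 1 = 0                      -- never matches, so every staging row is inserted
WHEN NOT MATCHED THEN
    INSERT (FirstName, LastName) VALUES (src.FirstName, src.LastName)
OUTPUT src.StagingID, inserted.AccountID INTO @Map (StagingID, AccountID);

-- The related rows can then be inserted set-based using the captured IDs.
INSERT INTO Insurance (AccountID, InsuranceName)
SELECT m.AccountID, s.Insurance1
FROM @Map m
JOIN Staging s ON s.StagingID = m.StagingID;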
I have experimented with inserting multiple records into related tables using data binding, so you could try that approach as well.
Hopefully this is helpful. Follow the link How to insert record into related tables for more information.

Most straightforward way to add a row to an SQL Server table in ADO.NET without hardcoded SQL?

I am wondering what the best / most efficient / common way is to add a row to an SQL Server table using C# and ADO.NET. I know of course that I can just create an SQL statement for that, but first, the destination table schema might vary, so I want to keep this flexible, and second, there are so many columns that I do not want to code and maintain this manually. So I currently use a SqlCommandBuilder that automatically creates the proper insert statement for me, together with a SqlDataAdapter, like this:
var dataAdapter = new SqlDataAdapter("select * from sometable", _databaseConnection);
new SqlCommandBuilder(dataAdapter);
dataAdapter.Fill(dataTable);
// ... add row to dataTable, fill fields from some external file that
// ... includes column names as well,
//.... add some more field values not from the file, etc. ...
dataAdapter.Update(dataTable);
This seems pretty inefficient, though: it first grabs all the records from the table even though I do not need them for anything (especially considering that there might already be a million records in there). Using a select statement like select * from sometable where 1=2 would work, but it does not seem like a very clean approach. I imagine there is some other solution for this that I am just not aware of.
Thanks,
Timo
I think the best way to insert rows is by using Stored Procedures through the ADO.NET command object.
If you are inserting massive amounts of data and are using SQL Server 2008, you can pass DataTable objects to a stored procedure by using a User-Defined Table Type.
In SQL:
CREATE TYPE SAMPLE_TABLE_TYPE AS TABLE
(
    field1 VARCHAR(255),
    field2 VARCHAR(255)
);
GO

CREATE PROCEDURE insert_data
    @data SAMPLE_TABLE_TYPE READONLY
AS
BEGIN
    INSERT INTO table1 (field1, field2)
    SELECT field1, field2 FROM @data;
END;
In .NET:
DataTable myTable = new DataTable();
myTable.Columns.Add(new DataColumn("field1", typeof(string)));
myTable.Columns.Add(new DataColumn("field2", typeof(string)));
// ... fill myTable with the rows to insert ...
SqlCommand command = new SqlCommand("insert_data", conn);
command.CommandType = CommandType.StoredProcedure;
SqlParameter tableParam = command.Parameters.AddWithValue("@data", myTable);
tableParam.SqlDbType = SqlDbType.Structured;
tableParam.TypeName = "dbo.SAMPLE_TABLE_TYPE";
command.ExecuteNonQuery();
If your data also contains updates, you can use the MERGE statement introduced in SQL Server 2008 to efficiently perform both inserts and updates in the same procedure.
However, if creating User-Defined Table Types and stored procedures is too much work and you need a completely dynamic solution, I would stick with what you have, with the recommendation of appending WHERE 1 = 0 to your SQL text.
You can also use a "SELECT TOP(0) * FROM SOMETABLE" query.