I need to add an index to a column in a very large table.
To avoid locking the table, I tried to use CREATE INDEX CONCURRENTLY.
During testing on a small dataset, I found that this causes the Play framework schema evolution to hang. After I removed CONCURRENTLY, the schema evolution finished quickly.
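The evolution I tried looked roughly like this (a minimal sketch; the table and index names here are hypothetical):

# --- !Ups
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_events_user_id ON events (user_id);

# --- !Downs
DROP INDEX IF EXISTS idx_events_user_id;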
Versions:
"com.typesafe.play" %% "play" % "2.6.15",
"com.typesafe.play" %% "play-java" % "2.6.15",
I am trying to use some pg_dump generated migration scripts with Flyway. The first migration script is for schema only. The other migration scripts load seed data into various tables using the Postgres COPY command. These seed-data scripts are going to exist as Flyway repeatable migration scripts. This setup presents two issues.
When Flyway loads the seed data from the migration scripts, I'm getting foreign key constraint violations since I don't have the various tables being seeded in the correct order. There are a large number of tables to deal with, so is there an easy way to work around this so that I don't have to reorder my COPY commands?
Since the seed data is going to be in repeatable migration scripts, these need to be idempotent. Is there a way to do this with the Postgres COPY command? I'm trying to avoid converting this to INSERTs since that would hurt performance and also make my migration files huge.
The trick here for idempotency is to delete the data from the tables in the correct dependency order and, once you've done that, to likewise load the data in the correct dependency order (the reverse of the delete order). The correct dependency order for deleting is worked out by obtaining the target tables of every foreign key constraint and ensuring that no table's data is ever deleted while it is still the target of a table whose data has yet to be deleted. This list of tables in dependency order is usually called a 'manifest', and it is also required for CREATE statements and for the PostgreSQL COPY. The public-domain PowerShell-based Flyway Teamworks framework will create the manifest for you.
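A minimal sketch of what one repeatable seed script could look like, assuming Flyway's PostgreSQL COPY FROM STDIN support and two hypothetical tables where orders references customers: delete in reverse dependency order, then load in dependency order.

DELETE FROM orders;
DELETE FROM customers;

COPY customers (id, name) FROM stdin WITH (FORMAT csv);
1,Acme
\.

COPY orders (id, customer_id) FROM stdin WITH (FORMAT csv);
1,1
\.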
The documentation for migrating to PostgreSQL 12.6 says
Concurrent insertions could lead to a corrupt index with entries placed in the wrong pages. It's recommended to reindex any GiST index that's been subject to concurrent insertions.
I am hoping for some clarity on the definition of a "concurrent insertion." Does it simply refer to the case when two transactions are attempting to update the same table?
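For reference, the reindex the note recommends would look something like this (the index name is hypothetical); on 12.x, REINDEX also accepts the CONCURRENTLY option to avoid blocking writes:

REINDEX INDEX CONCURRENTLY my_table_geom_gist_idx;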
I am trying to pull data from an Oracle RDBMS and move it to OrientDB using Teleporter. My relational database has many columns and the E-R relationships are maintained. I have two questions:
My objective is to get only a few columns (those that hold unique identity and foreign-key relations) and not all of the bulky column data. Is there any configuration with which I could do so? Today, include and exclude only work at the full DB-table level.
Another objective is to keep my graph DB in sync with the selected table-column data that I pushed in the previous run. Additional data that comes into the RDBMS I would want in my graph DB too.
You can enjoy this feature, and others, in OrientDB 3.0 through a JSON configuration, but there is no documentation about it yet. Currently, in 2.2.x, you can just configure relationships and edges as described here:
http://orientdb.com/docs/2.2.x/Teleporter-Import-Configuration.html
In the next two weeks all these features will also be available in 2.2.x and will be well documented, to make the config easy to understand.
At the moment you can adopt the following workaround:
import all the columns for each table into the corresponding vertex as usual.
drop the properties you are not interested in after each sync. You could write a script that calls the Teleporter execution and then deletes the properties you don't care about from the schema (see the sketch below).
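A minimal sketch of that clean-up, assuming a hypothetical imported vertex class Employee with an unwanted property biography (OrientDB SQL, run after each Teleporter sync):

UPDATE Employee REMOVE biography
DROP PROPERTY Employee.biography

The first statement removes the stored values from every Employee vertex; the second removes the property definition from the schema.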
I will update here when the alignment with 3.0 and the documentation are complete.
We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are older than a day. These tables (like an audit table) can get very large, and we want to use Postgres' table partitioning features to help break up the data by month. So the main table would always hold the current calendar month's data, but if a query requires retrieval from previous months it would somehow read from the other partitions.
Two questions:
1) Is this a good idea for archiving older data while still leaving it queryable?
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query, do native queries, and concatenate the result sets?
Thanks.
I have been working with Postgres partitioning with Hibernate & Spring JPA for some time, so I think I can try to answer your questions.
1) Is this a good idea for archiving older data while still leaving it queryable?
If you are applying indexes and not re-indexing the tables frequently, then partitioning the data may result in faster queries.
You can also use Postgres' CLUSTER feature (physically ordering a table on one of its indexes) to fetch the data faster.
Because the tables with older data are not going to be updated, clustering them is an efficient way to improve performance.
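A minimal sketch with hypothetical partition and index names:

CLUSTER audit_2017_01 USING audit_2017_01_created_at_idx;

CLUSTER physically rewrites the table so its rows follow the index order on disk; since old partitions are no longer written to, that ordering stays intact.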
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query, do native queries, and concatenate the result sets?
Spring JPA works out of the box with partitioned tables. It retrieves the data from the master as well as the child tables and returns the concatenated result set.
Note: issue with partitioned tables
The only issue you will face with a partitioned table is insertion.
Let me explain: when you partition a table this way, you create a trigger on the master table, and that trigger returns NULL. This is the key behind the insertion issue with partitioned tables when using Spring JPA / Hibernate.
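A minimal sketch of that kind of trigger-based partitioning (table, column, and partition names are hypothetical):

CREATE OR REPLACE FUNCTION audit_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.created_at >= DATE '2018-01-01' AND NEW.created_at < DATE '2018-02-01' THEN
        INSERT INTO audit_2018_01 VALUES (NEW.*);
    ELSE
        INSERT INTO audit_default VALUES (NEW.*);
    END IF;
    -- Returning NULL tells Postgres not to insert the row into the master table,
    -- so the statement reports 0 affected rows back to the JDBC driver.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER audit_partition_insert
    BEFORE INSERT ON audit
    FOR EACH ROW EXECUTE PROCEDURE audit_insert_trigger();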
When you try to insert a row using Spring JPA or Hibernate you will face the issue below:
Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
To overcome this issue you need to override the implementation of the batching Batcher.
In Hibernate you can provide a custom implementation of the batcher factory using the configuration below:
hibernate.jdbc.factory_class=path.to.my.batcher.factory.implementation
In Spring JPA you can achieve the same with a custom implementation of the batch builder using the configuration below:
hibernate.jdbc.batch.builder=path.to.my.batch.builder.implementation
References:
Custom Batch Builder/Batch in Spring-JPA
Demo Application
In addition to @Anil Agrawal's answer:
If you are using Spring Boot 2 then you need to define the custom batch builder using the property below.
spring.jpa.properties.hibernate.jdbc.batch.builder=net.xyz.jdbc.CustomBatchBuilder
You do not have to break down the JDBC query with Postgres 11+.
If you execute a select on the main table with plain JDBC, the DB will return the aggregated results from the partitioned tables.
In other words, the work is done by the Postgres DB, so Spring JPA will simply get the result and map it to objects as if there were no partitioning.
For inserts to work in a partitioned table you need to make sure that your partitions are already created; I think Spring Data will not create them for you.
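A minimal sketch of declarative (Postgres 10+) partitioning with the monthly partitions created up front; the table, columns, and ranges here are hypothetical:

CREATE TABLE audit (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

CREATE TABLE audit_2021_01 PARTITION OF audit
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
CREATE TABLE audit_2021_02 PARTITION OF audit
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

With this in place the database routes inserts to the right partition without any trigger, and a plain SELECT on audit returns rows from all partitions.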
This may be a dumb question but I'm new to Play! & Slick. While using Slick's table.ddl.create I noticed that it doesn't create an evolution but the application still works.
Does this replace evolutions? Can I use it in production? Should I?
Thanks in advance.
Both Slick and the Slick DDL plugin can only generate code to create or drop your schema, not to evolve it. So you still need Play evolutions or something similar to modify an existing schema along the way. In the Slick team we are working towards a migration solution (at a lower priority). Many parts are already there, but they haven't been integrated properly yet. There are @nafg's schema manipulation DSL: https://github.com/nafg/slick-migration-api and my one-year-old prototype for a database version management tool: https://github.com/cvogt/migrations/ . The code generation part of the latter has already made it into Slick 2.0. Properly integrating all of these will give us a comprehensive solution for type-safe database migration scripts.
Slick is able to generate DDL for your defined tables, but it does not contain logic that does what evolutions do.
The play-slick plugin, on the other hand, contains a SlickDDLPlugin that will generate and run DDL evolutions for you when you run your app in non-prod mode (with play run, for example). It also dumps those evolutions into your conf/evolutions directory.
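A dumped evolution (for example conf/evolutions/default/1.sql) looks roughly like this; the table here is hypothetical:

# --- !Ups
create table "users" ("id" BIGINT NOT NULL PRIMARY KEY, "name" VARCHAR(254) NOT NULL);

# --- !Downs
drop table "users";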
The source that handles evolutions:
https://github.com/freekh/play-slick/blob/master/src/main/scala/play/api/db/slick/plugin/SlickPlugin.scala