Import and merge rails database tables - postgresql

I have 3 separate ruby on rails applications running PostgreSQL databases, all with the same tables (and columns) but different values.
For example:

app 1 TABLE
name  surname  postcode
----  -------  --------
tom   smith    so211ux

app 2 TABLE
name  surname  postcode
----  -------  --------
mark  smith    so2ddx

app 3 TABLE
name   surname  postcode
-----  -------  --------
james  roberts  F2D1ux
I am looking to export/dump/download from two of the databases and import into one consolidated database/app.
If someone could point me in the right direction/reading for this type of query I would be most grateful.

You can use foreign data wrappers (postgres_fdw) to access data on external PostgreSQL servers. Read more here: https://www.postgresql.org/docs/9.3/static/postgres-fdw.html
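The question targets PostgreSQL, where `pg_dump`/`pg_restore` or `postgres_fdw` would do the heavy lifting, but the consolidation idea itself is simple enough to sketch with Python's built-in `sqlite3` module: read the same table from each source database and insert the rows into one target. The table and column names mirror the example above; everything else is an assumption for illustration.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory 'app' database with the example table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE people (name TEXT, surname TEXT, postcode TEXT)")
    db.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    return db

# One database per app, mirroring the example tables above.
app1 = make_source([("tom", "smith", "so211ux")])
app2 = make_source([("mark", "smith", "so2ddx")])
app3 = make_source([("james", "roberts", "F2D1ux")])

# Consolidated target database.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE people (name TEXT, surname TEXT, postcode TEXT)")

# Pull from each source and push into the target.
for src in (app1, app2, app3):
    rows = src.execute("SELECT name, surname, postcode FROM people").fetchall()
    target.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)

merged = target.execute("SELECT name FROM people ORDER BY name").fetchall()
print(merged)  # [('james',), ('mark',), ('tom',)]
```

Note this only works cleanly when there are no conflicting primary/foreign keys; the next question below deals with exactly that complication.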

Related

Moving data between PostgreSQL databases respecting conflicting keys

Situation
I have two databases which were at one time direct copies of each other, but they now contain new, different data.
What do I want to do
I want to move data from database "SOURCE" to database "TARGET", but the problem is that the tables use auto-incremented keys, and since both databases are in use at the same time, many of the IDs are already taken in TARGET, so I cannot simply identity-insert the data coming from SOURCE.
In theory we could skip identity insert altogether and let the database assign new IDs.
What makes it harder is that we have around 50 tables, each connected to the others by foreign keys. Clearly the foreign keys will also have to be changed, or they will no longer reference the correct rows.
Let's see a very simplified example:
table Human {
id integer NOT NULL PK AutoIncremented
name varchar NOT NULL
parentId integer NULL FK -> Human.id
}
table Pet {
id integer NOT NULL PK AutoIncremented
name varchar NOT NULL
ownerId integer NOT NULL FK -> Human.id
}
SOURCE Human
Id name parentId
==========================
1 Aron null
2 Bert 1
3 Anna 2
SOURCE Pet
Id name ownerId
==========================
1 Frankie 1
2 Doggo 2
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
Let's say I want to move Aron, Bert, Anna, Frankie and Doggo to the TARGET database.
But if we insert them directly without caring about the original ids, the foreign keys will be garbled:
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
3 Aron null
4 Bert 1
5 Anna 2
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
2 Frankie 1
3 Doggo 2
Now the parent of Anna is Cecil instead of Bert, the owner of Doggo is Cecil instead of Bert, and the parent of Bert is Armin instead of Aron.
How I want it to look is:
TARGET Human
Id name parentId
==========================
1 Armin null
2 Cecil 1
3 Aron null
4 Bert 3
5 Anna 4
TARGET Pet
Id name ownerId
==========================
1 Gatto 2
2 Frankie 3
3 Doggo 4
Imagine having around 50 similar tables with thousands of rows each, so the solution will have to be automated.
Questions
Is there a specific tool I can utilize?
Is there some simple SQL logic to precisely do that?
Do I need to roll my own software to do this (e.g. a service that connects to both databases, read everything in EF with including all relations, and save it to the other DB)? I fear that there are too many gotchas and it is time consuming.
Is there a specific tool? Not as far as I know.
Is there some simple SQL? Not exactly simple, but not all that complex either.
Do you need to roll your own? Maybe, depending on whether you use the SQL below.
I would guess there is no direct path, the problem being, as you note, getting the FK values reassigned. The following adds a column to all the tables which can be used to match rows across the databases. For this I would use a uuid. With that, you can copy from one table set to the other, except for the FKs. After copying, you can join on the uuid to complete the FKs.
-- establish a reference field unique across databases
alter table target_human add sync_id uuid default gen_random_uuid();
alter table target_pet   add sync_id uuid default gen_random_uuid();
alter table source_human add sync_id uuid default gen_random_uuid();
alter table source_pet   add sync_id uuid default gen_random_uuid();

-- copy source_human to target_human, except parentid
insert into target_human(name, sync_id)
select name, sync_id
from source_human;

-- reassign parentid: map each source parent to its new id in target
with conv (sync_parent, sync_child, new_parent) as
   ( select h2p.sync_id sync_parent, h2c.sync_id sync_child, h1.id new_parent
     from source_human h2c
     join source_human h2p on h2c.parentid = h2p.id
     join target_human h1  on h1.sync_id = h2p.sync_id
   )
update target_human h1
set parentid = c.new_parent
from conv c
where h1.sync_id = c.sync_child;

-- same for pets: ownerid is NOT NULL, so relax the constraint while copying
alter table target_pet alter column ownerid drop not null;

insert into target_pet(name, sync_id)
select name, sync_id
from source_pet;

with conv (sync_pet, new_owner) as
   ( select p2.sync_id, h1.id
     from source_pet p2
     join source_human h2 on p2.ownerid = h2.id
     join target_human h1 on h2.sync_id = h1.sync_id
   )
update target_pet p1
set ownerid = c.new_owner
from conv c
where p1.sync_id = c.sync_pet;

alter table target_pet alter column ownerid set not null;
See demo. You then reverse the source and target table definitions to complete the other side of the sync. You can drop the uuid columns afterwards if desired, but you may want to keep them: if the databases have gotten out of sync once, they will again. You could even go a step further and make the UUID your PK/FK and then just copy the data; the keys would remain correct, but that might mean updating the apps to the revised DB structure.

This does not address communication across databases, but I assume you already have that handled. You will need to repeat the above for each table set; perhaps you can write a script to generate the statements. I would guess this has fewer gotchas and is less time-consuming than rolling your own. It is basically 5 queries per table set, and to clean up the current mess, even 500 queries is not that much.
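The per-table statements above follow a fixed pattern, so, as suggested, a script can generate them. A minimal sketch of such a generator (the function name and the table/FK description format are assumptions for illustration; each table is described by its name, its non-key data columns, and its FK columns mapped to the referenced table):

```python
def sync_statements(table, data_cols, fks):
    """Generate the uuid-sync SQL for one source/target table pair."""
    src, tgt = f"source_{table}", f"target_{table}"
    stmts = [
        # establish the cross-database reference field
        f"alter table {tgt} add sync_id uuid default gen_random_uuid();",
        f"alter table {src} add sync_id uuid default gen_random_uuid();",
        # copy the data columns, but not the FKs
        f"insert into {tgt}({', '.join(data_cols)}, sync_id) "
        f"select {', '.join(data_cols)}, sync_id from {src};",
    ]
    # one FK-reassignment update per foreign key column
    for fk_col, ref in fks.items():
        stmts.append(
            f"with conv (sync_row, new_ref) as "
            f"( select c.sync_id, r1.id "
            f"from {src} c "
            f"join source_{ref} r2 on c.{fk_col} = r2.id "
            f"join target_{ref} r1 on r1.sync_id = r2.sync_id ) "
            f"update {tgt} t set {fk_col} = conv.new_ref "
            f"from conv where t.sync_id = conv.sync_row;"
        )
    return stmts

for stmt in sync_statements("pet", ["name"], {"ownerid": "human"}):
    print(stmt)
```

Run once per table set (in dependency order), and the 5-queries-per-table chore reduces to maintaining one list of table descriptions.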

Relational table to Dynamodb

I worked with relational databases for a long time, and now I am going to work with DynamoDB. Coming from relational databases, I am struggling to design some of our current SQL tables in DynamoDB, especially when deciding on partition and sort keys. I will try to explain with an example:
Current Tables:
Student: StudentId(PK), Email, First name, Last name, Password, SchoolId(FK)
School: SchoolId(PK), Name, Description
I was thinking to merge these tables in DynamoDB and use SchoolId as the Partition Key, StudentId as the sort key. However, I saw some similar examples use StudentId as the Partition Key.
And then I realized that we use "username" in every login flow, so the application will query by "username" (sometimes with a password, or auth token) a lot. That makes me consider SchoolId as the partition key and Username as the sort key.
I need some ideas about what would be the best practice in that case and some suggestions to give me a better understanding of NoSQL and DynamoDb concepts.
In NoSQL you should list all your use cases first and then model the table schema around them.
Below are the use cases that I see in your application:
Get user info for one user by userId (password, age, name, ...)
Get school info for a user by userId (className, schoolName)
Get all the students in one school.
Get all the students in one class of one school.
Based on these access patterns, this is how I would design the schema:
| pk    | sk         | GSI1 PK | GSI1 SK    | attributes                               |
| 12345 | metadata   |         |            | Age:13, Last name:Singh, Name:Rohan, ... |
| 12345 | schoolMeta | DPS     | DPS#class5 | SchoolName:DPS, className:5              |
With the above schema you can solve the identified use cases as follows:
Get user info for one user by userId:
Select * where pk=userId and sk=metadata
Get school info for a user by userId:
Select * where pk=userId and sk=schoolMeta
Get all the students in one school:
Select * where pk=SchoolId from table=GSI1
Get all the students in one class:
Select * where pk=SchoolId and sk begins_with SchoolId#className from table=GSI1
The given schema has one drawback: if you want to change the school name, you will have to update many rows.
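As a runnable illustration of these access patterns, here is a sketch that stands in a plain Python list for the DynamoDB table and its GSI. The item layout follows the schema table above; the attribute names (`gsi1pk`, `gsi1sk`) and sample data are assumptions for illustration.

```python
# Base table items keyed by (pk, sk); GSI1 is modeled as a second index
# over the same items, keyed by (gsi1pk, gsi1sk).
items = [
    {"pk": "12345", "sk": "metadata", "Name": "Rohan", "Age": 13},
    {"pk": "12345", "sk": "schoolMeta", "gsi1pk": "DPS",
     "gsi1sk": "DPS#class5", "SchoolName": "DPS", "className": "5"},
    {"pk": "67890", "sk": "schoolMeta", "gsi1pk": "DPS",
     "gsi1sk": "DPS#class6", "SchoolName": "DPS", "className": "6"},
]

def query(pk, sk=None, index=None):
    """Mimic a DynamoDB Query: equality on the partition key,
    optional begins_with on the sort key, optional GSI."""
    pk_attr, sk_attr = ("gsi1pk", "gsi1sk") if index else ("pk", "sk")
    return [i for i in items
            if i.get(pk_attr) == pk
            and (sk is None or i.get(sk_attr, "").startswith(sk))]

user_meta = query("12345", "metadata")            # user info by userId
school_of_user = query("12345", "schoolMeta")     # school info by userId
students_in_school = query("DPS", index="GSI1")   # all students in a school
students_in_class = query("DPS", "DPS#class5", index="GSI1")
print(len(students_in_school), len(students_in_class))  # 2 1
```

The point of the sketch is that every use case is a single partition-key lookup, which is what makes the design DynamoDB-friendly.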

entity framework not recognising data type / column format from SQLite

Scene:
SQLite3 database as external source
A number of views exist in the db
Entity Framework
edmx file doesn't seem to recognise the data type of the columns in the view even when CAST
Example:
In the SQLite db I have:
CREATE TABLE [dr]
([ID] INTEGER PRIMARY KEY AUTOINCREMENT,
[CoalNum] INT,
[InputTime] DATE)
-------------
CREATE VIEW "J13" AS
SELECT
dr.ID,
date(dr.InputTime) as InputTime,
CAST (count(*) AS INT) as CoalNum
FROM dr
WHERE dr.InputTime >= '2010-01-13'
GROUP BY dr.ID, date(dr.InputTime)
When I update the edmx model, VS2010 can recognise "dr.ID", but it does not recognise "InputTime" or "CoalNum". The error message is "doesn't support datetype".
In the SQLite3 management studio, I checked this view ("J13") and found that the data type of "InputTime" and "CoalNum" is null:
cid  name       type     notnull  dflt_value  pk
---  ---------  -------  -------  ----------  --
0    ID         INTEGER  0                    0
1    InputTime           0                    0
2    CoalNum             0                    0
So I cannot update the data model in Entity Framework. Hopefully someone can help or provide further information before I yell "bug".
Someone else had the same question, but I don't know whether he ever solved it:
https://forum.openoffice.org/en/forum/viewtopic.php?t=27214
SQLite uses dynamic typing, and thus does not have data types in computed columns.
Entity Framework assumes that all columns do have known types; this is not fully compatible with SQLite's architecture.
You could try a workaround: create a database with a fake J13 table with data types, update the model, and then replace the database file with the real database while Entity Framework isn't looking.
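The dynamic-typing behaviour is easy to reproduce. A minimal sketch using Python's built-in `sqlite3` module (the table and view mirror the ones in the question): `PRAGMA table_info` shows a declared type for the plain column reference but none for the computed columns, which is exactly what Entity Framework chokes on.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dr (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    CoalNum INT,
    InputTime DATE
);
CREATE VIEW J13 AS
SELECT dr.ID,
       date(dr.InputTime) AS InputTime,
       CAST(count(*) AS INT) AS CoalNum
FROM dr
WHERE dr.InputTime >= '2010-01-13'
GROUP BY dr.ID, date(dr.InputTime);
""")

# PRAGMA table_info reports the declared type of each view column.
# Only the direct column reference (ID) carries a type through the view.
for cid, name, ctype, notnull, dflt, pk in db.execute("PRAGMA table_info(J13)"):
    print(name, repr(ctype))
```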

Entity Framework Database Design Foreign Key and Table linking

OK, I have a master table which can be designed in two different ways, and I'm unsure of the best approach. I am doing model-first programming with regard to setting up the database.
I have 5 tables so far.
Master table
Departments
Functions
Processes
Procedures
Which is a better way to handle the design?
Idea #1:
Master Table
masterId, departmentID, functionID, processID, procedureID, user1, date
Should I make it this way and then provide a FK from master to the departments, functions, processes, and procedures tables?
Idea #2
Master Table
MasterID, departmentID, user1, date
This table will link to Departments table, which will then link to functions, which will link to processes which will link to procedures.
The master table will have a complete list of everything.
A department can have many functions.
a function can have many processes.
a process can have many procedures.
Which of these ways is best, or am I doing it completely wrong? Can someone show me the best (or close to the best) way to create this diagram of tables and linking structure?
If you have the following criteria,
A master can have many departments.
a department can have many functions.
a function can have many processes.
a process can have many procedures.
Then you must use your second design idea. You only have one department, one function, one process, and one procedure key in your first design idea.
Master
------
Master ID
User
Date
...
Department
----------
Department ID
Master ID
...
Function
--------
Function ID
Department ID
...
and so on.
The primary key of each table is an auto incrementing integer or long.
The foreign keys are identified by name.
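A runnable sketch of the second design, showing the one-to-many chain and how a join walks from a master row down through it. Table and key names follow the answer; the sample data, the Name columns, and the use of Python's `sqlite3` are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Master (MasterID INTEGER PRIMARY KEY AUTOINCREMENT,
                     User TEXT, Date TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY AUTOINCREMENT,
                         MasterID INTEGER REFERENCES Master(MasterID),
                         Name TEXT);
CREATE TABLE Function (FunctionID INTEGER PRIMARY KEY AUTOINCREMENT,
                       DepartmentID INTEGER REFERENCES Department(DepartmentID),
                       Name TEXT);
""")
db.execute("INSERT INTO Master (User, Date) VALUES ('alice', '2013-01-01')")
db.execute("INSERT INTO Department (MasterID, Name) VALUES (1, 'Sales')")
db.execute("INSERT INTO Function (DepartmentID, Name) VALUES (1, 'Invoicing')")
db.execute("INSERT INTO Function (DepartmentID, Name) VALUES (1, 'Quoting')")

# One department can have many functions: walk the chain with joins.
rows = db.execute("""
    SELECT m.User, d.Name, f.Name
    FROM Master m
    JOIN Department d ON d.MasterID = m.MasterID
    JOIN Function f ON f.DepartmentID = d.DepartmentID
    ORDER BY f.Name
""").fetchall()
print(rows)  # [('alice', 'Sales', 'Invoicing'), ('alice', 'Sales', 'Quoting')]
```

Processes and Procedures would continue the same pattern, each carrying the FK of its parent; the first design cannot express "many functions per department" because it has only one functionID slot per master row.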

Entity Framework, Junction Tables with Timestamps

I was wanting to know if there is a good way to work in timestamps into junction tables using the Entity Framework (4.0). An example would be ...
Clients
--------
ID | Uniqueidentifier
Name | varchar(64)
Products
-----------
ID | uniqueidentifier
Name | varchar(64)
Purchases
--------
Client | uniqueidentifier
Product | uniqueidentifier
This works smoothly for joining the two together, but I'd like to add a timestamp. Whenever I do that, I'm forced to go through the middle table in my code. It seems I can't add the timestamp field to the junction table without that happening; is there a different method that might be usable?
Well, your question says it all: you must either have a middle "Purchases" entity or not have the timestamp on Purchases. Actually, you can have the field on the table if you don't map it, but if you want it on your entity model then those are the only two choices.
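The underlying shape of the problem can be sketched outside EF. Once the junction table carries a payload column (the timestamp), it is an entity in its own right, which is why the ORM has to surface it as a middle entity. Table names follow the question; the `PurchasedAt` column, sample data, and use of Python's `sqlite3` are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Clients  (ID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Products (ID TEXT PRIMARY KEY, Name TEXT);
-- The junction table with a payload column: no longer a pure link table.
CREATE TABLE Purchases (
    Client  TEXT REFERENCES Clients(ID),
    Product TEXT REFERENCES Products(ID),
    PurchasedAt TEXT,
    PRIMARY KEY (Client, Product)
);
""")
db.execute("INSERT INTO Clients VALUES ('c1', 'Acme')")
db.execute("INSERT INTO Products VALUES ('p1', 'Widget')")
db.execute("INSERT INTO Purchases VALUES ('c1', 'p1', '2011-04-01T09:30:00')")

# Any query that needs the timestamp has to touch the middle table.
row = db.execute("""
    SELECT c.Name, p.Name, pu.PurchasedAt
    FROM Purchases pu
    JOIN Clients c ON c.ID = pu.Client
    JOIN Products p ON p.ID = pu.Product
""").fetchone()
print(row)  # ('Acme', 'Widget', '2011-04-01T09:30:00')
```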