We are attempting to merge multiple datasets created in FileMaker Pro.
These datasets have multiple tables, and each entry within each table has a local ID that is used to relate entries between tables. The local ID values for all the entries were serially generated, but some ID values are repeated between the different datasets even though the records they identify are not equivalent.
How can the ID values be updated in the data that is being imported to remove these overlaps without destroying the relationships that depend on them?
If you have access to the original database, you can try to migrate the IDs over to a UUID or something else unique before exporting. This has to be done manually, either by cut-and-paste or with a script.
Such a script will have to do the following:
Loop through the parent records
For each record go to the related records
Generate a UUID with the Get(UUID) function and put it in a variable
Replace the parent ID in the related record with this variable
Return to the parent record and replace the record ID with the variable.
Move to the next record.
Repeat until all records have been updated.
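For illustration, the same re-keying expressed in SQL terms, assuming hypothetical tables parent and child and a PostgreSQL back end where gen_random_uuid() is available (built in from version 13, or via the pgcrypto extension):

-- Add UUID columns; every existing parent row gets its own fresh UUID.
ALTER TABLE parent ADD COLUMN new_id uuid DEFAULT gen_random_uuid();
ALTER TABLE child  ADD COLUMN parent_new_id uuid;

-- Carry each parent's new UUID over to all of its related child records.
UPDATE child c
SET    parent_new_id = p.new_id
FROM   parent p
WHERE  c.parent_old_id = p.old_id;

-- Once every relationship has been rewritten, the old serial keys can be dropped.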
Related
I have a PostgreSQL table in an application that holds both parent and child records. There is a column in the table to reference the parent id, where applicable, for each child record. The problem is that I am trying to import data from an external source where the child record is made up of a sub-number of the parent, e.g. parent_reference_id = 123456000000, and a child_reference record for this could be 123456000001, 123456000002 and so on.

The application itself generates a unique id for each record when I import the data, so it's possible to import the child and parent records simultaneously. However, the difficulty I'm facing is linking the application-generated id for the parent record to the parent_reference_id for the corresponding child records. The only hook I have is that the first six digits of the child_value_reference match the first six digits of the parent_value_reference, and I've tried something like foo = bar(left(value,6)||'000000'; to create a match. However, I don't know how to use this to return the unique_id in a meaningful way and update the matching records.

I've tried temporary tables and CTEs, but my knowledge of Postgres is limited and I can't seem to find a solution that fits my problem. Another thing to mention is that these groups can change with updates in the external data, so I'd also need a solution to handle those updates too. Thanks in advance, Crispian
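I imagine it would be some kind of join-and-update like the sketch below (the table and column names here are only placeholders, not my real schema), but I'm not sure this is right:

UPDATE child_table AS c
SET    parent_id = p.unique_id
FROM   parent_table AS p
WHERE  left(c.child_value_reference::text, 6) || '000000' = p.parent_value_reference::text;
-- re-runnable: when the external data changes, running it again refreshes the links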
I'm trying to build a database (in PostgreSQL 9.6.6) that allows one "master column" (items.id) to be replicated into many (automatically generated) tables (e.g. rank1.id, rank2.id, rank3.id, ...). Only items will have INSERTs (or DELETEs) performed on it, and when they are, the newly added ids should also show up in (or be removed from) the rankX table(s). To be more concrete:
items:
id | name | description
rank1:
id | rank
rank2:
id | rank
...
Where the ids are always the same, and there is always the same number of rows in each of the tables. The rankX.rank values, however, will be different (imagine users ranking how funny a series of images is -- the images all have the same ids but different users might rank them differently).
What I was thinking was that when a new user was added and a new rankX table created I would do the following:
Have rankX.id reference items.id as a foreign key (with ON DELETE CASCADE)
Copy any items.id values that already exist
Auto-generate a trigger function that mirrors INSERTs on items to the rankX table
This seems cumbersome and wasteful of space, since all of the rankX.id columns are identical and I will end up with hundreds or thousands of trigger functions. As someone new to relational databases, I was hoping there was an easier way to achieve this.
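For reference, the kind of auto-generated trigger I have in mind would look roughly like this for one user (the function and trigger names are just placeholders):

CREATE TABLE rank1 (
    id   integer PRIMARY KEY REFERENCES items (id) ON DELETE CASCADE,
    rank integer
);

-- copy the ids that already exist
INSERT INTO rank1 (id) SELECT id FROM items;

-- mirror future INSERTs on items; DELETEs are already handled by the cascade
CREATE FUNCTION mirror_items_to_rank1() RETURNS trigger AS $$
BEGIN
    INSERT INTO rank1 (id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_insert_to_rank1
AFTER INSERT ON items
FOR EACH ROW EXECUTE PROCEDURE mirror_items_to_rank1();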
So, I have a few questions:
Is there a more efficient way to define my tables such that all of this copying isn't necessary?
If this the best way, can you give an example of how you would set up the triggers (and associated functions)?
Do I need to worry about running out of space on the server as I create (potentially many) sets of triggers of this type?
I can't think of a better title, so feel free to make a suggestion once you understand the issue.
I was given a table to work with that I need to call from another table:
Name
Month
Type
Value
For each record in the main table I need to pull one "Value" that corresponds to it. What it is will be determined by all three of the other fields. So for example, if a record in the main table is:
Name:
Google
Date:
3/17/2016
Type:
M
Then I need to pull the value for the record in the other table where the Name is "Google", the month is "3", and the type is "M".
I was able to do this successfully (if slowly) using an ExecuteSQL command in a calculation field, with a ton of nested If statements for the names (I have yet to figure out how to pass the record's data directly into the ExecuteSQL statement; it breaks when I try). I would prefer to just grab the data directly. I can't switch over to the other layout because I need to see all of the records at once. I can't do a simple relationship because there isn't a real relationship; it's like there are three foreign keys working in tandem, and I only know how to use one to call the data.
Any idea on how to do this more simplistically?
Some ideas I've had, though I'm not sure if they will work:
Using a calculation field as a related field to dynamically point to the row by code (concatenate the three relevant fields into a type of code). Not sure if you can connect two tables by a calculation field.
Doing that same thing when calling the data into the table in the first place, adding a code to create a single primary key.
Here are my relationships:
I can't do a simple relationship because there isn't a real relationship, it's like there are three foreign keys working in tandem and I only know how to use one to call the data.
Simply define a relationship with three predicates - i.e. three pairs of match fields.
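If you would rather stay with ExecuteSQL, note that the record's field values can be passed as query arguments instead of being hard-coded with nested If statements; a rough sketch, with the table and field names assumed:

/* Query text for FileMaker's ExecuteSQL; each ? is bound to one of the extra
   arguments, e.g.
   ExecuteSQL ( query ; "" ; "" ; Main::Name ; Month ( Main::Date ) ; Main::Type ) */
SELECT "Value"
FROM "OtherTable"
WHERE "Name" = ? AND "Month" = ? AND "Type" = ?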
What is the correct way of storing large lists in PostgreSQL?
I currently have a "user" table, where the "followers" column stores a list of the followers that that user has. This list is stored in JSON, and every time the server wants to add a new user to that list, it retrieves it from the database, appends the new user, and then replaces the old list with the new list.
The problem is that these lists tend to get quite lengthy, which might affect performance. Is it possible to simply append to the list directly via SQL without retrieving it and rewriting it later?
Use a separate table for followers. The table should have at least two columns: userid and followerid. And it's good practice to have a primary key for this table as well, so let's give it a "ufid".
You can do a select to get all the elements and compute the JSON string if your application needs it. But do not work with JSON or any other string representation of the list, as it defeats the purpose of a relational database.
To add a new follower, simply add a new record to the follower table with the userid; deleting and updating are also done at the record level, without touching the other records.
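A minimal sketch of such a table (the names are just examples):

CREATE TABLE followers (
    ufid       serial PRIMARY KEY,
    userid     integer NOT NULL,
    followerid integer NOT NULL,
    UNIQUE (userid, followerid)  -- the same account can't follow a user twice
);

-- Adding a follower is a single-row insert; nothing has to be read back or rewritten.
INSERT INTO followers (userid, followerid) VALUES (1, 42);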
If followers is a list of integers which are primary keys to their accounts, make it an integer array int[]. If they are usernames or other words, go with a string array character varying[].
To append to an array column you can do this:
UPDATE the_table SET followers = followers || 42 WHERE id = 1;  -- 42 = new follower, 1 = the user being followed
I am obtaining a JSON array from a URL and inserting the data into a table. Since the contents of the URL are subject to change, I want to make a second connection to the URL, check for updates, and insert new records into my table using sqlite3.
The issues that I face are:
1) My table doesn't have a primary key
2) The URL lists the changes for the same day, so if I run my app multiple times I get duplicate entries when I insert the values into my database. I want a check so that duplicated entries for the day are removed. The problem could be solved by adding a constraint, but since the URL itself has duplicated values, I find it difficult.
The only way I can see to do it, if you have no primary key or anything else unique to each record, is to go through the new entries as they come in and, for each one, check whether the exact same data already exists in the database. If it doesn't, you add it; if it does, you skip over it.
You could even create a unique key yourself for each entry by concatenating every column of the table. That way you can quickly check whether the entry already exists in the database.
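Roughly, in SQLite, that check can be folded into the insert itself (table and column names are made up):

-- insert the row only if an identical row is not already present
INSERT INTO entries (day, name, value)
SELECT '2016-03-17', 'foo', 42
WHERE NOT EXISTS (
    SELECT 1 FROM entries
    WHERE day = '2016-03-17' AND name = 'foo' AND value = 42
);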
I see two possibilities depending on your setup:
You have a column set up as UNIQUE (this can be through a PRIMARY KEY or not). In this case, you can use the ON CONFLICT clause:
http://www.sqlite.org/lang_conflict.html
If you find this construct a little confusing, you can instead use "INSERT OR REPLACE" or "INSERT OR IGNORE" as described here:
http://www.sqlite.org/lang_insert.html
You do not have a column set up as UNIQUE. In this case, you will need to SELECT first to check for duplicate data and, based on the result, INSERT, UPDATE, or do nothing.
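For the first case, a minimal sketch of a UNIQUE constraint combined with INSERT OR IGNORE (column names are made up):

CREATE TABLE entries (
    day   TEXT,
    name  TEXT,
    value INTEGER,
    UNIQUE (day, name, value)  -- makes the whole row effectively unique
);

-- re-running the import simply skips rows that are already present
INSERT OR IGNORE INTO entries (day, name, value) VALUES ('2016-03-17', 'foo', 42);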
A more common and robust way to handle this is to associate a timestamp with each data item on the server. When your app interrogates the server, it provides the timestamp corresponding to the last time it synced. The server then queries its database and returns all values that are timestamped later than the timestamp provided by the app. It also returns a new timestamp value for the app to store and use on the next sync.
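Roughly, the server-side query for that scheme would be something like this (the column name is an assumption):

-- return everything changed since the client's last sync;
-- the server also sends back its current time as the next sync cursor
SELECT *
FROM items
WHERE updated_at > :last_sync;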