Sidestepping possible delimiter collisions - postgresql

So I have user accounts that have a lot of data, and sometimes people create duplicate accounts and administrators need to merge them non-destructively.
I've written a script that combines the two accounts and stores a record in a merge_record table. Each row of data is stored as one merge_record entry, containing the origin account, the destination account, the action type (deletion, merge, or exclusion), the table name, the column that denotes the account number, and an encoded string containing all the key/value pairs in the following format:
columnoneMERGEBANANAEQUALSvalueoneMERGEBANANASPLITTERcolumntwoMERGEBANANAEQUALSvaluetwo
Yes, that's probably hard to read, but my goal was to delimit the data pairs with a string a user is extremely unlikely to type, and to replace the equals sign with something equally unlikely. The reason I need this is that I've also created an Undo Merge button, so the merge must be reversible: the Undo Merge scans each row of merge_record, deconstructs the column/value pairs, and inserts them into table_name under the original_account's ID in the account_column field.
However, I still don't like it. Some jerk who reads this post could type MERGEBANANAEQUALS as their name, request a merge, and then request an undo. Is there any way to absolutely guarantee that no collisions can occur? Or should I re-design the way the key/value pairs are stored? If so, what is a better way?

The simple answer is to create a line of CSV from the data, like this:
datum1,"datum,with,commas","datum with ""double quote""",,42
Just escape everything that's nasty: wrap any value containing commas or quotes in double quotes, and double any embedded double quotes.
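If you would rather let PostgreSQL do the quoting, here is a minimal sketch of that rule using standard functions (the sample values are the ones from the line above):

SELECT string_agg('"' || replace(v, '"', '""') || '"', ',') AS csv_line
FROM unnest(ARRAY['datum1', 'datum,with,commas', 'datum with "double quote"', '', '42']) AS v;
-- This quotes every value for simplicity; a real encoder could quote only values
-- that contain commas, quotes, or newlines, as in the example line above.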


Reversed Distinct?

I have a table in my Postgres database with a list of buildings, called Buildings_national. This table contains a large number of duplicates, and I need to identify them so the duplicates can be deleted.
The thing is, I need to spare a single row in every duplicate group, to make sure that one example of the duplicated building is retained in the table.
Every building has a unique identifier, and every row has a unique identifier as well.
Can anyone recommend a good way to do this? I guess what I am looking for is a reversed DISTINCT, somehow.
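One common approach, sketched below under the assumption that the building's identifier column is called building_id and the row's identifier is called id (both names are guesses): keep only the row with the smallest id in each duplicate group and delete the rest.

-- Deletes every row for which another row with the same building_id and a smaller id exists,
-- so exactly one row per duplicate group (the one with the smallest id) survives.
DELETE FROM Buildings_national b
USING Buildings_national keep
WHERE keep.building_id = b.building_id
  AND keep.id < b.id;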

Deleting substrings throughout a column

I am trying to clean up several columns within a table to create a clear attribution/reference for reporting on my digital marketing campaigns. The goal is to keep one part of each string while deleting all the others. All strings within my marketing campaigns have symbols separating each substring.
Attached are pictures of my current table and of the desired table.
I am essentially trying to keep only one part of the string's structure and delete all other substrings. I have already managed to do this successfully by applying the following formula, given to me in a separate thread.
update adwords
set campaign = substring(campaign from '%-%-#"%#"' for '#')
where campaign like '%-%-%';
This worked perfectly; however, I do not fully understand why, and I have not found a clear answer on this forum so far.
How would I apply this to future rows? Ad group and match type can be used for this purpose.
Many Thanks.
First thing: do not modify the source data. Do ETL instead and transform it into a final stage, running that transformation periodically so new data is taken care of as well.
You could just create a trigger, which would handle all new data, but there are two caveats with that:
A failure will lead to missing data, and you will not be able to QA it.
If you modify the source data incorrectly by mistake, you cannot undo it unless you have a backup, and even then it's just too hard.
So instead, look at ETL tools like Talend or Pentaho Kettle, or write your own ETL scripts. Use Jenkins to schedule all of this periodically and you're set.
Now, about the transformation itself.
for '#'
indicates that # is the escape character, which means that #" marks the boundaries of the part of the match to return.
substring(campaign from '%-%-#"%#"' for '#')
thus selects everything between the #" markers in the pattern. % is a wildcard, the same one used in LIKE comparisons, so everything in the last, marked group is returned.
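To see the escape form in action on a made-up value (the real campaign strings were only shown as pictures):

SELECT substring('aa-bb-cc dd' FROM '%-%-#"%#"' FOR '#');
-- returns 'cc dd': the two leading % match 'aa' and 'bb', the dashes match literally,
-- and the % between the #" markers captures the rest of the string.

This can better be done with regular expressions: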
substring(campaign from '.*?-.*?-(.*)')
For the second column the regex would be ^(.*?)\s*\{
And for the third one - similar: ^(.*?)\s*\}
I would create the new table like this:
CREATE TABLE aw_final AS
SELECT
substring(campaign FROM '^\w{2}-\w+-(.*)$') AS campaign,
substring(ad_group FROM '^(\w+)\s*\{\w+\}$') AS ad_group,
substring(match_type FROM '^(\w+)\s*\}$') AS match_type
FROM adwords
WHERE campaign ~ '^\w{2}-\w+-(.*)$'
But if you must do an update, this would be how:
UPDATE adwords SET
campaign = substring(campaign FROM '^\w{2}-\w+-(.*)$'),
ad_group = substring(ad_group FROM '^(\w+)\s*\{\w+\}$'),
match_type = substring(match_type FROM '^(\w+)\s*\}$')
WHERE campaign ~ '^\w{2}-\w+-(.*)$'
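As a quick sanity check, here is what those patterns return for made-up values (the real table contents were only shown as pictures, so these literals are just assumptions about the shape of the data):

SELECT
  substring('US-brand-Spring Sale' FROM '^\w{2}-\w+-(.*)$')  AS campaign,    -- 'Spring Sale'
  substring('shoes {broad}'        FROM '^(\w+)\s*\{\w+\}$') AS ad_group,    -- 'shoes'
  substring('exact}'               FROM '^(\w+)\s*\}$')      AS match_type;  -- 'exact'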

Data type for storing long lists

What is the correct way of storing large lists in PostgreSQL?
I currently have a "user" table, where the "followers" column stores a list of the followers that that user has. This list is stored in JSON, and every time the server wants to add a new user to that list, it retrieves it from the database, appends the new user, and then replaces the old list with the new list.
The problem is that these lists tend to get quite lengthy, which might affect performance. Is it possible to simply append to the list directly via SQL without retrieving it and rewriting it later?
Use a separate table for followers. The table should have at least two columns: userid and followerid. And it's good practice to have a primary key for this table as well, so let's give it a "ufid".
You can do a select to get all the elements and compute the JSON string if your application needs it. But do not work with JSON or any other string representation of the list, as it defeats the purpose of a relational database.
To add a new follower, simply add a new record to the follower table with the userid; deletes and updates are likewise done at the record level, without touching the other records.
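A minimal sketch of that layout (the type choices and the unique constraint are my assumptions, not part of the original answer):

CREATE TABLE user_follower (
    ufid        bigserial PRIMARY KEY,
    userid      bigint NOT NULL,
    followerid  bigint NOT NULL,
    UNIQUE (userid, followerid)          -- one row per follow relationship
);

-- Adding a follower touches a single row; the existing list is never read or rewritten.
INSERT INTO user_follower (userid, followerid) VALUES (42, 123);

-- If the application still wants the list as JSON, compute it on demand:
SELECT json_agg(followerid) FROM user_follower WHERE userid = 42;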
If followers is a list of integers which are primary keys to their accounts, make it an integer array int[]. If they are usernames or other words, go with a string array character varying[].
To append to an array column you can do this:
UPDATE the_table SET followers = followers || new_follower WHERE id = user;
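For example, with an int[] column and made-up IDs (the table and column names mirror the statement above):

UPDATE the_table SET followers = followers || 123 WHERE id = 42;
-- or, equivalently, using the explicit array function:
UPDATE the_table SET followers = array_append(followers, 123) WHERE id = 42;

Note that, unlike a separate table with a unique constraint, a plain append does not stop the same follower from being added twice.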

perl dbi submit checkbox values

I have a form with checkboxes and I need to know what the best way to submit them to the database is. I have the following table setup:
roles    users    user_roles
-----    -----    ----------
id       id       user_id
                  role_id
I have a page where you can edit a user and assign them different roles via checkbox, then those checkboxes are saved in the user_roles table. Since editing a user's roles can involve either deleting rows or adding rows, this is how I currently handle it:
my $form_vals = { 1 => 1, 2 => 2 };   # submitted by user
my $db_vals   = { 3 => 3 };           # gotten out of the db
So I have these two hashes and I will compare the keys in $form_vals with the keys in $db_vals, then I see that I have two extra values that are not present in the database so I add them. And vice versa I find which values are no longer selected on the form by comparing the keys in $db_vals with the keys in $form_vals and then I delete those rows from the database. My question is, does anyone know of a better/easier way to do this? It's never really seemed obvious to me how to handle checkboxes and I'd like to know what best practice is. Thanks!
I wouldn't say that this has much to do with checkboxes per se.
Basically what you have is two arrays of (uid, rid) pairs, and you want to make array1 (the one in your database) a copy of array2 (the user input from the checkboxes). You could have a multi-select or a comma-separated string and the case would be the same: you have a user id, and you want that user to have only the roles supplied.
Two ways to achieve that would be to either
Put each array in a hash, loop over the keys of the submitted hash and insert any key not present in the database hash, then do the same for the database hash and delete any key not present in the submitted hash; or
Delete everything in the user_roles table for that user and insert what's submitted.
You really do have to know everything in the database and everything submitted, and check in both directions, if you don't want to simply delete everything and do a fresh insert. You can of course write a function that does this for you, hiding the ugliness a bit. Think about how you would do it if it were just two arrays and no database were involved.
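The second option boils down to two statements that you can run through DBI; a minimal sketch, with user 7 and roles 1 and 2 as placeholder IDs (the transaction keeps the user from briefly having no roles):

BEGIN;
DELETE FROM user_roles WHERE user_id = 7;                        -- drop the user's current roles
INSERT INTO user_roles (user_id, role_id) VALUES (7, 1), (7, 2); -- re-insert exactly what was checked
COMMIT;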

How can I (partially) automate the transfer of a FileMaker database structure and field contents to a second database?

I'm trying to copy some field values to a duplicate database, one record at a time. This is used for history, so I can delete some records in the original database to keep it fast.
I don't want to manually save the values in variables because there are hundreds of fields. So I want to go to the first field, save the field name and value, then go over to the other database and save the data, then run 'Go to Next Field' and loop through all the fields.
This works perfectly, but here is the problem: when a field is a calculation you cannot tab into it, so 'Go to Next Field' skips it.
I thought of doing a 'Go to Object', but then I would need to name all the objects, and I can't find a way to name objects from a script.
Can anyone out there think of a solution?
Thanks!
This is one of those problems where I always found it easier to do an export/import.
Export all the data you want from the one database, and then import it into the other database. All you need to do is:
Manually specify which fields you want to copy
Map the data from the export to the right fields in the new database/table
You can even write a script to do these things for you.
There are several ways to achieve this.
To make a "history file", I have found there are several cases out there, so lets take a look.
CASE ONE
Single file I just want to "keep" a very large file with historical data, because I need to erease all data in my Main file.
In this case, you should create a "clone" table (in the same file ore in other file, is the same). Then change any calculation field to the type of the calculation result (number, text, date, an so on...). Remove any "auto entered value or calculation from any field, like auto number, auto creation date, etc..). You will have a "Plain Table" with no calculations or auto entered data.
Then add a field to control duplicate data. If you have lets say an invoice number (unique) for each record, you can do this to achieve this task. But if you do not have a unique field that identifies the record as unique, then you have to create one...
To create such a field, I recommed to add a new field on the clone table and set as an aunto entered calculation and make a field combination that is unique... somthing like this: invoiceNumber & "-" & lineNumber & "-" " & date.
On the clone table make shure that validation is set up for "always", and no empty values allowed and that this value is unique.
Once you setup the clone table... then you can import your records, making sure that the auto enty option is on. Yo can do it as many times as you like, new records will be added and no duplicates.
If you want, can make a Script to do the move to historical table all the current records before deleting them.
NOTE:
This technique works fine when the data you try to keep do not have changes over time. This means, once the record is created is has no changes.
CASE TWO
A historical table must be created but some fields are updated.
In the beginnig I thougth a historical data, never changes. In some cases I found this is not the case, like the case I want to track historical invoices but at the same time, keep track if they are paid or not...
In this case you may use the same technique above, but instead of importing data... you must update data based on the "unique" fields that identifiy the record.
Hope this technique helps
FileMaker's FieldNames() function, along with GetField(), can give you a list of field names and then their values.