Deleting substrings throughout a column - postgresql

I am trying to clean up several columns in a table to create a clear attribution/reference for reporting on my digital marketing campaigns. The goal is to keep one part of each string and delete the rest. All strings in my marketing campaigns use symbols to separate the substrings.
Attached are pictures of my current table and of the desired table.
I am essentially trying to keep only one part of each string's structure and delete all other substrings. I have already managed to do this for one column by applying the following statement, given to me in a separate thread.
update adwords
set campaign = substring(campaign from '%-%-#"%#"' for '#')
where campaign like '%-%-%';
This worked perfectly, however, I do not fully understand why and have not found a clear answer thus far on this forum.
How would I apply this to future rows? Ad group and match type can be used for this purpose.
Many Thanks.

First thing: do not modify the source data. Do ETL instead and transform it into a final stage. Run that periodically so that new data is taken care of.
You could just create a trigger, which should work for all new data, but there are two caveats with that:
A failure will lead to missing data that you cannot QA.
If you modify the source data incorrectly by mistake, you cannot undo it unless you have a backup, and even then it's just too hard.
So instead look at ETL tools like Talend or Pentaho Kettle, or create your own ETL scripts, or whatever. Use Jenkins to schedule all of this periodically and you're set.
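Now, about the transformation itself.
for '#'
declares # as the escape character. Inside the pattern, each #" is then not a literal quote but a marker: the two #" markers enclose the part of the pattern whose match is returned.
substring(campaign from '%-%-#"%#"' for '#')
thus returns whatever matches the section between the two markers. % is a wildcard, the same as in LIKE comparisons, so everything after the second hyphen is returned. This is easier to express with a regular expression, where the parenthesized group is what gets returned:
substring(campaign from '^[^-]*-[^-]*-(.*)$')
(Avoid writing this as '.*?-.*?-(.*)': in PostgreSQL a pattern whose first quantifier is non-greedy is non-greedy as a whole, so the final group would match the empty string.)
For the second column the regex would be ^(.*?)\s*\{
And for the third one, similarly: ^(.*?)\s*\}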
I would create the new table like this:
CREATE TABLE aw_final AS
SELECT
    substring(campaign FROM '^\w{2}-\w+-(.*)$') AS campaign,
    substring(ad_group FROM '^(\w+)\s*\{\w+\}$') AS ad_group,
    substring(match_type FROM '^(\w+)\s*\}$') AS match_type
FROM adwords
WHERE campaign ~ '^\w{2}-\w+-(.*)$';
But if you must do an update, this would be how:
UPDATE adwords SET
    campaign = substring(campaign FROM '^\w{2}-\w+-(.*)$'),
    ad_group = substring(ad_group FROM '^(\w+)\s*\{\w+\}$'),
    match_type = substring(match_type FROM '^(\w+)\s*\}$')
WHERE campaign ~ '^\w{2}-\w+-(.*)$';
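One way to sanity-check the three anchored patterns before touching the table is to try them on a few sample values outside the database. The sketch below does that with Python's re module rather than PostgreSQL. The sample strings are invented (the real values are only visible in the pictures), the helper name extract is hypothetical, and Python's regex engine is not identical to PostgreSQL's, so treat it as an approximation only.

```python
import re

# The same three patterns used in the CREATE TABLE / UPDATE statements.
CAMPAIGN_RE = re.compile(r'^\w{2}-\w+-(.*)$')
AD_GROUP_RE = re.compile(r'^(\w+)\s*\{\w+\}$')
MATCH_TYPE_RE = re.compile(r'^(\w+)\s*\}$')


def extract(pattern, value):
    """Return the first capture group, or None if the value does not match.

    This mirrors substring(col FROM 'regex'), which yields NULL on no match.
    """
    match = pattern.search(value)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the original table.
    print(extract(CAMPAIGN_RE, "UK-Search-Brand Shoes"))
    print(extract(AD_GROUP_RE, "Running {exact}"))
    print(extract(MATCH_TYPE_RE, "exact}"))
    print(extract(CAMPAIGN_RE, "no dashes here"))
```

Values that come back as None here are the ones the WHERE clause would filter out, which is a quick way to spot rows that do not follow the naming convention.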

Related

Sidestepping possible delimiter collisions

So I have user accounts that have a lot of data, and sometimes people create duplicate accounts and administrators need to merge them non-destructively.
I've written a script that combines the two accounts, and stores a record in a merge_record table. Each row of data is stored as one merge_record entry, containing the origin account, destination account, action type(deletion, merge or exclusion), table name, column that denotes the account number, and an encoded string containing all the key/value pairs in the following format:
columnoneMERGEBANANAEQUALSvalueoneMERGEBANANASPLITTERcolumntwoMERGEBANANAEQUALSvaluetwo
Yes, that's probably hard to read, but my goal was to delimit the data pairs with a string that a user is incredibly unlikely to type, and to make the equals marker just as unlikely to appear in the data. I need this because I've also created an Undo Merge button, and the merge has to be reversible: the Undo Merge scans each row of merge_record, deconstructs the column/value pairs and inserts them into table_name under the original_account's ID in the account_column section.
However, I still don't like it. Some jerk who read this post could type MERGEBANANAEQUALS as their name, request a merge, and then request an undo. Is there any way to absolutely guarantee that no collisions can occur? Or should I re-design the way the key/value pairs are stored? If so, what is a better way?
The simple answer is to create a line of CSV from the data, like this:
datum1,"datum,with,commas","datum with ""double quote""",,42
Just escape everything that's nasty.

adding up specific mergefield values in word

I have a table in a Word document that has three columns, and all fields are mail merge fields from an external IT system.
There are three columns displaying the fields:
Charge Description
Charge Value (£)
Eligible? (yes/no)
I am trying to create a field that adds up all eligible charges, so that only charge values that show a "yes" in the eligible field are included. Does anyone know if this is possible? I have tried creating a formula but can't get it to work. Also, I assume an IF statement is required at some point so that it only includes the eligible charges.
Has anyone done anything similar before and if so, would they mind sharing how it was achieved?
Many thanks
You can do some things with expression fields (the field braces are created in Word with CTRL-F9). This will look like { } and you can type the expression inside, e.g. { = { MERGEFIELD charge } + { MERGEFIELD charge2 } }. However, since you want to check multiple values before building the expression, it's probably easier to use a macro. The macro would contain your logic and then set the fields in the document accordingly.
Here are two external links, since I can't reproduce a useful amount of the content here; it's a verbose answer to a potentially deep question:
Expression Fields
Merge fields
I hope that helps.

Prioritise which identifier to use

My Crystal Report pulls data about books, including an identifier (ISBN, ISSN, order number, etc.), author, and publisher.
The ID field stores multiple ways to identify the book, and the report displays any one of the identifiers for that record. If a book has two identifiers, say ISSN and order number, the report currently displays one of them, apparently at random.
How can I make it prioritise which type to use based on a preset order? I figured some sort of filter on the field could work, but I haven't figured out how. I can't edit the table, but I can use SQL within the report.
If all the different types of ID are stored in a single field, your best bet is to use a SQL Command inside your report to separate them into multiple virtual fields.
Go to Database Fields / Database Expert, expand the connection you want to use, and pick Add Command. From here you can write a custom SQL statement to grab the information you're currently using, and at the same time separate the ID field into multiple different fields (as far as the report will be concerned, anyway. The table will stay unchanged.)
The trick is to figure out how to write your command to do the separation. We don't know what your data looks like, so you're on your own from here.
Based on the very little information you have provided, and if I were to make a guess, I suggest you use a formula field in your report with something like this:
IF ISNULL({first_priority_field_name}) OR {first_priority_field_name} = '' THEN
    {second_priority_field_name}
ELSE
    {first_priority_field_name}
Use nested IF statements if there are more than two identifier fields.
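The nested-IF idea generalises to "take the first non-empty identifier in a preset order". The sketch below shows that logic in Python rather than Crystal formula syntax. The field names isbn, issn and order_number and the function pick_identifier are assumptions for illustration, since the actual field layout is not given in the question.

```python
# Preset priority order; these identifier names are hypothetical.
PRIORITY = ["isbn", "issn", "order_number"]


def pick_identifier(record, priority=PRIORITY):
    """Return (kind, value) for the highest-priority identifier that is
    neither missing nor blank, or None when the record has none."""
    for kind in priority:
        value = record.get(kind)
        if value is not None and value.strip() != "":
            return kind, value
    return None


if __name__ == "__main__":
    book = {"isbn": "", "issn": "1234-5678", "order_number": "PO-42"}
    print(pick_identifier(book))
```

Each loop iteration corresponds to one level of the nested IF: the null-or-empty test decides whether to fall through to the next identifier in the list.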

How to create table occurrences for filtered data..?

I have a table called transactions. Within that is a field called ipn_type. I would like to create separate table occurrences for the different ipn types I may have.
For example, one value for ipn_type is "dispute". In the past I would create a global field called "rel_dispute" and I would populate that with the value of "dispute". Then I could create a new table occurrence of the transactions table, and make a relationship based on transactions::ipn_type = transactions::rel_dispute. This way only the dispute records would show up in my new table occurrence.
Not long ago, somebody pointed out to me that this is no longer necessary, and that there is a simpler way to set up such a relationship to create a new table occurrence. I can't for the life of me remember how that was done, though.
Any information on this would be greatly appreciated. Thanks!
To show a found set of only one type, you must either perform a find or use the Go to Related Record script step to show only related records. What you describe as your previous setup fits the latter.
The simpler way is to perform a find - either on demand, or by a script triggered OnLayoutEnter.
The new 'easy' way is probably to use one base relationship only and to filter just the displaying portal by type. The filter can be driven by a global field or a global variable containing the current display type. Multiple portals with different filter conditions are possible as well.
~jens

coldfusion - bind a form to the database

I have a large table which inserts data into the database. The problem is that when the user edits the table I have to:
run the query
use lots of lines like value="<cfoutput>#getData.firstname#</cfoutput>" in the input boxes.
Is there a way to bind the form input boxes to the database via a cfc or cfm file?
Many Thanks,
R
Query objects include the columnList, which is a comma-delimited list of returned columns.
If security and readability aren't an issue, you can always loop over this. However, it basically removes your opportunity to do things like locking certain columns, reduces your ability to do any validation, and means you either just label the form boxes with the column names or you find a way to store labels for each column.
You can then do an insert/update/whatever with them.
I don't recommend this, as it would be nearly impossible to secure, but it might get you where you are going.
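To make the column-list loop concrete, here is a minimal sketch of the idea in Python rather than CFML. The function render_inputs and the sample column names are hypothetical. It takes a comma-delimited column list like the one a query object exposes, plus one row of data, and emits one escaped text input per column. It carries the caveats already mentioned: every column becomes editable, and the labels are just the raw column names.

```python
from html import escape


def render_inputs(column_list, row):
    """Build one <input> per column named in a comma-delimited list.

    Values are HTML-escaped so quotes or markup in the data cannot
    break out of the value attribute.
    """
    inputs = []
    for column in column_list.split(","):
        name = column.strip()
        raw = row.get(name)
        value = "" if raw is None else str(raw)
        inputs.append(
            '<input type="text" name="{}" value="{}">'.format(
                escape(name, quote=True), escape(value, quote=True)
            )
        )
    return "\n".join(inputs)


if __name__ == "__main__":
    # Assumed sample row; column names are illustrative only.
    print(render_inputs("FIRSTNAME,LASTNAME", {"FIRSTNAME": 'Tom "T"', "LASTNAME": None}))
```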
If you are using CF 9 you can use the ORM (object-relational mapping) functionality (via CFCs),
as described in this online chapter:
https://www.packtpub.com/sites/default/files/0249-chapter-4-ORM-Database-Interaction.pdf
(starting on page 6 of the pdf)
Take a look at <cfgrid>; it will be the easiest if you're editing a table, and it can fire one update per row.
For security against XSS, you should use <input value="#xmlFormat(getData.firstname)#"> and minimize the number of <cfoutput> tags. XmlFormat() is not needed if you use <cfinput>.
If you are looking for an easy way to avoid specifying all the column names in the insert query, cfinsert will try to map all the form field names you submit to the database column names.
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7c78.html
This is indeed a very good question. I have no doubt that the answers given so far are helpful. I was faced with the same problem, although my table does not have that many fields.
Per the docs for EntityNew(), the syntax shows that you can pass the data when instantiating the object:
artistObj = entityNew("Artists",{FirstName="Tom",LastName="Ron"});
instead of having to instantiate it and then add the data field by field. In my case all I had to do was:
artistObj = entityNew( "Artists", FORM );
EntitySave( artistObj );
ORMFlush();
NOTE
It does appear from your question that you may be running insert or update queries. When using ORM you do not need to do that. But I may be mistaken.