ADF Dataflow - Replace spaces with underscore in column names - azure-data-factory

To replace spaces with underscores, I am using the replace($$,' ','_') expression in a Select transformation.
It works for the column "Period Key" and makes it "Period_Key", but for another column, "Week in Month Description", it produces "Week_in Month Description". So it is replacing only the first occurrence.
Can someone try this? Or how can we write regex for this?

I used the function below and it worked:
regexReplace($$,' ','_')
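For comparison outside ADF, the same rename can be sketched in Python. This only illustrates the regex-based approach of replacing every space; it is not ADF expression syntax. The column names come from the question, and the helper name `rename_columns` is hypothetical.

```python
import re


def rename_columns(names):
    """Return the column names with every space replaced by an underscore."""
    # re.sub replaces all non-overlapping matches, not just the first one.
    return [re.sub(r" ", "_", name) for name in names]


columns = ["Period Key", "Week in Month Description"]
renamed = rename_columns(columns)
print(renamed)
```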

Related

regex expression to replace everything in a string between 2 like characters?

This seems simple but I am not very familiar with regex so I'm having hard time finding a solution.
I have a date/time in string format that looks like this:
"11/18/2022 12:00 AM"
I want to create another property that is just a day ahead (so "11/19/2022 12:00 AM"), so I need a regex expression that just points to the "18" in that string.
Any help or guidance is appreciated! Thanks.
I've tried this:
^(.)(.)(./[^/])$
which just replaces the whole string.
You could match the pattern with capture groups and use those groups in the replacement.
The value for 18 is in capture group 2, and the leading and trailing part in group 1 and group 3 in case you want to do a replacement.
^(\d{1,2}/)(\d{1,2})(/\d{4}\s\d{1,2}:\d{1,2}\s[AP]M)$
See a regex 101 demo
Note that if you want to increment the date by a day, it would be better to match the string and then use a programming language with a date function/API.
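As a rough sketch of both ideas in Python: the pattern from this answer picks out the day in group 2, while the increment itself is left to the standard `datetime` module so that month and year rollovers are handled. The helper names `day_group` and `next_day` and the format string are assumptions for illustration, and `%p` assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# Pattern from the answer: group 1 = month part, group 2 = day, group 3 = rest.
PATTERN = re.compile(r"^(\d{1,2}/)(\d{1,2})(/\d{4}\s\d{1,2}:\d{1,2}\s[AP]M)$")


def day_group(value):
    """Return the day portion (capture group 2), or None if there is no match."""
    match = PATTERN.match(value)
    return match.group(2) if match else None


def next_day(value, fmt="%m/%d/%Y %I:%M %p"):
    """Parse the string as a date, add one day, and format it back."""
    parsed = datetime.strptime(value, fmt)
    # Note: strftime zero-pads month and day (e.g. "01/05/2023").
    return (parsed + timedelta(days=1)).strftime(fmt)


print(day_group("11/18/2022 12:00 AM"))
print(next_day("11/18/2022 12:00 AM"))
```

Using date arithmetic instead of incrementing the matched digits avoids producing invalid dates such as "11/31/2022".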

How to delete space in character text?

I wrote code that automatically pulls time-related information from the system. In table t247, month names are fixed at 10 characters in length, so the trailing spaces look bad when shown on the report screen.
I print this way:
WRITE : 'Bugün', t_month_names-ltx, ' ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
I tried the CONDENSE t_month_names-ltx NO-GAPS. method to delete the spaces, but it was not enough.
I was able to work around it by writing the next word at a static column position after the WRITE:
WRITE : 'Bugün', t_month_names-ltx.
WRITE : 14 'ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
But this is not the right approach. How do I achieve this dynamically?
You could use a temporary field of type STRING:
DATA l_month TYPE STRING.
l_month = t_month_names-ltx.
WRITE : 'Bugün', l_month.
WRITE : 14 'ayının'.
CONCATENATE gv_words-word '''nci günü' INTO date.
CONCATENATE date ',' INTO date.
CONCATENATE date gv_year INTO date SEPARATED BY space.
TRANSLATE date TO LOWER CASE.
You cannot delete trailing spaces from a TYPE C field, because it has a constant length. The unused length is always filled with spaces.
But after you have assembled your string, you can use CONDENSE without NO-GAPS to remove any chains of more than one space within the string.
Add CONDENSE date. below the code you wrote and you should get the results you want.
Another option is to abandon CONCATENATE and use string templates (string literals within | symbols) for string assembly instead, which do not have the annoying habit of including trailing spaces of TYPE C fields:
DATA long_char TYPE C LENGTH 128.
long_char = 'long character field'.
WRITE |this is a { long_char } inserted without spaces|.
Output:
this is a long character field inserted without spaces

How to keep the upper case and lower case letters in a column alias in the results in Redshift

In Redshift we are trying to give more meaningful aliases to the columns returned from our queries, because we import the results into Tableau. The issue is that Redshift turns all letters to lower case, i.e. "Event Date" is returned as "event date". Any idea how to keep the alias as given?
I know I'm a bit late to the party but for anyone else looking, you can enable case sensitivity, so if you want to return a column with camel casing for example
SET enable_case_sensitive_identifier TO true;
Then in your query wrap what you want to return the column as in double quotes
SELECT column AS "thisName"
Or as per OP's example
SELECT a.event_date AS "Event Date"
https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html
Edit: To have this behaviour as default for the cluster you will need to create/update a parameter group in Configurations => Workload Management. You can't change the settings for the default parameter group. Note, you will need to reboot the cluster after applying the parameter group for the changes to take effect.
No, you cannot do this in Redshift. All column names are lowercase only.
You can enforce upper case only by using
set describe_field_name_in_uppercase to on;
Also see the examples at https://docs.aws.amazon.com/redshift/latest/dg/r_names.html where you can see that upper case characters are returned as lower case. It says "identifiers are case-insensitive and are folded to lowercase in the database".
You can of course rename the column to include uppercase within Tableau.
I was going through the AWS docs for Redshift, and it looks like the INITCAP function can solve your use case.
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_INITCAP.html
Brief description (copied)
The INITCAP function makes the first letter of each word in a string uppercase, and any subsequent letters are made (or left) lowercase. Therefore, it is important to understand which characters (other than space characters) function as word separators. A word separator character is any non-alphanumeric character, including punctuation marks, symbols, and control characters. All of the following characters are word separators:
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
In your case the field name is declared as event_date, which INITCAP converts to Event_Date.
You can then use the REPLACE function to replace the underscore '_'.
For reference => https://docs.aws.amazon.com/redshift/latest/dg/r_REPLACE.html
You need to put
set describe_field_name_in_uppercase to on;
in your Tableau's Initial SQL.

pyspark replace regex with regex

I have a Spark dataframe that contains a string column. I want to replace a regex (space plus a number) with a comma without losing the number. I have tried both of these with no luck:
df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", ' ,').alias("replaced"))
df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", '\s+[0-9] ,').alias("replaced"))
Any help is appreciated.
What you need is another function, regexp_extract.
So, you have to divide the regex and get the part you need. It could be something like this:
df.select("A", f.regexp_extract(f.col("A"), "(\s+)([0-9])", 2).alias("replaced"))
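Note that regexp_extract returns only the captured digit rather than replacing anything. If the goal is to swap the whitespace for a comma while keeping the digit, one alternative is to capture the digit and refer back to it in the replacement. The sketch below shows the idea with Python's standard `re` module so that it runs without Spark; the sample string and the helper name `space_digit_to_comma` are made up for illustration. In Spark's regexp_replace, which uses Java regex syntax, the group reference is written `$1` instead of `\1`.

```python
import re


def space_digit_to_comma(text):
    """Replace whitespace that precedes a digit with a comma, keeping the digit."""
    # Group 1 captures the digit; \1 puts it back after the comma.
    return re.sub(r"\s+([0-9])", r",\1", text)


sample = "apple 1 banana 2"
print(space_digit_to_comma(sample))

# Assumed PySpark equivalent (not executed here):
# df.select("A", f.regexp_replace(f.col("A"), r"\s+([0-9])", ",$1").alias("replaced"))
```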

Variable substitution of multiline list of strings in PostgreSQL

I'm trying to substitute the list in the following code:
kategori NOT IN ('Fors',
'Vattenfall',
'Markerad vinterled',
'Fångstarm till led',
'Ruskmarkering',
'Tält- och eldningsförbud, tidsbegränsat',
'Skidspår')
I found this question for the multiline part. However
SELECT ('Fors',
'Vattenfall',
'Markerad vinterled',
'Fångstarm till led',
'Ruskmarkering',
'Tält- och eldningsförbud, tidsbegränsat',
'Skidspår') exclude_fell \gset
gives
ERROR: column "fors" does not exist
LINE 1: SELECT (Fors,
^
, so I tried using triple quotes, dollar quoting and escape sequences. Nothing has worked satisfactorily. This is true even if I use a single-line variable and \set, so I must have misunderstood something about variable substitution. What is the best way of doing this?