How can I mask part of the email addresses in a column of a dataframe?
I tried using a regular expression, but I'm not too sure how to apply it.
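A minimal sketch, assuming a Spark Scala DataFrame df with a column named email (both names are assumptions): keep the first character of the local part and the domain, and mask everything in between.

import org.apache.spark.sql.functions.{col, regexp_replace}

// Keep the first character of the local part and the full domain;
// mask everything in between. Rows without an '@' are left unchanged.
val masked = df.withColumn(
  "email_masked",
  regexp_replace(col("email"), "(^.)[^@]*(@.*$)", "$1***$2")
)
// e.g. "john.doe@example.com" -> "j***@example.com"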
These are iPads and phones with their battery levels monitored in Grafana.
How can I filter it so that it shows only the iPads (device names starting with "H.")?
When I try a regex like "/^H.$/" it doesn't work.
Thanks
Not a regex expert, but ^H.$ only matches strings of exactly two characters: an H followed by any single character, because the unescaped . is a wildcard and ^…$ anchor the match to the whole string.
You need to match the rest of the name as well; \w* covers any run of letters, digits, and underscores.
So in your case, ^H.\w*$ should work. (To match the literal dot after H, escape it: ^H\.\w*$; if the names can contain other characters such as dashes, ^H\..*$ is more permissive.)
EDIT
After having a closer look at your pictures, I realize that the names you want to filter on are values in the column device_name, not the column names themselves. However, the Filter by name transformation you used filters the column names, in your case device_name and battery_level. Since neither fits the regex, 'No data' is returned.
To filter the values of a column you have to use Filter data by values. Specify the field, i.e. the column device_name, choose Regex as the Match type, and then enter the regex in the Value field.
Use the regex as explained above.
I need to concatenate a few string values and then compute the SHA256 hash of the result. I've seen that Data Fusion has a plugin to do the job:
The documentation, however, is very sparse, and nothing I've tried seems to work. I created a table in BigQuery with the string fields I need to concatenate, but the output is the same as the input. Can anyone provide an example of how to use this plugin?
EDIT
Below is the example.
This is what the workflow looks like:
For testing purposes, I added one column with the following string:
2022-01-01T00:00:00+01:00
And here's the output:
You can use Wrangler to concatenate the string values.
I tried your scenario by adding a Wrangler stage to the pipeline:
Joining two columns:
I named the new column new_col, using , as the delimiter:
Output:
What you described can be achieved with two Wrangler steps:
The first Wrangler step is what #angela-b described. Use the merge directive to create a new column with the concatenation of two columns. Example directive that joins columns a and b using , as the delimiter and stores the result in column a_b:
merge a b a_b ,
The second Wrangler step uses the hash directive, which hashes the column in place using the specified algorithm. Example directive that hashes column a_b using MD5:
hash :a_b 'MD5' true
Remember to set the last parameter encode to true so that you get a string output instead of a byte array.
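Putting the two directives together for the SHA256 case from the original question (assuming the hash directive accepts standard Java MessageDigest algorithm names, so 'SHA-256' should be valid):

merge a b a_b ,
hash :a_b 'SHA-256' true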
I'm building a Spark Scala application that dynamically lists all tables in a SQL Server database and then loads them into Apache Kudu.
I'm building a dynamic string variable that tracks the primary key columns for each table. The primary keys are comma-separated within the variable. The following is an example of my variable's value:
PrimaryKeys=storeId,storeNum,custId
The following function requires a List[String] as input (so passing the PrimaryKeys string as-is is definitely not correct):
setRangePartitionColumns(List("storeId", "storeNum", "custId").asJava)
If I just use the PrimaryKeys variable as the List input (like the following), it only works for a single column; it would fail in this example, because List(PrimaryKeys) is a one-element list containing the whole comma-separated string:
setRangePartitionColumns(List(PrimaryKeys).asJava)
The following is another example, but using a Seq(). I'm supposed to put the same primary key column names in the same format below. Manually typing the column names works fine; however, I cannot figure out how to pass the variable's values dynamically:
kuduContext.createTable(tableName, df.schema, Seq(PrimaryKeys), kuduTableOptions)
Any idea how I can parse the PrimaryKeys variable dynamically and feed it into either function, regardless of the number of comma-separated values it contains?
Any assistance is greatly appreciated.
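A minimal sketch, assuming Spark Scala and the variable names from the question: split the string on commas to get the individual column names, then convert as each API requires.

import scala.collection.JavaConverters._

// Split the comma-separated string into individual column names;
// trim guards against stray whitespace around the commas.
val primaryKeyList: List[String] = PrimaryKeys.split(",").map(_.trim).toList

// The Kudu builder expects a java.util.List<String>:
setRangePartitionColumns(primaryKeyList.asJava)

// The KuduContext.createTable call expects a Seq[String]; a List is a Seq:
kuduContext.createTable(tableName, df.schema, primaryKeyList, kuduTableOptions)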
a. 123.12.1 -> 123.12.999
b. 123.12.100.0 -> 123.12.100.999
c. 123.123 -> 123.999
I have a Redshift table with one IP address column, with cases like the above. I nested the substring and position functions many times to meet the requirement, but I'd like to know whether there is a cleaner way to do it.
A cleaner way is to use a Python UDF that splits the string on the dots and returns all elements but the last, with '999' appended. The body of the function is below (val is the parameter; see the official Redshift docs for how to create the function):
return '.'.join(val.split('.')[:-1]) + '.999'
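For completeness, a minimal sketch of the full UDF definition, using Redshift's CREATE FUNCTION syntax for Python UDFs (the function name f_mask_ip and the table name my_table are assumptions):

create or replace function f_mask_ip(val varchar)
returns varchar
immutable
as $$
    return '.'.join(val.split('.')[:-1]) + '.999'
$$ language plpythonu;

-- usage: select f_mask_ip(ip) from my_table;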
No need to use split; the easiest way to update your values is:
update table set ip = regexp_replace(ip, '[.][0-9]{1,3}$','.999');
See the Redshift regexp_replace function documentation.
The $ anchor ensures that only the last octet is replaced.
I have 2 DataFrames in Scala:
1st DF = all proxy URLs (url, ip)
2nd DF = regex list (pattern, type)
I want to extract a UUID or credit card ID from the URL using the regex list in the second DataFrame, and store the matched pattern's record along with its IP address and the URL. Please suggest a solution.
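A minimal sketch, assuming Spark Scala with DataFrames urlDf(url, ip) and regexDf(pattern, type) (the DataFrame names are assumptions): cross-join every URL with every regex, extract the first match with a UDF, and keep only the rows that matched.

import org.apache.spark.sql.functions.{col, udf}

// Returns the first substring of url matching the row's regex, or null if none.
// (Compiling the pattern per row is slow but keeps the sketch simple.)
val firstMatch = udf { (url: String, pattern: String) =>
  pattern.r.findFirstIn(url).orNull
}

val matched = urlDf.crossJoin(regexDf)
  .withColumn("matched_value", firstMatch(col("url"), col("pattern")))
  .filter(col("matched_value").isNotNull)
  .select("url", "ip", "type", "matched_value")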