How to substring based on a symbol in ADF

I have data like this, where I would like to remove the bracketed part and the space before it. I tried to take the substring by position, but my strings have different lengths.
I need to do this in ADF.
Input : Vloerbedekking specifiek [28]
Output: Vloerbedekking specifiek
Input: Fournituren [45]
Output: Fournituren
Input: Seizoensverlichting [53]
Output: Seizoensverlichting

You can simply do a regexReplace in a Derived Column transformation in your data flow.
The expression used is regexReplace(INPUT, '\\s+(\\[.*\\])', '')
The regex explained (a quick check follows the list):
\\s+: matches one or more spaces
(\\[.*\\]): matches the square brackets and whatever is inside them
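If you want to sanity-check the pattern outside ADF, Python's re module treats it the same way (note the backslashes are doubled in the ADF expression but not in a Python raw string); a hypothetical quick check:
import re

pattern = r'\s+(\[.*\])'  # same pattern as in the ADF expression
for s in ['Vloerbedekking specifiek [28]', 'Fournituren [45]', 'Seizoensverlichting [53]']:
    print(re.sub(pattern, '', s))
# Vloerbedekking specifiek
# Fournituren
# Seizoensverlichting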

Alternatively, create a data flow, select a derived column, and use the expression
replace(Name, dropLeft(Name, length(Name)-4))
Here dropLeft(Name, length(Name)-4) keeps only the last 4 characters of Name (the bracketed suffix, e.g. '[45]'), and replace with its third argument omitted deletes them. Note this assumes the suffix is always exactly 4 characters, and it leaves the trailing space in place.

Related

#{item().TableName} only gives first part of string

I have a database from which I take an nvarchar value into a Lookup, then pass it to a ForEach. My problem is that when I use #item().TableName as a value in the dataset properties it works fine, but when I use #{item().TableName} in a query it takes only the first part of the string: when it is "Receipt Header" I only get "Receipt".
Enclose the table name in square brackets '[]' if it contains spaces or other special characters, and use the concat() function when combining constants with variables in the expression.
Query:
#concat('select MAX([Creation Date]) as NewWatermarkvalue from ', item().sourceSchema, '.[', item().TableName, ']')
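With hypothetical values item().sourceSchema = 'dbo' and item().TableName = 'Receipt Header', the expression evaluates to:
select MAX([Creation Date]) as NewWatermarkvalue from dbo.[Receipt Header]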

substring based on symbols and letters

I'm trying to take a substring of some values in Postgres, starting after some letters and ending at a symbol. Please see the example below.
Some row for a column:
'value: 12423, store: target, date: 2010-08-22'
How do I extract 'target' from the column, so that the new column becomes 'target'?
substring('value: 12423, store: target, date: 2010-08-22' from 'store: ([^,]*),')
Alternatively, first split the string on the comma ',', producing an array of 3 items. Then split the second item on the colon ':' and read off the second part of that split. Consider the functions string_to_array() and split_part(); a sketch follows.
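A minimal sketch of that two-step split, using split_part() twice and trim() to drop the leading space (it assumes the three comma-separated parts always appear in this order):
select trim(split_part(split_part('value: 12423, store: target, date: 2010-08-22', ',', 2), ':', 2));
-- returns 'target'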

USQL Escape Quotes

I am new to Azure Data Lake Analytics. I am trying to load a csv whose string columns are double quoted, and there are quotes inside a column on some random rows.
For example
ID, BookName
1, "Life of Pi"
2, "Story about "Mr X""
When I try loading, it fails on the second record and throws an error message.
1. I wonder if there is a way to fix this in the csv file? Unfortunately we cannot extract it again from the source, as these are log files.
2. Is it possible to let ADLA ignore the bad rows and proceed with the rest of the records?
Execution failed with error '1_SV1_Extract Error':
errorId: E_RUNTIME_USER_EXTRACT_ROW_ERROR
message: "Error occurred while extracting row after processing 9045 record(s) in the vertex' input split. Column index: 9, column name: 'instancename'."
inner errorId: E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD
message: "Invalid character following the ending quote character in a quoted field."
description: "Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes."
resolution: "Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes."
As per the error message, if you are importing a quoted csv which has quotes within some of the columns, then these need to be escaped as two double-quotes. In your particular example, your second row needs to be:
2, "Story about ""Mr X"""
So one option is to fix up the original file on output. If you are not able to do this, then you can import all the columns as one column, use RegEx to fix up the quotes and output the file again, eg
// Import each line as a single column, then use RegEx to clean the quotes
#input =
    EXTRACT oneCol string
    FROM "/input/input132.csv"
    USING Extractors.Text(delimiter : '|', quoting : false);  // '|' is assumed absent from the data, so each line stays whole

// Double any quote that is not adjacent to a comma, i.e. a quote inside a field
#output =
    SELECT Regex.Replace(oneCol, "([^,])\"([^,])", "$1\"\"$2") AS cleanCol
    FROM #input;

OUTPUT #output
TO "/output/output.csv"
USING Outputters.Csv(quoting : false);
The file will now import successfully.

Detect Column containing special characters other than space - Postgresql

I need to find the values from a text column which contain characters other than alphabets, numbers, and SPACE (it is a name column, so spaces are allowed).
I am trying this, which is not working:
select * from table where name ~ '[^a-z0-9 ]';
I have left a space between 9 and ]
The correct regular expression would be:
[^[:alnum:] ]
That will match any string that contains a character that is neither alphabetical nor numerical nor a space. Note that your original pattern also flags names containing capital letters, since [^a-z0-9 ] matches any uppercase character.
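A minimal sketch of the full query, using the table and column names from the question:
select * from table where name ~ '[^[:alnum:] ]';
-- returns only the rows whose name contains something other than letters, digits and spaces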
Try ^[-a-z0-9 ]
I think you can use \w instead of a-z0-9, so that the class looks like: [-\w.]

finding a comma in string

I have values like [23567,0,0,0,0,0] and [452221,0,0,0,0,0], displayed continuously for about 100 samples, and I want to display only the sensor value: 23567 in the first sample and 452221 in the second. For that I have written this code:
value = str2double(str(2:7));
So I want to find the comma in the output and display only the value before the first comma.
As proposed in a comment by excaza, MATLAB has dedicated functions for such purposes, such as sscanf:
sscanf(str,'[%d')
This matches but ignores the leading [, and returns the next (i.e. the first) number as a double, not as a string.
Still, I like the idea of using regular expressions to match the numbers. Instead of matching all zeros and commas, and replacing them by '' as proposed by Sardar_Usama, I would suggest directly matching the numbers using regexp.
You can return all numbers in str (still as string!) with
nums = regexp(str,'\d*','match')
and convert the first number to a double variable with
str2double(nums{1})
To match only the first number in str, we can use the regexp
nums = regexp(str,'\[(\d*),','tokens')
which finds a [ (escaped as \[, since a bare [ would open a character class), then takes an arbitrary number of digits (0-9), and stops when it finds a ,. By enclosing the \d* in parentheses, only that captured part is returned, i.e. only the number without [ and ,.
Final note: if you continue working with strings, consider the regexp solution. If you convert the result to a double anyway, sscanf is probably faster and easier.
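A minimal end-to-end sketch of both approaches, assuming str holds a single sample:
str = '[23567,0,0,0,0,0]';
% sscanf: skip the literal '[' and read the first integer as a double
value = sscanf(str, '[%d');           % value = 23567
% regexp: capture the digits between '[' and the first ','
tok = regexp(str, '\[(\d*),', 'tokens');
value2 = str2double(tok{1}{1});       % value2 = 23567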
You can also use regexprep as follows:
str='[23567,0,0,0,0,0]';
required=regexprep(str(2:end-1),',0','')
%Taking str(2:end-1) to exclude the brackets, then removing every ',0'
If there can be values other than 0 after the commas, you can use the following more general approach instead:
required=regexprep(str(2:end-1),',[-+]?\d*\.?\d*','')