Removing Unique Characters in Different Parts of String - powershell

I'm trying to cut the ? character and the word "pixels" out of a particular column in a text file export.
Sample String: ?300 dpi
@{N='Dpi' ; E={$_.'Horizontal resolution'.Split(" ")[0]}}
I am using split to successfully remove dpi although I also want to remove the ? at the start of the string.
"Name","Path","BaseName","Dpi","Width(Pixels)","Height(Pixels)","DpiTest"
"test.png","\\directory\TCG\Labels\test.png","test","?300","?2623","?1229","?2623 pixels"

You can use the TrimStart() method to remove one or more characters at the start of a string:
$_.'Horizontal resolution'.Split(" ")[0].TrimStart('?')
But I would suggest using the -replace operator for both operations:
$_.'Horizontal resolution' -replace '\?(\d+).*','$1'
The regex matches a literal ?, one or more digits, and anything that follows, then replaces the whole match with just the digits.
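To check the pattern outside PowerShell, here is a minimal Python sketch of the same regex. The function name strip_dpi is made up for illustration, and Python's \1 stands in for PowerShell's $1; this only approximates what -replace does, it is not the PowerShell implementation.

```python
import re

def strip_dpi(value: str) -> str:
    # Hypothetical helper: match a literal '?', capture the digits,
    # and swallow whatever follows (for example " dpi" or " pixels").
    return re.sub(r'\?(\d+).*', r'\1', value)

print(strip_dpi("?300 dpi"))
print(strip_dpi("?2623 pixels"))
```

A value without a leading ? is returned unchanged, because the pattern does not match at all.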

just do it:
$_.'Horizontal resolution'.Split(" ")[0].Replace('?', '')

Related

How to remove double Quotes In DataStage using a transformer stage?

We are receiving input data like below
“VENKATA,KRISHNA”
I want output like below
VENKATA,KRISHNA
Can anyone help me with this
Check out the Ereplace function - it lets you replace certain characters, so you could replace " with '' (empty string).
An alternative is TRIM - you can specify which character the command should trim and also if All occurrences or Both (from both sides of the string) plus more.

How to identify a character in a string?

I am trying to write a Powershell code to identify a string with a specific character from a filename from multiple files.
An example of a filename
20190902091031_202401192_50760_54206_6401.pdf
$Variable = $Filename.Substring(15,9)
Results:
202401192 (this is what I am after)
However in some instances the filename will be like below
20190902091031_20240119_50760_54206_6401.pdf
$Variable = $Filename.Substring(15,9)
Results:
20240119_ (this is NOT what I am after)
I am trying to find a code to identify the 9th character,
IF the 9th character = "_"
THEN Set
$Variable = $Filename.Substring(15,8)
Results:
20240119
All credit to TheMadTechnician who beat me to the punch with this answer.
To expand on the technique a bit, use the split method or operator to split a string every time a certain character shows up. Your data is separated by the underscore character, so is a perfect example of using this technique. By using either of the following:
$FileName.Split('_')
$FileName -split '_'
You can turn your long string into an array of shorter strings, each containing one of the parts of your original string. Since you want the 2nd one, you use the array index [1] (indexing starts at 0, so 0 is the 1st) and you're done.
Good luck
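The same split-and-index idea, sketched in Python rather than PowerShell so it can be run anywhere. The function name second_field is made up for illustration; the two filenames are the ones from the question.

```python
def second_field(filename: str) -> str:
    # Split on every underscore and take the second piece (index 1),
    # so the length of that piece no longer matters.
    return filename.split('_')[1]

print(second_field("20190902091031_202401192_50760_54206_6401.pdf"))
print(second_field("20190902091031_20240119_50760_54206_6401.pdf"))
```

Because the field is picked by position between separators, both the 9-digit and the 8-digit variant come out without a trailing underscore.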

how to remove # character from national data type in cobol

I am facing an issue while converting Unicode data into national characters.
When I convert the Unicode data into national using the national-of function, a junk character like # is appended after the string.
E.g
Ws-unicode pic X(200)
Ws-national pic N(600)
--let the value in Ws-Unicode is これらの変更は. getting from java end.
move function national-of ( Ws-unicode ,1208 ) to Ws-national.
--after converting value is like これらの変更は #.
I do not want the extra # character added after conversion.
Please help me find a possible solution. I have tried replacing N'#' with a space using the INSPECT statement.
It worked well but failed in one specific scenario: if the input from the user contains a #, that genuine # is also converted to a space.
Below is a snippet of code I used to convert EBCDIC to UTF. Before I was capturing string lengths, I was also getting # symbols:
STRING
    FUNCTION DISPLAY-OF (
        FUNCTION NATIONAL-OF (
            WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
            WS-EBCDIC-CCSID
        )
        WS-UTF8-CCSID
    )
    DELIMITED BY SIZE
    INTO WS-UTF8-STRING
    WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
What this code does is string the UTF-8 representation of the EBCDIC string into another variable. The WITH POINTER clause captures the new length of the string + 1 (+ 1 because the pointer is left at the position just after the end of the string), which is why 1 is subtracted afterwards.
Using this method, you should know exactly how long the second string is and can reference that string with its exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention, in my case, the # signs were actually EBCDIC low values when viewing the actual hex on the mainframe
Use INSPECT with REVERSE and stop after the first occurrence of #.

finding a comma in string

One value is [23567,0,0,0,0,0] and another is [452221,0,0,0,0,0]; about 100 such values are displayed continuously. I want to display only the sensor value, i.e. 23567 in the first sample and 452221 in the second. For that I have written this code:
value = str2double(str(2:7));
So I want to find the comma in the output and only display the value before the first comma.
As proposed in a comment by excaza, MATLAB has dedicated functions, such as sscanf for such purposes.
sscanf(str,'[%d')
which matches but ignores the first [, and returns the next (i.e. the first) number as a double variable, and not as a string.
Still, I like the idea of using regular expressions to match the numbers. Instead of matching all zeros and commas, and replacing them by '' as proposed by Sardar_Usama, I would suggest directly matching the numbers using regexp.
You can return all numbers in str (still as string!) with
nums = regexp(str,'\d*','match')
and convert the first number to a double variable with
str2double(nums{1})
To match only the first number in str, we can use the regexp
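nums = regexp(str,'\[(\d*),','tokens')
which finds a [, then takes an arbitrary number of digits (0-9), and stops when it finds a ,. The [ has to be escaped as \[, because an unescaped [ would open a character class. By enclosing the \d* in parentheses, only the part in parentheses is returned, i.e. only the number without [ and ,. With 'tokens' the result is a nested cell array, so the number itself is in nums{1}{1}.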
Final Note: if you continue working with strings, you could/should consider the regexp solution. If you convert it to a double anyways, using sscanf is probably faster and easier.
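For comparison, the capture-the-first-number idea can be sketched in Python. This is not MATLAB code; the function name first_number is made up for illustration, and \d+ is used instead of \d* so that an empty number is rejected.

```python
import re

def first_number(s: str) -> int:
    # Literal '[' (escaped), then capture the digits up to the first comma.
    m = re.match(r'\[(\d+),', s)
    if m is None:
        raise ValueError("no leading number found in: " + s)
    return int(m.group(1))

print(first_number('[23567,0,0,0,0,0]'))
print(first_number('[452221,0,0,0,0,0]'))
```

The capture group plays the role of the 'tokens' option: only the digits inside the parentheses are returned, not the bracket or the comma.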
You can use regexprep as follows:
str='[23567,0,0,0,0,0]' ;
required=regexprep(str(2:end-1),',0','')
%Taking str(2:end-1) to exclude brackets, and then removing all ,0
If there can be values other than 0 after , , you can use the following more general approach instead:
required=regexprep(str(2:end-1),',[-+]?\d*\.?\d*','')

Trouble determining the pattern for NSRegularExpression...?

I am relatively new to NSRegularExpression and just can't come up with a pattern to find a string within a string....
here is the string...
##$294#001#[12345-678[123-456-7#15665#2
I want to extract the string..
#001#[12345-678[123-456-7#
for more info I know that there will be 3 digits(like 001) between two # 's and 20 characters between the last two # 's..
I have tried n number of combinations but nothing seem to work. any help is appreciated.
How about something like this:
#[0-9]{3}#.{20}#
If you know that the 20 characters will always consist of digits, [ and -, your pattern would become:
#[0-9]{3}#[0-9\[\-]{20}#
Be careful with the backslashes: when you create the pattern with a string literal (@"..."), you need to add an extra backslash before each backslash.
You can test NSRegularExpression patterns without recompiling each time by using RegexTester https://github.com/liyanage/regextester
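As a quick sanity check of both patterns against the sample string from the question, here is a small Python sketch. Python's re module is not NSRegularExpression, but these two patterns use only syntax the engines share; the function name extract_segment is made up for illustration.

```python
import re

SAMPLE = "##$294#001#[12345-678[123-456-7#15665#2"

# A raw string avoids the doubled backslashes that an Objective-C literal needs.
GENERIC = r'#[0-9]{3}#.{20}#'
STRICT = r'#[0-9]{3}#[0-9\[\-]{20}#'

def extract_segment(pattern: str, text: str) -> str:
    # Return the first match of the pattern, or an empty string if none.
    m = re.search(pattern, text)
    return m.group(0) if m else ""

print(extract_segment(GENERIC, SAMPLE))
print(extract_segment(STRICT, SAMPLE))
```

Both patterns pick out #001#[12345-678[123-456-7# from the sample; the stricter one additionally rejects input where the 20 characters contain anything other than digits, [ and -.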