Use regex to get value from varying strings - powershell

I am trying to figure out how to setup a regex to find a specific part of a string off of varying pieces of data.
IE, I have a PC log with the filename of TTEST7-17.txt or a filename of TTEST7-17.28-11-2018.txt
Basically, I just want the value "TTEST7-17" which would be the PC name.
But it has to pull that value no matter if the filename is TTEST7-17.txt or TTEST7-17.28-11-2018.txt
Each PC name and date is different, so it can't just match to one PC name and pull the string that way. It has to somehow determine IF it is formatted like TTEST7-17 or TTEST7-17.28.11.2018.txt and THEN either get rid of .txt or .28.11.2018.txt, no matter what the PC name is.

You can use the split() method. For example:
"TTEST7-17.28.11.2018.txt".Split('.')[0]
"TTEST7-17.txt".Split('.')[0]
Both of these give output of TTEST7-17.
Of course, if the name is in a variable, you can still use this method:
$fileName = "TTEST7-17.28.11.2018.txt"
$fileName.Split('.')[0]

Related

How to identify a character in a string?

I am trying to write a Powershell code to identify a string with a specific character from a filename from multiple files.
An example of a filename
20190902091031_202401192_50760_54206_6401.pdf
$Variable = $Filename.Substring(15,9)
Results:
202401192 (this is what I am after)
However in some instances the filename will be like below
20190902091031_20240119_50760_54206_6401.pdf
$Variable = $Filename.Substring(15,9)
Results:
20240119_ (this is NOT what I am after)
I am trying to find a code to identify the 9th character,
IF the 9th character = "_"
THEN Set
$Variable = $Filename.Substring(15,8)
Results:
20240119
All credit to TheMadTechnician who beat me to the punch with this answer.
To expand on the technique a bit, use the split method or operator to split a string every time a certain character shows up. Your data is separated by the underscore character, so is a perfect example of using this technique. By using either of the following:
$FileName.Split('_')
$FileName -split '_'
You can turn your long string into an array of shorter strings, each containing one of the parts of your original string. Since you want the 2nd one, you use the array descriptor [1] (0 is 1st) and you're done.
Good luck

How to parse logs and mask specific characters using Powershell

I have a problem that I really hope to get some help with.
It's rather complex but I will try and keep my explanation as simple and objective as possible. In a nutshell, I have log files that contain thousands of lines. Each line consists of information like date/time, source, type and message.
In this case the message contains a variable size ...999 password that I need to mask. Basically the message looks something like this (its an ISO message):
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
For each line I need to zero in on password length identifier (008) do a count on it and then proceed to mask the number of following characters, which would be PASSWORD in this case. I would change it to something like XXXXXXXX instead so once done the line would look like this:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008XXXXXXXX6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
I honestly have no idea how to start doing this with PowerShell. I need to loop though each line in the log file, and identify the number of characters to mask.
I've kept this high level as a starting point, there are some other complexities that I hope to figure out at a later time, like the fact that there are different types of messages and depending on the type the password length starts at another character position. I might be able to build on my aforementioned question first but if anyone understands what I mean then I would appreciate some help or tips about that too.
Any help is appreciated.
Thanks!
Additional information to original post:
Firstly, thank you to everyone for your answers thus far, its been greatly appreciated. Now that I have a baseline for how your answers are being formulated based on my information I feel I need to provide some more details.
1) There was a question about whether or not the password starting position is fixed and the logic behind it.
The password position is not fixed. In an ISO message (which these are) the password, and all information in the message, is dependent on the data elements present in the message which are in turn are indicated by the bitmap. The bitmap is also part of the message. So in my case, I need to script additional logic above and beyond the answers provided to come full circle.
2) This is what I know and these are the steps I hope to accomplish with the script.
What I know:
- There are 3 different msg types that contain passwords. I've figured out where the starting position of the password is for each msg type based on the bitmap and the data elements present.
For example 0210 contains one in this case:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
What I need to do:
Pass the log file to the script
For each line in the log identify if the line has a msg type that contains a password
If the message type contains a password then determine length of password by reading the preceding 3 digits to the password ("ans ...999" which means alphanumeric - special with length max of 999 and 3 digit length info). Lets say the character position of the password would be 107 in this case for arguments sake, so we know to read the 3 numbers before it.
Starting at the character position of the password, mask the number of characters required with XXX. Loop through log until complete.
It does seem as though you're indicating the position of the password and the length of the password will vary. As long as you have the '008' and something like '666' to indicate a starting and stopping point something like this should work.
$filePath = '.\YourFile.log'
(Get-Content $filePath) | ForEach-Object {
$startIndex = $_.IndexOf('008') + 3
$endIndex = $_.IndexOf('666', $startIndex)
$passwordLength = $endIndex - $startIndex
$passwordToReplace = $_.Substring($startIndex,$passwordLength)
$obfuscation = New-Object 'string' -ArgumentList 'X', $passwordLength
$_.Replace($passwordToReplace, $obfuscation)
} | Set-Content $filePath
If the file is too large to load into memory then you will have to StreamReader and StreamWriter to write the content to a new file and delete the old.
Assuming a fixed position where the password-length field starts, based on your sample line (if that position is variable, as you've hinted at, you need to tell us more):
$line = '22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff'
$posStart = 62 # fixed 0-based pos. where length-of-password field stats
$pwLenFieldLen = 3 # length of length-of-password field
$pwLen = [int] $line.SubString($posStart, $pwLenFieldLen) # extract password length
$pwSubstitute = 'X' * $pwLen # determine the password replacement string
# replace the password with all Xs
$line -replace "(?<=^.{$($posStart + $pwLenFieldLen)}).{$pwLen}(?=.*)", $pwSubstitute
Note: This is not the most efficient way to do it, but it is concise.

how to remove # character from national data type in cobol

i am facing issue while converting unicode data into national characters.
When i convert the Unicode data into national using national-of function, some junk character like # is appended after the string.
E.g
Ws-unicode pic X(200)
Ws-national pic N(600)
--let the value in Ws-Unicode is これらの変更は. getting from java end.
move function national-of ( Ws-unicode ,1208 ) to Ws-national.
--after converting value is like これらの変更は #.
i do not want the extra # character added after conversion.
please help me to find out the possible solution, i have tried to replace N'#' with space using inspect clause.
it worked well but failed in some specific scenario like if we have # in input from user end. in that case genuine # also converted to space.
Below is a snippet of code I used to convert EBCDIC to UTF. Before I was capturing string lengths, I was also getting # symbols:
STRING
FUNCTION DISPLAY-OF (
FUNCTION NATIONAL-OF (
WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
WS-EBCDIC-CCSID
)
WS-UTF8-CCSID
)
DELIMITED BY SIZE
INTO WS-UTF8-STRING
WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
What this code does is string the UTF8 representation of the EBCIDIC string into another variable. The WITH POINTER clause will capture the new length of the string + 1 (+ 1 because the pointer is positioned to the next position after the string ended).
Using this method, you should be able to know exactly how long second string is and use that string with the exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention, in my case, the # signs were actually EBCDIC low values when viewing the actual hex on the mainframe
Use inspect with reverse and stop after first occurence of #

Store user input as wildcard

I am having some trouble with a data processing function in MATLAB. The function takes the name of the file to be processed as an input, finds the desired files, and reads in the data.
However, several of the desired files are variants, such as Data_00.dat, Data.dat, or Data_1_March.dat. Within my function, I would like to search for all files containing Data and condense them into one usable file for processing.
To solve this, I would like desiredfile to be converted into a wildcard.
Here is the statement I would like to use.
selectedfiles = dir *desiredfile*.dat % Search for file names containing desiredfile
This returns all files containing the variable name desiredfile, rather than the user input.
The only solution that I can think of is writing a separate function that manually condenses all the variants into one file before my function is run, but I am trying to keep the number of files used down and would like to avoid this.
You could concatenate strings for that. Considering desiredFile as a variable.
desiredFile = input('Files: ');
selectedfiles = dir(['*' desiredfile '*.dat']) % Search for file names containing desiredfile
Enclosing strings between square brackets [string1 string2 ... stringN]concatenates them. Matlab's dir function receives a string.
I believe you can achieve that using the dir command.
dataSets = dir('/path/to/dir/containing/Data*.dat');
dataSets = {dataSets.name};
Now simply loop over them, more information here.
To quote the matlab help:
dir lists the files and folders in the MATLAB® current folder. Results appear in the order returned by the operating system.
dir name lists the files and folders that match the string name. When name is a folder, dir lists the contents of the folder. Specify name using absolute or relative path names. You can use wildcards (*).

How to print variable inside function

I'm using RRDs::Simple function and it needs bunch of parameters.
I have placed these parameters in a special variable (parsing, sorting and calculating data from a file) with all quotes, commas and other stuff.
Of course
RRDs::create ($variable);
doesn't work.
I've glanced through all perl special variables and have found nothing.
How to substitute name of variable for the data that contained in that variable?
At least could you tell me with what kind of tools(maybe another special variables) it can be done?
Assuming I'm understanding what you're asking:
You've build the 'create' data in $variable, and are now trying to use RRDs::create to actually do it?
First step is:
print $variable,"\n"; - to see what is actually there. You should be able to use this from the command line, with rrdtool create. (Which needs a filename, timestep, and some valid DS and RRA parameters)
usually, I'll use an array to pass into RRDs::create:
RRDs::create ( "test.rrd", "-s 300",
"DS:name:GAUGE:600:U:U", )
etc.
If $variable contains this information already, then that should be ok. The way to tell what went wrong is:
if ( RRDs::error ) { print RRDs::error,"\n"; }
It's possible that creating the file is the problem, or that your RRD definitions are invalid for some reason. rrdtool create on command line will tell you, as will RRDs::error;