How to parse logs and mask specific characters using Powershell


I have a problem that I really hope to get some help with.
It's rather complex but I will try and keep my explanation as simple and objective as possible. In a nutshell, I have log files that contain thousands of lines. Each line consists of information like date/time, source, type and message.
In this case the message contains a variable-size ("...999") password that I need to mask. Basically the message looks something like this (it's an ISO message):
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
For each line I need to locate the password length identifier (008), read the count from it, and then mask that many of the following characters, which would be PASSWORD in this case. I would change it to something like XXXXXXXX, so once done the line would look like this:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008XXXXXXXX6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
I honestly have no idea how to start doing this with PowerShell. I need to loop through each line in the log file and identify the number of characters to mask.
I've kept this high level as a starting point, there are some other complexities that I hope to figure out at a later time, like the fact that there are different types of messages and depending on the type the password length starts at another character position. I might be able to build on my aforementioned question first but if anyone understands what I mean then I would appreciate some help or tips about that too.
Any help is appreciated.
Thanks!
Additional information to original post:
Firstly, thank you to everyone for your answers thus far, it's been greatly appreciated. Now that I can see how your answers are being formulated from the information I gave, I feel I need to provide some more details.
1) There was a question about whether or not the password starting position is fixed and the logic behind it.
The password position is not fixed. In an ISO message (which these are) the position of the password, like everything else in the message, depends on which data elements are present, and those are in turn indicated by the bitmap. The bitmap is also part of the message. So in my case, I need to script additional logic beyond the answers provided to come full circle.
2) This is what I know and these are the steps I hope to accomplish with the script.
What I know:
- There are 3 different msg types that contain passwords. I've figured out where the starting position of the password is for each msg type based on the bitmap and the data elements present.
For example 0210 contains one in this case:
year-day-month 00:00:00,computername,source, info,rx 0210 22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff
What I need to do:
Pass the log file to the script
For each line in the log identify if the line has a msg type that contains a password
If the message type contains a password, determine the length of the password by reading the 3 digits preceding it ("ans ...999" means alphanumeric and special characters, with a maximum length of 999 and a 3-digit length field). Let's say, for argument's sake, that the character position of the password is 107 in this case, so we know to read the 3 digits before it.
Starting at the character position of the password, mask the number of characters required with XXX. Loop through log until complete.

It does seem as though you're indicating the position of the password and the length of the password will vary. As long as you have the '008' and something like '666' to indicate a starting and stopping point something like this should work.
$filePath = '.\YourFile.log'
(Get-Content $filePath) | ForEach-Object {
    # Position of the first character after the '008' length marker
    $startIndex = $_.IndexOf('008') + 3
    # The password ends where the next field ('666...') begins
    $endIndex = $_.IndexOf('666', $startIndex)
    $passwordLength = $endIndex - $startIndex
    $obfuscation = New-Object 'string' -ArgumentList 'X', $passwordLength
    # Overwrite by position so identical text elsewhere in the line is left alone
    $_.Remove($startIndex, $passwordLength).Insert($startIndex, $obfuscation)
} | Set-Content $filePath
If the file is too large to load into memory, you will have to use StreamReader and StreamWriter to write the content to a new file and then delete the old one.
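For illustration, here is a minimal sketch of that streaming idea in Python rather than with .NET's StreamReader/StreamWriter. The function name rewrite_file_streaming, the UTF-8 encoding and the temporary-file handling are my assumptions, not something from the question: each line is read, transformed and written to a new file, which then replaces the original.

```python
import os
import tempfile


def rewrite_file_streaming(path, transform):
    """Apply transform(line) to every line of `path`, one line at a time.

    Hypothetical helper: the result is written to a temporary file in the
    same directory, which replaces the original only if everything succeeded.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings (CRLF or LF) untouched.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
                open(path, "r", encoding="utf-8", newline="") as src:
            for line in src:
                dst.write(transform(line))
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original file as it was and clean up the partial copy.
        os.unlink(tmp_path)
        raise
```

Only one line is held in memory at a time, so the size of the log no longer matters; the transform can be any per-line masking function.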

Assuming a fixed position where the password-length field starts, based on your sample line (if that position is variable, as you've hinted at, you need to tell us more):
$line = '22222222222222333333333333333333444444444444444444444444455555008PASSWORD6666666666666666677777777777777777777777ccccdddddddddddffffffffffffff'
$posStart = 62 # fixed 0-based pos. where length-of-password field starts
$pwLenFieldLen = 3 # length of length-of-password field
$pwLen = [int] $line.SubString($posStart, $pwLenFieldLen) # extract password length
$pwSubstitute = 'X' * $pwLen # determine the password replacement string
# replace the password with all Xs
$line -replace "(?<=^.{$($posStart + $pwLenFieldLen)}).{$pwLen}(?=.*)", $pwSubstitute
Note: This is not the most efficient way to do it, but it is concise.
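The steps listed in the question (check the message type, read the 3-digit length field, mask that many characters) could be sketched as follows in Python. This is only a sketch under stated assumptions: the offset table LENGTH_FIELD_OFFSETS is hypothetical (only the 0210 entry, 62, is taken from the sample above, and real offsets would have to be derived from the bitmap), and the line layout ",rx <type> <payload>" is inferred from the sample line.

```python
import re

# Hypothetical table: message type -> 0-based offset, within the payload,
# of the 3-digit password-length field. Only "0210" comes from the sample.
LENGTH_FIELD_OFFSETS = {"0210": 62}
LENGTH_FIELD_WIDTH = 3

# Assumed layout: "<date>,<computer>,<source>, info,rx <type> <payload>"
LINE_RE = re.compile(r"^(?P<head>.*?,rx (?P<mti>\d{4}) )(?P<payload>.*)$")


def mask_password(line: str) -> str:
    """Return `line` (without trailing newline) with the password X-ed out.

    Lines that do not match the layout, have an unknown message type, or
    carry a malformed length field are returned unchanged.
    """
    m = LINE_RE.match(line)
    if m is None:
        return line
    offset = LENGTH_FIELD_OFFSETS.get(m.group("mti"))
    if offset is None:
        return line
    payload = m.group("payload")
    length_text = payload[offset:offset + LENGTH_FIELD_WIDTH]
    if len(length_text) != LENGTH_FIELD_WIDTH or not length_text.isdecimal():
        return line
    start = offset + LENGTH_FIELD_WIDTH
    pw_len = int(length_text)
    if start + pw_len > len(payload):
        return line
    # Replace by position, so identical text elsewhere is not touched.
    masked = payload[:start] + "X" * pw_len + payload[start + pw_len:]
    return m.group("head") + masked
```

Supporting the other two message types would then only mean adding their offsets to the table, provided the offset really is fixed per type.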

Related

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address, and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid hard-coding those values if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I am aiming to replace only the first 3 sets of digits, so basically every digit that comes before a '.', and I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?

Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
    ForEach-Object { $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}', 'xx.xx.xx' }
This pattern matches 1-3 digits, a literal period, 1-3 digits, another literal period, and 1-3 more digits, so it captures anything from 0-999.0-999.0-999 and replaces it with xx.xx.xx.

TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.
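For comparison, the same two ideas carry over to Python's re module more or less unchanged. This is a hedged sketch; the function names mask_ip_loose and mask_ip_strict are made up for illustration, and the strict variant adds a lookahead for the fourth group, which the answers above do not use.

```python
import re


def mask_ip_loose(text: str) -> str:
    # Replace everything before the final ".digits" group at the end of
    # the string; the lookahead keeps that last group out of the match.
    return re.sub(r".+(?=\.\d+$)", "XX.XX.XX", text)


def mask_ip_strict(text: str) -> str:
    # Replace three dot-separated groups of 1-3 digits, but only when a
    # fourth group follows; works anywhere inside a longer line.
    pattern = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}(?=\.\d{1,3}\b)"
    return re.sub(pattern, "XX.XX.XX", text)


print(mask_ip_loose("10.11.12.50"))
print(mask_ip_strict("user 10.12.11.23 logged in"))
```

The loose version expects the string to contain just the address; the strict version is the safer choice when the address is embedded in a log line.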

How to take a substring with the endpoint being a carriage return and/or line feed?

How do I take a substring where I don't know the length of the thing I want, but I know that the end of it is a CR/LF?
I'm communicating with a server trying to extract some information. The start point of the substring is well defined, but the end point can be variable. In other scripting languages, I'd expect there to be a find() command, but I haven't found one in PowerShell yet. Most articles and SE questions refer to Get-Content, substring, and Select-String, with the intent to replace a CRLF rather than just find it.
The device I am communicating with has a telnet-like command structure. It starts out with its model as a prompt. You can give it commands and it responds. I'm trying to grab the hostname from it. This is what a prompt, command, and response look like in a terminal:
TSS-752>hostname
Host Name: ThisIsMyHostname
TSS-752>
I want to extract the hostname. I came across IndexOf(), which seems to work like the find command I am looking for. ":" is a good start point, and then I want to truncate it to the next CRLF.
NOTE: I have made my code work to my satisfaction, but in the interest of not receiving anymore downvotes (3 at the time of this writing) or getting banned again, I will not post the solution, nor delete the question. Those are taboo here. Taking into account the requests for more info from the comments has only earned me downvotes, so I think I'm just stuck in the SO-Catch-22.

You could probably have found plenty of examples in C# outlining this exact same approach, but here goes with PowerShell examples.
If you want to find the index at which CR/LF occurs, use String.IndexOf():
PS C:\> "  `r`n".IndexOf("`r`n")
2
Use it to calculate the length parameter argument for String.Substring():
$String = "    This phrase starts at index 4 ends at some point`r`nand then there's more"
# Define the start index
$Offset = 4
# Find the index of the end marker
$CRLFIndex = $String.IndexOf("`r`n")
# Check that the end marker was actually found
if ($CRLFIndex -eq -1) {
    throw "CRLF not found in string"
}
# Calculate length based on end marker index - start index
$Length = $CRLFIndex - $Offset
# Generate substring
$Substring = $String.Substring($Offset, $Length)
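Applied to the hostname response from the question, the same find-the-start, find-the-end approach might look like this in Python. It is a sketch only: the function name text_after_marker is hypothetical, and using "Host Name:" as the start marker is my assumption based on the sample output.

```python
def text_after_marker(response: str, marker: str = "Host Name:") -> str:
    """Return the text between `marker` and the next CR or LF."""
    start = response.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found")
    start += len(marker)
    # Accept CR, LF or CRLF as the end of the value.
    candidates = [response.find("\r", start), response.find("\n", start)]
    ends = [i for i in candidates if i != -1]
    # If no line ending follows, take everything up to the end of the text.
    end = min(ends) if ends else len(response)
    return response[start:end].strip()


reply = "TSS-752>hostname\r\nHost Name: ThisIsMyHostname\r\nTSS-752>"
print(text_after_marker(reply))
```

Searching for the line ending from the start index onward, rather than from the beginning of the string, matters here because the echoed command line ends in CRLF too.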

how to remove # character from national data type in cobol

I am facing an issue while converting Unicode data into national characters.
When I convert the Unicode data into national using the NATIONAL-OF function, a junk character such as # is appended after the string.
E.g.
Ws-unicode pic X(200)
Ws-national pic N(600)
-- Suppose the value in Ws-unicode is これらの変更は, received from the Java side.
move function national-of ( Ws-unicode ,1208 ) to Ws-national.
-- After converting, the value looks like これらの変更は #.
I do not want the extra # character added after conversion.
Please help me find a possible solution. I have tried to replace N'#' with a space using the INSPECT statement.
It worked well but failed in some specific scenarios, for example when the user's input itself contains #: in that case the genuine # is also converted to a space.

Below is a snippet of code I used to convert EBCDIC to UTF. Before I was capturing string lengths, I was also getting # symbols:
STRING
    FUNCTION DISPLAY-OF (
        FUNCTION NATIONAL-OF (
            WS-EBCDIC-STRING(1:WS-XML-EBCDIC-LENGTH)
            WS-EBCDIC-CCSID
        )
        WS-UTF8-CCSID
    )
    DELIMITED BY SIZE
    INTO WS-UTF8-STRING
    WITH POINTER WS-XML-UTF8-LENGTH
END-STRING
SUBTRACT 1 FROM WS-XML-UTF8-LENGTH
What this code does is string the UTF-8 representation of the EBCDIC string into another variable. The WITH POINTER clause captures the new length of the string + 1 (+ 1 because the pointer is positioned at the next position after the end of the string).
Using this method, you should be able to know exactly how long the second string is and use that string with the exact length.
That should remove the unwanted #s.
EDIT:
One thing I forgot to mention: in my case, the # signs were actually EBCDIC low values when viewing the actual hex on the mainframe.

Use INSPECT with REVERSE and stop after the first occurrence of #.

The torrent info_hash parameter

How does one calculate the info_hash parameter, i.e. the hash corresponding to the info dictionary?
From official specs:
info_hash
The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file.
This value will almost certainly have to be escaped.
Does this mean I simply take the substring from the metainfo file and compute a SHA-1 hash over its bytes?
That is what I tried, 12 times, but without success: the resulting hash differs from the one I should end up with, and the tracker response is FAILURE, unknown torrent (or something similar).
So how do you calculate the info_hash?

The metafile is already bencoded so I don't understand why you encode it again?
I finally got this working in Java code, here is my code:
byte metaData[]; //the raw .torrent file
int infoIdx = ?; //index of 'd' right after the "4:info" string
info_hash = SHAsum(Arrays.copyOfRange(metaData, infoIdx, metaData.length-1));
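This assumes the 'info' block is the last block in the torrent file (wrong?), so that dropping the final byte (the 'e' that closes the outer dictionary) leaves exactly the info dictionary.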
Don't sort or anything like that, just use a substring of the raw torrent file.
Works for me.

bdecode the metafile. Then it's simply sha1(bencode(metadata['info']))
(i.e. bencode only the info dict again, then hash that).
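To make the substring approach concrete without assuming that 'info' is the last key, here is a minimal Python sketch that walks the top-level dictionary and hashes the raw bytes of the info value. The helper names _skip_value and info_hash are my own, the parser handles only what is needed to find value boundaries, and it does no validation beyond that.

```python
import hashlib


def _skip_value(data: bytes, i: int) -> int:
    """Return the index just past the bencoded value starting at data[i]."""
    c = data[i:i + 1]
    if c == b"i":                       # integer: i<digits>e
        return data.index(b"e", i) + 1
    if c in (b"l", b"d"):               # list / dict: items until 'e'
        i += 1
        while data[i:i + 1] != b"e":
            i = _skip_value(data, i)
        return i + 1
    if c.isdigit():                     # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        return colon + 1 + int(data[i:colon])
    raise ValueError(f"invalid bencode at offset {i}")


def info_hash(torrent: bytes) -> bytes:
    """SHA-1 digest (20 raw bytes) of the bencoded info dictionary."""
    if torrent[:1] != b"d":
        raise ValueError("metainfo must be a bencoded dictionary")
    i = 1
    while torrent[i:i + 1] != b"e":
        key_end = _skip_value(torrent, i)
        colon = torrent.index(b":", i)
        key = torrent[colon + 1:key_end]
        value_end = _skip_value(torrent, key_end)
        if key == b"info":
            # Hash the exact bytes from the file; no re-encoding needed.
            return hashlib.sha1(torrent[key_end:value_end]).digest()
        i = value_end
    raise KeyError("no 'info' key in metainfo")
```

For a file that is bencoded canonically, this gives the same result as decoding and re-encoding the info dictionary. For the tracker request, the raw 20-byte digest still has to be URL-escaped; in Python, urllib.parse.quote accepts the bytes directly.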

Perl: pattern match a string and then print next line/lines

I am using Net::Whois::Raw to query a list of domains from a text file and then parse through this to output relevant information for each domain.
It was all going well until I hit Nominet results as the information I require is never on the same line as that which I am pattern matching.
For instance:
Name servers:
ns.mistral.co.uk 195.184.229.229
So what I need to do is pattern match for "Name servers:" and then display the next line or lines but I just can't manage it.
I have read through all of the answers on here but they either don't seem to work in my case or confuse me even further as I am a simple bear.
The code I am using is as follows:
while ($record = <DOMAINS>) {
    $domaininfo = whois($record);
    if ($domaininfo=~ m/Name servers:(.*?)\n/){
        print "Nameserver: $1\n";
    }
}
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
EDIT: Forgot to say thanks!
how rude.

So, the $domaininfo string contains your domain?
Since $domaininfo is a multi-line string, you can match on the \n character directly in your regular expression. (The m modifier at the end only changes where ^ and $ match; it is not required for matching \n, but it does no harm here.) This works for me:
my $domaininfo =<<DATA;
Name servers:
ns.mistral.co.uk 195.184.229.229
DATA
$domaininfo =~ m/Name servers:\n(\S+)\s+(\S+)/m;
print "Server name = $1\n";
print "IP Address = $2\n";
Now, I can match the \n at the end of the Name servers: line and capture the name and IP address which is on the next line.
This might have to be munged a bit to get it to work in your situation.

This is half a question and perhaps half an answer (the question's in here as I am not yet allowed to write comments...). Okay, here we go:
Name servers:
ns.mistral.co.uk 195.184.229.229
Is this what an entry in the file you're parsing looks like? What will follow immediately afterwards - more domain names and IP addresses? And will there be blank lines in between?
Anyway, I think your problem may (in part?) be related to your reading the file line by line. Once you get to the IP address line, the info about 'Name servers:' having been present will be gone. Multiline matching will not help if you're looking at your file line by line. Thus I'd recommend switching to paragraph mode:
{
    local $/ = ''; # one paragraph instead of one line constitutes a record
    while ($record = <DOMAINS>) {
        # $record will now contain all consecutive lines that were NOT separated
        # by blank lines; once there are >= 1 blank lines $record will have a
        # new value
        # do stuff, e.g. pattern matching
    }
}
But then you said
I have tried an example of Stackoverflow where
<DOMAINS>;
will take the next line but this didn't work for me and I assume it is because we have already read the contents of this into $domaininfo.
so maybe you've already tried what I have just suggested? An alternative would be to just add another variable ($indicator or whatever) which you'll set to 1 once 'Name servers:' has been read, and as long as it's equal to 1 all following lines will be treated as containing the data you need. Whether this is feasible, however, depends on you always knowing what else your data file contains.
I hope something in here has been helpful to you. If there are any questions, please ask :)
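The indicator-variable idea can be sketched like this, shown in Python for illustration. Treat it as a sketch under assumptions: the function name lines_after_heading is made up, and the rule that a blank line ends the "Name servers:" block is my guess about the output format, so it may need adjusting for real whois responses.

```python
def lines_after_heading(text: str, heading: str = "Name servers:") -> list:
    """Collect the non-blank lines that directly follow `heading`."""
    collected = []
    in_section = False            # the "indicator" flag
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == heading:
            in_section = True     # start collecting from the next line
            continue
        if in_section:
            if not stripped:      # assumed: a blank line ends the block
                break
            collected.append(stripped)
    return collected


sample = (
    "Domain name:\n"
    "    example.co.uk\n"
    "\n"
    "Name servers:\n"
    "    ns.mistral.co.uk 195.184.229.229\n"
    "\n"
    "More text that should be ignored\n"
)
print(lines_after_heading(sample))
```

Because the whole whois response is already in one string, the flag is applied while iterating over that string's lines rather than over the DOMAINS file handle.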
