Powershellv2 - remove last x characters from a string - powershell

I have a file containing a few thousand lines of text. I need to extract some data from it,
but the data I need is always 57 characters from the left, and 37 characters from the end.
The bit I need (in the middle) is of varying length.
e.g. 20141126_this_piece_of_text_needs_to_be_removed<b>this_needs_to_be_kept</b>this_also_needs_to_be_removed
So far I have got:
SELECT-STRING -path path_to_logfile.log -pattern "20141126.*<b>" |
FOREACH{$_.Line} |
FOREACH{
$_.substring(57)
}
This gets rid of the text at the start of the line, but I can't see how to get rid of the text from the end.
I tried:
$_.subString(0,-37)
$_.subString(-37)
but these didn't work
Is there a way to get rid of the last x characters?

To remove the last x characters from a string, use:
$text -replace ".{x}$"
For example:
PS>$text= "this is a number 1234"
PS>$text -replace ".{5}$" #drop last 5 chars
this is a number
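Since x in the pattern above is a placeholder, the regex has to be built from an actual number. A minimal sketch, assuming a hypothetical variable $count holds the number of characters to drop:

```powershell
$text  = 'this is a number 1234'
$count = 5                           # hypothetical: number of trailing characters to drop

# Build the pattern '.{5}$' by concatenation so the '$' anchor stays literal
$pattern = '.{' + $count + '}$'
$result  = $text -replace $pattern   # no replacement given, so the match is removed

$result
```

If the string is shorter than $count characters, the pattern does not match and the string comes back unchanged.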

If I understand you correctly, you need this:
$_.substring(57,$_.length-57-37)
This doesn't seem to work precisely with the example you gave, presumably because its prefix and suffix are not exactly 57 and 37 characters long, but it will give you the varying middle section, i.e. everything from 57 characters after the start to 37 characters before the end.

This is how to remove the last 37 characters from your string:
$_.subString(0,$_.length-37)
However, arco's answer is the preferred solution to your overall problem.
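The Substring approach from the answers above can be sketched end to end as follows. The sample line is a made-up stand-in (57 filler characters, the wanted text, 37 filler characters), and the length check is an added assumption so that short lines do not throw:

```powershell
# Stand-in line: 57 filler characters, the wanted middle, 37 filler characters
$line = ('A' * 57) + 'this_needs_to_be_kept' + ('Z' * 37)

$skipStart = 57   # characters to drop from the left
$skipEnd   = 37   # characters to drop from the right

if ($line.Length -ge $skipStart + $skipEnd) {
    # Second argument of Substring is a length, not an end index
    $middle = $line.Substring($skipStart, $line.Length - $skipStart - $skipEnd)
} else {
    $middle = ''  # line too short to contain a middle section
}

$middle
```

Substring takes a start index and a length, which is why negative values such as -37 are rejected.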

Related

Powershell add text after 20 characters

I want to add text after exactly 20 characters, including blanks. Does someone have a short solution using Add-Content, or can someone post a link where I can read about a way to do this?
My file looks something like this:
/path1/path1/path1 /path2/path2/path2 /path3/path3/path3
An application (not mine, and I cannot edit it in any way) then reads these paths by position, so if the second path starts 10 characters later, it won't recognize it. I therefore can't simply replace or edit a path, since the paths do not always have the same length. Why the application reads the file that way, don't ask me.
So I need to add one string at the start, the next string at exactly character 20, and the next at character 40.
You could use the regex -replace operator to inject a new substring after 20 characters:
PS ~> $inject = "Hello Manuel! ..."
PS ~> $string = "Injected text goes: and then there's more"
PS ~> $string -replace '(?<=^.{20})',$inject
Injected text goes: Hello Manuel! ...and then there's more
The regex pattern (?<=^.{20}) describes a position in the string where exactly 20 characters occur between the start of the string and the current position, and the -replace operator then replaces the empty string at said position with the value in $inject
This did it for me
$data.PadRight(20," ") | Out-File -FilePath F:\test\path.txt -NoNewline -Append
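The PadRight idea can also be used to build the whole line in memory before writing it out. A rough sketch, assuming three hypothetical paths that are each shorter than 20 characters:

```powershell
# Hypothetical input paths; each must start at a fixed column (0, 20, 40)
$paths = '/path1/path1', '/path2/path2', '/path3/path3'

# Pad every path with trailing spaces to a width of 20, then concatenate
$line = -join ($paths | ForEach-Object { $_.PadRight(20) })

$line.IndexOf('/path2')   # column where the second path starts
$line.IndexOf('/path3')   # column where the third path starts
```

PadRight only pads; a path longer than 20 characters is left at its full length and would shift the columns that follow it.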

Find NRIC from a large text file and replace NRIC first 4 characters with X

I have a requirement to find the NRIC field in a text file and mask its first 4 characters using PowerShell. So far, below is my code.
$FileName = "E:\test.txt"
$patternNRIC = '[SGTGsftg]\d{7}\w'
$file= New-Object System.IO.StreamReader -Arg $FileName
while($s= $file.ReadLine())
{
$s = $s -replace $patternNRIC ,'XXXX'
Write-Host $s -BackgroundColor Magenta
}
$file.Close()
The problem with the above is that it replaces the whole NRIC with XXXX, which I don't want. I want to replace only the first 4 characters while keeping the rest intact.
Going by the National Registration Identity Card definition, hiding just the first 4 characters of the NRIC doesn't guarantee much privacy for the card owner (there are probably far fewer than 100 people who match the disclosed part, and if you know the birthday and the fact that the first following digit is likely a 0, it is probably down to a single person who matches the information!).
Excerpt from Wikipedia:
only the last three or four digits and the letters are publicly displayed or published as the first three digits can easily give away a person's age.
Anyway:
PowerShell's -replace operator is case-insensitive by default (so there is no need to list both lowercase and uppercase characters)
To check for the following characters without replacing them, use a lookahead assertion
I also recommend using word boundaries (\b) to make sure that the NRIC has the expected length and is not part of a longer sequence
$Test = 'I have requirement to find NRIC field, like S1234567G, and mask the first 4 characters.'
$Pattern = '\b[STFG]\d{3}(?=\d{4}\w\b)'
$Test -Replace $Pattern, 'XXXX'
Result
I have requirement to find NRIC field, like XXXX4567G, and mask the first 4 characters.
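As an alternative to the lookahead, the part to keep can be captured and re-inserted. This is a sketch of that variant rather than the answer's own method; the sample sentence and the second NRIC-like value are made up for illustration:

```powershell
$Test = 'Masking S1234567G and T7654321Z in one line.'

# Group 1 captures the five characters to keep; $1 re-inserts them.
# Single quotes stop PowerShell from expanding $1 as a variable.
$Masked = $Test -replace '\b[STFG]\d{3}(\d{4}\w)\b', 'XXXX$1'

$Masked
```

Both forms leave the trailing characters untouched; the lookahead version simply never includes them in the match.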

replacing text in Powershell every alternate match

I have looked at this question, and it's close to what I need to do, but the text I need to replace is inconsistent.
I need to replace "`r`n with ", but only the first of the 2 adjacent lines
example: (the full file is 50k lines and up to 500 chars wide)
ID,Name,LinkedRecords
54429,Abe,
54247,Jonathan,"
63460|63461"
54249,Teresa,
54418,Cody,
58046,Joseph,
58243,David,
,Barry,"
74330"
C8876,Simon,
X_10934,David,
should become
ID,Name,LinkedRecords
54429,Abe,
54247,Jonathan,"63460|63461"
54249,Teresa,
54418,Cody,
58046,Joseph,
58243,David,
,Barry,"74330"
C8876,Simon,
X_10934,David,
I can see this will probably be useful, but I'm having a hard time getting the command to work as desired
If the `r`n characters are literal, then you can do the following:
[System.IO.File]::ReadAllText('c:\path\file.txt') -replace '(?<=,")`r`n\r?\n' |
Set-Content c:\path\file.txt
If `r`n are actual carriage return and line feed chars, then you can do the following:
[System.IO.File]::ReadAllText('c:\path\file.txt') -replace '(?<=,")\r\n' |
Set-Content c:\path\file.txt
Note if memory becomes an issue, a different approach may be needed.
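One possible line-by-line approach, sketched here under the assumption that a broken record always ends with ," and continues on exactly the next line. The sample array stands in for the lines of the file, which could be supplied by Get-Content instead:

```powershell
# Sample lines standing in for the file contents
$lines = @(
    'ID,Name,LinkedRecords'
    '54429,Abe,'
    '54247,Jonathan,"'
    '63460|63461"'
    ',Barry,"'
    '74330"'
)

$pending = ''
$joined = $lines | ForEach-Object {
    if ($_.EndsWith(',"')) {
        $pending = $_        # hold the line; its continuation follows
    } else {
        $pending + $_        # emit the line, prefixed by any held part
        $pending = ''
    }
}

$joined
```

Only one line is held at a time, so memory use stays flat. A held line at the very end of the input would need to be flushed separately.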

Partial String Replacement using PowerShell

Problem
I am working on a script that has a user provide a specific IP address, and I want to mask this IP in some fashion so that it isn't stored in the logs. My problem is that I can easily do this when I know what the first three values of the IP typically are; however, I want to avoid storing or hard-coding those values in the code if at all possible. I also want to be able to replace the values even if the first three are unknown to me.
Examples:
10.11.12.50 would display as XX.XX.XX.50
10.12.11.23 would also display as XX.XX.XX.23
I have looked up partial string replacements, but none of the questions or problems that I found came close to doing this. I have tried doing things like:
# This ended up replacing all of the numbers
$tempString = $str -replace '[0-9]', 'X'
I know that I am partway there, but I am aiming to replace only the first 3 sets of digits (basically, every digit that comes before a '.'), and I haven't been able to achieve this.
Question
Is what I'm trying to do possible to achieve with PowerShell? Is there a best practice way of achieving this?
Here's an example of how you can accomplish this:
Get-Content 'File.txt' |
ForEach-Object { $_ -replace '\d{1,3}\.\d{1,3}\.\d{1,3}','xx.xx.xx' }
This example matches 1-3 digits followed by a literal period, and repeats that pattern, so it matches anything from 0-999.0-999.0-999 and replaces it with xx.xx.xx
TheIncorrigible1's helpful answer is an exact way of solving the problem (replacement only happens if 3 consecutive .-separated groups of 1-3 digits are matched.)
A looser, but shorter solution that replaces everything but the last .-prefixed digit group:
PS> '10.11.12.50' -replace '.+(?=\.\d+$)', 'XX.XX.XX'
XX.XX.XX.50
(?=\.\d+$) is a (positive) lookahead assertion ((?=...)) that matches the enclosed subexpression (a literal . followed by 1 or more digits (\d) at the end of the string ($)), but doesn't capture it as part of the overall match.
The net effect is that only what .+ captured - everything before the lookahead assertion's match - is replaced with 'XX.XX.XX'.
Applied to the above example input string, 10.11.12.50:
(?=\.\d+$) matches the .-prefixed digit group at the end, .50.
.+ matches everything before .50, which is 10.11.12.
Since the (?=...) part isn't captured, it is therefore not included in what is replaced, so it is only substring 10.11.12 that is replaced, namely with XX.XX.XX, yielding XX.XX.XX.50 as a result.
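The same lookahead idea can be applied per digit group, which is close to what the question describes ("every digit that is before a '.'"). A minimal sketch, assuming each input string contains nothing but the IP address:

```powershell
$ips = '10.11.12.50', '10.12.11.23'

# Replace every run of digits that is immediately followed by a literal dot
$masked = $ips | ForEach-Object { $_ -replace '\d+(?=\.)', 'XX' }

$masked
```

In free-form log text this pattern would also mask other numbers followed by a dot, so the stricter patterns above are safer there.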

Powershell break a file up on character count

I have a binary file that I need to process, but it contains no line breaks in it.
The data is arranged, within the file, into 104 character blocks and then divided into its various fields by character count alone (no delimiting characters).
I'd like to firstly process the file, so that there is a line break (`n) every 104 characters, but after much web searching and a lot of disappointment, I've found nothing useful yet. (Unless I ditch PowerShell and use awk.)
Is there a Split option that understands character counts?
Not only would it allow me to create the file with nice easy to read lines of 104 chars, but it would also allow me to then split these lines into their component fields.
Can anyone help please, without *nix options?
Cheers :)
$s = get-content YourFileName | Out-String
$a = $s.ToCharArray()
$a[0..103] # will return an array of first 104 chars
You can get your string back by joining the characters. (Casting the slice to [string] and removing the separator spaces with .Replace(" ","") would also strip any spaces that belong to the data.)
$ns = -join $a[0..103]
Using the V4 Where method with Split option:
$text = 'abcdefghi'
While ($text)
{
$x,$text = ([char[]]$text).where({$_},'Split',3)
$x -join ''
}
abc
def
ghi
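For PowerShell versions without the Where method, a plain Substring loop does the same job. This is a sketch with a hypothetical helper name, Split-FixedWidth, shown with a chunk size of 3; the real file would use 104:

```powershell
# Hypothetical helper: cut a string into consecutive chunks of $Size characters
function Split-FixedWidth {
    param([string]$Text, [int]$Size)
    for ($i = 0; $i -lt $Text.Length; $i += $Size) {
        # The last chunk may be shorter than $Size
        $Text.Substring($i, [Math]::Min($Size, $Text.Length - $i))
    }
}

$chunks = Split-FixedWidth -Text 'abcdefgh' -Size 3
$chunks
```

The same Substring call, with fixed offsets inside each 104-character chunk, could then pull out the individual fields.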