Replace each occurrence of string in a file dynamically - powershell

I have a text file that contains several occurrences of the string "bad". I want to replace each occurrence of "bad" with good1, good2, good3, ..., good100 and so on.
I am trying this, but it replaces every occurrence with the last number, good100:
$raw = $(gc raw.txt)
for($i = 0; $i -le 100; $i++)
{
$raw | %{$_ -replace "bad", "good$($i)" } > output.txt
}
How to accomplish this?

Try this:
$i = 1
$raw = $(gc raw.txt)
$new = $raw.split(" ") | % { $_ -replace "bad" , "good$i" ; if ($_ -eq "bad" ) {$i++} }
$new -join " " | out-file output.txt
This works if raw.txt is a single line and the word "bad" is always separated by a single space " ", like this: alfa bad beta bad gamma bad (and so on...)
Edit after comment:
for multiline txt:
$i = 1
$new = @()
$raw = $(gc raw.txt)
for( $c = 0 ; $c -lt $raw.length ; $c++ )
{
$l = $raw[$c].split(" ") | % { $_ -replace "bad" , "good$i" ; if ($_ -eq "bad" ) {$i++} }
$l = $l -join " "
$new += $l
}
$new | out-file output.txt

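For such things, I generally use the [Regex]::Replace overload that takes a MatchEvaluator. Keep the counter in script scope so every call of the evaluator sees the same value:
$script:count = 0
$evaluator ={
$script:count++
"good$script:count"
}
gc raw.txt | %{ [Regex]::Replace($_,"bad",$evaluator) }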
The evaluator also receives the Match object (including its groups) as an argument, so you can do more advanced replacements with it.
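A sketch of the same MatchEvaluator idea wrapped in a reusable function (the name ConvertTo-NumberedText is hypothetical). The counter lives in a hashtable because the script block may run in its own child scope when invoked as a delegate, where a plain `$count++` would only increment a local copy; mutating a hashtable sidesteps that. Passing the whole file as one string keeps the numbering continuous across lines, and `\bbad\b` avoids touching words such as "badge".

```powershell
# Hypothetical helper: replaces each match of $Pattern in $Text with $Prefix1, $Prefix2, ...
function ConvertTo-NumberedText {
    param(
        [string]$Text,
        [string]$Pattern,
        [string]$Prefix
    )
    # A hashtable is a reference type, so the evaluator can update the shared
    # counter regardless of which scope the script block runs in.
    $state = @{ Count = 0 }
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)   # the System.Text.RegularExpressions.Match for the current hit
        $state.Count += 1
        "$Prefix$($state.Count)"
    }
    [regex]::Replace($Text, $Pattern, $evaluator)
}

# Numbering continues across the line break because the whole text is one string.
$result = ConvertTo-NumberedText -Text "alfa bad beta bad`ngamma bad" -Pattern '\bbad\b' -Prefix 'good'
$result
```

For a file, something like `ConvertTo-NumberedText -Text (Get-Content raw.txt -Raw) -Pattern '\bbad\b' -Prefix 'good' | Set-Content output.txt` should work (Get-Content -Raw needs PowerShell 3.0 or later).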

Here's another way, replacing just one match at a time:
$raw = gc raw.txt | out-string
$occurrences=[regex]::matches($raw,'bad')
$regex = [regex]'bad'
for($i=1; $i -le $occurrences.count; $i++)
{
$raw = $regex.replace($raw,{"good$i"},1)
}
$raw

Related

How to remove the last comma in powershell?

Below is my script. I want to add each non-null value to my $weekDays variable in comma-separated format, but without the trailing comma. Please help me with this.
$a = "sun"
$b = "mon"
$c = $null
$d = $a,$b,$c
$weekDays = $null
Foreach ($i in $d)
{
if ($i)
{
$weekDays = $i
$weekDays = $weekDays + ","
Write-Host "$weekDays"
}
}
Output: sun,mon,
I want: sun, mon
There's no need to loop through the list yourself, since the -join operator already does that, but you need to remove the null elements first:
($d | Where-Object { $_ -ne $null }) -join ", "
Where-Object (alias where) will filter out the null elements.
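A small variation on the same idea that also drops empty strings, not just $null, before joining. Because -join only inserts the separator between elements, no trailing comma can appear; the variable names are the ones from the question.

```powershell
$a = "sun"
$b = "mon"
$c = $null
$d = $a, $b, $c

# Keep only values that are neither $null nor empty strings, then join them.
$weekDays = ($d | Where-Object { -not [string]::IsNullOrEmpty($_) }) -join ", "
Write-Host $weekDays
```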
If you just want to exclude the last (null) item, use this:
$d[0..($d.Length - 2)] -join ", "
Note that your code produces the below output
sun,
mon,
and not sun,mon, in the same line. To print with new lines like that you need to use
($d | where { $_ -ne $null }) -join ",`n"
$a = "sun"
$b = "mon"
$c = $null
$d = $a,$b,$c
$weekDays = $null
Foreach ($i in $d)
{
if ($i)
{
if (-not ([string]::IsNullOrEmpty($weekDays)))
{
$weekDays += ","
}
$weekDays = $weekDays + $i
}
}
Write-Host "$weekDays"
# Output: sun,mon
Your variable is null at the start, and you want to prepend a comma only once it is no longer null, i.e. before every value except the first. Use ", " instead of "," if you want a space after each comma (sun, mon).

Remove starting content from each line of file

I need to remove the timestamp at the start of each line. How can I achieve this?
I tried to use
$regex = "[0-9]{1,2}/[0-9]{1,2}/[0-9]{1,4} [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}):[0-9]{1,3}"
and my approach was:
$Check = (Get-Content -Path .\file.txt|Select-Object -last 3|Out-String)
$Check = $Check -replace('$regex','')
The lines in file.txt look like this:
[06/13/19 08:52:58] The new world
[06/13/19 08:52:58] Computing
[06/13/19 08:52:58] Technology
An alternate method would be to use the -split operator and split on '] ' (close-bracket and space). Something like this:
('[06/13/19 08:52:58] The new world' -split '] ')[1].Trim()
output = The new world
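One likely reason the regex attempt in the question did nothing is that '$regex' is single-quoted, so PowerShell uses the literal text $regex instead of the variable's value. A sketch that applies -replace to every line at once, assuming (as in the sample) that each timestamp is enclosed in square brackets at the very start of the line:

```powershell
$lines = @(
    '[06/13/19 08:52:58] The new world'
    '[06/13/19 08:52:58] Computing'
    '[06/13/19 08:52:58] Technology'
)

# ^\[[^\]]*\]\s* matches an opening bracket at the start of the line, everything up
# to the first closing bracket, and any whitespace after it. Applied to an array,
# -replace processes each element and returns a new array.
$cleaned = $lines -replace '^\[[^\]]*\]\s*', ''
$cleaned
```

For a file this could look like `(Get-Content .\file.txt) -replace '^\[[^\]]*\]\s*', '' | Set-Content .\file_clean.txt` (the output name is just an example). A stricter pattern such as `^\[\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\]\s*` avoids stripping bracketed text that is not a timestamp.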
If the timestamp always has the same length, just use Substring:
$start = Get-Content -Path C:\Windows\Panther\setupact.log -First 1
$start.Substring(11,8)
This example extracts only the time from the log's first line.
In your case:
$start = "[06/13/19 08:52:58] Technology"
$start.Substring(20)
This removes the timestamp (the first 20 characters, including the trailing space).
Hope it helps! BR
Edit (see comment):
$content = Get-Content .\Desktop\test.txt
$newLine = ""
foreach($line in $content){
if($line.IndexOf("[") -eq 0){
$newLine += $line.Substring(20) + " "
$prev = $true
}
else{
if($line -ne ""){
Write-Host $newLine
$newLine = ""
}
write-host $line
$prev = $false
}
}

Keep lines from line X then delete others if does not contain pattern

I am trying to manipulate a text file. I want it to keep the first X lines, and after that it should look for a string pattern. If a line contains the pattern, it should be kept; otherwise it should be deleted.
I got both things to work separately but not together. It works to keep lines until X and remove the rest. And I got it to work to remove all lines except for lines with a pattern, but I can't get it to work for both together.
EDIT: here is the code:
$data = Get-Content test.md
$newdata = ""
$n = 0
Foreach ($line in $data) {
if ($n++ -ge 6) {
$newdata += $line | Where{$_ -match '\[R\]'}
} else {
$newdata += $line
}
$newdata += " `r`n"
}
$newdata > test2.md
The problem is the lines are still there as empty lines. But they should be completely deleted.
$data = Get-Content test.md
$newdata = ""
$n = 0
Foreach ($line in $data) {
if ($n++ -gt 6) {
if ($line -match '\[R\]') {
$newdata += $line + " `r`n"
}
} else {
$newdata += $line + " `r`n"
}
}
$newdata > test2.md
Got it to work like that.
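You could use:
"test.md" | % {
Get-Content $_ -TotalCount 6
# Skip the header lines so they are not output twice, then keep only matching lines
Get-Content $_ | Select-Object -Skip 6 | Where-Object { $_ -match '\[R\]' }
} | Out-File test2.md -Encoding Ascii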
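The same logic can be wrapped in a small function that works on an array of lines, which makes it easy to try without touching files. Select-HeaderAndMatches is a hypothetical name; it keeps the first HeaderCount lines unconditionally and, from the remaining lines, only those matching Pattern.

```powershell
function Select-HeaderAndMatches {
    param(
        [string[]]$Lines,
        [int]$HeaderCount,
        [string]$Pattern
    )
    # The first $HeaderCount lines are kept as-is.
    $Lines | Select-Object -First $HeaderCount
    # Every later line is kept only if it matches the pattern; others are dropped
    # entirely, so no empty lines are left behind.
    $Lines | Select-Object -Skip $HeaderCount | Where-Object { $_ -match $Pattern }
}
```

Usage would be along the lines of `Select-HeaderAndMatches -Lines (Get-Content test.md) -HeaderCount 6 -Pattern '\[R\]' | Set-Content test2.md`.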

Transpose rows to columns in PowerShell

I have a source file with the below contents:
0
ABC
1
181.12
2
05/07/16
4
Im4thData
5
hello
-1
0
XYZ
1
1333.21
2
02/02/16
3
Im3rdData
5
world
-1
...
In the list above, '-1' is the record separator, which marks the start of the next record. 0, 1, 2, 3, 4, 5, etc. are column identifiers (like column names).
This is my code:
$txt = Get-Content 'C:myfile.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
$arr = $_ -split '\r?\n'
$indexes = 1..$($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
$arr[$indexes] -join '|'
}
The above code produces this output:
ABC|181.12|05/07/16|Im4thData|hello
XYZ|1333.21|02/02/16|Im3rdData|World
...
But I need output like below. When a column is missing from a record, its field should be empty (||) in the output file. Please advise what change is needed in the code.
ABC|181.12|05/07/16||Im4thData|hello ← There is no 3rd column in the source file, so an empty field (||).
XYZ|1333.21|02/02/16|Im3rdData||World ← There is no 4th column in the source file, so an empty field (||).
...
If you know the maximum number of columns beforehand you could do something like this:
$cols = 6
$txt = Get-Content 'C:myfile.txt' | Out-String
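$txt -split '(?m)^-1\r?\n' | Where-Object { $_.Trim() } | ForEach-Object {
# initialize array of required size
$row = ,$null * $cols
# trim the record so a trailing newline doesn't produce an extra (empty) index/value pair
$arr = $_.Trim() -split '\r?\n'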
for ($n = 0; $n -lt $arr.Count; $n += 2) {
$i = [int]$arr[$n]
$row[$i] = $arr[$n+1]
}
$row -join '|'
}
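A sketch of the fixed-column approach as a reusable function, assuming (as above) that the number of columns is known in advance and is larger than the highest column index; otherwise the assignment throws an index-out-of-range error. ConvertTo-PipeRow is a hypothetical name, and the sample data is the two records from the question. Blank lines are filtered out before pairing, so a trailing newline cannot be misread as an index/value pair.

```powershell
function ConvertTo-PipeRow {
    param(
        [string[]]$RecordLines,   # alternating column-index / value lines of one record
        [int]$ColumnCount
    )
    $row = @($null) * $ColumnCount   # one slot per column, all empty to start
    $items = @($RecordLines | Where-Object { -not [string]::IsNullOrWhiteSpace($_) })
    for ($n = 0; $n + 1 -lt $items.Count; $n += 2) {
        $index = [int]($items[$n])
        $row[$index] = $items[$n + 1]
    }
    # $null slots become empty fields, which yields the "||" for missing columns.
    $row -join '|'
}

$sample = (@(
    '0', 'ABC', '1', '181.12', '2', '05/07/16', '4', 'Im4thData', '5', 'hello', '-1',
    '0', 'XYZ', '1', '1333.21', '2', '02/02/16', '3', 'Im3rdData', '5', 'world', '-1'
) -join "`n") + "`n"

# Split into records on "-1" lines, drop the empty chunk after the last separator.
$rows = $sample -split '(?m)^-1\r?\n' |
    Where-Object { $_.Trim() } |
    ForEach-Object { ConvertTo-PipeRow -RecordLines ($_ -split '\r?\n') -ColumnCount 6 }
$rows
```

For the real file, `$sample` would come from `Get-Content 'C:myfile.txt' -Raw`.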
Otherwise you could do something like this:
$txt = Get-Content 'C:myfile.txt' | Out-String
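$txt -split '(?m)^-1\r?\n' | Where-Object { $_.Trim() } | ForEach-Object {
# create empty array
$row = @()
# trim the record so a trailing newline doesn't produce an extra (empty) index/value pair
$arr = $_.Trim() -split '\r?\n'
# index of the last column written; -1 means none yet, so a missing column 0 is padded too
$k = -1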
for ($n = 0; $n -lt $arr.Count; $n += 2) {
$i = [int]$arr[$n]
# if index from record ($i) is greater than current index ($k) append
# required number of empty fields
for ($j = $k; $j -lt $i-1; $j++) { $row += $null }
$row += $arr[$n+1]
$k = $i
}
$row -join '|'
}
This needs quite a bit of processing. There might be a more efficient way to do this, but the code below does work.
$c = Get-Content ".\file.txt"
$rdata = @{}
$data = @()
$i = 0
# Parse the file into an array of key-value pairs
while ($i -lt $c.count) {
if($c[$i].trim() -eq '-1') {
$data += ,$rdata
$rdata = @{}
$i++
continue
}
$field = $c[$i].trim()
$value = $c[++$i].trim()
$rdata[$field] = $value
$i++
}
# Check if there are any missing values between 0 and the highest value and set to empty string if so
foreach ($row in $data) {
# keys are strings, so sort them numerically ("10" would otherwise sort before "2")
$top = [int]$($row.GetEnumerator() | Sort-Object { [int]$_.Name } -descending | select -First 1 -ExpandProperty Name)
for($i = 0; $i -lt $top; $i++) {
if ($row["$i"] -eq $null) {
$row["$i"] = ""
}
}
}
# Sort each hash by field order and join with pipe
$data | ForEach-Object { ($_.GetEnumerator() | Sort-Object { [int]$_.Name } | Select-Object -ExpandProperty Value) -join '|' }
In the while loop, we are just iterating over each line of the file. The field number and its value are on consecutive lines, so on each iteration we read both and add them to the hash.
If we encounter -1 then we know we have a record separator, so add the hash to an array, reset it, bump the counter to the next record and continue to the next iteration.
Once we've collected everything we need to check if there are any missing field values, so we grab the highest number from each hash, loop over it from 0 and fill any missing values with an empty string.
Once that is done you can then iterate the array, sort each hash by field number and join the values.

Find replace using PowerShell get-content

I am attempting to mask SSNs with random SSNs in a large text file. The file is about 400 MB (0.4 GB).
There are 17,000 instances of SSNs that I want to find and replace.
Here is an example of the PowerShell script I am using.
(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "123-45-6789", "666-66-6666"} | set-content C:\TrainingFile\TrainingFile.txt
My problem is that I have 17,000 lines like this in a .ps1 file. The .ps1 file looks similar to this:
(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "123-45-6789", "666-66-6666"} | set-content C:\TrainingFile\TrainingFile.txt
(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "122-45-6789", "666-66-6668"} | set-content C:\TrainingFile\TrainingFile.txt
(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "223-45-6789", "666-66-6667"} | set-content C:\TrainingFile\TrainingFile.txt
(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "123-44-6789", "666-66-6669"} | set-content C:\TrainingFile\TrainingFile.txt
There are 17,000 such PowerShell commands in the .ps1 file, one command per line.
I tested just one command and it took about 15 seconds to execute. Doing the math, 17,000 X 15 seconds comes to 255,000 seconds, or about 3 days to run my .ps1 script of 17,000 commands.
Is there a faster way to do this?
The reason for the poor performance is that a lot of extra work is being done. Let's look at the process as pseudocode:
select SSN (X) and masked SSN (X') from a list
read all rows from file
look each file row for string X
if found, replace with X'
save all rows to file
loop until all SSNs are processed
So what's the problem? For each SSN replacement, you process all the rows: not only those that need masking but also those that don't. That's a lot of extra work. If you have, say, 100 rows and 10 replacements, you use 1000 steps when only 100 are needed. In addition, reading and saving the file creates disk IO. While that's not usually an issue for a single operation, multiply the IO cost by the loop count and you'll find a lot of time wasted waiting on the disk.
For much better performance, restructure the algorithm like so:
read all rows from file
loop through rows
for current row, change X -> X'
save the result
Why should this be faster? 1) You read and save the file only once, and disk IO is slow. 2) You process each row only once, so no extra work is done. As for how to actually perform the X -> X' transform, you need to define more carefully what the masking rule is.
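The restructured algorithm can be sketched as a single pass: one regex finds every SSN-shaped token on a line, and a MatchEvaluator looks each one up in a hashtable built beforehand. Convert-SsnLine is a hypothetical name; leaving SSNs that have no entry in the map unchanged is a design choice of this sketch, and lines without an SSN pass through as-is. Unlike a single `-match`, this handles several SSNs on the same line.

```powershell
function Convert-SsnLine {
    param(
        [string[]]$Lines,
        [hashtable]$Map   # real SSN -> replacement SSN
    )
    $ssnPattern = [regex]'\b\d{3}-\d{2}-\d{4}\b'
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        # Use the mapped value when one exists; otherwise keep the original text.
        if ($Map.ContainsKey($m.Value)) { $Map[$m.Value] } else { $m.Value }
    }
    foreach ($line in $Lines) {
        # Every SSN on the line is looked up independently in one Replace call.
        $ssnPattern.Replace($line, $evaluator)
    }
}
```

For the 400 MB file, reading in chunks should keep memory use down, e.g. `Get-Content $inputFile -ReadCount 1000 | ForEach-Object { Convert-SsnLine -Lines $_ -Map $map } | Set-Content $outputFile`, writing to a different file than the one being read.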
Edit
Here's a more practical solution:
Since you already know the f(X) -> X' results, you should have a pre-calculated list saved to disk like so:
ssn,mask
"123-45-6789","666-66-6666"
...
"223-45-6789","666-66-6667"
Import the file into a hash table and work forward by stealing all the juicy bits from Ansgar's answer like so,
$ssnMask = @{}
$ssn = import-csv "c:\temp\SSNMasks.csv" -delimiter ","
# Add X -> X' to hashtable
$ssn | % {
if(-not $ssnMask.ContainsKey($_.ssn)) {
# It's an error to add existing key, so check first
$ssnMask.Add($_.ssn, $_.mask)
}
}
$dataToMask = get-content "c:\temp\training.txt"
$dataToMask | % {
if ( $_ -match '(\d{3}-\d{2}-\d{4})' ) {
# Replace SSN look-a-like with value from hashtable
# NB: This simply removes SSNs that don't have a match in hashtable
$_ -replace $matches[1], $ssnMask[$matches[1]]
} else {
# No SSN on this line: output it unchanged
$_
}
} | set-content "c:\temp\training2.txt"
Avoid reading and writing the file multiple times. I/O is expensive and is what slows your script down. Try something like this:
$filename = 'C:\TrainingFile\TrainingFile.txt'
$ssnMap = @{}
(Get-Content $filename) | % {
if ( $_ -match '(\d{3}-\d{2}-\d{4})' ) {
# If SSN is found, check if a mapping of that SSN to a random SSN exists.
# Otherwise create a new mapping.
if ( -not $ssnMap.ContainsKey($matches[1]) ) {
do {
$rnd = Get-Random -Min 100000 -Max 999999
$newSSN = "666-$($rnd -replace '(..)(....)','$1-$2')"
} while ( $ssnMap.ContainsValue($newSSN) ) # loop to avoid collisions
$ssnMap[$matches[1]] = $newSSN
}
# Replace the SSN with the corresponding randomly generated SSN.
$_ -replace $matches[1], $ssnMap[$matches[1]]
} else {
# If no SSN is found, simply print the line.
$_
}
} | Set-Content $filename
If you already have a list of random SSNs and also have them mapped to specific "real" SSNs, you could read those mappings from a CSV (example column titles: realSSN, randomSSN) into the $ssnMap hashtable:
$ssnMap = @{}
Import-Csv 'C:\mappings.csv' | % { $ssnMap[$_.realSSN] = $_.randomSSN }
If you've already generated a list of random SSNs for replacement, and each SSN in the file just needs to be replaced with one of them (not necessarily mapped to a specific replacement string), I think this will be much faster:
$inputfile = 'C:\TrainingFile\TrainingFile.txt'
$outputfile = 'C:\TrainingFile\NewTrainingFile.txt'
$replacements = Get-Content 'C:\TrainingFile\SSN_Replacements.txt'
$i=0
Filter Replace-SSN { $_ -replace '\d{3}-\d{2}-\d{4}',$replacements[$i++] }
Get-Content $inputfile |
Replace-SSN |
Set-Content $outputfile
This walks through your list of replacement SSNs, taking the next one in the list for each line that is processed.
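In the filter above, `$replacements[$i++]` is evaluated once per line, so all SSNs on one line get the same replacement and the index also advances on lines without an SSN. A hypothetical per-match variant moves the index into a MatchEvaluator so that each SSN consumes the next replacement; it throws if the list runs out rather than silently inserting nothing.

```powershell
function Convert-SsnSequential {
    param(
        [string[]]$Lines,
        [string[]]$Replacements
    )
    $ssnPattern = [regex]'\d{3}-\d{2}-\d{4}'
    # Shared position in $Replacements; a hashtable so the evaluator can advance it.
    $state = @{ Next = 0 }
    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        if ($state.Next -ge $Replacements.Count) {
            throw "Ran out of replacement SSNs at '$($m.Value)'"
        }
        $value = $Replacements[$state.Next]
        $state.Next += 1
        $value
    }
    foreach ($line in $Lines) {
        # Lines without an SSN come back unchanged and do not consume a replacement.
        $ssnPattern.Replace($line, $evaluator)
    }
}
```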
Edit:
Here's a solution for mapping specific SSNs to specific replacement strings. It assumes you have a CSV file of the original SSNs and their intended replacement strings, as columns 'OldSSN' and 'NewSSN':
$inputfile = 'C:\TrainingFile\TrainingFile.txt'
$outputfile = 'C:\TrainingFile\NewTrainingFile.txt'
$replacementfile = 'C:\TrainingFile\SSN_Replacements.csv'
$SSNmatch = [regex]'\d{3}-\d{2}-\d{4}'
$replacements = @{}
Import-Csv $replacementfile |
ForEach-Object { $replacements[$_.OldSSN] = $_.NewSSN }
Get-Content $inputfile -ReadCount 1000|
ForEach-Object {
foreach ($Line in $_){
if ( $Line -match $SSNmatch ) #Found SSN in line
{ if ( $replacements.ContainsKey($matches[0]) ) #Found replacement string for this SSN
{ $Line -replace $SSNmatch,$replacements[$matches[0]] } #Replace SSN and output line
else {Write-Warning "Warning - no replacement string found for $($matches[0])"
}
}
else { $Line } #No SSN in this line - output line as-is
}
} | Set-Content $outputfile
# Fairly fast PowerShell code for masking up to 1000 SSNs per line in a large text file (with an unlimited number of lines in the file) where the SSN matches the pattern " ###-##-#### ", " ##-####### ", or " ######### ".
# This code can handle a 14 MB text file that has SSN numbers in nearly every row within about 4 minutes.
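# $inputFilename = 'C:/InputFile.txt'
# For testing, the sample text below is assigned directly instead of a file name;
# to process a real file, set the path above and use the commented-out Get-Content line further down.
$inputFileName = "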
1
0550 125665 338066
- 02 CR05635 07/06/16
0 SAMPLE CUSTOMER NAME
PO BOX 12345
ROSEVILLE CA 12345-9109
EMPLOYEE DEFERRALS
FREDDIE MAC RO 16 9385456 164-44-9120 XXX
SALLY MAE RO 95 9385356 07-4719130 XXX
FRED FLINTSTONE RO 95 1185456 061741130 XXX
WILMA FLINTSTONE RO 91 9235456 364-74-9130 123456789 123456389 987354321 XXX
PEBBLES RUBBLE RO 10 9235456 06-3749130 064-74-9150 034-74-9130 XXX
BARNEY RUBBLE RO 11 9235456 06-3449130 06-3749140 063-74-9130 XXX
BETTY RUBBLE RO 16 9235456 9-74-9140 123456789 123456789 987654321 XXX
PLEASE ENTER BELOW ANY ADDITIONAL PARTICIPANTS FOR WHOM YOU ARE
REMITTING. FOR GENERAL INFORMATION AND SERVICE CALL
"
$outputFilename = 'D:/OutFile.txt'
#(Get-Content $inputFilename ) | % {
($inputFilename ) | % {
$NewLine=$_
# Write-Host "0 new line value is ($NewLine)."
$ChangeFound='Y'
$WhileCounter=0
While (($ChangeFound -eq 'Y') -and ($WhileCounter -lt 1000))
{
$WhileCounter=$WhileCounter+1
$ChangeFound='N'
$matches = $NewLine | Select-String -pattern "[ ][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
If ($matches.length -gt 0)
{
$ChangeFound='Y'
$NewLine=''
for($i = 0; $i -lt 1; $i++){
for($k = 0; $k -lt 1; $k++){
# Write-Host "AmHere 1a `$i ($i), `$k ($k), `$NewLine ($NewLine)."
$t = $matches[$i] -replace $matches[$i].matches[$k].value, (" ###-##-" + $matches[$i].matches[$k].value.substring(8) )
$NewLine=$NewLine + $t
# Write-Host "AmHere 1b `$i ($i), `$k ($k), `$NewLine ($NewLine)."
}
}
# Write-Host "1 new line value is ($NewLine)."
}
$matches = $NewLine | Select-String -pattern "[ ][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
If ($matches.length -gt 0)
{
$ChangeFound='Y'
$NewLine=''
for($i = 0; $i -lt 1; $i++){
for($k = 0; $k -lt 1; $k++){
# Write-Host "AmHere 2a `$i ($i), `$k ($k), `$NewLine ($NewLine)."
$t = $matches[$i] -replace $matches[$i].matches[$k].value, (" ##-###" + $matches[$i].matches[$k].value.substring(7) )
$NewLine=$NewLine + $t
# Write-Host "AmHere 2b `$i ($i), `$k ($k), `$NewLine ($NewLine)."
}
}
# Write-Host "2 new line value is ($NewLine)."
}
$matches = $NewLine | Select-String -pattern "[ ][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
If ($matches.length -gt 0)
{
$ChangeFound='Y'
$NewLine=''
for($i = 0; $i -lt 1; $i++){
for($k = 0; $k -lt 1; $k++){
# Write-Host "AmHere 3a `$i ($i), `$k ($k), `$NewLine ($NewLine)."
$t = $matches[$i] -replace $matches[$i].matches[$k].value, (" #####" + $matches[$i].matches[$k].value.substring(6) )
$NewLine=$NewLine + $t
# Write-Host "AmHere 3b `$i ($i), `$k ($k), `$NewLine ($NewLine)."
}
}
#print the line
# Write-Host "3 new line value is ($NewLine)."
}
# Write-Host "4 new line value is ($NewLine)."
} # end of DoWhile
# Write-Host "5 new line value is ($NewLine)."
$NewLine
# Replace the SSN with the corresponding randomly generated SSN.
# $_ -replace $matches[1], $ssnMap[$matches[1]]
} | Set-Content $outputFilename
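The three masking rules above can also be expressed as three -replace calls with zero-width lookarounds, which may remove the need for the while loop: `(?<=\s)` and `(?!\S)` do not consume the surrounding whitespace, so adjacent numbers separated by a single space are all matched in one pass. Hide-Ssn is a hypothetical name, and this is a sketch of an alternative, not a drop-in equivalent: unlike the original, it also treats the end of the string as a valid delimiter.

```powershell
function Hide-Ssn {
    param([string]$Text)
    # Each rule masks the leading digits of one SSN layout and keeps the last four.
    # ###-##-####  ->  ###-##-1234
    $Text = $Text -replace '(?<=\s)\d{3}-\d{2}-(?=\d{4}(?!\S))', '###-##-'
    # ##-#######   ->  ##-###1234
    $Text = $Text -replace '(?<=\s)\d{2}-\d{3}(?=\d{4}(?!\S))', '##-###'
    # #########    ->  #####1234
    $Text -replace '(?<=\s)\d{5}(?=\d{4}(?!\S))', '#####'
}

$masked = Hide-Ssn -Text "FREDDIE MAC RO 16 9385456 164-44-9120 XXX`nSALLY MAE RO 95 9385356 07-4719130 XXX`nWILMA RO 91 9235456 123456789 987354321 XXX"
$masked
```

For a file, `Hide-Ssn -Text (Get-Content 'C:/InputFile.txt' -Raw) | Set-Content 'D:/OutFile.txt'` should work; with -Raw the file is one string, so every line start except the very first is preceded by a newline and counts as whitespace.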