Duplicate lines in a text file multiple times based on a string and alter duplicated lines - powershell

SHORT: I am trying to duplicate lines in all files in a folder based on a certain string and then replace original strings in duplicated lines only.
Contents of the original text file (there are double quotes in the file):
"K:\FILE1.ini"
"K:\FILE1.cfg"
"K:\FILE100.cfg"
I want to duplicate the entire line 4 times only if a string ".ini" is present in a line.
After duplicating the line, I want to change the string in those duplicated lines (original line stays the same) to: for example, ".inf", ".bat", ".cmd", ".mov".
So the expected result of the script is as follows:
"K:\FILE1.ini"
"K:\FILE1.inf"
"K:\FILE1.bat"
"K:\FILE1.cmd"
"K:\FILE1.mov"
"K:\FILE1.cfg"
"K:\FILE100.cfg"
Those files are small, so using streams is not necessary.
I am at the beginning of my PowerShell journey, but thanks to this community, I already know how to replace a string in files recursively:
$directory = "K:\PS"
Get-ChildItem $directory -file -recurse -include *.txt |
ForEach-Object {
(Get-Content $_.FullName) -replace ".ini",".inf" |
Set-Content $_.FullName
}
but I have no idea how to duplicate certain lines multiple times and handle multiple string replacements in those duplicated lines.
Yet ;)
Could you point me in the right direction?

To achieve this with the operator -replace you can do:
#Define strings to replace pattern with
$2replace = @('.inf','.bat','.cmd','.mov','.ini')
#Get files, use filter instead of include = faster
get-childitem -path [path] -recurse -filter '*.txt' | %{
$cFile = $_
#add new strings to array newData
$newData = @(
#Read file
get-content $_.fullname | %{
#If line matches .ini
If ($_ -match '\.ini'){
$cstring = $_
#Add new strings
$2replace | %{
#Output new strings
$cstring -replace '\.ini',$_
}
}
#output current string
Else{
$_
}
}
)
#Write to disk
$newData | set-content $cFile.fullname
}
This gives you the following output:
$newdata
"K:\FILE1.inf"
"K:\FILE1.bat"
"K:\FILE1.cmd"
"K:\FILE1.mov"
"K:\FILE1.ini"
"K:\FILE1.cfg"
"K:\FILE100.cfg"
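Note that the output above puts the original .ini line last, whereas the question asked for it first. Below is a minimal sketch of a variant that keeps each original line in place and emits the copies right after it. Expand-IniLine and its parameter names are hypothetical, chosen only for this illustration.

```powershell
# Hypothetical helper: returns every input line, followed by one copy per
# extension for each line that contains ".ini".
function Expand-IniLine {
    param(
        [string[]]$Lines,
        [string[]]$Extensions = @('.inf', '.bat', '.cmd', '.mov')
    )
    foreach ($line in $Lines) {
        # The original line always comes first and is never altered
        $line
        if ($line -match '\.ini') {
            foreach ($ext in $Extensions) {
                # Escape the dot so it matches a literal "." only
                $line -replace '\.ini', $ext
            }
        }
    }
}

# Usage sketch, assuming $directory from the question:
# Get-ChildItem -Path $directory -Recurse -Filter '*.txt' | ForEach-Object {
#     $new = @(Expand-IniLine -Lines (Get-Content $_.FullName))
#     Set-Content -Path $_.FullName -Value $new
# }
```

Collecting the result into $new before calling Set-Content ensures the file has been read completely before it is overwritten.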

Related

Powershell update string in each file within all sub-folders

I have a set of config files stored in each subfolder within a directory. These config files only contain a single string in the format XXX_YYYYMMDD where XXX is a number e.g. 006, 007 etc, so an example string would be 006_20150101. I want the powershell script to replace the XXX number with a new one in each of these config files. I'm using the below script to achieve that and it works fine. However, the issue is that it puts a new line character (ENTER) at the end of the string which I don't want. Any way to fix this?
$sourceDir = "C:\Users\001"
$configFiles = Get-ChildItem $sourceDir *.dat -rec
foreach ($file in $configFiles)
{
(Get-Content $file.PSPath) |
Foreach-Object { $_ -replace "006", "007" } |
Set-Content $file.PSPath
}
By default, Set-Content appends a newline after the content. Use -NoNewline (available since PowerShell 5.0) to suppress it:
Set-Content -path $file.PSPath -NoNewline
I don't know whether this fits your case, but you can use a regex replace to match the first three digits of the string:
$regex = "^\d{3}"
# matches any 3 digits("\d{3}") at the beginning("^") of a string
"124_20201030" -replace $regex, "007"
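The two suggestions can be combined. The following is only a rough sketch: Update-Prefix is a hypothetical helper name, and the commented usage assumes PowerShell 5.0 or later for -NoNewline as well as the $sourceDir variable from the question.

```powershell
# Hypothetical helper: swaps the three leading digits of a string
# for a new prefix and leaves the rest untouched.
function Update-Prefix {
    param(
        [string]$Text,
        [string]$NewPrefix
    )
    # ^\d{3} matches exactly three digits at the start of the string
    $Text -replace '^\d{3}', $NewPrefix
}

# Usage sketch:
# Get-ChildItem $sourceDir *.dat -Recurse | ForEach-Object {
#     $new = Update-Prefix -Text (Get-Content $_.PSPath -Raw) -NewPrefix '007'
#     Set-Content -Path $_.PSPath -Value $new -NoNewline
# }
```

Anchoring the pattern at the start means a "006" that happens to appear inside the date part is not replaced, unlike the plain -replace "006", "007" in the question.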

insert blank line before matching pattern in multiple files using powershell

Requirement is to insert a blank line in multiple files before the matching pattern line
Consider a file with below contents
Apple
Tree
orange
[Fruit]
Red
Green
Expected output:
Apple
Tree
orange

[Fruit]
Red
Green
I tried the code below. Please help me figure out the mistake in it.
$FileName = Get-ChildItem -Filter *.ini -Recurse
$Pattern = "\[Fruit]\"
[System.Collections.ArrayList]$file = Get-Content $FileName
$insert = @()
for ($i=0; $i -lt $file.count; $i++) {
if ($file[$i] -match $pattern) {
$insert += $i #Record the position of the line before this one
}
}
#Now loop the recorded array positions and insert the new text
$insert | Sort-Object -Descending | ForEach-Object { $file.insert($_," ") }
Set-Content $FileName $file
The above code works fine for a single file, but with multiple files the contents of the files are repeated.
Re: how to make this work for multiple files...
$FileName = Get-ChildItem -Filter *.ini -Recurse
If there is only one .ini file then $FileName will be a single file.
The use of the wildcard and -Recurse switch suggests that you are expecting to find multiple files; thus this command will assign that collection of files to the $FileName variable (i.e. it will be an array).
Notice that when you call Get-Content you pass $FileName:
[System.Collections.ArrayList]$file = Get-Content $FileName
This won't work when $FileName is a collection/array of files.
What you need to do is put a loop in place that will perform your "insert a line break" logic foreach (hint hint) of the files in the array. NOW go and look at those PS tutorials again...
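Regex character class
Square brackets denote a character class in regex, so each one needs a preceding backslash to be matched literally. In your pattern the second backslash comes after the closing bracket instead of before it: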
$Pattern = "\[Fruit\]"
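For reference, the per-file loop hinted at above could look roughly like this. It is a sketch rather than the only way to do it: Add-BlankLineBefore is a hypothetical helper name, and it emits an empty string where the question's code inserted a single space.

```powershell
# Hypothetical helper: returns the input lines with an empty line
# placed before every line that matches the pattern.
function Add-BlankLineBefore {
    param(
        [string[]]$Lines,
        [string]$Pattern
    )
    foreach ($line in $Lines) {
        if ($line -match $Pattern) {
            ''      # the blank line goes out first
        }
        $line
    }
}

# Usage sketch: handle each .ini file separately instead of passing
# the whole collection of files to Get-Content at once.
# Get-ChildItem -Filter *.ini -Recurse | ForEach-Object {
#     $new = @(Add-BlankLineBefore -Lines (Get-Content $_.FullName) -Pattern '\[Fruit\]')
#     Set-Content -Path $_.FullName -Value $new
# }
```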

Read numbers from multiple files and sum

I have a logfile C:\temp\data.log
It contains the following data:
totalSize = 222,6GB
totalSize = 4,2GB
totalSize = 56,2GB
My goal is to extract the numbers from the file and sum them, including the digits after the decimal comma. So far it only works if my regex ignores the part after the comma and uses just the number in front of it. The other problem is that if the file contains only one line, like the example below, the number 222 is split across three files, each containing a single 2. If the log file contains two or more lines, it works and sums up as it should, as long as I don't include the value after the comma.
totalSize = 222,6GB
Here is the regex fragment I would append to the existing $regex variable to include the value after the comma:
[,](\d{1,})
I haven't included the above regex, as it does not sum up properly then.
The whole script is below:
#Create path variable to store contents grabbed from $log_file
$extracted_strings = "C:\temp\amount.txt"
#Create path variable to read from original file
$log_file = "C:\temp\data.log"
#Read data from file $log_file
Get-Content -Path $log_file | Select-String "(totalSize = )" | out-file $extracted_strings
#Create path variable to write only numbers to file $output_numbers
$output_numbers = "C:\temp\amountresult.log"
#Create path variable to write to file jobblog1
$joblog1_file = "C:\temp\joblog1.txt"
#Create path variable to write to file jobblog2
$joblog2_file = "C:\temp\joblog2.txt"
#Create path variable to write to file jobblog3
$joblog3_file = "C:\temp\joblog3.txt"
#Create path variable to write to file jobblog4
$joblog4_file = "C:\temp\joblog4.txt"
#Create path variable to write to file jobblog5
$joblog5_file = "C:\temp\joblog5.txt"
#Create pattern variable to read with select string
$regex = "[= ](\d{1,})"
select-string -Path $extracted_strings -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_numbers
(Get-Content -Path $output_numbers)[0..0] -replace '\s' > $joblog1_file
(Get-Content -Path $output_numbers)[1..1] -replace '\s' > $joblog2_file
(Get-Content -Path $output_numbers)[2..2] -replace '\s' > $joblog3_file
(Get-Content -Path $output_numbers)[3..3] -replace '\s' > $joblog4_file
(Get-Content -Path $output_numbers)[4..4] -replace '\s' > $joblog5_file
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
$result
So my questions are:
How can I get this to work if the file C:\temp\data.log only contains one string without dividing that single number into multiple files. It should also work if it contains multiple strings, as it is now it works with multiple strings.
And how can I include the comma values in the calculation?
The result I get if I run this script should be 282. Maybe it's even possible to shorten the script?
Where $log_file has contents like the example above.
Get-Content $log_file | Where-Object{$_ -match "\d+(,\d+)?"} |
ForEach-Object{[double]($matches[0] -replace ",",".")} |
Measure-Object -Sum |
Select-Object -ExpandProperty sum
Match all of the lines that have numerical values, with an optional comma part. I am treating the comma part as optional because I do not know how whole numbers appear. Replace the comma with a period and cast to a double. Measure-Object then sums up all the values, and Select-Object expands the result.
Not the only way to do it but it is simple enough to understand what is going on.
You can always wrap the above up in a loop so that you can use it for multiple files. Get-ChildItem "C:\temp\" -Filter "job*" | ForEach-Object... etc.
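The same idea can be packaged as a small function that is easy to try on sample lines. This is a sketch under assumptions: Get-TotalSize is a hypothetical name, and the pattern assumes every value of interest follows "totalSize =" as in the sample log.

```powershell
# Hypothetical helper: sums the decimal-comma numbers that follow
# "totalSize =" in the given lines.
function Get-TotalSize {
    param([string[]]$Lines)
    $values = foreach ($line in $Lines) {
        # Group 1 holds the number, with an optional ",digits" part
        if ($line -match 'totalSize\s*=\s*(\d+(?:,\d+)?)') {
            # Turn the decimal comma into a period before converting
            [double]($Matches[1] -replace ',', '.')
        }
    }
    ($values | Measure-Object -Sum).Sum
}

# Usage sketch, assuming $log_file from the question:
# Get-TotalSize -Lines (Get-Content $log_file)
```

Because the pattern captures the whole number as one token, a file with a single line yields a single value rather than several separate digits.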
Matt's helpful answer shows a concise and effective solution.
As for what you tried, and why a line with a single token such as 222,6 can result in multiple outputs in this command:
select-string -Path $extracted_strings -Pattern $regex -AllMatches |
% { $_.Matches } | % { $_.Value } > $output_numbers
Your regex, [= ](\d{1,}), does not explain the symptom, but just \d{1,} would, because that would capture 222 and 6 separately, due to -AllMatches.
[= ](\d{1,}) probably doesn't do what you want, because [= ] matches a single character that can be either a = or a space; with your sample input, this would only ever match the space before the numbers.
To match characters in sequence, simply place them next to each other: = (\d{1,})
Also note that even though you're enclosing \d{1,} in (...) to create a capture group, your later code doesn't actually use what that capture group matched; use (...) only if you need it for precedence (in which case you can even opt out of subexpression capturing with (?:...)) or if you do have a need to access what the subexpression matched.
That said, you could actually utilize a capture group here (an alternative would be to use a look-behind assertion), which allows you to both match the leading =<space> for robustness and extract only the numeric token of interest (saving you the need to trim whitespace later).
If we simplify \d{1,} to \d+ and append ,\d+ to also match the number after the comma, we get:
= (\d+,\d+)
The [System.Text.RegularExpressions.Match] instances returned by Select-String then allow us to access what the capture group captured, via the .Groups property (the following simplified example also works with multiple input lines):
> 'totalSize = 222,6GB' | Select-String '= (\d+,\d+)' | % { $_.Matches.Groups[1].Value }
222,6
On a side note: your code contains a lot of repetition that could be eliminated with arrays and pipelines; for instance:
$joblog1_file = "C:\temp\joblog1.txt"
$joblog2_file = "C:\temp\joblog2.txt"
$joblog3_file = "C:\temp\joblog3.txt"
$joblog4_file = "C:\temp\joblog4.txt"
$joblog5_file = "C:\temp\joblog5.txt"
could be replaced with (create an array of filenames, using a pipeline):
$joblog_files = 1..5 | % { "C:\temp\joblog$_.txt" }
and
$jobdata0 = (Get-Content -Path $joblog1_file)
$jobdata1 = (Get-Content -Path $joblog2_file)
$jobdata2 = (Get-Content -Path $joblog3_file)
$jobdata3 = (Get-Content -Path $joblog4_file)
$jobdata4 = (Get-Content -Path $joblog5_file)
$result = $jobdata0 + $jobdata1 + $jobdata2 + $jobdata3 + $jobdata4
could then be replaced with (pass the array of filenames to Get-Content):
$result = Get-Content $joblog_files

Powershell - reading ahead and While

I have a text file in the following format:
.....
ENTRY,PartNumber1,,,
FIELD,IntCode,123456
...
FIELD,MFRPartNumber,ABC123,,,
...
FIELD,XPARTNUMBER,ABC123
...
FIELD,InternalPartNumber,3214567
...
ENTRY,PartNumber2,,,
...
...
The ... indicates there is other data between these fields. The ONLY thing I can be certain of is that the field starting with ENTRY is a new set of records. The rows starting with FIELD can be in any order, and not all of them may be present in each group of data.
I need to:
Read in a chunk of data.
Search for any field matching the string ABC123.
If ABC123 is found, search for the existence of the InternalPartNumber field and return that row of data.
I have not seen a way to use Get-Content that can read in a variable number of rows as a set & be able to search it.
Here is the code I currently have, which will read a file, searching for a string & replacing it with another. I hope this can be modified to be used in this case.
$ftype = "*.txt"
$fnames = gci -Path $filefolder1 -Filter $ftype -Recurse|% {$_.FullName}
$mfgPartlist = Import-Csv -Path "C:\test\mfrPartList.csv"
foreach ($file in $fnames) {
$contents = Get-Content -Path $file
foreach ($partnbr in $mfgPartlist) {
$oldString = $partnbr.OldValue
$newString = $partnbr.NewValue
if (Select-String -Path $file -SimpleMatch $oldString -Debug -Quiet) {
$stringData = $contents -imatch $oldString
$stringData = $stringData -replace "[\n\r]","|"
foreach ($dataline in $stringData) {
$file +"|"+$stringData+"|"+$oldString+"|"+$newString|Out-File "C:\test\Datachanges.txt" -Width 2000 -Append
}
$contents = $contents -replace $oldString, $newString
Set-Content -Path $file -Value $contents
}
}
}
Is there a way to read & search a text file in "chunks" using Powershell? Or to do a Read-ahead & determine what to search?
Assuming your file isn't too big to read into memory all at once:
$Text = Get-Content testfile.txt -Raw
($Text -split '(?ms)^(?=ENTRY)') |
foreach {
if ($_ -match '(?ms)^FIELD\S+ABC123')
{$_ -replace '(?ms).+(^Field\S+InternalPartNumber.+?$).+','$1'}
}
FIELD,InternalPartNumber,3214567
That reads the entire file in as a single multiline string, and then splits it at the beginning of any line that starts with 'ENTRY'. Then it tests each segment for a FIELD line that contains 'ABC123', and if it does, removes everything except the FIELD line for the InternalPartNumber.
This is not my best work as I have just got back from vacation. You could use a while loop reading the text and set an entry flag to gobble up the text in chunks. However if your files are not too big then you could just read up the text file at once and use regex to split up the chunks and then process accordingly.
$pattern = "ABC123"
$matchedRowToReturn = "InternalPartNumber"
$fileData = Get-Content "d:\temp\test.txt" | Where-Object{$_ -match '^(entry|field)'} | Out-String
$parts = $fileData | Select-String '(?smi)(^Entry).*?(?=^Entry|\Z)' -AllMatches | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
$parts | Where-Object{$_ -match $pattern} | Select-String "$matchedRowToReturn.*$" | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Value
What this will do is read in the text file, drop any lines that are not entry or field related, join the rest into one long string, and split that up into chunks that start with lines beginning with the word "Entry".
Then we drop those "parts" that do not contain the $pattern. From the remaining parts, we extract the InternalPartNumber line and output it.
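The line-by-line alternative mentioned earlier (collecting lines until the next ENTRY row) could be sketched as follows. Find-PartRow and its parameter names are hypothetical. The sketch uses a foreach loop with a sentinel line in place of a while loop with a flag, which is a simplification on my part.

```powershell
# Hypothetical helper: groups lines into chunks that each start at an
# ENTRY row, then returns the requested FIELD row of every chunk in
# which some FIELD row contains the search text.
function Find-PartRow {
    param(
        [string[]]$Lines,
        [string]$Pattern = 'ABC123',
        [string]$FieldName = 'InternalPartNumber'
    )
    $chunk = @()
    # The trailing sentinel ENTRY row forces the last chunk to be checked
    foreach ($line in (@($Lines) + 'ENTRY,')) {
        if ($line -match '^ENTRY') {
            # -match against an array returns the matching elements
            if ($chunk -match "^FIELD,.*$([regex]::Escape($Pattern))") {
                $chunk -match "^FIELD,$([regex]::Escape($FieldName)),"
            }
            $chunk = @()    # start collecting the next chunk
        }
        $chunk += $line
    }
}

# Usage sketch, with the file path from the answer above:
# Find-PartRow -Lines (Get-Content 'd:\temp\test.txt')
```

Only one chunk is held at a time, although Get-Content in the usage line still reads the whole file first. For truly large files the lines would need to be streamed instead.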

find and delete lines without string pattern in text files

I'm trying to find out how to use powershell to find and delete lines without certain string pattern in a set of files. For example, I have the following text file:
111111
22x222
333333
44x444
This needs to be turned into:
22x222
44x444
given that the string pattern of 'x' is not in any of the other lines.
How can I issue such a command in powershell to process a bunch of text files?
thanks.
dir | foreach { $out = cat $_ | select-string x; $out | set-content $_ }
The dir command lists the files in the current directory; the foreach goes through each file; cat reads the file and pipes into select-string; select-string finds the lines that contains the specific pattern, which in this case is "x"; the result of select-string is stored in $out; and finally, $out is written to the same file with set-content.
We need the temporary variable $out because you cannot read and write the same file at the same time.
This will process all txt files from the working directory. Each file content is checked and only lines that have 'x' in them are allowed to pass on. The result is written back to the file.
Get-ChildItem *.txt | ForEach-Object{
$content = Get-Content $_.FullName | Where-Object {$_ -match 'x'}
$content | Out-File $_.FullName
}
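Both answers rely on the file being read completely before it is written back. A minimal sketch that makes this explicit, using a hypothetical helper name (Remove-NonMatchingLine) and Set-Content in place of Out-File:

```powershell
# Hypothetical helper: rewrites a file so that only the lines matching
# the pattern remain.
function Remove-NonMatchingLine {
    param(
        [string]$Path,
        [string]$Pattern
    )
    # The assignment completes before Set-Content runs, so the file is
    # no longer being read when it gets overwritten.
    $kept = @(Get-Content -LiteralPath $Path | Where-Object { $_ -match $Pattern })
    Set-Content -LiteralPath $Path -Value $kept
}

# Usage sketch for all .txt files in the working directory:
# Get-ChildItem *.txt | ForEach-Object {
#     Remove-NonMatchingLine -Path $_.FullName -Pattern 'x'
# }
```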