PowerShell: modifying text

OK, I give up! Why doesn't this work? I'm just trying to loop through a CSV file and replace any value in the nth column with some value.
$source = "C:\blah.csv"
(gc $source) | foreach{ $_.Split(',')[10] = 'something'} | sc $source

Basically, what you are trying to do is something like this:
$s = 'a,b,c,d,e'
$s.Split(',')[4] = 'something'
$s
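That won't work, because the array returned by Split is never stored in a variable: the assignment changes a temporary array that is discarded right away, so $s stays the same.
I would either read the file as CSV (via Import-Csv) or, if its structure is really simple, use a regex: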
$s = 'c,d,e', 'c,d,x', 'z,f,d'
$s | % { $_ -replace '(?<=([^,]*,){2}).*','something'}
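If you prefer to stay close to the original attempt, a minimal sketch of the split-and-rejoin variant follows. The sample rows and the $fields and $result names are only illustrative; the point is that the split array is kept in a variable, changed, and then joined back into a line.

```powershell
# Sample rows standing in for the lines of the CSV file (hypothetical data)
$rows = 'a,b,c,d,e', 'f,g,h,i,j'

$result = $rows | ForEach-Object {
    $fields = $_.Split(',')     # keep the split array in a variable
    $fields[4] = 'something'    # the assignment now changes an array we still hold
    $fields -join ','           # emit the rebuilt line to the pipeline
}
$result
```

For a real file, the same script block could sit between (Get-Content $source) and Set-Content $source, with the index adjusted to the column you need.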

Related

Changing multiple lines in a text file based on a psobject

I'm working on a script which will add some additional information to a txt file. This information is stored in a CSV file which looks like this (the data will differ each time the script is launched):
Number;A;B;ValueOfB
FP01340/05/20;0;1;GTU_01,GTU_03
FP01342/05/20;1;0;GTU01
The txt file looks like this (data inside will of course differ each time):
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
2|23|157,91|194,23|36,32
1|1|FP01341/05/20|2020-05-02|2020-05-02|2020-05-02|12,19|14,99|2,80|Some info |2222222|blabla|11-111 something||||
2|23|12,19|14,99|2,80
1|1|FP01342/05/20|2020-05-02|2020-05-02|2020-05-02|525,36|589,64|64,28|bla|222222|blba 36||62030|something||
2|5|213,93|224,63|10,70
2|8|120,34|129,97|9,63
2|23|191,09|235,04|43,95
What I need to do is find the line which contains 'Number' and append the values 'A' and 'B' from the CSV in the form |0|1, and then append 'ValueOfB' to the end of the line directly below it, in the form |AAA_01,AAA_03
So the first two lines should look like this at the end:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
2|23|157,91|194,23|36,32
The rest of the lines should not be touched.
I made a script which uses Select-String with -Context to find what I need, puts that into an object, then appends the new values to the strings it found and puts the result into another object.
My script is as follows:
$csvFile = Import-Csv -Path Somepath\file.csv -Delimiter ";"
$file = "Somepath2\SomeName.txt"
$LinesToChange = @()
$script:LinesToChange = $LinesToChange
$LinesOriginal = @()
$script:LinesOriginal = $LinesOriginal
foreach ($line in $csvFile) {
Select-String -Path $file -Pattern "$($Line.number)" -Encoding default -Context 0, 1 | ForEach-Object {
$1 = $_.Line
$2 = $_.Context.PostContext
}
$ListOrg = [pscustomobject]@{
Line_org = $1
Line_GTU_org = $2
}
$LinesOriginal = $LinesOriginal + $ListOrg
$lineNew = $ListOrg.Line_org | foreach { $_ + "|$($line.A)|$($line.B)" }
$GTUNew = $ListOrg.Line_GTU_org | foreach { $_ + "|$($line.ValueofB)" }
$ListNew = [pscustomobject]@{
Line_new = $lineNew
Line_GTU_new = $GTUNew
Line_org = $ListOrg.Line_org
Line_GTU_org = $ListOrg.Line_GTU_org
}
$LinesToChange = $LinesToChange + $ListNew
}
The output is an object $LinesToChange which holds the original lines and the lines after the change. The issue is that I have no idea how to use it to change the txt file. I tried a few methods and ended up either with a file which contains the updated lines but has all the others doubled (I tried foreach), or with PS using all the RAM and never finishing the job :)
My latest idea is to use something like this:
(Get-Content -Path $file) | ForEach-Object {
$line = $_
$LinesToChange.GetEnumerator() | ForEach-Object {
if ($line -match "$($LinesToChange.Line_org)") {
$line = $line -replace "$($LinesToChange.Line_org)", "$($LinesToChange.Line_new)"
}
if ($line -match "$($LinesToChange.Line_GTU_org)") {
$line = $line -replace "$($LinesToChange.Line_GTU_org)", "$($LinesToChange.Line_GTU_new)"
}
}
} | Set-Content -Path Somehere\newfile.txt
It seemed promising at first, but inside the loop $LinesToChange.Line_org expands to all the original lines at once, so it can't find a match.
I also need to be sure that the second line is directly below the first one. The "number" from the CSV file is unique, but it is possible (although unlikely) that two or more second lines contain the same data, so preferably the txt file should be matched on the two-line pair. In short:
find these two lines:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere|||||
2|zwol|9,00|9,00|0,00
change them to:
1|1|FP01340/05/20|2020-05-02|2020-05-02|2020-05-02|166,91|203,23|36,32|nothing interesting 18|33333|63-111 somewhere||||||0|1
2|zwol|9,00|9,00|0,00|AAA_01,AAA_03
Do that for all lines in $LinesToChange.
Any help will be much appreciated!
Greetings!
Some strange text file you have there, but anyway, this should do it:
# read in the text file as string array
$txt = Get-Content -Path '<PathToTheTextFile>'
$csv = Import-Csv -Path '<PathToTheCSVFile>' -Delimiter ';'
# loop through the items (rows) in the CSV and find matching lines in the text array
foreach ($item in $csv) {
$match = $txt | Select-String -Pattern ('|{0}|' -f $item.Number) -SimpleMatch
if ($match) {
# update the matching text line (array indices count from 0, so we do -1)
$txt[$match.LineNumber -1] += ('|{0}|{1}' -f $item.A, $item.B)
# update the line following
$txt[$match.LineNumber] += ('|{0}' -f $item.ValueOfB)
}
}
# show updated text on screen
$txt
# save updated text to file
$txt | Set-Content -Path 'Somehere\newfile.txt'
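If the text file is large, running Select-String over the whole array once per CSV row can get slow. A single-pass variant is sketched below. It assumes, based on the sample data only, that every header line starts with record type 1 and carries the number in its third field; the $lookup and $pending names and the shortened sample lines are made up for illustration.

```powershell
# Inline sample data standing in for the two files (shortened, hypothetical)
$csv = @'
Number;A;B;ValueOfB
FP01340/05/20;0;1;GTU_01,GTU_03
'@ | ConvertFrom-Csv -Delimiter ';'

$txt = @(
    '1|1|FP01340/05/20|2020-05-02|x',
    '2|zwol|9,00',
    '2|23|157,91'
)

# Index the CSV rows by Number so each text line needs only one lookup
$lookup = @{}
foreach ($row in $csv) { $lookup[$row.Number] = $row }

$pending = $null   # CSV row whose ValueOfB still has to go on the next line
$out = foreach ($line in $txt) {
    $fields = $line.Split('|')
    if ($pending) {
        # the line directly below a matched header line
        "$line|$($pending.ValueOfB)"
        $pending = $null
    }
    elseif ($fields[0] -eq '1' -and $fields.Count -gt 2 -and $lookup.ContainsKey($fields[2])) {
        $pending = $lookup[$fields[2]]
        "$line|$($pending.A)|$($pending.B)"
    }
    else {
        $line   # everything else passes through unchanged
    }
}
$out
```

Because the second value is always appended to the very next line, the "directly below" requirement holds without searching for the second line separately.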

How to modify contents of a pipe-delimited text file with PowerShell

I have a pipe-delimited text file. The file contains "records" of various types. I want to modify certain columns for each record type. For simplicity, let's say there are 3 record types: A, B, and C. A has 3 columns, B has 4 columns, and C has 5 columns. For example, we have:
A|stuff|more_stuff
B|123|other|x
C|something|456|stuff|more_stuff
B|78903|stuff|x
A|1|more_stuff
I want to append the prefix "P" to all desired columns. For A, the desired column is 2. For B, the desired column is 3. For C, the desired column is 4.
So, I want the output to look like:
A|Pstuff|more_stuff
B|123|Pother|x
C|something|456|Pstuff|more_stuff
B|78903|Pstuff|x
A|P1|more_stuff
I need to do this in PowerShell. The file could be very large, so I'm thinking about going with the File class of .NET. If it were a simple string replacement, I would do something like:
$content = [System.IO.File]::ReadAllText("H:\test_modify_contents.txt").Replace("replace_text","something_else")
[System.IO.File]::WriteAllText("H:\output_file.txt", $content)
But it's not so simple in my particular situation, so I'm not even sure if ReadAllText and WriteAllText are the best solution. Any ideas on how to do this?
I would use ConvertFrom-Csv so you can check each line as an object. In this code I added a header, mainly for readability. The header is cut out of the output on the last line anyway:
# $input is an automatic variable in PowerShell, so use different names for the paths
$inputPath = "H:\test_modify_contents.txt"
$outputPath = "H:\output_file.txt"
$data = Get-Content -Path $inputPath | ConvertFrom-Csv -Delimiter '|' -Header 'Column1','Column2','Column3','Column4','Column5'
$data | % {
If ($_.Column5) {
#type C:
$_.Column4 = "P$($_.Column4)"
} ElseIf ($_.Column4) {
#type B:
$_.Column3 = "P$($_.Column3)"
} Else {
#type A:
$_.Column2 = "P$($_.Column2)"
}
}
$data | Select Column1,Column2,Column3,Column4,Column5 | ConvertTo-Csv -Delimiter '|' -NoTypeInformation | Select-Object -Skip 1 | Set-Content -Path $outputPath
It does add extra | for the type A and B lines. Output:
"A"|"Pstuff"|"more_stuff"||
"B"|"123"|"Pother"|"x"|
"C"|"something"|"456"|"Pstuff"|"more_stuff"
"B"|"78903"|"Pstuff"|"x"|
"A"|"P1"|"more_stuff"||
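If the added quotes and trailing delimiters are a problem, one alternative is to split each line yourself and look up the target column by record type. This is only a sketch: the $targetIndex table is a hypothetical name, and its zero-based indexes are derived from the column numbers given in the question.

```powershell
# Zero-based index of the field to prefix, per record type (from the question)
$targetIndex = @{ A = 1; B = 2; C = 3 }

# Sample lines from the question
$lines = 'A|stuff|more_stuff', 'B|123|other|x', 'C|something|456|stuff|more_stuff'

$result = foreach ($line in $lines) {
    $fields = $line.Split('|')
    $type = $fields[0]
    if ($targetIndex.ContainsKey($type)) {
        $i = $targetIndex[$type]
        $fields[$i] = 'P' + $fields[$i]
    }
    $fields -join '|'   # same number of fields as the input, no quotes added
}
$result
```

Lines with an unknown record type pass through unchanged, and adding a new record type only means adding an entry to the table.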
If your files are large, then reading the complete file contents at once using Import-Csv or ReadAllText is probably not a good idea. I would use the Get-Content cmdlet with the -ReadCount parameter, which streams the file one row at a time, and then use a regex for the processing. Something like this:
Get-Content your_in_file.txt -ReadCount 1 | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Set-Content your_out_file.txt
EDIT:
This version should output faster:
$d = Get-Date
Get-Content input.txt -ReadCount 1000 | % {
$_ | % {
$_ -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'
} | Add-Content output.txt
}
(New-TimeSpan $d (Get-Date)).TotalMilliseconds
For me this processed 50k rows in 350 milliseconds. You can probably get more speed by tweaking the -ReadCount value to find the ideal amount.
Given the large input file, I would not use either ReadAllText or Get-Content.
ReadAllText reads the entire file into memory, and so does Get-Content whenever its output is collected rather than streamed down the pipeline.
Consider using something along the lines of:
$filename = ".\input2.csv"
$outfilename = ".\output2.csv"
function ProcessFile($inputfilename, $outputfilename)
{
$reader = [System.IO.File]::OpenText($inputfilename)
$writer = New-Object System.IO.StreamWriter $outputfilename
$record = $reader.ReadLine()
while ($record -ne $null)
{
$writer.WriteLine(($record -replace '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$', '$1P$2'))
$record = $reader.ReadLine()
}
$reader.Close()
$reader.Dispose()
$writer.Close()
$writer.Dispose()
}
ProcessFile $filename $outfilename
EDIT: After testing all the suggestions on this page, I have borrowed the regex from Dave Sexton, and this is the fastest implementation. It processes a 1 GB+ file in 175 seconds. All other implementations are significantly slower on large input files.
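The same streaming loop can be wrapped in try/finally so both files are closed even if a line fails to process. The sketch below uses temporary files created by GetTempFileName purely so that it runs on its own; with real files, note that .NET methods resolve relative paths against the process working directory rather than the current PowerShell location, so full paths are safer.

```powershell
# Temporary files stand in for the real input and output (illustration only)
$inPath  = [System.IO.Path]::GetTempFileName()
$outPath = [System.IO.Path]::GetTempFileName()
Set-Content -Path $inPath -Value 'A|stuff|more_stuff', 'B|123|other|x'

$pattern = '^(A\||B\|[^\|]+\||C\|[^\|]+\|[^\|]+\|)(.*)$'

$reader = [System.IO.File]::OpenText($inPath)
$writer = New-Object System.IO.StreamWriter $outPath
try {
    # ReadLine returns $null at end of file
    while ($null -ne ($record = $reader.ReadLine())) {
        $writer.WriteLine(($record -replace $pattern, '$1P$2'))
    }
}
finally {
    # Dispose flushes and closes the streams
    $writer.Dispose()
    $reader.Dispose()
}

$result = Get-Content $outPath
$result
```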

Powershell. Writing out lines based on string within the file

I'm looking for a way to export all lines from within a text file where part of the line matches a certain string. The string is actually the first 4 bytes of the file and I'd like to keep the command to only checking those bytes; not the entire row. I want to write the entire row. How would I go about this?
I am using Windows only and don't have the option to use many other tools that might do this.
Thanks in advance for any help.
Do you want to perform a simple "grep"? Then try this
select-string .\test.txt -pattern "\Athat" | foreach {$_.Line}
or this (very similar regex), also writes to an outfile
select-string .\test.txt -pattern "^that" | foreach {$_.Line} | out-file -filepath out.txt
This assumes that you want to search for a 4-byte string "that" at the beginning of the string, or at the beginning of the line, respectively.
Something like the following Powershell function should work for you:
function Get-Lines {
[cmdletbinding()]
param(
[string]$filename,
[string]$prefix
)
if( Test-Path -Path $filename -PathType Leaf -ErrorAction SilentlyContinue ) {
# filename exists, and is a file
$lines = Get-Content $filename
foreach ( $line in $lines ) {
if ( $line -like "$prefix*" ) {
$line
}
}
}
}
To use it, assuming you save it as get-lines.ps1, you would load the function into memory with:
. .\get-lines.ps1
and then to use it, you could search for all lines starting with "DATA" with something like:
get-lines -filename C:\Files\Datafile\testfile.dat -prefix "DATA"
If you need to save it to another file for viewing later, you could do something like:
get-lines -filename C:\Files\Datafile\testfile.dat -prefix "DATA" | out-file -FilePath results.txt
Or, if I were more awake, you could ignore the script above and use a simpler solution such as the following one-liner:
get-content -path C:\Files\Datafile\testfile.dat | select-string -Pattern "^DATA"
Which just uses the ^ regex character to make sure it's only looking for "DATA" at the beginning of each line.
To get all the lines from c:\somedir\somefile.txt that begin with 'abcd' :
(get-content c:\somedir\somefile.txt) -like 'abcd*'
provided c:\somedir\somefile.txt is not an unusually large (hundreds of MB) file. For that situation:
get-content c:\somedir\somefile.txt -readcount 1000 |
foreach {$_ -like 'abcd*'}
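If the prefix should be compared literally, without wildcard or regex interpretation, String.StartsWith is another option. A small sketch with made-up sample lines, using an ordinal (byte-for-byte, case-sensitive) comparison:

```powershell
# Hypothetical sample lines; only those that begin with DATA should be kept
$lines = 'DATA one', 'xDATA two', 'DATA three', 'data four'

$result = $lines | Where-Object {
    $_.StartsWith('DATA', [System.StringComparison]::Ordinal)
}
$result
```

With a file, the array can be replaced by Get-Content, and the result piped on to Set-Content or Out-File.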

Powershell V2 find and replace

I am trying to change dates programmatically in a file. The line I need to fix looks like this:
set ##dateto = '03/15/12'
I need to write a PowerShell V2 script that replaces what's inside the single quotes, and I have no idea how to do this.
The closest I've come looks like this:
gc $file | ? {$_ -match "set ##dateto ="} | % {$temp=$_.split("'");$temp[17]
=$CorrectedDate;$temp -join ","} | -outfile newfile.txt
Problems with this: It gives an error about the index 17 being out of range. Also, the outfile only contains one line (The unmodified line). I'd appreciate any help with this. Thanks!
You can do something like this (though you may want to handle the corner cases):
$CorrectedDate = '10/09/09'
gc $file | %{
if($_ -match "^set ##dateto = '(\d\d/\d\d/\d\d)'") {
$_ -replace $matches[1], $CorrectedDate;
}
else {
$_
}
} | out-file test2.txt
mv test2.txt $file -force
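The if/else can also be folded into a single -replace, because lines that don't match the pattern pass through unchanged. A sketch under the assumption that the line looks exactly as shown in the question; the lookbehind and lookahead anchor the match to the text between the quotes, and the sample lines are made up.

```powershell
$CorrectedDate = '10/09/09'

# Hypothetical file content; only the second line should change
$lines = 'select 1', "set ##dateto = '03/15/12'", 'select 2'

# Replace whatever sits between the single quotes on the matching line
$result = $lines -replace "(?<=^set ##dateto = ')[^']*(?=')", $CorrectedDate
$result
```

For the real file this would read gc $file | % { $_ -replace ... } | out-file test2.txt, followed by the same mv as above.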

Remove Top Line of Text File with PowerShell

I am trying to just remove the first line of about 5000 text files before importing them.
I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:
set-content file (get-content unless line contains amount)
However, I can't seem to figure out how to do something like contains.
While I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it, I feel compelled to comment on the other two answers that use temp files (they gnaw at me like fingernails on a chalkboard!).
Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file
... or in short form:
(gc $file | select -Skip 1) | sc $file
It is not the most efficient in the world, but this should work:
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force
Using variable notation, you can do it without a temporary file:
${C:\file.txt} = ${C:\file.txt} | select -skip 1
function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
if ( -not (Test-Path $path -PathType Leaf) ) {
throw "invalid filename"
}
ls $path |
% { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}
I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.
My solution was to use a more .NET approach: StreamReader + StreamWriter.
See this answer for a great discussion of the performance: In Powershell, what's the most efficient way to split a large text file by record type?
Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):
PS> (measure-command{
$i = 0
$ins = New-Object System.IO.StreamReader "in/file/pa.th"
$outs = New-Object System.IO.StreamWriter "out/file/pa.th"
while( !$ins.EndOfStream ) {
$line = $ins.ReadLine();
if( $i -ne 0 ) {
$outs.WriteLine($line);
}
$i = $i+1;
}
$outs.Close();
$ins.Close();
}).TotalSeconds
It returned:
188.1224443
Inspired by AASoft's answer, I set out to improve it a bit more:
- Avoid the loop variable $i and the comparison with 0 in every iteration
- Wrap the execution in a try..finally block to always close the files in use
- Make the solution work for an arbitrary number of lines to remove from the beginning of the file
- Use a variable $p to reference the current directory
These changes lead to the following code:
$p = (Get-Location).Path
(Measure-Command {
# Number of lines to skip
$skip = 1
$ins = New-Object System.IO.StreamReader ($p + "\test.log")
$outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
try {
# Skip the first N lines, but allow for fewer than N, as well
for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
$ins.ReadLine()
}
while( !$ins.EndOfStream ) {
$outs.WriteLine( $ins.ReadLine() )
}
}
finally {
$outs.Close()
$ins.Close()
}
}).TotalSeconds
The first change brought the processing time for my 60 MB file down from 5.3 s to 4 s. The rest of the changes are more cosmetic.
$x = get-content $file
$x[1..$x.count] | set-content $file
Just that much. Long boring explanation follows. Get-content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.
For example, if we define an array variable like this,
$array = @("first item","second item","third item")
so $array returns
first item
second item
third item
then we can "index into" that array to retrieve only its 1st element
$array[0]
or only its 2nd
$array[1]
or a range of index values from the 2nd through the last.
$array[1..$array.count]
I just learned from a website:
Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }
Or you can use the aliases to make it short, like:
gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
Another approach to removing the first line from a file uses the multiple assignment technique:
$firstLine, $restOfDocument = Get-Content -Path $filename
$modifiedContent = $restOfDocument
$modifiedContent | Out-String | Set-Content $filename
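To see what the multiple assignment does, here is a tiny in-memory sketch with made-up values: the first variable takes the first element and the last variable takes everything that is left.

```powershell
# Hypothetical lines standing in for the output of Get-Content
$firstLine, $restOfDocument = 'header', 'row 1', 'row 2'

$firstLine        # the first element only
$restOfDocument   # an array holding the remaining elements
```

If the source has only one line, $restOfDocument ends up as $null.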
skip didn't work, so my workaround is:
$LinesCount = $(get-content $file).Count
get-content $file |
select -Last $($LinesCount-1) |
set-content "$file-temp"
move "$file-temp" $file -Force
Following on from Michael Soren's answer.
If you want to edit all .txt files in the current directory and remove the first line from each:
Get-ChildItem (Get-Location).Path -Filter *.txt |
Foreach-Object {
(Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}
For smaller files you could use this:
& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null
... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.