I am trying to run this script on a 50GB file on Windows Server 2012 R2, and I would like to combine the three replace statements into a single pass rather than three. It is also important that the replaces occur in that order. Any suggestions to simplify this and make it run efficiently would be greatly appreciated!
$filePath = "D:\FileLocation\file_name.csv"
(Get-Content $filePath | out-string).Replace('"', '""') | Set-Content $filePath
(Get-Content $filePath | out-string).Replace('|~|', '"') | Set-Content $filePath
(Get-Content $filePath | out-string).Replace('|#|', ',') | Set-Content $filePath
With such a large file, I suggest you process it line by line (or in batches), which should speed up the entire process.
You can adapt the script posted by True here: http://community.idera.com/powershell/ask_the_experts/f/learn_powershell-12/18821/how-to-remove-specific-rows-from-csv-files-in-powershell
but instead of writing $line straight away, perform your replaces first:
$sw.WriteLine($line.Replace('"', '""').Replace('|~|', '"').Replace('|#|', ','))
Be careful with Get-Content, since it will try to load the entire file and becomes very slow once you run out of memory.
Also be careful if you don't have much disk space. The linked solution will make a copy of the file (with the changes) before replacing it.
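A minimal sketch of that line-by-line approach, assuming a temp file next to the original and using the replaces from the question:
$filePath = "D:\FileLocation\file_name.csv"
$tempPath = "$filePath.tmp"
$reader = New-Object System.IO.StreamReader($filePath)
# Note: StreamWriter defaults to UTF-8; pass an encoding if the source differs
$writer = New-Object System.IO.StreamWriter($tempPath)
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        # Apply the three replaces in the required order
        $writer.WriteLine($line.Replace('"', '""').Replace('|~|', '"').Replace('|#|', ','))
    }
}
finally {
    $reader.Dispose()
    $writer.Dispose()
}
# Swap the processed copy in place of the original
Move-Item $tempPath $filePath -Force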
You can use the -replace operator. Note that it is regex-based, so special characters in the search pattern (such as |) need escaping, e.g. with [regex]::Escape().
$filepath="c:\temp\text.txt"
(Get-Content $filepath) -replace 'test','1' -replace 'text','2' -replace '123','3' |Set-Content $filepath
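Applied to the patterns from the question, with the | characters escaped and the original order preserved (note this still loads the whole file into memory), it would look something like:
$filePath = "D:\FileLocation\file_name.csv"
(Get-Content $filePath) -replace '"', '""' -replace [regex]::Escape('|~|'), '"' -replace [regex]::Escape('|#|'), ',' | Set-Content $filePath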
You can combine the .Replace() calls in the same line.
$filepath="/Users/me/Desktop/text.txt"
'test text 123' | Out-File -FilePath $filepath
(Get-Content $filepath|Out-String).Replace('test','1').Replace('text','2').Replace('123','3')|Set-Content $filepath
Get-Content $filepath
1 2 3
I'm very new to scripting.
I have a couple of files File1.txt and File2.txt. "RemPattern" is the pattern which I'm expecting to find and remove recursively from the above files.
Is it possible to remove them with the help of any windows or powershell batch command?
I have seen Get-Content can be used to remove an entire line of the matched pattern, but it doesn't fit for my case.
(Get-Content 'File1.txt') -notmatch 'RemPattern' | Set-Content 'File1.txt'
Is it required to write a batch file to achieve this or is it possible to do it by batch commands?
You can try the -replace operator instead of -notmatch.
(Get-Content 'D:\File.txt') -replace 'RemPattern' | Set-Content 'D:\File.txt'
I was assuming that you wanted to recurse through a set of files rather than typing each filename manually. So you can:
Get-ChildItem F:\ -Filter File*.txt | Foreach-Object{
(Get-Content $_.FullName) | Foreach-Object {$_ -replace 'RemPattern'} | Set-Content $_.FullName
}
The filter here simply checks File*.txt, which in your example will do the replacement for both File1.txt and File2.txt without having to type out each file manually per line. You can change the filter as you please.
I have a working powershell script to find and and replace a few different strings with a new string in thousands of files, without changing the modified date on the files. In any given file there could be hundreds of instances of said strings to replace. The files themselves aren't very large and probably range from 1-50MB (a quick glance at the directory I am testing with shows the largest as ~33MB).
I'm running the script inside a Server 2012 R2 VM with 4 vCPUs and 4GB of RAM. I have set the MaxMemoryPerShellMB value for PowerShell to 3GB. As mentioned previously, the script works, but after 2-4 hours PowerShell will start throwing OutOfMemoryExceptions and crash. The script is 'V2 friendly' and I haven't adapted it to V3+, but I doubt that matters too much.
My question is whether or not the script can be improved to prevent/eliminate the memory exceptions I am running into at the moment. I don't mind if it runs slower, as long as it can get the job done without having to check back every couple of hours and restart it.
$i=0
$all = Get-ChildItem -Recurse -Include *.txt
$scriptfiles = Select-String -Pattern string1,string2,string3 $all
$output = "C:\Temp\scriptoutput.txt"
foreach ($file in $scriptFiles)
{
$filecreate=(Get-ChildItem $file.Path).creationtime
$fileaccess=(Get-ChildItem $file.Path).lastaccesstime
$filewrite=(Get-ChildItem $file.Path).lastwritetime
"$file.Path,Created: $filecreate,Accessed: $fileaccess,Modified: $filewrite" | out-file -FilePath $output -Append
(Get-Content $file.Path) | ForEach-Object {$_ -replace "string1", "newstring" `
-replace "string2", "newstring" `
-replace "string3", "newstring"
} | Set-Content $file.Path
(Get-ChildItem $file.Path).creationtime=$filecreate
(Get-ChildItem $file.Path).lastaccesstime=$fileaccess
(Get-ChildItem $file.Path).lastwritetime=$filewrite
$filecreate=(Get-ChildItem $file.Path).creationtime
$fileaccess=(Get-ChildItem $file.Path).lastaccesstime
$filewrite=(Get-ChildItem $file.Path).lastwritetime
"$file.Path,UPDATED Created: $filecreate,UPDATED Accessed: $fileaccess,UPDATED Modified: $filewrite" | out-file -FilePath $output -Append
$i++}
Any comments, criticisms, and suggestions welcomed.
Thanks
The biggest issue I can see is that you are repeatedly calling Get-ChildItem on the file for every property you query. Replace that with one call per loop pass and reuse it for the rest of the pass. Also, Out-File is one of the slower methods of outputting data to a file.
$output = "C:\Temp\scriptoutput.txt"
$scriptfiles = Get-ChildItem -Recurse -Include *.txt |
Select-String -Pattern string1,string2,string3 |
Select-Object -ExpandProperty Path
$scriptfiles | ForEach-Object{
$file = Get-Item $_
# Save current file times
$filecreate=$file.creationtime
$fileaccess=$file.lastaccesstime
$filewrite=$file.lastwritetime
"$file,Created: $filecreate,Accessed: $fileaccess,Modified: $filewrite"
# Update content.
(Get-Content $file) -replace "string1", "newstring" `
-replace "string2", "newstring" `
-replace "string3", "newstring" | Set-Content $file
# Write all the original times back.
$file.creationtime=$filecreate
$file.lastaccesstime=$fileaccess
$file.lastwritetime=$filewrite
# Verify the changes... Should not be required but it is what you were doing.
$filecreate=$file.creationtime
$fileaccess=$file.lastaccesstime
$filewrite=$file.lastwritetime
"$file,UPDATED Created: $filecreate,UPDATED Accessed: $fileaccess,UPDATED Modified: $filewrite"
} | Set-Content $output
Not tested but should be fine.
Depending on what your replacements actually look like, you could probably save some time there as well. Test first before running in production, obviously.
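For example, since all three strings map to the same replacement here, a single regex alternation could do all of them in one pass (assuming string1-3 contain no regex metacharacters):
(Get-Content $file) -replace 'string1|string2|string3', 'newstring' | Set-Content $file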
I removed the counter you had, since it appeared nowhere else in the code.
Your logging could easily be CSV-based, since you have all the objects ready to go, but I just want to be sure we are on the right track before we go too far.
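As a rough sketch of what that CSV logging could look like (note [pscustomobject] needs v3+, so this part is not 'V2 friendly'):
$scriptfiles | ForEach-Object{
    $file = Get-Item $_
    # ... replacement and timestamp handling as above ...
    [pscustomobject]@{
        Path     = $file.FullName
        Created  = $file.CreationTime
        Accessed = $file.LastAccessTime
        Modified = $file.LastWriteTime
    }
} | Export-Csv $output -NoTypeInformation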
I'm trying to create a 'find and replace' script for the website our company just acquired. Right now, I just want to use it to replace their address and phone number with ours, but I'll likely need to customize it in the future to replace or update other things.
So far, what I got is:
(Get-Content C:\Scripts\Test.txt) |
Foreach-Object {$_ -replace "\*", "#"} |
Set-Content C:\Scripts\Test.txt
which I got from The Scripting Guy :P
However, I need help customizing it. What I need it to do is:
Do it for all files in a directory and all sub-directories, not just one file. The website, as far as I can tell, is a collection of *.php files
Handle special characters that appear in some addresses, like copyright symbols (©), pipes (|), commas (,) and periods (.)
Here's the exact string I'm trying to replace (as it appears in the .php's):
<p>©Copyright 2012 GSS | 48009 Fremont Blvd., Fremont, CA 94538 USA</p>
Since this could be the first tool in my PowerShell toolbox, any explanation of what you're adding or changing would greatly help me understand what's going on.
Bonus points:
Any way to log which files were 'find-and-replace'ed?
My suggestion would be to use a ForEach loop; I don't see the need for a function in this case, just have the code in your ForEach loop. I would define a string to search for and a string to replace with. When you perform the replace, make sure that the pattern is escaped. Something along these lines:
$TxtToFind = "<p>©Copyright 2012 GSS | 48009 Fremont Blvd., Fremont, CA 94538 USA</p>"
$UpdatedTxt = "<p>©Copyright 2014 | 1234 Somenew St., Houston, TX 77045 USA</p>"
$Logfile = "C:\Temp\FileUpdate.log"
ForEach($File in (GCI C:\WebRoot\ -Recurse -Include *.php)){
If($File|Select-String $TxtToFind -SimpleMatch -Quiet){
"Updating lines in $($File.FullName)" |Out-File $Logfile -append
$File|Select-String $TxtToFind -SimpleMatch -AllMatches|Select -ExpandProperty LineNumber -Unique|Out-File $Logfile -append
(GC $File.FullName) | %{$_ -replace [RegEx]::Escape($TxtToFind),$UpdatedTxt} | Set-Content $File.Fullname
}
}
You can leverage regular expressions to find/replace the string you desire, and the following script will iterate over all the php files within the provided folder recursively.
function ParseFile($file){
#Add logic to parse the file
Write-Host $file.FullName
}
$files = Get-ChildItem -recurse C:\Path -Filter *.php
foreach ($file in $files) {
ParseFile $file
}
I have five .sql files and know the name of each file. For this example, call them one.sql, two.sql, three.sql, four.sql and five.sql. I want to append the text of all files and create one file called master.sql. How do I do this in PowerShell? Feel free to post multiple answers to this problem because I am sure there are several ways to do this.
My attempt does not work and creates a file with several hundred thousand lines.
PS C:\sql> get-content '.\one.sql' | get-content '.\two.sql' | get-content '.\three.sql' | get-content '.\four.sql' | get-content '.\five.sql' | out-file -encoding UNICODE master.sql
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql > master.sql
Note that > is equivalent to Out-File -Encoding Unicode. I only tend to use Out-File when I need to specify a different encoding.
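For example, to write UTF-8 instead:
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql | Out-File master.sql -Encoding UTF8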
There are some good answers here, but if you have a whole lot of files, and maybe you don't know all of the names, this is what I came up with:
$vara = get-childitem -name "path"
$varb = foreach ($a in $vara) {gc "path\$a"}
Example:
$vara = get-childitem -name "c:\users\test"
$varb = foreach ($a in $vara) {gc "c:\users\test\$a"}
You can obviously pipe this directly into Add-Content or whatever, but I like to capture the content in variables so I can manipulate it later on.
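For example, piping straight through (writing the output outside the source folder so the listing does not pick it up):
get-childitem -name "c:\users\test" | foreach {gc "c:\users\test\$_"} | add-content "c:\users\master.sql"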
See if this works better
get-childitem "one.sql","two.sql","three.sql","four.sql","five.sql" | get-content | out-file -encoding UNICODE master.sql
I needed something similar, Chris Berry's post helped, but I think this is more efficient:
gci -name "*PathToFiles*" | gc > master.sql
The first part, gci -name "*PathToFiles*", gets your file list. This can be done with wildcards to get just your .sql files, e.g. gci -name "\\share\folder\*.sql"
It then pipes to Get-Content and redirects the output to your master.sql file. As noted by Keith Hill, you can use Out-File in place of > to better control your output if needed.
I think the logical way of solving this is to use Add-Content:
$files = Get-ChildItem '.\one.sql', '.\two.sql', '.\three.sql', '.\four.sql', '.\five.sql'
$files | foreach { Get-Content $_ | Add-Content '.\master.sql' -encoding UNICODE }
However, Get-Content is usually very slow when reading multiple very large files. If that is your case, this article could help: http://keithhill.spaces.live.com/blog/cns!5A8D2641E0963A97!756.entry
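For very large files, a raw stream copy avoids Get-Content's per-line overhead entirely; a sketch (assumes .NET 4+ for CopyTo, and copies bytes as-is, so the inputs should share one encoding):
$out = [System.IO.File]::Create('C:\sql\master.sql')
try {
    # Exclude the output file so it is not appended to itself
    Get-ChildItem 'C:\sql\*.sql' -Exclude master.sql | ForEach-Object {
        $in = $_.OpenRead()
        try { $in.CopyTo($out) } finally { $in.Dispose() }
    }
}
finally {
    $out.Dispose()
}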
What about:
Get-Content .\one.sql,.\two.sql,.\three.sql,.\four.sql,.\five.sql | Set-Content .\master.sql
Here is how I concatenate the sql files from the Sql folder:
# Set the current location of the script to use relative path
Set-Location $PSScriptRoot
# Concatenate all the sql files
$concatSql = Get-Content -Path .\Sql\*.sql
# Write/overwrite sql to a single file
Set-Content -Path concatFile.sql -Value $concatSql
I am trying to do something very simple in PowerShell.
Reading the contents of a file
Manipulating some strings
Saving the modified text back to the file
function Replace {
$file = Get-Content C:\Path\File.cs
$file | foreach {$_ -replace "document.getElementById", "$"} |out-file -filepath C:\Path\File.cs
}
I have tried Set-Content as well.
I always get an unauthorized exception. I can see that $file has the file content; the error comes while writing the file.
How can I fix this?
This is likely caused by the Get-Content cmdlet getting a lock for reading and Out-File trying to get its own lock for writing. A similar question is here: Powershell: how do you read & write I/O within one pipeline?
So the solution would be:
${C:\Path\File.cs} = ${C:\Path\File.cs} | foreach {$_ -replace "document.getElementById", '$'}
or:
${C:\Path\File.cs} = Get-Content C:\Path\File.cs | foreach {$_ -replace "document.getElementById", '$'}
or:
$content = Get-Content C:\Path\File.cs | foreach {$_ -replace "document.getElementById", '$'}
$content | Set-Content C:\Path\File.cs
Basically, you need to buffer the content of the file so that the file can be closed (Get-Content for reading), and after that the buffer can be flushed back to the file (Set-Content, which takes the write lock).
The accepted answer worked for me if I had a single file operation, but when I did multiple Set-Content or Add-Content operations on the same file, I still got the "is being used by another process" error.
In the end I had to write to a temp file, then copy the temp file to the original file:
(Get-Content C:\Path\File.cs) | foreach {$_ -replace "document.getElementById", '$'} | Set-Content C:\Path\File.cs.temp
Copy-Item C:\Path\File.cs.temp C:\Path\File.cs
Personal experience: I had the 'locked file syndrome' in one of my procedures. I found it was caused by a New-Object assignment on the file; I realised that I had not issued a Dispose() call on the object. I rewrote the offending code to dispose of the New-Object as soon as convenient, and the 'locked file' syndrome was resolved.
A learning event for me: always dispose of each New-Object!
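For instance, a generic pattern (path hypothetical):
$reader = New-Object System.IO.StreamReader('C:\Path\File.cs')
try {
    $text = $reader.ReadToEnd()
}
finally {
    # Without this, the handle stays open and later writes fail
    $reader.Dispose()
}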