I have a process in SSIS where I create three files.
Header.txt
work.txt
Trailer.txt
Then I use an Execute Process Task to call my PowerShell script. I basically need to take the work.txt file, prepend the header record to it (while maintaining the integrity of the original values in work.txt), and then append the trailer record (which is generated with total row counts, etc.).
Currently I have:
Set-Location "H:\Documentation\Projects\CVS\StageCVS"
Clear-Content "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility"
Get-Content Header.txt, work.txt, Trailer.txt|out-file "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility" -Confirm
This is fine in testing where I only had 1000 rows, but now that I have 67,000 rows the process takes forever.
I was looking at the Add-Content cmdlet but I can't find an example where it adds the header. Can someone assist with the syntax on going to the first line in the file and then adding the content before that first line?
many thanks in advance!
Just to clarify: I would like to build off the work.txt file. This is where the majority of the data already is, so instead of rewriting it all to a new file, I think a copy would make more sense. So in theory I would create all three files, copy the work file to, say, workfile.txt, prepend the header to workfile, append the trailer to workfile, then rename workfile.
UPDATE
This seems to work for the trailer.
Set-Location "H:\Documentation\Projects\CVS\StageCVS"
#Clear-Content "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility"
Copy-Item work.txt workfile.txt
#Get-Content Header.txt, work.txt, Trailer.txt|out-file "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility"
Add-Content workfile.txt -value (get-content Trailer.txt)
UPDATE
Also tried:
Set-Location "H:\Documentation\Projects\CVS\StageCVS"
$header = "H:\Documentation\Projects\CVS\StageCVS\Header.txt"
#Clear-Content "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility.txt"
Copy-Item work.txt workfile.txt
#(Get-Content Header.txt, work.txt, Trailer.txt -readcount 1000)|Set-Content "H:\Documentation\Projects\CVS\StageCVS\CVSMemberEligibility"
Add-Content workfile.txt -value (get-content Trailer.txt)
.\workfile.txt = $header + (gc workfile.txt)
This is something that seems so easy, but the reality is that it is not, due to the underlying filesystem. You are going to need a file buffer or a temp file, or if you are really brave you can look at extending the file and transposing the characters, as this author did in C#:
Insert Text into Existing Files in C#, Without Temp Files or Memory Buffers
http://www.codeproject.com/Articles/17716/Insert-Text-into-Existing-Files-in-C-Without-Temp
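For reference, here is a minimal, untested sketch of the temp-file approach in PowerShell. It assumes the folder and output name from the question (the temp file name combined.tmp is arbitrary), streams the header, work file and trailer into the temp file in order, and then swaps it in as the final output:
$dir = 'H:\Documentation\Projects\CVS\StageCVS'
$tmp = Join-Path $dir 'combined.tmp'   # temp file name is arbitrary
# Stream each piece into the temp file without loading whole files into memory.
$writer = [IO.File]::CreateText($tmp)
foreach ($piece in 'Header.txt', 'work.txt', 'Trailer.txt') {
    $reader = [IO.File]::OpenText((Join-Path $dir $piece))
    while (-not $reader.EndOfStream) { $writer.WriteLine($reader.ReadLine()) }
    $reader.Close()
}
$writer.Close()
# Replace (or create) the final output file with the assembled temp file.
Move-Item -Force $tmp (Join-Path $dir 'CVSMemberEligibility')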
So as it turns out, Out-File and Get-Content are not very performant. I found that it was taking over 5 minutes to read and write a 5,000-record result set.
When I researched different performance options for PowerShell I found the .NET StreamWriter class. The same process ran in under 15 seconds.
Since my result set in the production environment would be 70,000-90,000 records, this was the approach I took.
Here is what I did:
[IO.Directory]::SetCurrentDirectory("H:\Documentation\Projects\CVS\StageCVS")
Set-Location "H:\Documentation\Projects\CVS\StageCVS"

# Archive the previous output, then empty it.
Copy-Item ".\ELIGFINAL.txt" "H:\Documentation\Projects\CVS\StageCVS\archive\ELIGFINAL$(Get-Date -f yyyyMMdd).txt"
Clear-Content "H:\Documentation\Projects\CVS\StageCVS\ELIGFINAL.txt"

# Build the work file and append the trailer (these pieces are small enough for cmdlets).
Copy-Item work.txt workfile.txt
Add-Content workfile.txt -Value (Get-Content Trailer.txt)

$output       = "H:\Documentation\Projects\CVS\StageCVS\ELIGFINAL.txt"
$readerwork   = [IO.File]::OpenText("H:\Documentation\Projects\CVS\StageCVS\workfile.txt")
$readerheader = [IO.File]::OpenText("H:\Documentation\Projects\CVS\StageCVS\Header.txt")
try
{
    $wStream = New-Object IO.FileStream $output, 'Append', 'Write', 'Read'
    $writer  = New-Object System.IO.StreamWriter $wStream

    # Header first, then the work file (which already ends with the trailer).
    $writer.WriteLine($readerheader.ReadToEnd())
    $writer.WriteLine($readerwork.ReadToEnd())
    $writer.Flush()
}
finally
{
    $readerheader.Close()
    $readerwork.Close()
    if ($writer) { $writer.Close() }
}
I have a script that renames all the files in a directory from fields/columns after importing a .CSV. My problem is that PowerShell is renaming the files asynchronously rather than synchronously. Is there a better way to get the result I want?
Current file name = 123456789.pdf
New File Name = $documentID_$fileID
I need the new file name to rename the files in order to make the script viable.
Here's my code (I'm new at this):
$csvPath = "C:\Users\dougadmin28\Desktop\Node Modify File Name App\test.csv"
$filePath = "C:\Users\dougadmin28\Desktop\Node Modify File Name App\pdfs"
$csv = Import-Csv $csvPath | Select-Object -Skip 0
$files = Get-ChildItem $filePath
foreach ($item in $csv) {
foreach($file in $files) {
Rename-Item $file.fullname -NewName "$($item.DocumentID +"_"+ ($item.FileID)+($file.extension))" -Verbose
}
}
You may try using workflows, which would allow you to execute tasks in parallel:
https://learn.microsoft.com/en-us/powershell/module/psworkflow/about/about_foreach-parallel?view=powershell-5.1
Keep in mind that PowerShell Workflows have some limitations:
https://devblogs.microsoft.com/scripting/powershell-workflows-restrictions/
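Here is a rough, untested sketch of the foreach -parallel shape (the workflow name is made up, and the OriginalName, DocumentID and FileID column names are assumptions about your CSV layout):
workflow Rename-PdfFiles
{
    param([string]$CsvPath, [string]$FolderPath)

    $rows = Import-Csv -Path $CsvPath
    foreach -parallel ($row in $rows)
    {
        # Workflows require named parameters on cmdlets.
        $oldPath = Join-Path -Path $FolderPath -ChildPath $row.OriginalName
        Rename-Item -Path $oldPath -NewName "$($row.DocumentID)_$($row.FileID).pdf"
    }
}

Rename-PdfFiles -CsvPath $csvPath -FolderPath $filePath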
Hope it helps!
I thought synchronous meant sequential, as in 'one after the other', which is what your script is doing now.
If you mean 'in parallel', as in asynchronously or independently of each other, you can look at using:
- Background jobs (Start-Job, Wait-Job and Receive-Job): easiest to work with, but not efficient in terms of performance. Some cmdlets also expose this as an -AsJob switch. (See the sketch after this list.)
- PowerShell runspaces: most efficient, but hard to code for.
- PowerShell workflows: balanced, but they have limitations.
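A minimal, untested sketch of the background-job pattern (the new names generated here are purely illustrative, not your DocumentID/FileID mapping):
$files = Get-ChildItem $filePath
$jobs = foreach ($file in $files) {
    # One job per file: simple to follow, but heavy if you have many files.
    Start-Job -ScriptBlock {
        param($Path, $NewName)
        Rename-Item -Path $Path -NewName $NewName
    } -ArgumentList $file.FullName, "renamed_$($file.Name)"
}
$jobs | Wait-Job | Receive-Job
$jobs | Remove-Job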
I have a simple PowerShell script that replaces "false" or "true" with "0" or "1":
$InputFolder = $args[0];
if($InputFolder.Length -lt 3)
{
Write-Host "Enter a path name as your first argument" -foregroundcolor Red
return
}
if(-not (Test-Path $InputFolder)) {
Write-Host "File path does not appear to be valid" -foregroundcolor Red
return
}
Get-ChildItem $InputFolder
$content = [System.IO.File]::ReadAllText($InputFolder).Replace("`"false`"", "`"0`"").Replace("`"true`"", "`"1`"").Replace("`"FALSE`"", "`"0`"").Replace("`"TRUE`"", "`"1`"")
[System.IO.File]::WriteAllText($InputFolder, $content)
[GC]::Collect()
This works fine for almost all files I have to amend, with the exception of one 808MB CSV.
I have no idea how many lines are in this CSV, as nothing I have will open it properly.
Interestingly, the PowerShell script completes successfully when invoked manually, either directly in PowerShell or via the command prompt.
When this is launched as part of the SSIS package it's required for, that's when the error happens.
Sample data for the file:
"RowIdentifier","DateProfileCreated","IdProfileCreatedBy","IDStaffMemberProfileRole","StaffRole","DateEmploymentStart","DateEmploymentEnd","PPAID","GPLocalCode","IDStaffMember","IDOrganisation","GmpID","RemovedData"
"134","09/07/1999 00:00","-1","98","GP Partner","09/07/1999 00:00","14/08/2009 15:29","341159","BRA 871","141","B83067","G3411591","0"
Error message thrown: an out-of-memory exception (screenshot omitted).
I'm not tied to PowerShell - I'm open to other options. I had a cribbed-together C# script previously, but that died on smaller files than this - I'm no C# developer, so I was unable to debug it at all.
Any suggestions or help gratefully received.
Generally, avoid reading large files into memory all at once, as you can run out of memory, as you've experienced.
Instead, process text-based files line by line - both reading and writing.
While PowerShell generally excels at line-by-line (object-by-object) processing, it is slow with files that have many lines.
Using the .NET Framework directly - while more complex - offers much better performance.
If you process the input file line by line, you cannot directly write back to it and must instead write to a temporary output file, which you can replace the input file with on success.
Here's a solution that uses .NET types directly for performance reasons:
# Be sure to use a *full* path, because .NET typically doesn't have the same working dir. as PS.
$inFile = Convert-Path $Args[0]
$tmpOutFile = [io.path]::GetTempFileName()
$tmpOutFileWriter = [IO.File]::CreateText($tmpOutFile)
foreach ($line in [IO.File]::ReadLines($inFile)) {
$tmpOutFileWriter.WriteLine(
$line.Replace('"false"', '"0"').Replace('"true"', '"1"').Replace('"FALSE"', '"0"').Replace('"TRUE"', '"1"')
)
}
$tmpOutFileWriter.Dispose()
# Replace the input file with the temporary file.
# !! BE SURE TO MAKE A BACKUP COPY FIRST.
# -WhatIf *previews* the move operation; remove it to perform the actual move.
Move-Item -Force -LiteralPath $tmpOutFile $inFile -WhatIf
Note:
UTF-8 encoding is assumed, and the rewritten file will not have a BOM. You can change this by specifying the desired encoding to the .NET methods.
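For instance, here is one way an explicit encoding could be passed to those .NET calls - in this sketch, UTF-8 with a BOM (adjust to whatever encoding your file actually needs); $inFile and $tmpOutFile are the variables from the script above:
$enc = New-Object Text.UTF8Encoding $true   # $true = emit a UTF-8 BOM
$tmpOutFileWriter = New-Object IO.StreamWriter $tmpOutFile, $false, $enc   # $false = overwrite, don't append
foreach ($line in [IO.File]::ReadLines($inFile, $enc)) {
    $tmpOutFileWriter.WriteLine(
        $line.Replace('"false"', '"0"').Replace('"true"', '"1"').Replace('"FALSE"', '"0"').Replace('"TRUE"', '"1"')
    )
}
$tmpOutFileWriter.Dispose()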
As an aside: Your chain of .Replace() calls on each input line can be simplified as follows, using PowerShell's -replace operator, which is case-insensitive, so only 2 replacements are needed:
$line -replace '"false"', '"0"' -replace '"true"', '"1"'
However, while that is shorter to write, it is actually slower than the .Replace() call chain, presumably because -replace is regex-based, which incurs extra processing.
You could read the file in chunks with Get-Content -ReadCount, Out-File to a temp file, then delete the old file and Rename-Item the temp file to the old file's name.
A couple of small things would need fixing: this will add a new empty line at the end of the file, and it will change the encoding. You could try to detect the current file's encoding and set it via Out-File -Encoding.
function Replace-LargeFilesInFolder {
    Param(
        [string]$DirectoryPath,
        [string]$OldString,
        [string]$NewString,
        [string]$TempExtension = "temp",
        [int]$LinesPerRead = 500
    )
    Get-ChildItem $DirectoryPath -File | ForEach-Object {
        $File = $_
        Get-Content $_.FullName -ReadCount $LinesPerRead | ForEach-Object {
            $_ -replace $OldString, $NewString |
                Out-File "$($File.FullName).$($TempExtension)" -Append
        }
        Remove-Item $File.FullName
        Rename-Item "$($File.FullName).$($TempExtension)" -NewName $($File.FullName)
    }
}
Replace-LargeFilesInFolder -DirectoryPath C:\TEST -LinesPerRead 1 -OldString "a" -NewString "5"
I have a working PowerShell script to find and replace a few different strings with a new string in thousands of files, without changing the modified date on the files. In any given file there could be hundreds of instances of said strings to replace. The files themselves aren't very large and probably range from 1-50MB (a quick glance at the directory I am testing with shows the largest as ~33MB).
I'm running the script inside a Server 2012 R2 VM with 4 vCPUs and 4GB of RAM. I have set the MaxMemoryPerShellMB value for PowerShell to 3GB. As mentioned previously, the script works, but after 2-4 hours PowerShell will start throwing OutOfMemoryExceptions and crash. The script is 'V2 friendly' and I haven't adapted it to V3+, but I doubt that matters too much.
My question is whether or not the script can be improved to prevent/eliminate the memory exceptions I am running into at the moment. I don't mind if it runs slower, as long as it can get the job done without having to check back every couple of hours and restart it.
$i=0
$all = Get-ChildItem -Recurse -Include *.txt
$scriptfiles = Select-String -Pattern string1,string2,string3 $all
$output = "C:\Temp\scriptoutput.txt"
foreach ($file in $scriptFiles)
{
$filecreate=(Get-ChildItem $file.Path).creationtime
$fileaccess=(Get-ChildItem $file.Path).lastaccesstime
$filewrite=(Get-ChildItem $file.Path).lastwritetime
"$file.Path,Created: $filecreate,Accessed: $fileaccess,Modified: $filewrite" | out-file -FilePath $output -Append
(Get-Content $file.Path) | ForEach-Object {$_ -replace "string1", "newstring" `
-replace "string2", "newstring" `
-replace "string3", "newstring"
} | Set-Content $file.Path
(Get-ChildItem $file.Path).creationtime=$filecreate
(Get-ChildItem $file.Path).lastaccesstime=$fileaccess
(Get-ChildItem $file.Path).lastwritetime=$filewrite
$filecreate=(Get-ChildItem $file.Path).creationtime
$fileaccess=(Get-ChildItem $file.Path).lastaccesstime
$filewrite=(Get-ChildItem $file.Path).lastwritetime
"$file.Path,UPDATED Created: $filecreate,UPDATED Accessed: $fileaccess,UPDATED Modified: $filewrite" | out-file -FilePath $output -Append
$i++}
Any comments, criticisms, and suggestions welcomed.
Thanks
The biggest issue I can see is that you are repeatedly getting the file for every property you query. Replace that with one call per loop pass and save the result for use during the pass. Also, Out-File is one of the slower methods of writing data to a file.
$output = "C:\Temp\scriptoutput.txt"
$scriptfiles = Get-ChildItem -Recurse -Include *.txt |
Select-String -Pattern string1,string2,string3 |
Select-Object -ExpandProperty Path
$scriptfiles | ForEach-Object{
$file = Get-Item $_
# Save current file times
$filecreate=$file.creationtime
$fileaccess=$file.lastaccesstime
$filewrite=$file.lastwritetime
"$file,Created: $filecreate,Accessed: $fileaccess,Modified: $filewrite"
# Update content.
(Get-Content $file) -replace "string1", "newstring" `
-replace "string2", "newstring" `
-replace "string3", "newstring" | Set-Content $file
# Write all the original times back.
$file.creationtime=$filecreate
$file.lastaccesstime=$fileaccess
$file.lastwritetime=$filewrite
# Verify the changes... Should not be required but it is what you were doing.
$filecreate=$file.creationtime
$fileaccess=$file.lastaccesstime
$filewrite=$file.lastwritetime
"$file,UPDATED Created: $filecreate,UPDATED Accessed: $fileaccess,UPDATED Modified: $filewrite"
} | Set-Content $output
Not tested but should be fine.
Depending on what your replacements actually look like, you could probably save some time there as well. Test before running in production, obviously.
I removed the counter you had since it appeared nowhere else in the code.
Your logging could easily be CSV-based since you already have the objects ready to go, but I just want to be sure we are on the right track before we go too far.
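For example, a sketch of what that CSV-based logging could look like (PowerShell 3+ syntax; the output path and column names are just placeholders):
$scriptfiles | ForEach-Object {
    $file = Get-Item $_
    [pscustomobject]@{
        Path     = $file.FullName
        Created  = $file.CreationTime
        Accessed = $file.LastAccessTime
        Modified = $file.LastWriteTime
    }
} | Export-Csv -Path 'C:\Temp\scriptoutput.csv' -NoTypeInformation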
New to PowerShell, so kind of learning by doing.
The process I have created works, but it ends up locking down my machine until it is completed, eating up all memory. I thought I had this fixed by looking into forcing the garbage collector, and also by moving from a foreach statement to using %{} to loop through everything.
Quick synopsis of the process: I need to merge multiple SharePoint log files into single ones to track usage across all of the companies' different SharePoint sites. PowerShell loops through all log directories on the SP server and checks, for each file in the directory, whether it already exists on my local machine. If it does exist it appends the file text, otherwise it does a straight copy. Rinse and repeat for each file and directory on the SharePoint log server. Between each loop, I'm forcing the GC because... well, because my basic understanding is that the looped variables are held in memory, and I want to flush them. I'm probably looking at this all wrong. So here is the script in question.
$FinFiles = 'F:\Monthly Logging\Logs'
dir -path '\\SP-Log-Server\Log-Directory' | ?{$_.PSISContainer} | %{
$CurrentDir = $_
dir $CurrentDir.FullName | ?{-not $_.PSISContainer} | %{
if($_.Extension -eq ".log"){
$DestinationFile = $FinFiles + '\' + $_.Name
if((Test-Path $DestinationFile) -eq $false){
New-Item -ItemType file -path $DestinationFile -Force
Copy-Item $_.FullName $DestinationFile
}
else{
$A = Get-Content $_.FullName ; Add-Content $DestinationFile $A
Write-Host "Log File"$_.FullName"merged."
}
[GC]::Collect()
}
[GC]::Collect()
}
Granted, the completed/appended log files get very, very large (min 300 MB, max 1 GB). Am I not closing something I should be, or keeping something open in memory? (It is currently sitting at 7.5 GB of my 8 GB of memory total.)
Thanks in advance.
Don't nest Get-ChildItem commands like that. Use wildcards instead - try dir "\\SP-Log-Server\Log-Directory\*\*.log". That should improve things to start with. Then move this to a ForEach($X in $Y){} loop instead of a ForEach-Object{} loop (what you're using now). I'm betting that takes care of your problem.
So, re-written just off the top of my head:
$FinFiles = 'F:\Monthly Logging\Logs'
ForEach($LogFile in (dir -path '\\SP-Log-Server\Log-Directory\*\*.log')){
$DestinationFile = $FinFiles + '\' + $LogFile.Name
if((Test-Path $DestinationFile) -eq $false){
New-Item -ItemType file -path $DestinationFile -Force
Copy-Item $LogFile.FullName $DestinationFile
}
else{
$A = Get-Content $LogFile.FullName ; Add-Content $DestinationFile $A
Write-Host "Log File"$LogFile.FullName"merged."
}
}
Edit: Oh, right, Alexander Obersht may be quite right as well - you may well benefit from a StreamReader approach. At the very least you should use the -ReadCount argument to Get-Content, and there's no reason to save the content in a variable; just pipe it straight to the Add-Content cmdlet.
Get-Content $LogFile.FullName -ReadCount 5000 | Add-Content $DestinationFile
To explain my answer a little more, if you use ForEach-Object in the pipeline it keeps everything in memory (regardless of your GC call). Using a ForEach loop does not do this, and should take care of your issue.
You might find this and this helpful.
In short: Add-Content, Get-Content and Out-File are convenient but notoriously slow when you need to deal with large amounts of data or I/O operations. You want to fall back to StreamReader and StreamWriter .NET classes for performance and/or memory usage optimization in cases like yours.
Code sample:
$sInFile = "infile.txt"
$sOutFile = "outfile.txt"
$oStreamReader = New-Object -TypeName System.IO.StreamReader -ArgumentList @($sInFile)
# $true sets append mode.
$oStreamWriter = New-Object -TypeName System.IO.StreamWriter -ArgumentList @($sOutFile, $true)
# Read and write one line at a time so memory use stays flat.
while (($sLine = $oStreamReader.ReadLine()) -ne $null) {
    $oStreamWriter.WriteLine($sLine)
}
$oStreamReader.Close()
$oStreamWriter.Close()
I have five .sql files and know the name of each file. For this example, call them one.sql, two.sql, three.sql, four.sql and five.sql. I want to append the text of all files and create one file called master.sql. How do I do this in PowerShell? Feel free to post multiple answers to this problem because I am sure there are several ways to do this.
My attempt does not work and creates a file with several hundred thousand lines.
PS C:\sql> get-content '.\one.sql' | get-content '.\two.sql' | get-content '.\three.sql' | get-content '.\four.sql' | get-content '.\five.sql' | out-file -encoding UNICODE master.sql
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql > master.sql
Note that > is equivalent to Out-File -Encoding Unicode. I only tend to use Out-File when I need to specify a different encoding.
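For example, if you wanted ASCII output instead, something like this should work:
Get-Content one.sql,two.sql,three.sql,four.sql,five.sql | Out-File -Encoding ASCII master.sql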
There are some good answers here, but if you have a whole lot of files and maybe don't know all of the names, this is what I came up with:
$vara = get-childitem -name "path"
$varb = foreach ($a in $vara) {gc "path\$a"}
Example:
$vara = get-childitem -name "c:\users\test"
$varb = foreach ($a in $vara) {gc "c:\users\test\$a"}
You can obviously pipe this directly into Add-Content or whatever, but I like to capture the output in variables so I can manipulate it later on.
See if this works better
get-childitem "one.sql","two.sql","three.sql","four.sql","five.sql" | get-content | out-file -encoding UNICODE master.sql
I needed something similar, Chris Berry's post helped, but I think this is more efficient:
gci -name "*PathToFiles*" | gc > master.sql
The first part, gci -name "*PathToFiles*", gets you your file list. This can be done with wildcards to just get your .sql files, e.g. gci -name "\\share\folder\*.sql".
It then pipes to Get-Content and redirects the output to your master.sql file. As noted by Keith Hill, you can use Out-File in place of > to better control your output if needed.
I think the logical way of solving this is to use Add-Content:
$files = Get-ChildItem '.\one.sql', '.\two.sql', '.\three.sql', '.\four.sql', '.\five.sql'
$files | foreach { Get-Content $_ | Add-Content '.\master.sql' -encoding UNICODE }
However, Get-Content is usually very slow when reading multiple very large files. If that's your case, this article could help: http://keithhill.spaces.live.com/blog/cns!5A8D2641E0963A97!756.entry
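One common mitigation is Get-Content's -ReadCount parameter, which reads lines in batches rather than one object at a time; a rough sketch (the batch size of 1000 is arbitrary):
$files = '.\one.sql', '.\two.sql', '.\three.sql', '.\four.sql', '.\five.sql'
$files | ForEach-Object { Get-Content $_ -ReadCount 1000 | Add-Content '.\master.sql' -Encoding UNICODE }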
What about:
Get-Content .\one.sql,.\two.sql,.\three.sql,.\four.sql,.\five.sql | Set-Content .\master.sql
Here is how I concatenate the SQL files from the Sql folder:
# Set the current location of the script to use relative path
Set-Location $PSScriptRoot
# Concatenate all the sql files
$concatSql = Get-Content -Path .\Sql\*.sql
# Append the sql to a single file (created if it doesn't exist)
Add-Content -Path concatFile.sql -Value $concatSql