Add-Content infinite loop - powershell

I have an old piece of PowerShell script that uses Add-Content in a for loop to write data to a file and it works fine when run locally on a PC's C: drive.
I've now been asked to relocate the script and files to a QNAP folder share (not sure if this has anything to do with the problem).
Now, when the script runs from the share, the for loop runs indefinitely: you can see the file size increasing, and you can check the row count once you break out of the program.
It doesn't seem to matter whether I use a UNC path or a mapped drive; the infinite looping still occurs.
Here is the script block:
####################
# Define Variables #
####################
$Source = 'G:\Label_Spreadsheets\' + $textbox1.Text + '.csv'
$Target = 'G:\Label_Spreadsheets\labels.csv'
$EndNum = ($LastUsedNumber + $numericupdown1.Value)
######################
# Create Source File #
######################
#######################
# Add CSV Header rows #
#######################
Add-Content $Source "Stock Code,Filter#,ProductionOrder#,SalesOrder#,Compatability";
#####################
# Add specific Rows #
#####################
for ($i = $StartNum; $i -le $EndNum; $i++) {
    $line = $combobox1.SelectedItem + ',' + $i + ',' + $textbox1.Text + ',' + $textbox2.Text + ','
    Add-Content $Source $line
}
I wondered if it was a known provider problem (as described at this URL), but trying those suggestions did not resolve the issue.
And yes, I know it's writing a CSV by hand - like I said, it's an old script.
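A minimal sketch of a variant that batches the writes (assuming the same $Source, $StartNum and $EndNum as above): build all the rows in memory and write the file once with Set-Content, so only a single write is issued against the share instead of one Add-Content call per row.
$rows = @("Stock Code,Filter#,ProductionOrder#,SalesOrder#,Compatability")
$rows += for ($i = $StartNum; $i -le $EndNum; $i++) {
    # Same row format as before; only the write strategy changes
    $combobox1.SelectedItem + ',' + $i + ',' + $textbox1.Text + ',' + $textbox2.Text + ','
}
Set-Content -LiteralPath $Source -Value $rows   # one write instead of per-row appends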

Related

Powershell to Break up CSV by Number of Rows

So I am now tasked with handling constant reports that are more than 1 million lines long.
My last question did not explain everything, so I'm trying to ask a better one.
I'm getting a dozen-plus daily reports coming in as CSV files. I don't know in advance what the headers are or anything like that.
They are huge; I can't open them in Excel.
I basically want to break each one up into smaller copies of the same report, each maybe 100,000 lines long.
The code I wrote below does not work; I keep getting
Exception of type 'System.OutOfMemoryException' was thrown.
I am guessing I need a better way to do this.
I just need this file broken down to a more manageable size.
It does not matter how long it takes, as I can run it overnight.
I found this on the internet and tried to adapt it, but I can't get it to work.
$PSScriptRoot
write-host $PSScriptRoot
$loc = $PSScriptRoot
$location = $loc
# how many rows per CSV?
$rowsMax = 10000;
# Get all CSV under current folder
$allCSVs = Get-ChildItem "$location\Split.csv"
# Read and split all of them
$allCSVs | ForEach-Object {
    Write-Host $_.Name;
    $content = Import-Csv "$location\Split.csv"
    $insertLocation = ($_.Name.Length - 4);
    for ($i = 1; $i -le $content.length; $i += $rowsMax) {
        $newName = $_.Name.Insert($insertLocation, "splitted_" + $i)
        $content | select -first $i | select -last $rowsMax | convertto-csv -NoTypeInformation | % { $_ -replace '"', "" } | out-file $location\$newName -fo -en ascii
    }
}
The key is not to read large files into memory in full, which is what you're doing by capturing the output from Import-Csv in a variable ($content = Import-Csv "$location\Split.csv").
That said, while using a single pipeline would solve your memory problem, performance will likely be poor, because you're converting from and back to CSV, which incurs a lot of overhead.
Even reading and writing the files as text with Get-Content and Set-Content is slow, however.
Therefore, I suggest a .NET-based approach for processing the files as text, which should substantially speed up processing.
The following code demonstrates this technique:
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., "...\file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    # Read the file lazily and save every chunk of $chunkLineCount
    # lines to a new file.
    $i = 0; $chunkNdx = 0
    foreach ($line in [IO.File]::ReadLines($csvFile)) {
        if ($i -eq 0) { ++$i; $header = $line; continue } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
            # Close previous file, if any.
            if (++$chunkNdx -gt 1) { $fileWriter.Dispose() }
            # Construct the file path for the next chunk, by
            # instantiating the template with the next sequence number.
            $csvFileChunk = $csvFileChunkTemplate -f $chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            $fileWriter = [IO.File]::CreateText($csvFileChunk)
            $fileWriter.WriteLine($header)
        }
        # Write a data row to the current chunk file.
        $fileWriter.WriteLine($line)
    }
    $fileWriter.Dispose() # Close the last file.
}
Note that the above code creates BOM-less UTF-8 files; if your input contains ASCII-range characters only, these files will effectively be ASCII files.
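If you need a specific encoding instead (for example ASCII, or UTF-8 with a BOM), one option is to construct the writer yourself with an explicit [Text.Encoding]; a minimal sketch of the one line that would change, assuming the rest of the loop stays as shown:
# Instead of: $fileWriter = [IO.File]::CreateText($csvFileChunk)
$fileWriter = New-Object IO.StreamWriter($csvFileChunk, $false, [Text.Encoding]::ASCII)  # $false = create/overwrite, don't append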
Here's the equivalent single-pipeline solution, which is likely to be substantially slower:
Get-ChildItem $PSScriptRoot/*.csv | ForEach-Object {
    $csvFile = $_.FullName
    # Construct a file-path template for the sequentially numbered chunk
    # files; e.g., ".../file_split_001.csv"
    $csvFileChunkTemplate = $csvFile -replace '(.+)\.(.+)', '$1_split_{0:000}.$2'
    # Set how many lines make up a chunk.
    $chunkLineCount = 10000
    $i = 0; $chunkNdx = 0
    Get-Content -LiteralPath $csvFile | ForEach-Object {
        if ($i -eq 0) { ++$i; $header = $_; return } # Save header line.
        if ($i++ % $chunkLineCount -eq 1) { # Create new chunk file.
            # Construct the file path for the next chunk.
            $csvFileChunk = $csvFileChunkTemplate -f ++$chunkNdx
            Write-Verbose "Creating chunk: $csvFileChunk"
            # Create the next chunk file and write the header.
            Set-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $header
        }
        # Write a data row to the current chunk file.
        Add-Content -Encoding ASCII -LiteralPath $csvFileChunk -Value $_
    }
}
Another option, from the Linux world, is the split command. To get it on Windows, just install Git Bash; then you'll be able to use many Linux tools from CMD/PowerShell.
Below is the syntax to achieve your goal:
split -l 100000 --numeric-suffixes --suffix-length 3 --additional-suffix=.csv sourceFile.csv outputfile
It's very fast. If you want, you can wrap split.exe as a cmdlet; see the sketch below.
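For example, a rough wrapper function (a sketch only; it assumes split.exe is reachable on PATH via Git Bash, and the function and parameter names are made up for illustration):
function Split-TextFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Lines = 100000,
        [string] $Prefix = 'chunk_'
    )
    # Delegate the actual splitting to GNU split
    split -l $Lines --numeric-suffixes --suffix-length 3 --additional-suffix=.csv $Path $Prefix
}
# Usage:
Split-TextFile -Path .\sourceFile.csv -Lines 100000 -Prefix outputfile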

Moving Files to Folders with Powershell and RegEx

I wrote the following script to batch process the files into folders based on the title of the magazine (everything before the first hyphen):
magazine title - year-month.pdf, e.g. National Geographic - 2017-07.pdf
After running the script, the magazines are moved from the parent folder to a new subfolder, in this case "National Geographic Magazine".
Three related questions:
1. The '_Orphans' folder (line 38) is created even if there are no 'orphans' to file into it for later manual processing. How do I make the folder creation conditional?
2. Duplicate files create an error message during processing. Not a big deal, as the script continues to run, but I'd like to handle duplicates the same way 'orphans' are handled, with a new '_Duplicates' folder/move.
3. How do I comment multiple lines without putting a # at the beginning of each line (as at the top of the script, for example)? There must be a more elegant way to handle comments/documentation.
Bonus Question:
If you're really bored waiting for that multi-TB file copy you're watching progress like an hourglass, could anyone help with the code for an array of delimiters (probably the wrong term) as shown on line 10? I'd like to be able to specify more than just the hard-coded hyphen I used in my regex match (line 26, which took me the better part of a day to get working).
$OrigFolder = ".\"
$NewFolder = ".\_Sorted to Move"
# Orphans folder, where files that return null in the regex match will be moved
# Example: file "- title.pdf"
# will be moved to ".\_Orphans" folder
$Orphans = '_Orphans' # Use the underscore to sort the folder to the top of the window
#### How to use an array of values for the delimiters in the regex instead of literals
#### My proposed code, but I am missing how to use the delims in the regex match
#### $delims = "\s-\s" ",\s"\s and\s"
# First count the number of files in the $OrigFolder directory
$numFiles = (Get-ChildItem -Path $OrigFolder).Count
$i=0
# Tell the user what will happen
clear-host;
Write-Host 'This script will copy ' $numFiles ' files from ' $OrigFolder ' to _Sorted to Move'
# Ask user to confirm the copy operation
Read-host -prompt 'Press enter to start copying the files'
# Regex to match filenames
$Regex = [regex]"(?:(.*?)\s-)|(?:(.*?),\s)|(?:(.*?)\sand\s)"
# Loop through the $OrigFolder directory, skipping folders
Get-ChildItem -LiteralPath $OrigFolder | Where-Object {!$_.PsIsContainer} |
ForEach-Object {
if($_.BaseName -match $Regex){
$ChildPath = $_.BaseName -replace $Regex
# Calculate copy operation progress as a percentage
[int]$percent = $i / $numFiles * 100
# If first part of the file name is empty, move it to the '_Orphans' folder
if(!$Matches[1]){
$ChildPath = $Orphans}
else {
$ChildPath = $Matches[1]
}
# Generate new folder name
$FolderName = Join-Path -Path $NewFolder -ChildPath ($ChildPath + ' Magazine')
# Create folder if it doesn't exist
if(!(Test-Path -LiteralPath $FolderName -PathType Container)){
$null = New-Item -Path $FolderName -ItemType Directory}
# Log progress to the screen
Write-Host "$($_.FullName) -> $FolderName"
# Move the file to the folder
Move-Item -LiteralPath $_.FullName -Destination $FolderName
# Tell the user how much has been moved
Write-Progress -Activity "Copying ... ($percent %)" -status $_ -PercentComplete $percent -verbose
$i++
}
}
Write-Host 'Total number of files in '$OrigFolder ' is ' $numFiles
Write-Host 'Total number of files copied to '$NewFolder ' is ' $i
Read-host -prompt "Press enter to complete..."
clear-host;
Q1: I tested your script on my machine with PDFs that followed the same naming scheme, and it didn't create an Orphans folder or move my orphan PDFs. I noticed you have an if ($_.BaseName -match $Regex) immediately after your foreach. Inside that is where you are looking for orphans, but orphans wouldn't make it into this if block because they wouldn't match the regex. In pseudocode, your structure should be something like:
foreach {
    if (match) {
        $ChildPath = $_.BaseName -replace $Regex
    }
    else {
        $ChildPath = $Orphans
    }
    # Create your folders and do your moves.
}
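Concretely, a sketch of how that part of the loop could be restructured (a rough rewrite that reuses your existing $Regex, $NewFolder and $Orphans variables and leaves out the progress and logging lines):
Get-ChildItem -LiteralPath $OrigFolder | Where-Object { !$_.PsIsContainer } |
ForEach-Object {
    if ($_.BaseName -match $Regex -and $Matches[1]) {
        $ChildPath = $Matches[1]      # magazine title (like your original, this only checks capture group 1)
    }
    else {
        $ChildPath = $Orphans         # no usable title, so treat it as an orphan
    }
    # The orphan folder, like any other, is only created when a file actually needs it
    $FolderName = Join-Path -Path $NewFolder -ChildPath ($ChildPath + ' Magazine')
    if (!(Test-Path -LiteralPath $FolderName -PathType Container)) {
        $null = New-Item -Path $FolderName -ItemType Directory
    }
    Move-Item -LiteralPath $_.FullName -Destination $FolderName
}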
Q2: Try, Catch blocks: https://blogs.technet.microsoft.com/heyscriptingguy/2014/07/05/weekend-scripter-using-try-catch-finally-blocks-for-powershell-error-handling/
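For the duplicate files specifically, a hedged sketch of what that could look like around the Move-Item call (the '_Duplicates' folder name is just an example, and $file/$FolderName follow from the sketch above):
$file = $_    # capture the pipeline object; inside 'catch', $_ refers to the error record
$Duplicates = Join-Path $NewFolder '_Duplicates'
try {
    # -ErrorAction Stop turns the non-terminating "file already exists" error into a catchable exception
    Move-Item -LiteralPath $file.FullName -Destination $FolderName -ErrorAction Stop
}
catch {
    if (!(Test-Path -LiteralPath $Duplicates -PathType Container)) {
        $null = New-Item -Path $Duplicates -ItemType Directory
    }
    # Prefix a timestamp so repeated duplicates cannot collide again
    $newName = (Get-Date -Format yyyyMMdd.HHmmss) + '_' + $file.Name
    Move-Item -LiteralPath $file.FullName -Destination (Join-Path $Duplicates $newName)
}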
Q3: You can comment multiple lines by enclosing them in <# #> pairs.
<#
How to use an array of values for the delimiters in the regex instead of literals
My proposed code, but I am missing how to use the delims in the regex match
$delims = "\s-\s" ",\s"\s and\s"
#>
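As for the bonus question: one way to supply the delimiters as an array is to keep each one as a regex fragment and join them into a single alternation. A sketch based on the fragments from your commented-out line (adjust the fragments to taste):
$delims = '\s-\s', ',\s', '\sand\s'   # each element is a regex fragment, not a literal string
$Regex  = [regex] ('(?:(.*?)(?:' + ($delims -join '|') + '))')
# Expands to (?:(.*?)(?:\s-\s|,\s|\sand\s)); the title always lands in $Matches[1]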

VBScript to Powershell - Environment Variables

I am currently working on a PowerShell script that maps directories and loads database software. I have a VBScript that I am converting to PowerShell, which is supposed to validate a temporary file path, but I am getting a little confused about what I need to take out and what I can leave in.
Here is the original VBScript:
'
' assure that temp version of Perl is used
'
perlPath = basePath & "install\perl\bin;"
WshShell.Environment("Process")("PATH") = perlPath & WshShell.Environment("System")("PATH")
'
' assure that temp version of Perl Lib is used
'
perlLib = basePath & "\install\perl\lib;" & basePath & "\install\perl\site\lib;"
WshShell.Environment("Process")("PERL5LIB") = perlLib
Here is what I have written in PowerShell so far:
#
# assure that Oracle's version of Powershell is used
#
$psPath = $basePath + "install\powershell\bin;"
$sysPath = $WshShell.Environment("System") | Where-Object { $_ -match "PATH" } |
foreach-object {$_.Substring(9)} | Out-String
$psPos = $sysPath.contains($psPath)
if( -not ($psPos)){
[Environment]::SetEnvironmentVariable("PATH", ($psPath + $sysPath), "Process")
}
#
# assure that Oracle's version of Powershell Module is used
#
$psMod = $homePath + "\perl\lib;" + $homePath + "\perl\site\lib;" # still need to convert
$sysMod = $Env:PSModulePath
$psPos = $sysMod.contains($psMod)
if( -not ($psPos)){
[Environment]::SetEnvironmentVariable("PATH", ($psPath + $sysChk), "Process")
}}
The same validation is done later in the script with the "System" variables. I do have a module that I will be using, but the rest are scripts. I guess I am not sure whether what I am writing is the right way to verify these paths exist and, if they don't, to add the new ones.
First of all, you should use the Join-Path cmdlet to combine paths:
$psPath = Join-Path $basePath "install\powershell\bin"
You can access the Path variable using $env:Path, split it using -split ';', and select the first path entry using [0]. All in all, I would define the three paths you want to set, put them into an array, and iterate over it:
$powershellBin = Join-Path $basePath "install\powershell\bin"
$perLib = Join-Path $homePath "\perl\lib"
$perlSiteLib = Join-Path $homePath "\perl\site\lib"
@($powershellBin, $perLib, $perlSiteLib) | foreach {
    if (-not (($env:Path -split ';')[0].Equals($_)))
    {
        [Environment]::SetEnvironmentVariable("PATH", ("{0};{1}" -f $_, $env:Path), "Process")
    }
}

Slow Powershell script for CSV modification

I'm using a powershell script to append data to the end of a bunch of files.
Each file is a CSV of around 50 MB (say two million-ish lines), and there are about 50 files.
The script I'm using looks like this:
$MyInvocation.MyCommand.path
$files = ls *.csv
foreach($f in $files)
{
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($f)
    $year = $basename.substring(0,4)
    Write-Host "Starting" $Basename
    $r = [IO.File]::OpenText($f)
    while ($r.Peek() -ge 0) {
        $line = $r.ReadLine()
        $line + "," + $year | Add-Content $(".\DR_" + $basename + ".CSV")
    }
    $r.Dispose()
}
Problem is, it's pretty slow. It's taken about 12 hours to get through them.
It's not super complex, so I wouldn't expect it to take that long to run.
What could I do to speed it up?
Reading and writing a file row by row can be a bit slow. Maybe your antivirus is contributing to the slowness as well. Use Measure-Command to see which parts of the script are the slow ones.
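For example (a sketch; wrap whichever statements you want to profile):
$elapsed = Measure-Command {
    foreach ($f in Get-ChildItem *.csv) {
        # ... the existing per-file work ...
    }
}
Write-Host ("Total runtime: {0:n1} seconds" -f $elapsed.TotalSeconds)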
As general advice, write a few large blocks rather than lots of small ones. You can achieve this by accumulating content in a StringBuilder and appending its contents to the output file every, say, 1000 processed rows. Like so:
$sb = new-object Text.StringBuilder # New String Builder for stuff
$i = 1 # Row counter
while ($r.Peek() -ge 0) {
    # Add formatted stuff into the buffer
    [void]$sb.Append($("{0},{1}{2}" -f $r.ReadLine(), $year, [Environment]::NewLine))
    if (++$i % 1000 -eq 0) { # When 1000 rows are added, dump contents into file
        Add-Content $(".\DR_" + $basename + ".CSV") $sb.ToString()
        $sb = new-object Text.StringBuilder # Reset the StringBuilder
    }
}
# Don't miss the tail of the contents
Add-Content $(".\DR_" + $basename + ".CSV") $sb.ToString()
Don't go into .NET Framework static methods and building up strings when there are cmdlets that can do the work on objects. Collect your data, add the year column, then export to your new file. You're also doing a ton of file I/O and that'll also slow you down.
This will probably require a little bit more memory. But it reads the whole file at once, and writes the whole file at once. It also assumes that your CSV files have column headings. But it's much easier for someone else to look at and understand exactly what's going on (write your scripts so they can be read!).
# Always use full cmdlet names in scripts, not aliases
$files = get-childitem *.csv;
foreach($f in $files)
{
    # BaseName is a property of the file object in PowerShell; there's no need to call a static method
    $basename = $f.basename;
    $year = $f.basename.substring(0,4)
    # Every time you use Write-Host, a puppy dies
    "Starting $Basename";
    # If you've got CSV data, treat it as CSV data. PowerShell can import it into a collection natively.
    $data = Import-Csv $f;
    $exportData = @();
    foreach ($row in $data) {
        # Add a year "property" to each row object
        $row | Add-Member -MemberType NoteProperty -Name "Year" -Value $year;
        # Export the modified row to the output file
        $row | Export-Csv -NoTypeInformation -Path $("r:\DR_" + $basename + ".CSV") -Append -NoClobber
    }
}

Can the PowerShell commandlet move-item move a currently open file?

I want to avoid moving files that are currently open by another process. Is there any way the move-item PowerShell command can move, or even worse copy, a currently open file?
We currently have a situation where we have two processes that need data files transferred from process A's output folder to process B's input folder. The idea is that process A writes a file, and then a PowerShell script moves the files to the folder that process B reads.
We sometimes have an issue where the same file is transferred twice, and it is not a partial file either time.
The below code is executed at 00, 10, 20, 30, 40, 50 minutes past the hour. Process B on the Samba server runs at 05, 15, 25, 35, 45, 55 minutes past the hour and moves the files out of the folder the PowerShell script puts them in, once process B has finished processing the files. There are only ever up to about a dozen 1 KB files being moved at a time.
Process A is not controlled by us and can write files to that location at any time. It seems there is some race condition where Process A creates a file just before the PowerShell script moves it, and the script appears to copy the file and then move it again 10 minutes later when it runs next.
With the code below, if two "Moved File" entries are logged for the same file, is the only possible explanation that Process A created the file twice?
$source = "C:\folder\*.txt"
$target_dir = "\\samba-server\share\"
$bad_dir = "C:\folder\bad_files\"
$log = "C:\SystemFiles\Logs\transfer.log"
$files = Get-ChildItem $source
foreach ($file in $files){
    if ($file.name -eq $null) {
        # Nothing to do. Added this in since, for some reason, it executes the conditions below otherwise
    }
    elseif (test-path ($target_dir + $file.name)) {
        # If there is a duplicate file, write to the log file, then copy it to the bad dir with
        # the datetime stamp in front of the file name
        $log_string = ((Get-Date -format G) + ",Duplicate File," + "'" + $file.name + "', " + $file.LastWriteTime)
        write-output ($log_string) >> $log
        $new_file = ($bad_dir + (get-date -format yyyy.MM.dd.HHmmss) + "_" + $file.name)
        move-item $file.fullname $new_file
    }
    else {
        # The file doesn't exist on the remote source, so we are good to move it.
        move-item $file.fullname $target_dir
        if ($?) { # If the last command completed successfully
            $log_string = ((Get-Date -format G) + ",Moved File," + "'" + $file.name + "', " + $file.LastWriteTime)
        } else {
            $log_string = ((Get-Date -format G) + ",Failed to Move File," + "'" + $file.name + "', " + $file.LastWriteTime)
        }
        write-output ($log_string) >> $log
    }
}
This is the classic producer-consumer problem, which is a well-researched topic.
Some solutions you might try: check the file's last write time, and if it is far enough in the past, the file can be moved without issues. Another is to try to open the file with exclusive access; if that fails, the file is still being used by the producer process, otherwise close the file and move it.
Some examples:
# List files that were modified at least five minutes ago
gci | ? { $_.lastwritetime -le (get-date).addminutes(-5) }
# Try to open a file with exclusive mode
try {
    $f1 = [IO.File]::Open("c:\temp\foo.txt", [IO.FileMode]::Open, [IO.FileAccess]::Read, [IO.FileShare]::None)
    # If that didn't fail, close and move the file to the new location
    $f1.Close()
    $f1.Dispose()
    Move-Item "c:\temp\foo.txt" $newLocation
} catch [System.IO.IOException] {
"File is already open" # Catch the file is locked exception, try again later
}