Split csv file into specified number of files without page break - powershell
I have a 200,000-line CSV file that I need to split into 8 chunks using PowerShell.
Each row starts with the record 'KEY' as its first value.
I would like to ensure that rows sharing the same key (the first value of the row) are not split across files.
Here is the simple split I use:
$i = 0
Get-Content -Encoding Default "C:\Test.csv" -ReadCount 10130 | ForEach-Object {
    $i++
    $_ | Out-File -Encoding Default "C:\Test_$i.csv"
}
Sample Data
0190709,HP16,B,B,3,3,
0190709,HP17,B,B,3,3,
0190709,HP18,B,B,3,3,
0196597,HP11,,CNN,,,
0196597,HP119,,CNN,,,
0196597,HP13,,CNN,,,
01919769,HP11,,ANN,,,
01919769,HP119,,OPN,,,
01919769,HP13,,CNN,,,
01919769,HP14,X,X,X,X,
01919769,HP15,A,A,X,X,
01919769,HP16,S,S,X,X,
01919769,HP17,S,S,5,5,
01919769,HP18,S,S,5,5,
0797819,HP14,X,AX,X,X,
0797819,HP15,X,XA,X,X,
0797819,HP16,X,X,XA,XA,
0797819,HP17,A,A,X,X,
0797819,HP18,A,A,AX,X,
Expected Output
Let's say we want 2 chunks of roughly equal size. I would like 2 files like the ones below, with no key split between files. It's OK if a file gets bigger (more lines) in order to keep all rows of a key together.
File 1
0190709,HP16,B,B,3,3,
0190709,HP17,B,B,3,3,
0190709,HP18,B,B,3,3,
0196597,HP11,,CNN,,,
0196597,HP119,,CNN,,,
0196597,HP13,,CNN,,,
01919769,HP11,,ANN,,,
01919769,HP119,,OPN,,,
01919769,HP13,,CNN,,,
01919769,HP14,X,X,X,X,
01919769,HP15,A,A,X,X,
01919769,HP16,S,S,X,X,
01919769,HP17,S,S,5,5,
01919769,HP18,S,S,5,5,
File 2
0797819,HP14,X,AX,X,X,
0797819,HP15,X,XA,X,X,
0797819,HP16,X,X,XA,XA,
0797819,HP17,A,A,X,X,
0797819,HP18,A,A,AX,X,
You have not shown the first couple of lines of your CSV file, so the function below simply assumes the input CSV file is valid.
function Split-Csv {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path,          # the full path and filename of the source CSV file
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination,   # the path of the output folder
        [ValidateRange(1,[int]::MaxValue)]
        [int]$Chunks = 8,       # the number of parts to split into
        [switch]$FirstLineHasHeaders
    )
    # create the destination folder if it does not already exist
    if (!(Test-Path -Path $Destination -PathType Container)) {
        Write-Verbose "Creating folder '$Destination'"
        New-Item -Path $Destination -ItemType Directory | Out-Null
    }
    $outputFile = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $content = Get-Content -Path $Path
    $totalLines = $content.Count
    if ($FirstLineHasHeaders) {
        $headers = $content[0]
        $partsize = [Math]::Ceiling(($totalLines - 1) / $Chunks)
        for ($i = 0; $i -lt $Chunks; $i++) {
            $first = ($i * $partsize + 1)
            $last = [Math]::Min($first + $partsize - 1, $totalLines - 1)
            $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $outputFile, ($i + 1))
            Write-Verbose "Creating file '$newFile'"
            Set-Content -Path $newFile -Value $headers -Force
            Add-Content -Path $newFile -Value $content[$first..$last]
        }
    }
    else {
        $partsize = [Math]::Ceiling($totalLines / $Chunks)
        for ($i = 1; $i -le $Chunks; $i++) {
            $first = (($i - 1) * $partsize)
            $last = [Math]::Min(($i * $partsize) - 1, $totalLines - 1)
            $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $outputFile, $i)
            Write-Verbose "Creating file '$newFile'"
            Set-Content -Path $newFile -Value $content[$first..$last] -Force
        }
    }
}
If your input CSV file has headers, every 'chunk' file needs to start with those headers as well.
In that case, call the function with the -FirstLineHasHeaders switch:
Split-Csv -Path 'C:\Test.csv' -Destination 'D:\test' -Chunks 8 -FirstLineHasHeaders -Verbose
If your input csv file does NOT have headers, use it like:
Split-Csv -Path 'C:\Test.csv' -Destination 'D:\test' -Chunks 8 -Verbose
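Note that the function above cuts purely by line count, so rows that share a key can still end up in different files. A minimal sketch of a key-aware variant could look like the following. It rests on several assumptions that are not confirmed by the question: the file has no header row, the key is the first comma-separated field, and all rows for one key are already adjacent (as they are in the sample data). The name Split-CsvByKey and its parameters are hypothetical.

```powershell
function Split-CsvByKey {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path,          # source CSV file (assumed: no header row)
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination,   # output folder
        [ValidateRange(1, [int]::MaxValue)]
        [int]$Chunks = 8        # maximum number of output files
    )
    if (!(Test-Path -Path $Destination -PathType Container)) {
        $null = New-Item -Path $Destination -ItemType Directory
    }
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $lines = @(Get-Content -Path $Path)
    if ($lines.Count -eq 0) { return }

    # target number of lines per file; a file may grow beyond this to keep a key together
    $target = [Math]::Ceiling($lines.Count / $Chunks)
    $buffer = [System.Collections.Generic.List[string]]::new()
    $part = 1
    $lastKey = $null

    foreach ($line in $lines) {
        # assumption: the key is everything before the first comma
        $key = ($line -split ',', 2)[0]
        # only start a new file when the target is reached AND the key changes
        if ($buffer.Count -ge $target -and $key -ne $lastKey -and $part -lt $Chunks) {
            $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $baseName, $part)
            Set-Content -Path $newFile -Value $buffer.ToArray()
            $buffer.Clear()
            $part++
        }
        $buffer.Add($line)
        $lastKey = $key
    }
    # write whatever is left to the final file
    if ($buffer.Count -gt 0) {
        $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $baseName, $part)
        Set-Content -Path $newFile -Value $buffer.ToArray()
    }
}
```

It would be called like Split-CsvByKey -Path 'C:\Test.csv' -Destination 'D:\test' -Chunks 8. Because a file is only closed at a key boundary, the result can contain fewer files than requested when a few keys have very many rows.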
Related
PowerShell Copy All Files in Folders and Sub-Folders Not older than 300 minutes
I am trying to copy all files in folders and sub-folders that are not older than 300 minutes, but the code I got working only copies the files in the main folder; it doesn't copy the files in subfolders. At the destination I don't want to maintain the folder structure of the original files. I just want to put all the origin files into a single specific destination folder. This is the code I have:
Powershell -NoL -NoP -C "&{$ts=New-TimeSpan -M 300;"^
 "Get-ChildItem "C:\Origin" -Filter '*.dat'|?{"^
 "$_.LastWriteTime -gt ((Get-Date)-$ts)}|"^
 %%{Copy-Item $_.FullName 'C:\Destination'}}"
Could someone help me out please? Thanks in advance.
Here's a modified script that you can save as "Copy-Unique.ps1" and run from a batch file.
function Copy-Unique {
    # Copies files to a destination. If a file with the same name already exists in the destination,
    # the function will create a unique filename by appending '(x)' after the name, but before the extension.
    # The 'x' is a numeric sequence value.
    [CmdletBinding(SupportsShouldProcess)]  # add support for -WhatIf switch
    Param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, Position = 0)]
        [Alias("Path")]
        [ValidateScript({Test-Path -Path $_ -PathType Container})]
        [string]$SourceFolder,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$DestinationFolder,
        [Parameter(Mandatory = $false)]
        [int]$NewerThanMinutes = -1,
        [Parameter(Mandatory = $false)]
        [string]$Filter = '*',
        [switch]$Recurse
    )
    # create the destination path if it does not exist
    if (!(Test-Path -Path $DestinationFolder -PathType Container)) {
        Write-Verbose "Creating folder '$DestinationFolder'"
        $null = New-Item -Path $DestinationFolder -ItemType 'Directory' -Force
    }
    # get a list of FileInfo objects for the files in this source folder
    $sourceFiles = @(Get-ChildItem -Path $SourceFolder -Filter $Filter -File -Recurse:$Recurse)
    # if you want only files not older than x minutes, apply an extra filter
    if ($NewerThanMinutes -gt 0) {
        $sourceFiles = @($sourceFiles | Where-Object { $_.LastWriteTime -gt (Get-Date).AddMinutes(-$NewerThanMinutes) })
    }
    foreach ($file in $sourceFiles) {
        # get an array of all filenames (names only) of the files with a similar name already present in the destination folder
        $destFiles = @((Get-ChildItem $DestinationFolder -File -Filter "$($file.BaseName)*$($file.Extension)").Name)
        # for PowerShell version < 3.0 use this
        # $destFiles = @(Get-ChildItem $DestinationFolder -Filter "$($file.BaseName)*$($file.Extension)" | Where-Object { !($_.PSIsContainer) } | Select-Object -ExpandProperty Name)
        # construct the new filename
        $newName = $file.Name
        $count = 1
        while ($destFiles -contains $newName) {
            $newName = "{0}({1}){2}" -f $file.BaseName, $count++, $file.Extension
        }
        # use Join-Path to create a FullName for the file
        $newFile = Join-Path -Path $DestinationFolder -ChildPath $newName
        Write-Verbose "Copying '$($file.FullName)' as '$newFile'"
        $file | Copy-Item -Destination $newFile -Force
    }
}
# you can change the folder paths, file pattern to filter etc. here
$destFolder = Join-Path -Path 'C:\Destination' -ChildPath ('{0:yyyy-MM-dd_HH-mm}' -f (Get-Date))
Copy-Unique -SourceFolder "C:\Origin" -DestinationFolder $destFolder -Filter '*.dat' -Recurse -NewerThanMinutes 300
Edit: I changed the code so it now takes a datetime object to compare against rather than a number of minutes. This perhaps makes the code easier to understand, and certainly more flexible.
function Copy-Unique {
    # Copies files to a destination. If a file with the same name already exists in the destination,
    # the function will create a unique filename by appending '(x)' after the name, but before the extension.
    # The 'x' is a numeric sequence value.
    [CmdletBinding(SupportsShouldProcess)]  # add support for -WhatIf switch
    Param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, Position = 0)]
        [Alias("Path")]
        [ValidateScript({Test-Path -Path $_ -PathType Container})]
        [string]$SourceFolder,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$DestinationFolder,
        [string]$Filter = '*',
        [datetime]$NewerThan = [datetime]::MinValue,
        [switch]$Recurse
    )
    # create the destination path if it does not exist
    if (!(Test-Path -Path $DestinationFolder -PathType Container)) {
        Write-Verbose "Creating folder '$DestinationFolder'"
        $null = New-Item -Path $DestinationFolder -ItemType 'Directory' -Force
    }
    # get a list of FileInfo objects for the files in this source folder
    $sourceFiles = @(Get-ChildItem -Path $SourceFolder -Filter $Filter -File -Recurse:$Recurse)
    # if you want only files newer than a certain date, apply an extra filter
    if ($NewerThan -gt [datetime]::MinValue) {
        $sourceFiles = @($sourceFiles | Where-Object { $_.LastWriteTime -gt $NewerThan })
    }
    foreach ($file in $sourceFiles) {
        # get an array of all filenames (names only) of the files with a similar name already present in the destination folder
        $destFiles = @((Get-ChildItem $DestinationFolder -File -Filter "$($file.BaseName)*$($file.Extension)").Name)
        # for PowerShell version < 3.0 use this
        # $destFiles = @(Get-ChildItem $DestinationFolder -Filter "$($file.BaseName)*$($file.Extension)" | Where-Object { !($_.PSIsContainer) } | Select-Object -ExpandProperty Name)
        # construct the new filename
        $newName = $file.Name
        $count = 1
        while ($destFiles -contains $newName) {
            $newName = "{0}({1}){2}" -f $file.BaseName, $count++, $file.Extension
        }
        # use Join-Path to create a FullName for the file
        $newFile = Join-Path -Path $DestinationFolder -ChildPath $newName
        Write-Verbose "Copying '$($file.FullName)' as '$newFile'"
        $file | Copy-Item -Destination $newFile -Force
    }
}
# you can change the folder paths, file pattern to filter etc. here
$destFolder = Join-Path -Path 'D:\Destination' -ChildPath ('{0:yyyy-MM-dd_HH-mm}' -f (Get-Date))
Copy-Unique -SourceFolder "C:\Origin" -DestinationFolder $destFolder -Filter '*.dat' -Recurse -NewerThan (Get-Date).AddMinutes(-300)
Once you have saved the above code to, let's say, 'C:\Scripts\Copy-Unique.ps1', you can call it from a batch file like this:
Powershell.exe -NoLogo -NoProfile -File "C:\Scripts\Copy-Unique.ps1"
How to create a copy using powershell without overwriting the original file
So let's say I want to copy the file test.txt to another folder, but I want it to create a copy rather than erase a file of the same name that is already there. I know that Copy-Item overwrites the file in the destination folder, but I don't want it to do that. It also has to be a function.
I think this will help you:
$destinationFolder = 'PATH OF THE DESTINATION FOLDER'
$sourceFile = 'FULL PATH AND FILENAME OF THE SOURCE FILE'
# split the filename into a basename and an extension variable
$baseName = [System.IO.Path]::GetFileNameWithoutExtension($sourceFile)
$extension = [System.IO.Path]::GetExtension($sourceFile)
# you could also do it like this:
# $fileInfo = Get-Item -Path $sourceFile
# $baseName = $fileInfo.BaseName
# $extension = $fileInfo.Extension
# get an array of all filenames (name only) of the files with a similar name already present in the destination folder
$allFiles = @(Get-ChildItem $destinationFolder -File -Filter "$baseName*$extension" | Select-Object -ExpandProperty Name)
# construct the new filename
$newName = $baseName + $extension
$count = 1
while ($allFiles -contains $newName) {
    # add a sequence number in brackets to the end of the basename until it is unique in the destination folder
    $newName = "{0}({1}){2}" -f $baseName, $count++, $extension
}
# construct the new full path and filename for the destination of your Copy-Item command
$targetFile = Join-Path -Path $destinationFolder -ChildPath $newName
Copy-Item -Path $sourceFile -Destination $targetFile
Even if it isn't the most elegant way, you could try something like this:
$src = "$PSScriptRoot"
$file = "test.txt"
$dest = "$PSScriptRoot\dest"
$MAX_TRIES = 5
$copied = $false
for ($i = 1; $i -le $MAX_TRIES; $i++) {
    if (!$copied) {
        # escape the dot and anchor at the end so only the extension is matched
        $safename = $file -replace '\.txt$', "($i).txt"
        if (!(Test-Path "$dest\$file")) {
            Copy-Item "$src\$file" "$dest\$file"
            $copied = $true
        } elseif (!(Test-Path "$dest\$safename")) {
            Copy-Item "$src\$file" "$dest\$safename"
            $copied = $true
        } else {
            Write-Host "Found existing file -> checking for $safename"
        }
    } else {
        break
    }
}
The for-loop tries to safely copy the file up to 5 times (determined by $MAX_TRIES).
If 5 tries aren't enough, the file is not copied.
The regex creates test(1).txt, test(2).txt, ... as candidate "safe" filenames.
The if-statement checks whether the file can be copied under its original name.
The elseif-statement tries to copy it under the $safename created above.
The else-statement just prints a hint about what's going on.
Powershell move file with backup (like mv --backup=numbered)
I'm looking for a PS command that is equivalent to mv --backup=numbered, and can't find anything. In essence, move 'file' to 'file.old', but if 'file.old' exists, 'file' should be moved to 'file.old.2'. For now the closest I have found is from this link: https://www.pdq.com/blog/copy-individual-files-and-rename-duplicates/:
$SourceFile = "C:\Temp\File.txt"
$DestinationFile = "C:\Temp\NonexistentDirectory\File.txt"
If (Test-Path $DestinationFile) {
    $i = 0
    While (Test-Path $DestinationFile) {
        $i += 1
        $DestinationFile = "C:\Temp\NonexistentDirectory\File$i.txt"
    }
} Else {
    New-Item -ItemType File -Path $DestinationFile -Force
}
Copy-Item -Path $SourceFile -Destination $DestinationFile -Force
It seems quite awful to need this amount of code. Is there anything simpler available?
Indeed, there is no built-in function to do that. However, it should not be a problem to use a function of your own for that purpose. How about this:
function Copy-FileNumbered {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory = $true, Position = 0)]
        [ValidateScript({Test-Path -Path $_ -PathType Leaf})]
        [string]$SourceFile,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$DestinationFile
    )
    # get the directory of the destination file and create it if it does not exist
    $directory = Split-Path -Path $DestinationFile -Parent
    if (!(Test-Path -Path $directory -PathType Container)) {
        # discard the DirectoryInfo object so the function does not emit it as output
        $null = New-Item -Path $directory -ItemType 'Directory' -Force
    }
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($DestinationFile)
    $extension = [System.IO.Path]::GetExtension($DestinationFile)   # this includes the dot
    $allFiles = Get-ChildItem $directory | Where-Object {$_.PSIsContainer -eq $false} | Foreach-Object {$_.Name}
    $newFile = $baseName + $extension
    $count = 1
    while ($allFiles -contains $newFile) {
        $newFile = "{0}({1}){2}" -f $baseName, $count, $extension
        $count++
    }
    Copy-Item -Path $SourceFile -Destination (Join-Path $directory $newFile) -Force
}
This will create a new file in the destination like File(1).txt.
Of course, if you would rather have names like File.2.txt, just change the format template "{0}({1}){2}" to "{0}.{1}{2}".
Use the function like this:
$SourceFile = "C:\Temp\File.txt"
$DestinationFile = "C:\Temp\NonexistentDirectory\File.txt"
Copy-FileNumbered -SourceFile $SourceFile -DestinationFile $DestinationFile
PowerShell - copying multiple files
I've written a PowerShell script to copy new files across to a shared folder on a server. I am wondering if there is a way that after I get the list of new files in the sub folders, I can copy them over together - other than using for-each and copying them one at a time - so that I can add a progress bar.
Something like this could be a starting point:
# define source and destination folders
$source = 'C:\temp\music'
$dest = 'C:\temp\new'
# get all files in source (not empty directories)
$files = Get-ChildItem $source -Recurse -File
$index = 0
$total = $files.Count
$starttime = $lasttime = Get-Date
$results = $files | % {
    $index++
    $currtime = (Get-Date) - $starttime
    $avg = $currtime.TotalSeconds / $index
    $last = ((Get-Date) - $lasttime).TotalSeconds
    $left = $total - $index
    $WrPrgParam = @{
        Activity = (
            "Copying files $(Get-Date -f s)",
            "Total: $($currtime -replace '\..*')",
            "Avg: $('{0:N2}' -f $avg)",
            "Last: $('{0:N2}' -f $last)",
            "ETA: $('{0:N2}' -f ($avg * $left / 60))",
            "min ($([string](Get-Date).AddSeconds($avg*$left) -replace '^.* '))"
        ) -join ' '
        Status = "$index of $total ($left left) [$('{0:N2}' -f ($index / $total * 100))%]"
        CurrentOperation = "File: $_"
        PercentComplete = ($index/$total)*100
    }
    Write-Progress @WrPrgParam
    $lasttime = Get-Date
    # build destination path for this file
    $destdir = Join-Path $dest $($(Split-Path $_.fullname) -replace [regex]::Escape($source))
    # if it doesn't exist, create it
    if (!(Test-Path $destdir)) {
        $null = md $destdir
    }
    # if the file.txt already exists, rename it to file-1.txt and so on
    $num = 1
    $base = $_.basename
    $ext = $_.extension
    $newname = Join-Path $destdir "$base$ext"
    while (Test-Path $newname) {
        $newname = Join-Path $destdir "$base-$num$ext"
        $num++
    }
    # log the source and destination files to the $results variable
    Write-Output $([pscustomobject]@{
        SourceFile = $_.fullname
        DestFile = $newname
    })
    # finally, copy the file to its new location
    copy $_.fullname $newname
}
# export a list of source files
$results | Export-Csv c:\temp\copylog.csv -NoTypeInformation
NOTE: This shows progress by number of files, regardless of their size. For example, if you have 2 files, one of 1 MB and one of 50 MB, progress will read 50% once the first file is copied, because half of the files are done.
If you want progress by total bytes, I highly recommend giving this function a try. Just give it a source and a destination. It works when given a single file or a whole folder to copy: https://github.com/gangstanthony/PowerShell/blob/master/Copy-File.ps1
Screenshot: http://i.imgur.com/hT8yoUm.jpg
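As a rough illustration of the byte-based idea from the note above, progress can be weighted by file size using only built-in cmdlets. This is a minimal sketch, not the linked Copy-File.ps1 script: the function name Copy-WithByteProgress and its parameters are hypothetical, the bar only advances between files (not while a single large file is being copied), and unlike the script above it overwrites existing destination files instead of renaming them.

```powershell
function Copy-WithByteProgress {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $true)][string]$Source,       # existing source folder
        [Parameter(Mandatory = $true)][string]$Destination   # created if missing
    )
    $sourceRoot = (Resolve-Path -Path $Source).ProviderPath
    $files = @(Get-ChildItem -Path $sourceRoot -Recurse -File)
    # total size of everything that will be copied
    $totalBytes = ($files | Measure-Object -Property Length -Sum).Sum
    $copiedBytes = 0

    foreach ($file in $files) {
        # percentage is based on bytes already copied, not on the file count
        $percent = if ($totalBytes -gt 0) { [int](100 * $copiedBytes / $totalBytes) } else { 100 }
        Write-Progress -Activity 'Copying files' `
                       -Status "$copiedBytes of $totalBytes bytes" `
                       -CurrentOperation $file.FullName `
                       -PercentComplete $percent

        # rebuild the file's path relative to the source root below the destination
        $relative = $file.FullName.Substring($sourceRoot.Length).TrimStart([char[]]'\/')
        $target = Join-Path -Path $Destination -ChildPath $relative
        $targetDir = Split-Path -Path $target -Parent
        if (!(Test-Path -Path $targetDir -PathType Container)) {
            $null = New-Item -Path $targetDir -ItemType Directory -Force
        }
        Copy-Item -LiteralPath $file.FullName -Destination $target
        $copiedBytes += $file.Length
    }
    Write-Progress -Activity 'Copying files' -Completed
}
```

With the folders from the answer, it would be called as Copy-WithByteProgress -Source 'C:\temp\music' -Destination 'C:\temp\new'. In the 1 MB / 50 MB example, the bar would then sit near 2% after the first file rather than at 50%.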
keeping default encoding type upon writing lines
I am using the following script to delete the last line of a file. The problem is that it seems to change the encoding of the file to something else, which makes characters such as pound (£) signs unreadable.
$path = "D:\Test\"
$filter = "*.txt"
$files = Get-ChildItem -path $path -filter $filter
foreach ($item in $files) {
    Write-Host "Start Processing " $item.FullName -foregroundcolor "green"
    # Read all lines
    $LinesInFile = [System.IO.File]::ReadAllLines($item.FullName)
    # Write all lines, except for the last one, back to the file
    [System.IO.File]::WriteAllLines($item.FullName,$LinesInFile[0..($LinesInFile.Count - 2)])
    # Clean up
    Remove-Variable -Name LinesInFile
    Write-Host "Ended Processing " $item.FullName -foregroundcolor "white"
}
I tried setting the encoding when writing the files to "ANSI", which seems to be the default encoding of the files, but nothing happens:
Encoding.GetEncoding(1252)
[System.IO.File]::WriteAllLines($item.FullName,$LinesInFile[0..($LinesInFile.Count - 2)], Encoding.GetEncoding(1252))
If you have characters like that, I'd recommend setting the encoding to UTF-8 or Unicode. I'd also recommend using PowerShell cmdlets instead of .NET methods if you're processing your files line by line anyway.
foreach ($item in $files) {
    $path = $item.FullName
    Write-Host "Start Processing $path" -ForegroundColor 'green'
    $LinesInFile = Get-Content $path
    $LinesInFile[0..($LinesInFile.Count - 2)] | Set-Content $path -Encoding UTF8
    Write-Host "Ended Processing $path" -ForegroundColor 'white'
}
Edit: If performance is an issue, you could use a StreamReader and a StreamWriter in combination with a ring buffer for reading/writing the data.
$path = 'D:\Test'
$filter = '*.txt'
$files = Get-ChildItem -Path $path -Filter $filter
$encoding = [Text.Encoding]::GetEncoding(1252)
# ring buffer size (== number of lines to remove from end of file)
$bufferSize = 2
$tempFile = Join-Path $path 'temp.txt'
foreach ($item in $files) {
    # create ring buffer
    $buffer = New-Object Object[] $bufferSize
    $current = 0
    $reader = New-Object IO.StreamReader ($item.FullName, $encoding)
    $writer = New-Object IO.StreamWriter ($tempFile, $false, $encoding)
    while ($reader.Peek() -ge 0) {
        # compare against $null so that buffered empty lines are still written
        if ($null -ne $buffer[$current]) {
            $writer.WriteLine($buffer[$current])
        }
        $buffer[$current] = $reader.ReadLine()
        $current = ($current + 1) % $bufferSize
    }
    $reader.Close(); $reader.Dispose()
    $writer.Close(); $writer.Dispose()
    Remove-Item $item.FullName -Force
    Rename-Item $tempFile $item.Name
}
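For completeness, the attempt in the question uses C# syntax (Encoding.GetEncoding(1252)); in PowerShell a static .NET method is called as [System.Text.Encoding]::GetEncoding(1252), and the encoding has to be passed when reading as well as when writing. A minimal sketch of that approach follows, assuming the files really are Windows-1252. The function name Remove-LastLine and its parameters are hypothetical.

```powershell
function Remove-LastLine {
    [CmdletBinding()]
    Param (
        [Parameter(Mandatory = $true)]
        [string]$Path,
        # assumption: the file is Windows-1252 ("ANSI" on Western European systems)
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::GetEncoding(1252)
    )
    # .NET methods do not follow PowerShell's current location, so resolve to a full path first
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # read with the explicit encoding so characters such as the pound sign are decoded correctly
    $lines = [System.IO.File]::ReadAllLines($fullPath, $Encoding)
    if ($lines.Count -eq 0) { return }

    # keep everything except the last line (yields an empty array for a one-line file)
    $kept = [string[]]@($lines | Select-Object -SkipLast 1)

    # write back with the same encoding
    [System.IO.File]::WriteAllLines($fullPath, $kept, $Encoding)
}
```

Inside the loop from the question this would be called as Remove-LastLine -Path $item.FullName. Select-Object -SkipLast is used instead of the range 0..($lines.Count - 2) because that range evaluates to 0..-1 for a one-line file, which selects the first and last element rather than nothing.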