I have the following function:
function DivideAndCreateFiles ([string] $file, [string] $ruleName) {
$Suffix = 0
$FullData = Get-Content $file
$MaxIP = 40
while ($FullData.count -gt 0 ){
$NewData = $FullData | Select-Object -First $MaxIP
$FullData = $FullData | Select-Object -Skip $MaxIP
$NewName = "$ruleName$Suffix"
New-Variable -name $NewName -Value $NewData
Get-Variable -name $NewName -ValueOnly | out-file "$NewName.txt"
$Suffix++
}
}
This function takes a file location, this file holds hundreds of ips. It then iterate the file and create files from it each one holding 40 IP's naming it $rulename$suffix so if $rulename=blabla
I would get blabla_0, blabla_1, and so on, each with 40 ips.
I need to convert this logic and put it inside a jenkins job.
This file is in the working dir of the job and is called ips.txt
suffix = 0
maxIP = 40
ips = readFile('ips.txt')
...
You can achieve this easily using something like the following groovy code:
def divideAndCreateFiles(path, rule_name, init_suffix = 0, max_ip = 40){
// read the file and split lines by new line separator
def ips = readFile(path).split("\n").trim()
// divide the IPs into groups according to the maximum ip range
def ip_groups = ips.collate(ips.size().intdiv(max_ip))
// Iterate over all groups and write them to the corresponding file
ip_groups.eachWithIndex { group, index ->
writeFile file : "${rule_name}${init_suffix + index}", text: group.join("\n")
}
}
From my perspective, because you are already writing a full pipeline in groovy, it is easier to handle all logic in groovy, including this function.
This is not an answer about converting to Jenkins, but a re-write of your original PowerShell function, which splits a file in a very inefficient way.
(can't do that in a comment)
This utilizes the -ReadCount parameter of Get-Content, which specifies how many lines of content are sent through the pipeline at a time.
Also, I have renamed the function to comply to the Verb-Noun naming recommendations.
function Split-File ([string]$file, [string]$ruleName, [int]$maxLines = 40) {
$suffix = 0
$path = [System.IO.Path]::GetDirectoryName($file)
Get-Content -Path $file -ReadCount $maxLines | ForEach-Object {
$_ | Set-Content -Path (Join-Path -Path $path -ChildPath ('{0}_{1}.txt' -f $ruleName, $suffix++))
}
}
Related
I created a script that lists all the folders, subfolders and files and export them to csv:
$path = "C:\tools"
Get-ChildItem $path -Recurse |select fullname | export-csv -Path "C:\temp\output.csv" -NoTypeInformation
But I would like that each folder, subfolder and file in pfad is written into separate column in csv.
Something like this:
c:\tools\test\1.jpg
Column1
Column2
Column3
tools
test
1.jpg
I will be grateful for any help.
Thank you.
You can split the Fullname property using the Split() method. The tricky part is that you need to know the maximum path depth in advance, as the CSV format requires that all rows have the same number of columns (even if some columns are empty).
# Process directory $path recursively
$allItems = Get-ChildItem $path -Recurse | ForEach-Object {
# Split on directory separator (typically '\' for Windows and '/' for Unix-like OS)
$FullNameSplit = $_.FullName.Split( [IO.Path]::DirectorySeparatorChar )
# Create an object that contains the splitted path and the path depth.
# This is implicit output that PowerShell captures and adds to $allItems.
[PSCustomObject] #{
FullNameSplit = $FullNameSplit
PathDepth = $FullNameSplit.Count
}
}
# Determine highest column index from maximum depth of all paths.
# Minus one, because we'll skip root path component.
$maxColumnIndex = ( $allItems | Measure-Object -Maximum PathDepth ).Maximum - 1
$allRows = foreach( $item in $allItems ) {
# Create an ordered hashtable
$row = [ordered]#{}
# Add all path components to hashtable. Make sure all rows have same number of columns.
foreach( $i in 1..$maxColumnIndex ) {
$row[ "Column$i" ] = if( $i -lt $item.FullNameSplit.Count ) { $item.FullNameSplit[ $i ] } else { $null }
}
# Convert hashtable to object suitable for output to CSV.
# This is implicit output that PowerShell captures and adds to $allRows.
[PSCustomObject] $row
}
# Finally output to CSV file
$allRows | Export-Csv -Path "C:\temp\output.csv" -NoTypeInformation
Notes:
The syntax Select-Object #{ Name= ..., Expression = ... } creates a calculated property.
$allRows = foreach captures and assigns all output of the foreach loop to variable $allRows, which will be an array if the loop outputs more than one object. This works with most other control statements as well, e. g. if and switch.
Within the loop I could have created a [PSCustomObject] directly (and used Add-Member to add properties to it) instead of first creating a hashtable and then converting to [PSCustomObject]. The choosen way should be faster as no additional overhead for calling cmdlets is required.
While a file with rows containing a variable number of items is not actually a CSV file, you can roll your own and Microsoft Excel can read it.
=== Get-DirCsv.ps1
Get-Childitem -File |
ForEach-Object {
$NameParts = $_.FullName -split '\\'
$QuotedParts = [System.Collections.ArrayList]::new()
foreach ($NamePart in $NameParts) {
$QuotedParts.Add('"' + $NamePart + '"') | Out-Null
}
Write-Output $($QuotedParts -join ',')
}
Use this to capture the output to a file with:
.\Get-DirCsv.ps1 | Out-File -FilePath '.\dir.csv' -Encoding ascii
I have a large CSV file and I want to split it with respect to size and the header should be in every file.
For example, I have this 1.6MB file and I want the child files shouldn't be more than 512KB. So practically the parent file should have 4 child file.
Tried with the below simple program but the file is splitting with blank child files.
function csvSplitter {
$csvFile = "D:\Test\PTest\Dummy.csv";
$split = 10;
$content = Import-Csv $csvFile;
$start = 1;
$end = 0;
$records_per_file = [int][Math]::Ceiling($content.Count / $split);
for($i = 1; $i -le $split; $i++) {
$end += $records_per_file;
$content | Where-Object {[int]$_.Id -ge $start -and [int]$_.Id -le $end} | Export-Csv -Path "D:\Test\PTest\Destination\file$i.csv" -NoTypeInformation;
$start = $end + 1;
}
}csvSplitter
The logic for the size of the file is yet to write.
Tried to add both the files but I guess there is no option to add files.
this takes a slightly different path to a solution. [grin]
it ...
loads the CSV as a plain text file
saves the 1st line as a header line
calcs the batch size from the total line count & the batch count
uses array index ranges to grab the lines for each batch
combines the header line with the current batch of lines
writes that out to a text file
the reason for such a roundabout method is to save RAM. one drawback to loading the file as a CSV is the sheer amount of RAM needed. just loading the lines of text requires noticeably less RAM.
$SourceDir = $env:TEMP
$InFileName = 'LargeFile.csv'
$InFullFileName = Join-Path -Path $SourceDir -ChildPath $InFileName
$BatchCount = 4
$DestDir = $env:TEMP
$OutFileName = 'LF_Batch_.csv'
$OutFullFileName = Join-Path -Path $DestDir -ChildPath $OutFileName
#region >>> build file to work with
# remove this region when you are ready to do this with your test data OR to do this with real data
if (-not (Test-Path -LiteralPath $InFullFileName))
{
Get-ChildItem -LiteralPath $env:APPDATA -Recurse -File |
Sort-Object -Property Name |
Select-Object Name, Length, LastWriteTime, Directory |
Export-Csv -LiteralPath $InFullFileName -NoTypeInformation
}
#endregion >>> build file to work with
$CsvAsText = Get-Content -LiteralPath $InFullFileName
[array]$HeaderLine = $CsvAsText[0]
$BatchSize = [int]($CsvAsText.Count / $BatchCount) + 1
$StartLine = 1
foreach ($B_Index in 1..$BatchCount)
{
if ($B_Index -ne 1)
{
$StartLine = $StartLine + $BatchSize + 1
}
$CurrentOutFullFileName = $OutFullFileName.Replace('_.', ('_{0}.' -f $B_Index))
$HeaderLine + $CsvAsText[$StartLine..($StartLine + $BatchSize)] |
Set-Content -LiteralPath $CurrentOutFullFileName
}
there is no output on screen, but i got 4 files named LF_Batch_1.csv thru LF_Batch_4.csv that contained the 4our parts of the source file as expected. the last file has a slightly smaller number of rows, but that is what happens when the row count is not evenly divisible by the batch count. [grin]
Try this:
Add-Type -AssemblyName System.Collections
function Split-Csv {
param (
[string]$filePath,
[int]$partsNum
)
# Use generic lists for import/export
[System.Collections.Generic.List[object]]$contentImport = #()
[System.Collections.Generic.List[object]]$contentExport = #()
# import csv-file
$contentImport = Import-Csv $filePath
# how many lines per export file
$linesPerFile = [Math]::Max( [int]($contentImport.Count / $partsNum), 1 )
# start pointer for source list
$startPointer = 0
# counter for file name
$counter = 1
# main loop
while( $startPointer -lt $contentImport.Count ) {
# clear export list
[void]$contentExport.Clear()
# determine from-to from source list to export
$endPointer = [Math]::Min( $startPointer + $linesPerFile, $contentImport.Count )
# move lines to export to export list
[void]$contentExport.AddRange( $contentImport.GetRange( $startPointer, $endPointer - $startPointer ) )
# export
$contentExport | Export-Csv -Path ($filePath.Replace('.', $counter.ToString() + '.' ) ) -NoTypeInformation -Force
# move pointer
$startPointer = $endPointer
# increase counter for filename
$counter++
}
}
Split-Csv -filePath 'test.csv' -partsNum 7
try running this script:
$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
$FilePath = $HOME +'\Documents\Projects\ADOPT\Data8277.csv'
$SplitDir = $HOME +'\Documents\Projects\ADOPT\Split\'
CSV-FileSplitter -Path $FilePath -PartSizeBytes 35MB -SplitDir $SplitDir #-Verbose
$sw.Stop()
Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"
I created this for files larger than 50GB files
I have a bunch of .csv files and I'm trying to add in some new column headers and their values (which are all blank anyway) and then output this to a new .csv file. My script currently runs and works fine but it takes about 5 minutes to complete the operation on a 60MB file with about 70,000 rows - I have about 100 files to do this on so it will take a while using this script.
My code is below, it's quite simple but clearly inefficient!
Import-Csv $strFilePath |
Select-Object *, #{Name='NewHeader';Expression={''}},
#{Name='NewHeader2';Expression={''}},
#{Name='NewHeader3';Expression={''}},
#{Name='NewHeader4';Expression={''}} |
Export-Csv $($strFilePath + ".new") -NoTypeInformation
As pointed out in the comments, I think it would be better to treat it as a simple text without the useless conversion.
$path = 'C:\test'
$newHeaders = 'NewHeader1','NewHeader2','NewHeader3','NewHeader4'
$files = Get-ChildItem -LiteralPath $path -Filter *.csv
$newHeadersString = #(''; $newHeaders | foreach { '"{0}"' -f $_ }) -join ','
$newColmunsString = ',""' * $newHeaders.Count
foreach ($file in $files) {
$sr = $file.OpenText()
$outfile = New-Item ($file.FullName + '.new') -Force
$sw = [IO.StreamWriter]::new($outfile.FullName)
$sw.WriteLine($sr.ReadLine() + $newHeadersString)
while(!$sr.EndOfStream) { $sw.WriteLine($sr.ReadLine() + $newColmunsString) }
$sr.Close()
$sw.Close()
}
I would really appreciate your help with this
I should first mention that I have been unable to find any specific solutions and I am very new to programming with powershell, hence my request
I wish to write (and later schedule) a script in powershell that looks for a file with a specific name - RFUNNEL and then renames this to R0000001. There will only be one of such 'RFUNELL' files in the folder at any time. However when next the script is run and finds a new RFUNNEL file I will this to be renamed to R0000002 and so on and so forth
I have struggled with this for some weeks now and the seemingly similar solutions that I have come across have not been of much help - perhaps because of my admittedly limited experience with powershell.
Others might be able to do this with less syntax, but try this:
$rootpath = "C:\derp"
if (Test-Path "$rootpath\RFUNNEL.txt")
{ $maxfile = Get-ChildItem $rootpath | ?{$_.BaseName -like "R[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"} | Sort BaseName -Descending | Select -First 1 -Expand BaseName;
if (!$maxfile) { $maxfile = "R0000000" }
[int32]$filenumberint = $maxfile.substring(1); $filenumberint++
[string]$filenumberstring = ($filenumberint).ToString("0000000");
[string]$newName = ("R" + $filenumberstring + ".txt");
Rename-Item "$rootpath\RFUNNEL.txt" $newName;
}
Here's an alternative using regex:
[cmdletbinding()]
param()
$triggerFile = "RFUNNEL.txt"
$searchPattern = "R*.txt"
$nextAvailable = 0
# If the trigger file exists
if (Test-Path -Path $triggerFile)
{
# Get a list of files matching search pattern
$files = Get-ChildItem "$searchPattern" -exclude "$triggerFile"
if ($files)
{
# store the filenames in a simple array
$files = $files | select -expandProperty Name
$files | Write-Verbose
# Get next available file by carrying out a
# regex replace to extract the numeric part of the file and get the maximum number
$nextAvailable = ($files -replace '([a-z])(.*).txt', '$2' | measure-object -max).Maximum
}
# Add one to either the max or zero
$nextAvailable++
# Format the resulting string with leading zeros
$nextAvailableFileName = 'R{0:000000#}.txt' -f $nextAvailable
Write-Verbose "Next Available File: $nextAvailableFileName"
# rename the file
Rename-Item -Path $triggerFile -NewName $nextAvailableFileName
}
I am trying to just remove the first line of about 5000 text files before importing them.
I am still very new to PowerShell so not sure what to search for or how to approach this. My current concept using pseudo-code:
set-content file (get-content unless line contains amount)
However, I can't seem to figure out how to do something like contains.
While I really admire the answer from #hoge both for a very concise technique and a wrapper function to generalize it and I encourage upvotes for it, I am compelled to comment on the other two answers that use temp files (it gnaws at me like fingernails on a chalkboard!).
Assuming the file is not huge, you can force the pipeline to operate in discrete sections--thereby obviating the need for a temp file--with judicious use of parentheses:
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file
... or in short form:
(gc $file | select -Skip 1) | sc $file
It is not the most efficient in the world, but this should work:
get-content $file |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $file -Force
Using variable notation, you can do it without a temporary file:
${C:\file.txt} = ${C:\file.txt} | select -skip 1
function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
if ( -not (Test-Path $path -PathType Leaf) ) {
throw "invalid filename"
}
ls $path |
% { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}
I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It didn't finish for at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.
My solution was to use a more .NET approach: StreamReader + StreamWriter.
See this answer for a great answer discussing the perf: In Powershell, what's the most efficient way to split a large text file by record type?
Below is my solution. Yes, it uses a temporary file, but in my case, it didn't matter (it was a freaking huge SQL table creation and insert statements file):
PS> (measure-command{
$i = 0
$ins = New-Object System.IO.StreamReader "in/file/pa.th"
$outs = New-Object System.IO.StreamWriter "out/file/pa.th"
while( !$ins.EndOfStream ) {
$line = $ins.ReadLine();
if( $i -ne 0 ) {
$outs.WriteLine($line);
}
$i = $i+1;
}
$outs.Close();
$ins.Close();
}).TotalSeconds
It returned:
188.1224443
Inspired by AASoft's answer, I went out to improve it a bit more:
Avoid the loop variable $i and the comparison with 0 in every loop
Wrap the execution into a try..finally block to always close the files in use
Make the solution work for an arbitrary number of lines to remove from the beginning of the file
Use a variable $p to reference the current directory
These changes lead to the following code:
$p = (Get-Location).Path
(Measure-Command {
# Number of lines to skip
$skip = 1
$ins = New-Object System.IO.StreamReader ($p + "\test.log")
$outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
try {
# Skip the first N lines, but allow for fewer than N, as well
for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
$ins.ReadLine()
}
while( !$ins.EndOfStream ) {
$outs.WriteLine( $ins.ReadLine() )
}
}
finally {
$outs.Close()
$ins.Close()
}
}).TotalSeconds
The first change brought the processing time for my 60 MB file down from 5.3s to 4s. The rest of the changes is more cosmetic.
$x = get-content $file
$x[1..$x.count] | set-content $file
Just that much. Long boring explanation follows. Get-content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.
For example, if we define an array variable like this,
$array = #("first item","second item","third item")
so $array returns
first item
second item
third item
then we can "index into" that array to retrieve only its 1st element
$array[0]
or only its 2nd
$array[1]
or a range of index values from the 2nd through the last.
$array[1..$array.count]
I just learned from a website:
Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }
Or you can use the aliases to make it short, like:
gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
Another approach to remove the first line from file, using multiple assignment technique. Refer Link
$firstLine, $restOfDocument = Get-Content -Path $filename
$modifiedContent = $restOfDocument
$modifiedContent | Out-String | Set-Content $filename
skip` didn't work, so my workaround is
$LinesCount = $(get-content $file).Count
get-content $file |
select -Last $($LinesCount-1) |
set-content "$file-temp"
move "$file-temp" $file -Force
Following on from Michael Soren's answer.
If you want to edit all .txt files in the current directory and remove the first line from each.
Get-ChildItem (Get-Location).Path -Filter *.txt |
Foreach-Object {
(Get-Content $_.FullName | Select-Object -Skip 1) | Set-Content $_.FullName
}
For smaller files you could use this:
& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null
... but it's not very effective at processing my example file of 16MB. It doesn't seem to terminate and release the lock on newfile.csv.