Slow Excel automation? - powershell

I'm using the following PowerShell script to remove the NumberFormat from the cells of one column, across a lot of files, so that all the fractions are displayed. The column may contain decimals, text, dates, etc.; only the cells with a decimal/currency format (a format matching 0* or *#*) need to be cleared.
However, it's slow (it checks/updates only two or three cells per second). Is there a better/faster way to do it?
$WorkBook = $Excel.Workbooks.Open($fileName)
$WorkSheet = $WorkBook.Worksheets.Item(1)
$cell = $WorkSheet.Cells
$ColumnIndex = 10 # The column may have decimal, text or date, etc.
$i = 2
# Would replacing this check with a lookup of the last row of column 1 cut the time in half? How?
while ($cell.Item($i, 1).value2 -ne $Null)
{
    $c = $cell.Item($i, $ColumnIndex)
    if (($c.NumberFormat -like "0*") -or $c.NumberFormat -like "*#*")
    {
        "$i : $($c.NumberFormat) $($c.value2) "
        $c.NumberFormat = $Null
    }
    $i++
}
Update:
Would using the .NET Microsoft.Office.Interop.Excel interop be much faster?
Or converting the files to xlsx format and using System.IO.Package.IO?

I improved the speed after reading the comments. Thanks all.
Try to reduce access to the cells as much as possible. I deleted the output line "$i : $($c.NumberFormat) $($c.value2) " and changed
if (($c.NumberFormat -like "0*") -or $c.NumberFormat -like "*#*")
to
$f = $c.NumberFormat
if ($f -like "0*" -or $f -like "*#*")
I also used $lastRow = $cell.SpecialCells(11, 1).Row to get the last row number and changed the loop to while ($i -le $lastRow).
$Excel.ScreenUpdating = $false also helped reduce the time.
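Putting those changes together, the hot loop ends up looking roughly like this (just a sketch; it assumes $Excel, $WorkSheet and $ColumnIndex are already set up as in the original script):
$Excel.ScreenUpdating = $false
$cell = $WorkSheet.Cells
# 11 = xlCellTypeLastCell; .Row is the last used row, so the loop has a fixed bound
$lastRow = $cell.SpecialCells(11, 1).Row
for ($i = 2; $i -le $lastRow; $i++) {
    $c = $cell.Item($i, $ColumnIndex)
    $f = $c.NumberFormat          # read the COM property once per cell
    if ($f -like "0*" -or $f -like "*#*") {
        $c.NumberFormat = $Null
    }
}
$Excel.ScreenUpdating = $true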

Related

PowerShell script efficiency advice

I have a telephony .csv with compiled data from January 2020 and a few days of February. Each row has the date and the time spent in each status; since someone uses different statuses over the day, the file has one row per status. My script is supposed to go through the file, find the minimum date, and then start saving all the data for the same day into a new file, so I'll end up with one file for 01-01-2020, one for 02-01-2020, and so on. But it has been running for 15 hours and it's still at 1/22.
The column I'm using for the dates is called "DateFull", and this is the script:
write-host "opening file"
$AT= import-csv “C:\Users\xxxxxx\Desktop\SignOnOff_20200101_20200204.csv”
write-host "parsing and sorting file"
$go= $AT| ForEach-Object {
$_.DateFull= (Get-Date $_.DateFull).ToString("M/d/yyyy")
$_
}
Write-Host "prep day"
$min = $AT | Measure-Object -Property Datefull -Minimum
Write-Host $min
$dateString = [datetime] $min.Minimum
Write-host $datestring
write-host "Setup dates"
$start = $DateString - $today
$start = $start.Days
For ($i=$start; $i -lt 0; $i++) {
$date = get-date
$loaddate = $date.AddDays($i)
$DateStr = $loadDate.ToString("M/d/yyyy")
$now = Get-Date -Format HH:mm:ss
write-host $datestr " " $now
#Install-Module ImportExcel #optional import if you dont have the module already
$Check = $at | where {$_.'DateFull' -eq $datestr}
write-host $check.count
if ($check.count -eq 0 ){}
else {$AT | where {$_.'DateFull' -eq $datestr} | Export-Csv "C:\Users\xxxxx\Desktop\signonoff\SignOnOff_$(get-date (get-date).addDays($i) -f yyyyMMdd).csv" -NoTypeInformation}
}
$at = ''
The first loop doesn't make much sense. It loops through the CSV contents and converts each row's date into a different format. Afterwards, $go is never used.
$go = $AT | ForEach-Object {
    $_.DateFull = (Get-Date $_.DateFull).ToString("M/d/yyyy")
    $_
}
Later, there is an attempt to calculate a value from an uninitialized variable: $today is never defined.
$start = $DateString - $today
It looks, however, like you'd like to calculate, in days, how old the oldest record is.
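For illustration, a sketch of that calculation (assuming $min.Minimum holds the oldest date found with Measure-Object above):
$oldest = [datetime]$min.Minimum
$start = ($oldest - (Get-Date)).Days   # negative: how many days back the oldest record is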
Then there's a loop that counts from negative days to zero. During each iteration, the whole CSV is searched:
$Check = $at | where {$_.'DateFull' -eq $datestr}
If there are 30 days and 15 000 rows, there are 30*15000 = 450 000 iterations. This has complexity of O(n^2), which means the runtime goes sky high for even relatively small numbers of days and rows.
The next part is that the same array is processed again:
else {$AT | where {$_.'DateFull' -eq $datestr
Well, the search condition is exactly the same, but now the results are sent to a file. This has the side effect of doubling your work. Still, O(2n^2) => O(n^2), so at least the runtime isn't growing cubically or worse.
As for how to fix this, there are a few things. If you sort the CSV based on date, it can be processed afterwards in just a single run.
$at = $at | sort -Property datefull
Then, iterate over each row. Since the rows are in ascending order, the first is the oldest. For each row, check whether the date has changed. If not, add the row to a buffer. If it has, save the old buffer and create a new one.
The sample below doesn't convert file names to the yyyyMMdd format, and it assumes there are only two columns, foo and datefull, like so:
$sb = new-object text.stringbuilder
# What's the first date?
$current = $at[0]
# Loop through sorted data
for ($i = 0; $i -lt $at.Count; ++$i) {
    # Are we on the next date?
    if ($at[$i].DateFull -gt $current.datefull) {
        # Save the buffer
        $file = $("c:\temp\OnOff_{0}.csv" -f ($current.datefull -replace '/', '.'))
        set-content $file $sb.tostring()
        # Pick the current date
        $current = $at[$i]
        # Create a new buffer and save data there
        $sb = new-object text.stringbuilder
        [void]$sb.AppendLine(("{0},{1}" -f $at[$i].foo, $at[$i].datefull))
    } else {
        [void]$sb.AppendLine(("{0},{1}" -f $at[$i].foo, $at[$i].datefull))
    }
}
# Save the final buffer
$file = $("c:\temp\OnOff_{0}.csv" -f ($current.datefull -replace '/', '.'))
set-content $file $sb.tostring()
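For comparison, a shorter single-pass alternative (just a sketch, not part of the answer's script) lets Group-Object do the bucketing and Export-Csv write each group, which also keeps every column and the header row:
$at | Group-Object -Property DateFull | ForEach-Object {
    # one file per distinct date, e.g. OnOff_1.31.2020.csv
    $file = "c:\temp\OnOff_{0}.csv" -f ($_.Name -replace '/', '.')
    $_.Group | Export-Csv -Path $file -NoTypeInformation
}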

Changing the Delimiter in a large CSV file using Powershell

I am in need of a way to change the delimiter in a CSV file from a comma to a pipe. Because of the size of the CSV files (~750 MB to several GB), using Import-CSV and/or Get-Content is not an option. What I'm using (and what works, albeit slowly) is the following code:
$reader = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$reader.SetDelimiters(",")
While (!$reader.EndOfData)
{
    $line = $reader.ReadFields()
    $details = [ordered]@{
        "Plugin ID" = $line[0]
        CVE         = $line[1]
        CVSS        = $line[2]
        Risk        = $line[3]
    }
    $export = New-Object PSObject -Property $details
    $export | Export-Csv -Append -Delimiter "|" -Force -NoTypeInformation -Path "C:\MyFolder\Delimiter Change.csv"
}
This little loop took nearly 2 minutes to process a 20 MB file. Scaling up at this speed would mean over an hour for the smallest CSV file I'm currently working with.
I've tried this as well:
While (!$reader.EndOfData)
{
    $line = $reader.ReadFields()
    $details = [ordered]@{
        # Same data as before
    }
    $export.Add($details) | Out-Null
}
$export | Export-Csv -Append -Delimiter "|" -Force -NoTypeInformation -Path "C:\MyFolder\Delimiter Change.csv"
This is MUCH FASTER but doesn't provide the right information in the new CSV. Instead I get rows and rows of this:
"Count"|"IsReadOnly"|"Keys"|"Values"|"IsFixedSize"|"SyncRoot"|"IsSynchronized"
"13"|"False"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"False"|"System.Object"|"False"
"13"|"False"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"System.Collections.Specialized.OrderedDictionary+OrderedDictionaryKeyValueCollection"|"False"|"System.Object"|"False"
So, two questions:
1) Can the first block of code be made faster?
2) How can I unwrap the arraylist in the second example to get to the actual data?
EDIT: Sample data found here - http://pastebin.com/6L98jGNg
This is simple text-processing, so the bottleneck should be disk read speed:
1 second per 100 MB, or 10 seconds per 1 GB, for the OP's sample (repeated to the mentioned size), as measured here on an i7. The results would be worse for files with many/all small quoted fields.
The algorithm is simple:
Read the file in big string chunks, e.g. 1 MB.
It's much faster than reading millions of lines separated by CR/LF because:
fewer checks are performed, as we mostly look only for doublequotes;
fewer iterations of our code are executed by the interpreter, which is slow.
Find the next doublequote.
Depending on the current $inQuotedField flag, decide whether the found doublequote starts a quoted field (it should be preceded by , plus optionally some spaces) or ends the current quoted field (it should be followed by any even number of doublequotes, optionally spaces, then ,).
Replace delimiters in the preceding span, or to the end of the 1 MB chunk if no quotes were found.
The code makes some reasonable assumptions but it may fail to detect an escaped field if its doublequote is followed or preceded by more than 3 spaces before/after field delimiter. The checks won't be too hard to add, and I might've missed some other edge case, but I'm not that interested.
$sourcePath = 'c:\path\file.csv'
$targetPath = 'd:\path\file2.csv'
$targetEncoding = [Text.UTF8Encoding]::new($false) # no BOM

$delim = [char]','
$newDelim = [char]'|'

$buf = [char[]]::new(1MB)
$sourceBase = [IO.FileStream]::new(
    $sourcePath,
    [IO.FileMode]::open,
    [IO.FileAccess]::read,
    [IO.FileShare]::read,
    $buf.length,  # let OS prefetch the next chunk in background
    [IO.FileOptions]::SequentialScan)
$source = [IO.StreamReader]::new($sourceBase, $true) # autodetect encoding
$target = [IO.StreamWriter]::new($targetPath, $false, $targetEncoding, $buf.length)

$bufStart = 0
$bufPadding = 4
$inQuotedField = $false
$fieldBreak = [char[]]@($delim, "`r", "`n")
$out = [Text.StringBuilder]::new($buf.length)

while ($nRead = $source.Read($buf, $bufStart, $buf.length-$bufStart)) {
    $s = [string]::new($buf, 0, $nRead+$bufStart)
    $len = $s.length
    $pos = 0
    $out.Clear() >$null
    do {
        $iQuote = $s.IndexOf([char]'"', $pos)
        if ($inQuotedField) {
            $iDelim = if ($iQuote -ge 0) { $s.IndexOf($delim, $iQuote+1) }
            if ($iDelim -eq -1 -or $iQuote -le 0 -or $iQuote -ge $len - $bufPadding) {
                # no closing quote in buffer safezone
                $out.Append($s.Substring($pos, $len-$bufPadding-$pos)) >$null
                break
            }
            if ($s.Substring($iQuote, $iDelim-$iQuote+1) -match "^(""+)\s*$delim`$") {
                # even number of quotes are just quoted quotes
                $inQuotedField = $matches[1].length % 2 -eq 0
            }
            $out.Append($s.Substring($pos, $iDelim-$pos+1)) >$null
            $pos = $iDelim + 1
            continue
        }
        if ($iQuote -ge 0) {
            $iDelim = $s.LastIndexOfAny($fieldBreak, $iQuote)
            if (!$s.Substring($iDelim+1, $iQuote-$iDelim-1).Trim()) {
                $inQuotedField = $true
            }
            $replaced = $s.Substring($pos, $iQuote-$pos+1).Replace($delim, $newDelim)
        } elseif ($pos -gt 0) {
            $replaced = $s.Substring($pos).Replace($delim, $newDelim)
        } else {
            $replaced = $s.Replace($delim, $newDelim)
        }
        $out.Append($replaced) >$null
        $pos = $iQuote + 1
    } while ($iQuote -ge 0)

    $target.Write($out)
    $bufStart = 0
    for ($i = $out.length; $i -lt $s.length; $i++) {
        $buf[$bufStart++] = $buf[$i]
    }
}
if ($bufStart) { $target.Write($buf, 0, $bufStart) }
$source.Close()
$target.Close()
Still not what I would call fast, but this is considerably faster than what you have listed by using the -Join operator:
$reader = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $source
$reader.SetDelimiters(",")
While (!$reader.EndOfData) {
    $line = $reader.ReadFields()
    $line -join '|' | Add-Content C:\Temp\TestOutput.csv
}
That took a hair under 32 seconds to process a 20 MB file. At that rate your 750 MB file would be done in under 20 minutes, and bigger files should go at about 26 minutes per gig.

Is there a "split" equivalent in Powershell?

I am looking for a PowerShell equivalent to "split" *NIX command, such as seen here : http://www.computerhope.com/unix/usplit.htm
split outputs fixed-size pieces of input INPUT to files named
PREFIXaa, PREFIXab, ...
This is NOT referring to .split() like for strings. This is about taking a LARGE array from the pipeline and storing it into X number of files, each with the same number of lines.
In my use case, the content getting piped is a list of over 1 million files...
Get-ChildItem $rootPath -Recurse | select -ExpandProperty FullName | foreach{ $_.Trim()} | {...means of splitting file here...}
I don't think there is a cmdlet that does exactly what you want, but you can quickly build a function that does.
It's kind of a duplicate of How can I split a text file using PowerShell? and you will find more script solutions if you google "powershell split a text file into smaller files".
Here is a piece of code to begin with; my advice is to use the .NET class System.IO.StreamReader to handle big files more efficiently.
$sourcefilename = "D:\temp\theFiletosplit.txt"
$desFolderPathSplitFile = "D:\temp\TFTS"
$maxsize = 2 # The number of lines per file
$filenumber = 0
$linecount = 0
$reader = new-object System.IO.StreamReader($sourcefilename)
while (($line = $reader.ReadLine()) -ne $null)
{
    Add-Content "$desFolderPathSplitFile$filenumber.txt" $line
    $linecount++
    If ($linecount -eq $maxsize)
    {
        $filenumber++
        $linecount = 0
    }
}
$reader.Close()
$reader.Dispose()
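One further note: Add-Content opens and closes the target file for every single line, which becomes the dominant cost on big inputs. If that matters, a variation of the same loop (a sketch reusing the variable names and naming scheme above) keeps a System.IO.StreamWriter open and only swaps it when the line limit is reached:
$sourcefilename = "D:\temp\theFiletosplit.txt"
$desFolderPathSplitFile = "D:\temp\TFTS"
$maxsize = 2 # The number of lines per file
$filenumber = 0
$linecount = 0
$reader = New-Object System.IO.StreamReader($sourcefilename)
$writer = New-Object System.IO.StreamWriter("$desFolderPathSplitFile$filenumber.txt")
while (($line = $reader.ReadLine()) -ne $null)
{
    $writer.WriteLine($line)
    $linecount++
    if ($linecount -eq $maxsize)
    {
        # start a new output file once the line limit is reached
        $writer.Close()
        $filenumber++
        $linecount = 0
        $writer = New-Object System.IO.StreamWriter("$desFolderPathSplitFile$filenumber.txt")
    }
}
$writer.Close()
$reader.Close()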

Conditional Multiplication in Loop

I've got a script that goes through a CSV file with two formats of data (XY:ZABC or 0.xyz). The values are then saved in a CSV file with one column and a variable number of rows. I am trying to set up my script so that, for numbers of the form 0.xyz, it multiplies by 1440 and then stores the result in $Values. The numbers of format XY:ZABC will be stored in $Values as they are.
$Values = @(Get-Content *\source.csv -Raw) -split '\s+' |
    Where-Object {$_ -like '*:*' -or '0.*'}
"UniqueActiveFaults" | Out-File *\IdealOutput.csv
$Values | Sort-Object -Unique | Out-File *\IdealOutput.csv
I've tried to do this by adding the following code:
foreach ($i in $Values) {
    if ($i -lt 1) {$i*1440}
}
I've also tried to do it with a do {$i*1440} while ($i -lt 1) loop, but the result is the number 0.xyz shown 1440 times. I believe it's due to the data type that $Values holds, but I'm not sure.
Sample data:
0.12345
00:9090 90:4582
0.12346
0.1145
0.145654
0.5648
01:9045 90:4500
90:4546
BA: 1117 BA:2525
In your code, $Values is an array of strings. The "multiply" operation on a string is to repeat it. To treat it like a number, cast to float before multiplying.
foreach ($i in $Values) {
    if ($i -lt 1) {[float]$i * 1440}
}
As Tony Hinkle pointed out, this loop will simply output the result of the operation to the caller (or the console if you don't pipe it). If you want your array to reflect the change, you have to store it back.
for ($i = 0; $i -lt $Values.length; $i++) {
    if ($Values[$i] -lt 1) { $Values[$i] = [float]$Values[$i] * 1440 }
}
Be aware this will leave some of your values array as strings and some as floats. Depending on what you do with it, you might have to do further casts.
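If you want every entry to end up as text again in one pass (colon-style values untouched, fractions scaled), one option is to rebuild the array; this is only a sketch and assumes that any value containing ':' should pass through unchanged:
$Values = foreach ($v in $Values) {
    if ($v -notlike '*:*' -and [double]$v -lt 1) {
        # scale the fractional value by 1440 and keep it as a string for Out-File
        ([double]$v * 1440).ToString()
    } else {
        $v
    }
}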
When you use $i*1440, that is simply telling PowerShell to multiply the two values and return the product. If you want to change the value of $i, you need to use $i = $i * 1440.
You may have other issues as well, but this is assuming that you are getting the correct values assigned to $i from the input.

Powershell search through two lines

I have the following input lines in my notepad file.
example 1 :
//UNION TEXT=firststring,FRIEND='ABC,Secondstring,ABAER'
example 2 :
//UNION TEXT=firststring,
// FRIEND='ABC,SecondString,ABAER'
Basically, one logical line can span two or three physical lines. If the last character is a comma, it is treated as a continuation character.
In example 1 - Text is in one line.
In example 2 - same Text is in two lines.
In example 1, I can probably write the code below. However, I do not know how to do this if the input text spans two or three lines based on the continuation character ','.
$result = Get-Content $file.fullName | ? { ($_ -match 'firststring') -and ($_ -match 'secondstring') }
I think I need a way to search text across multiple lines with an '-and' condition. Something like that...
Thanks!
You could read the entire content of the file, join the continued lines, and then split the text line-wise:
$text = [System.IO.File]::ReadAllText("C:\path\to\your.txt")
$text -replace ",`r`n", "," -split "`r`n" | ...
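For example, the filter from the question could then be applied to the joined lines (a sketch reusing the placeholder strings firststring/secondstring):
$text = [System.IO.File]::ReadAllText("C:\path\to\your.txt")
$result = $text -replace ",`r`n", "," -split "`r`n" |
    Where-Object { $_ -match 'firststring' -and $_ -match 'secondstring' }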
# get the full content as one String
$content = Get-Content -Path $file.fullName -Raw
# join continued lines, split content and filter
$content -replace '(?<=,)\s*' -split '\r\n' -match 'firststring.+secondstring'
If the file is large and you want to avoid loading the entire file into memory, you might want to use the good old .NET ReadLine:
$reader = [System.IO.File]::OpenText("test.txt")
try {
    $sb = New-Object -TypeName "System.Text.StringBuilder"
    for (;;) {
        $line = $reader.ReadLine()
        if ($line -eq $null) { break }
        if ($line.EndsWith(','))
        {
            [void]$sb.Append($line)
        }
        else
        {
            [void]$sb.Append($line)
            # You have the full line at this point.
            # Call string match or whatever you find appropriate.
            $fullLine = $sb.ToString()
            Write-Host $fullLine
            [void]$sb.Clear()
        }
    }
}
finally {
    $reader.Close()
}
If the file is not large (say, < 1 GB), Ansgar Wiechers' answer should do the trick.