Read CSV sheet in batches using Powershell


I recently wrote a script that reads data from a CSV file and performs some checks. Because the CSV data is huge, I want to run it in batches: process the first 50 lines and write the output to one folder, then process the next 50 lines and write the output to another folder, and so on.
Below is the line I use to import the CSV file:
$P = Import-Csv -Path .\Processes.csv
and to export it I use:
Export-Csv -Path "Data"

To complement the helpful answer from Ranadip Dutta and to answer the question in the comment: "how should make sure my next count is next 50 and so on as records may be 200?"
You might use this Create-Batch function (see also: Slice a PowerShell array into groups of smaller arrays):
Install-Script -Name Create-Batch
Example:
$BatchNr = 1
Import-Csv -Path .\Processes.csv | Create-Batch -Size 50 | ForEach-Object {
    $_ | ForEach-Object {
        $_ # do something with each item in the batch of 50
    } | Export-Csv ".\Batch$BatchNr.csv"
    $BatchNr++
}
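If installing a script from the PowerShell Gallery is not an option, the same batching can be sketched with built-in array slicing. The helper name Split-IntoBatches and the .\Output\BatchN folder layout below are illustrative assumptions, not part of Create-Batch; each batch is written to its own folder, as the question asks.

```powershell
function Split-IntoBatches {
    # Hypothetical helper (not a built-in cmdlet): emits consecutive
    # slices of $InputArray, each holding at most $Size items.
    param(
        [object[]] $InputArray,
        [int] $Size = 50
    )
    for ($start = 0; $start -lt $InputArray.Count; $start += $Size) {
        # Clamp the end index so the last batch may be shorter than $Size
        $end = [Math]::Min($start + $Size, $InputArray.Count) - 1
        # The leading comma emits the slice as ONE object instead of
        # unrolling it into individual rows on the pipeline
        ,($InputArray[$start..$end])
    }
}

# Usage (runs only if the input file exists): one output folder per batch
if (Test-Path -LiteralPath .\Processes.csv) {
    $rows = @(Import-Csv -Path .\Processes.csv)
    $batchNr = 1
    Split-IntoBatches -InputArray $rows -Size 50 | ForEach-Object {
        $folder = New-Item -ItemType Directory -Path ".\Output\Batch$batchNr" -Force
        $_ | ForEach-Object {
            $_   # do something with each row of this batch here
        } | Export-Csv -Path (Join-Path $folder.FullName 'Result.csv') -NoTypeInformation
        $batchNr++
    }
}
```

Piping the batches into ForEach-Object (rather than collecting them first) keeps each batch intact even when the file holds fewer than 50 rows.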

-TotalCount is the best way to deal with these scenarios. Instead of importing the CSV, my recommendation would be to use Get-Content and pick only the lines you need:
Get-Content .\Processes.csv -TotalCount 50 | Out-File .\Processes_first50.csv
Another recommendation would be to use the pipeline and then Select-Object -First:
Get-Content .\Processes.csv | select -First 50 | Out-File .\Processes_first50.csv
The last option is to use the -Head parameter, which is an alias of -TotalCount:
Get-Content .\Processes.csv -Head 50 > .\Processes_first50.csv
The > operator redirects the output to a file; it works like Out-File but is more concise.
Hope it helps.
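-TotalCount only ever returns the start of the file. To get the next 50 records while keeping each chunk a valid CSV, one option is to repeat the header line and skip the lines already processed. This is a minimal sketch: the function name Get-CsvLineBatch and its zero-based -BatchNr parameter are hypothetical, and it assumes one record per physical line (no quoted values that span line breaks).

```powershell
function Get-CsvLineBatch {
    # Hypothetical helper: returns the header line followed by the data
    # lines of batch number $BatchNr (0 = first batch), $Size lines each.
    # Assumes one CSV record per physical line.
    param(
        [string] $Path,
        [int] $BatchNr = 0,
        [int] $Size = 50
    )
    $header = Get-Content -LiteralPath $Path -TotalCount 1
    # Skip the header plus all lines that belong to earlier batches;
    # @() guarantees an empty array (not $null) past the end of the file
    $body = @(Get-Content -LiteralPath $Path |
        Select-Object -Skip (1 + $BatchNr * $Size) -First $Size)
    @($header) + $body
}

# Usage: write the header plus records 51-100 to a separate file
# Get-CsvLineBatch -Path .\Processes.csv -BatchNr 1 | Set-Content .\Processes_batch2.csv
```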

Related

Why Import-Csv's Sort-Object is slow for 1 million records

I need to sort CSV files by their first column (the column may differ).
Because my CSV files have more than a million records, the command below takes 10 minutes to execute.
Is there any way to optimize the code to speed up execution?
$CsvFile = "D:\Performance\10_lakh_records.csv"
$OutputFile ="D:\Performance\output.csv"
Import-Csv $CsvFile | Sort-Object { $_.psobject.Properties.Value[1] } | Export-Csv -Encoding default -Path $OutputFile -NoTypeInformation

You could try the [array]::Sort() static method, which may prove faster than Sort-Object, although it takes an extra step to first get a one-dimensional array of all the values to sort on.
Try
$CsvFile = "D:\Performance\10_lakh_records.csv"
$OutputFile = "D:\Performance\output.csv"
# import the data
$data = Import-Csv -Path $CsvFile
# determine the column name to sort on. In this demo the first column
# of course, if you know the column name you don't need that and can simply use the name as-is
$column = $data[0].PSObject.Properties.Name[0]
# use the Sort(Array, Array) overload method to sort the data by the
# values of the column you have chosen.
# see https://learn.microsoft.com/en-us/dotnet/api/system.array.sort?view=net-5.0#System_Array_Sort_System_Array_System_Array_
[array]::Sort($data.$column, $data)
$data | Export-Csv -Encoding default -Path $OutputFile -NoTypeInformation
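One caveat with sorting on $data.$column: Import-Csv returns every value as a string, so the keys are compared as text and "10" sorts before "9". If the sort column holds numbers, a sketch like the following builds a [double] key array first. Invoke-ColumnSort is a hypothetical name; it still relies on the same Sort(Array, Array) overload used above.

```powershell
function Invoke-ColumnSort {
    # Hypothetical helper: sorts Import-Csv rows by one column using
    # [Array]::Sort(keys, items), which reorders $Rows in place.
    param(
        [object[]] $Rows,
        [string] $Column,
        # Cast the keys to [double] so that "9" sorts before "10"
        [switch] $Numeric
    )
    if ($Numeric) {
        [double[]] $keys = foreach ($row in $Rows) { [double] $row.$Column }
    } else {
        [string[]] $keys = foreach ($row in $Rows) { [string] $row.$Column }
    }
    [Array]::Sort($keys, $Rows)
    # Emit the sorted array as a single object
    ,$Rows
}

# Usage:
# $data = @(Import-Csv -Path $CsvFile)
# $sorted = Invoke-ColumnSort -Rows $data -Column $column -Numeric
# $sorted | Export-Csv -Path $OutputFile -NoTypeInformation
```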

Read CSV row 1 columns and save them to variables

I would like to read data from a CSV or other text file. Only row 1 should be read: a few of its columns should be saved to variables, and after saving them the row should be deleted. So far I have done it like this:
Get-ChildItem -Path C:\path | ForEach-Object -Process {
    $YourContent = Get-Content -Path $_.FullName
    $YourVariable = $YourContent | Select-Object -First 1
    $YourContent | Select-Object -Skip 1 | Set-Content -Path $_.FullName
}
My problem is that my variable prints out like this:
Elvis;867.5390;elvis@geocities.com
So I would like to save each value into its own column. Example of what the CSV could look like:
Elvis | 867.5309 | Elvis@Geocities.com
Sammy | 555.1234 | SamSosa@Hotmail.com

Use Import-Csv instead of Get-Content:
Import-Csv file.csv -Delimiter ";" -Header A, B, C
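Import-Csv reads the whole file. If only the first row is needed, a variation is to read just that line with Get-Content -TotalCount 1, parse it with ConvertFrom-Csv, and then assign its properties to separate variables. The function name Read-FirstRow and the A, B, C header names are illustrative assumptions.

```powershell
function Read-FirstRow {
    # Hypothetical helper: parses only the first line of a header-less,
    # delimited text file into an object whose properties are named by -Header.
    param(
        [string]   $Path,
        [string[]] $Header = @('A', 'B', 'C'),
        [char]     $Delimiter = ';'
    )
    Get-Content -LiteralPath $Path -TotalCount 1 |
        ConvertFrom-Csv -Delimiter $Delimiter -Header $Header
}

# Usage: split the first row into three variables
# $row = Read-FirstRow -Path .\file.csv
# $name, $number, $email = $row.A, $row.B, $row.C
```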

Here's one way to do what I think you want.
The first block (down to Set-Content) makes a file to work with. [grin]
The first Get-Content line reads in that file.
The $TempObject pipeline converts the 1st line into an object and removes the unwanted property.
The $InStuff[1..$InStuff.GetUpperBound(0)] pipeline grabs all BUT the 1st line and sends it to overwrite the source file.
The remaining lines show what was done. [grin]
Code:
$FileName = "$env:TEMP\Pimeydentimo.txt"
# create a file to work with
@'
Alfa;123.456;Some unwanted info;Alfa@example.com
Bravo;234.567;More info that can be dropped;Bravo@example.com
Charlie;345.678;This is also ignoreable;Charlie@example.com
'@ | Set-Content -LiteralPath $FileName
$InStuff = Get-Content -LiteralPath $FileName
$TempObject = $InStuff[0] |
    ConvertFrom-Csv -Delimiter ';' -Header 'Name', 'Number', 'DropThisOne', 'Email' |
    Select-Object -Property * -ExcludeProperty DropThisOne
$InStuff[1..$InStuff.GetUpperBound(0)] |
    Set-Content -LiteralPath $FileName
$InStuff
'=' * 30
$TempObject
'=' * 30
Get-Content -LiteralPath $FileName
output ...
Alfa;123.456;Some unwanted info;Alfa@example.com
Bravo;234.567;More info that can be dropped;Bravo@example.com
Charlie;345.678;This is also ignoreable;Charlie@example.com
==============================
Name Number Email
---- ------ -----
Alfa 123.456 Alfa@example.com
==============================
Bravo;234.567;More info that can be dropped;Bravo@example.com
Charlie;345.678;This is also ignoreable;Charlie@example.com

Thanks for the answers!
Let me clarify a bit more what I was trying to do. The answers might already do it, but I'm not yet that good at PowerShell and am still learning a lot.
If I have a CSV or any other text file, I want to read the first row of the file. The row contains more than one piece of information, and I want to save each piece of information to a variable. After saving the information to variables, I would like to delete the row.
Example:
Car Model Year
Ford Fiesta 2015
Audi A6 2018
In this example, I would like to save Ford, Fiesta and 2015 (row 1) to variables ($Car, $Model, $Year) and then delete that row. The 2nd row should not be deleted, because it is used later on.
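For this clarified requirement (keep the header line, take the first data row into $Car, $Model and $Year, then remove only that row), here is a minimal sketch. It assumes the first line is a header and that fields are separated by a single delimiter character (a single space in the example above); Pop-FirstDataRow is a hypothetical name.

```powershell
function Pop-FirstDataRow {
    # Hypothetical helper: returns the first data row (the line right after
    # the header) as an object, then rewrites the file without that row.
    # The header line and all later rows are kept.
    param(
        [string] $Path,
        [char]   $Delimiter = ','
    )
    $lines = @(Get-Content -LiteralPath $Path)
    if ($lines.Count -lt 2) { return }   # no data row to take

    # Parse the header + first data row into one object
    $record = $lines[0], $lines[1] | ConvertFrom-Csv -Delimiter $Delimiter

    # @() around the pipeline yields an empty array when nothing is left
    $remaining = @($lines[0]) + @($lines | Select-Object -Skip 2)
    Set-Content -LiteralPath $Path -Value $remaining
    $record
}

# Usage with the space-separated example from the question:
# $record = Pop-FirstDataRow -Path .\cars.txt -Delimiter ' '
# $Car, $Model, $Year = $record.Car, $record.Model, $record.Year
```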

Adding columns and manipulating existing column values in csv file using powershell

I have a lot of csv files with values arranged like so:
X1,Y1
X2,Y2
...,...
Xn,Yn
I find it very tedious to process these in Excel, so I want to set up a batch script that processes these files so that they look like this
(where N is a specified value, such as 65536):
X1,N-Y1,1
X2,N-Y2,2
...,...,...
Xn,N-Yn,n
I have only recently started using PowerShell for image processing (really simple scripts) and for appending to file names, so I am not certain how to go about this. Many of the scripts I found while looking for an answer use CSV files with a title for each column, whereas my files are just arrays of values without titles in the first row. I would like to avoid running multiple scripts to add titles.
My bonus question is something I have yet to find a good answer to at all, and it is the most tedious part of the processing. Using Excel's sort function, I usually change the order of the Yn values in Col2 so that they are sorted in the exported CSV like this:
X1,N-Yn,n
...,...,...
Xn-1,N-Y2,2
Xn,N-Y1,1
I use the Col3 values as the sort order (largest to smallest), then delete that column so that the final saved CSV contains only the first two columns (crucial step). Any help at all would be greatly appreciated; I apologize for the long-windedness of this question.

I have encountered looking to answer this question use csv files with titles per column whereas my files are just arrays of values without object titles in the first row.
The -Header parameter of Import-Csv is for adding column headers when the file does not contain them. It takes an array of strings, one for each column.
I would like to avoid running multiple scripts to add titles.
If you couldn't use -Header, you could read the lines with Get-Content into memory, add a header in memory, and then use ConvertFrom-CSV all in one script.
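The in-memory approach just described can be sketched as follows: prepend a header line to the lines returned by Get-Content and pass everything to ConvertFrom-Csv. The function name ConvertFrom-HeaderlessCsv is hypothetical, and the sketch assumes comma-separated values.

```powershell
function ConvertFrom-HeaderlessCsv {
    # Hypothetical helper: adds a header line in memory, then parses the
    # result with ConvertFrom-Csv (an alternative to Import-Csv -Header).
    param(
        [string[]] $Lines,
        [string[]] $Header
    )
    # One header line such as "ColX,ColY" followed by the original lines
    @($Header -join ',') + $Lines | ConvertFrom-Csv
}

# Usage:
# $rows = ConvertFrom-HeaderlessCsv -Lines (Get-Content -LiteralPath 'C:\test\test.csv') -Header 'ColX', 'ColY'
```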
That said, if I'm reading it rightly, you want:
No headers in the input file, and I imagine no headers in the output file
The whole point of adding the third column and sorting and removing it is just to reverse the lines?
The only column you keep is column 1?
I wouldn't use Import-Csv for this, it won't make it much nicer.
$n = 65536
# Read lines into a list, and reverse it
$lines = [Collections.Generic.List[String]](Get-Content -LiteralPath 'c:\test\test.csv')
$lines.Reverse()
# Split each line into two, create a new line with X and N-Y
# write new lines to an output file
$lines | ForEach-Object {
    $x, $y = $_.split(',')
    "$x,$($n - [int]$y)"
} | Set-Content -LiteralPath 'c:\test\output.csv' -Encoding Ascii
If you do want to use CSV handling, then:
$n = 65536
$counter = 0
Import-Csv -LiteralPath 'C:\test\test.csv' -Header 'ColX', 'ColY' |
    Select-Object -Property 'ColX',
        @{ Name = 'ColN-Y'; Expression = { $n - [int]$_.ColY } },
        @{ Name = 'N'; Expression = { (++$script:counter) } } |
    Sort-Object -Property 'N' -Descending |
    Select-Object -Property 'ColX', 'ColN-Y' |
    Export-Csv -LiteralPath 'c:\test\output.csv' -NoTypeInformation
But the output will have CSV headers and double-quoted values.
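For comparison, the Excel steps described in the question (add N-Y and a running index, sort by the index descending, drop the index) map directly onto a small function that keeps the output header-less and unquoted. This is a sketch under the same assumptions as the answer above (header-less x,y lines with integer Y values); the function name Convert-XYLines and the NMinusY/Index property names are hypothetical.

```powershell
function Convert-XYLines {
    # Hypothetical helper mirroring the Excel steps from the question:
    # 1) compute N-Y and a 1-based row index, 2) sort by the index
    # descending, 3) drop the index. Input/output are header-less "x,y" lines.
    param(
        [string[]] $Lines,
        [int] $N = 65536
    )
    $i = 0
    $rows = foreach ($line in $Lines) {
        $x, $y = $line.Split(',')
        $i++
        [pscustomobject]@{ X = $x; NMinusY = $N - [int]$y; Index = $i }
    }
    $rows |
        Sort-Object -Property Index -Descending |
        ForEach-Object { '{0},{1}' -f $_.X, $_.NMinusY }
}

# Usage:
# Convert-XYLines -Lines (Get-Content -LiteralPath 'c:\test\test.csv') |
#     Set-Content -LiteralPath 'c:\test\output.csv' -Encoding Ascii
```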

I would try something like this, extending the original table with a calculated script property as a new column:
#Your N number
$N = 65536
# Import CSV file without header columns
$table = Import-Csv -Header @("colX","colY") `
-Delimiter ',' `
-Path './numbers.csv'
Write-Host "Original table"
$table | Format-Table
# Manipulate table
$newtable = $table |
    Add-Member -MemberType ScriptProperty -Name colNY -Value { $N - $this.colY } -PassThru
Write-Host "New table"
$newtable | Format-Table

powershell: Write specific rows from files to formatted csv

The following code gives me the correct output in the console, but I need it in a CSV file:
$array = @{}
$files = Get-ChildItem "C:\Temp\Logs\*"
foreach ($file in $files) {
    foreach ($row in (Get-Content $file.FullName | select -Last 2)) {
        if ($row -like "Total peak job memory used:*") {
            $sp_memory = $row.Split(" ")[5]
            $array.Add(($file.BaseName), ([double]$sp_memory))
            break
        }
    }
}
$array.GetEnumerator() | sort Value -Descending | Format-Table -AutoSize
current output (console):
required output (csv):
In order to increase performance I would like to avoid the array and write output directly to csv (no append).
Thanks in advance!

Change your last line to this -
$array.GetEnumerator() | sort Value -Descending | select @{l='FileName'; e={$_.Name}}, @{l='Memory (MB)'; e={$_.Value }} | Export-Csv -path $env:USERPROFILE\Desktop\Output.csv -NoTypeInformation
This will give you a csv file named Output.csv on your desktop.
I am using calculated properties to change the column headers to FileName and Memory (MB), and piping the output of $array to the Export-Csv cmdlet.
Just so you know, your variable $array is of type Hashtable, which can't store duplicate keys ($array.Add() throws an error if the key already exists). If you need to store duplicate key/value pairs, you can use an array instead. Just a suggestion! :)
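To skip the hashtable entirely, as the question asks, the loop can emit one [pscustomobject] per file and pipe the results straight to Sort-Object and Export-Csv; duplicate file names then stop being a problem. This is a sketch that assumes the log format implied by the original Split(" ")[5]; Get-PeakMemory is a hypothetical name, and Get-Content -Tail 2 replaces select -Last 2 so that only the end of each file is read.

```powershell
function Get-PeakMemory {
    # Hypothetical helper: looks at the last 2 lines of each file for
    # "Total peak job memory used:" and emits one object per matching file.
    # Field index 5 follows the original script's Split(" ")[5].
    param([string] $Path)
    foreach ($file in (Get-ChildItem -Path $Path -File)) {
        foreach ($row in (Get-Content -LiteralPath $file.FullName -Tail 2)) {
            if ($row -like 'Total peak job memory used:*') {
                [pscustomobject]@{
                    'FileName'    = $file.BaseName
                    'Memory (MB)' = [double] $row.Split(' ')[5]
                }
                break   # stop scanning this file after the first match
            }
        }
    }
}

# Usage: no intermediate collection, written straight to CSV
# Get-PeakMemory -Path 'C:\Temp\Logs\*' |
#     Sort-Object -Property 'Memory (MB)' -Descending |
#     Export-Csv -Path "$env:USERPROFILE\Desktop\Output.csv" -NoTypeInformation
```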

Add Column to CSV Windows PowerShell

I have a fairly standard CSV file with headers. I want to add a new column and set all the rows to the same value.
Original:
column1, column2
1,b
2,c
3,5
After
column1, column2, column3
1,b, setvalue
2,c, setvalue
3,5, setvalue
I can't find anything on this; if anybody could point me in the right direction, that would be great. Sorry, I'm very new to PowerShell.

Here's one way to do that using Calculated Properties:
(Import-Csv file.csv) |
    Select-Object *,@{Name='column3';Expression={'setvalue'}} |
    Export-Csv file.csv -NoTypeInformation
You can find more on calculated properties here: http://technet.microsoft.com/en-us/library/ff730948.aspx.
In a nutshell, you import the file, pipe the content to the Select-Object cmdlet, select all existing properties (i.e. '*'), and then add a new one. The parentheses around Import-Csv make it read the whole file before Export-Csv starts writing to that same file.
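If the same change is needed in several places, the calculated-property technique can be wrapped in a small pipeline function. Add-ConstantColumn and its parameter names are hypothetical, for illustration only.

```powershell
function Add-ConstantColumn {
    # Hypothetical helper wrapping the calculated-property technique:
    # every input object gets one extra column holding the same value.
    param(
        [Parameter(ValueFromPipeline)] [psobject] $InputObject,
        [string] $Name,
        [object] $Value
    )
    process {
        # '*' keeps all existing columns; the hashtable adds the new one
        $InputObject | Select-Object -Property *, @{ Name = $Name; Expression = { $Value } }
    }
}

# Usage (parentheses read the whole file before it is overwritten):
# (Import-Csv file.csv) | Add-ConstantColumn -Name column3 -Value setvalue |
#     Export-Csv file.csv -NoTypeInformation
```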

Shay Levy's answer also works for me!
If you don't want to provide a value for each object yet, the code is even simpler:
(Import-Csv file.csv) |
    Select-Object *,"column3" |
    Export-Csv file.csv -NoTypeInformation

None of the scripts I've seen are dynamic in nature, so they're fairly limited in scope and in what you can do with them. That's probably because most PowerShell users, even power users, aren't programmers; you very rarely see arrays used in PowerShell. I took Shay Levy's answer and improved on it.
Note: the import needs to be consistent (two columns, for instance), but it would be fairly easy to modify this to count the columns dynamically and generate the headers that way too. This particular question didn't ask for that. Or simply don't generate a header unless it's needed.
Needless to say, the script below pulls in every CSV file in the folder, adds a header, and then strips it again. I add the header for consistency in the data; it also makes manipulating the columns later on fairly straightforward (if you choose to do so). You can modify this to your heart's content, and feel free to use it for other purposes too. This is generally the format I stick with for just about all of my PowerShell needs. The counter lets you manipulate individual files, so there are a lot of possibilities here.
$chargeFiles = 'C:\YOURFOLDER\BLAHBLAH\'
$existingReturns = Get-ChildItem -Path $chargeFiles -Filter *.csv
for ($i = 0; $i -lt $existingReturns.Count; $i++)
{
    $csv = Import-Csv -Path $existingReturns[$i].FullName -Header Header1, Header2
    # pipes go at the end of the line so this also runs in Windows PowerShell 5.1
    $csv |
        Select-Object *,
            @{Name='Header3'; Expression={'Header3 Static'}},
            @{Name='Header4'; Expression={'Header4 Static Text'}},
            @{Name='Header5'; Expression={'Header5 Static Text'}} |
        ConvertTo-Csv -Delimiter "," -NoTypeInformation |
        Select-Object -Skip 1 |
        ForEach-Object { $_ -replace '"', '' } |
        Out-File -FilePath $existingReturns[$i].FullName -Force -Encoding ASCII
}

You could also use Add-Member:
$csv = Import-Csv 'input.csv'
foreach ($row in $csv)
{
$row | Add-Member -NotePropertyName 'MyNewColumn' -NotePropertyValue 'MyNewValue'
}
$csv | Export-Csv 'output.csv' -NoTypeInformation

For some applications, I found that producing a hashtable and using its .values as the column works well (it allows cross-reference validation against another object that is being enumerated).
In this case, #powershell on freenode drew my attention to an ordered hashtable (since the column header must be used).
Here is an example without any validation of the .values:
$newcolumnobj = [ordered]@{}
# input data into a hash table so that we can more easily reference the `.values` as an object to be inserted in the CSV
$newcolumnobj.add("volume name", $currenttime)
# enumerate $deltas (this will be the object that contains the volume information, `$volumedeltas`)
# add just the new deltas to the newcolumn object
foreach ($item in $deltas) {
    $newcolumnobj.add($item.volume, $item.delta)
}
$originalcsv = @(import-csv $targetdeltacsv)
# thanks to pscookiemonster in #powershell on freenode
for ($i = 0; $i -lt $originalcsv.count; $i++) {
    $originalcsv[$i] | Select-Object *, @{l="$currenttime"; e={$newcolumnobj.item($i)}}
}
This example is related to: How can I perform arithmetic to find differences of values in two CSVs?

Define the CSV file's path. Here $PSScriptRoot is the folder that contains the script:
$csv = "$PSScriptRoot/dpg.csv"
Create a CSV file with nothing in it:
New-Item -ItemType File -Path $csv -Force | Out-Null
Now add columns to it:
$csv | Select-Object vds, protgroup, vlan, ports | Export-Csv $csv -NoTypeInformation
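A variation that writes only the header row, without creating an empty file first, is to build one object with empty properties, convert it with ConvertTo-Csv and keep just the first line. New-HeaderOnlyCsv is a hypothetical name; note that ConvertTo-Csv quotes the column names by default.

```powershell
function New-HeaderOnlyCsv {
    # Hypothetical helper: writes a CSV file that contains only a header row.
    param(
        [string]   $Path,
        [string[]] $Column
    )
    # One object with an empty property per column, in the given order
    $template = [ordered]@{}
    foreach ($name in $Column) { $template[$name] = $null }

    # Convert to CSV text and keep only the header line
    [pscustomobject]$template |
        ConvertTo-Csv -NoTypeInformation |
        Select-Object -First 1 |
        Set-Content -LiteralPath $Path
}

# Usage:
# New-HeaderOnlyCsv -Path "$PSScriptRoot/dpg.csv" -Column vds, protgroup, vlan, ports
```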

We Keep Coding
