Working with 2 CSV files in PowerShell

I have 2 CSV files: one with names and the other with a list of phone numbers. I'm trying to loop through each CSV file and run a script against each name and number.
File example:
Name Number
a 1
b 2
c 3
So I need to run a script like a + 1, b + 2, c + 3
I'm trying to use a foreach loop, but it's not working correctly; the loops are nested incorrectly and I can't figure it out.
if ($users) {
    foreach ($u in $users) {
        $username = $u.'Mail'
        foreach ($n in $numbers) {
            $number = $n.'ID'
        }
    }
}

You're close, in that you need to use a loop, but you want one loop to get data from both files, so in this case you're best off using a For loop like this:
$Combined = for ($i = 0; $i -lt $users.Count; $i++) {
    $Props = @{
        Mail = $users[$i].Mail
        ID   = $numbers[$i].ID
    }
    New-Object PSObject -Property $Props
}
$Combined | Export-Csv C:\Path\To\Combined.csv -NoTypeInformation
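For completeness, here is a minimal sketch of how $users and $numbers might be populated before that loop runs. The file names are assumptions; the column headers ('Mail' and 'ID') simply follow the properties used above:
# Hypothetical input files; adjust the paths and headers to match your data
$users   = Import-Csv C:\Path\To\names.csv      # expected to have a 'Mail' column
$numbers = Import-Csv C:\Path\To\numbers.csv    # expected to have an 'ID' column
# The for loop above pairs rows purely by index, so both files are assumed to
# be the same length and in the same order.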

Related

Break Excel File by column value

I have an Excel data source file which contains some customer records.
I would like to break this large file into four smaller batches, split by the customer name column rather than just evenly by row count.
The source file has a column called "Customer Name", which I would like to use as the indicator for splitting. I have written a PowerShell script but got stuck. The current method I use is:
1. Get the values in the customer name column.
2. Deduplicate the customer name array.
3. Filter the Excel data by customer name and break it down into batches.
Below is my script, but I am stuck on how to filter the items by the customer name array.
# Load the Microsoft Excel Com Object
$Excel = New-Object -ComObject Excel.Application
# Open the workbook
$Workbook = $Excel.Workbooks.Open("XXXXX.xlsx")
# Get the first worksheet
$Worksheet = $Workbook.Sheets.Item(1)
# Get the range of cells that contain the customer names
$Range = $Worksheet.Range("D1:D1000")
# Get the values of the cells and store them in a variable
$Values = $Range.Value2
# Sort the values and remove duplicates
$UniqueValues = ($Values | Sort-Object) | Select-Object -Unique
# Clear the original range of cells
$Range.Clear()
Write-Output $UniqueValues
$ArrayLength = $UniqueValues.Length
$numberofrecordperbatch=$UniqueValues.Length/4
Write-Output $numberofrecordperbatch
#--------Divided into 4 batches-------------
function DivideList {
    param(
        [object[]]$list,
        [int]$chunkSize
    )
    $j = 1
    $batch = @()
    for ($i = 0; $i -lt $list.Count; $i += $chunkSize) {
        $j += 1
        $batch = ($list | Select-Object -Skip $i -First $chunkSize)
        #$companynamearray | Export-Excel -Path "XXXX.xlsx"
        Write-Output $i
        Write-Output $j
        Write-Output $chunkSize
    }
}
DivideList -list $UniqueValues -chunkSize $numberofrecordperbatch | foreach { $_ -join ',' }
Write-Output "Start"
#---------Filter------------
# Get the first worksheet
# Get the range of cells that contain the customer names
$Range2 = $Worksheet.Range("A1:X1000")
# Get the values of the cells and store them in a variable
$Value2 = $Range2.Value2
Write-Output $Value2
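As a sketch of the missing filter step (my addition, not part of the original question): assuming the $Range.Clear() call above is removed so the customer names are still present in column D, and that row 1 holds headers, you could build a name-to-batch lookup from $UniqueValues and collect the row numbers that belong to each batch. Everything beyond the variables defined above is illustrative only:
# Illustration only: map each unique customer name to one of four batches
$chunkSize = [math]::Ceiling($UniqueValues.Count / 4)
$batchOf = @{}
for ($i = 0; $i -lt $UniqueValues.Count; $i++) {
    if ($null -ne $UniqueValues[$i]) {
        $batchOf[$UniqueValues[$i]] = [math]::Floor($i / $chunkSize) + 1
    }
}
# Walk the data rows once and collect each row number under its batch
$batches = @{}
foreach ($row in 2..1000) {
    $name = $Worksheet.Cells.Item($row, 4).Value2
    if ($name -and $batchOf.ContainsKey($name)) {
        if (-not $batches.ContainsKey($batchOf[$name])) { $batches[$batchOf[$name]] = @() }
        $batches[$batchOf[$name]] += $row
    }
}
# $batches now maps batch number -> row numbers; those rows could then be copied
# to four new workbooks, or exported with Export-Csv / Export-Excel.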

Reducing the amount of lines in a variable within a loop in PowerShell

I have a txt file containing 10000 lines. Each line is an ID.
Within every loop iteration I want to select 100 lines, put them in a special format and do something. I want to do this until the document is finished.
The txt looks like this:
406232C1331283
4062321N022075
4062321H316457
Current approach:
$liste = get-content "C:\x\input.txt"
foreach ($item in $liste) {
azcopy copy $source $target --include-pattern "*$item*" --recursive=true
}
The script goes through the TXT file and makes a copy request for every name it finds. However, the system can handle around 300 search patterns in one request, like:
azcopy copy $source $target --include-pattern "*id1*;*id2*;*id3*"
How can I extract 300 items from the document at once, separate them with semicolons and embed them in wildcards? I tried to pipe everything into a variable and work with -Skip, but it doesn't seem easy to handle.
Use the -ReadCount parameter to Get-Content to send multiple lines down the pipeline:
Get-Content "C:\x\input.txt" -ReadCount 300 | ForEach-Object {
$wildCards = ($_ | ForEach-Object { "*$_*" } -join ';'
azcopy copy $source $target --include-pattern $wildCards --recursive=true
}
Do you want 100 or 300 at a time? ;-)
I'm not sure I really got what the end goal is, but to slice a given number of elements into chunks of a certain size you can use a for loop like this:
$liste = Get-Content -Path 'C:\x\input.txt'
for ($i = 0; $i -lt $Liste.Count; $i += 100) {
$Liste[$i..$($i + 99)]
}
Now if I got it right, you want to join these 100 elements and surround them with certain characters ... this might work:
'"*' + ($Liste[$i..$($i + 99)] -join '*;*') + '*"'
Together it would be this:
$liste = Get-Content -Path 'C:\x\input.txt'
for ($i = 0; $i -lt $Liste.Count; $i += 100) {
'"*' + ($Liste[$i..$($i + 99)] -join '*;*') + '*"'
}
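To tie that back to the azcopy call from the question (my combination of the two snippets, not part of the original answer; $source and $target are assumed to be defined as in the question):
$liste = Get-Content -Path 'C:\x\input.txt'
for ($i = 0; $i -lt $liste.Count; $i += 100) {
    # Build "*id1*;*id2*;..." for the current slice of up to 100 IDs
    $pattern = ($liste[$i..($i + 99)] | ForEach-Object { "*$_*" }) -join ';'
    azcopy copy $source $target --include-pattern $pattern --recursive=true
}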
There are many ways; here's one of them...
First I would split the array into chunks of 100 elements each, using this helper function:
Function Split-Array ($list, $count) {
    $aggregateList = @()
    $blocks = [Math]::Floor($list.Count / $count)
    $leftOver = $list.Count % $count
    $start = 0
    for ($i = 0; $i -lt $blocks; $i++) {
        $end = $count * ($i + 1) - 1
        $aggregateList += @(,$list[$start..$end])
        $start = $end + 1
    }
    if ($leftOver -gt 0) {
        $aggregateList += @(,$list[$start..($end + $leftOver)])
    }
    $aggregateList
}
For example to split your list into chunks of 100 do this:
$Splitted = Split-Array $liste -count 100
Then use foreach to iterate each chunk and join its elements for the pattern you need:
foreach ($chunk in $Splitted)
{
    $Pattern = '"' + (($chunk | % {"*$_*"}) -join ";") + '"'
    azcopy copy $source $target --include-pattern $Pattern --recursive=true
}

How to sort 30 million CSV records in PowerShell

I am using an OleDbConnection to sort the first column of a CSV file. The OleDb query completes successfully for up to 9 million records within about 6 minutes, but when I execute it against 10 million records I get the following error message.
Exception calling "ExecuteReader" with "0" argument(s): "The query cannot be completed. Either the size of the query result is larger than the maximum size of a database (2 GB), or
there is not enough temporary storage space on the disk to store the query result."
Is there any other solution to sort 30 million records using PowerShell?
Here is my script:
$OutputFile = "D:\Performance_test_data\output1.csv"
$stream = [System.IO.StreamWriter]::new( $OutputFile )
$sb = [System.Text.StringBuilder]::new()
$sw = [Diagnostics.Stopwatch]::StartNew()
$conn = New-Object System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='D:\Performance_test_data\';Extended Properties='Text;HDR=Yes;CharacterSet=65001;FMT=Delimited';")
$cmd=$conn.CreateCommand()
$cmd.CommandText="Select * from 1crores.csv order by col6"
$conn.open()
$data = $cmd.ExecuteReader()
echo "Query has been completed!"
$stream.WriteLine( "col1,col2,col3,col4,col5,col6")
while ($data.read())
{
$stream.WriteLine( $data.GetValue(0) +',' + $data.GetValue(1)+',' + $data.GetValue(2)+',' + $data.GetValue(3)+',' + $data.GetValue(4)+',' + $data.GetValue(5))
}
echo "data written successfully!!!"
$stream.close()
$sw.Stop()
$sw.Elapsed
$cmd.Dispose()
$conn.Dispose()
You can try using this:
$CSVPath = 'C:\test\CSVTest.csv'
$Delimiter = ';'
# list we use to hold the results
$ResultList = [System.Collections.Generic.List[Object]]::new()
# Create a stream (I use OpenText because it returns a streamreader)
$File = [System.IO.File]::OpenText($CSVPath)
# Read and parse the header
$HeaderString = $File.ReadLine()
# Get the properties from the string, replace quotes
$Properties = $HeaderString.Split($Delimiter).Replace('"',$null)
$PropertyCount = $Properties.Count
# now read the rest of the data, parse it, build an object and add it to a list
while ($File.EndOfStream -ne $true)
{
    # Read the line
    $Line = $File.ReadLine()
    # split the fields and replace the quotes
    $LineData = $Line.Split($Delimiter).Replace('"',$null)
    # Create a hashtable with the properties (we convert this to a PSCustomObject later on). I use an ordered hashtable to keep the order
    $PropHash = [System.Collections.Specialized.OrderedDictionary]@{}
    # for loop to add the properties and values
    for ($i = 0; $i -lt $PropertyCount; $i++)
    {
        $PropHash.Add($Properties[$i],$LineData[$i])
    }
    # Now convert the data to a PSCustomObject and add it to the list
    $ResultList.Add($([PSCustomObject]$PropHash))
}
# Now you can sort this list using Linq:
Add-Type -AssemblyName System.Linq
# Sort using propertyname (my sample data had a prop called "Name")
$Sorted = [Linq.Enumerable]::OrderBy($ResultList, [Func[object,string]] { $args[0].Name })
Instead of using Import-Csv, I've written a quick parser which uses a StreamReader, parses the CSV data on the fly and puts each record into a PSCustomObject, which is then added to a list.
edit: fixed the linq sample
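A brief usage note (my addition): OrderBy returns a lazy enumerable, so the result still has to be enumerated to be written out; piping it to Export-Csv does that. The output path below is hypothetical and the 'Name' property is carried over from the sample above:
# Enumerate the lazy LINQ result and write it back out
$Sorted | Export-Csv 'C:\test\CSVTest_sorted.csv' -NoTypeInformation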
Putting performance aside and at least arriving at a solution that works (meaning one that doesn't hang due to memory shortage), I would rely on the PowerShell pipeline. The issue, though, is that to sort objects you need to stall the pipeline, as the last object might potentially become the first object.
To resolve this, I would first do a coarse division on the first character(s) of the property concerned. Once that is done, fine-sort each coarse division and append the results:
Function Sort-BigObject {
    [CmdletBinding()] param(
        [Parameter(ValueFromPipeLine = $True)]$InputObject,
        [Parameter(Position = 0)][String]$Property,
        [ValidateRange(1,9)]$Coarse = 1,
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::Default
    )
    Begin {
        $TemporaryFiles = [System.Collections.SortedList]::new()
    }
    Process {
        if ($InputObject.$Property) {
            $Grain = $InputObject.$Property.SubString(0, $Coarse)
            if (!$TemporaryFiles.Contains($Grain)) { $TemporaryFiles[$Grain] = New-TemporaryFile }
            $InputObject | Export-Csv $TemporaryFiles[$Grain] -Encoding $Encoding -Append
        } else { $InputObject }    # objects with an empty sort property are output immediately
    }
    End {
        Foreach ($TemporaryFile in $TemporaryFiles.Values) {
            Import-Csv $TemporaryFile -Encoding $Encoding | Sort-Object $Property
            Remove-Item -LiteralPath $TemporaryFile
        }
    }
}
Usage
(Don't assign the stream to a variable and don't use parenthesis.)
Import-Csv .\1crores.csv | Sort-BigObject <PropertyName> | Export-Csv .\output.csv
If the temporary files still get too big to handle, you might need to increase the -Coarse parameter.
Caveats (improvement considerations)
Objects with an empty sort property will be immediately outputted
The sort column is presumed to be a (single) string column
I presume the performance is poor (I didn't do a full test on 30 million records, but 10,000 records take about 8 seconds, which suggests about 8 hours). Consider replacing native PowerShell cmdlets with .NET streaming methods, buffering/caching file input and output, or parallel processing.
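To illustrate the ".NET streaming" idea from that last caveat (a rough sketch of an alternative coarse-split pass, not the author's implementation): instead of calling Export-Csv -Append per object, the raw lines could be written to one StreamWriter per grain. The paths and the zero-based column index are assumptions, and quoted fields containing the delimiter are not handled:
# Sketch only: coarse-split a large CSV into per-grain files using raw stream I/O
$inputFile  = 'D:\Performance_test_data\1crores.csv'
$sortColumn = 5                                        # zero-based index of col6 (assumption)
$reader  = [System.IO.StreamReader]::new($inputFile)
$header  = $reader.ReadLine()
$writers = @{}
while (-not $reader.EndOfStream) {
    $line  = $reader.ReadLine()
    # Naive split: does not handle quoted fields containing the delimiter
    $field = ($line -split ',')[$sortColumn]
    if (-not $field) { continue }
    $grain = $field.Substring(0, 1)
    if (-not $writers.ContainsKey($grain)) {
        $writers[$grain] = [System.IO.StreamWriter]::new("D:\Temp\grain_$grain.csv")   # hypothetical temp folder
        $writers[$grain].WriteLine($header)
    }
    $writers[$grain].WriteLine($line)
}
$reader.Dispose()
$writers.Values | ForEach-Object { $_.Dispose() }
# Each grain file can then be imported, sorted with Sort-Object, and appended to
# the final output in grain order, as the function above does.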
You could try SQLite:
$OutputFile = "D:\Performance_test_data\output1.csv"
$sw = [Diagnostics.Stopwatch]::StartNew()
sqlite3 output1.db '.mode csv' '.import 1crores.csv 1crores' '.headers on' ".output $OutputFile" 'Select * from 1crores order by 最終アクセス日時'
echo "data written successfully!!!"
$sw.Stop()
$sw.Elapsed
I have added a new answer as this is a completely different approach to tackling this issue.
Instead of creating temporary files (which presumably causes a lot of file opens and closures), you might consider creating an ordered list of indices and then going over the input file (-InputFile) multiple times, each time processing a selective number of lines (-BufferSize = 1Gb; you might have to tweak this "memory usage vs. performance" parameter):
Function Sort-Csv {
    [CmdletBinding()] param(
        [string]$InputFile,
        [String]$Property,
        [string]$OutputFile,
        [Char]$Delimiter = ',',
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::Default,
        [Int]$BufferSize = 1Gb
    )
    Begin {
        if ($InputFile.StartsWith('.\')) { $InputFile = Join-Path (Get-Location) $InputFile }
        $Index = 0
        $Dictionary = [System.Collections.Generic.SortedDictionary[string, [Collections.Generic.List[Int]]]]::new()
        Import-Csv $InputFile -Delimiter $Delimiter -Encoding $Encoding | Foreach-Object {
            if (!$Dictionary.ContainsKey($_.$Property)) { $Dictionary[$_.$Property] = [Collections.Generic.List[Int]]::new() }
            $Dictionary[$_.$Property].Add($Index++)
        }
        $Indices = [int[]]($Dictionary.Values | ForEach-Object { $_ })
        $Dictionary = $Null # we only need the sorted index list
    }
    Process {
        $Start = 0
        $ChunkSize = [int]($BufferSize / (Get-Item $InputFile).Length * $Indices.Count / 2.2)
        While ($Start -lt $Indices.Count) {
            [System.GC]::Collect()
            $End = $Start + $ChunkSize - 1
            if ($End -ge $Indices.Count) { $End = $Indices.Count - 1 }
            $Chunk = @{}
            For ($i = $Start; $i -le $End; $i++) { $Chunk[$Indices[$i]] = $i }
            $Reader = [System.IO.StreamReader]::new($InputFile, $Encoding)
            $Header = $Reader.ReadLine()
            $i = $Start
            $Count = 0
            For ($i = 0; ($Line = $Reader.ReadLine()) -and $Count -lt $ChunkSize; $i++) {
                if ($Chunk.Contains($i)) { $Chunk[$i] = $Line }
            }
            $Reader.Dispose()
            if ($OutputFile) {
                if ($OutputFile.StartsWith('.\')) { $OutputFile = Join-Path (Get-Location) $OutputFile }
                $Writer = [System.IO.StreamWriter]::new($OutputFile, ($Start -ne 0), $Encoding)
                if ($Start -eq 0) { $Writer.WriteLine($Header) }
                For ($i = $Start; $i -le $End; $i++) { $Writer.WriteLine($Chunk[$Indices[$i]]) }
                $Writer.Dispose()
            } else {
                $Start..$End | ForEach-Object { $Header } { $Chunk[$Indices[$_]] } | ConvertFrom-Csv -Delimiter $Delimiter
            }
            $Chunk = $Null
            $Start = $End + 1
        }
    }
}
Basic usage
Sort-Csv .\Input.csv <PropertyName> -Output .\Output.csv
Sort-Csv .\Input.csv <PropertyName> | ... | Export-Csv .\Output.csv
Note that for 1crores.csv it will probably just export the full file in one go, unless you set -BufferSize to a lower amount, e.g. 500Kb.
I downloaded GNU sort.exe from here: http://gnuwin32.sourceforge.net/packages/coreutils.htm It also requires libiconv2.dll and libintl3.dll from the dependency zip. I basically did this within cmd.exe, and it used a little less than a gig of RAM and took about 5 minutes. It's a 500 MB file of about 30 million random numbers. This command can also merge sorted files with --merge. You can also specify begin and end key positions for sorting with --key. It automatically uses temp files.
.\sort.exe < file1.csv > file2.csv
The Windows sort from the cmd prompt actually works in a similar way. It also has a /+n option to specify which character column to start the sort at.
sort.exe < file1.csv > file2.csv
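For example, to start comparing at character 10 of each line (the offset is purely illustrative; pick the position where your sort key begins):
sort /+10 file1.csv > file2.csv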

Why Isn't This Counting Correctly | PowerShell

Right now, I have a CSV file which contains 3,800+ records. This file contains a list of server names, followed by an abbreviation stating if the server is a Windows server, Linux server, etc. The file also contains comments or documentation, where each line starts with "#", stating it is a comment. What I have so far is as follows.
$file = Get-Content .\allsystems.csv
$arraysplit = @()
$arrayfinal = @()
[int]$windows = 0
foreach ($thing in $file){
    if ($thing.StartsWith("#")) {
        continue
    }
    else {
        $arraysplit = $thing.Split(":")
        $arrayfinal = @($arraysplit[0], $arraysplit[1])
    }
}
foreach ($item in $arrayfinal){
    if ($item[1] -contains 'NT'){
        $windows++
    }
    else {
        continue
    }
}
$windows
The goal of this script is to count the total number of Windows servers. My issue is that the first "foreach" block works fine, but the second one results in "$Windows" being 0. I'm honestly not sure why this isn't working. Two example lines of data are as follows:
example:LNX
example2:NT
If the goal is to count the Windows servers, why do you need the array?
Can't you just say something like:
foreach ($thing in $file)
{
    if ($thing -notmatch "^#" -and $thing -match "NT") { $windows++ }
}
$arrayfinal = @($arraysplit[0], $arraysplit[1])
This replaces the array on every iteration.
Changing it to += gave another issue: it simply appended each individual element. I used this post's info to fix it, sort of forcing a 2D array: How to create array of arrays in powershell?
$file = Get-Content .\allsystems.csv
$arraysplit = @()
$arrayfinal = @()
[int]$windows = 0
foreach ($thing in $file){
    if ($thing.StartsWith("#")) {
        continue
    }
    else {
        $arraysplit = $thing.Split(":")
        $arrayfinal += ,$arraysplit
    }
}
foreach ($item in $arrayfinal){
    if ($item[1] -contains 'NT'){
        $windows++
    }
    else {
        continue
    }
}
$windows
1
I also changed the file around and added more instances of both NT and other random garbage. Seems it works fine.
I'd avoid making another foreach loop just to bump the count. Your $arrayfinal also gets rewritten every time, so I used an ArrayList.
$file = Get-Content "E:\Code\PS\myPS\2018\Jun\12\allSystems.csv"
$arrayFinal = New-Object System.Collections.ArrayList($null)
foreach ($thing in $file){
if ($thing.StartsWith("#")) {
continue
}
else {
$arraysplit = $thing -split ":"
if($arraysplit[1] -match "NT" -or $arraysplit[1] -match "Windows")
{
$arrayfinal.Add($arraysplit[1]) | Out-Null
}
}
}
Write-Host "Entries with 'NT' or 'Windows' $($arrayFinal.Count)"
I'm not sure if you want to keep 'example', 'example2'... so I have skipped adding them to $arrayFinal, assuming the goal is to count "NT" or "Windows" occurrences.
The goal of this script is to count the total number of Windows servers.
I'd suggest the easy way: using cmdlets built for this.
$csv = Get-Content -Path .\file.csv |
    Where-Object { -not $_.StartsWith('#') } |
    ConvertFrom-Csv
@($csv.servertype).Where({ $_.Equals('NT') }).Count
# Compatibility mode:
# ($csv.servertype | Where-Object { $_.Equals('NT') }).Count
Replace servertype and 'NT' with whatever that header/value is called.
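One caveat worth adding (my note, not part of the original answer): the sample data in the question (example:LNX) is colon-delimited and has no header row, so the ConvertFrom-Csv call would need a -Delimiter and -Header to match. The header names below are assumptions:
# Hypothetical adaptation for a colon-delimited file with no header row
$csv = Get-Content -Path .\allsystems.csv |
    Where-Object { -not $_.StartsWith('#') } |
    ConvertFrom-Csv -Delimiter ':' -Header 'servername', 'servertype'
@($csv.servertype).Where({ $_ -eq 'NT' }).Count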

Compare two csv entries in powershell

I'm just learning PowerShell, but have run up against an error that has me stumped. The objective is to take a CSV file and ascertain how many groups there are in the file. There will be multiple entries under each group. The end result would be to split the file into different arrays/dictionaries/what have you (I started with PowerShell 3 hours ago) based on the groups, keeping the TaxonIDs intact, then export those to separate files. But for now, I'm just at the comparison step.
My practice data look like
Group,TaxonID
AA,1
AA,2
BB,3
BB,4
and true data look like:
Group,TaxonID
Bilateria_Ropsin,Mus_musculus_Rhabdomeric_MEL
Bilateria_Ropsin,ROp_OG3TodaroP
To do this, I tried to compare the group in one row with the group in the next row; if they differ, I add one to a variable for later use. Here's what I've got to do the comparison:
Set-Variable -name book -value C:\Users\XXX\Documents\Book1.csv
$work = Import-Csv $book
$numgroups = 0
$i = 0
foreach ($Group in $work) {
    $Ogroup = $work[$i] | Select-Object {$_.Group}
    $nextGroup = $work[$i+1] | Select-Object {$_.Group}
    $compare = $Ogroup.Equals($nextGroup)
    $compare
    $i++
}
When I print out $Ogroup and $nextGroup, they give me the proper pairs (AA AA, AA BB, BB BB), but $compare always prints out False. Using Compare-Object gives me an error about $nextGroup being null, so I opted for using .Equals(). CompareTo() throws an error about it not being a valid method.
I'm stumped and need help.
You should use the Group-Object cmdlet:
Set-Variable -name book -value C:\Users\XXX\Documents\Book1.csv
$work = Import-Csv $book
$groups = $work | Group-Object -Property Group
$groups.count
$groups[0].name
$groups[0].Group[0]
for ($i = 0; $i -lt $groups.count; $i++) { $groups[$i].name }
for ($i = 0; $i -lt $groups.count; $i++)
{
    $groups[$i].name
    for ($j = 0; $j -lt $groups[$i].count; $j++)
    {
        $groups[$i].group[$j]
    }
}
Comparing isn't needed. Split the groups out first:
$book = "$env:USERPROFILE\Documents\Book1.csv"
$work = Import-CSV $book
$Groups = $work | Group-Object Group
ForEach ($Group in $Groups) {
    "Group Name: " + $Group.Name
    "Taxon IDs: "
    $Group.Group.TaxonID
}
If you want to save each group to a file, within that ForEach loop you can do $Group | Export-Csv "c:\path\$($Group.Name).csv" -NoTypeInformation
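Putting that together, a short sketch of the per-group export (the C:\output folder is an assumption; exporting $Group.Group rather than the group object itself writes the original Group/TaxonID columns back out):
# Write one CSV per group; C:\output is a hypothetical destination folder
ForEach ($Group in $Groups) {
    $Group.Group | Export-Csv "C:\output\$($Group.Name).csv" -NoTypeInformation
}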
Edit: Sorry, was working on an answer when security called me over the intercom and said a water main burst under my truck and they needed me to go move my vehicle. Figured not losing my vehicle to a sinkhole was more important than this.