I have a text file (semicolon-separated) containing over 3 million records. Columns 1 to 4 hold general information, and columns 5 and 6 hold detailed information. There can be up to 4 different detail records for the same general information in columns 1 to 4.
My sample input:
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv1;Dave;Sandbox;kwus91;Storage;120GB
Srv1;Dave;Sandbox;kwus91;Variant;16
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv2;Pete;GWZ;aiwq71;Processes;234
Srv3;Micael;P12;mxuq01;Memory;16GB
Srv3;Micael;P12;mxuq01;Processes;239
Srv3;Micael;P12;mxuq01;Storage;160GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
Desired output:
Server;Owner;Company;Username;Memory;Processes;Storage;Variant
Srv1;Dave;Sandbox;kwus91;4GB;135;120GB;16
Srv2;Pete;GWZ;aiwq71;8GB;234;;
Srv3;Micael;P12;mxuq01;16GB;239;160GB;
Srv4;Stefan;MTC;spq61ep;;;120GB;
If a value doesn't exist for the general information (columns 1-4), it has to stay blank.
My current code:
$a = Import-csv .\Input.txt -Delimiter ";"
$a | FT -AutoSize
$b = @()
foreach ($Server in $a.Server | Select -Unique) {
    $Props = [ordered]@{ Server = $Server }
    $Owner = ($a.where({ $_.Server -eq $Server})).Owner | Select -Unique
    $Company = ($a.where({ $_.Server -eq $Server})).Company | Select -Unique
    $Username = ($a.where({ $_.Server -eq $Server})).Username | Select -Unique
    $Props += @{Owner = $Owner}
    $Props += @{Company = $Company}
    $Props += @{Username = $Username}
    foreach ($Property in $a.Property | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and
                    $_.Property -eq $Property})).Value
        $Props += @{ $Property = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Export-Csv .\Output.txt -NoTypeInformation -Delimiter ";"
After a lot of trial and error, my script works, but it takes a long time.
Is there a way to improve performance for around 3 million lines in the text file? I expect roughly 2.5 million unique values for $Server.
I'm running Windows 7 64bit with PowerShell 4.0.
Try something like this:
#Import Data and create empty columns
$List=import-csv "C:\temp\file.csv" -Delimiter ";"
#get all properties name with value not empty
$ListProperty=($List | where Value -ne '' | select property -Unique).Property
#group by server
$Groups=$List | group Server
#loop every rows and store data by group and Property Name
$List | %{
$Current=$_
#Take value not empty and group by Property Name
$Group=($Groups | where Name -eq $Current.Server).Group | where Value -ne '' | group Property
#Add all property and first value not empty
$ListProperty | %{
$PropertyName=$_
$PropertyValue=($Group | where Name -eq $PropertyName | select -first 1).Group.Value
$Current | Add-Member -Name $PropertyName -MemberType NoteProperty -Value $PropertyValue
}
$Current
} | select * -ExcludeProperty Property, Value -unique | export-csv "c:\temp\result.csv" -notype -Delimiter ";"
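Much of the run time in the original script comes from rescanning the whole of $a several times for every unique server. A minimal sketch of a single-pass alternative follows. It assumes the four detail property names are known up front (they are taken from the sample data), uses a few inlined sample rows in place of Input.txt, and the variable names ($rows, $detailColumns, $byServer, $pivoted) are illustrative only:

```powershell
# Inline sample rows stand in for: Import-Csv .\Input.txt -Delimiter ';'
$rows = @'
Server;Owner;Company;Username;Property;Value
Srv1;Dave;Sandbox;kwus91;Memory;4GB
Srv1;Dave;Sandbox;kwus91;Processes;135
Srv2;Pete;GWZ;aiwq71;Memory;8GB
Srv4;Stefan;MTC;spq61ep;Storage;120GB
'@ | ConvertFrom-Csv -Delimiter ';'

# Assumption: the set of detail properties is fixed and known in advance.
$detailColumns = 'Memory', 'Processes', 'Storage', 'Variant'

# One ordered dictionary entry per server, filled in a single pass over the rows.
$byServer = [ordered]@{}
foreach ($row in $rows) {
    if (-not $byServer.Contains($row.Server)) {
        $entry = [ordered]@{
            Server   = $row.Server
            Owner    = $row.Owner
            Company  = $row.Company
            Username = $row.Username
        }
        # Pre-create every detail column as blank so all output rows share one shape.
        foreach ($name in $detailColumns) { $entry[$name] = '' }
        $byServer[$row.Server] = $entry
    }
    $byServer[$row.Server][$row.Property] = $row.Value
}

# Turn each dictionary into an object; pipe to Export-Csv for a real run.
$pivoted = foreach ($entry in $byServer.Values) { New-Object PSObject -Property $entry }
$pivoted | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
```

Each input row costs one dictionary lookup instead of a scan of the whole file. Import-Csv still loads everything into memory, so for 3 million lines memory use remains something to watch.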
I am trying to find the latency for a datastore.
Below is the code:
$vmName = ""
$stat = "datastore.totalReadLatency.average","datastore.totalWriteLatency.average"
$entity = Get-VM -Name $vmName | select -Unique
$start = (Get-Date).AddHours(-1)
$dsTab = @{}
# Do not assign the pipeline output to $dsTab, or the populated hash table is overwritten.
Get-Datastore | Where {$_.Type -eq "VMFS"} | %{
$key = $_.ExtensionData.Info.Vmfs.Uuid
if(!$dsTab.ContainsKey($key)){
$dsTab.Add($key,$_.Name)
}
else{
"Datastore $($_.Name) with UUID $key already in hash table"
}
}
Get-Stat -Entity $entity -Stat $stat -Start $start |
Group-Object -Property {$_.Entity.Name} | %{
$vmName = $_.Values[0]
$VMReadLatency = $_.Group |
where {$_.MetricId -eq "datastore.totalReadLatency.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average
$VMWriteLatency = $_.Group |
where {$_.MetricId -eq "datastore.totalWriteLatency.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average
$VMReadIOPSAverage = $_.Group |
where {$_.MetricId -eq "datastore.numberReadAveraged.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average
$VMWriteIOPSAverage = $_.Group |
where {$_.MetricId -eq "datastore.numberWriteAveraged.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average
$_.Group | Group-Object -Property Instance | %{
New-Object PSObject -Property @{
VM = $vmName
Host = $_.Group[0].Entity.Host.Name
Datastore = $dsTab[$($_.Values[0])]
Start = $start
DSReadLatencyAvg = [math]::Round(($_.Group |
where {$_.MetricId -eq "datastore.totalReadLatency.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average),2)
DSWriteLatencyAvg = [math]::Round(($_.Group |
where {$_.MetricId -eq "datastore.totalWriteLatency.average"} |
Measure-Object -Property Value -Average |
Select -ExpandProperty Average),2)
VMReadLatencyAvg = [math]::Round($VMReadLatency,2)
VMWriteLatencyAvg = [math]::Round($VMWriteLatency,2)
VMReadIOPSAvg = [math]::Round($VMReadIOPSAverage,2)
VMWriteIOPSAvg = [math]::Round($VMWriteIOPSAverage,2)
}
}
} | Export-Csv c:\report.csv -NoTypeInformation -UseCulture
When I check any datastore, I cannot find the stats "datastore.totalReadLatency.average" and "datastore.totalWriteLatency.average".
Please let me know what I am doing wrong, or whether anything else is needed (an update or an installation).
Running your
Get-Stat -Entity $entity -Stat $stat -Start $start
in PowerCLI v12.4, I see errors like the one below and no data is returned:
The metric counter "datastore.totalreadlatency.average" doesn't exist for
entity <$vmname>
which led me to this thread where the solution code looks close to yours. Anyway, using -Realtime seems to fix that piece for your last-hour scenario. Offhand I'm not sure if something changed in recent versions, or what.
I can't find this documented, but Get-Stat appears to ignore -Start when -Realtime is specified. So try:
Get-Stat -Entity $entity -Stat $stat -Realtime
(Maybe you grabbed the $stat definition from the first answer in that thread, but grabbed the remaining code from the accepted answer? You're attempting to parse results for numberReadAveraged & numberWriteAveraged without gathering those results.)
To look beyond realtime/last-hour, you'd need collection level 3 or higher for totalReadLatency and totalWriteLatency. Increasing stat levels will grow the vCenter db.
I have a system.Array in this format
A_Site      B_Site
----------- -----------
BN6         BIO70
BY21        BN6
BY4         BY21
CBR20       BY4
Is there a way to rearrange it like this? The idea is to see whether site codes are missing from column A or from column B.
A_Site      B_Site
----------- -----------
BN6         BN6
BY21        BY21
BY4         BY4
CBR20
            BIO70
If the two columns in your sample input are stored in separate arrays, the following yields the desired output (PSv3+):
$arr1 = "BN6", "BY21", "BY4", "CBR20"
$arr2 = "BIO70", "BN6", "BY21", "BY4"
Compare-Object $arr1 $arr2 -IncludeEqual |
Select-Object @{ n='A_Site'; e={ if ($_.SideIndicator -in '==', '<=') { $_.InputObject } } },
@{ n='B_Site'; e={ if ($_.SideIndicator -in '==', '=>') { $_.InputObject } } }
Other method:
$arr1 = "BN6", "BY21", "BY4", "CBR20"
$arr2 = "BIO70", "BN6", "BY21", "BY4"
$arr1+ $arr2 | select -Unique | %{
$Value=$_;
[pscustomobject]@{
Value=$_
IsInArray1=(($arr1 | where {$_ -eq $Value} | select -First 1) -ne $null)
IsInArray2=(($arr2 | where {$_ -eq $Value} | select -First 1) -ne $null)
}
}
I have two .csv files with matching columns. I am trying to compare them and produce a final output .csv that contains only the differences.
corpold.csv is a previous imported file.
corpnew.csv is the new import file.
I need to export a CSV that includes all items that are not in corpold.csv, only changed items that exist in both CSVs and exclude any rows that exist in corpold.csv but not in corpnew.csv.
$reference = Import-Csv -Path D:\corpold.csv
$lookup = $reference | Group-Object -AsHashTable -AsString -Property EMPID
$results = Import-Csv -Path D:\corpnew.csv | foreach {
$email = $_.EMAIL_ADDRESS
$status = $_.ACTIVE
$fs = $_.FIRST_NAME
$ls = $_.LAST_NAME
$id = $_.EMPID
$title = $_.JOB_TITLE
$code = $_.JOB_CODE
$type = $_.USER_TYPE
$designee = $_.DESIGNEE
$stores = $_.STORES
$hiredate = $_.HIRE_DATE
$dept = $_.DEPARTMENT
$grp = $_.GROUP
if ($lookup.ContainsKey($id)) {
# if exists in yesterdays file
# trying to figure out how to compare and only provide results into
# the Export-Csv that have changed while excluding any items in
# corpold that do not exist in corpnew
} else {
# if it does not exist update all fields
[PSCustomObject]@{
ACTIVE = $status
EMAIL_ADDRESS = $email
FIRST_NAME = $fs
LAST_NAME = $ls
EMPID = $id
JOB_TITLE = $title
JOB_CODE = $code
USER_TYPE = $type
DESIGNEE = $designee
STORES = $stores
HIRE_DATE = $hiredate
DEPARTMENT = $dept
GROUP = $grp
}
}
}
# Sample outputs
$results
$results | Export-Csv -Path D:\delta.csv -NoTypeInformation
There are two operations to be done here: find differences and compare objects which exist in both files.
Compare objects and find new/deleted entries
To compare objects, you can use the Compare-Object cmdlet like this:
Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual
This gives you a list of EMPID values with a SideIndicator showing whether the object exists only in the first input (<=), only in the second (=>), or in both (==). You can filter by SideIndicator and then process the results accordingly.
An alternative is to use Where-Object like this:
$reference | Where-Object empid -NotIn $results.empid
$reference | Where-Object empid -In $results.empid
$results | Where-Object empid -NotIn $reference.empid
The first gives you entries that exist only in the first file, the second gives entries that exist in both, and the last gives entries that exist only in the second file.
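As a small illustration of filtering on SideIndicator, here is a sketch that uses made-up inline data in place of the two CSV files. The sample rows and the $onlyInOld, $onlyInNew and $inBoth names are assumptions for the example:

```powershell
# Hypothetical stand-ins for corpold.csv ($reference) and corpnew.csv ($results).
$reference = @'
EMPID,FIRST_NAME
1,Ann
2,Bob
'@ | ConvertFrom-Csv

$results = @'
EMPID,FIRST_NAME
2,Bob
3,Cy
'@ | ConvertFrom-Csv

$comparison = Compare-Object -ReferenceObject $reference -DifferenceObject $results -Property EMPID -IncludeEqual

# Split the comparison by side indicator.
$onlyInOld = ($comparison | Where-Object SideIndicator -eq '<=').EMPID   # dropped from the new file
$onlyInNew = ($comparison | Where-Object SideIndicator -eq '=>').EMPID   # added in the new file
$inBoth    = ($comparison | Where-Object SideIndicator -eq '==').EMPID   # candidates for a change check
```

Because only EMPID is compared here, an ID listed in $inBoth may still have other columns that changed.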
Find edited entries
What you basically have to do is to iterate all the entries and then check if any of the columns has been changed. If yes, add it to $changedEntries.
Example of script:
$IDsInBoth = $results | Where-Object empid -In $reference.empid | Select-Object -ExpandProperty EMPID
$AllProperties = $results | Get-Member | Where-Object MemberType -eq "NoteProperty" | Select-Object -ExpandProperty Name
$changedEntries = @()
$IDsInBoth | ForEach-Object {
$changed = $false
$newEntry = $results | Where-Object EMPID -eq $_
$oldEntry = $reference | Where-Object EMPID -eq $_
foreach ($p in $AllProperties) {
if ($oldEntry."$p" -ne $newEntry."$p") {
$changed = $true
}
}
if ($changed) {
$changedEntries += $newEntry
}
}
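The script above scans both lists once per ID, which can get slow on large files. A possible variation, sketched here with made-up inline data and only three columns for brevity, indexes the old rows by EMPID in a hashtable and collects new and changed rows in one pass. The names $old, $new, $oldById and $delta are illustrative, not part of the original scripts:

```powershell
# Hypothetical stand-ins for corpold.csv and corpnew.csv.
$old = @'
EMPID,FIRST_NAME,JOB_TITLE
1,Ann,Clerk
2,Bob,Manager
3,Cy,Driver
'@ | ConvertFrom-Csv

$new = @'
EMPID,FIRST_NAME,JOB_TITLE
1,Ann,Clerk
2,Bob,Director
4,Dee,Analyst
'@ | ConvertFrom-Csv

# Index the old rows by EMPID so each lookup is a hashtable hit rather than a scan.
$oldById = @{}
foreach ($row in $old) { $oldById[$row.EMPID] = $row }

# Column names to compare, taken from the first new row.
$columns = $new[0].PSObject.Properties.Name

$delta = foreach ($row in $new) {
    $previous = $oldById[$row.EMPID]
    if ($null -eq $previous) {
        $row        # not in the old file: emit as new
        continue
    }
    foreach ($col in $columns) {
        if ($row.$col -ne $previous.$col) {
            $row    # at least one column differs: emit as changed
            break
        }
    }
}
```

Rows that exist only in the old data are never visited, so they are excluded from $delta, which can then be piped to Export-Csv.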
I'm trying to create a PowerShell script that transposes a CSV file from columns to rows.
I found examples of doing the opposite (converting a row-based CSV to columns), but nothing on columns to rows. My problem is that I don't know exactly how many columns I'll have. I tried adapting the row-to-column script to go from columns to rows, but without success.
$a = Import-Csv "input.csv"
$a | FT -AutoSize
$b = @()
foreach ($Property in $a.Property | Select -Unique) {
    $Props = [ordered]@{ Property = $Property }
    foreach ($Server in $a.Server | Select -Unique){
        $Value = ($a.where({ $_.Server -eq $Server -and
                    $_.Property -eq $Property })).Value
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
$b | FT -AutoSize
$b | Out-GridView
$b | Export-Csv "output.csv" -NoTypeInformation
For example my CSV can look like this:
"ID","DATA1"
"12345","11111"
"54321","11111"
"23456","44444"
or this (the number of columns can vary):
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555",
and I would like the script to convert it like this:
"ID","DATA"
"12345","11111"
"12345","22222"
"12345","33333"
"54321","11111"
"23456","44444"
"23456","55555"
The trick is to query the members of the table to get the column names. Once you do that then the rest is straightforward:
function Flip-Table ($Table) {
Process {
$Row = $_
# Get all the columns names, excluding the ID field.
$Columns = ($Row | Get-Member -Type NoteProperty | Where-Object Name -ne ID).Name
foreach ($Column in $Columns) {
if ($Row.$Column) {
$Properties = [Ordered] @{
"ID" = $Row.ID
"DATA" = $Row.$Column
}
New-Object PSObject -Property $Properties
}
}
# Garbage collection won't kick in until the end of the script, so
# invoke it every 100 input rows.
$Count++;
if (($Count % 100) -eq 0) {
[System.GC]::GetTotalMemory('forceFullCollection') | out-null
}
}
}
Import-Csv input.csv | Flip-Table | Export-Csv -NoTypeInformation output.csv
Well, here is mine. I'm not as fancy as the rest:
$in = Get-Content input.csv | Select -Skip 1
$out = New-Object System.Collections.ArrayList
foreach($row in $in){
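# Strip the surrounding CSV quotes so they are not exported as part of the values.
$parts = @($row.Split(',') | ForEach-Object { $_.Trim('"') })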
$id = $parts[0]
foreach($data in $parts[1..$parts.Count]){
if($data -ne '' -AND $data -ne $null){
$temp = New-Object PSCustomObject -Property @{'ID' = $id;
'Data' = $data}
$out.Add($temp) | Out-Null
}
}
}
$out | Export-CSV output.csv -NoTypeInformation
You can do something like this
# Convert csv to object
$csv = ConvertFrom-Csv @"
"ID","DATA1","DATA2","DATA3"
"12345","11111","22222","33333"
"54321","11111",,
"23456","44444","55555"
"@
# Ignore common members and the ID property
$excludedMembers = @(
'GetHashCode',
'GetType',
'ToString',
'Equals',
'ID'
)
$results = @()
# Iterate around each csv row
foreach ($row in $csv) {
$members = $row | Get-Member
# Iterate around each member from the 'row object' apart from our
# exclusions and empty values
foreach ($member in $members |
Where { $excludedMembers -notcontains $_.Name -and $row.($_.Name)}) {
# add to array of objects
$results += @{ ID=$row.ID; DATA=$row.($member.Name)}
}
}
# Write the csv string
$outstring = "ID,DATA"
$results | foreach { $outstring += "`n$($_.ID),$($_.DATA)" }
# New csv object
$csv = $outstring | ConvertFrom-Csv
Probably not the most elegant solution, but should do what you need
I left some comments explaining what it does
If you only want to accept a limited number of DATA columns (e.g. 5), you could do:
ForEach ($i in 1..5) {$CSV | ? {$_."Data$i"} | Select ID, @{N='Data'; E={$_."Data$i"}}}
And if you have a potentially unlimited number of DATA columns:
ForEach ($Data in ($CSV | Select "Data*" -First 1).PSObject.Properties.Name) {
    $CSV | ? {$_.$Data} | Select ID, @{N='Data'; E={$_.$Data}}
}
I am trying to convert an 80K-row CSV with 30 columns into a sorted and filtered CSV based on specific column data from the original CSV.
For example, my data is in the format below:
PatchName MachineName IPAddress DefaultIPGateway Domain Name USERID UNKNOWN NOTAPPLICABLE INSTALLED APPLICABLE REBOOTREQUIRED FAILED
KB456982 XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser YES
KB589631 XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser YES
KB456982 ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser YES
KB456982 8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser YES
Here MachineName should be reduced to unique values, and each PatchName should become a trailing column header. The value under it should be the name of the status column (UNKNOWN, NOTAPPLICABLE, INSTALLED, FAILED, REBOOTREQUIRED) that contains YES for that machine.
Expected Result:
MachineName IPAddress DefaultIPGateway Domain Name USERID KB456982 KB589631
XXX1002 xx.yy.65.148 xx.yy.64.1 XYZ.NET XYZ\ayzuser UNKNOWN
XXX1003 xx.yy.65.176 xx.yy.64.1 XYZ.NET XYZ\cdfuser NOTAPPLICABLE
ABC1004 xx.zz.83.56 xx.zz.83.1 XYZ.NET XYZ\mnguser UNKNOWN
8797XCV xx.yy.143.187 xx.yy.143.184 XYZ.NET WPX\abcuser FAILED
I am looking for help to achieve this. So far I am able to transpose the PatchName rows to columns, but I am not able to include all the other columns or apply the condition. [It takes 40 minutes to process this.]
$b = @()
foreach ($Property in $a.MachineName | Select -Unique) {
    $Props = [ordered]@{ MachineName = $Property }
    foreach ($Server in $a.PatchName | Select -Unique){
        $Value = ($a.where({ $_.PatchName -eq $Server -and $_.MachineName -eq $Property })).NOTAPPLICABLE
        $Props += @{ $Server = $Value }
    }
    $b += New-Object -TypeName PSObject -Property $Props
}
This is what I came up with:
$data = Import-Csv -LiteralPath 'C:\path\to\data.csv'
$lookup = @{}
$allPatches = $data.PatchName | Select-Object -Unique
# Make 1 lookup entry for each computer, to keep the username and IP and so on.
# Add the patch details from the current row (might add more than one patch per computer)
foreach ($row in $data)
{
if (-not $lookup.ContainsKey($row.MachineName))
{
$lookup[$row.MachineName] = ($row | Select-Object -Property MachineName, IPAddress, DefaultIPGateway, DomainName, UserID)
}
$patchStatus = $row.psobject.properties |
Where-Object {
$_.name -in @('applicable', 'notapplicable', 'installed', 'rebootrequired', 'failed', 'unknown') -and
-not [string]::IsNullOrWhiteSpace($_.value)
} |
Select-Object -ExpandProperty Name
$lookup[$row.MachineName] | Add-Member -NotePropertyName $row.PatchName -NotePropertyValue $patchStatus
}
# Pull the computer details out of the lookup, and add all the remaining patches
# so they will convert to CSV properly, then export to CSV
$lookup.Values | ForEach-Object {
$computer = $_
foreach ($patch in $allPatches | where-object {$_ -notin $computer.psobject.properties.name})
{
$computer | Add-Member -NotePropertyName $patch -NotePropertyValue ''
}
$computer
} | Export-Csv -LiteralPath 'c:\path\to\output.csv' -NoTypeInformation