PowerShell sort two fields and get latest from CSV - powershell

I am trying to find a way to sort a CSV by two fields and retrieve only the latest item.
CSV fields: time, computer, type, domain.
The code below works, but it is slow at the scale of my CSV, and I feel like there is a better way.
$sorted = $csv | Group-Object {$_.computer} | ForEach {$_.Group | Sort-Object Time -Descending | Select-Object -First 1}

As Lee_Dailey suggests, you'll probably have better luck with a hashtable instead. Group-Object (unless used with the -NoElement parameter) is fairly slow and memory-hungry.
The fastest way off the top of my head would be something like this:
# use the call operator & instead of ForEach-Object to avoid overhead from pipeline parameter binding
$csv |&{
begin{
# create a hashtable to hold the newest object per computer
$newest = @{}
}
process{
# test if the object in the pipeline is newer than the one we have
if(-not $newest.ContainsKey($_.Computer) -or $newest[$_.Computer].Time -lt $_.Time){
# update our hashtable with the newest object
$newest[$_.Computer] = $_
}
}
end{
# return the newest-per-computer object
$newest.Values
}
}
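One caveat worth sketching: Import-Csv returns every field as a string, so `-lt` on the raw Time column compares text, which only orders correctly for sortable formats such as ISO 8601. The following is a minimal variation of the hashtable approach that casts the timestamp to [datetime] first. The sample rows and the `Stamp`/`Row` property names are made up for illustration, and the cast assumes the timestamps are in a format the invariant culture can parse.

```powershell
# Hypothetical sample data standing in for the real CSV
$csv = @'
time,computer,type,domain
2/1/2023 10:00,PC1,a,corp
11/1/2023 09:00,PC1,b,corp
3/5/2023 08:00,PC2,a,corp
'@ | ConvertFrom-Csv

# Hashtable holding the newest row seen so far per computer
$newest = @{}
foreach ($row in $csv) {
    # Cast the string to a real [datetime] so the comparison is chronological
    $stamp = [datetime]$row.time
    $key = $row.computer
    if (-not $newest.ContainsKey($key) -or $newest[$key].Stamp -lt $stamp) {
        $newest[$key] = [pscustomobject]@{ Stamp = $stamp; Row = $row }
    }
}

# Unwrap the original rows
$latest = $newest.Values | ForEach-Object Row
$latest
```

Storing the parsed timestamp next to the row avoids re-parsing the stored value on every comparison.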

Related

PowerShell pipeline - return a new object that was created within the pipeline

I keep running into the same problem, and although I have a default way of handling it, it keeps bugging me.
Isn't there a better way?
Basically, I have a pipeline running, do stuff within the pipeline, and want to return a key/value pair from within the pipeline.
I want the whole pipeline to return an object of type psobject (or pscustomobject).
Here is the way I do it every time.
I create a hashtable at the beginning of the pipeline and add key/value pairs to it from within the pipeline using the .Add() method.
Afterwards I create a psobject by passing the hashtable to New-Object's -Property parameter. This gives me the desired result.
Get-Process | Sort -Unique Name | ForEach-Object -Begin { $ht = @{} } -Process {
# DO STUFF
$key = $_.Name
$val = $_.Id
# Add Entry to Hashtable
$ht.Add($key,$val)
}
# Create PSObject from Hashtable
$myAwesomeNewObject = New-Object psobject -Property $ht
# Done - returns System.Management.Automation.PSCustomObject
$myAwesomeNewObject.GetType().FullName
But this seems a bit clunky. Isn't there a more elegant way of doing it?
Something like this:
[PSObject]$myAwesomeNewObject = Get-Process | Sort -Unique Name | ForEach-Object -Process {
# DO STUFF
$key = $_.Name
$val = $_.Id
# return Key/Val Pair
@{$key=$val}
}
# Failed - returns System.Object[]
$myAwesomeNewObject.GetType().FullName
Unfortunately this doesn't work, since the pipeline returns an array of hashtables, but I hope it shows what I am trying to achieve.
Thanks
Not sure if this is more elegant, but here is another way of doing it. It uses an anonymous function, so $ht is no longer available after execution, and it casts to [pscustomobject] instead of using New-Object:
[pscustomobject] (Get-Process | Sort -Unique Name | & {
begin { $ht = @{ } }
process {
# DO STUFF
$key = $_.Name
$val = $_.Id
# Add Entry to Hashtable
$ht.Add($key, $val)
}
end { $ht }
})
You can also use the -End parameter to convert the final hashtable to a pscustomobject as part of the pipeline, without needing to assign the whole thing to a variable.
$ht[$key]=$val is also a nice shorthand for $ht.Add($key,$val) (note that it overwrites an existing key, whereas .Add() throws on a duplicate):
Get-Process |
Sort -Unique Name |
Foreach -Begin { $ht = @{} } -Process {
$ht[$_.Name] = $_.Id
} -End {[pscustomobject]$ht} |
## continue pipeline with pscustomobject
Thanks to the answers from @Santiago Squarzon and @Cpt.Whale, I was able to combine them into a solution that pleases me:
$myAwesomeNewObject = `
Get-Process | Sort -Unique Name | & {
begin { $ht = @{} }
process {
# DO STUFF
$key = $_.Name
$val = $_.Id
# Add Entry to Hashtable
$ht[$key]=$val
}
end {[pscustomobject]$ht}
}
# Success - System.Management.Automation.PSCustomObject
$myAwesomeNewObject.Gettype().FullName
# And the helper hashtable is $null thanks to the
# anonymous function
$null -eq $ht
Thanks a lot, guys.
Alternatively you may create a hashtable using Group-Object -AsHashTable:
# Store the PIDs of all processes into a PSCustomObject, keyed by the process name
$processes = [PSCustomObject] (Get-Process -PV proc |
Select-Object -Expand Id |
Group-Object { $proc.Name } -AsHashtable)
# List all PIDs of given process
$processes.chrome
Notes:
The common parameter -PV (alias of -PipelineVariable) makes sure that we can still access the full process object from within the calculated property of the Group-Object command, even though there is a Select-Object command in between.
The values of the properties are arrays, which store the process IDs of all instances of each process. E.g. $processes.chrome outputs a list of PIDs of all instances of the chrome process.

How to merge 2 x CSVs with the same column but overwrite not append?

I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.
I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"
Each of the 2 CSVs only have some actual Lab IDs, and the 'nonexx' are just fillers for the importing software. There is no duplication ie each 'well' is only referenced once across the 2 files.
What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.
Here is my current code:
$CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
$CSVData = [PSCustomObject]@{
labid = $_.labid
well = $_.well
}
If ($CSVB.well -match $CSVData.wellID) {
write-host "I MATCH"
($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
}
$CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
}
The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.
Based on the responses in the comments on your question, I believe this is what you are looking for. This assumes that both CSVs contain exactly the same data, with labid being the only difference.
There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
# Loop through csv1 rows
Foreach($line in $csv1) {
# If Labid contains "none"
If($line.labid -like "none*") {
# Set rows labid to the labid from csv2 row that matches plate/well
# May be able to remove the plate section if well is a unique value
$line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
}
}
# Export to CSV - not overwrite - to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
Since you need a bi-directional comparison of the 2 CSVs, you could combine both into a single array and group the objects by their well property using Group-Object. Then, for each group whose Count equals 2, keep only the object whose labid does not start with none; otherwise return the object as-is.
Using the following Csvs for demonstration purposes:
Csv1
"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"
Csv2
"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"
Code
Note that this code assumes there will be a maximum of 2 objects with the same well property and, if there are 2 objects with the same well, one of them must have a value not starting with none.
$mergedCsv = @(
Import-Csv pathtocsv1.csv
Import-Csv pathtocsv2.csv
)
$mergedCsv | Group-Object well | ForEach-Object {
if($_.Count -eq 2) {
return $_.Group.Where{ -not $_.labid.StartsWith('none') }
}
$_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation
Output
plate labid well
----- ----- ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1 none123 X001
newrowuniquecsv2 none123 X002
If the lists are large, performance might be an issue as Where-Object (or any other where method) and Group-Object do not perform very well for embedded loops.
By indexing the second CSV file (that is, creating a hashtable), you get quicker access to the required objects. Indexing on two (or more) items (plate and well) is discussed in: Does there exist a designated (sub)index delimiter? and resolved by @mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer class.
To apply this class on Drew's helpful answer:
$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv
$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }
Foreach($line in $csv1) {
If($line.labid -like "none*") {
$line.labid = $dict.($line.plate, $line.well).labid
}
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation
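If that comparer class is not at hand, a simpler variation of the same indexing idea is to join plate and well into one string key. This is only a sketch: the sample rows are invented, and it assumes the `|` delimiter never occurs in either column.

```powershell
# Hypothetical sample data in place of the two CSV files
$csv1 = @'
"plate","labid","well"
"P1","none46","F006"
"P1","3835265","A007"
'@ | ConvertFrom-Csv
$csv2 = @'
"plate","labid","well"
"P1","3835269","F006"
"P1","none48","A007"
'@ | ConvertFrom-Csv

# Index csv2 once by a composite "plate|well" key
$index = @{}
foreach ($row in $csv2) {
    $index["$($row.plate)|$($row.well)"] = $row
}

# Fill in filler labids in csv1 with a single hashtable lookup per row
foreach ($line in $csv1) {
    if ($line.labid -like 'none*') {
        $match = $index["$($line.plate)|$($line.well)"]
        if ($match) { $line.labid = $match.labid }
    }
}
$csv1
```

The lookup cost per row no longer grows with the size of the second file, which is the point of indexing over a nested Where-Object.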

Passing date time as variable instead of where-object filter

I am trying to use the code below with a date range as a Where-Object filter, but that slows down my output and I am also unable to export to CSV.
$Prids = get-content -Path C:\Temp\sqltest.txt
foreach ($prid in $prids){
$filterDate = [datetime]::Today.AddDays(-22)
Get-CdPac2000Problems -PId $Prid | Where-Object {$_.ClosedDate.Date -ge $filterDate} |ft PID,ClosedDate,ClosedByELID,ResponsibleGroup,ReferredDate -autosize
}
How can I change Where-Object to a parameter that looks something like -closedate $variable, like I did with -PID? The biggest struggle for me is creating a datetime variable.
First off, you should move $filterDate out of the loop. Second, you should NEVER use Format-Table, and ABSOLUTELY NEVER in a loop.
Try this:
$filterDate = [datetime]::Today.AddDays(-22)
$Prids = get-content -Path C:\Temp\sqltest.txt
$Result = foreach ($prid in $prids){
Get-CdPac2000Problems -PId $Prid | Where-Object {(Get-Date $_.ClosedDate.Date) -ge $filterDate}
}
$Result | Format-Table PID,ClosedDate,ClosedByELID,ResponsibleGroup,ReferredDate -autosize
One of the most beautiful things in PowerShell is that you can interact directly with all objects. All methods and properties are preserved when passing the objects to a variable or to the next command in a pipeline. Format-Table converts objects to a table: something human-readable, but stripped of methods and life.
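A small sketch of what that means in practice, using invented stand-in objects rather than real Get-CdPac2000Problems output (the `Id` property and the dates are hypothetical): the filtered objects can still be converted to CSV, whereas what comes out of Format-Table is a stream of formatting records.

```powershell
# Hypothetical stand-ins for the objects returned by the real cmdlet
$items = @(
    [pscustomobject]@{ Id = 'P1'; ClosedDate = [datetime]'2024-01-05' }
    [pscustomobject]@{ Id = 'P2'; ClosedDate = [datetime]'2024-03-01' }
)
$filterDate = [datetime]'2024-02-01'

# Keep real objects in the variable: filter first, format last
$result = $items | Where-Object { $_.ClosedDate -ge $filterDate }

# Real objects convert to CSV as expected (header line plus one data line here)
$csvLines = @($result | Select-Object Id, ClosedDate | ConvertTo-Csv -NoTypeInformation)

# Format-Table output is no longer the original objects
$formatted = @($result | Format-Table)
$formatted[0].GetType().Namespace
```

Swapping ConvertTo-Csv for Export-Csv with a path writes the same data to a file.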

In Powershell, is there a better way to store/find data in an n-dimensional array than a custom object

I find myself continually faced with the need to store mixed-type data in some kind of a structure for later lookup.
For a recent example, I am performing data migration and I will store the old UUID, new UUID, source environment, target environment, and schema for an unknown number of entries.
I have been meeting this need by creating an array and inserting System.Objects with NoteProperty members for each of the columns of data.
This strikes me as a very clumsy approach but I feel like I may be limited by Powershell's functionality. If I need to, for example, locate all entries that used a particular schema, I write a foreach loop that sticks each entry with a matching schema name in a whole new array that I can return. I would really like the ability to more easily search for all objects that contain a member matching a particular value, modify existing members, etc.
Is there a better built-in data structure that will suit my needs, or is creating a custom object the right thing to do?
For reference, I'm doing something like this to create my structure:
$objectArray = @();
foreach(thing to process){
$tempObj = New-Object System.Object;
$tempObj | Add-Member -MemberType NoteProperty -Name "membername" -Value xxxxx
....repeat for each member...
$objectArray += $tempObj
}
If I need to find something in it, I then have to:
$matchingObjs = @()
foreach ($obj in $objectArray){
if($obj.thing -eq value){$matchingObjs += $obj}
}
This really sucks and I know there has to be a more elegant way. I'm still fairly new to powershell so I don't know what utilities it has to help me. I'm using v5.
With PowerShell 3.0 or later you can use [PSCustomObject]; here's an article on the different object creation methods.
Also setting the array equal to the output of the foreach loop will be more efficient than repeatedly recreating an array with +=.
$objectArray = foreach ($item in $collection) {
[pscustomobject]@{
"membername" = "xxxxx"
}
}
The Where-Object cmdlet or the .where() method looks like what you need in your second loop.
$matchingObjs = $objectArray | Where-Object {$_.thing -eq "value"}
It also sounds like you could use Where-Object/.where() to filter the initial data and just create an object which matches what you are looking for. For example:
$matchingObjs = $InputData |
Where-Object {$_.thing -eq "value"} |
ForEach-Object {
[pscustomobject]@{
"membername" = "xxxxx"
}
}
If your data can be expressed as key/value pairs, then a hashtable will be the most efficient; see about_Hash_Tables for more info.
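For the "find every entry with a given schema" case from the question, one way to combine both ideas is to keep the custom objects and build a hashtable index over them with Group-Object -AsHashTable. This is a sketch with invented values; the property names OldUuid, NewUuid and Schema are assumptions based on the question's description.

```powershell
# Hypothetical migration records
$entries = @(
    [pscustomobject]@{ OldUuid = 'a1'; NewUuid = 'b1'; Schema = 'sales' }
    [pscustomobject]@{ OldUuid = 'a2'; NewUuid = 'b2'; Schema = 'hr' }
    [pscustomobject]@{ OldUuid = 'a3'; NewUuid = 'b3'; Schema = 'sales' }
)

# Build a lookup keyed by schema name; -AsString keeps the keys plain strings
$bySchema = $entries | Group-Object -Property Schema -AsHashTable -AsString

# All entries that used a particular schema
$sales = @($bySchema['sales'])

# The index holds references, so edits show up in $entries as well
$bySchema['hr'][0].NewUuid = 'changed'
$entries[1].NewUuid
```

The index is built once, so it pays off when many lookups follow; for a single query, Where-Object is simpler.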
There is no built-in way to do what you are asking. One way is to segment your data into separate hashtables so you can do easy lookups by a common key, say the ID.
# Create a hashtable for the IDs
$ids = @{};
foreach(thing to process){
$ids.Add($uid, 'Value')
}
# Check whether $uid exists
$keyExists = $ids.ContainsKey($uid)
# Get the value stored for $uid
$keyValue = $ids[$uid]
As a side note, you don't have to create a System.Object; you can simply do this:
$objectArray = @();
gci | % {
$objectArray += @{
'Key1' = 'Value 1'
'Key2' = 'Value 2'
}
}
If you need to compare complex objects, you can build them with @{} and then use Compare-Object on the two objects; just another idea.
For example, this will get a file listing of two different directories, and tell me what file exists or doesn't exist between the two directories:
$packages = (gci $boxStarterRepo -Recurse *.nuspec | Select-Object -ExpandProperty Name) -replace '\.nuspec$', ''
$packages += (gci $boxStarterPrivateRepo -Recurse *.nuspec | Select-Object -ExpandProperty Name) -replace '\.nuspec$', ''
$packages = $packages | Sort-Object
Compare-Object $packages $done

Using Powershell to compare two files and then output only the different string names

So I am a complete beginner at Powershell but need to write a script that will take a file, compare it against another file, and tell me what strings are different in the first compared to the second. I have had a go at this but I am struggling with the outputs as my script will currently only tell me on which line things are different, but it also seems to count lines that are empty too.
To give some context for what I am trying to achieve, I would like to have a static file of known good Windows processes ($Authorized) and I want my script to pull a list of current running processes, filter by the process name column so to just pull the process name strings, then match anything over 1 character, sort the file by unique values and then compare it against $Authorized, plus finally either outputting the different process strings found in $Processes (to the ISE Output Pane) or just to output the different process names to a file.
I have spent today attempting the following in Powershell ISE and also Googling around to try and find solutions. I heard 'fc' is a better choice instead of Compare-Object but I could not get that to work. I have thus far managed to get it to work but the final part where it compares the two files it seems to compare line by line, for which would always give me false positives as the line position of the process names in the file supplied would change, furthermore I only want to see the changed process names, and not the line numbers which it is reporting ("The process at line 34 is an outlier" is what currently gets outputted).
I hope this makes sense, and any help on this would be very much appreciated.
Get-Process | Format-Table -Wrap -Autosize -Property ProcessName | Out-File c:\users\me\Desktop\Processes.txt
$Processes = 'c:\Users\me\Desktop\Processes.txt'
$Output_file = 'c:\Users\me\Desktop\Extracted.txt'
$Sorted = 'c:\Users\me\Desktop\Sorted.txt'
$Authorized = 'c:\Users\me\Desktop\Authorized.txt'
$regex = '.{1,}'
select-string -Path $Processes -Pattern $regex |% { $_.Matches } |% { $_.Value } > $Output_file
Get-Content $Output_file | Sort-Object -Unique > $Sorted
$dif = Compare-Object -ReferenceObject $(Get-Content $Sorted) -DifferenceObject $(get-content $Authorized) -IncludeEqual
$lineNumber = 1
foreach ($difference in $dif)
{
if ($difference.SideIndicator -ne "==")
{
Write-Output "The Process at Line $linenumber is an Outlier"
}
$lineNumber ++
}
Remove-Item c:\Users\me\Desktop\Processes.txt
Remove-Item c:\Users\me\Desktop\Extracted.txt
Write-Output "The Results are Stored in $Sorted"
From the length and complexity of your script, I feel like I'm missing something, but your description seems clear.
Running process names:
$ProcessNames = @(Get-Process | Select-Object -ExpandProperty Name)
.. which aren't blank: $ProcessNames = $ProcessNames | Where-Object {$_ -ne ''}
List of authorised names from a file:
$AuthorizedNames = Get-Content 'c:\Users\me\Desktop\Authorized.txt'
Compare:
$UnAuthorizedNames = $ProcessNames | Where-Object { $_ -notin $AuthorizedNames }
optional output to file:
$UnAuthorizedNames | Set-Content out.txt
or in the shell:
@(gps).Name -ne '' |? { $_ -notin (gc authorized.txt) } | sc out.txt
1 2   3     4       5      6       7                      8
1. @() forces something to be an array, even if it only returns one thing
2. gps is a default alias of Get-Process
3. using .Property on an array takes that property value from every item in the array
4. using an operator on an array filters the array by whether the items pass the test
5. ? is an alias of Where-Object
6. -notin tests if one item is not in a collection
7. gc is an alias of Get-Content
8. sc is an alias of Set-Content
You should use Set-Content instead of Out-File and > because it handles character encoding nicely, and they don't. And because Get-Content/Set-Content sounds like a memorable matched pair, and Get-Content/Out-File doesn't.
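Since the question started from Compare-Object, here is a sketch of how it can return names rather than line positions. The process names are invented sample data; in the real script they would come from Get-Process and the authorized file. Compare-Object matches items regardless of position, and a SideIndicator of `=>` marks items found only in the -DifferenceObject input.

```powershell
# Hypothetical sample data
$running = 'svchost', 'explorer', 'evil', 'svchost'
$authorized = 'svchost', 'explorer', 'lsass'

# Names that are running but not on the authorized list
$unauthorized = Compare-Object -ReferenceObject $authorized -DifferenceObject ($running | Sort-Object -Unique) |
    Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object { $_.InputObject }

$unauthorized
```

Authorized names that are not currently running (lsass in this sample) come back with `<=` and are filtered out here.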