Guidance with developing PowerShell script - powershell

I am not an advanced scripter by any means, but I have a task which I need to accomplish for work. The task is to create a script which looks at two pieces of information (date and capacity utilized in bytes) from each report file that is contained in a directory. These two pieces of information are located in the same place in each report. Then, using the date value, the script can report which was the highest capacity utilized value for each month. I am thinking of having the final output be in HTML format.
There are two options for acquiring the date value. The report contains the date in the format mm/dd/yyyy in the 3rd line of text and the time is included in the file name as the Epoch time.
So far, I have put together a PowerShell script that parses the date and the capacity utilized from the body of the report. This information is then added to an array.
I am looking for guidance on which date value would be better to use (Epoch time from file name or date from body of report) and what method would be best to utilize for looking at the data for each month and reporting the highest capacity utilization per month.
Here is my script so far:
#Construct an array to use for data export
$fileDirectory = "c:\Temp"
$Array1 = @()
foreach ($file in Get-ChildItem $fileDirectory)
{
#Obtain path to each file in directory
$filePath = $fileDirectory + "\" + $file
#Get content of each file during the loop
$data = Get-Content $filePath
#Create object to enter data into Array1
$myobj = "" | Select "Date","Capacity"
$dateStr = ($data[2].Split(" "))[3]
[long]$capacityStr = ($data[19].Split(","))[2]
[single]$CapacityConv = $capacityStr
$capacityConv = ($capacityConv /= 1099511627776)
#Fill the object myobj
$myobj.date = $dateStr
$myobj.capacity = $capacityConv
#Add the object to Array1
$Array1 += $myobj
#Wipe the object
$myobj = $null
}
#After the loop, export the array to CSV file
$Array1 | export-csv "c:\Scripts\test-output.csv"
$Array1
pause

For the date, it's really up to you. If they're equally accurate then it's a matter of opinion.
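If you go with the file name, the Epoch value has to be converted to a date first. Here is a minimal sketch, assuming a hypothetical file name such as report_1577836800.txt that carries a 10-digit Unix timestamp in seconds, and .NET Framework 4.6 or later (needed for FromUnixTimeSeconds). The file name pattern is an assumption, so adjust the regex to your real names:

```powershell
# Hypothetical file name; the real naming scheme is not shown in the question
$fileName = 'report_1577836800.txt'

# Pull the first run of 10 digits out of the name and treat it as Unix seconds
if ($fileName -match '(\d{10})') {
    $epochSeconds = [long]$Matches[1]
    $reportDateUtc = [DateTimeOffset]::FromUnixTimeSeconds($epochSeconds).UtcDateTime
}

$reportDateUtc
```

Note that the result is in UTC; a report written late in the evening local time could land on a different day (or month) than the date printed in the report body.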
For the capacity, I'm assuming these are daily reports given the date format, and you want the highest capacity for a given month.
Since you're creating an object containing a Date and a Capacity property for each report, using the returned array of all those values, you should be able to get the information you need like this:
$Array1 | Group-Object {([DateTime]$_.Date).ToString('MMMM')} | Select-Object Name,@{Name='MaxCap';Expression={ $_.Group.Capacity | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum }}
Now this is kind of a lot so let's break it down.
Group-Object groups your array based on a property. In this case, we want to group by month, but you don't have a month property, so instead of a property name we're passing in a script block to calculate the property on the fly:
([DateTime]$_.Date).ToString('MMMM')
This casts your Date property (which is a [String]) into a [DateTime] object. Then we use .ToString('MMMM') to format it into a month name.
The result will be an array of group objects, where the Name property is the name of the group (in this case it will be the month name) and the Group property will contain all of the original objects that belonged to that group.
Piping this into Select-Object, we want the Name (the month), and then we're creating a new property on the fly named MaxCap:
$_.Group.Capacity | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
So we take the current Group (the array of all the objects for that month), then expand its Capacity property so now we have an array of all the capacities for the group.
Pipe that into Measure-Object -Maximum to get the max value, then Select-Object -ExpandProperty Maximum (because Measure-Object returns an object with a Maximum property and we just want the value).
The end result is an object with the month and the maximum capacity for that month.
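One thing to watch: grouping on the month name alone puts January 2020 and January 2021 into the same group. If your reports span more than one year, a variation that groups on year and month may be closer to what you want. This is only a sketch with made-up sample values standing in for $Array1; it also shows ConvertTo-Html, since you mentioned HTML output:

```powershell
# Hypothetical sample data standing in for $Array1 (the values are made up)
$Array1 = @(
    [pscustomobject]@{ Date = '01/05/2020'; Capacity = 1.2 }
    [pscustomobject]@{ Date = '01/20/2020'; Capacity = 1.5 }
    [pscustomobject]@{ Date = '02/03/2020'; Capacity = 1.1 }
    [pscustomobject]@{ Date = '01/10/2021'; Capacity = 2.0 }
)

$invariant = [cultureinfo]::InvariantCulture

# Group on a 'yyyy-MM' key so the same month in different years stays separate
$monthlyMax = $Array1 |
    Group-Object { ([DateTime]$_.Date).ToString('yyyy-MM', $invariant) } |
    Select-Object Name, @{
        Name       = 'MaxCap'
        Expression = { ($_.Group | Measure-Object -Property Capacity -Maximum).Maximum }
    }

# Render the result as an HTML page (an array of strings you can write to a file)
$html = $monthlyMax | ConvertTo-Html -Property Name, MaxCap -Title 'Peak capacity per month'
```

To save the page you could pipe $html to Set-Content with a path of your choice.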

Related

Getting unique elements from a column returns one value

I have the following function defined in a ps1 file, using the DLL from the latest taglib release. I downloaded the nuget package and ran expand-archive on it and copied the DLL to the correct place.
[System.Reflection.Assembly]::LoadFrom((Resolve-Path ($PSScriptRoot + "\TagLibSharp.dll")))
function Get-Image {
[CmdletBinding()]
param (
[Parameter(ValueFromPipelineByPropertyName)] $Name
)
process {
$fullPath = (Resolve-Path $Name).Path
$image = [taglib.file]::create($fullpath)
return $image
}
}
function Get-ImmediatePhotos {
Get-ChildItem | Where-Object {$_.Extension -eq ".jpg" -or $_.Extension -eq ".png"} | Get-Image
}
When I run this command to extract the years from the EXIF data in photos in a given directory I get a table like this:
> $years = Get-ImmediatePhotos | select-object {$_.Tag.DateTime.Year}
> $years
$_.Tag.DateTime.Year
--------------------
2020
2020
2020
2020
2020
2020
2020
2020
2020
2020
2021
2021
2021
2021
If I then try to extract unique years with sort-object I only get one year!
> $years | sort-object -unique
$_.Tag.DateTime.Year
--------------------
2020
If I try to group years with group-object I get this error:
$years | group-object
Group-Object: Cannot compare "@{$_.Tag.DateTime.Year=2020}" to
"@{$_.Tag.DateTime.Year=2020}" because the objects are not the same type or the object
"@{$_.Tag.DateTime.Year=2020}" does not implement "IComparable".
This seems to be telling me that the type of values in the column are some sort of anonymous thing which can't be compared.
How do I use the results of select as values, such as strings? My end goal is to automatically sort and categorize photos into directories in /year/month format.
Give your calculated property a proper name so that you can reference it later when using Sort-Object or Group-Object:
$years = Get-ImmediatePhotos | select-object @{Name='YearTaken';Expression={$_.Tag.DateTime.Year}}
$years |Sort-Object YearTaken -Unique
# or
$years |Group-Object YearTaken
Alternatively, use ForEach-Object instead of Select-Object. ForEach-Object will spit out the raw Year values (as opposed to an object having the value stored in a nested property, which is what Select-Object gives you):
$years = Get-ImmediatePhotos | ForEach-Object {$_.Tag.DateTime.Year}
# `$years` is just an array of raw Year values now, no need to specify a key property
$years |Sort-Object -Unique
# or
$years |Group-Object
To complement Mathias R. Jessen's helpful answer with why what you tried didn't work:
The immediate problem is that Sort-Object -Unique considers all [pscustomobject] instances to be equal, even if they have different property values, and even if they have different sets of properties.
Note that your Select-Object call, due to use of the (positionally implied) -Property parameter, does not extract just the years from its input objects, but creates a [pscustomobject] wrapper object with a property containing the year.
Normally, -ExpandProperty is used to extract just the values of a single property, but with a calculated property, such as in your case, this doesn't work. However, passing the exact same script block to ForEach-Object does work.
Therefore, to make your original code work, you need to pass the name of the properties whose values should be sorted and used as the basis for the uniqueness determination:
$years | sort-object -unique -property '$_.Tag.DateTime.Year'
Note the strange property name, which results from your Select-Object call having used just a script block ({ ... }) as a calculated property, in which case the script block's literal source code becomes the property name (the result of calling .ToString() on the script block).
Typically, a hashtable is used to define a calculated property, which allows naming the property, as shown in Mathias' answer.
As an aside: If you want to merely sort by year while passing the original image objects through, you can use your script block as-is directly with Sort-Object's (positionally implied) -Property parameter:
# Sort by year, but output the image objects.
Get-ImmediatePhotos | Sort-Object {$_.Tag.DateTime.Year}
Note that in this use of a calculated property, direct use of a script block is appropriate and usually sufficient, as the property is purely used for sorting, and doesn't require a name.
Sort-Object also supports hashtable-based calculated properties. Such hashtables do not support a name / label entry (because it would be meaningless), but you can use either ascending or descending as a Boolean entry to control the sort order on a per-property basis.
That is, the following is the verbose equivalent of the command above:
Get-ImmediatePhotos | Sort-Object @{ ascending=$true; expression={$_.Tag.DateTime.Year} }
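To see the naming behavior without TagLib, here is a small sketch that uses stand-in objects with a plain Year property (an assumption for illustration; the real input would be the image objects). It shows the property name that a bare script block produces, sorting on that name, and the hashtable form with a descending entry:

```powershell
# Stand-in objects; the real input would be the TagLib image objects
$photos = 2021, 2020, 2020 | ForEach-Object { [pscustomobject]@{ Year = $_ } }

# A bare script block creates a property named after the block's source text
$wrapped = $photos | Select-Object {$_.Year}
$propertyName = $wrapped[0].psobject.Properties.Name

# Passing that name to -Property makes -Unique compare the year values
$uniqueYears = $wrapped | Sort-Object -Property $propertyName -Unique

# Hashtable form with Sort-Object: no name entry, but a per-property direction
$newestFirst = $photos | Sort-Object @{ Expression = {$_.Year}; Descending = $true }
```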

Sum various columns to get subtotal depending on a criteria from a row using Powershell

I have a CSV file that contains the following data:
Pages,Pages BN,Pages Color,Customer
145,117,28,Report_Alexis
46,31,15,Report_Alexis
75,27,48,Report_Alexis
145,117,28,Report_Jack
46,31,15,Report_Jack
75,27,48,Report_Jack
145,117,28,Report_Amy
46,31,15,Report_Amy
75,27,48,Report_Amy
So what I need to do is sum each column based on the report name and then export the result to another CSV file like this:
Pages,Pages BN,Pages Color,Customer
266,175,91,Report_Alexis
266,175,91,Report_Jack
266,175,91,Report_Amy
How can I do this?
I tried with this:
$Countpages = Import-Csv "C:\temp\testcount\final file2.csv" |where {$_.Filename -eq 'Report_Jack'} | Measure-Object -Property Pages -Sum
then
$Countpages.Sum | Set-Content -Path "C:\temp\testcount\final file3.csv"
But this handles just one report, and I don't know how to continue from there.
Can you please help me?
Working code
$IdentityColumns = @('Customer')
$ColumnsToSum = @('Pages', 'Pages BN', 'Pages Color')
$CSVFileInput = 'S:\SCRIPTS\1.csv'
Import-Csv -Path $CSVFileInput |
    Group-Object -Property $IdentityColumns |
    ForEach-Object {
        # Result hashtable (key-value collection); the sums are added on the next lines
        $resultHT = @{ Customer = $_.Name }
        # Calculate the sum of every column in $ColumnsToSum in one call
        @($_.Group | Measure-Object -Property $ColumnsToSum -Sum) |
            ForEach-Object { $resultHT[$_.Property] = $_.Sum } # Store each sum under its column name
        return [PSCustomObject]$resultHT # Convert the hashtable to an object for output
    } | # End of ForEach-Object over the groups
    Select-Object @($ColumnsToSum + $IdentityColumns) | # Sets the column order, which may matter
    Out-GridView # Or replace with Export-Csv
#Export-Csv ...
Explanation:
Use Group-Object to create a collection of groups. Each group has 4 properties:
Name - Name of the group, equal to the stringified values of the property (or properties) you're grouping by
Values - Collection of values of the properties you're grouping by (not stringified)
Count - Count of elements grouped into this group
Group - The elements grouped into this group
When grouping by a single string property (as in this case), you can simply use the group's Name; otherwise, always use Values.
So after Group-Object, you iterate not over the collection of CSV rows, but over a collection of collections of rows, grouped by some condition.
Measure-Object can process more than one property in a single pass (without mixing values from different properties), and we make use of that here. The result is an array of objects, each with a Property attribute holding the name of a property passed to Measure-Object and the calculated value (Sum in our case). We move those Property=Sum pairs into the hashtable.
[PSCustomObject] converts hashtable to object. Objects are always better for output.
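To try the idea without touching any files, here is a self-contained sketch that feeds the sample rows from the question through ConvertFrom-Csv. It measures one column at a time and uses an [ordered] hashtable (PSv3+) so the column order is fixed without a separate Select-Object step. This is a variation for illustration, not a replacement for the code above:

```powershell
# The sample rows from the question, inlined so the example needs no file
$csvText = @'
Pages,Pages BN,Pages Color,Customer
145,117,28,Report_Alexis
46,31,15,Report_Alexis
75,27,48,Report_Alexis
145,117,28,Report_Jack
46,31,15,Report_Jack
75,27,48,Report_Jack
145,117,28,Report_Amy
46,31,15,Report_Amy
75,27,48,Report_Amy
'@

$columnsToSum = 'Pages', 'Pages BN', 'Pages Color'

$totals = $csvText | ConvertFrom-Csv | Group-Object Customer | ForEach-Object {
    $group = $_
    # An ordered hashtable keeps the columns in insertion order
    $row = [ordered]@{}
    foreach ($column in $columnsToSum) {
        $row[$column] = ($group.Group | Measure-Object -Property $column -Sum).Sum
    }
    $row['Customer'] = $group.Name
    [pscustomobject]$row
}

$totals
```

Piping $totals to Export-Csv with -NoTypeInformation would then write the summary file.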

Powershell Group-Object - filter using multiple objects

I have a CSV of devices that are missing security updates along with the date the update was released and kb number.
devicename,date,kb
Desktop1,9/12/17,KB4011055
Desktop1,9/12/17,KB4038866
Desktop2,9/12/17,KB4011055
Desktop2,6/13/17,KB3203467
I am trying to compile a list of devices that are missing updates that have been released in the past 30 days but exclude devices that are also missing older updates. So in the example above, the only device I want is Desktop 1.
I know I could do something like this to see devices that are under that 30 day window but that would still include devices that have other entries which are greater than 30 days.
$AllDevices | Where-Object {[datetime]$_.date -gt ((get-date).adddays(-30))}
I was thinking I could use Group-Object devicename to group all the devices together but I'm not sure how to check the dates from there.
Any ideas?
The assumption is that $AllDevices was assigned the output from something like
Import-Csv c:\path\to\some.csv and that PSv3+ is used.
$AllDevices | Group-Object devicename | Where-Object {
    -not ([datetime[]] $_.Group.date -le (Get-Date).AddDays(-30))
} | Select-Object @{ l='devicename'; e='Name' }, @{ l='kbs'; e={ $_.Group.kb } }
With the sample input, this yields:
devicename kbs
---------- ---
Desktop1 {KB4011055, KB4038866}
Explanation:
Group-Object devicename groups all input objects by device name, which outputs a collection of [Microsoft.PowerShell.Commands.GroupInfo] instances each representing all input objects sharing a given device name (e.g., Desktop1) - see Get-Help Group-Object.
The Where-Object call is then used to weed out groups that contain objects whose date is older than 30 days.
[datetime[]] $_.Group.date creates an array of date-time objects [datetime[]] from the date-time strings (.date) of every member of the group $_.Group.
Note that $_.Group is the collection of input objects that make up the group, and even though .date is applied directly to $_.Group, the .date property is accessed on each collection member and the results are collected in an array - this handy shortcut syntax is called member-access enumeration and was introduced in PSv3.
-le (Get-Date).AddDays(-30) filters that array to only return members whose dates are older than 30 days; note that -le applied to an array-valued LHS returns a filtered subarray, not a Boolean.
-not negates the result of the -le comparison, which forces interpretation of the filtered array as a Boolean, which evaluates to $False if the array is empty, and $True otherwise; in other words: if one or more group members have dates older than 30 days, the -le comparison evaluates to $True as a Boolean, which -not negates.
This results in groups (and therefore devices) containing at least 1 older-than-30-days date getting removed from further pipeline processing.
Select-Object then receives only those group objects whose members all have dates that fall within the last 30 days, and uses calculated properties (via hashtable literals (@{...}) with standardized entries) to construct the output objects:
A group object's .Name property contains the value of the grouping property/ies passed to Group-Object, which in this case is the input objects' devicename property; @{ l='devicename'; e='Name' } simply renames the .Name property back to devicename.
@{ l='kbs'; e={ $_.Group.kb } } then constructs a kbs property that contains the array of kb values from the members of each group, retrieved by member-access enumeration via a script block { ... }
Note that Select-Object outputs [pscustomobject] instances containing only the explicitly defined properties; in this case, devicename and kbs.
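The array-filtering behavior of -le and its coercion to a Boolean can be checked in isolation. The following sketch uses dates computed relative to today (made-up values, not the CSV from the question) so that it keeps working whenever it is run:

```powershell
$cutoff = (Get-Date).AddDays(-30)

# Two hypothetical groups of release dates
$recentOnly = [datetime[]] @((Get-Date).AddDays(-5), (Get-Date).AddDays(-10))
$mixed      = [datetime[]] @((Get-Date).AddDays(-5), (Get-Date).AddDays(-90))

# With an array on the left, -le returns the matching elements, not a Boolean
$olderInMixed = @($mixed -le $cutoff)

# -not coerces the filtered array: empty means $false, which -not turns into $true
$keepRecentOnly = -not ($recentOnly -le $cutoff)
$keepMixed      = -not ($mixed -le $cutoff)
```

Here $keepRecentOnly is $true (nothing is older than the cutoff) and $keepMixed is $false (one date is older), which is exactly the test the Where-Object script block applies per group.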
I propose another solution:
import-csv "C:\temp\test.csv" |
    select *, @{N="Date";E={[DateTime]$_.Date}} -ExcludeProperty "Date" |
    group devicename |
    %{
        # Keep the device only if none of its missing updates is older than 30 days
        if (@($_.Group | where Date -le (Get-Date).AddDays(-30)).Count -eq 0)
        {
            $LastUpdate=$_.Group | sort Date, kb -Descending | select -First 1
            [pscustomobject]@{
                DeviceName=$LastUpdate.DeviceName
                DateLastUpdate=$LastUpdate.Date
                LastUpdate=$LastUpdate.Kb
                UpdateList=$_.Group.Kb -join ', '
                Group=$_.Group
            }
        }
    }

PowerShell ForEach removes leading zeros

I am kind of new with PowerShell and programming in general, so I hope you have some patience while reading this. Before I explain my problem, I feel like I have to first tell you some background information:
I have all my transactions saved in $Transactions. Each transaction has Receiver, Date and Amount.
I have grouped the yearly transactions into $TransactionsPerYear the following way:
$TransactionsPerYear = $Transactions | Group-Object { [int]($_.date -replace '.*\.') }
(Btw. Could someone explain the regex in the end for me, what each character does?)
Next thing I am doing is grouping yearly income and expenses into separate variables. After this I am trying to extract the months from each year and save them into $Months. The date is in the following format dd.MM.yyyy
Question 1:
Here's how I can get all the dates, but how do I extract just the months?
$TransactionsPerYear | Select -ExpandProperty Group | Select -ExpandProperty date | Select -Unique
Question 2:
Because I don't know how to extract the months, I've tried it the following way:
[String[]]$Months = "01","02","03","04","05","06","07","08","09","10","11","12"
When I have each month in $Months I am trying to get monthly transactions and save them into new variables:
ForEach($Month in $Months){
New-Variable -Name "Transactions_$Month$Year" -Value ($Transactions | Where {$_.Date -like "*.$Month.$Year"} | Group-Object 'Receiver' | Select-Object Count, Name, @{L="Total";E={$_ | Select -ExpandProperty Group | Measure-Object Amount -Sum | Select -ExpandProperty Sum}} | Sort-Object {[double]$_.Total})
}
The problem that I am facing here is that ForEach removes the leading zero from each month, and when this happens, this part in ForEach doesn't match with anything, and the new variable is null:
Where {$_.Date -like "*.$Month.$Year"}
Let me know if you need more info. I'd be really thankful if anyone could help me.
The date looks like: 25.02.2016
From your post, it looks like you've jumped further down the rabbit hole than necessary.
Instead of trying to do string manipulation every time you need to interact with the Date property, simply turn it into a DateTime object!
$Transactions = $Transactions |Select-Object *,@{Name='DateParsed';Expression={[datetime]::ParseExact($_.Date, 'dd.MM.yyyy', $null)}}
The DateTime.ParseExact() method allows us to specify the format (eg. dd.MM.yyyy), and parse a string representation of a date.
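As a quick check with the sample date from the question, here is a short sketch. It passes the invariant culture explicitly instead of $null (a small assumption on my part, to keep the result independent of the machine's regional settings):

```powershell
$parsed = [datetime]::ParseExact('25.02.2016', 'dd.MM.yyyy', [cultureinfo]::InvariantCulture)

$monthNumber = $parsed.Month           # integer month, no string handling needed
$monthText   = $parsed.ToString('MM')  # two-digit text form, keeps the leading zero
$yearNumber  = $parsed.Year
```

If you ever need the "02" form again (for a file name, say), format the DateTime with 'MM' rather than carrying the month around as a string.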
Now you can group on year simply by:
$TransactionsPerYear = $Transactions |Group-Object { $_.DateParsed.Year }
To group by both Year and then Month, I'd create a nested hashtable, like so:
# Create a hashtable, containing one key per year
$MonthlyTransactions = @{}
foreach($Year in $Transactions |Group {$_.DateParsed.Year})
{
    # Create another hashtable, containing a key for each month in that year
    $MonthlyTransactions[$Year.Name] = @{}
    foreach($Month in $Year.Group |Group {$_.DateParsed.Month})
    {
        # Add the transactions to the Monthly hashtable
        $MonthlyTransactions[$Year.Name][$Month.Name] = $Month.Group
    }
}
Now you can calculate the transaction value for a specific month. Note that the keys are strings, because Group-Object exposes each group's Name as a string:
$TotalValueMay2010 = ($MonthlyTransactions['2010']['5'] |Measure-Object Amount -Sum).Sum
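Here is the whole approach as a runnable sketch with three made-up transactions (the receivers and amounts are hypothetical). It also shows that an integer key does not find the entry, since the Name of a group is a string:

```powershell
$invariant = [cultureinfo]::InvariantCulture

# Hypothetical sample transactions in the dd.MM.yyyy format from the question
$Transactions = @(
    [pscustomobject]@{ Receiver = 'Shop'; Date = '25.02.2016'; Amount = 10.5 }
    [pscustomobject]@{ Receiver = 'Cafe'; Date = '03.02.2016'; Amount = 4.5 }
    [pscustomobject]@{ Receiver = 'Shop'; Date = '14.05.2017'; Amount = 20 }
) | Select-Object *, @{ Name = 'DateParsed'; Expression = { [datetime]::ParseExact($_.Date, 'dd.MM.yyyy', $invariant) } }

# Year -> month -> transactions
$MonthlyTransactions = @{}
foreach ($Year in $Transactions | Group-Object { $_.DateParsed.Year }) {
    $MonthlyTransactions[$Year.Name] = @{}
    foreach ($Month in $Year.Group | Group-Object { $_.DateParsed.Month }) {
        $MonthlyTransactions[$Year.Name][$Month.Name] = $Month.Group
    }
}

# Group names are strings, so look entries up with string keys
$totalFeb2016 = ($MonthlyTransactions['2016']['2'] | Measure-Object Amount -Sum).Sum
$hasIntegerKey = $MonthlyTransactions.ContainsKey(2016)
```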
(Btw. Could someone explain the regex in the end for me, what each character does?)
Sure:
. # match any character
* # zero or more times
\. # match a literal . (dot)
Taking your own example input string 25.02.2016, the first part (.*) will match 25.02, and \. will match the . right after it, so the only thing left is 2016.
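You can watch this happen by using -match to capture what the pattern consumes before -replace strips it away (a small sketch using the same sample date):

```powershell
$date = '25.02.2016'

# -match fills $Matches; index 0 is the full text the pattern matched
$null = $date -match '.*\.'
$consumed = $Matches[0]

# -replace with no replacement string removes the matched text
$year = [int]($date -replace '.*\.')
```

Because .* is greedy, it runs up to the last dot in the string, so $consumed is '25.02.' and only the year remains.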
Do you mean this?
$dates = ([DateTime] "1/1/2016"),([DateTime] "1/2/2016"),
([DateTime] "2/1/2016"),([DateTime] "3/1/2016")
$uniqueMonths = $dates | ForEach-Object { $_.Month } | Sort-Object -Unique
# $uniqueMonths contains 1,2,3

How can I extract the latest rows from a log file based on latest date using Powershell

I'm a relatively new Powershell user, and have what I thought was a simple question. I have spent a bit of time looking for similar scenarios and surprisingly haven't found any. I would post my failed attempts, but I can't even get close!
I have a log file with repetitive data, and I want to extract the latest event for each "unique" entry. The problem lies in the fact that each entry is unique due to the individual date stamp. The "unique" criteria is in Column 1.
Example:
AE0440,1,2,3,30/08/2012,12:00:01,XXX
AE0441,1,2,4,30/08/2012,12:02:01,XXX
AE0442,1,2,4,30/08/2012,12:03:01,XXX
AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0441,1,2,4,30/08/2012,12:06:01,XXX
AE0442,1,2,4,30/08/2012,12:08:01,XXX
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ
Therefore the output I want would be (order not relevant):
AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0442,1,2,4,30/08/2012,12:08:01,XXX
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ
How can I get this data/discard old data?
Try this; it may look a bit cryptic to a first-time user. It reads the content of the file and groups the lines by the unique value (giving 3 groups). Each group is then sorted in descending order by its parsed date and time (again obtained by splitting the line), and the first line, i.e. the latest one, is returned.
Get-Content .\log.txt | Group-Object { $_.Split(',')[0] } | ForEach-Object {
$_.Group | Sort-Object -Descending { [DateTime]::ParseExact(($_.Split(',')[-3,-2] -join ' '),'dd/MM/yyyy HH:mm:ss',$null) } | Select-Object -First 1
}
AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ
AE0442,1,2,4,30/08/2012,12:08:01,XXX
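If you want to experiment before pointing this at the real log, the same pipeline can be run against the sample lines held in an array. In this sketch the only changes are the in-memory input and an explicit invariant culture in place of $null:

```powershell
# The sample lines from the question, inlined in place of Get-Content .\log.txt
$lines = @(
    'AE0440,1,2,3,30/08/2012,12:00:01,XXX'
    'AE0441,1,2,4,30/08/2012,12:02:01,XXX'
    'AE0442,1,2,4,30/08/2012,12:03:01,XXX'
    'AE0440,1,2,4,30/08/2012,12:04:01,YYY'
    'AE0441,1,2,4,30/08/2012,12:06:01,XXX'
    'AE0442,1,2,4,30/08/2012,12:08:01,XXX'
    'AE0441,1,2,5,30/08/2012,12:10:01,ZZZ'
)

$invariant = [cultureinfo]::InvariantCulture

$latest = $lines | Group-Object { $_.Split(',')[0] } | ForEach-Object {
    # Within each group, sort newest first and keep only the top line
    $_.Group |
        Sort-Object -Descending { [DateTime]::ParseExact(($_.Split(',')[-3,-2] -join ' '), 'dd/MM/yyyy HH:mm:ss', $invariant) } |
        Select-Object -First 1
}

$latest
```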
Assuming your data looks exactly like your example:
# you can give more meaningful names to the columns if you want. just make sure the number of columns matches
$data = import-csv .\data.txt -Header Col1,Col2,Col3,Col4,Col5,Col6,Col7
# sort all data by the timestamp, then group by the label in column 1
$grouped = $data | sort {[DateTime]::ParseExact("$($_.Col6) $($_.Col5)", 'HH:mm:ss dd/MM/yyyy', $Null)} -Desc | group Col1
# read off the first element of each group (element with latest timestamp)
$grouped |%{ $_.Group[0] }
This also assumes your timestamps use a 24-hour clock, i.e. that all of your sample data is close to 12 noon, not 12 midnight. One second after midnight would need to be written as '00:00:01'.