Analyzing a CSV in Powershell - powershell

I am very new to PowerShell (about 1 day, to be precise) and I am having what I assume are syntax issues with some variables. I have a CSV file (converted from an Excel xlsx) with around 21 columns and 74,000 rows. The four columns of interest to me hold an employee's start date, a termination date, a department name, and the vice president they report to. I am trying to write a script that will return all employees who have reached their start date, have not been terminated, work in a department that contains 'HR' in the name, and report to a specific VP. I will elaborate on my specific issues after the block of code.
$Lawson = Import-Csv .\Documents\Lawson_HR.csv
PS C:\Users\louiez> $startDate = $Lawson | where {$_.'LAW HIRE DATE' -le (Get-Date -format M-DD-YYYY)}
PS C:\Users\louiez> $endDate = $startDate | where {$_.'LAW TERM DATE' -eq ''}
PS C:\Users\louiez> $HR = $endDate | where {$_.'LAW DEPT NAME' -contains 'HR'}
PS C:\Users\louiez> $VP = $endDate | where {$_.'VICE PRESIDENT' -contains 'Croner'}
PS C:\Users\louiez> $startdate | Measure-Object
Count : 51641
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\Users\louiez> $enddate | Measure-Object
Count : 19428
Average :
Sum :
Maximum :
Minimum :
Property :
PS C:\Users\louiez> $HR | Measure-Object
Count : 0
Average :
Sum :
Maximum :
Minimum :
Property :
First, the startDate variable does not count the correct number of items. I would like it to count all rows in which the employee hire date is before today's date. The code in its current form returns about 51k items; it should be around 73k. (The endDate variable functions as it should.)
Second, the HR variable returns 0 items; it should be several hundred. I would like it to search the Dept Name field in each row for any instance of the letters 'HR'. Similarly, I would like the VP variable to return all items in which the Vice President column has a given name (in this case, Croner).
As I said, I am incredibly new to PowerShell and have very limited programming experience, so I am not sure what in the syntax is causing these errors.

There are a couple of flaws in your design. The easy one first:
$_.'LAW DEPT NAME' -contains 'HR'
$_.'VICE PRESIDENT' -contains 'Croner'
-contains is unintuitive: it does not match text content, it tests whether a collection of multiple items contains a given item. See Get-Help about_Comparison_Operators for details. Use -match instead.
$_.'LAW DEPT NAME' -match 'HR'
$_.'VICE PRESIDENT' -match 'Croner'
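To make the difference concrete, here is a minimal sketch using made-up department strings (the values are illustrative, not taken from the CSV). It also shows one thing to watch for: -match is case-insensitive by default, so 'HR' would match the "hr" inside a name like "Chris"; -cmatch is the case-sensitive variant.

```powershell
# -contains asks: does this collection hold an item equal to the right-hand value?
$containsOnList   = @('HR', 'Finance') -contains 'HR'       # True: 'HR' is one of the items
$containsOnString = 'HR - Career Services' -contains 'HR'   # False: the whole string is not equal to 'HR'

# -match asks: does this text match the regular expression?
$matchOnString = 'HR - Career Services' -match 'HR'         # True: 'HR' appears in the text

# -match ignores case, so it also finds 'hr' inside other words
$looseMatch  = 'Chris' -match 'HR'                          # True
$strictMatch = 'Chris' -cmatch 'HR'                         # False: -cmatch is case-sensitive
```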
The second is more complex:
$_.'LAW HIRE DATE' -le (Get-Date -format M-DD-YYYY)
$_.'LAW HIRE DATE' is probably going to return a string of text, and Get-Date with the -Format parameter also returns a string of text, so -le will compare them in alphabetical order (with adjustments). That is completely unreliable for dates: it says the 1st of Feb comes before the 2nd of Jan because the text starts with 1.
Alphabetical ordering is more workable with a date format like yyyy-MM-dd, but as wOxxOm comments, the right way is to parse the date in the CSV into a [datetime] object and then compare that with the current date as a [datetime] object. This will work more reliably for comparisons (give or take timezone and daylight saving time considerations).
[datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $null) -le (Get-Date)
This assumes that LAW HIRE DATE is always in exactly the format dd-MM-yyyy. Otherwise ParseExact will throw an error and you'll have to adjust the format string to fit your data, or adjust your spreadsheet to fit your code.
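A small sketch of both points, using two made-up dates rather than your data. It passes the invariant culture instead of $null (my choice, to keep the result independent of regional settings) and uses TryParseExact, which returns $false instead of throwing when a value does not fit the format.

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture

# String comparison: day-first text is ordered by its characters, not by date
$stringCompare = '1-02-2016' -le '2-01-2016'    # True, although 1 Feb is after 2 Jan

# DateTime comparison of the same two dates
$feb1 = [datetime]::ParseExact('01-02-2016', 'dd-MM-yyyy', $culture)
$jan2 = [datetime]::ParseExact('02-01-2016', 'dd-MM-yyyy', $culture)
$dateCompare = $feb1 -le $jan2                  # False, as expected

# TryParseExact reports failure through its return value instead of an exception
$parsed = [datetime]::MinValue
$ok = [datetime]::TryParseExact('n/a', 'dd-MM-yyyy', $culture, [Globalization.DateTimeStyles]::None, [ref]$parsed)
# $ok is False here, so the row could be skipped or logged
```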

Related

Powershell | filter array with get-date

I have an array of staff data ($employees), and in that array there is a field called start_date which is stored as a string. I am trying to filter out any staff member whose start_date falls in the current month. Initially I thought I could just write:
$employee | Where-Object {$_.start_date -ge Get-Date()}
Which didn't work.
I then tried to convert the Get-Date to a string with:
$date = Get-Date -f '01/MM/yyyy'
$dateString = $date.toString
$employee | Where-Object {$_.start_date -ge $dateString}
Which also did not give me the results I was after.
My colleague, who is fairly busy with his work, says I might be best off writing a ForEach-Object, but I have always struggled with how to write this, and I don't wish to pull him away from his work to assist me on this task.
Example Array:
preferred_name : John
name_suffixes :
gender : M
date_of_birth : 09/05/1596
surname : Doe
supervisor : PRNMA01
start_date : 28/11/2020
school_email : j.doe@email.com
teacher_code :
termination_date :
employee_photo : @{file_info=}
supervisor_2 :
title : Mr
driver_licence_no :
supplier_code :
given_names : John Jane
Task:
Get the list of new staff and email the list to the department who looks after new staff training.
Any help on this would be good. Please explain in detail, as I am not very experienced in this field.
Thanks in advance.
You can convert the start_date into datetime using parseexact and then compare the datetime objects directly.
$employee | Where-Object {[datetime]::parseexact($_.start_date, 'dd/MM/yyyy', $null) -ge (Get-Date)}
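If "new staff" means people whose start_date falls in the current calendar month, one possible sketch is to compare the year and month of the parsed date with today's. The two sample records below are made up for illustration (one is generated from today's date so the example always has a match), and the invariant culture is my assumption so that '/' is read literally.

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture
$now = Get-Date

# Hypothetical sample data in the same dd/MM/yyyy text format as start_date
$employee = @(
    [pscustomobject]@{ surname = 'Doe';   start_date = $now.ToString('dd/MM/yyyy', $culture) }
    [pscustomobject]@{ surname = 'Smith'; start_date = '28/11/2020' }
)

# Keep only staff whose start date is in the same year and month as today
$newStaff = $employee | Where-Object {
    $start = [datetime]::ParseExact($_.start_date, 'dd/MM/yyyy', $culture)
    $start.Year -eq $now.Year -and $start.Month -eq $now.Month
}
```

$newStaff can then be piped to Select-Object or Export-Csv to build the list for the training department.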

Comparing dates of a CSV column in PowerShell

I have a CSV file (converted from an Excel xlsx) with around 21 columns and 74,000 rows. The four columns of interest to me hold an employee's start date, a termination date, a department name, and the vice president they report to.
I am trying to write a script that will return all employees who have reached their start date, have not been terminated, work in a department that contains 'HR' in the name, and report to a specific VP. I will elaborate on my specific issues after the block of code.
$Lawson = Import-Csv .\Documents\Lawson_HR.csv
$startDate = $Lawson | where [datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $null) -le (Get-Date)
$endDate = $startDate | where {$_.'LAW TERM DATE' -eq ''}
$HR = $endDate | where {$_.'LAW DEPT NAME' -match 'HR'}
$VP = $endDate | where {$_.'VICE PRESIDENT' -match 'Croner'}
First, the $startDate variable does not work, I am unsure of the syntax needed to compare a given date (from the CSV) to today's date. (The $endDate variable functions as it should, but I was told that the method used is unreliable.)
Also, I would like to search the Dept Name column in each row for any instance of the letters 'HR' (note: dept names could be things like 'HR - Career Services' or 'HR - Diversity'. I want all rows that have 'HR' anywhere in the Dept Name field). I get the feeling the -match operator is not the way to do that, but I'm not certain.
Similarly, I would like for the $VP variable to return all items in which the Vice President column has a given name (in this case, Croner).
This line needs curly braces { } but looks otherwise OK to me:
$startDate = $Lawson | where { [datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $null) -le (Get-Date) }
For a simple partial match you're better off using -like with wildcard characters, since -match uses regex (although it should work here too).
Also I just noticed you were piping the $enddate variable not $lawson:
$HR = $Lawson | where {$_.'LAW DEPT NAME' -like '*HR*'}
If you're trying to do all of these criteria together, just combine them with -and:
$Lawson | where { [datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $null) -le (Get-Date) -and $_.'LAW TERM DATE' -eq '' -and $_.'LAW DEPT NAME' -like '*HR*' -and $_.'VICE PRESIDENT' -match 'Croner'}
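Here is a self-contained sketch of the combined filter that you can paste into a console to see it work. The four CSV rows are invented for illustration and ConvertFrom-Csv stands in for Import-Csv; the column names are the ones from your question, and the dd-MM-yyyy format is still an assumption about your data.

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture

# Hypothetical sample rows; only the first one should pass every test
$csv = @'
"LAW HIRE DATE","LAW TERM DATE","LAW DEPT NAME","VICE PRESIDENT"
"15-03-2010","","HR - Career Services","Croner"
"01-06-2012","30-09-2015","HR - Diversity","Croner"
"20-01-2011","","Finance","Croner"
"01-01-2999","","HR - Diversity","Croner"
'@
$Lawson = $csv | ConvertFrom-Csv

$today = Get-Date
$result = $Lawson | Where-Object {
    [datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $culture) -le $today -and
    $_.'LAW TERM DATE' -eq '' -and
    $_.'LAW DEPT NAME' -like '*HR*' -and
    $_.'VICE PRESIDENT' -like '*Croner*'
}

# Count the matches; @() makes this work even when there is only one row
$matchCount = @($result).Count
```

Row 2 is dropped because it has a termination date, row 3 because the department is not HR, and row 4 because the hire date is in the future.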

PowerShell ForEach removes leading zeros

I am kind of new with PowerShell and programming in general, so I hope you have some patience while reading this. Before I explain my problem, I feel like I have to first tell you some background information:
I have all my transactions saved in $Transactions. Each transaction has Receiver, Date and Amount.
I have grouped the yearly transactions into $TransactionsPerYear the following way:
$TransactionsPerYear = $Transactions | Group-Object { [int]($_.date -replace '.*\.') }
(Btw. Could someone explain the regex in the end for me, what each character does?)
Next thing I am doing is grouping yearly income and expenses into separate variables. After this I am trying to extract the months from each year and save them into $Months. The date is in the following format dd.MM.yyyy
Question 1:
Here's how I can get all the dates, but how do I extract just the months?
$TransactionsPerYear | Select -ExpandProperty Group | Select -ExpandProperty date | Select -Unique
Question 2:
Because I don't know how to extract the months, I've tried it the following way:
[String[]]$Months = "01","02","03","04","05","06","07","08","09","10","11","12"
When I have each month in $Months I am trying to get monthly transactions and save them into new variables:
ForEach($Month in $Months){
New-Variable -Name "Transactions_$Month$Year" -Value ($Transactions | Where {$_.Date -like "*.$Month.$Year"} | Group-Object 'Receiver' | Select-Object Count, Name, @{L="Total";E={$_ | Select -ExpandProperty Group | Measure-Object Amount -Sum | Select -ExpandProperty Sum}} | Sort-Object {[double]$_.Total})
}
The problem that I am facing here is that ForEach removes the leading zero from each month, and when this happens, this part in ForEach doesn't match with anything, and the new variable is null:
Where {$_.Date -like "*.$Month.$Year"}
Let me know if you need more info. I'd be really thankful if anyone could help me.
The date looks like: 25.02.2016
From your post, it looks like you've jumped further down the rabbithole than necessary.
Instead of trying to do string manipulation every time you need to interact with the Date property, simply turn it into a DateTime object!
$Transactions = $Transactions |Select-Object *,@{Name='DateParsed';Expression={[datetime]::ParseExact($_.Date, 'dd.MM.yyyy', $null)}}
The DateTime.ParseExact() method allows us to specify the format (eg. dd.MM.yyyy), and parse a string representation of a date.
Now you can group on year simply by:
$TransactionsPerYear = $Transactions |Group-Object { $_.DateParsed.Year }
To group by both Year and then Month, I'd create a nested hashtable, like so:
# Create a hashtable, containing one key per year
$MonthlyTransactions = @{}
foreach($Year in $Transactions |Group {$_.DateParsed.Year})
{
    # Create another hashtable, containing a key for each month in that year
    $MonthlyTransactions[$Year.Name] = @{}
    foreach($Month in $Year.Group |Group {$_.DateParsed.Month})
    {
        # Add the transactions to the Monthly hashtable
        $MonthlyTransactions[$Year.Name][$Month.Name] = $Month.Group
    }
}
Now you can calculate the transaction value for a specific month by doing:
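$TotalValueMay2010 = ($MonthlyTransactions['2010']['5'] |Measure-Object Amount -Sum).Sum
Note that the keys are strings, because the Name property of a group is always a string, so index with '2010' and '5' rather than the numbers 2010 and 5.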
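As a quick end-to-end check of the nested hashtable, here is a sketch with three made-up transactions (the receivers and amounts are purely illustrative). It uses the invariant culture for parsing, which is my assumption, and looks the month up with string keys since that is what Group-Object produces.

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture

# Hypothetical sample transactions with dates as dd.MM.yyyy text
$Transactions = @(
    [pscustomobject]@{ Receiver = 'Shop'; Date = '03.05.2010'; Amount = 10.5 }
    [pscustomobject]@{ Receiver = 'Rent'; Date = '28.05.2010'; Amount = 400 }
    [pscustomobject]@{ Receiver = 'Shop'; Date = '25.02.2016'; Amount = 20 }
) | Select-Object *, @{Name='DateParsed';Expression={[datetime]::ParseExact($_.Date, 'dd.MM.yyyy', $culture)}}

# Year -> Month -> transactions
$MonthlyTransactions = @{}
foreach ($Year in $Transactions | Group-Object { $_.DateParsed.Year }) {
    $MonthlyTransactions[$Year.Name] = @{}
    foreach ($Month in $Year.Group | Group-Object { $_.DateParsed.Month }) {
        $MonthlyTransactions[$Year.Name][$Month.Name] = $Month.Group
    }
}

# The keys are the group names, which are strings
$totalMay2010 = ($MonthlyTransactions['2010']['5'] | Measure-Object Amount -Sum).Sum
$intKeyLookup = $MonthlyTransactions[2010]    # $null: the integer 2010 is not the key '2010'
```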
(Btw. Could someone explain the regex in the end for me, what each character does?)
Sure:
. # match any character
* # zero or more times
\. # match a literal . (dot)
Taking your own example input string 25.02.2016, the first part (.*) is greedy and will match 25.02, and \. will match the . right after it. -replace removes everything that matched, so the only thing left is 2016.
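You can try the regex on its own in a console. This is just the expression from your Group-Object call applied to the sample date, split into two steps:

```powershell
# Remove everything up to and including the last dot
$yearText = '25.02.2016' -replace '.*\.'    # '2016'

# The [int] cast then turns the remaining text into a number
$year = [int]$yearText                       # 2016
```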
Do you mean this?
$dates = ([DateTime] "1/1/2016"),([DateTime] "1/2/2016"),
([DateTime] "2/1/2016"),([DateTime] "3/1/2016")
$uniqueMonths = $dates | ForEach-Object { $_.Month } | Sort-Object -Unique
# $uniqueMonths contains 1,2,3
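If you need the months as two-digit text with the leading zero kept (for example to build a pattern like "*.02.2016"), one option is to format the parsed date with 'MM' instead of reading the numeric Month property. A sketch with made-up dates in your dd.MM.yyyy format, assuming the invariant culture:

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture
$dates = '25.02.2016', '03.02.2016', '14.11.2016'

# 'MM' formats the month as two digits, so February stays '02'
$uniqueMonths = $dates |
    ForEach-Object { [datetime]::ParseExact($_, 'dd.MM.yyyy', $culture).ToString('MM', $culture) } |
    Sort-Object -Unique
# $uniqueMonths contains '02','11'
```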

Compare Date from CSV Import To Get-Date

I have a CSV that contains a number of columns, but I want to import just the description column, a department column and a date column. I then want to create a new object with the description, department and date information, but only for items that have a date 45 days or older. I know that Import-Csv brings in the "Item Date" column as a string, so I need to use something like Get-Date or datetime to get it into a date format for comparison.
$data = import-csv .\items.csv | select "Description", "Department", "Item Date"
$CheckDate = (Get-Date).AddDays(-45)
$data2 | Foreach {get-date $_."Item Date"} |
select "Description", "Department", "Item Date"
$newdata = $data2 | where {$data."Item Date" -lt $CheckDate}
There may be an easier way to do this or there may be a way to get this to work but I am having trouble.
Definitely some room for simplification here.
$CheckDate = (Get-Date).AddDays(-45)
$data = Import-Csv .\items.csv |
Where-Object {
($_."Item Date" -as [DateTime]) -lt $CheckDate
}
Just cast the "Item Date" string as a [DateTime] with the -as operator and then compare that to your $CheckDate in the Where-Object call.
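One detail worth knowing: when the text cannot be converted, -as returns $null rather than raising an error. A sketch with two made-up rows (one deliberately bad) that checks for $null explicitly, so unparseable rows are left out of the result instead of being compared as $null:

```powershell
$CheckDate = (Get-Date).AddDays(-45)

# Hypothetical sample rows; the second has a value that is not a date
$rows = @(
    [pscustomobject]@{ Description = 'Old item';  'Item Date' = '2016-01-15' }
    [pscustomobject]@{ Description = 'Bad value'; 'Item Date' = 'not a date' }
)

# -as gives $null when the conversion fails
$failed = 'not a date' -as [datetime]

# Keep only rows whose date converts and is older than the check date
$old = $rows | Where-Object {
    $d = $_.'Item Date' -as [datetime]
    $null -ne $d -and $d -lt $CheckDate
}
```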
Depending on the date format used in the CSV and the computer's regional settings, simply casting the string to a DateTime value may or may not work. If you find that it doesn't, use the ParseExact() method instead, perhaps in a calculated property, since you're selecting columns anyway.
# MM is the month; lowercase mm would mean minutes
$fmt = 'dd\/MM\/yyyy'
$culture = [Globalization.CultureInfo]::InvariantCulture
$data = Import-Csv .\items.csv |
    Select-Object Description, Department, @{n='Item Date';e={
        [DateTime]::ParseExact($_.'Item Date', $fmt, $culture)
    }} |
    Where-Object { $_.'Item Date' -lt $CheckDate }
Note that forward slashes in the format string must be escaped if you need to match literal forward slashes, otherwise they will match whatever date separator character is configured in the computer's regional settings.
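To illustrate the separator behaviour, here is a sketch that parses with the German culture (de-DE), whose date separator is a dot. The culture and the sample date are only for demonstration, and it assumes de-DE culture data is available on the machine.

```powershell
$german = [Globalization.CultureInfo]::GetCultureInfo('de-DE')

# Unescaped '/' stands for the culture's date separator, which is '.' for de-DE
$a = [datetime]::ParseExact('25.02.2016', 'dd/MM/yyyy', $german)

# Escaped '\/' always means a literal forward slash, whatever the culture
$b = [datetime]::ParseExact('25/02/2016', 'dd\/MM\/yyyy', $german)

# Both calls produce 25 February 2016
$sameDate = $a -eq $b
```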

Guidance with developing PowerShell script

I am not an advanced scripter by any means, but I have a task which I need to accomplish for work. The task is to create a script which looks at two pieces of information (date and capacity utilized in bytes) from each report file that is contained in a directory. These two pieces of information are located in the same place in each report. Then, using the date value, the script can report which was the highest capacity utilized value for each month. I am thinking of having the final output be in HTML format.
There are two options for acquiring the date value. The report contains the date in the format mm/dd/yyyy in the 3rd line of text and the time is included in the file name as the Epoch time.
So far, I have put together a PowerShell script that parses the date and the capacity utilized from the body of the report. This information is then added to an array.
I am looking for guidance on which date value would be better to use (Epoch time from file name or date from body of report) and what method would be best to utilize for looking at the data for each month and reporting the highest capacity utilization per month.
Here is my script so far:
#Construct an array to use for data export
$fileDirectory = "c:\Temp"
$Array1 = @()
foreach ($file in Get-ChildItem $fileDirectory)
{
#Obtain path to each file in directory
$filePath = $fileDirectory + "\" + $file
#Get content of each file during the loop
$data = Get-Content $filePath
#Create object to enter data into Array1
$myobj = "" | Select "Date","Capacity"
$dateStr = ($data[2].Split(" "))[3]
[long]$capacityStr = ($data[19].Split(","))[2]
[single]$CapacityConv = $capacityStr
$capacityConv = ($capacityConv /= 1099511627776)
#Fill the object myobj
$myobj.date = $dateStr
$myobj.capacity = $capacityConv
#Add the object to Array1
$Array1 += $myobj
#Wipe the object
$myobj = $null
}
#After the loop, export the array to CSV file
$Array1 | export-csv "c:\Scripts\test-output.csv"
$Array1
pause
For the date, it's really up to you. If they're equally accurate then it's a matter of opinion.
For the capacity, I'm assuming these are daily reports given the date format, and you want the highest capacity for a given month.
Since you're creating an object containing a Date and a Capacity property for each report, using the returned array of all those values, you should be able to get the information you need like this:
$Array1 | Group-Object {([DateTime]$_.Date).ToString('MMMM')} | Select-Object Name,@{Name='MaxCap';Expression={ $_.Group.Capacity | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum }}
Now this is kind of a lot so let's break it down.
Group-Object groups your array based on a property. In this case, we want to group by month, but you don't have a month property, so instead of a property name we're passing in a script block to calculate the property on the fly:
([DateTime]$_.Date).ToString('MMMM')
This casts your Date property (which is a [String]) into a [DateTime] object. Then we use .ToString('MMMM') to format it into a month name.
The result will be an array of group objects, where the Name property is the name of the group (in this case it will be the month name) and the Group property will contain all of the original objects that belonged to that group.
Piping this into Select-Object, we want the Name (the month), and then we're creating a new property on the fly named MaxCap:
$_.Group.Capacity | Measure-Object -Maximum | Select-Object -ExpandProperty Maximum
So we take the current Group (the array of all the objects for that month), then expand its Capacity property so now we have an array of all the capacities for the group.
Pipe that into Measure-Object -Maximum to get the max value, then Select-Object -ExpandProperty Maximum (because Measure-Object returns an object with a Maximum property and we just want the value).
The end result is an object with the month and the maximum capacity for that month.
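If the reports span more than one year, grouping on the month name alone would put January 2016 and January 2017 into the same group. A possible variation is to group on a year-and-month key instead. The sketch below uses three made-up entries in place of $Array1 and assumes the date text is in MM/dd/yyyy form, parsed with the invariant culture:

```powershell
$culture = [Globalization.CultureInfo]::InvariantCulture

# Hypothetical stand-in for the array built by the report loop
$Array1 = @(
    [pscustomobject]@{ Date = '01/15/2016'; Capacity = 1.5 }
    [pscustomobject]@{ Date = '01/20/2016'; Capacity = 2.25 }
    [pscustomobject]@{ Date = '01/10/2017'; Capacity = 3.0 }
)

# Group on a 'yyyy-MM' key so each calendar month of each year is its own group
$monthlyMax = $Array1 |
    Group-Object { [datetime]::ParseExact($_.Date, 'MM/dd/yyyy', $culture).ToString('yyyy-MM', $culture) } |
    Select-Object Name, @{Name='MaxCap';Expression={ ($_.Group | Measure-Object Capacity -Maximum).Maximum }}
```

$monthlyMax then holds one object per month, which can be piped to ConvertTo-Html if you want the HTML report mentioned in the question.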