Powershell replace one instance in a csv - powershell

I need to keep track of port assignments for users. I have a csv that contains this:
USERNAME,GUI,DCS,SRS,LATC,TRSP
joeblow,8536,10631,5157,12528,14560
,8118,10979,5048,12775,14413
,8926,10303,5259,12371,14747
,8351,10560,5004,12049,14530
johndoe,8524,10267,5490,12809,14493
,8194,10191,5311,12275,14201
,8756,10813,5714,12560,14193
,8971,10006,5722,12078,14378
janblow,8410,10470,5999,12123,14610
bettydoe,8611,10448,5884,12040,14923
,8581,10965,5832,12400,14230
,8708,10005,5653,12111,14374
,8493,10016,5464,12827,14115
I need to be able to add users and remove users which will leave the csv looking as it does now. I have the remove part with this bit of code:
[io.file]::readalltext("c:\scripts\RNcsv.csv").replace("$username","") | Out-File c:\scripts\RNcsv.csv -Encoding ascii -Force
I tried the reverse of the code above but it does not want to work with empty value in that context. I have been unsuccessful finding a way to add $username to a single record. The first record with an empty name column to be precise. So when joeshmo comes along he ends up in the record below joeblow. This csv represents that people have come and gone.

I would take an object oriented approach using Import-Csv and a re-usable function that takes the input from pipeline:
function Add-User {
param(
[Parameter(Mandatory)]
[string] $Identity,
[Parameter(Mandatory, ValueFromPipeline, DontShow)]
[object] $InputObject
)
begin { $processed = $false }
process {
# if the user has already been added or the UserName column is populated
if($processed -or -not [string]::IsNullOrWhiteSpace($InputObject.UserName)) {
# output this object as-is and go to the next object
return $InputObject
}
# if above condition was not met we can assume this is an empty value in the
# UserName column, so set the new Identity to this row
$InputObject.UserName = $Identity
# output this object
$InputObject
# and set this variable to `$true` to skip further updates on the csv
$processed = $true
}
}
Adding a new user to the Csv would be:
(Import-Csv .\test.csv | Add-User -Identity santiago) | Export-Csv .\test.csv -NoTypeInformation
Note that, since the above is reading and writing to the same file in a single pipeline, the use of the Grouping operator ( ) is mandatory to consume all output from Import-Csv and hold the object in memory. Without it you would end up with an empty file.
Otherwise just break it into 2 steps (again, this is only needed if reading and writing to the same file):
$csv = Import-Csv .\test.csv | Add-User -Identity santiago
$csv | Export-Csv .\test.csv -NoTypeInformation
This slight modification to the function posted above adds the ability to add multiple users in one function call. All credit goes to iRon for coming up with a clever and concise solution.
function Add-User {
param(
[Parameter(Mandatory)]
[string[]] $Identity,
[Parameter(Mandatory, ValueFromPipeline, DontShow)]
[object] $InputObject
)
begin { [System.Collections.Queue] $queue = $Identity }
process {
# if there are no more Identities in Queue or the UserName column is populated
if(-not $queue.Count -or -not [string]::IsNullOrWhiteSpace($InputObject.UserName)) {
# output this object as-is and go to the next object
return $InputObject
}
# if above condition was not met we can assume this is an empty value in the
# UserName column, so dequeue this Identity and set it to this row
$InputObject.UserName = $queue.Dequeue()
# output this object
$InputObject
}
}
(Import-Csv .\test.csv | Add-User -Identity Santiago, 4evernoob, mrX, iRon) | Export-Csv ...
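For the removal side, the same object-based pattern avoids a pitfall of the plain-text .replace() call in the question, which would also alter any longer name that merely contains $username. The following is only a sketch: Remove-User and its parameters are hypothetical names that mirror Add-User and are not part of the original answer, and the column is assumed to be called USERNAME as in the sample file.

```powershell
# Hypothetical counterpart to Add-User (illustrative, not from the original answer).
function Remove-User {
    param(
        [Parameter(Mandatory)]
        [string[]] $Identity,
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject
    )
    process {
        # -contains compares whole values, so 'joe' does not match 'joeblow'
        if ($Identity -contains $InputObject.UserName) {
            # blank the name but keep the row, so its ports can be reassigned later
            $InputObject.UserName = ''
        }
        # always pass the row on, changed or not
        $InputObject
    }
}

# Small in-memory sample instead of the real file
$rows = @'
USERNAME,GUI
joeblow,8536
joe,8118
'@ | ConvertFrom-Csv

$result = $rows | Remove-User -Identity joe
```

Against a real file, the same grouping-operator caveat applies when reading and writing the same path in one pipeline.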

In addition to what you ask for and @Santiago's helpful answer (and note), you might want to add multiple usernames at once, so that you don't have to rewrite the whole file for each user you add.
$Csv = ConvertFrom-Csv @'
USERNAME, GUI, DCS, SRS, LATC, TRSP
joeblow, 8536, 10631, 5157, 12528, 14560
, 8118, 10979, 5048, 12775, 14413
, 8926, 10303, 5259, 12371, 14747
, 8351, 10560, 5004, 12049, 14530
johndoe, 8524, 10267, 5490, 12809, 14493
, 8194, 10191, 5311, 12275, 14201
, 8756, 10813, 5714, 12560, 14193
, 8971, 10006, 5722, 12078, 14378
janblow, 8410, 10470, 5999, 12123, 14610
bettydoe, 8611, 10448, 5884, 12040, 14923
, 8581, 10965, 5832, 12400, 14230
, 8708, 10005, 5653, 12111, 14374
, 8493, 10016, 5464, 12827, 14115
'@
$NewUser = 'Santiago', '4evernoob', 'mrX', 'iRon'
$Csv |ForEach-Object { $i = 0 } {
if (!$_.USERNAME) { $_.USERNAME = $NewUser[$i++] }
$_
} |Format-Table
USERNAME GUI DCS SRS LATC TRSP
-------- --- --- --- ---- ----
joeblow 8536 10631 5157 12528 14560
Santiago 8118 10979 5048 12775 14413
4evernoob 8926 10303 5259 12371 14747
mrX 8351 10560 5004 12049 14530
johndoe 8524 10267 5490 12809 14493
iRon 8194 10191 5311 12275 14201
8756 10813 5714 12560 14193
8971 10006 5722 12078 14378
janblow 8410 10470 5999 12123 14610
bettydoe 8611 10448 5884 12040 14923
8581 10965 5832 12400 14230
8708 10005 5653 12111 14374
8493 10016 5464 12827 14115
Note that an out-of-bounds index (e.g. $NewUser[99]) returns $Null by default (which is cast to an empty string). This will produce an error if you set StrictMode to a higher level.
To overcome this, you might also do something like this instead:
if (!$_.USERNAME -and $i -lt @($NewUser).Count) { ...
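As a rough, self-contained sketch of that bounds check (the three-row sample and the $Updated variable are made up for illustration), the loop below runs cleanly under strict mode. Wrapping $NewUser in @( ) on the indexing side as well is my own addition: it keeps a single name from being indexed character by character.

```powershell
Set-StrictMode -Version Latest

# Shortened sample data (assumption: same USERNAME column as the real file)
$Csv = @'
USERNAME,GUI
joeblow,8536
,8118
,8926
'@ | ConvertFrom-Csv

# Only one new user, although two rows are free
$NewUser = 'Santiago'

$Updated = $Csv | ForEach-Object { $i = 0 } {
    # only assign while there are names left, so the index never runs out of bounds
    if (!$_.USERNAME -and $i -lt @($NewUser).Count) {
        $_.USERNAME = @($NewUser)[$i++]
    }
    $_
}
```

Here the first free row gets the name and the remaining free row stays empty, without relying on an out-of-bounds lookup returning $Null.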

Related

is there a simple way to output to xlsx?

I am trying to output a query result from a DB to an xlsx file, but it takes a long time because there are about 20,000 records to process. Is there a simpler way to do this?
I know there is a way to do it for CSV, but I'm trying to avoid that, because if a record contains a comma it will be treated as another column, which would mess up the data.
This is my code:
$xlsObj = New-Object -ComObject Excel.Application
$xlsObj.DisplayAlerts = $false
$xlsWb = $xlsobj.Workbooks.Add(1)
$xlsObj.Visible = 0 #(visible = 1 / 0 no visible)
$xlsSh = $xlsWb.Worksheets.Add([System.Reflection.Missing]::Value, $xlsWb.Worksheets.Item($xlsWb.Worksheets.Count))
$xlsSh.Name = "QueryResults"
$DataSetTable= $ds.Tables[0]
Write-Output "DATA SET TABLE" $DataSetTable
[Array] $getColumnNames = $DataSetTable.Columns | SELECT *
Write-Output "COLUMN NAMES" $DataSetTable.Rows[0]
[Int] $RowHeader = 1
foreach ($ColH in $getColumnNames)
{
$xlsSh.Cells.item(1, $RowHeader).font.bold = $true
$xlsSh.Cells.item(1, $RowHeader) = $ColH.ColumnName
Write-Output "Nombre de Columna"$ColH.ColumnName
$RowHeader++
}
[Int] $rowData = 2
[Int] $colData = 1
foreach ($rec in $DataSetTable.Rows)
{
foreach ($Coln in $getColumnNames)
{
$xlsSh.Cells.NumberFormat = "@"
$xlsSh.Cells.Item($rowData, $colData) = $rec.$($Coln.ColumnName).ToString()
$ColData++
}
$rowData++; $ColData = 1
}
$xlsRng = $xlsSH.usedRange
[void] $xlsRng.EntireColumn.AutoFit()
#Se elimina la pestaña Sheet1/Hoja1.
$xlsWb.Sheets(1).Delete() #Versión 02
$xlsFile = "directory of the file"
[void] $xlsObj.ActiveWorkbook.SaveAs($xlsFile)
$xlsObj.Quit()
Start-Sleep -Milliseconds 700
While ([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsRng)) {''}
While ([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsSh)) {''}
While ([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsWb)) {''}
While ([System.Runtime.Interopservices.Marshal]::ReleaseComObject($xlsObj)) {''}
[gc]::collect() | Out-Null
[gc]::WaitForPendingFinalizers() | Out-Null
$oraConn.Close()
I'm trying to avoid [CSV files], because if the records had any comma is going to take it as a another column and that would mess with the info
That's only the case if you try to construct the output format manually. Built-in commands like Export-Csv and ConvertTo-Csv will automatically quote the values as necessary:
PS C:\> $customObject = [pscustomobject]@{ID = 1; Name = "Solis, Heber"}
PS C:\> $customObject
ID Name
-- ----
1 Solis, Heber
PS C:\> $customObject |ConvertTo-Csv -NoTypeInformation
"ID","Name"
"1","Solis, Heber"
Notice, in the example above, how the string value assigned to $customObject.Name does not contain any quotation marks, but in the output from ConvertTo-Csv the values and headers are clearly enclosed in quotation marks.
PowerShell automatically enumerates the row data when you pipe a [DataTable] instance, so creating a CSV might (depending on the contents) be as simple as:
$ds.Tables[0] |Export-Csv table_out.csv -NoTypeInformation
What if you want TAB-separated values (or any other non-comma separator)?
The *-Csv commands come with a -Delimiter parameter to which you can pass a user-defined separator:
# This produces semicolon-separated values
$data |Export-Csv -Path output.csv -Delimiter ';'
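For the TAB case specifically, a minimal round trip might look like the sketch below. The sample object is the one from the earlier example, and the variable names $tsv and $back are just illustrative.

```powershell
# A value containing a comma, as in the earlier example
$row = [pscustomobject]@{ ID = 1; Name = 'Solis, Heber' }

# "`t" is PowerShell's escape sequence for a TAB character
$tsv  = $row | ConvertTo-Csv -Delimiter "`t" -NoTypeInformation

# Read it back with the same delimiter; the comma stays inside the Name value
$back = $tsv | ConvertFrom-Csv -Delimiter "`t"
```

The same -Delimiter value works with Export-Csv and Import-Csv when writing to and reading from a file.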
I usually try to refrain from recommending specific modules/libraries, but if you insist on writing to XLSX I'd suggest checking out ImportExcel (don't let the name fool you, it does more than import from Excel, including exporting and formatting data from PowerShell -> XLSX).

PowerShell script to group records by overlapping start and end date

I am working on a CSV file which has start and end dates, and the requirement is to group records when their date ranges overlap each other.
For example, in the table below, the Start_Date and End_Date of Bill_Number 177835 overlap with those of 178682, 179504 and 178990, so they should all be grouped together, and so on for every record.
Bill_Number,Start_Date,End_Date
177835,4/14/20 3:00 AM,4/14/20 7:00 AM
178682,4/14/20 3:00 AM,4/14/20 7:00 AM
179504,4/14/20 3:29 AM,4/14/20 6:29 AM
178662,4/14/20 4:30 AM,4/14/20 5:30 AM
178990,4/14/20 6:00 AM,4/14/20 10:00 AM
178995,4/15/20 6:00 AM,4/15/20 10:00 AM
178998,4/15/20 6:00 AM,4/15/20 10:00 AM
I have tried different combinations like "Group-by" and "for loop" but was not able to produce the result.
With the above example of CSV, the expected result is;
Group1: 177835,178682,179504, 178990
Group2: 177835,178682,179504, 178662
Group3: 178995, 178998
Currently I have the code below.
Any help on this will be appreciated, thanks in advance.
$array = @('ab','bc','cd','df')
for ($y = 0; $y -lt $array.count) {
for ($x = 0; $x -lt $array.count) {
if ($array[$y]-ne $array[$x]){
Write-Host $array[$y],$array[$x]
}
$x++
}
$y++
}
You can do something like the following. There is likely a cleaner solution, but that could take a lot of time.
$csv = Import-Csv file.csv
# Creates all inclusive groups where times overlap
$csvGroups = foreach ($row in $csv) {
$start = [datetime]$row.Start_Date
$end = [datetime]$row.End_Date
,($csv | where { ($start -ge [datetime]$_.Start_Date -and $start -le [datetime]$_.End_Date) -or ($end -ge [datetime]$_.Start_Date -and $end -le [datetime]$_.End_Date) })
}
# Removes duplicates from $csvGroups
$groups = $csvGroups | Group {$_.Bill_number -join ','} |
Foreach-Object { ,$_.Group[0] }
# Compares current group against all groups except itself
$output = for ($i = 0; $i -lt $groups.count; $i++) {
$unique = $true # indicates if the group's bill_numbers are in another group
$group = $groups[$i]
$list = $groups -as [system.collections.arraylist]
$list.RemoveAt($i) # Removes self
foreach ($innerGroup in $list) {
# If current group's bill_numbers are in another group, skip to next group
if ((compare $group.Bill_Number $innergroup.Bill_Number).SideIndicator -notcontains '<=') {
$unique = $false
break
}
}
if ($unique) {
,$group
}
}
$groupCounter = 1
# Output formatting
$output | Foreach-Object { "Group{0}:{1}" -f $groupCounter++,($_.Bill_Number -join ",")}
Explanation:
I added comments to give an idea as to what is going on.
The ,$variable syntax uses the unary operator ,. It wraps the output in an array. Typically, PowerShell unrolls an array as individual items. The unrolling becomes a problem here because we want the groups to stay as groups (arrays). Otherwise, there would be a lot of duplicate bill numbers, and we'd lose track between groups.
An arraylist is used for $list. This is so we can access the RemoveAt() method. A typical array is of fixed size and can't be manipulated in that fashion. This can effectively be done with an array, but the code is different. You either have to select the index ranges around the item you want to skip or create a new array using some other conditional statement that will exclude the target item. An arraylist is just easier for me (personal preference).
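To make the two points above concrete, here is a small sketch with made-up values (none of these variable names come from the answer). It shows what the unary comma changes, and how skipping one index looks with an ArrayList versus a plain array.

```powershell
# Without the comma, each inner array is unrolled into its elements
$flat   = foreach ($n in 1..2) { @($n, ($n * 10)) }

# With the unary comma, each inner array survives as a single item
$nested = foreach ($n in 1..2) { ,@($n, ($n * 10)) }

$items = 'a', 'b', 'c', 'd'
$skip  = 1   # index of the element to leave out

# ArrayList: remove the element in place
$list = [System.Collections.ArrayList]$items
$list.RemoveAt($skip)

# Fixed-size array: build a new array that excludes the index
$filtered = for ($i = 0; $i -lt $items.Count; $i++) {
    if ($i -ne $skip) { $items[$i] }
}
```

Both $list and $filtered end up holding a, c, d; the ArrayList version just avoids writing the loop.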
So, a very dirty approach. I think there are a couple of ways to determine if there's overlap for a specific comparison, one record to another. However, you may need a list of the bill numbers each bill's date range collides with. Using a function call in a Select-Object statement/expression, I added a Collisions property to your objects.
The function is wordy and can probably be improved, but the gist is that each record is compared to all other records, and the other record's bill number is reported in its Collisions property if either the start or end date falls within the other record's range.
This is of course just demo code, I'm sure it can be made better for your purposes, but may be a starting point for you.
Obviously change the path to the CSV file.
Function Get-Collisions
{
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
ForEach($Object in $CompareObjects)
{
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
If(
( $ReferenceObject.Start_Date -ge $Object.Start_Date -and $ReferenceObject.Start_Date -le $Object.End_Date ) -or
( $ReferenceObject.End_Date -ge $Object.Start_Date -and $ReferenceObject.End_Date -le $Object.End_Date ) -or
( $ReferenceObject.Start_Date -le $Object.Start_Date -and $ReferenceObject.End_Date -ge $Object.Start_Date )
)
{
$Object.Bill_Number
}
}
}
} # End Get-Collisions
$Objects = Import-Csv 'C:\temp\DateOverlap.CSV'
$Objects |
ForEach-Object{
$_.Start_Date = [DateTime]$_.Start_Date
$_.End_Date = [DateTime]$_.End_Date
}
$Objects = $Objects |
Select-Object *,@{Name = 'Collisions'; Expression = { Get-Collisions -ReferenceObject $_ -CompareObjects $Objects }}
$Objects | Format-Table -AutoSize
Let me know how it goes. Thanks.
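As an aside, the three-part condition in the function above can usually be condensed into a single test. The sketch below is not the author's code: Test-Overlap is a hypothetical helper, and it assumes every record's start is not later than its end and that touching endpoints count as overlapping, as they do in the -ge / -le comparisons above.

```powershell
# Hypothetical helper (illustrative name): two inclusive ranges overlap
# when each one starts no later than the other one ends.
function Test-Overlap {
    param(
        [datetime] $StartA,
        [datetime] $EndA,
        [datetime] $StartB,
        [datetime] $EndB
    )
    ($StartA -le $EndB) -and ($StartB -le $EndA)
}

# Values taken from the sample CSV in the question
$partial   = Test-Overlap '4/14/20 3:00 AM' '4/14/20 7:00 AM' '4/14/20 6:00 AM' '4/14/20 10:00 AM'
$contained = Test-Overlap '4/14/20 3:00 AM' '4/14/20 7:00 AM' '4/14/20 4:30 AM' '4/14/20 5:30 AM'
$nextDay   = Test-Overlap '4/14/20 3:00 AM' '4/14/20 7:00 AM' '4/15/20 6:00 AM' '4/15/20 10:00 AM'
```

With these inputs the first two calls should report an overlap and the last should not, and the containment case needs no separate condition.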
@Shan, I saw your comments so I wanted to respond with some additional code and discussion. I may have gone overboard, but you expressed a desire to learn, so that you can maintain these code pieces in the future. So, I put a lot of time into this.
I may mention some of @AdminOfThings' work too. That is not criticism, but collaboration. His example is clever and dynamic in terms of getting the job done and pulling in the right tools as he worked his way to the desired output.
I originally side-stepped the grouping question because I didn't feel like naming/numbering the groups had any meaning. For example: "Group 1" indicates all its members have overlap in their billing periods, but no indication of what or when the overlap is. Maybe I rushed through it… I may have been reading too much into it or perhaps even letting my own biases get in the way. At any rate, I elected to create a relationship from the perspective of each bill number, and that resulted in my first answer.
Since then, and because of your comment, I put effort into extending and documenting the first example I gave. The revised code will be Example 1 below. I've heavily commented it and most of the comments will apply to the original example as well. There are some differences that were forced by the extended grouping functionality, but the comments should reflect those situations.
Note: You'll also see I stopped calling them "collisions" and termed them "overlaps" instead.
Example 1:
Function Get-Overlaps
{
<#
.SYNOPSIS
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.DESCRIPTION
Given an object (reference object) compare to a collection of other objects of the same
type. Return an array of billing numbers for which the billing period overlaps that of
the reference object.
.PARAMETER ReferenceObject
This is the current object you wish to compare to all other objects.
.PARAMETER CompareObjects
The collection of objects you want to compare with the reference object.
.NOTES
> The date time casting could probably have been done by further preparing
the objects in the calling code. However, given this is for a
StackOverflow question I can polish that later.
#>
Param(
[Parameter(Mandatory = $true)]
[Object]$ReferenceObject,
[Parameter( Mandatory = $true )]
[Object[]]$CompareObjects
) # End Parameter Block
[Collections.ArrayList]$Return = @()
$R_StartDate = [DateTime]$ReferenceObject.Start_Date
$R_EndDate = [DateTime]$ReferenceObject.End_Date
ForEach($Object in $CompareObjects)
{
$O_StartDate = [DateTime]$Object.Start_Date
$O_EndDate = [DateTime]$Object.End_Date
# The first if statement skips the reference object's bill_number
If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
{
# This logic can use some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fell inside the timespan of the comparison
# object. This case is handled by the first 2 conditions.
# 2) If the reference object's timespan covers the entire timespan of the comparison object.
# Meaning the start date is before and the end date is after, so the entire
# comparison timespan is within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the start date. It's a little more
# inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend, I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $R_StartDate -ge $O_StartDate -and $R_StartDate -le $O_EndDate ) -or
( $R_EndDate -ge $O_StartDate -and $R_EndDate -le $O_EndDate ) -or
( $R_StartDate -le $O_StartDate -and $R_EndDate -ge $O_StartDate )
)
{
[Void]$Return.Add( $Object.Bill_Number )
}
}
}
Return $Return
} # End Get-Overlaps
# Import once so every row can be compared against the complete set of rows.
$Csv = Import-Csv 'C:\temp\DateOverlap.CSV'
$Objects =
$Csv |
ForEach-Object{
# Consider overlap as a relationship from the perspective of a given Object.
$Overlaps = [Collections.ArrayList]@(Get-Overlaps -ReferenceObject $_ -CompareObjects $Csv)
# Knowing the overlaps I can infer the group, by adding the group's bill_number to its group property.
If( $Overlaps )
{ # Don't calculate a group unless you actually have overlaps:
$Group = $Overlaps.Clone()
[Void]$Group.Add( $_.Bill_Number ) # You can do this in the line above, but for readability I separated it.
}
Else { $Group = $null } # Ensures we're not reusing the group from a previous iteration of the loop.
# Create a new PSCustomObject with the data so far.
[PSCustomObject][Ordered]@{
Bill_Number = $_.Bill_Number
Start_Date = [DateTime]$_.Start_Date
End_Date = [DateTime]$_.End_Date
Overlaps = $Overlaps
Group = $Group | Sort-Object # Sorting will make it a lot easier to get unique lists later.
}
}
# The reason I recreated the objects from the CSV file instead of using Select-Object as I had
# previously is that I simply couldn't get Select-Object to maintain type ArrayList that was being
# returned from the function. I know that's a documented problem or circumstance somewhere.
# Now I'll add one more property called Group_ID, a comma-delimited string that we can later use
# to echo the groups according to your original request.
$Objects =
$Objects |
Select-Object *,@{Name = 'Group_ID'; Expression = { $_.Group -join ', ' } }
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
# Below is a traditional for loop that does the same thing. I did that first before deciding the ForEach
# was cleaner. Leaving it commented below, because you're on a learning-quest, so just more demo code...
# For($i = 0; $i -lt $UniqueGroups.Count; ++$i)
# {
# $Num = $i + 1
# $UniqueGroup = $UniqueGroups[$i]
# "Group $Num : $UniqueGroup"
# }
Example 2:
$Objects =
Import-Csv 'C:\temp\DateOverlap.CSV' |
Select-Object Bill_Number,
@{ Name = 'Start_Date'; Expression = { [DateTime]$_.Start_Date } },
@{ Name = 'End_Date'; Expression = { [DateTime]$_.End_Date } }
# The above select statement converts the Start_Date & End_Date properties to [DateTime] objects
# While you had asked to pack everything into the nested loops, that would have resulted in
# unnecessary recasting of object types to ensure proper comparison. Often this is a matter of
# preference, but in this case I think it's better. I did have it working well without the
# above select, but the code is more readable / concise with it. So even if you treat the
# Select-Object command as a blackbox the rest of the code should be easier to understand.
#
# Of course, and if you couldn't tell from my samples Select-Object is incredibly useful. I
# recommend taking the time to learn it thoroughly. The MS documentation can be found here:
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-object?view=powershell-5.1
:Outer ForEach( $ReferenceObject in $Objects )
{
# In other revisions I had assigned these values to some shorter variable names.
# I took that out. Again, since you're learning I wanted all the dot referencing
# to be on full display.
$ReferenceObject.Start_Date = $ReferenceObject.Start_Date
$ReferenceObject.End_Date = $ReferenceObject.End_Date
[Collections.ArrayList]$TempArrList = @() # Reset this on each iteration of the outer loop.
:Inner ForEach( $ComparisonObject in $Objects )
{
If( $ComparisonObject.Bill_Number -eq $ReferenceObject.Bill_Number )
{ # Skip the current reference object in the $Objects collection! This prevents the duplication of
# the current Bill's number within its group, helping to ensure uniqueness.
#
# By now you should have seen across all revisions, including AdminOfThings' demo, that there was some
# need to skip the current item when searching for overlaps. And, that there are a number of ways to
# accomplish that. In this case I simply go back to the top of the loop when the current record
# is encountered, effectively skipping it.
Continue Inner
}
# The below logic needs some explaining. So far as I could tell there were 2 cases to look for:
# 1) Either or both the start and end dates fell inside the timespan of the comparison
# object. This case is handled by the first 2 conditions.
# 2) If the reference object's timespan covers the entire timespan of the comparison object.
# Meaning the start date is before and the end date is after, so the entire
# comparison timespan is within the bounds of the reference timespan. I elected to use
# the 3rd condition below to detect that case because once the start date is earlier I
# only have to care if the end date is greater than the other start date. It's a little
# more inclusive and partially covered by the previous conditions, but whatever, you gotta
# pick something...
#
# Note: This was a deceptively difficult thing to comprehend, I missed that last condition
# in my first example (later corrected) and I think @AdminOfThings also overlooked it.
If(
( $ReferenceObject.Start_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.Start_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -le $ComparisonObject.End_Date ) -or
( $ReferenceObject.Start_Date -le $ComparisonObject.Start_Date -and $ReferenceObject.End_Date -ge $ComparisonObject.Start_Date )
)
{
[Void]$TempArrList.Add( $ComparisonObject.Bill_Number )
}
}
# Now Add the properties!
$ReferenceObject | Add-Member -Name Overlaps -MemberType NoteProperty -Value $TempArrList
If( $ReferenceObject.Overlaps )
{
[Void]$TempArrList.Add($ReferenceObject.Bill_Number)
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value ( $TempArrList | Sort-Object )
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value ( $ReferenceObject.Group -join ', ' )
# Below a script property also works, but I think the above is easier to follow:
# $ReferenceObject | Add-Member -Name Group_ID -MemberType ScriptProperty -Value { $this.Group -join ', ' }
}
Else
{
$ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value $null
$ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value $null
}
}
# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap
# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
#
# It's important to point out I chose to sort because I saw the clever solution that AdminOfThings
# used. There's a need to display only groups that have unique memberships, not necessarily unique
# ordering of the members. He identified these by doing some additional loops and using the
# Compare-Object cmdlet. Again, I must say that was very clever, and Compare-Object is another tool very much
# worth getting to know. However, the code didn't seem like it cared which of the various orderings it
# ultimately output. Therefore I could conclude the order wasn't really important, and it's fine if the
# groups are sorted. With the objects sorted it's much easier to derive the truly unique lists with the
# simple Select-Object command below.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique
# Finally Loop through the UniqueGroups
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
"Group $Num : $UniqueGroup"
++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
Additional Discussion:
Hopefully the examples are helpful. I wanted to mention a few more points:
Using ArrayLists ( [System.Collections.ArrayList] ) instead of native arrays. The typical reason to do this is the ability to add and remove elements quickly. If you search the internet you'll find hundreds of articles explaining why it's faster. It's so common you'll often find experienced PowerShell users implementing it instinctively. But the main reason is speed and the flexibility to easily add and remove elements.
You'll notice I relied heavily on the ability to append new properties to objects. There are several ways to do this: Select-Object, creating your own objects, and Add-Member, which I used in Example 2 above. The main reason I used Add-Member was I couldn't get the ArrayList type to stick when using Select-Object.
Regarding loops. This is specific to your desire for nested loops. My first answer still had loops, except some were implied by the pipe, and others were stored in a helper function. The latter is really also a preference; for readability it's sometimes helpful to park some code out of view from the main code body. That said, all the same concepts were there from the beginning. You should get comfortable with the implied loop that comes with pipe-lining capability.
I don't think there's much more I can say without getting redundant. I really hope this was helpful, it was certainly fun for me to work on it. If you have questions or feedback let me know. Thanks.

how to write streaming function in powershell

I tried to create a function that emulates Linux's head:
Function head( )
{
[CmdletBinding()]
param (
[parameter(mandatory=$false, ValueFromPipeline=$true)] [Object[]] $inputs,
[parameter(position=0, mandatory=$false)] [String] $liness = "10",
[parameter(position=1, ValueFromRemainingArguments=$true)] [String[]] $filess
)
$lines = 0
if (![int]::TryParse($liness, [ref]$lines)) {
$lines = 10
$filess = ,$liness + (@{$true=@();$false=$filess}[$null -eq $filess])
}
$read = 0
$input | select-object -First $lines
if ($filess) {
get-content -TotalCount $lines $filess
}
}
The problem is that this will actually read all the content (whether by reading $filess or from $input) and then print the first, where I'd want head to read the first lines and forget about the rest so it can work with large files.
How can this function be rewritten?
Well, as far as I know, you are overdoing it slightly...
"Beginning in Windows PowerShell 3.0, Select-Object includes an optimization feature that prevents commands from creating and processing objects that are not used. When you include a Select-Object command with the First or Index parameter in a command pipeline, Windows PowerShell stops the command that generates the objects as soon as the selected number of objects is generated, even when the command that generates the objects appears before the Select-Object command in the pipeline. To turn off this optimizing behavior, use the Wait parameter."
So all you need to do is:
Get-Content -Path somefile | Select-Object -First 10 #or pass a variable
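When the input is a file, Get-Content can also limit the read itself through its -TotalCount parameter, which the question's function already uses. The sketch below compares the two forms on a throwaway temp file; the file contents and variable names are made up for illustration.

```powershell
# Create a 100-line scratch file to read from
$tmp = [System.IO.Path]::GetTempFileName()
1..100 | ForEach-Object { "line $_" } | Set-Content -Path $tmp

# Form 1: let Select-Object stop the pipeline after three lines (PowerShell 3.0+)
$viaSelect = Get-Content -Path $tmp | Select-Object -First 3

# Form 2: ask Get-Content for only the first three lines
$viaTotalCount = Get-Content -Path $tmp -TotalCount 3

Remove-Item -Path $tmp
```

Both variables should hold the same three lines; the first form also works for pipeline input that does not come from a file.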

How does Select-Object stop the pipeline in PowerShell v3?

In PowerShell v2, the following line:
1..3| foreach { Write-Host "Value : $_"; $_ }| select -First 1
Would display:
Value : 1
1
Value : 2
Value : 3
Since all elements were pushed down the pipeline. However, in v3 the above line displays only:
Value : 1
1
The pipeline is stopped before 2 and 3 are sent to Foreach-Object (Note: the -Wait switch for Select-Object allows all elements to reach the foreach block).
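One way to observe this without relying on console output is to record which elements the upstream block actually handled. This is only a sketch for PowerShell 3.0 or later; $seen and $seenWait are illustrative names.

```powershell
# Record every element the ForEach-Object block processes
$seen = New-Object System.Collections.ArrayList
$first = 1..50 |
    ForEach-Object { [void]$seen.Add($_); $_ } |
    Select-Object -First 3

# Same pipeline with -Wait, which turns the optimization off
$seenWait = New-Object System.Collections.ArrayList
$firstWait = 1..50 |
    ForEach-Object { [void]$seenWait.Add($_); $_ } |
    Select-Object -First 3 -Wait
```

In the first case $seen should contain only the three selected elements, while with -Wait all 50 elements reach the block even though only three are returned.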
How does Select-Object stop the pipeline, and can I now stop the pipeline from a foreach or from my own function?
Edit: I know I can wrap a pipeline in a do...while loop and continue out of the pipeline. I have also found that in v3 I can do something like this (it doesn't work in v2):
function Start-Enumerate ($array) {
do{ $array } while($false)
}
Start-Enumerate (1..3)| foreach {if($_ -ge 2){break};$_}; 'V2 Will Not Get Here'
But Select-Object doesn't require either of these techniques so I was hoping that there was a way to stop the pipeline from a single point in the pipeline.
Check this post on how you can cancel a pipeline:
http://powershell.com/cs/blogs/tobias/archive/2010/01/01/cancelling-a-pipeline.aspx
In PowerShell 3.0 it's an engine improvement. From the CTP1 samples folder ('\Engines Demos\Misc\ConnectBugFixes.ps1'):
# Connect Bug 332685
# Select-Object optimization
# Submitted by Shay Levi
# Connect Suggestion 286219
# PSV2: Lazy pipeline - ability for cmdlets to say "NO MORE"
# Submitted by Karl Prosser
# Stop the pipeline once the objects have been selected
# Useful for commands that return a lot of objects, like dealing with the event log
# In PS 2.0, this took a long time even though we only wanted the first 10 events
Start-Process powershell.exe -Args '-Version 2 -NoExit -Command Get-WinEvent | Select-Object -First 10'
# In PS 3.0, the pipeline stops after retrieving the first 10 objects
Get-WinEvent | Select-Object -First 10
After trying several methods, including throwing StopUpstreamCommandsException, ActionPreferenceStopException, and PipelineClosedException, calling $PSCmdlet.ThrowTerminatingError and $ExecutionContext.Host.Runspace.GetCurrentlyRunningPipeline().stopper.set_IsStopping($true) I finally found that just utilizing select-object was the only thing that didn't abort the whole script (versus just the pipeline). [Note that some of the items mentioned above require access to private members, which I accessed via reflection.]
# This looks like it should put a zero in the pipeline but on PS 3.0 it doesn't
function stop-pipeline {
$sp = {select-object -f 1}.GetSteppablePipeline($MyInvocation.CommandOrigin)
$sp.Begin($true)
$x = $sp.Process(0) # this call doesn't return
$sp.End()
}
A new method follows, based on a comment from the OP. Unfortunately this method is a lot more complicated and uses private members. Also, I don't know how robust this is: I just got the OP's example to work and stopped there. So FWIW:
# wh is alias for write-host
# sel is alias for select-object
# The following two use reflection to access private members:
# invoke-method invokes private methods
# select-properties is similar to select-object, but it gets private properties
# Get the system.management.automation assembly
$smaa=[appdomain]::currentdomain.getassemblies()|
? location -like "*system.management.automation*"
# Get the StopUpstreamCommandsException class
$upcet=$smaa.gettypes()| ? name -like "*upstream*"
filter x {
[CmdletBinding()]
param(
[parameter(ValueFromPipeline=$true)]
[object] $inputObject
)
process {
if ($inputObject -ge 5) {
# Create a StopUpstreamCommandsException
$upce = [activator]::CreateInstance($upcet,@($pscmdlet))
$PipelineProcessor=$pscmdlet.CommandRuntime|select-properties PipelineProcessor
$commands = $PipelineProcessor|select-properties commands
$commandProcessor= $commands[0]
$null = $upce.RequestingCommandProcessor|select-properties *
$upce.RequestingCommandProcessor.commandinfo =
$commandProcessor|select-properties commandinfo
$upce.RequestingCommandProcessor.Commandruntime =
$commandProcessor|select-properties commandruntime
$null = $PipelineProcessor|
invoke-method recordfailure @($upce, $commandProcessor.command)
1..($commands.count-1) | % {
$commands[$_] | invoke-method DoComplete
}
wh throwing
throw $upce
}
wh "< $inputObject >"
$inputObject
} # end process
end {
wh in x end
}
} # end filter x
filter y {
[CmdletBinding()]
param(
[parameter(ValueFromPipeline=$true)]
[object] $inputObject
)
process {
$inputObject
}
end {
wh in y end
}
}
1..5| x | y | measure -Sum
PowerShell code to retrieve PipelineProcessor value through reflection:
$t_cmdRun = $pscmdlet.CommandRuntime.gettype()
# Get pipelineprocessor value ($pipor)
$bindFlags = [Reflection.BindingFlags]"NonPublic,Instance"
$piporProp = $t_cmdRun.getproperty("PipelineProcessor", $bindFlags )
$pipor=$piporProp.GetValue($PSCmdlet.CommandRuntime,$null)
PowerShell code to invoke a method through reflection:
$proc = (gps)[12] # semi-random process
$methinfo = $proc.gettype().getmethod("GetComIUnknown", $bindFlags)
# Return ComIUnknown as an IntPtr
$comIUnknown = $methinfo.Invoke($proc, @($true))
I know that throwing a PipelineStoppedException stops the pipeline. The following example simulates in v2.0 what you see with Select -First 1 in v3.0:
filter Select-Improved($first) {
begin{
$count = 0
}
process{
$_
$count++
if($count -ge $first){throw (new-object System.Management.Automation.PipelineStoppedException)}
}
}
trap{continue}
1..3| foreach { Write-Host "Value : $_"; $_ }| Select-Improved -first 1
write-host "after"

Powershell to Validate Email addresses

I'm trying to get Powershell to validate email addresses using Regex and put email addresses into good and bad csv files. I can get it to skip one line and write to file, but cannot get it to target the email addresses and validate them, then write lines to good and bad files. I can do it in C# and JavaScript, but have never done it in Powershell. I know this can be done, but not sure how.
Here is what I have so far:
Function IsValidEmail {
Param ([string] $In)
# Returns true if In is in valid e-mail format.
[system.Text.RegularExpressions.Regex]::IsMatch($In,
"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
}
## Now we need to check the original file for invalid and valid emails.**
$list = Get-Content C:\Emails\OriginalEmails\emailAddresses.csv
# This way we also use the foreach loop.
##======= Test to see if the file exists ===========
if (!(Test-Path "C:\Emails\ValidEmails\ValidEmails.csv")) {
New-Item -path C:\Emails\ValidEmails -name ValidEmails.csv -type "file" # -value "my new text"
Write-Host "Created new file and text content added"
}
else {
## Add-Content -path C:\Share\sample.txt -value "new text content"
Write-Host "File already exists and new text content added"
}
if (!(Test-Path "C:\Emails\InValidEmails\InValidEmails.csv")) {
New-Item -path C:\Emails\InValidEmails -name InValidEmails.csv -type "file" # -value "my new text"
Write-Host "Created new file and text content added"
}
else {
# Add-Content -path C:\Emails\ValidEmails -value "new text content"
Write-Host "File already exists and new text content added"
}
#$Addresses = Import-Csv "C:\Data\Addresses.csv" -Header Name, Address, PhoneNumber | Select -Skip 1
$EmailAddressImp = Import-Csv "C:\Emails\OriginalEmails\emailAddresses.csv" -Header FirstName, LastName, Email, Address, City, State, ZipCode | Select FirstName, LastName, Email, Address, City, State, ZipCode -Skip 1
I'm validating the third column, "Email", in the original csv file and trying to write the whole row out to a file (good file or bad file). I'm also not sure how to buffer the output while doing this.
ForEach ($emailAddress in $list) {
if (IsValidEmail($emailAddress)) {
"Valid: {0}" -f $emailAddress
Out-File -Append C:\Emails\ValidEmails\ValidEmails.csv -Encoding UTF8
$EmailAddressImp | Export-Csv "C:\Emails\ValidEmails\ValidEmails.csv" -NoTypeInformation
}
else {
"Invalid: {0}" -f $emailAddress
Out-File -Append C:\Emails\InValidEmails\InValidEmails.csv -Encoding UTF8
$EmailAddressImp | Export-Csv "C:\Emails\InValidEmails\InValidEmails.csv" -NoTypeInformation
}
}
I'm trying to get Powershell to validate email addresses using Regex
Don't!
I would recommend against this. Accurately validating email addresses using regular expressions can be much more difficult than you might think.
Let's have a look at your regex pattern:
^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
In its current form it incorrectly validates .@domain.tld.
On the other hand, it doesn't validate unicode-encoded internationalized domain names, like user@☎.com (yes, that's a valid email address).
Instead of trying to find or construct a perfect email validation regex pattern, I would use the MailAddress class for validation instead:
function IsValidEmail {
param([string]$EmailAddress)
try {
$null = [mailaddress]$EmailAddress
return $true
}
catch {
return $false
}
}
If the input string is a valid email address, the cast to [mailaddress] will succeed and the function returns $true. If not, the cast will result in an exception, and the function returns $false.
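As a rough illustration of how the cast-based check behaves, the sketch below repeats the function so the block is self-contained and runs it against two made-up inputs. The sample strings are assumptions chosen for illustration, not data from the question.

```powershell
function IsValidEmail {
    param([string]$EmailAddress)
    try {
        # The cast throws if the string cannot be parsed as a mail address.
        $null = [mailaddress]$EmailAddress
        return $true
    }
    catch {
        return $false
    }
}

# Illustrative inputs only: one well-formed address and one string with no @ sign.
$samples = 'jane.doe@example.com', 'no-at-sign'
$results = foreach ($s in $samples) {
    [pscustomobject]@{ Input = $s; IsValid = (IsValidEmail $s) }
}
$results
```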
When exporting the data, I'd consider collecting all the results at once in memory and then writing it to file once, at the end.
If you're using PowerShell version 2 or 3, you can split the rows with two passes of Where-Object:
$EmailAddresses = Import-Csv "C:\Emails\OriginalEmails\emailAddresses.csv" -Header FirstName, LastName, Email, Address, City, State, ZipCode | Select -Skip 1
$valid = $EmailAddresses |Where-Object {IsValidEmail $_.Email}
$invalid = $EmailAddresses |Where-Object {-not(IsValidEmail $_.Email)}
If you're using PowerShell version 4.0 or newer, I'd suggest using the .Where() extension method in Split mode instead:
$EmailAddresses = Import-Csv "C:\Emails\OriginalEmails\emailAddresses.csv" -Header FirstName, LastName, Email, Address, City, State, ZipCode | Select -Skip 1
$valid,$invalid = $EmailAddresses.Where({IsValidEmail $_.Email}, 'Split')
Then export each set to file:
if($valid.Count -gt 0){
$valid |Export-Csv "C:\Emails\ValidEmails\ValidEmails.csv" -NoTypeInformation
}
if($invalid.Count -gt 0){
$invalid |Export-Csv "C:\Emails\InValidEmails\InValidEmails.csv" -NoTypeInformation
}
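To see what Split mode returns without touching any files, here is a small sketch using in-memory rows. The row data is made up, and the inline `-as [mailaddress]` test stands in for IsValidEmail so that the block is self-contained; it assumes PowerShell 4.0 or newer.

```powershell
# Illustrative rows standing in for the imported CSV.
$rows = @(
    [pscustomobject]@{ FirstName = 'Ann'; Email = 'ann@example.com' }
    [pscustomobject]@{ FirstName = 'Bob'; Email = 'no-at-sign' }
    [pscustomobject]@{ FirstName = 'Cy';  Email = 'cy@example.org' }
)

# -as returns $null when the conversion fails, which is treated as $false.
# Split mode returns two collections: rows that passed, then rows that did not.
$valid, $invalid = $rows.Where({ $_.Email -as [mailaddress] }, 'Split')

"Valid:   $($valid.FirstName -join ', ')"
"Invalid: $($invalid.FirstName -join ', ')"
```

Because both collections come out of a single pass, each row is only validated once.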
You can just use the -match operator, instead of calling into the [Regex] class. Here's a simple example, without any wrapper function:
$EmailRegex = '^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
$EmailList = @('a@a.com', 'b@b.co', 'm.a@example.il')
foreach ($Email in $EmailList) {
$DidItMatch = $Email -match $EmailRegex
if ($DidItMatch) {
# It matched! Do something.
}
else {
# It didn't match
}
}
FYI, when you use the -match operator, if it returns boolean $true, then PowerShell automatically populates a built-in (aka. "automatic") variable called $matches. To avoid unexpected behavior, you might want to reset this variable to $null during each iteration, or just wrap it in a function as you did in your original example. This will keep the variable scoped to the function level, as long as you don't declare it in one of the parent scopes.
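The kind of unexpected behavior meant here can be sketched as follows: a failed -match leaves the previous contents of $Matches in place. The input strings and variable names are illustrative assumptions.

```powershell
# First comparison succeeds and populates $Matches.
$null = 'order 123' -match '(\d+)'
$afterHit = $Matches[1]

# Second comparison fails; $Matches is not cleared, so it still holds the old capture.
$isMatch = 'no digits here' -match '(\d+)'
$afterMiss = $Matches[1]

"Second match succeeded: $isMatch"
"Capture after the successful match: $afterHit"
"Capture after the failed match:     $afterMiss"
```

This is why it is safer to read $Matches only inside the branch where -match returned $true.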
Once you've validated the e-mail address, you can append it to your existing CSV file, using:
[pscustomobject]@{ Email = $Email } | Export-Csv -Append -Path filepath.csv -NoTypeInformation
For efficiency with the available filesystem resources, you'll probably want to buffer a few e-mail addresses in memory, before appending them to your target CSV file.
# Initialize a couple array buffers
$ValidEmails = @()
$InvalidEmails = @()
if ($ValidEmails.Count -gt 50) {
# Run the CSV export here
}
if ($InvalidEmails.Count -gt 50) {
# Run the CSV export here
}
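One way the buffering idea could be fleshed out is sketched below. The names `$outFile`, `$batchSize` and `Save-Buffer` are hypothetical, the batch size is kept tiny so the flush is easy to follow, and the output goes to a temporary file rather than the real CSV.

```powershell
# Hypothetical names and values, chosen only for illustration.
$outFile   = Join-Path ([System.IO.Path]::GetTempPath()) 'ValidEmails-sketch.csv'
$batchSize = 2
$buffer    = New-Object 'System.Collections.Generic.List[object]'
if (Test-Path $outFile) { Remove-Item $outFile }

function Save-Buffer {
    param($Buffer, [string]$Path)
    # Append the buffered rows in one write, then empty the buffer.
    if ($Buffer.Count -gt 0) {
        $Buffer | Export-Csv -Path $Path -Append -NoTypeInformation
        $Buffer.Clear()
    }
}

foreach ($address in 'a@example.com', 'b@example.com', 'c@example.com') {
    $buffer.Add([pscustomobject]@{ Email = $address })
    if ($buffer.Count -ge $batchSize) { Save-Buffer -Buffer $buffer -Path $outFile }
}

# Flush whatever is left after the loop.
Save-Buffer -Buffer $buffer -Path $outFile

$written = @(Import-Csv $outFile)
"Rows written: $($written.Count)"
```

The final flush after the loop matters: without it, any rows below the batch threshold would never reach the file.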
If you need further help, can you please edit your question and clarify what isn't working for you?
Each of the current top 2 answers here has one significant deficiency:
@Trevor's answer would do just fine, until you supply it this:
John Doe <johndoe@somewhere.com>
@Mathias' answer preaches about accommodating exceptional (yet valid) addresses such as those with non-ASCII or no TLD suffix. The following addresses all validate successfully with the [mailaddress] casting:
olly@somewhere | olly@somewhere. | olly@somewhere...com etc
If, like me, you will not be entertaining these edge cases into your email databases, then a combination of both ideas might prove more useful, like so:
function IsValidEmail {
param([string]$Email)
$Regex = '^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
try {
$obj = [mailaddress]$Email
if($obj.Address -match $Regex){
return $True
}
return $False
}
catch {
return $False
}
}
Perhaps there is a performance overhead with creating $obj for every email address on a possibly long mailing list. But I guess that's another matter.
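To illustrate why the combined function matches the regex against `$obj.Address` rather than the raw input, here is a short sketch using the display-name example from above. The variable names are illustrative.

```powershell
$Regex = '^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
$raw   = 'John Doe <johndoe@somewhere.com>'

# The cast separates the display name from the actual address.
$parsed = [mailaddress]$raw

$rawMatches     = $raw -match $Regex             # regex alone rejects the display-name form
$addressMatches = $parsed.Address -match $Regex  # regex accepts the extracted address

"DisplayName: $($parsed.DisplayName)"
"Address:     $($parsed.Address)"
"Raw input matches regex:         $rawMatches"
"Extracted address matches regex: $addressMatches"
```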
You can use the mailaddress type to ensure it meets RFC, but you will likely still want to make sure the domain is valid:
Resolve-DnsName -Name ('vertigoray@example.com' -as [mailaddress]).Host -Type 'MX'
Works well as a validation script for a function parameter:
function Assert-FromEmail {
param(
[Parameter(Mandatory = $true)]
[ValidateScript({ Resolve-DnsName -Name $_.Host -Type 'MX' })]
[mailaddress]
$From
)
Write-Output $From
}
Output examples of that function on success:
PS > Assert-FromEmail -From vertigoray@example.com
DisplayName User       Host        Address
----------- ----       ----        -------
            vertigoray example.com vertigoray@example.com
Output examples of that function on failure:
PS > Assert-FromEmail -From vertigoray@example..com
Assert-FromEmail : Cannot validate argument on parameter 'From'. The " Resolve-DnsName -Name $_.Host -Type 'MX' " validation script for the argument with value "vertigoray@example..com" did not return a result of True. Determine why the validation script failed, and then try the command again.
At line:1 char:24
+ Assert-FromEmail -From vertigoray@example..com
+ ~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Assert-FromEmail], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Assert-FromEmail
Here is one I wrote up and tested; it has not failed me in any environment to date. I'm not saying it won't fail in someone else's, but for me it's been 100%.
$SomeEmailAddresses = @'
From:JoeBob@yahoo.com,Tom TheCat tcat@snailmail.net,jerry@snailmail.net
To:TulaJane@hotmail.com;JF@gmail.com;tiger@outlook.com;
Doug Tompson DTompson@icloud.com
MailTo:BobsYourUncle@protonmail.com;
johnny.bravo@yahoo.co.uk
'@
(((Select-String -InputObject $SomeEmailAddresses `
-Pattern '\w+@\w+\.\w+|\w+\.\w+@\w+\.\w+\.\w+' `
-AllMatches).Matches).Value)
Results
JoeBob@yahoo.com
tcat@snailmail.net
jerry@snailmail.net
TulaJane@hotmail.com
JF@gmail.com
tiger@outlook.com
DTompson@icloud.com
BobsYourUncle@protonmail.com
johnny.bravo@yahoo.co.uk
@postanote
This common email formatting fails:
$SomeEmailAddresses = @'
First A. Last first.a.last@gmail.com.
'@
(((Select-String -InputObject $SomeEmailAddresses -Pattern '\w+@\w+\.\w+|\w+\.\w+@\w+\.\w+\.\w+' -AllMatches).Matches).Value)
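To make the failure concrete, the sketch below runs the same pattern over that input with [regex]::Matches and captures what comes back. The variable names are illustrative. Neither alternative in the pattern allows for two dots in the local part, so the match can only start at the last word before the @ sign and the result is truncated.

```powershell
$text    = 'First A. Last first.a.last@gmail.com.'
$pattern = '\w+@\w+\.\w+|\w+\.\w+@\w+\.\w+\.\w+'

# Collect every match value; only one match is found for this input.
$found = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Value }

"Expected: first.a.last@gmail.com"
"Found:    $found"   # last@gmail.com
```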
Here is the code I use.
The regex does not support the following because the major email players do not support them:
Domains as IP addresses.
Space and special characters "(),:;<>@[] inside a quoted string in local-part.
Comments within parentheses in local-part.
$regexPattern = "^(?(?=^(?:([a-zA-Z0-9_!#$%&'+-/=?^{|}~]+|[a-zA-Z0-9_!#$%&'*+\-\/=?^{|}~].[a-zA-Z0-9_!#$%&'+-/=?^{|}~][\.a-zA-Z0-9_!#$%&'*+\-\/=?^{|}~]))@[a-zA-Z0-9.-]{1,63}$)[a-zA-Z0-9_.!#$%&'*+-/=?^`{|}~]{1,63}@[a-zA-Z0-9-]+(?:.[a-zA-Z0-9-]{2,})+)$"
# $email holds the address to test
$email -match $regexPattern