Remove duplicate values from hashtables - powershell

I am new to PowerShell and I need to remove duplicate values from my hashtable, e.g. "c" has a duplicate 3 ("3,4,3") and "d" has a duplicate 1 ("1,1,6,4").
[hashtable]$alpha = @{
"a" = "1";
"b" = "2,1,5";
"c" = "3,4,3";
"d" = "1,1,6,4";
"e" = "1,7,9,0";
}
How can I get the result?
I have already tried Select-Object -Unique, but it doesn't work.

In order to use Select-Object -Unique, your values must be in a collection (array), not inside a single string.
Thus, you must first split the string into an array of tokens (-split operator), and afterwards reassemble the distinct tokens into a string (-join operator).
foreach ($key in @($alpha.Keys)) {
$alpha.$key = ($alpha.$key -split ',' | Select-Object -Unique) -join ','
}
Note the @(...) around $alpha.Keys: it effectively clones the keys collection. This is necessary because modifying a collection while it is being enumerated is not supported and would otherwise cause an error.
In PSv3+ you could use $alpha.Keys.Clone() instead, which is more efficient.
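To see the three steps in isolation, here is a minimal sketch applied to a single sample string. The variable names $value, $tokens, $unique and $result are only illustrative and do not appear in the question.

```powershell
# Sample value with a duplicate token (entry "d" from the question).
$value = '1,1,6,4'

# Step 1: split the string into an array of tokens.
$tokens = $value -split ','

# Step 2: keep only the first occurrence of each token (input order is preserved).
$unique = $tokens | Select-Object -Unique

# Step 3: reassemble the distinct tokens into a single string.
$result = $unique -join ','

$result  # 1,6,4
```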
You could also use the pipeline with a call to the ForEach-Object (%) cmdlet (rather than an expression with the foreach loop), but for in-memory data structures a foreach loop is generally the better and faster choice.
For the sake of completeness, here's the pipeline solution (tip of the hat to WayneA's answer for refining it):
@($alpha.Keys) | ForEach-Object {
$alpha[$_] = ($alpha[$_] -split ',' | Select-Object -Unique) -join ','
}
Note that @(...) is still needed around $alpha.Keys, as in the foreach statement solution.
As in the foreach solution, this modifies the hashtable in place.
Another option is to use $alpha.GetEnumerator() in order to force PowerShell to send the hashtable's entries one by one through the pipeline, as [System.Collections.DictionaryEntry] instances representing key-value pairs with a .Key and a .Value property - by default, PowerShell outputs hashtables as a whole.
However, that necessitates creating a new hashtable, because you fundamentally cannot modify a collection being enumerated with .GetEnumerator() (as is also implicitly used by a foreach loop).
$newAlpha = @{} # initialize the new hash table
$alpha.GetEnumerator() | ForEach-Object {
$newAlpha[$_.Key] = ($alpha[$_.Key] -split ',' | Select-Object -Unique) -join ','
}
# $newAlpha now contains the updated entries.

As mentioned, those are strings; split them first and rejoin them afterwards:
$NewAlpha = $Alpha.GetEnumerator() |% { $_.value = ($_.value -split "," | select -unique) -join "," ; $_}
Note: this does not preserve the [hashtable] type.
To do that, you will need to employ the approach mentioned in the answer provided by mklement0:
@($alpha.Keys) |% {$alpha.$_ = ($alpha.$_ -split ',' | Select-Object -Unique) -join ','}

Related

Powershell: Import-csv, rename all headers

In our company there are many users and many applications with restricted access, and a database that records those accesses. I don't have access to that database, but what I do have is an automatically generated (once a day) CSV file with all the accesses of all my users. I want them to have a chance to check their access situation, so I am writing a simple PowerShell script for this purpose.
CSV:
user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1
I can do:
import-csv foo.csv | where {$_.user -eq $user}
But this will show me the original ugly headers (with the "_dat" suffix). Can I delete the last four characters from every header that ends with "_dat" when I can't predict how many headers there will be tomorrow?
I am aware of calculated property like:
Select-Object @{ expression={$_.database1_dat}; label='database1' }
but I have to know all column names for that, as far as I know.
Am I condemned to "over-engineer" it with a separate function and build the whole "calculated property expression" from scratch dynamically, or is there a simple way I am missing?
Thanks :-)
Assuming that file foo.csv fits into memory as a whole, the following solution performs well (if you need a memory-throttled - but invariably much slower - solution, see Santiago Squarzon's helpful answer or the alternative approach in the bottom section):
$headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2
# You can pipe the result to `where {$_.user -eq $user}`
ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';'
Get-Content -Raw reads the entire file into memory, which is much faster than reading it line by line (the default).
-split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.
Regex \r?\n matches a newline (both a CRLF (\r\n) and a LF-only newline (\n))
, 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
The multi-assignment ($headerRow, $dataRows = ...) then stores the first token (the header line) in $headerRow and the remainder (all data rows) in $dataRows.
$headerRow -replace '_dat(?=;|$)'
-replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
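The steps above can be tried without a file. The following is a minimal sketch that uses a hypothetical in-memory string ($csvText) as a stand-in for the contents of foo.csv; $cleanHeader and $rows are likewise illustrative names.

```powershell
# Hypothetical in-memory stand-in for what Get-Content -Raw foo.csv would return.
$csvText = "user;database1_dat;database2_dat`njohn;0;0`npeter;1;0"

# Split into the header line and the remaining text (all data rows as one string).
$headerRow, $dataRows = $csvText -split '\r?\n', 2

# Strip the _dat suffix from each column name in the header line.
$cleanHeader = $headerRow -replace '_dat(?=;|$)'

# Parse the cleaned-up header line plus the untouched data rows.
$rows = ConvertFrom-Csv -InputObject $cleanHeader, $dataRows -Delimiter ';'

# Show only the row for one user.
$rows | Where-Object { $_.user -eq 'peter' }
```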
Alternative solution: algorithmic renaming of an object's properties:
Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.
As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.
However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:
Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
# Initialize an aux. ordered hashtable to store the renamed
# property name-value pairs.
$renamedProperties = [ordered] @{}
# Process all properties of the input object and
# add them with cleaned-up names to the hashtable.
foreach ($prop in $_.psobject.Properties) {
$renamedProperties[($prop.Name -replace '_dat$')] = $prop.Value
}
# Convert the aux. hashtable to a custom object and output it.
[pscustomobject] $renamedProperties
}
You can do something like this:
$textInfo = (Get-Culture).TextInfo
$headers = (Get-Content .\test.csv | Select-Object -First 1).Split(';') |
ForEach-Object {
$textInfo.ToTitleCase($_) -replace '_dat'
}
$user = 'peter'
Get-Content .\test.csv | Select-Object -Skip 1 |
ConvertFrom-Csv -Delimiter ';' -Header $headers |
Where-Object User -EQ $user
User  Database1 Database2 Database3
----  --------- --------- ---------
peter 1         0         1
Not super efficient but does the trick.

PowerShell - Convert Property Names from Pascal Case to Upper Case With Underscores

Let's say I have an object like this:
$test = @{
ThisIsTheFirstColumn = "ValueInFirstColumn";
ThisIsTheSecondColumn = "ValueInSecondColumn"
}
and I want to end up with:
$test = @{
THIS_IS_THE_FIRST_COLUMN = "ValueInFirstColumn";
THIS_IS_THE_SECOND_COLUMN = "ValueInSecondColumn"
}
without manually coding the new column names.
This shows me the values I want:
$test.PsObject.Properties | where-object { $_.Name -eq "Keys" } | select -expand value | foreach{ ($_.substring(0,1).toupper() + $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()} | Out-Host
which results in:
THIS_IS_THE_FIRST_COLUMN
THIS_IS_THE_SECOND_COLUMN
but now I can't seem to figure out how to assign these new values back to the object.
You can modify hashtable $test in place as follows:
foreach($key in @($test.Keys)) { # !! @(...) is required - see below.
$value = $test[$key] # save value
$test.Remove($key) # remove old entry
# Recreate the entry with the transformed name.
$test[($key -creplace '(?<!^)\p{Lu}', '_$&').ToUpper()] = $value
}
@($test.Keys) creates an array from the existing hashtable keys; @(...) ensures that the key collection is copied to a static array, because using the .Keys property directly in a loop that modifies the same hashtable would break.
The loop body saves the value for the input key at hand and then removes the entry under its old name.[1]
The entry is then recreated under its new key name using the desired name transformation:
$key -creplace '(?<!^)\p{Lu}', '_$&' matches every uppercase letter (\p{Lu}) in a given key, except at the start of the string ((?<!^)), and replaces it with _ followed by that letter (_$&); converting the result to uppercase (.ToUpper()) yields the desired name.
[1] Removing the old entry before adding the renamed one avoids problems with single-word names such as Simplest, whose transformed name, SIMPLEST, is considered the same name due to the case-insensitivity of hashtables in PowerShell. Thus, assigning a value to entry SIMPLEST while entry Simplest still exists actually targets the existing entry, and the subsequent $test.Remove($key) would then simply remove that entry, without having added a new one.
Tip of the hat to JosefZ for pointing out the problem.
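Both points, the name transformation and the case-insensitivity pitfall from footnote [1], can be checked in isolation. This is a small sketch with illustrative sample data; $names, $converted and $h are not part of the original answer.

```powershell
# Illustrative sample keys: one multi-word name and one single-word name.
$names = 'ThisIsTheFirstColumn', 'Simplest'

# Insert "_" before every uppercase letter except the first, then uppercase everything.
$converted = $names | ForEach-Object { ($_ -creplace '(?<!^)\p{Lu}', '_$&').ToUpper() }
$converted   # THIS_IS_THE_FIRST_COLUMN, SIMPLEST

# Hashtable literals are case-insensitive, so both spellings address the same entry.
$h = @{ Simplest = 'ValueInSimplest' }
$h.ContainsKey('SIMPLEST')   # True
$h['SIMPLEST'] = 'changed'   # updates the existing entry; no new key is added
$h.Count                     # 1
```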
I wonder if it is possible to do it in place on the original object?
($test.PsObject.Properties|Where-Object {$_.Name -eq "Keys"}).IsSettable says False. Hence, you need to do it in two steps as follows:
$test = @{
ThisIsTheFirstColumn = "ValueInFirstColumn";
ThisIsTheSecondColumn = "ValueInSecondColumn"
}
$auxarr = $test.PsObject.Properties |
Where-Object { $_.Name -eq "Keys" } |
select -ExpandProperty value
$auxarr | ForEach-Object {
$aux = ($_.substring(0,1).toupper() +
$_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
$test.ADD( $aux, $test.$_)
$test.Remove( $_)
}
$test
The two-step approach is necessary because an attempt to call the Remove and Add methods in a single pipeline leads to the following error:
select : Collection was modified; enumeration operation may not execute.
Edit. Unfortunately, the above solution would fail in the case of a one-word Pascal Case key, e.g. for Simplest = "ValueInSimplest". Here's the improved script:
$test = @{
ThisIsTheFirstColumn = "ValueInFirstColumn";
ThisIsTheSecondColumn = "ValueInSecondColumn"
Simplest = "ValueInSimplest" # the simplest (one word) PascalCase
}
$auxarr = $test.PsObject.Properties |
Where-Object { $_.Name -eq "Keys" } |
select -ExpandProperty value
$auxarr | ForEach-Object {
$aux = ($_.substring(0,1).toupper() +
$_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
$newvalue = $test.$_
$test.Remove( $_)
$test.Add( $aux, $newvalue)
}
$test
This seems to work. I ended up putting stuff in a new hashtable, though.
$test = @{
ThisIsTheFirstColumn = "ValueInFirstColumn";
ThisIsTheSecondColumn = "ValueInSecondColumn"
}
$test2=@{}
$test.PsObject.Properties |
where-object { $_.Name -eq "Keys" } |
select -expand value | foreach{ $originalPropertyName=$_
$prop=($_.substring(0,1).toupper() + $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
$test2.Add($prop,$test[$originalPropertyName])
}
$test2

Count unique numbers in CSV (PowerShell or Notepad++)

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as an array of lines (strings).
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
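For a quick check without a file on disk, the object-based approach can be run against the question's sample data held in a string. This is a sketch only: $csv and $uniqueValues are illustrative names, and ConvertFrom-Csv stands in for Import-Csv.

```powershell
# The sample data from the question, as an in-memory string instead of C:\test.csv.
$csv = @'
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
'@

# Collect every non-empty cell value from every row, then de-duplicate.
$uniqueValues = $csv | ConvertFrom-Csv |
    ForEach-Object { $_.psobject.Properties.Value -ne '' } |
    Sort-Object -Unique

$uniqueValues.Count  # 6
```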
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1], so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] @{ foo ='bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

What's the best way in PowerShell to parse these strings?

I'm getting two strings passed into my script:
"Project1,Project2,Project3,Project4"
"web,batch,web,components"
The strings come from a tool in our DevOps toolchain and I have no control over the input format. String 1 could be any number of projects. String 2 will be the same number of entries with the "type" of the project in string 1.
I need to emit one string for each distinct type in the second string that contains the projects from the first string:
"Project1,Project3"
"Project2"
"Project4"
I know I can do it with a bunch of nested foreach loops. Is there a way to do this with a hashtable and/or arrays?
You can turn the original input strings into arrays with the -split operator:
$ProjectNames = "Project1,Project2,Project3,Project4" -split ','
$ProjectTypes = "web,batch,web,components" -split ','
Then create an empty hash table to contain the type-to-project-name mappings:
$ProjectsByType = @{}
Finally iterate over the two arrays to group the project names by type:
for($i = 0; $i -lt $ProjectNames.Count; $i++){
if(-not $ProjectsByType.ContainsKey($ProjectTypes[$i])){
# Create key and entry as array if it doesn't already exist
$ProjectsByType[$ProjectTypes[$i]] = @()
}
# Add the project to the appropriate project type key
$ProjectsByType[$ProjectTypes[$i]] += $ProjectNames[$i]
}
Now you can produce your desired strings grouped by project type:
$ProjectsByType.Keys |ForEach-Object {
$ProjectsByType[$_] -join ','
}
You could also create objects from the two arrays and use Group-Object to group them:
$Projects = for($i = 0; $i -lt $ProjectNames.Count; $i++){
New-Object psobject -Property @{
Name = $ProjectNames[$i]
Type = $ProjectTypes[$i]
}
}
$Projects |Group-Object -Property Type
This is more interesting if you want to do further processing of the projects; if you just need the strings, the first approach is easier.
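If you take the Group-Object route and still want the comma-separated strings, the groups can be flattened afterwards. The following sketch repeats the setup so that it runs on its own; $lines is an illustrative name, and the order in which the groups are emitted may differ between PowerShell versions.

```powershell
$ProjectNames = 'Project1,Project2,Project3,Project4' -split ','
$ProjectTypes = 'web,batch,web,components' -split ','

# Pair each project name with its type.
$Projects = for ($i = 0; $i -lt $ProjectNames.Count; $i++) {
    New-Object psobject -Property @{
        Name = $ProjectNames[$i]
        Type = $ProjectTypes[$i]
    }
}

# Emit one comma-separated string of project names per distinct type.
$lines = $Projects | Group-Object -Property Type | ForEach-Object {
    ($_.Group | Select-Object -ExpandProperty Name) -join ','
}
$lines
```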
There isn't really an elegant way of combining two arrays that way with built-in methods. A somewhat convoluted way would be the following:
$projects = $projectString -split ','
$types = $typeString -split ','
0..($projects.Count - 1) | group { $types[$_] } | % { $projects[$_.Group] -join ',' }
However, this first generates indices into the arrays to group and format them later, which is inherently a bit iffy (and not very understandable). I tend to pre-process the data to actually reflect what I'm operating on:
$projects = $projectString -split ','
$types = $typeString -split ','
$projectsWithType = 0..($projects.Count - 1) | % {
[pscustomobject]@{
Project = $projects[$_]
Type = $types[$_]
}
}
$projectsWithType | group Type | % { $_.Group.Project -join ',' }
This makes the actual data munging task much clearer.
With only one pass over the first list:
$projects = "Project1,Project2,Project3,Project4" -split ','
$types = "web,batch,web,components" -split ','
$linenumber = 0
$projects |%{New-Object psObject -Property @{Project=$_;TypeProject= $types[$linenumber]};$linenumber++} |
group TypeProject |
select Name, @{N="Projects";E={$_.Group.Project -join ","}}

How to join array in pipe

I wish to join the result from a pipe.
I tried using -join
PS> type .\bleh.log | where { $_ -match "foo"} | select -uniq | $_ -join ','
But that gives me this error:
Expressions are only allowed as the first element of a pipeline.
You could try this:
@(type .\bleh.log | where { $_ -match "foo"} | select -uniq) -join ","
You would need a Foreach-Object (alias %) after the last pipe to have the $_ variable available but it wouldn't help since it holds a single cell value (for each loop iteration).
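To illustrate the same idea without a log file, here is a minimal sketch that uses a hypothetical in-memory array ($lines) in place of the output of type .\bleh.log; $joined is also just an illustrative name.

```powershell
# Illustrative stand-in for the lines of .\bleh.log.
$lines = 'foo 1', 'bar', 'foo 2', 'foo 1'

# Wrap the whole pipeline in @(...) so that its output is collected into an array,
# then apply -join to that array as an ordinary expression.
$joined = @($lines | Where-Object { $_ -match 'foo' } | Select-Object -Unique) -join ','

$joined  # foo 1,foo 2
```

The key point is that -join operates on the collected result of the pipeline rather than being a pipeline stage itself.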