What's the best way in PowerShell to parse these strings?


I'm getting two strings passed into my script:
"Project1,Project2,Project3,Project4"
"web,batch,web,components"
The strings come from a tool in our DevOps toolchain and I have no control over the input format. String 1 could be any number of projects. String 2 will be the same number of entries with the "type" of the project in string 1.
I need to emit one string for each distinct type in the second string that contains the projects from the first string:
"Project1,Project3"
"Project2"
"Project4"
I know I can do it with a bunch of nested foreach loops. Is there a way to do this with a hashtable and/or arrays?

You can turn the original input strings into arrays with the -split operator:
$ProjectNames = "Project1,Project2,Project3,Project4" -split ','
$ProjectTypes = "web,batch,web,components" -split ','
Then create an empty hash table to contain the type-to-project-name mappings:
$ProjectsByType = @{}
Finally, iterate over the two arrays to group the project names by type:
for ($i = 0; $i -lt $ProjectNames.Count; $i++) {
    if (-not $ProjectsByType.ContainsKey($ProjectTypes[$i])) {
        # Create key and entry as array if it doesn't already exist
        $ProjectsByType[$ProjectTypes[$i]] = @()
    }
    # Add the project to the appropriate project type key
    $ProjectsByType[$ProjectTypes[$i]] += $ProjectNames[$i]
}
Now you can produce your desired strings grouped by project type:
$ProjectsByType.Keys | ForEach-Object {
    $ProjectsByType[$_] -join ','
}
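The steps above can be wrapped into one function, as a minimal sketch. The function name Group-ProjectByType and its parameter names are hypothetical, not part of the original answer. It also swaps the plain hashtable for an [ordered] dictionary, on the assumption that you want the types emitted in the order they first appear (a plain hashtable makes no ordering guarantee).

```powershell
# Hypothetical helper: name and parameters are illustrative only.
function Group-ProjectByType {
    param(
        [string]$ProjectString,
        [string]$TypeString
    )

    $names = $ProjectString -split ','
    $types = $TypeString -split ','
    if ($names.Count -ne $types.Count) {
        throw "Expected the same number of projects and types."
    }

    # [ordered] keeps the keys in insertion order (first appearance of each type).
    $byType = [ordered]@{}
    for ($i = 0; $i -lt $names.Count; $i++) {
        $type = $types[$i]
        if (-not $byType.Contains($type)) {
            $byType[$type] = @()
        }
        $byType[$type] += $names[$i]
    }

    # Emit one comma-separated string per type.
    foreach ($type in $byType.Keys) {
        $byType[$type] -join ','
    }
}

$lines = Group-ProjectByType -ProjectString "Project1,Project2,Project3,Project4" -TypeString "web,batch,web,components"
$lines
```

With the sample input from the question, this should emit the three strings in the order web, batch, components.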
You could also create objects from the two arrays and use Group-Object to group them:
$Projects = for ($i = 0; $i -lt $ProjectNames.Count; $i++) {
    New-Object psobject -Property @{
        Name = $ProjectNames[$i]
        Type = $ProjectTypes[$i]
    }
}
$Projects | Group-Object -Property Type
This is more interesting if you want to do further processing of the projects. If you just need the strings, the first approach is easier.
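As an illustration of what "further processing" might look like, here is a small sketch that builds the same kind of objects and then uses Group-Object's -AsHashTable and -NoElement switches. The variable names $ByType, $webProjects, and $typeCounts are made up for this example.

```powershell
$ProjectNames = "Project1,Project2,Project3,Project4" -split ','
$ProjectTypes = "web,batch,web,components" -split ','

# Pair each project name with the type at the same index.
$Projects = for ($i = 0; $i -lt $ProjectNames.Count; $i++) {
    [pscustomobject]@{ Name = $ProjectNames[$i]; Type = $ProjectTypes[$i] }
}

# -AsHashTable returns a lookup table keyed by type; -AsString makes the keys plain strings.
$ByType = $Projects | Group-Object -Property Type -AsHashTable -AsString

# Look up one type directly and join its project names.
$webProjects = $ByType['web'].Name -join ','

# -NoElement keeps only the name and count of each group.
$typeCounts = $Projects | Group-Object -Property Type -NoElement

$webProjects
$typeCounts
```

The lookup table is handy when later steps need the projects of one specific type, while the count-only grouping gives a quick summary.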

There isn't really an elegant way of combining two arrays that way with built-in methods. A somewhat convoluted way would be the following:
$projects = $projectString -split ','
$types = $typeString -split ','
0..($projects.Count - 1) | group { $types[$_] } | % { $projects[$_.Group] -join ',' }
However, this first generates indices into the arrays to group and format them later, which is inherently a bit iffy (and not very understandable). I tend to pre-process the data to actually reflect what I'm operating on:
$projects = $projectString -split ','
$types = $typeString -split ','
$projectsWithType = 0..($projects.Count - 1) | % {
    [pscustomobject]@{
        Project = $projects[$_]
        Type = $types[$_]
    }
}
$projectsWithType | group Type | % { $_.Group.Project -join ',' }
This makes the actual data munging task much clearer.

With only one pass over the first list:
$projects = "Project1,Project2,Project3,Project4" -split ','
$types = "web,batch,web,components" -split ','
$linenumber = 0
$projects | % { New-Object psObject -Property @{Project=$_; TypeProject=$types[$linenumber]}; $linenumber++ } |
    group TypeProject |
    select Name, @{N="Projects"; E={$_.Group.Project -join ","}}

Related

PowerShell: Find unique values from multiple CSV files

let's say that I have several CSV files and I need to check a specific column and find values that exist in one file, but not in any of the others. I'm having a bit of trouble coming up with the best way to go about it as I wanted to use Compare-Object and possibly keep all columns and not just the one that contains the values I'm checking.
So I do indeed have several CSV files and they all have a Service Code column, and I'm trying to create a list for each Service Code that only appears in one file. So I would have "Service Codes only in CSV1", "Service Codes only in CSV2", etc.
Based on some testing and a semi-related question, I've come up with a workable solution, but with all of the nesting and For loops, I'm wondering if there is a more elegant method out there.
Here's what I do have:
$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
    $TempHashSet = [System.Collections.Generic.HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code")
    $HashList.Add($TempHashSet)
}
$FinalHashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $HashList.Count; $i++){
    $UniqueHS = [System.Collections.Generic.HashSet[String]]::New($HashList[$i])
    For ($j = 0; $j -lt $HashList.Count; $j++){
        #Skip the check when the HashSet would be compared to itself
        If ($j -eq $i){Continue}
        $UniqueHS.ExceptWith($HashList[$j])
    }
    $FinalHashList.Add($UniqueHS)
}
It seems a bit messy to me using so many different .NET references, and I know I could make it cleaner with a tag to say using namespace System.Collections.Generic, but I'm wondering if there is a way to make it work using Compare-Object which was my first attempt, or even just a simpler/more efficient method to filter each file.

I believe I found an "elegant" solution based on Group-Object, using only a single pipeline:
# Import all CSV files.
Get-ChildItem $PSScriptRoot\csv\*.csv -File -PipelineVariable file | Import-Csv |
# Add new column "FileName" to distinguish the files.
Select-Object *, @{ label = 'FileName'; expression = { $file.Name } } |
# Group by ServiceCode to get a list of files per distinct value.
Group-Object ServiceCode |
# Filter by ServiceCode values that exist only in a single file.
# Sort-Object -Unique takes care of possible duplicates within a single file.
Where-Object { ( $_.Group.FileName | Sort-Object -Unique ).Count -eq 1 } |
# Expand the groups so we get the original object structure back.
ForEach-Object Group |
# Format-Table requires sorting by FileName, for -GroupBy.
Sort-Object FileName |
# Finally pretty-print the result.
Format-Table -Property ServiceCode, Foo -GroupBy FileName
Test Input
a.csv:
ServiceCode,Foo
1,fop
2,fip
3,fap
b.csv:
ServiceCode,Foo
6,bar
6,baz
3,bam
2,bir
4,biz
c.csv:
ServiceCode,Foo
2,bla
5,blu
1,bli
Output
FileName: b.csv
ServiceCode Foo
----------- ---
4 biz
6 bar
6 baz
FileName: c.csv
ServiceCode Foo
----------- ---
5 blu
Looks correct to me. The values 1, 2 and 3 are duplicated between multiple files, so they are excluded. 4, 5 and 6 exist only in single files, while 6 is a duplicate value only within a single file.
Understanding the code
Maybe it is easier to understand how this code works, by looking at the intermediate output of the pipeline produced by the Group-Object line:
Count Name Group
----- ---- -----
2 1 {@{ServiceCode=1; Foo=fop; FileName=a.csv}, @{ServiceCode=1; Foo=bli; FileName=c.csv}}
3 2 {@{ServiceCode=2; Foo=fip; FileName=a.csv}, @{ServiceCode=2; Foo=bir; FileName=b.csv}, @{ServiceCode=2; Foo=bla; FileName=c.csv}}
2 3 {@{ServiceCode=3; Foo=fap; FileName=a.csv}, @{ServiceCode=3; Foo=bam; FileName=b.csv}}
1 4 {@{ServiceCode=4; Foo=biz; FileName=b.csv}}
1 5 {@{ServiceCode=5; Foo=blu; FileName=c.csv}}
2 6 {@{ServiceCode=6; Foo=bar; FileName=b.csv}, @{ServiceCode=6; Foo=baz; FileName=b.csv}}
Here the Name contains the unique ServiceCode values, while Group "links" the data to the files.
From here it should already be clear how to find values that exist only in single files. If duplicate ServiceCode values within a single file were not allowed, we could even simplify the filter to Where-Object Count -eq 1. Since it was stated that dupes within single files may exist, we need the Sort-Object -Unique to count multiple equal file names within a group as only one.
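To try the grouping logic without creating files on disk, here is a self-contained sketch that feeds the test input above through ConvertFrom-Csv. The in-memory $files table and the variable names $rows, $uniqueRows, and $uniqueCodes are assumptions made for this example; the filter itself is the same as in the pipeline above.

```powershell
# The three test files from above, held in memory as arrays of CSV lines.
$files = [ordered]@{
    'a.csv' = 'ServiceCode,Foo', '1,fop', '2,fip', '3,fap'
    'b.csv' = 'ServiceCode,Foo', '6,bar', '6,baz', '3,bam', '2,bir', '4,biz'
    'c.csv' = 'ServiceCode,Foo', '2,bla', '5,blu', '1,bli'
}

# Parse each "file" and tag every row with the name of the file it came from.
$rows = foreach ($name in $files.Keys) {
    $files[$name] | ConvertFrom-Csv |
        Select-Object *, @{ Name = 'FileName'; Expression = { $name } }
}

# Keep only the groups whose rows all come from one file.
$uniqueRows = $rows | Group-Object ServiceCode |
    Where-Object { @($_.Group.FileName | Sort-Object -Unique).Count -eq 1 } |
    ForEach-Object { $_.Group }

$uniqueCodes = $uniqueRows.ServiceCode | Sort-Object -Unique
$uniqueCodes
```

The @(...) around the Sort-Object call is a defensive choice so that .Count behaves the same whether one or several file names come back.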

It is not completely clear what you expect as an output.
If this is just the ServiceCodes that intersect, then this is actually a duplicate of:
Comparing two arrays & get the values which are not common
Union and Intersection in PowerShell?
But taking that you actually want the related object and files, you might use this approach:
$HashTable = @{}
ForEach ($File in Get-ChildItem .\*.csv) {
    ForEach ($Object in (Import-Csv $File)) {
        $HashTable[$Object.ServiceCode] = $Object | Select-Object *,
            @{ n='File'; e={ $File.Name } },
            @{ n='Count'; e={ $HashTable[$Object.ServiceCode].Count + 1 } }
    }
}
$HashTable.Values | Where-Object Count -eq 1

Here is my take on this fun exercise, I'm using a similar approach as yours with the HashSet but adding [System.StringComparer]::OrdinalIgnoreCase to leverage the .Contains(..) method:
using namespace System.Collections.Generic
# Generate Random CSVs:
$charset = 'abABcdCD0123xXyYzZ'
$ran = [random]::new()
$csvs = @{}
foreach($i in 1..50) # Create 50 CSVs for testing
{
    $csvs["csv$i"] = foreach($z in 1..50) # With 50 Rows
    {
        $index = (0..2).ForEach({ $ran.Next($charset.Length) })
        [pscustomobject]@{
            ServiceCode = [string]::new($charset[$index])
            Data = $ran.Next()
        }
    }
}
# Get Unique 'ServiceCode' per CSV:
$result = @{}
foreach($key in $csvs.Keys)
{
    # Get all unique `ServiceCode` from the other CSVs
    $tempHash = [HashSet[string]]::new(
        [string[]]($csvs[$csvs.Keys -ne $key].ServiceCode),
        [System.StringComparer]::OrdinalIgnoreCase
    )
    # Filter the unique `ServiceCode`
    $result[$key] = foreach($line in $csvs[$key])
    {
        if(-not $tempHash.Contains($line.ServiceCode))
        {
            $line
        }
    }
}
# Test if the code worked,
# If something is returned from here means it didn't work
foreach($key in $result.Keys)
{
    $tmp = $result[$result.Keys -ne $key].ServiceCode
    foreach($val in $result[$key])
    {
        if($val.ServiceCode -in $tmp)
        {
            $val
        }
    }
}

I was able to get the unique items as follows:
# Get all items of CSVs in a single variable with adding the file name at the last column
$CSVs = Get-ChildItem "C:\temp\ItemCompare\*.csv" | ForEach-Object {
    $CSV = Import-CSV -Path $_.FullName
    $FileName = $_.Name
    $CSV | Select-Object *,@{N='Filename';E={$FileName}}
}
Foreach($line in $CSVs){
    $ServiceCode = $line.ServiceCode
    $file = $line.Filename
    if (!($CSVs | where {$_.ServiceCode -eq $ServiceCode -and $_.filename -ne $file})){
        $line
    }
}

PowerShell - Convert Property Names from Pascal Case to Upper Case With Underscores

Let's say I have an object like this:
$test = @{
    ThisIsTheFirstColumn = "ValueInFirstColumn";
    ThisIsTheSecondColumn = "ValueInSecondColumn"
}
and I want to end up with:
$test = @{
    THIS_IS_THE_FIRST_COLUMN = "ValueInFirstColumn";
    THIS_IS_THE_SECOND_COLUMN = "ValueInSecondColumn"
}
without manually coding the new column names.
This shows me the values I want:
$test.PsObject.Properties | where-object { $_.Name -eq "Keys" } | select -expand value | foreach{ ($_.substring(0,1).toupper() + $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()} | Out-Host
which results in:
THIS_IS_THE_FIRST_COLUMN
THIS_IS_THE_SECOND_COLUMN
but now I can't seem to figure out how to assign these new values back to the object.

You can modify hashtable $test in place as follows:
foreach ($key in @($test.Keys)) { # !! @(...) is required - see below.
    $value = $test[$key] # save value
    $test.Remove($key) # remove old entry
    # Recreate the entry with the transformed name.
    $test[($key -creplace '(?<!^)\p{Lu}', '_$&').ToUpper()] = $value
}
@($test.Keys) creates an array from the existing hashtable keys; @(...) ensures that the key collection is copied to a static array, because using the .Keys property directly in a loop that modifies the same hashtable would break.
The loop body saves the value for the input key at hand and then removes the entry under its old name.[1]
The entry is then recreated under its new key name using the desired name transformation:
$key -creplace '(?<!^)\p{Lu}', '_$&' matches every uppercase letter (\p{Lu}) in a given key, except at the start of the string ((?<!^)), and replaces it with _ followed by that letter (_$&); converting the result to uppercase (.ToUpper()) yields the desired name.
[1] Removing the old entry before adding the renamed one avoids problems with single-word names such as Simplest, whose transformed name, SIMPLEST, is considered the same name due to the case-insensitivity of hashtables in PowerShell. Thus, assigning a value to entry SIMPLEST while entry Simplest still exists actually targets the existing entry, and the subsequent $test.Remove($key) would then simply remove that entry, without having added a new one.
Tip of the hat to JosefZ for pointing out the problem.
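To see the name transformation in isolation, here is a minimal sketch that applies the same -creplace expression to a few sample names. The $samples list is made up for illustration, and the last entry is an assumed edge case that the question does not mention.

```powershell
# Sample key names; 'HTTPServer' is an assumed edge case with consecutive capitals.
$samples = 'ThisIsTheFirstColumn', 'Simplest', 'HTTPServer'

$converted = foreach ($name in $samples) {
    # Insert "_" before every uppercase letter except the first character, then uppercase all.
    ($name -creplace '(?<!^)\p{Lu}', '_$&').ToUpper()
}

$converted
# THIS_IS_THE_FIRST_COLUMN
# SIMPLEST
# H_T_T_P_SERVER
```

Note the last result: because every uppercase letter after the first gets its own underscore, acronyms are split letter by letter. If your keys contain acronyms, the pattern would need adjusting.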

I wonder if it is possible to do it in place on the original object?
($test.PsObject.Properties|Where-Object {$_.Name -eq "Keys"}).IsSettable says False. Hence, you need to do it in two steps as follows:
$test = @{
    ThisIsTheFirstColumn = "ValueInFirstColumn";
    ThisIsTheSecondColumn = "ValueInSecondColumn"
}
$auxarr = $test.PsObject.Properties |
    Where-Object { $_.Name -eq "Keys" } |
    select -ExpandProperty value
$auxarr | ForEach-Object {
    $aux = ($_.substring(0,1).toupper() +
        $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
    $test.ADD( $aux, $test.$_)
    $test.Remove( $_)
}
$test
The two-step approach is necessary because an attempt to call the Remove and Add methods in a single pipeline leads to the following error:
select : Collection was modified; enumeration operation may not execute.
Edit: Unfortunately, the above solution would fail for a one-word PascalCase key, e.g. Simplest = "ValueInSimplest". Here's the improved script:
$test = @{
    ThisIsTheFirstColumn = "ValueInFirstColumn";
    ThisIsTheSecondColumn = "ValueInSecondColumn"
    Simplest = "ValueInSimplest" # the simplest (one word) PascalCase
}
$auxarr = $test.PsObject.Properties |
    Where-Object { $_.Name -eq "Keys" } |
    select -ExpandProperty value
$auxarr | ForEach-Object {
    $aux = ($_.substring(0,1).toupper() +
        $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
    $newvalue = $test.$_
    $test.Remove( $_)
    $test.Add( $aux, $newvalue)
}
$test

This seems to work. I ended up putting stuff in a new hashtable, though.
$test = @{
    ThisIsTheFirstColumn = "ValueInFirstColumn";
    ThisIsTheSecondColumn = "ValueInSecondColumn"
}
$test2 = @{}
$test.PsObject.Properties |
    where-object { $_.Name -eq "Keys" } |
    select -expand value | foreach {
        $originalPropertyName = $_
        $prop = ($_.substring(0,1).toupper() + $_.substring(1) -creplace '[^\p{Ll}\s]', '_$&').Trim("_").ToUpper()
        $test2.Add($prop, $test[$originalPropertyName])
    }
$test2

Remove duplicate values from hashtables

I am new to PowerShell and I need to remove duplicate values from my hashtable, e.g. "c" has a duplicate 3 ("3,4,3") and "d" has a duplicate 1 ("1,1,6,4").
[hashtable]$alpha = @{
    "a" = "1";
    "b" = "2,1,5";
    "c" = "3,4,3";
    "d" = "1,1,6,4";
    "e" = "1,7,9,0";
}
How can I get the result?
I have already tried Select-Object -Unique, but it doesn't work.

In order to use Select-Object -Unique, your values must be in a collection (array), not inside a single string.
Thus, you must first split the string into an array of tokens (-split operator), and afterwards reassemble the distinct tokens into a string (-join operator).
foreach ($key in @($alpha.Keys)) {
    $alpha.$key = ($alpha.$key -split ',' | Select-Object -Unique) -join ','
}
Note the @(...) around $alpha.Keys, which effectively clones the keys collection. This is necessary because modifying a collection while it is being enumerated is not supported and would otherwise cause an error.
In PSv3+ you could use $alpha.Keys.Clone() instead, which is more efficient.
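As a runnable illustration of the points above, the following sketch stores the key snapshot in a separate variable (the name $keys is made up for this example) and then rewrites each value in place, using the sample data from the question.

```powershell
$alpha = @{
    "a" = "1"
    "b" = "2,1,5"
    "c" = "3,4,3"
    "d" = "1,1,6,4"
    "e" = "1,7,9,0"
}

# Snapshot the keys into a plain array before the loop modifies the hashtable.
$keys = @($alpha.Keys)

foreach ($key in $keys) {
    # Split into tokens, drop repeats (the first occurrence is kept), and rejoin.
    $alpha[$key] = ($alpha[$key] -split ',' | Select-Object -Unique) -join ','
}

$alpha['c']   # 3,4
$alpha['d']   # 1,6,4
```

Entries without duplicates, such as "b", come out unchanged.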
You could also use the pipeline with a call to the ForEach-Object (%) cmdlet (rather than an expression with the foreach loop), but for in-memory data structures a foreach loop is generally the better and faster choice.
For the sake of completeness, here's the pipeline solution (tip of the hat to WayneA's answer for refining it):
@($alpha.Keys) | ForEach-Object {
    $alpha[$_] = ($alpha[$_] -split ',' | Select-Object -Unique) -join ','
}
Note that @(...) is still needed around $alpha.Keys, as in the foreach statement solution.
As in the foreach solution, this modifies the hashtable in place.
Another option is to use $alpha.GetEnumerator() in order to force PowerShell to send the hashtable's entries one by one through the pipeline, as [System.Collections.DictionaryEntry] instances representing key-value pairs with a .Key and a .Value property - by default, PowerShell outputs hashtables as a whole.
However, that necessitates creating a new hashtable, because you fundamentally cannot modify a collection being enumerated with .GetEnumerator() (as is also implicitly used by a foreach loop).
$newAlpha = @{} # initialize the new hash table
$alpha.GetEnumerator() | ForEach-Object {
    $newAlpha[$_.Key] = ($alpha[$_.Key] -split ',' | Select-Object -Unique) -join ','
}
# $newAlpha now contains the updated entries.

As mentioned, those are strings; split them first and rejoin them afterwards:
$NewAlpha = $Alpha.GetEnumerator() |% { $_.value = ($_.value -split "," | select -unique) -join "," ; $_}
Note: this does not preserve the [hashtable] type.
To do that, you will need to employ the approach mentioned in the answer provided by mklement0:
@($alpha.Keys) |% {$alpha.$_ = ($alpha.$_ -split ',' | Select-Object -Unique) -join ','}

When a word matches retrieve the varying string after it

I have a query which looks like this:
FROM TableA
INNER JOIN TableB
ON TableA.xx = TableB.xx
INNER JOIN TableC
ON TableA.yy = TableC.yy
I am trying to write a script which selects the tables which come after the word "JOIN".
The script that I wrote now is:
$data = Get-Content -Path query1.txt
$dataconv = "$data".ToLower() -replace '\s+', ' '
$join = 0
$overigetabellen = ($dataconv) | foreach {
    if ($_ -match "join (.*)") {
        $join++
        $join = $matches[1].Split(" ")[0]
        #Write-Host "Table(s) on which is joined:" $join"."
        $join
    }
}
$overigetabellen
This gives me only the first table, so TableB.
Can anyone help me get the second table as output as well?

Process your data with Select-String:
$data | Select-String -AllMatches -Pattern '(?<=join\s+)\S+' |
Select-Object -Expand Matches |
Select-Object -Expand Groups |
Select-Object -Expand Value
(?<=...) is a so-called positive lookbehind assertion that is used for matching the pattern without being included in the returned string (meaning the returned matches are just the table names without the JOIN before them).
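To see the lookbehind at work without a file, here is a small sketch that runs the same pattern over the sample query held in a here-string. It uses [regex]::Matches instead of Select-String, which is a swapped-in alternative rather than the approach shown above; the variable names $query and $tables are made up for this example.

```powershell
# The sample query from the question, held in memory instead of query1.txt.
$query = @'
FROM TableA
INNER JOIN TableB
ON TableA.xx = TableB.xx
INNER JOIN TableC
ON TableA.yy = TableC.yy
'@

# Match runs of non-whitespace that are immediately preceded by "join" plus whitespace.
# IgnoreCase is passed explicitly because [regex] is case-sensitive by default.
$tables = [regex]::Matches($query, '(?<=join\s+)\S+', 'IgnoreCase') |
    ForEach-Object { $_.Value }

$tables
# TableB
# TableC
```

Because the lookbehind is not part of the match, each match value is just the table name.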

This is my naive attempt to find the desired table names.
Split the data input on whitespace into an array, find the indices of the word "JOIN", and then access the following indices after the word "JOIN."
$data = Get-Content -Path query1.txt
$indices = @()
$output = @()
$dataarray = $data -split '\s+'
$singleIndex = -1
Do {
    $singleIndex = [array]::IndexOf($dataarray,"JOIN",$singleIndex + 1)
    If ($singleIndex -ge 0) { $indices += $singleIndex }
} While ($singleIndex -ge 0)
foreach ($index in $indices) {
    $output += $dataarray[$index + 1]
}
Outputs:
TableB
TableC
You can adjust for capitalization (saw you set your input to all lowercase), etc as needed if you expect varying input files.

Is the Name property of output of Group-Object always string?

The following script can transform (pivot) the array by the third column (x, y). However, it needs to concatenate the first two columns for the Group-Object command, and then the Name of the output needs to be split to get the original values back.
This can be error-prone if the data contains the separator character, and it does not seem optimized for performance, since extra string concatenation and split operations are needed. Is there a more direct way (like the SQL GROUP BY clause) in PowerShell?
$a = @('a','b','x',10),
     @('a','b','y',20),
     @('c','e','x',50),
     @('c','e','y',30)
# $a | % { "[$_]"}
$a | % {
    new-object PsObject -prop @{
        label = "$($_[0]),$($_[1])" # Concatenate for grouping
        value = @{ $_[2] = $_[3] }
    }
} |
group label | % {
    $l = @($_.Name -split ",") + # then split to restore
         @($_.Group.value.x, $_.Group.value.y)
    "[$l]"
}

Yes, the "Name" property of GroupInfo is always a string.
The easiest way to find the distinct values is to sample the first item in each group:
$a | Group-Object -Property {$_[0]},{$_[1]} | ForEach-Object {
    $Group = $_.Group
    # The first item in each group
    $SampleItem = $Group | Select-Object -First 1
    # Now we can inspect the key values, $SampleItem[0] and $SampleItem[1]
    Write-Host ('This group has {0} and {1} as primary keys:' -f $SampleItem[0..1]) -ForegroundColor Green
    $Group | ForEach-Object {
        # echo each array in group
        Write-Host ($_ -join ' ')
    }
}
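Another option, sketched here as an alternative to sampling the first item, is the Values property that each GroupInfo object exposes next to Name. As far as I know it holds the grouping values themselves rather than the joined string, so nothing has to be split. The output property names Key1 and Key2 and the variable $pivoted are made up for this example.

```powershell
$a = @('a','b','x',10),
     @('a','b','y',20),
     @('c','e','x',50),
     @('c','e','y',30)

$pivoted = $a | Group-Object -Property { $_[0] }, { $_[1] } | ForEach-Object {
    # Values holds one entry per grouping expression, in the same order.
    $row = [ordered]@{
        Key1 = $_.Values[0]
        Key2 = $_.Values[1]
    }
    # Pivot: the third column becomes the property name, the fourth its value.
    foreach ($item in $_.Group) {
        $row[$item[2]] = $item[3]
    }
    [pscustomobject]$row
}

$pivoted
```

With the sample data this should yield one object per key pair, each with x and y properties, without any string concatenation or splitting.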
