PowerShell CSV comparison - powershell

I am very, very, very new to Powershell. I was wondering if any one an help me with the following script:
The idea is to have two excel spreadsheets.
1.csv
QCODE
PC1009
PC1009
PC1011
PC1012
2.csv
QCODE
PC1009
PC1009
PC1009
PC1012
I am trying to compare values between the two CSV documents. If the value in cell1 in 1.csv is equal to any cell in 2.csv the script must perform a certain action, once the action is finished it must loop over to cell2 in 1.csv and compare it again with all the values in 2.csv
This is about as far as I have managed yo get:
$CSV=Import-Csv C:\1.csv
$COMP=Import-Csv C:\2.csv
$count=0
$cnt=0
while($count -le $CSV.Count)
{
while($validator -eq $false)
{
if($CSV[$count].QCODE -eq $COMP[$cnt].QCODE)
{
Write-Host "Exiting"
$validator=$true
}
else{
$cnt++
}
}
$count++
}
It's a mess, I apologize. Any help would be greatly appreciated

Here is a solution for you. I have created two CSV files with matching headers. The column names are:
Prop1
Prop2
Prop3
Prop4
Prop5
When these lines are imported into PowerShell, it will automatically create a PSObject for each line. The property names on the PSObject will be the column headers. These two CSV files exist in the folder named c:\test.
NOTE: There is a single, mismatching value between the two files, in the dead middle. This will be our test.
The code looks like this. There are some in-line comments to help guide you. Basically, we're dynamically querying all of the property (column) names, getting the value of each one (the cell values), and comparing them. If they do not match, we throw a warning. Based on the single, mismatching "cell" in this example, the output I get is in a screenshot below. It seems to be working quite well in my testing.
NOTE: Even though it says that line #1 is mismatching, and you might think it's line #2, that's because arrays are zero-based. Therefore, in array terminology, #1 is actually #2, because it starts counting at zero.
# Import both CSV files
$Csv1 = Import-Csv -Path C:\test\csv1.csv;
$Csv2 = Import-Csv -Path C:\test\csv2.csv;
# For each line in CSV1 ...
foreach ($Line1 in $Csv1) {
$LineNumber = $Csv1.IndexOf($Line1);
# Get the same line from CSV2
$Line2 = $Csv2[$LineNumber];
# For each property (column) ...
foreach ($Property in (Get-Member -InputObject $Line1 -MemberType NoteProperty)) {
# Get the property's name
$PropertyName = $Property.Name;
# If the value of the property doesn't match each CSV file ..
if ($Line1.$PropertyName -ne $Line2.$PropertyName) {
# Warn the user
Write-Warning -Message ('Value of property {0} did not match for line # {1}' -f $PropertyName, $LineNumber);
# PERFORM SOME CUSTOM ACTION HERE
};
}
}

Related

Check if a condition is met by a line within a TXT but "in an advanced way"

I have a TXT file with 1300 megabytes (huge thing). I want to build code that does two things:
Every line contains a unique ID at the beginning. I want to check for all lines with the same unique ID if the conditions is met for that "group" of IDs. (This answers me: For how many lines with the unique ID X have all conditions been met)
If the script is finished I want to remove all lines from the TXT where the condition was met (see 2). So I can rerun the script with another condition set to "narrow down" the whole document.
After few cycles I finally have a set of conditions that applies to all lines in the document.
It seems that my current approach is very slow.( one cycle needs hours). My final result is a set of conditions that apply to all lines of code.
If you find an easier way to do that, feel free to recommend.
Help is welcome :)
Code so far (does not fullfill everything from 1&2)
foreach ($item in $liste)
{
# Check Conditions
if ( ($item -like "*XXX*") -and ($item -like "*YYY*") -and ($item -notlike "*ZZZ*")) {
# Add a line to a document to see which lines match condition
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
# Retrieve the unique ID from the line and feed array.
$array += $item.Split("/")[1]
# Remove the line from final document
$liste = $liste -replace $item, ""
}
}
# Pipe the "new cleaned" list somewhere
$liste | Set-Content -Path "C:\NewListToWorkWith.txt"
# Show me the counts
$array | group | % { $h = #{} } { $h[$_.Name] = $_.Count } { $h } | Out-File "C:\Desktop\count.txt"
Demo Lines:
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
performance considerations:
Add-Content "C:\Desktop\it_seems_to_match.txt" "$item"
try to avoid wrapping cmdlet pipelines
See also: Mastering the (steppable) pipeline
$array += $item.Split("/")[1]
Try to avoid using the increase assignment operator (+=) to create a collection
See also: Why should I avoid using the increase assignment operator (+=) to create a collection
$liste = $liste -replace $item, ""
This is a very expensive operation considering that you are reassigning (copying) a long list ($liste) with each iteration.
Besides it is a bad practice to change an array that you are currently iterating.
$array | group | ...
Group-Object is a rather slow cmdlet, you better collect (or count) the items on-the-fly (where you do $array += $item.Split("/")[1]) using a hashtable, something like:
$Name = $item.Split("/")[1]
if (!$HashTable.Contains($Name)) { $HashTable[$Name] = [Collections.Generic.List[String]]::new() }
$HashTable[$Name].Add($Item)
To minimize memory usage it may be better to read one line at a time and check if it already exists. Below code I used StringReader and you can replace with StreamReader for reading from a file. I'm checking if the entire string exists, but you may want to split the line. Notice I have duplicaes in the input but not in the dictionary. See code below :
$rows= #"
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/2XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGA/3XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/4XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGB/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
images/STRINGC/5XXXXXXXX_rTTTTw_GGGG1_Top_MMM1_YY02_ZZZ30_AAAA5.jpg
"#
$dict = [System.Collections.Generic.Dictionary[int, System.Collections.Generic.List[string]]]::new();
$reader = [System.IO.StringReader]::new($rows)
while(($row = $reader.ReadLine()) -ne $null)
{
$hash = $row.GetHashCode()
if($dict.ContainsKey($hash))
{
#check if list contains the string
if($dict[$hash].Contains($row))
{
#string is a duplicate
}
else
{
#add string to dictionary value if it is not in list
$list = $dict[$hash].Value
$list.Add($row)
}
}
else
{
#add new hash value to dictionary
$list = [System.Collections.Generic.List[string]]::new();
$list.Add($row)
$dict.Add($hash, $list)
}
}
$dict

Powershell: storing variables to a file [duplicate]

I would like to write out a hash table to a file with an array as one of the hash table items. My array item is written out, but it contains files=System.Object[]
Note - Once this works, I will want to reverse the process and read the hash table back in again.
clear-host
$resumeFile="c:\users\paul\resume.log"
$files = Get-ChildItem *.txt
$files.GetType()
write-host
$types="txt"
$in="c:\users\paul"
Remove-Item $resumeFile -ErrorAction SilentlyContinue
$resumeParms=#{}
$resumeParms['types']=$types
$resumeParms['in']=($in)
$resumeParms['files']=($files)
$resumeParms.GetEnumerator() | ForEach-Object {"{0}={1}" -f $_.Name,$_.Value} | Set-Content $resumeFile
write-host "Contents of $resumefile"
get-content $resumeFile
Results
IsPublic IsSerial Name BaseType
-------- -------- ---- --------
True True Object[] System.Array
Contents of c:\users\paul\resume.log
files=System.Object[]
types=txt
in=c:\users\paul
The immediate fix is to create your own array representation, by enumerating the elements and separating them with ,, enclosing string values in '...':
# Sample input hashtable. [ordered] preserves the entry order.
$resumeParms = [ordered] #{ foo = 42; bar = 'baz'; arr = (Get-ChildItem *.txt) }
$resumeParms.GetEnumerator() |
ForEach-Object {
"{0}={1}" -f $_.Name, (
$_.Value.ForEach({
(("'{0}'" -f ($_ -replace "'", "''")), $_)[$_.GetType().IsPrimitive]
}) -join ','
)
}
Not that this represents all non-primitive .NET types as strings, by their .ToString() representation, which may or may not be good enough.
The above outputs something like:
foo=42
bar='baz'
arr='C:\Users\jdoe\file1.txt','C:\Users\jdoe\file2.txt','C:\Users\jdoe\file3.txt'
See the bottom section for a variation that creates a *.psd1 file that can later be read back into a hashtable instance with Import-PowerShellDataFile.
Alternatives for saving settings / configuration data in text files:
If you don't mind taking on a dependency on a third-party module:
Consider using the PSIni module, which uses the Windows initialization file (*.ini) file format; see this answer for a usage example.
Adding support for initialization files to PowerShell itself (not present as of 7.0) is being proposed in GitHub issue #9035.
Consider using YAML as the file format; e.g., via the FXPSYaml module.
Adding support for YAML files to PowerShell itself (not present as of 7.0) is being proposed in GitHub issue #3607.
The Configuration module provides commands to write to and read from *.psd1 files, based on persisted PowerShell hashtable literals, as you would declare them in source code.
Alternatively, you could modify the output format in the code at the top to produce such files yourself, which allows you to read them back in via
Import-PowerShellDataFile, as shown in the bottom section.
As of PowerShell 7.0 there's no built-in support for writing such as representation; that is, there is no complementary Export-PowerShellDataFile cmdlet.
However, adding this ability is being proposed in GitHub issue #11300.
If creating a (mostly) plain-text file is not a must:
The solution that provides the most flexibility with respect to the data types it supports is the XML-based CLIXML format that Export-Clixml creates, as Lee Dailey suggests, whose output can later be read with Import-Clixml.
However, this format too has limitations with respect to type fidelity, as explained in this answer.
Saving a JSON representation of the data, as Lee also suggests, via ConvertTo-Json / ConvertFrom-Json, is another option, which makes for human-friendlier output than XML, but is still not as friendly as a plain-text representation; notably, all \ chars. in file paths must be escaped as \\ in JSON.
Writing a *.psd1 file that can be read with Import-PowerShellDataFile
Within the stated constraints regarding data types - in essence, anything that isn't a number or a string becomes a string - it is fairly easy to modify the code at the top to write a PowerShell hashtable-literal representation to a *.psd1 file so that it can be read back in as a [hashtable] instance via Import-PowerShellDataFile:
As noted, if you don't mind installing a module, consider the Configuration module, which has this functionality built int.
# Sample input hashtable.
$resumeParms = [ordered] #{ foo = 42; bar = 'baz'; arr = (Get-ChildItem *.txt) }
# Create a hashtable-literal representation and save it to file settings.psd1
#"
#{
$(
($resumeParms.GetEnumerator() |
ForEach-Object {
" {0}={1}" -f $_.Name, (
$_.Value.ForEach({
(("'{0}'" -f ($_ -replace "'", "''")), $_)[$_.GetType().IsPrimitive]
}) -join ','
)
}
) -join "`n"
)
}
"# > settings.psd1
If you read settings.psd1 with Import-PowerShellDataFile settings.psd1 later, you'll get a [hashtable] instance whose entries you an access as usual and which produces the following display output:
Name Value
---- -----
bar baz
arr {C:\Users\jdoe\file1.txt, C:\Users\jdoe\file1.txt, C:\Users\jdoe\file1.txt}
foo 42
Note how the order of entries (keys) was not preserved, because hashtable entries are inherently unordered.
On writing the *.psd1 file you can preserve the key(-creation) order by declaring the input hashtable (System.Collections.Hashtable) as [ordered], as shown above (which creates a System.Collections.Specialized.OrderedDictionary instance), but the order is, unfortunately, lost on reading the *.psd1 file.
As of PowerShell 7.0, even if you place [ordered] before the opening #{ in the *.psd1 file, Import-PowerShellDataFile quietly ignores it and creates an unordered hashtable nonetheless.
This is a problem I deal with all the time and it drives me mad. I really think that there should be a function specifically for this action... so I wrote one.
function ConvertHashTo-CSV
{
Param (
[Parameter(Mandatory=$true)]
$hashtable,
[Parameter(Mandatory=$true)]
$OutputFileLocation
)
$hastableAverage = $NULL #This will only work for hashtables where each entry is consistent. This checks for consistency.
foreach ($hashtabl in $hashtable)
{
$hastableAverage = $hastableAverage + $hashtabl.count #Counts the amount of headings.
}
$Paritycheck = $hastableAverage / $hashtable.count #Gets the average amount of headings
if ( ($parity = $Paritycheck -is [int]) -eq $False) #if the average is not an int the hashtable is not consistent
{
write-host "Error. Hashtable is inconsistent" -ForegroundColor red
Start-Sleep -Seconds 5
return
}
$HashTableHeadings = $hashtable[0].GetEnumerator().name #Get the hashtable headings
$HashTableCount = ($hashtable[0].GetEnumerator().name).count #Count the headings
$HashTableString = $null # Strange to hold the CSV
foreach ($HashTableHeading in $HashTableHeadings) #Creates the first row containing the column headings
{
$HashTableString += $HashTableHeading
$HashTableString += ", "
}
$HashTableString = $HashTableString -replace ".{2}$" #Removed the last , added by the above loop in error
$HashTableString += "`n"
foreach ($hashtabl in $hashtable) #Adds the data
{
for($i=0;$i -lt $HashTableCount;$i++)
{
$HashTableString += $hashtabl[$i]
if ($i -lt ($HashTableCount - 1))
{
$HashTableString += ", "
}
}
$HashTableString += "`n"
}
$HashTableString | Out-File -FilePath $OutputFileLocation #writes the CSV to a file
}
To use this copy the function into your script, run it, and then
ConvertHashTo-CSV -$hashtable $Hasharray -$OutputFileLocation c:\temp\data.CSV
The code is annotated but a brief explanation of what it does. Steps through the arrays and hashtables and adds them to a string adding the required formatting to make the string a CSV file, then outputs that to a file.
The main limitation of this is that the Hashtabes in the array all have to contain the same amount of fields. To get around this if a hashtable has a field that doesnt contain data ensure it contains at least a space.
More on this can be found here : https://grumpy.tech/powershell-convert-hashtable-to-csv/

How to check column count in a file to satisfy a condition

I am trying to write a PowerShell script to check the column count and see if it satisfies the condition or else throw error or email.
something I have tried:
$columns=(Get-Content "C:\Users\xs15169\Desktop\temp\OEC2_CFLOW.txt" | select -First 1).Split(",")
$Count=columns.count
if ($count -eq 280)
echo "column count is:$count"
else
email
I'm going to assume your text file is in CSV format, I can't imagine what format you're working with if it's a text-file table and not formatted as CSV.
If your CSV has headers
Process the CSV file, and count the number of properties on the resulting Powershell object.
$columnCount = #( ( Import-Csv '\path\to\file.txt' ).PSObject.Properties ).Count
We need to force the Properties object to an array (which is the #() syntax) to accurately get the count. The PSObject property is a hidden property for metadata about an object in Powershell, which is where we look for the Properties (column names) and get the count of how many there are.
CSV without headers
If your CSV doesn't have headers, Import-Csv requires you to manually specify the headers. There are tricks you can do to build out unique column names on-the-fly, but they are overly complex for simply getting a column count.
To take what you've already tried above, we can get the data in the first line and process the number of columns, though you were doing it incorrectly in the question. Here's how to properly do it:
$columnCount = ( ( Get-Content "\path\to\file.txt" | Select-Object -First 1 ) -Split ',' ).Count
What was wrong with the original
Both above solutions consolidate getting the column count down to one line of code. But in your original sample, you made a couple small mistakes:
$columns=( Get-Content "\path\to\file.txt" | select -First 1 ).Split(",")
# You forgot to prepend "columns" with a $. Should look like the below line
$Count=$columns.count
And you forgot to use curly braces with your if block:
if ($count -eq 280) {
echo "column count is:$count"
} else {
email
}
As for using the -Split operator vs. the .Split() method - this is purely stylistic preference on my part, and using Split() is perfectly valid.

String matching in PowerShell

I am new to scripting, and I would like to ask you help in the following:
This script should be scheduled task, which is working with Veritas NetBackup, and it creates a backup register in CSV format.
I am generating two source files (.csv comma delimited):
One file contains: JobID, FinishDate, Policy, etc...
The second file contains: JobID, TapeID
It is possible that in the second file there are multiple same JobIDs with different TapeID-s.
I would like to reach that, the script for each line in source file 1 should check all of the source file 2 and if there is a JobID match, if yes, it should have the following output:
JobID,FinishDate,Policy,etc...,TapeID,TapeID....
I have tried it with the following logic, but sometimes I have no TapeID, or I have two same TapeID-s:
Contents of sourcefile 1 is in $BackupStatus
Contents of sourcefile 2 is in $TapesUsed
$FinalReport =
foreach ($FinalPart1 in $BackupStatus) {
write-output $FinalPart1
$MediaID =
foreach ($line in $TapesUsed){
write-output $line.split(",")[1] | where-object{$line.split(",")[0] -like $FinalPart1.split(",")[0]}
}
write-output $MediaID
}
If the CSV files are not huge, it is easier to use Import-Csv instead of splitting the files by hand:
$BackupStatus = Import-Csv "Sourcefile1.csv"
$TapesUsed = Import-Csv "Sourcefile2.csv"
This will generate a list of objects for each file. You can then compare these lists quite easily:
Foreach ($Entry in $BackupStatus) {
$Match = $TapesUsed | Where {$_.JobID -eq $Entry.JobID}
if ($Match) {
$Output = New-Object -TypeName PSCustomObject -Property #{"JobID" = $Entry.JobID ; [...] ; "TapeID" = $Match.TapeID # replace [...] with the properties you want to use
Export-Csv -InputObject $Output -Path <OUTPUTFILE.CSV> -Append -NoTypeInformation }
}
This is a relatively verbose variant, but I prefer it like this.
I am checking for each entry in the first file whether there is a matching entry in the second. If there is one I combine the required fields from the entry of the first list with the ones from the entry in the second list into one object that I can then export very comfortably using Export-Csv.

How to store an array of hashtables in a text file and then call all of the values for a given key in each hashtable

I am using a text file as the backend for an application that I am developing. I first started off leaving the text file in a human-readable format but I decided that there was no sense in that figured it would be best to leave out formatting.
Where I am now in the backend dev process is creating a single-line hashtable with identical keys but different values for each entry. Seems logical and easy to work with.
Here is a mock-up of the entries in the text file:
#{'bName'='1xx'; 'bTotal'='1yy'; 'bSet'='1zz'}
#{'bName'='2xx'; 'bTotal'='2yy'; 'bSet'='2zz'}
#{'bName'='3xx'; 'bTotal'='3yy'; 'bSet'='3zz'}
As you can see, the keys for each entry are identical, however, the values are going to be different. (The numerical and repetitious nature of the values are purely coincidental and put in place for the sake of a mock-up. Actual values will not be numerically-oriented and won't be repetitious as seen in the example.)
I am able to access keys and values by typing:
$hash = Get-Content .\Desktop\Test.txt | Out-String | iex
which outputs:
Name Value
---- -----
bName 1xx
bTotal 1yy
bSet 1zz
bName 2xx
bTotal 2yy
bSet 2zz
bName 3xx
bTotal 3yy
bSet 3zz
What I ultimately want to do is gather each of the values for bName, bTotal, and bSet so that I can append each to a separate WinForms ComboBox. The WinForms part will be simple, I am just having a bit of an issue with getting the values from each hashtable in the text file.
I tried:
$hash.Values | ?{$hash.Keys -contains 'bName'}
but it just prints out every $hash.Value regardless of the $hash.Key match given in the pipe.
I understand that $hash is an array and I figured I may have to pipe out each iteration in a foreach ($hash | %{}) loop but I'm not quite sure the correct way to do this. For example, when I try:
$hash | $_.Keys
or
$hash | $_.Values
it isn't treating each iteration like a hashtable.
What am I doing wrong here? Am I going about it in a convoluted way while there is a much easier way to accomplish this? I am open to all sorts of ideas or suggestions.
As an afterthought: It is kind of funny how often an obvious solution presents itself when you step away and divert your attention towards something else.
I went to grab lunch and I can't, for the life of me, begin to comprehend why I didn't realize that I could just very easily do this:
$hash.bName
or:
$hash.bTotal
or:
$hash.bSet
That will do exact as I was wanting to do. However, considering the answers provided, I may go a different route in terms of using an .ini file in CSV format rather than creating an array of hashtables.
One way of storing hashtables in a text file is the INI format.
[hashtable1]
bName=1xx
bTotal=1yy
bSet=1zz
[hashtable2]
bName=2xx
bTotal=2yy
bSet=2zz
[hashtable3]
bName=3xx
bTotal=3yy
bSet=3zz
INI files are basically a hashtable of hashtables in text form. They can be read like this:
$ht = #{}
Get-Content 'C:\path\to\hashtables.txt' | ForEach-Object {
$_.Trim()
} | Where-Object {
$_ -notmatch '^(;|$)'
} | ForEach-Object {
if ($_ -match '^\[.*\]$') {
$section = $_ -replace '\[|\]'
$ht[$section] = #{}
} else {
$key, $value = $_ -split '\s*=\s*', 2
$ht[$section][$key] = $value
}
}
and written like this:
$ht.Keys | ForEach-Object {
'[{0}]' -f $_
foreach ($key in $ht[$_].Keys) {
'{0}={1}' -f $key, $ht[$_][$key]
}
} | Set-Content 'C:\path\to\hashtables.txt'
Individual values in such a hashtable of hashtables can be accessed like this:
$ht['section']['key']
or like this:
$ht.section.key
Another option would be to store each hashtable in a separate file
hashtable1.txt:
bName=1xx
bTotal=1yy
bSet=1zz
hashtable2.txt.
bName=2xx
bTotal=2yy
bSet=2zz
hashtable3.txt:
bName=3xx
bTotal=3yy
bSet=3zz
That would allow you to import each file into a hashtable via ConvertFrom-StringData:
$ht1 = Get-Content 'C:\path\to\hashtable1.txt' | Out-String |
ConvertFrom-Stringdata
Writing the files would basically be the same as above (there is no ConverTo-StringData cmdlet):
$ht1.Keys | ForEach-Object {
'{0}={1}' -f $_, $ht[$_]
} | Set-Content 'C:\path\to\hashtables1.txt'
PowerShell has built in csv handling so it makes it a good choice to use in this case. So, assuming you had your data stored in a file in the standard csv format with headers:
"bName","bTotal","bSet"
"1xx","1yy","1zz"
"2xx","2yy","2zz"
"3xx","3yy","3zz"
Then you import your data like this:
$data = Import-Csv $path
Now you have an array of PsCustomObject and each header in the csv file is a property of the object. So if, for example, you wanted to get the bTotal of the second object you would do the following:
$data[1].bTotal
2yy