Compare two CSV entries in PowerShell

I'm just learning PowerShell, but have run into an error that has me stumped. The objective is to take a CSV file and ascertain how many groups there are in the file. There will be multiple entries under each group. The end result would be to split the file into different arrays/dictionaries/what have you (I started with PowerShell 3 hours ago...) based on the groups, keeping the TaxonIDs intact, then export those to separate files. But for now, I'm just at the comparison step.
My practice data look like
Group,TaxonID
AA,1
AA,2
BB,3
BB,4
and true data look like:
Group,TaxonID
Bilateria_Ropsin,Mus_musculus_Rhabdomeric_MEL
Bilateria_Ropsin,ROp_OG3TodaroP
To do this, I tried to compare the group in one row with the group in the next row. If they differ, I add one to a variable for later use. Here's what I've got to do the comparison:
Set-Variable -name book -value C:\Users\XXX\Documents\Book1.csv
$work = Import-Csv $book
$numgroups = 0
$i = 0
foreach ($Group in $work) {
    $Ogroup = $work[$i] | Select-Object {$_.Group}
    $nextGroup = $work[$i+1] | Select-Object {$_.Group}
    $compare = $Ogroup.Equals($nextGroup)
    $compare
    $i++
}
When I print out $Ogroup and $nextGroup, they give me the proper pairs (AA AA, AA BB, BB BB), but $compare always prints out False. Using Compare-Object gives me an error about $nextGroup being null, so I opted for using .Equals(). CompareTo() throws an error about it not being a valid method.
I'm stumped and need help.

You should use the Group-Object cmdlet:
Set-Variable -name book -value C:\Users\XXX\Documents\Book1.csv
$work = Import-Csv $book
$groups = $work | Group-Object -Property Group
$groups.Count
$groups[0].Name
$groups[0].Group[0]
for ($i = 0; $i -lt $groups.Count; $i++) { $groups[$i].Name }
for ($i = 0; $i -lt $groups.Count; $i++)
{
    $groups[$i].Name
    for ($j = 0; $j -lt $groups[$i].Count; $j++)
    {
        $groups[$i].Group[$j]
    }
}
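As for why the original comparison always printed False: Select-Object {$_.Group} wraps each value in a new object whose property is named after the script block, and .Equals() on two distinct objects compares references rather than the group strings. A small sketch (sample data only, not part of the answer above) showing that comparing the strings directly behaves as expected:

```powershell
# Sample rows matching the practice data in the question
$work = @(
    [PSCustomObject]@{ Group = 'AA'; TaxonID = 1 }
    [PSCustomObject]@{ Group = 'AA'; TaxonID = 2 }
    [PSCustomObject]@{ Group = 'BB'; TaxonID = 3 }
)

# -eq on the underlying strings compares values, not object references
$work[0].Group -eq $work[1].Group   # True  (same group)
$work[1].Group -eq $work[2].Group   # False (group boundary)

# Counting distinct groups needs no pairwise comparison at all
($work.Group | Select-Object -Unique).Count   # 2
```

With Group-Object, even the unique count is unnecessary: ($work | Group-Object Group).Count gives the same result.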

Comparing isn't needed. Split the groups out first:
$book = "$env:USERPROFILE\Documents\Book1.csv"
$work = Import-CSV $book
$Groups = $work | Group-Object Group
ForEach($Group in $Groups){
    "Group Name: " + $Group.Name
    "Taxon IDs: "
    $Group.Group.TaxonID
}
If you want to save each group to a file, within that ForEach loop you can do $Group.Group | Export-Csv "c:\path\$($Group.Name).csv" -NoTypeInformation (note it's $Group.Group, the rows themselves; exporting $Group directly would write the GroupInfo object's Count/Name/Group columns instead).
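Putting the pieces together, here is a minimal sketch of the whole split-into-files step. The temp directory and inline sample data are stand-ins for the real Book1.csv path:

```powershell
# Stand-in output folder (replace with your real path)
$outDir = Join-Path ([IO.Path]::GetTempPath()) 'Groups'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Inline stand-in for Import-Csv $book
$rows = @'
Group,TaxonID
AA,1
AA,2
BB,3
BB,4
'@ | ConvertFrom-Csv

$rows | Group-Object Group | ForEach-Object {
    # $_.Name is the group key; $_.Group holds that group's original rows
    $_.Group | Export-Csv (Join-Path $outDir "$($_.Name).csv") -NoTypeInformation
}
```

This produces AA.csv and BB.csv, each keeping its TaxonID column intact.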
Edit: Sorry, was working on an answer when security called me over the intercom and said a water main burst under my truck and they needed me to go move my vehicle. Figured not losing my vehicle to a sinkhole was more important than this.

Related

powershell 2 working with multiple csv files

I have 2 CSV files: one with names and the other with a list of phone numbers. I'm trying to loop through each CSV file and run a script with the user's name and number.
file examples
Name Number
a 1
b 2
c 3
So I need to run a script like a + 1, b + 2, c + 3
I'm trying to use a foreach loop, but it's not working correctly; the loops are nested incorrectly and I can't figure it out.
if ($users){
foreach ($u in $users)
{
$username= ($u.'Mail')
foreach ($n in $numbers)
{
$number = ($n.'ID')
}
You're close in that you need to use a loop, but you want one loop that reads from both files, so in this case you're best off using a for loop like this:
$Combined = For($i = 0; $i -lt $users.count; $i++) {
    $Props = @{
        Mail = $users[$i].Mail
        ID = $numbers[$i].ID
    }
    New-Object PSObject -Prop $Props
}
$Combined | Export-Csv C:\Path\To\Combined.csv -NoTypeInfo

PowerShell Keeping track of updated and rejected rows while cleaning up files

Add-Member, hashtables, arrays and such confuse me a bit so I'm not sure the best way to approach this. My goal is to take an input.CSV, perform clean up and send those cleaned rows to Fixed.CSV, and send any 'reject rows' that couldn't be handled to reject.CSV with an explanation of why they were rejected.
My original script was splitting the 'good' from the 'bad' based on a single characteristic (e.g. a missing Account ID), but as a I get into the clean-up, there are other things that would cause a row to error out and I don't want to read the data into memory with .Where() and continually 'split' it - especially considering I'd like to finish with only 3 files total (OG-input.CSV, Fixed.CSV, Junk-reject.CSV).
$data, $rejectData = (Import-CSV $CSV).Where({![string]::IsNullOrEmpty($_."Account ID")}, 'Split')
If($rejectData){
$rejectData | Add-Member -NotePropertyName "Reject Reason" -NotePropertyValue "Account ID missing"
$rejectData | Export-CSV -LiteralPath "$($CSV.DirectoryName)\$($CSV.BaseName)_reject.csv" -NoTypeInformation
My output file was basically created after I had performed a bunch of steps on each row of $data above.
$outputFile = New-Object System.Collections.ArrayList
Foreach($row in $data){
# Do stuff, check using If, make updates, etc.
[void]$outputFile.Add($row)
}
$outputFile | Export-CSV -LiteralPath "$($CSV.DirectoryName)\$($CSV.BaseName)Fixed.csv" -NoTypeInformation
What I'm thinking at this point is that instead of splitting the data initially, I should just iterate through all rows; if I can update a row, I will, and send it to $outputFixed. If there is an error that can't be corrected, I'll send it to $outputReject. But here's the caveat: I want to add a new column for "Reject Reason" and then update that as I go, because there could be multiple reasons a row gets rejected and I'd like to track each one.
I've gotten it somewhat close, but creating the new column is giving me trouble. I was originally going to use Add-Member the first time I add the column, and then just update the value in that column for each $row; something like $row."Reject Reason" = "$($row."Reject Reason")|New Reason", as this gets me a pipe-delimited list of reasons a row was rejected. Then I found Powershell add-member. Add a member that's an ArrayList?, which got me thinking maybe the reasons within Reject Reason could be a list themselves rather than just delimited. However, I'm not sure I quite understand the nuances of the answers proposed and can't figure out what might work best for me.
Nested arrays/lists are great, but you'll have to consider how you want to store and display your data.
A CSV file, like a table, doesn't properly handle nested objects like lists or arrays. This can be fine if you know your data, and don't mind converting your RejectReason field from/to a delimited string when you read it. For example, you could use Where-Object's filter block to find all the entries in $outputRejected with a specific reason:
# similar to what you had before
$csv = Import-Csv $path
$report = foreach ($row in $csv) {
    $row | Add-Member -NotePropertyName 'RejectCode' -NotePropertyValue ''
    if ($row.id -lt 5) { $row.RejectCode = $row.RejectCode + 'Too Low|' }
    if ($row.id -gt 3) { $row.RejectCode = $row.RejectCode + 'Too High|' }
    # Output the finalized row
    $row
}
# Example: filter by reason code
$report | Where-Object {($_.RejectCode -split '\|') -contains 'Too High'}
ID RejectCode
-- ----------
4 Too Low|Too High|
5 Too High|
For what you are doing, this usually works just fine. You have to be careful of your additional separator characters, but since you're defining the RejectCode yourself, it shouldn't be an issue.
For anything more complicated, I tend to create a PSCustomObject from each $row and set each property to what I need. This tends to work a little better for me than using Add-Member:
$report = foreach ($row in $csv) {
    # custom object with manually defined properties
    $reportRow = [PSCustomObject][Ordered]@{
        ID      = $row.ID
        Name    = $row.Name
        Data    = $null # run some commands to fix bad data
        Reasons = @()   # list object
    }
    # can edit properties as normal
    if ($row.id -lt 5) { $reportRow.Reasons += 'Too Low' }
    if ($row.id -gt 3) { $reportRow.Reasons += 'Too High' }
    $reportRow
}
Just be aware that PowerShell's CSV commands tend to squish properties into the unhelpful System.Object[] text when your properties aren't simple values like strings or ints. A better option for saving nested objects like this is a structured format such as JSON, e.g.: $report | ConvertTo-Json | Out-File $path.
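A quick sketch of that JSON round trip, with invented property values, showing the nested reason list surviving intact where Export-Csv would flatten it to System.Object[]:

```powershell
# Invented rows with a nested Reasons list
$report = @(
    [PSCustomObject]@{ ID = 4; Reasons = @('Too Low', 'Too High') }
    [PSCustomObject]@{ ID = 5; Reasons = @('Too High') }
)

# Round-trip through JSON; -Depth covers the nested array
$json = $report | ConvertTo-Json -Depth 3
$back = $json | ConvertFrom-Json

$back[0].Reasons.Count   # 2 - still a real list, not "System.Object[]" text
```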
Without seeing any of your CSV, you could do something like this:
$csvPath = 'X:\Temp'
$original = Import-CSV -Path (Join-Path -Path $csvPath -ChildPath 'OG-input.CSV')
# create a List object to collect the rejected items
$rejects = [System.Collections.Generic.List[object]]::new()
$correct = foreach ($item in $original) {
    $reason = $null
    if ([string]::IsNullOrWhiteSpace($item.'Account ID')) { $reason = "Empty 'Account ID' field" }
    elseif ($item.'Account ID'.Length -gt 20) { $reason = "'Account ID' field exceeds maximum length" }
    # more elseif checks go here
    # after all checks are done
    if (!$reason) {
        # all OK for this row; just output so it gets collected in $correct
        $item
    }
    else {
        # it's a rejected item, add an object to the $rejects list
        $obj = $item | Select-Object *, @{Name = 'Reason'; Expression = {$reason}}
        $rejects.Add($obj)
    }
}
# save both files
$correct | Export-Csv -Path (Join-Path -Path $csvPath -ChildPath 'Fixed.CSV') -NoTypeInformation
$rejects | Export-Csv -Path (Join-Path -Path $csvPath -ChildPath 'Junk-reject.CSV') -NoTypeInformation
You need to fill in the rest of the checks and reasons for rejection, of course.
Here's what I ended up going with. I think it works, as the output looks about like I expected.
Foreach($row in $data){
    # Process all reject reasons first and reject those rows
    If([string]::IsNullOrEmpty($row."Account ID")){
        $row | Add-Member -NotePropertyName "Reject Reason" -NotePropertyValue ("$($row."Reject Reason")", "Missing Account ID" -Join "|").TrimStart("|") -Force
    }
    If([string]::IsNullOrEmpty($row."Service Start Dates") -And ([string]::IsNullOrEmpty($row."Service End Dates"))){
        $row | Add-Member -NotePropertyName "Reject Reason" -NotePropertyValue ("$($row."Reject Reason")", "Missing Both Service Dates" -Join "|").TrimStart("|") -Force
    }
    If(Get-Member -InputObject $row "Reject Reason"){
        [void]$outputReject.Add($row)
        Continue
    }
    If([string]::IsNullOrEmpty($row."Birth Date")){
        $row."Birth Date" = $dte
    }
    If([string]::IsNullOrEmpty($row."Gender")){
        $row."Gender" = "Female"
    }
    If([string]::IsNullOrEmpty($row."Service Start Dates") -And !([string]::IsNullOrEmpty($row."Service End Dates"))){
        $row."Service Start Dates" = $row."Service End Dates"
    }
    [void]$outputFixed.Add($row)
}
$outputFixed | Export-CSV -LiteralPath "$($inputFile.DirectoryName)\$($inputFile.BaseName)Fixed.csv" -NoTypeInformation
If($outputReject){
    $outputReject | Export-CSV -LiteralPath "$($inputFile.DirectoryName)\$($inputFile.BaseName)RejectedRows.csv" -NoTypeInformation
}
$outputFixed | Export-CSV -LiteralPath "$($inputFile.DirectoryName)\$($inputFile.BaseName)Fixed.csv" -NoTypeInformation
If($outputReject){
$outputReject | Export-CSV -LiteralPath "$($inputFile.DirectoryName)\$($inputFile.BaseName)RejectedRows.csv" -NoTypeInformation
}
Basically I'm still collecting each row in an ArrayList that gets written out once the entire file has been processed. I'm using Add-Member with -Force to 'overwrite' the reject reason(s), and a -Join of the text with a .TrimStart("|") to get rid of the leading pipe. This will definitely work for me (plus it was easy to implement with what I already had written).

How to separate CSV values within a CSV into new rows in PowerShell

I'm receiving an automated report from a system that cannot be modified as a CSV. I am using PowerShell to split the CSV into multiple files and parse out the specific data needed. The CSV contains columns that may contain no data, 1 value, or multiple values that are comma separated within the CSV file itself.
Example(UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542, 340668, 292196"
"Approval","AA-334454, 344366, 323570, 322827, 360225, 358850, 345935"
"ITS","345935, 358850"
"Services",""
I want the data to have one entry per line like this (UPDATED FOR CLARITY):
"Group","Members"
"Event","362403"
"Risk","324542"
"Risk","340668"
"Risk","292196"
#etc.
I've tried splitting the data and I just get an unknown number of columns at the end.
I tried a foreach loop, but can't seem to get it right (pseudocode below):
Import-CSV $Groups
ForEach ($line in $Groups){
If($_.'Members'.count -gt 1, add-content "$_.Group,$_.Members[2]",)}
I appreciate any help you can provide. I've searched all the stackexchange posts and used Google but haven't been able to find something that addresses this exact issue.
Import-Csv .\input.csv | ForEach-Object {
    ForEach ($Member in ($_.Members -Split ',')) {
        [PSCustomObject]@{Group = $_.Group; Member = $Member.Trim()}
    }
} | Export-Csv .\output.csv -NoTypeInformation
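A quick way to sanity-check that approach without touching the file system is to feed it the question's sample data through ConvertFrom-Csv (variable names here are arbitrary):

```powershell
$rows = @'
"Group","Members"
"Risk","324542, 340668, 292196"
"ITS","345935, 358850"
'@ | ConvertFrom-Csv

$flat = $rows | ForEach-Object {
    ForEach ($Member in ($_.Members -Split ',')) {
        [PSCustomObject]@{ Group = $_.Group; Member = $Member.Trim() }
    }
}

$flat.Count       # 5 rows: one per member
$flat[0].Member   # 324542
```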
# Get the raw text contents
$CsvContents = Get-Content "\path\to\file.csv"
# Convert it to a table object
$CsvData = ConvertFrom-Csv -InputObject $CsvContents
# Iterate through the records in the table
ForEach ($Record in $CsvData) {
    # Create an array from the Members values at commas and trim whitespace
    $Record.Members -Split "," | ForEach-Object {
        $MemberValue = $_.Trim()
        # Skip empty values
        if ($MemberValue) {
            # Create our output string
            $OutputString = "$($Record.Group), $MemberValue"
            # Write our output string to a file
            Add-Content -Path "\path\to\output.txt" -Value $OutputString
        }
    }
}
This should work, you had the right idea but I think you may have been encountering some syntax issues. Let me know if you have questions :)
Revised the code as per your updated question:
$List = Import-Csv "\path\to\input.csv"
foreach ($row in $List) {
    $Group = $row.Group
    $Members = $row.Members -split ","
    # Process each value in Members
    foreach ($MemberValue in $Members) {
        # PS v3 and above; export an object so Export-Csv writes proper columns
        [PSCustomObject]@{Group = $Group; Member = $MemberValue.Trim()} |
            Export-Csv "\path\to\output.csv" -NoTypeInformation -Append
        # PS v2
        # $Group + "," + $MemberValue | Out-File "\path\to\output.csv" -Append
    }
}

Adding multiple rows to CSV file at once through PowerShell

Background
I've been looking through several posts here on Stack and can only find answers to "how to add one single row of data to a CSV file" (notably this one). While they are good, they only cover the specific case of adding a single entry from memory. Suppose I have 100,000 rows to add to a CSV file; writing each row to the file individually will be orders of magnitude slower. I imagine it will be much faster to keep everything in memory and, once I've built a variable containing all the data I want to add, only then write it to file.
Current situation
I have log files that I receive from customers containing about half a million rows. Some of these rows begin with a datetime and how much memory the server is using. In order to get a better view of how the memory usage looks like, I want to plot the memory usage over time using this information. (Note: yes, the best solution would be to ask the developers to add this information as it is fairly common we need this, but since we don't have that yet, I need to work with what I got)
I am able to read the log files, extract the contents, and create two variables called $timeStamp and $memoryUsage that find all the relevant entries. The problem occurs when I try to add this to a custom PSObject. It would seem that using a $csvObject += $newRow only adds a pointer to the $newRow variable rather than the actual row itself. Here's the code that I've got so far:
$header1 = "Time Stamp"
$header2 = "Memory Usage"
$csvHeaders = @"
$header1;$header2
"@
# The following two lines are a workaround to make sure that the $csvObject becomes a PSObject that matches the output I'm trying to achieve.
$csvHeaders | Out-File -FilePath $csvFullPath
$csvObject = Import-Csv -Path $csvFullPath -Delimiter ";"
foreach ($TraceFile in $traceFilesToLookAt) {
    $curTraceFile = Get-Content $TraceFile.FullName
    Write-Host "Starting on file: $($TraceFile.Name)`n"
    foreach ($line in $curTraceFile) {
        try {
            if (($line.Substring(4,1) -eq '-') -and ($line.Substring(7,1) -eq '-')) {
                $TimeStamp = $line.Split("|",4)[0]
                $memoryUsage = $($line.Split("|",4)[2]).Replace(",","")
                $newRow = New-Object PSObject -Property @{
                    $header1 = $TimeStamp;
                    $header2 = $memoryUsage
                }
                $reorderedRow = $newRow | Select-Object -Property $header1,$header2
                $reorderedRow | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
            }
        } catch {
            Out-Null
        }
    }
}
This works fine, as it appends the row to the CSV file each time it finds one. The problem is that it's not very efficient.
End goal
I would ideally like to solve it with something like:
$newRow = New-Object PSObject -Property @{
    $header1 = $TimeStamp;
    $header2 = $memoryUsage
}
$rowsToAddToCSV += $newRow
And then in the final step do a:
$rowsToAddToCSV | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
I have not been able to create any workaround for this. Among other things, PowerShell tells me that op_Addition is not part of the object, that the object I'm trying to export (the collection of rows) doesn't match the CSV file, etc.
Anything that appends thousands of items to an array in a loop is bound to perform poorly, because each time an item is appended, the array will be re-created with its size increased by one, all existing items are copied, and then the new item is put in the new free slot.
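(As an aside, when you genuinely do need to collect items inside a loop, a generic List avoids that per-append re-allocation; a small sketch with an arbitrary item count:)

```powershell
# Growable list: Add() is O(1) amortized, no per-item array copy
$list = [System.Collections.Generic.List[object]]::new()
foreach ($i in 1..10000) {
    $list.Add([PSCustomObject]@{ Id = $i })
}
# Compare with: $array += $item  # re-copies the whole array on every append
$list.Count   # 10000
```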
Any particular reason why you can't simply do something like this?
$traceFilesToLookAt | ForEach-Object {
    Get-Content $_.FullName | ForEach-Object {
        if ($_.Substring(4, 1) -eq '-' -and $_.Substring(7, 1) -eq '-') {
            $line = $_.Split('|', 4)
            New-Object PSObject -Property @{
                'Time Stamp' = $line[0]
                'Memory Usage' = $line[2].Replace(',', '')
            }
        }
    }
} | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
A regular expression match might be an even more elegant approach to extracting timestamp and memory usage from the input files, but I'm going to leave that as an exercise for you.
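One hedged take on that exercise, assuming the pipe-delimited line format described above (the sample line below is invented):

```powershell
# Invented sample line: timestamp | level | memory | message
$line = '2015-03-01 12:00:00|INFO|1,234,567|worker started'

$parsedLine = if ($line -match '^(?<ts>\d{4}-\d{2}-\d{2}[^|]*)\|[^|]*\|(?<mem>[\d,]+)\|') {
    New-Object PSObject -Property @{
        'Time Stamp'   = $Matches['ts']
        'Memory Usage' = $Matches['mem'] -replace ','
    }
}

$parsedLine.'Memory Usage'   # 1234567
```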

Powershell: Search data in *.txt files to export into *.csv

First of all, this is my first question here. I often come here to browse existing topics, but now I'm stuck on my own problem, and I haven't found a helpful resource so far. My biggest concern is that it won't work in PowerShell at all. At the moment I'm trying to build a small PowerShell tool to save me a lot of time. For those who don't know cw-sysinfo, it is a tool that collects information about any host system (e.g. hardware ID, product key and the like) and generates *.txt files.
My point is, if you have 20, 30 or 80 servers in a project, it takes a huge amount of time to browse all the files, look for just the lines you need, and put them together in a *.csv file.
What I have working is more or less the basis of the tool: it browses all *.txt files in a specific path and checks for my keywords. And here is the problem: I can only search for the words prior to those I really need, as follows:
Operating System: Windows XP
Product Type: Professional
Service Pack: Service Pack 3
...
I don't know how I can tell PowerShell to search for the "Product Type:" line and pick up the following "Professional" instead. Later on, with keys or serial numbers, it will be the same problem, which is why I can't just search for "Standard" or "Professional".
I placed my keywords ($controls) in an extra file that I can attach to the project folders, so I don't need to edit the script in PowerShell each time. The code looks like this:
Function getStringMatch
{
    # Loop through the project directory
    Foreach ($file In $files)
    {
        # Check all keywords
        ForEach ($control In $controls)
        {
            $result = Get-Content $file.FullName | Select-String $control -quiet -casesensitive
            If ($result -eq $True)
            {
                $match = $file.FullName
                # Write the filename according to the entry
                "Found : $control in: $match" | Out-File $output -Append
            }
        }
    }
}
getStringMatch
I think this is the kind of thing you need. I've changed Select-String to not use the -quiet option; this returns a Matches object, one of whose properties is the line. I then split the line on the ':' and trim any spaces. The results are placed into a new PSObject, which in turn is added to an array. The array is put back on the pipeline at the end.
I also moved the call to get-content to avoid reading each file more than once.
# Create an array for results
$results = @()
# Loop through the project directory
Foreach ($file In $files)
{
    # load the content once
    $content = Get-Content $file.FullName
    # Check all keywords
    ForEach ($control In $controls)
    {
        # find the line containing the control string
        $result = $content | Select-String $control -casesensitive
        If ($result)
        {
            # tidy up the results and add to the array
            $line = $result.Line -split ":"
            $results += New-Object PSObject -Property @{
                FileName = $file.FullName
                Control = $line[0].Trim()
                Value = $line[1].Trim()
            }
        }
    }
}
# return the results
$results
Adding the results to a CSV is just a case of piping them to Export-Csv:
$results | Export-Csv -Path "results.csv" -NoTypeInformation
If I understand your question correctly, you want some way to parse each line from your report files and extract values for some "keys". Here are a few lines to give you an idea of how you could proceed. The example is for one file, but can be generalized very easily.
$config = Get-Content ".\config.txt"
# The stuff you are searching for
$keys = @(
    "Operating System",
    "Product Type",
    "Service Pack"
)
foreach ($line in $config)
{
    $keys | %{
        $regex = "\s*?$($_)\:\s*(?<value>.*?)\s*$"
        if ($line -match $regex)
        {
            $value = $matches.value
            Write-Host "Key: $_`t`tValue: $value"
        }
    }
}
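If the end goal is a CSV rather than console output, the same matches can be collected as objects and piped to Export-Csv, in the spirit of the earlier answer (the inline config data here is invented and the output path is a placeholder):

```powershell
# Invented stand-in for Get-Content ".\config.txt"
$config = @(
    'Operating System: Windows XP'
    'Product Type: Professional'
    'Service Pack: Service Pack 3'
)
$keys = @('Operating System', 'Product Type', 'Service Pack')

$parsed = foreach ($line in $config) {
    foreach ($key in $keys) {
        if ($line -match "^\s*$([regex]::Escape($key))\:\s*(?<value>.*?)\s*$") {
            [PSCustomObject]@{ Key = $key; Value = $Matches['value'] }
        }
    }
}

$parsed | Export-Csv (Join-Path ([IO.Path]::GetTempPath()) 'results.csv') -NoTypeInformation
```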