Find text data in numeric columns of a CSV file in PowerShell - powershell

I am very new to PowerShell.
I am trying to validate my CSV file by finding out whether there are any text values in my numeric fields. I can define which columns are numeric.
My source data looks like this:
ColA ColB ColC ColD
23 23 ff 100
2.30E+01 34 2.40E+01 23
df 33 ss df
34 35 36 37
I need output something like this (only the text values, if any are found in a column):
ColA ColC ColD
2.30E+01 ff df
df 2.40E+01
ss
I have tried some code but am not getting any results; I only get output like the following:
System.Object[]
---------------
xxx fff' ddd 3.54E+03
...
This is what I was trying
#
cls
function Is-Numeric ($Value) {
return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$arraycol = @()
$FileCol = @("ColA","ColB","ColC","ColD")
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
############## Test Datatype (Is-Numeric)##########
foreach($col in $FileCol)
{
foreach ($line in $dif_file) {
$val = $line.$col
$isnum = Is-Numeric($val)
if ($isnum -eq $false) {
$arrResult += $line.$col
$arraycol += $col
}
}
}
[pscustomobject]@{$arraycol = "$arrResult"}| out-file "C:\Users\$env:username\Desktop\Errors1.csv"
####################
Can someone guide me in the right direction?
Thanks

You can try something like this,
function Is-Numeric ($Value) {
return $Value -match "^[\d\.]+$"
}
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
#$columns = $dif_file | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
# Use this to specify certain columns
$columns = "ColB", "ColC", "ColD"
foreach($row in $dif_file) {
foreach ($col in $columns) {
if ($col -in $columns) {
if (!(Is-Numeric $row.$col)) {
$row.$col = ""
}
}
}
}
$dif_file | Export-Csv C:\temp\formatted.txt
Look up the column names as you go.
Look up the value of each column in each row and, if it is not numeric, change it to "".
Export the updated file.
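To see what the regex in Is-Numeric accepts, here is a minimal sketch that runs the same pattern against a few sample values. The sample values and the variable names ($pattern, $samples, $results) are only illustrative assumptions, not part of the original script.

```powershell
# Same pattern as in Is-Numeric: one or more digits or dots, nothing else.
$pattern = '^[\d\.]+$'

# Illustrative sample values (assumed, loosely based on the question's data).
$samples = '23', '2.30E+01', 'ff', '-5', '1.2.3'

$results = foreach ($value in $samples) {
    [pscustomobject]@{
        Value        = $value
        LooksNumeric = ($value -match $pattern)
    }
}

$results
```

Note that the pattern treats scientific notation and negative numbers as text, and it accepts a string such as 1.2.3 because it only checks for digits and dots.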

I think not displaying columns that have no data creates the challenge here. You can do the following:
$csv = Import-Csv "C:\Users\$env:username\desktop\f2.csv"
$finalprops = [collections.generic.list[string]]@()
$out = foreach ($line in $csv) {
$props = $line.psobject.properties | Where {$_.Value -notmatch '^[\d\.]+$'} |
Select-Object -Expand Name
$props | Where {$_ -notin $finalprops} | Foreach-Object { $finalprops.add($_) }
if ($props) {
$line | Select $props
}
}
$out | Select-Object ($finalprops | Sort)
Given the nature of Format-Table or tabular output, you only see the properties of the first object in the collection. So if object1 has ColA only, but object2 has ColA and ColB, you only see ColA.
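A minimal sketch of that first-object behavior, using two made-up objects. It uses ConvertTo-Csv instead of Format-Table so the result can be inspected as plain strings; CSV output also takes its columns from the first object. The names $rows, $lossy and $full are assumptions for illustration.

```powershell
# Two hypothetical objects; only the second one has a ColC property.
$rows = @(
    [pscustomobject]@{ ColA = 'df' }
    [pscustomobject]@{ ColA = '2.30E+01'; ColC = 'ss' }
)

# The header comes from the first object only, so ColC is dropped.
$lossy = $rows | ConvertTo-Csv -NoTypeInformation

# Selecting the full property list first gives every object the same shape.
$full = $rows | Select-Object ColA, ColC | ConvertTo-Csv -NoTypeInformation

$lossy[0]
$full[0]
```

This is why the code above collects every property name in $finalprops and passes the whole list to Select-Object at the end.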

The output order you want is quite different than the input CSV; you're tracking bad text data not by first occurrence, but by column order, which requires some extra steps.
test.csv file contents:
ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37
Sample code tested to meet your description:
$csvIn = Import-Csv "$PSScriptRoot\test.csv";
# create working data set with headers in same order as input file
$data = [ordered]@{};
$csvIn[0].PSObject.Properties | foreach {
$data.Add($_.Name, (New-Object System.Collections.ArrayList));
};
# add fields with text data
$csvIn | foreach {
$_.PSObject.Properties | foreach {
if ($_.Value -notmatch '^-?[\d\.]+$') {
$null = $data[$_.Name].Add($_.Value);
}
}
}
$removes = @(); # remove `good` columns with numeric data
$rowCount = 0; # column with most bad values
$data.GetEnumerator() | foreach {
$badCount = $_.Value.Count;
if ($badCount -eq 0) { $removes += $_.Key; }
if ($badCount -gt $rowCount) { $rowCount = $badCount; }
}
$removes | foreach { $data.Remove($_); }
0..($rowCount - 1) | foreach {
$h = [ordered]@{};
foreach ($key in $data.Keys) {
$h.Add($key, $data[$key][$_]);
}
[PSCustomObject]$h;
} |
Export-Csv -NoTypeInformation -Path "$PSScriptRoot\text-data.csv";
output file contents:
"ColA","ColC","ColD"
"2.30E+01","ff","df"
"df","2.40E+01",
,"ss",

@Jawad, finally I have tried:
function Is-Numeric ($Value) {
return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$columns = "ColA","ColB","ColC","ColD"
$dif_file_path = "C:\Users\$env:username\desktop\f1.csv"
$dif_file = Import-Csv -Path $dif_file_path -Delimiter "," |select $columns
$columns = $dif_file | Get-member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
foreach($row in $dif_file) {
foreach ($col in $columns) {
$val = $row.$col
$isnum = Is-Numeric($val)
if ($isnum -eq $false) {
$arrResult += $col+ " " +$row.$col
}}}
$arrResult | out-file "C:\Users\$env:username\desktop\Errordata.csv"
I get the correct result in my output file, but the order is very confusing, like this:
ColA ss
ColB 5.74E+03
ColA ss
ColC rrr
ColB 3.54E+03
ColD ss
ColB 8.31E+03
ColD cc
Any idea how to get a proper format? Thanks.
Note: with your suggested code, I get the complete source file with all data, not only the error data.

Related

Remove certain duplicate values from csv file

I am trying to import a CSV file and then create an xlsx file from the data. My goal is to show the value of Column1 only once rather than in every row. The CSV file is already sorted, so checking whether the previous/next row has the same value would be possible.
CSV
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Expected Outcome
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"";"Value2B";"Value2C"
"";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Outcome
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
Only column1 duplicate cells should be empty.
My Code to import and add to Excel
$csv = "C:\path\to\file.csv"
$i = 1
Import-Csv $csv | Select-Object -Property Column1,Column2,Column3 | ForEach-Object {
$j = 1
foreach ($prop in $_.PSObject.Properties) {
if ($i -eq 1) {
$serverInfoSheet.Cells.Item($i, $j++).Value = $prop.Name
} else {
$serverInfoSheet.Cells.Item($i, $j++).Value = $prop.Value
}
}
$i++
}
To provide further context, imagine Column1 is a date and Column2 and Column3 are employees.
Example of expected outcome
"12/01/2020";"Mark";"Tony"
"";"Mark";"Andrew"
"";"Tony";"Vanessa"
"12/02/2020";"Tony";"Michael"
I don't want the date to be repeated, because the Excel sheet becomes harder to read.
$Csv = @'
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"Value1A";"Value2B";"Value2C"
"Value1A";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"
'@
$Csv | ConvertFrom-Csv -Delimiter ';' |
Foreach-Object -Begin { $Last1 = $Null } {
if ( $_.Column1 -eq $Last1 ) { $_.Column1 = '' }
else { $Last1 = $_.Column1 }
$_
} | ConvertTo-Csv -Delimiter ';'
"Column1";"Column2";"Column3"
"Value1A";"Value1B";"Value1C"
"";"Value2B";"Value2C"
"";"Value3B";"Value3C"
"Value2A";"Value4B";"Value4C"

How to export two variables into same CSV as joined via PowerShell?

I have a PowerShell script that uses the poshwsus module, like below:
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog2.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
}
What I need to do is export CSV output that combines both results, with these columns (like an "inner join"):
Computer, NeededCount, DownloadedCount, NotApplicableCount, NotInstalledCount, InstalledCount, FailedCount, LastUpdated
I have tried to use the line below inside the foreach, but it didn't work as I expected.
$r1 + $r2 | export-csv -NoTypeInformation -append $FileOutput
I would appreciate any help or advice.
EDIT --> The output I've got:
ComputerName LastUpdate
X A
Y B
X
Y
So no error, first two rows from $r2, last two rows from $r1, it is not joining the tables as I expected.
Thanks!
I've found my guidance in this post: Inner Join in PowerShell (without SQL)
Modified my query accordingly like below, works like a charm.
$FileOutput = "C:\WSUSReport\WSUSReport.csv"
$ProcessLog = "C:\WSUSReport\QueryLog.txt"
$WSUSServers = "C:\WSUSReport\Computers.txt"
$WSUSPort = "8530"
import-module poshwsus
function Join-Records($tab1, $tab2){
$prop1 = $tab1 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t1
$prop2 = $tab2 | select -First 1 | % {$_.PSObject.Properties.Name} #properties from t2
$join = $prop1 | ? {$prop2 -Contains $_}
$unique1 = $prop1 | ?{ $join -notcontains $_}
$unique2 = $prop2 | ?{ $join -notcontains $_}
if ($join) {
$tab1 | % {
$t1 = $_
$tab2 | % {
$t2 = $_
foreach ($prop in $join) {
if (!$t1.$prop.Equals($t2.$prop)) { return; }
}
$result = @{}
$join | % { $result.Add($_,$t1.$_) }
$unique1 | % { $result.Add($_,$t1.$_) }
$unique2 | % { $result.Add($_,$t2.$_) }
[PSCustomObject]$result
}
}
}
}
ForEach ($Server in Get-Content $WSUSServers)
{
& connect-poshwsusserver $Server -port $WSUSPort | out-file $ProcessLog -append
$r1 = & Get-PoshWSUSClient | select @{name="Computer";expression={$_.FullDomainName}},@{name="LastUpdated";expression={if ([datetime]$_.LastReportedStatusTime -gt [datetime]"1/1/0001 12:00:00 AM") {$_.LastReportedStatusTime} else {$_.LastSyncTime}}}
$r2 = & Get-PoshWSUSUpdateSummaryPerClient -UpdateScope (new-poshwsusupdatescope) -ComputerScope (new-poshwsuscomputerscope) | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount
Join-Records $r1 $r2 | Select Computer,NeededCount,DownloadedCount,NotApplicableCount,NotInstalledCount,InstalledCount,FailedCount, LastUpdated | export-csv -NoTypeInformation -append $FileOutput
}
I think this could be made simpler. Since Select-Object's -Property parameter accepts an array of values, you can create an array of the properties you want to display. The array can be constructed by comparing your two objects' properties and outputting a unique list of those properties.
$selectProperties = $r1.psobject.properties.name | Compare-Object $r2.psobject.properties.name -IncludeEqual -PassThru
$r1,$r2 | Select-Object -Property $selectProperties
Compare-Object by default will output only differences between a reference object and a difference object. Adding the -IncludeEqual switch displays different and equal comparisons. Adding the -PassThru parameter outputs the actual objects that are compared rather than the default PSCustomObject output.
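To make the effect of the two switches concrete, here is a minimal sketch that compares two hypothetical lists of property names. The lists stand in for the property names of $r1 and $r2 and are assumptions, not output from the WSUS cmdlets.

```powershell
# Hypothetical property-name lists standing in for $r1 and $r2.
$left  = 'Computer', 'LastUpdated'
$right = 'Computer', 'NeededCount', 'FailedCount'

# Default: only the differences, wrapped in objects with InputObject and SideIndicator.
$diffOnly = Compare-Object -ReferenceObject $left -DifferenceObject $right

# -IncludeEqual also returns the shared name; -PassThru returns the strings themselves.
$allNames = Compare-Object -ReferenceObject $left -DifferenceObject $right -IncludeEqual -PassThru

$diffOnly.InputObject | Sort-Object
$allNames | Sort-Object
```

The second result is a flat list containing each property name once, which is the kind of value Select-Object -Property accepts.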

CSV file - count distinct, group by, sum

I have a file that looks like the following;
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
I need to produce a file that contains;
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 1
The count of each Flight
BA123 1
BA112 1
BA456 1
So far I have the following code, however when executing this on the 350MB file, it takes too long and in some cases breaks the memory limit. As I have to run this function on multiple columns, it is going through the file many times. I ideally need to do this in one file pass.
$file = 'log.txt'
function GroupBy($columnName)
{
$objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count
for($i=0;$i -lt $objects.count;$I++) {
$line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS
}
return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
Any help would be much appreciated.
Thanks,
Craig
The fact that you are importing the CSV again for each column is what is killing your script. Try to do the loading once, then reuse the data. For example:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}
You can display the data like this:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
This will probably work, using hashtables (hashmaps).
Pros: it will be faster and use less memory.
Cons: it is far less readable than Group-Object and requires more code.
To make it even less memory-hungry, read the CSV file line by line.
$data = Import-CSV -Path "C:\temp\data.csv" -Delimiter ","
$DistinctVisitors = @{}
$TotalRevenue = 0
$ChannelCount = @{}
$FlightCount = @{}
$data | ForEach-Object {
$DistinctVisitors[$_.'Visitor ID'] = $true
$TotalRevenue += $_.Revenue
if (-not $ChannelCount.ContainsKey($_.Channel)) {
$ChannelCount[$_.Channel] = 0
}
$ChannelCount[$_.Channel] += 1
if (-not $FlightCount.ContainsKey($_.Flight)) {
$FlightCount[$_.Flight] = 0
}
$FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinct Visitor IDs $DistinctVisitorsCount"
Write-Output "The total revenue $TotalRevenue"
Write-Output "The Count of each Channel"
$ChannelCount.Keys | ForEach-Object {
Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
Write-Output "$_ $($FlightCount[$_])"
}
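The line-by-line suggestion above can be sketched as follows: pipe Import-Csv straight into ForEach-Object instead of storing all rows in a variable first, so rows are processed as they are read. This is only a sketch under assumptions: the temporary file name and the sample rows are hypothetical and exist only to make the example self-contained.

```powershell
# Hypothetical sample file so the sketch can run on its own.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'visitors-sample.csv'
@'
Visitor ID,Revenue,Channel,Flight
1234,100,Email,BA123
2345,200,PPC,BA112
456,150,Email,BA456
'@ | Set-Content -Path $path

$visitors     = @{}
$totalRevenue = 0
$channelCount = @{}
$flightCount  = @{}

# No intermediate $data variable: each row is handled as it comes off the pipeline.
Import-Csv -Path $path | ForEach-Object {
    $visitors[$_.'Visitor ID'] = $true
    $totalRevenue += [decimal]$_.Revenue
    # A missing key returns $null, and 1 + $null evaluates to 1.
    $channelCount[$_.Channel] = 1 + $channelCount[$_.Channel]
    $flightCount[$_.Flight]   = 1 + $flightCount[$_.Flight]
}

"Visitors: $($visitors.Count)"
"Revenue : $totalRevenue"
$channelCount
$flightCount
```

Only the hashtables grow with the data, so memory use depends on the number of distinct values rather than the number of rows.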

Powershell : merge two CSV files with partially duplicate lines

I have scraped two files from a website in order to list the companies in my city.
The first lists: name, city, phone number, email
The second lists: name, city, phone number
I will have duplicate lines if I merge them. As an example, I will have the following:
> "Firm1";"Los Angeles";"000000";"info@firm1.lol"
> "Firm1";"Los Angeles";"000000";""
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact@firm3.lol"
> "Firm3";"Los Angeles";"000000";""
> ...
Is there a way to merge the two files and keep the most complete information, like this:
> "Firm1";"Los Angeles";"000000";"info@firm1.lol"
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact@firm3.lol"
> ...
Assuming you have a file like this called 'firm.csv':
"Firm1";"Los Angeles";"000000";"info@firm1.lol"
"Firm1";"Los Angeles";"000000";""
"Firm2";"Los Angeles";"111111";""
"Firm3";"Los Angeles";"000000";"contact@firm3.lol"
"Firm3";"Los Angeles";"000000";""
You can load it using :
$firms = import-csv C:\temp\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
Then
$firms | Sort-Object -Unique -Property 'Firm'
According to Joey's comment I improved the solution :
$firms | Group-Object -Property 'firm' | % {$_.group | Sort-Object -Property mail -Descending | Select-Object -first 1}
EDIT: just realized the two files don't contain the same headers. Here is an update.
$main = Import-Csv firm1.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ";"
$alt = Import-Csv firm2.csv -Header 'Firm','Town','Tel' -Delimiter ";"
foreach ($f in $alt)
{
$found = $false
foreach($g in $main)
{
if ($g.Firm -eq $f.Firm -and $g.Town -eq $f.Town) # the header is 'Town'; there is no 'city' property
{
$found = $true
if ($g.Tel -eq "")
{
$g.Tel = $f.Tel
}
}
}
if ($found -eq $false)
{
$main += $f
}
}
# Everything is merged into the $main array
$main
There must be a better approach, but this is one costly way to do it.
$firms = import-csv C:\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
$Result = @()
ForEach($i in $firms){
$found = 0;
ForEach($m in $Result){
if($m.Firm -eq $i.Firm){
$found = 1
if( $i.Mail.length -ne 0 )
{
$m.Mail = $i.Mail
}
break;
}
}
if($found -eq 0){
$Result += [pscustomobject] @{Firm=$i.Firm; Town=$i.Town; Tel=$i.Tel; Mail=$i.Mail}
}
}
$Result | export-csv C:\out.csv

compare two csv using powershell and return matching and non-matching values

I have two CSV files. I want to check whether the users in username.csv match those in userdata.csv and copy the matching rows to output.csv.
If a user does not match, only the name should be written to output.csv.
For Ex: User Data contains 3 columns
UserName,column1,column2
Hari,abc,123
Raj,bca,789
Max,ghi,123
Arul,987,thr
Prasad,bxa,324
username.csv contains usernames
Hari
Rajesh
Output.csv should contain
Hari,abc,123
Rajesh,NA,NA
How to achieve this. Thanks
Sorry for that.
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
foreach ($User in $UserList)
{
ForEach ($Data in $UserData)
{
If($User.Username -eq $Data.UserName)
{
# Process the data
$Data
}
}
}
This returns only matching values. I also need to add the non-matching values in output
file. Thanks.
something like this will work:
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
$UserOutput = @()
ForEach ($name in $UserList)
{
$userMatch = $UserData | where {$_.UserName -eq $name.usernames}
If($userMatch)
{
# Process the data
$UserOutput += New-Object PsObject -Property @{UserName =$name.usernames;column1 =$userMatch.column1;column2 =$userMatch.column2}
}
else
{
$UserOutput += New-Object PsObject -Property @{UserName =$name.usernames;column1 ="NA";column2 ="NA"}
}
}
$UserOutput | ft
It loops through each name in the user list. The line that assigns $userMatch searches the userdata CSV for a matching user name. If it finds one, it adds the data for that user to the output; if no match is found, it adds the user name to the output with NA in both columns.
I had to change your userList CSV so that it has a header:
usernames
Hari
Rajesh
expected output:
UserName column1 column2
-------- ------- -------
Hari abc 123
Rajesh NA NA
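As a variation on the same idea, the lookup can be done through a hashtable keyed by user name, so each name is found without scanning the whole user data for every entry. This is a minimal sketch, not the answer's method: the inline CSV text mirrors the sample data above, and the names $byName and $found are assumptions.

```powershell
# Inline copies of the sample data so the sketch runs without files.
$userData = @'
UserName,column1,column2
Hari,abc,123
Raj,bca,789
Max,ghi,123
'@ | ConvertFrom-Csv

$userList = @'
usernames
Hari
Rajesh
'@ | ConvertFrom-Csv

# Index the user data once by user name.
$byName = @{}
foreach ($row in $userData) { $byName[$row.UserName] = $row }

# Emit one object per requested name, with NA where there is no match.
$userOutput = foreach ($name in $userList) {
    $found = $byName[$name.usernames]
    [pscustomobject]@{
        UserName = $name.usernames
        column1  = if ($found) { $found.column1 } else { 'NA' }
        column2  = if ($found) { $found.column2 } else { 'NA' }
    }
}

$userOutput | ConvertTo-Csv -NoTypeInformation
```

Building the objects with [pscustomobject] keeps the property order as written, which New-Object -Property with a plain hashtable does not guarantee.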
I had a similar situation, where I needed a "changed record collection" holding the entire record when the current record was either new or had any changes when compared to the previous record. This was my code:
# get current and previous CSV
$current = Import-Csv -Path $current_file
$previous = Import-Csv -Path $previous_file
# collection with new or changed records
$deltaCollection = New-Object Collections.Generic.List[System.Object]
:forEachCurrent foreach ($row in $current) {
$previousRecord = $previous.Where( { $_.Id -eq $row.Id } )
$hasPreviousRecord = ($null -ne $previousRecord -and $previousRecord.Count -eq 1)
if ($hasPreviousRecord -eq $false) {
# new record: add the row itself, not the whole $current collection
$deltaCollection.Add($row)
continue forEachCurrent
}
# check if value of any property is changed when compared to the previous
:forEachCurrentProperty foreach ($property in $row.PSObject.Properties) {
$columnName = $property.Name
$currentValue = if ($null -eq $property.Value) { "" } else { $property.Value }
$previousValue = if ($hasPreviousRecord) { $previousRecord[0]."$columnName" } else { "" }
if ($currentValue -ne $previousValue -or $hasPreviousRecord -eq $false) {
# changed record: add it once, then move on to the next row
$deltaCollection.Add($row)
continue forEachCurrent
}
}
}