PowerShell: merge two CSV files with partially duplicate lines

I have scraped two files from a website in order to list the companies in my city.
The first lists: name, city, phone number, email.
The second lists: name, city, phone number.
If I merge them, I will have duplicate lines; for example:
> "Firm1";"Los Angeles";"000000";"info#firm1.lol"
> "Firm1";"Los Angeles";"000000";""
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact#firm3.lol"
> "Firm3";"Los Angeles";"000000";""
> ...
Is there a way to merge the two files and keep the maximum information, like this:
> "Firm1";"Los Angeles";"000000";"info@firm1.lol"
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact@firm3.lol"
> ...

Assuming you've got a file like this called 'firm.csv':
"Firm1";"Los Angeles";"000000";"info@firm1.lol"
"Firm1";"Los Angeles";"000000";""
"Firm2";"Los Angeles";"111111";""
"Firm3";"Los Angeles";"000000";"contact@firm3.lol"
"Firm3";"Los Angeles";"000000";""
You can load it using:
$firms = Import-Csv C:\temp\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
Then
$firms | Sort-Object -Unique -Property 'Firm'
Following Joey's comment, I improved the solution:
$firms | Group-Object -Property 'firm' | % {$_.group | Sort-Object -Property mail -Descending | Select-Object -first 1}
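If the same firm name could appear in two different towns, grouping on both columns is a safer variation of the same idea:
$firms | Group-Object -Property 'Firm','Town' | ForEach-Object {
    $_.Group | Sort-Object -Property Mail -Descending | Select-Object -First 1
}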

EDIT: just realized the two files don't contain the same headers. Here is an update.
$main = Import-Csv firm1.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ";"
$alt = Import-Csv firm2.csv -Header 'Firm','Town','Tel' -Delimiter ";"
foreach ($f in $alt)
{
    $found = $false
    foreach ($g in $main)
    {
        # the second file has no Mail column, so match on Firm and Town
        if ($g.Firm -eq $f.Firm -and $g.Town -eq $f.Town)
        {
            $found = $true
            if ($g.Tel -eq "")
            {
                $g.Tel = $f.Tel
            }
        }
    }
    if ($found -eq $false)
    {
        $main += $f
    }
}
# Everything is merged into the $main array
$main
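A sketch of an alternative merge (my addition, not part of the original answer) that avoids the O(n²) nested loop by indexing $main in a hashtable keyed on Firm and Town; note that rows taken only from $alt will simply lack the Mail column:
# Index $main by Firm+Town, then merge $alt in a single pass.
$byKey = @{}
foreach ($g in $main) { $byKey["$($g.Firm)|$($g.Town)"] = $g }
foreach ($f in $alt) {
    $key = "$($f.Firm)|$($f.Town)"
    if ($byKey.Contains($key)) {
        # fill in a missing phone number from the second file
        if ($byKey[$key].Tel -eq "") { $byKey[$key].Tel = $f.Tel }
    } else {
        $byKey[$key] = $f   # firm only present in the second file
    }
}
$byKey.Values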

There must be a better approach, but here is one costly way to do it:
$firms = Import-Csv C:\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
$Result = @()
ForEach ($i in $firms) {
    $found = 0
    ForEach ($m in $Result) {
        if ($m.Firm -eq $i.Firm) {
            $found = 1
            if ($i.Mail.Length -ne 0)
            {
                $m.Mail = $i.Mail
            }
            break
        }
    }
    if ($found -eq 0) {
        $Result += [pscustomobject] @{Firm=$i.Firm; Town=$i.Town; Tel=$i.Tel; Mail=$i.Mail}
    }
}
$Result | Export-Csv C:\out.csv
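One caveat worth adding (my note, assuming the output should match the semicolon-delimited input): Export-Csv defaults to a comma delimiter and, in Windows PowerShell, writes a type-information header, so the last line is probably better as:
$Result | Export-Csv C:\out.csv -Delimiter ';' -NoTypeInformation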

Related

Find text data in CSV file numeric columns in PowerShell

I am very new to PowerShell.
I am trying to validate my CSV file by finding out if there is any text value in my numeric fields. I can define which columns are numeric.
My source data looks like this:
ColA ColB ColC ColD
23 23 ff 100
2.30E+01 34 2.40E+01 23
df 33 ss df
34 35 36 37
I need output something like this (only the text values, if found in any column):
ColA ColC ColD
2.30E+01 ff df
df 2.40E+01
ss
I have tried some code but am not getting the right results; I only get output like this:
System.Object[]
---------------
xxx fff' ddd 3.54E+03
...
This is what I was trying:
cls
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$arraycol = @()
$FileCol = @("ColA","ColB","ColC","ColD")
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
############## Test Datatype (Is-Numeric)##########
foreach ($col in $FileCol)
{
    foreach ($line in $dif_file) {
        $val = $line.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $line.$col
            $arraycol += $col
        }
    }
}
[pscustomobject]@{$arraycol = "$arrResult"} | Out-File "C:\Users\$env:username\Desktop\Errors1.csv"
####################
Can someone guide me in the right direction?
Thanks
You can try something like this:
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$dif_file_path = "C:\Users\$env:username\desktop\f2.csv"
#Importing CSVs
$dif_file = Import-Csv -Path $dif_file_path -Delimiter ","
#$columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
# Use this to specify certain columns
$columns = "ColB", "ColC", "ColD"
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        if (!(Is-Numeric $row.$col)) {
            $row.$col = ""
        }
    }
}
$dif_file | Export-Csv C:\temp\formatted.txt
Look up the names of the columns as you go.
Look up the value of each column in each row and, if it is not numeric, change it to "".
Export the updated file.
I think not displaying columns that have no data creates the challenge here. You can do the following:
$csv = Import-Csv "C:\Users\$env:username\desktop\f2.csv"
$finalprops = [collections.generic.list[string]]@()
$out = foreach ($line in $csv) {
    $props = $line.psobject.properties | Where {$_.Value -notmatch '^[\d\.]+$'} |
        Select-Object -Expand Name
    $props | Where {$_ -notin $finalprops} | Foreach-Object { $finalprops.Add($_) }
    if ($props) {
        $line | Select $props
    }
}
$out | Select-Object ($finalprops | Sort)
Given the nature of Format-Table or tabular output, you only see the properties of the first object in the collection. So if object1 has ColA only, but object2 has ColA and ColB, you only see ColA.
The output order you want is quite different from the input CSV; you're tracking bad text data not by first occurrence but by column order, which requires some extra steps.
test.csv file contents:
ColA,ColB,ColC,ColD
23,23,ff,100
2.30E+01,34,2.40E+01,23
df,33,ss,df
34,35,36,37
Sample code tested to meet your description:
$csvIn = Import-Csv "$PSScriptRoot\test.csv";
# create working data set with headers in same order as input file
$data = [ordered]@{};
$csvIn[0].PSObject.Properties | foreach {
    $data.Add($_.Name, (New-Object System.Collections.ArrayList));
};
# add fields with text data
$csvIn | foreach {
    $_.PSObject.Properties | foreach {
        if ($_.Value -notmatch '^-?[\d\.]+$') {
            $null = $data[$_.Name].Add($_.Value);
        }
    }
}
$removes = @(); # remove `good` columns with numeric data
$rowCount = 0;  # column with most bad values
$data.GetEnumerator() | foreach {
    $badCount = $_.Value.Count;
    if ($badCount -eq 0) { $removes += $_.Key; }
    if ($badCount -gt $rowCount) { $rowCount = $badCount; }
}
$removes | foreach { $data.Remove($_); }
0..($rowCount - 1) | foreach {
    $h = [ordered]@{};
    foreach ($key in $data.Keys) {
        $h.Add($key, $data[$key][$_]);
    }
    [PSCustomObject]$h;
} |
    Export-Csv -NoTypeInformation -Path "$PSScriptRoot\text-data.csv";
output file contents:
"ColA","ColC","ColD"
"2.30E+01","ff","df"
"df","2.40E+01",
,"ss",
@Jawad, finally I have tried:
function Is-Numeric ($Value) {
    return $Value -match "^[\d\.]+$"
}
$arrResult = @()
$columns = "ColA","ColB","ColC","ColD"
$dif_file_path = "C:\Users\$env:username\desktop\f1.csv"
$dif_file = Import-Csv -Path $dif_file_path -Delimiter "," | select $columns
$columns = $dif_file | Get-Member -MemberType 'NoteProperty' | Select-Object -ExpandProperty 'Name'
foreach ($row in $dif_file) {
    foreach ($col in $columns) {
        $val = $row.$col
        $isnum = Is-Numeric($val)
        if ($isnum -eq $false) {
            $arrResult += $col + " " + $row.$col
        }
    }
}
$arrResult | Out-File "C:\Users\$env:username\desktop\Errordata.csv"
I get the correct result in my output file, but the order is very ambiguous, like:
ColA ss
ColB 5.74E+03
ColA ss
ColC rrr
ColB 3.54E+03
ColD ss
ColB 8.31E+03
ColD cc
Any idea how to get the proper format? Thanks.
Note: with your suggested code, I get the complete source file with all data, not just the specific error data.
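One way to group that flat output by column (a sketch, assuming the "Col value" strings built by the loop above):
# Group the "Col value" strings by their column prefix so each
# column's bad values are listed together.
$arrResult |
    Group-Object { ($_ -split ' ', 2)[0] } |
    Sort-Object Name |
    ForEach-Object {
        "$($_.Name): $(($_.Group | ForEach-Object { ($_ -split ' ', 2)[1] }) -join ', ')"
    }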

For each thing in one CSV check for multiple types of matches in another CSV

Sorry if the description is unclear, but I couldn't think of how else to word it.
I have two CSV files:
LocalAdmins.csv -- ColumnA = PC name; ColumnB = username in local admin group
Exempt.csv -- ColumnA = PC name; ColumnB = username allowed to be a local admin
What I'm trying to do is loop through LocalAdmins.csv, and for each one check to see if the PC name shows up in Exempt.csv (or matches any defined naming patterns in that file), and if a match is found, check to see if the local admin username for that PC in LocalAdmins.csv shows up in the list of AllowedUsers for that PC in Exempt.csv.
If the username is NOT in the AllowedUsers list, or if the PC name is not in Exempt.csv, then output the entry from LocalAdmins.csv. Here is what I have so far:
$admins = Import-Csv .\LocalAdmins.csv
$exempt = Import-Csv .\Exempt.csv
$violations = ".\Violations.csv"
foreach ($admin in $admins) {
    foreach ($item in $exempt) {
        if ($admin.PC -like $item.PC) {
            if ($admin.Name -notin ($item.AllowedUsers -split ",")) {
                $admin | Export-Csv $violations -Append -NoTypeInformation
            }
        }
        else {
            $admin | Export-Csv $violations -Append -NoTypeInformation
        }
    }
}
The problem is that the nested foreach loop generates duplicates: if there are 3 lines in Exempt.csv, then a single entry in LocalAdmins.csv will produce 3 duplicate output rows (one for each line in Exempt.csv), when each violating entry should appear in the output only once. (The original post illustrated the two outputs with screenshots.)
I'm guessing the problem is somewhere in the structure of the loops, but I just need some help figuring out what to tweak. Any input is greatly appreciated!
Not optimized (unique sort by any property should work):
$admins = Import-Csv .\LocalAdmins.csv
$exempt = Import-Csv .\Exempt.csv
$violations = ".\Violations.csv"
$(
    foreach ($admin in $admins) {
        foreach ($item in $exempt) {
            if ($admin.PC -like $item.PC) {
                if ($admin.Name -notin ($item.AllowedUsers -split ",")) {
                    $admin
                }
            }
            else {
                $admin
            }
        }
    }
) | Sort-Object -Property PC, Name -Unique |
    Export-Csv $violations -Append -NoTypeInformation
With better restrictions on the foreach, there shouldn't be duplicates, and no need for Sort-Object -Unique.
Getting input from here-strings:
## Q:\Test\2019\02\05\SO_54523868.ps1
$admins = @'
PC,NAME
XYZlaptop,user6
workstationXYZ,user7
computerABC,user8
ABClaptop,user1
'@ | ConvertFrom-Csv # .\LocalAdmins.csv
$exempt = @'
PC,AllowedUsers
*laptop,"user1,user2"
computerXYZ,"user3,user4"
workstation*,"user5"
'@ | ConvertFrom-Csv # .\Exempt.csv
$violationsFile = ".\Violations.csv"
$violations = foreach ($admin in $admins) {
    $violation = $True
    foreach ($item in ($exempt | Where-Object {$admin.PC -like $_.PC})) {
        if ($admin.NAME -in ($item.AllowedUsers -split ',')) {
            $violation = $False
        }
    }
    if ($violation) {$admin}
}
$violations
$violations | Export-Csv $violationsFile -NoTypeInformation
## with Doug Finke's ImportExcel module installed, you can directly get the excel file:
#$violations | Export-Excel .\Violatons.xlsx -AutoSize -Show

CSV file - count distinct, group by, sum

I have a file that looks like the following;
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
I need to produce a file that contains:
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 1
The count of each Flight
BA123 1
BA112 1
BA456 1
So far I have the following code; however, when executing it on the 350MB file, it takes too long and in some cases breaks the memory limit. As I have to run this function on multiple columns, it goes through the file many times. Ideally I need to do this in one pass of the file.
$file = 'log.txt'
function GroupBy($columnName)
{
    $objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
        Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count
    for ($i = 0; $i -lt $objects.Count; $i++) {
        $line += $columnName + "|" + $objects[$i]."$columnName" + "|Count|" + $objects[$i].'Count' + $OFS
    }
    return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
Any help would be much appreciated.
Thanks,
Craig
The fact that you are importing the CSV again for each column is what is killing your script. Load the data once, then re-use it. For example:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}
You can display the data like this:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize
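If the summary then has to land in a file in the pipe-delimited shape the original GroupBy function produced (an assumption on my part), something like this follows on:
# Write the grouped counts as pipe-delimited lines; only one pass over the file was needed.
$lines = @("Visitors|$visitors", "Revenue|$revenue")
$lines += $channel | ForEach-Object { "Channel|$($_.Channel)|Count|$($_.Count)" }
$lines += $flights | ForEach-Object { "Flight|$($_.Flight)|Count|$($_.Count)" }
$lines | Set-Content .\summary.txt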
This will probably work, using hashtables.
Pros: it will be faster and use less memory.
Cons: it is far less readable than Group-Object, and it requires more code.
To make it even less memory-hungry, read the CSV file line by line (see the streaming sketch after the code below).
$data = Import-Csv -Path "C:\temp\data.csv" -Delimiter ","
$DistinctVisitors = @{}
$TotalRevenue = 0
$ChannelCount = @{}
$FlightCount = @{}
$data | ForEach-Object {
    $DistinctVisitors[$_.'Visitor ID'] = $true
    $TotalRevenue += $_.Revenue
    if (-not $ChannelCount.ContainsKey($_.Channel)) {
        $ChannelCount[$_.Channel] = 0
    }
    $ChannelCount[$_.Channel] += 1
    if (-not $FlightCount.ContainsKey($_.Flight)) {
        $FlightCount[$_.Flight] = 0
    }
    $FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinct Visitor IDs $DistinctVisitorsCount"
Write-Output "The total revenue $TotalRevenue"
Write-Output "The count of each Channel"
$ChannelCount.Keys | ForEach-Object {
    Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
    Write-Output "$_ $($FlightCount[$_])"
}
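The streaming variant mentioned above, as a sketch: pipe Import-Csv straight into ForEach-Object so rows are processed one at a time instead of materializing $data first (this assumes the same counters and hashtables initialized as above):
# Same counting logic, but no $data array is built up in memory.
Import-Csv -Path "C:\temp\data.csv" -Delimiter "," | ForEach-Object {
    $DistinctVisitors[$_.'Visitor ID'] = $true
    $TotalRevenue += [double]$_.Revenue
    $ChannelCount[$_.Channel] += 1   # a missing key reads as $null, which += treats as 0
    $FlightCount[$_.Flight] += 1
}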

Combining like objects in an array

I am attempting to analyze a group of text files (MSFTP logs) and do counts of IP addresses that have submitted bad credentials. I think I have it worked out except I don't think that the array is passing to/from the function correctly. As a result, I get duplicate entries if the same IP appears in multiple log files. What am I doing wrong?
Function LogBadAttempt($FTPLog, $BadPassesArray)
{
    $BadPassEx = "PASS - 530"
    Foreach ($Line in $FTPLog)
    {
        if ($Line -match $BadPassEx)
        {
            $IP = ($Line.Split(' '))[1]
            if ($BadPassesArray.IP -contains $IP)
            {
                $CurrentIP = $BadPassesArray | Where-Object {$_.IP -like $IP}
                [int]$CurrentCount = $CurrentIP.Count
                $CurrentCount++
                $CurrentIP.Count = $CurrentCount
            } else {
                $info = @{"IP"=$IP; "Count"='1'}
                $BadPass = New-Object -TypeName PSObject -Property $info
                $BadPassesArray += $BadPass
            }
        }
    }
    return $BadPassesArray
}
$BadPassesArray = @()
$FTPLogs = Get-Childitem \\ftpserver\MSFTPSVC1\test
$Result = ForEach ($LogFile in $FTPLogs)
{
    $FTPLog = Get-Content ($LogFile.fullname)
    LogBadAttempt $FTPLog
}
$Result | Export-csv C:\Temp\test.csv -NoTypeInformation
The result looks like...
Count IP
7 209.59.17.20
20 209.240.83.135
18441 209.59.17.20
13059 200.29.3.98
and I would like it to combine the entries for 209.59.17.20.
You're making this way too complicated. Process the files in a pipeline and use a hashtable to count the occurrences of each IP address:
$BadPasswords = @{}
Get-ChildItem '\\ftpserver\MSFTPSVC1\test' | Get-Content | ? {
    $_ -like '*PASS - 530*'
} | % {
    $ip = ($_ -split ' ')[1]
    $BadPasswords[$ip]++
}
$BadPasswords.GetEnumerator() |
    select @{n='IP';e={$_.Name}}, @{n='Count';e={$_.Value}} |
    Export-Csv 'C:\Temp\test.csv' -NoType
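A side note on why $BadPasswords[$ip]++ needs no per-key initialization: indexing a missing hashtable key returns $null, and ++ treats $null as 0. A quick demonstration:
$h = @{}
$h['x']++   # first hit: $null is treated as 0, so $h['x'] becomes 1
$h['x']++   # second hit: 2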

Compare two CSVs using PowerShell and return matching and non-matching values

I have two CSV files. I want to check whether the users in username.csv match the users in userdata.csv and copy the matching rows to output.csv. If a user does not match, return the name alone in output.csv.
For example, userdata.csv contains 3 columns:
UserName,column1,column2
Hari,abc,123
Raj,bca,789
Max,ghi,123
Arul,987,thr
Prasad,bxa,324
username.csv contains usernames
Hari
Rajesh
Output.csv should contain
Hari,abc,123
Rajesh,NA,NA
How can I achieve this? Thanks.
Sorry for that.
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
foreach ($User in $UserList)
{
ForEach ($Data in $UserData)
{
If($User.Username -eq $Data.UserName)
{
# Process the data
$Data
}
}
}
This returns only the matching values. I also need to add the non-matching values to the output file. Thanks.
Something like this will work:
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
$UserOutput = #()
ForEach ($name in $UserList)
{
$userMatch = $UserData | where {$_.UserName -eq $name.usernames}
If($userMatch)
{
# Process the data
$UserOutput += New-Object PsObject -Property #{UserName =$name.usernames;column1 =$userMatch.column1;column2 =$userMatch.column2}
}
else
{
$UserOutput += New-Object PsObject -Property #{UserName =$name.usernames;column1 ="NA";column2 ="NA"}
}
}
$UserOutput | ft
It loops through each name in the user list. The $userMatch line searches the userdata CSV for a matching user name; if it finds one, it adds that user's data to the output, and if no match is found, it adds the user name to the output with NA in both columns.
I had to change your userList CSV:
usernames
Hari
Rajesh
expected output:
UserName column1 column2
-------- ------- -------
Hari abc 123
Rajesh NA NA
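If the user list grows large, a hashtable index avoids rescanning $UserData for every name; a sketch along the same lines (same column names as above):
# Build a one-time index of UserData by UserName, then probe it per name.
$index = @{}
foreach ($d in $UserData) { $index[$d.UserName] = $d }
$UserOutput = foreach ($name in $UserList) {
    $m = $index[$name.usernames]
    if ($m) {
        [pscustomobject]@{UserName = $name.usernames; column1 = $m.column1; column2 = $m.column2}
    } else {
        [pscustomobject]@{UserName = $name.usernames; column1 = "NA"; column2 = "NA"}
    }
}
$UserOutput | ft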
I had a similar situation, where I needed a "changed record collection" holding the entire record when the current record was either new or had any changes when compared to the previous record. This was my code:
# get current and previous CSV
$current = Import-Csv -Path $current_file
$previous = Import-Csv -Path $previous_file
# collection with new or changed records
$deltaCollection = New-Object Collections.Generic.List[System.Object]
:forEachCurrent foreach ($row in $current) {
    $previousRecord = $previous.Where( { $_.Id -eq $row.Id } )
    $hasPreviousRecord = ($null -ne $previousRecord -and $previousRecord.Count -eq 1)
    if ($hasPreviousRecord -eq $false) {
        # new record: keep it and move to the next row
        $deltaCollection.Add($row)
        continue forEachCurrent
    }
    # check if the value of any property changed compared to the previous record
    foreach ($property in $row.PSObject.Properties) {
        $columnName = $property.Name
        $currentValue = if ($null -eq $property.Value) { "" } else { $property.Value }
        $previousValue = $previousRecord[0]."$columnName"
        if ($currentValue -ne $previousValue) {
            # changed record: keep it once and move to the next row
            $deltaCollection.Add($row)
            continue forEachCurrent
        }
    }
}
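A hypothetical usage line to persist the result (assuming a $delta_file path variable of your choosing):
$deltaCollection | Export-Csv -Path $delta_file -NoTypeInformation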