I have a CSV file (file1) that looks like this (user dirs and their sizes):
Initials,Size
User1,10
User2,100
User3,131
User4,140
I have another CSV file (file2) that looks like this (VIP users):
User2
User4
Now what I'm trying to do is update file1 so it looks like this:
User1,10
User3,131
User2 and User4 are removed because they are in file2.
I can get them removed, but at the same time I lose the size for all users, so my output contains only the users:
User1
User3
My code:
$SourcePath = "\\server1\info\SYSINFO\UsrSize"
$DestinationFile = "\\server1\info\SYSINFO\UsrSize\OverLimit\UsersOverLimit1.log"
$VIP_Exclusion_List = "\\server1\info\SYSINFO\UsrSize\OverLimit\_VIP_EXCLUSION_LIST.txt"
$Database = "\\server1\info\SYSINFO\UsrSize\OverLimit\_UsersOverLimitDATABASE.log"
$INT_SizeToLookFor = 100
dir $SourcePath -Filter usr*.txt | import-csv -delimiter "`t" |
Where-Object {[INT] $_."Size excl. Backup/Pst" -ge $INT_SizeToLookFor} |
Select-Object Initials,"Size excl. Backup/Pst" | convertto-csv -NoTypeInformation | % { $_ -replace '"', ""} | out-file $DestinationFile ;
$Userlist = import-csv $DestinationFile | Select-Object Initials |
convertto-csv -NoTypeInformation | % { $_ -replace '"', ""};
compare-object ($Userlist) (get-content $VIP_Exclusion_List) |
select-object inputObject | convertto-csv -NoTypeInformation |
% { $_ -replace '"', ""} | out-file "\\server1\info\SYSINFO\UsrSize\OverLimit\UsersOverLimitThisTime.log";
If the files are small-ish and you don't care too much about performance, then the following would be a trivial way:
$data = Import-Csv file1
$vips = Get-Content file2   # file2 has no header row, so read the names as plain lines
$data = $data | Where-Object { $vips -notcontains $_.Initials }
$data | Export-Csv file1_new -NoTypeInformation
A faster way would be to add the names to remove to a set, but given the scale you're describing I doubt you'll get into the range of a few thousand, let alone a million, users.
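If you ever did need that set-based lookup, a minimal sketch (assuming file2 is a plain list of names, one per line) could look like this:
# Build a HashSet for constant-time membership checks (requires PowerShell 5+ for ::new)
$vips = [System.Collections.Generic.HashSet[string]]::new([string[]](Get-Content file2))
Import-Csv file1 |
    Where-Object { -not $vips.Contains($_.Initials) } |
    Export-Csv file1_new -NoTypeInformation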
I solved it using this code:
$ArrayVIP = Get-Content $VIP_Exclusion_List
Select-String -Path $DestinationFile -Pattern $ArrayVIP -NotMatch |
    Select-Object -ExpandProperty Line |
    Out-File $DestinationFile
Taken from here: Removing lines from a CSV
I'm trying (badly) to work through combining CSV files into one file and prepending a column that contains the file name. I'm new to PowerShell, so hopefully someone can help here.
I tried initially to do the well documented approach of using Import-Csv / Export-Csv, but I don't see any options to add columns.
Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv CombinedFile.txt -UseQuotes Never -NoTypeInformation -Append
Next I'm trying to loop through the files and append the name, which kind of works, but for some reason it stops after the first row is generated. Since this isn't going through the CSV cmdlets, I have to use a switch to skip the title row of each file after the first.
$getFirstLine = $true
Get-ChildItem -Filter *.csv | Where-Object {$_.Name -NotMatch "Combined.csv"} | ForEach-Object {
    $filePath = $_
    $collection = Get-Content $filePath
    foreach ($lines in $collection) {
        $lines = ($_.Basename + ";" + $lines)
    }
    $linesToWrite = switch ($getFirstLine) {
        $true  {$lines}
        $false {$lines | Select -Skip 1}
    }
    $getFirstLine = $false
    Add-Content "Combined.csv" $linesToWrite
}
This is where the -PipelineVariable parameter comes in real handy. You can set a variable to represent the current iteration in the pipeline, so you can do things like this:
Get-ChildItem -Filter *.csv -PipelineVariable File |
    Where-Object {$_.Name -NotMatch "Combined.csv"} |
    ForEach-Object { Import-Csv $File.FullName } |
    Select-Object *, @{l='OriginalFile'; e={$File.Name}} |
    Export-Csv Combined.csv -NoTypeInformation
Merging your CSVs into one and adding a column for the file's name can be done as follows, using a calculated property on Select-Object:
Get-ChildItem -Filter *.csv | ForEach-Object {
$fileName = $_.Name
Import-Csv $_.FullName | Select-Object @{
Name = 'FileName'
Expression = { $fileName }
}, *
} | Export-Csv path/to/merged.csv -NoTypeInformation
I have a CSV file that I do a lot of processing on. My most recent task is to add a summary sheet.
With that said, the CSV file is pulled from a website and sent through a lot of checks. Code below:
$Dups = Import-Csv 'C:\Working\cylrpt.csv' | Group-Object -Property 'Device Name' | Where-Object {$_.Count -ge 2} | ForEach-Object {$_.Group} | Select-Object @{Name="Device Name"; Expression={$_."Device Name"}},@{Name="MAC"; Expression={$_."Mac Addresses"}},Zones,@{Name="Agent"; Expression={$_."Agent Version"}},@{Name="Status"; Expression={$_."Is online"}}
$Dups | Export-Csv $working\temp\01-Duplicates.csv -NoTypeInformation
$csvtmp = Import-Csv $working\cylrpt.csv | Select-Object @{N='Device';E={$_."Device Name"}},@{N='OS';E={$_."OS Version"}},Zones,@{N='Agent';E={$_."Agent Version"}},@{N='Active';E={$_."Is Online"}},@{N='Checkin';E={[DateTime]$_."Online Date"}},@{N='Checked';E={[DateTime]$_."Offline Date"}},Policy
$csvtmp | ForEach-Object {
    if ($_.Zones -eq "") { $_.Zones = "Unzoned" }
}
$csvtmp | Export-Csv $working\cy.csv -NoTypeInformation
Import-Csv $working\cy.csv | Select-Object Device,Policy,OS,Zones,Agent,Active,Checkin,Checked | ForEach-Object {
    $_ | Export-Csv -Path $working\temp\$($_.Zones).csv -NoTypeInformation -Append
}
The first check is for duplicates; I used separate lines of code for this because I wanted to create a CSV of the duplicates.
The second check backfills all blank cells in the Zones column with "Unzoned".
The third step goes through the entire CSV file and creates a CSV file for each zone.
So this is my base. I need to add another CSV file with a summary of the zone information. The zones are in the format XXX-WS or XXX-SRV, where XXX can be between 3 and 17 letters.
I would like the Summary sheet to look like this
ABC ###
ABC-WS ##
ABC-SRV ##
DEF ###
DEF-WS ##
DEF-SRV ##
My thoughts are to either do the count from the original CSV file or to count the number of lines in each CSV file and subtract 1 for the header row.
Now the zones are dynamic, so I can't just say I want zone XYZ, because that zone may not exist.
So what I need is to count the matching zone names in the original file and output that to an array or a file; that would be my preferred way to get the number of items with the same zone name. I just don't know how to write the part that looks for and counts the matching values. Here is the code I'm trying to use to get the count:
Import-Csv C:\Working\cylrpt.csv | Group-Object -Property 'Zones' | ForEach-Object {$_.Group} | Select-Object @{N='Device';E={$_."Device Name"}},Zones | ForEach-Object {
    $Znum = ($_.Zones).Count
    If ($Znum -eq $null) {
        $Znum = 1
    } else {
        $Znum++
    }
}
$Count = ($_.Zones),$Znum | Out-File C:\Working\Temp\test2.csv -Append
Here is the full code minus the report key:
$cylURL = "https://protect.cylance.com/Reports/ThreatDataReportV1/devices/"
$working = "C:\Working"
Remove-item -literalpath "\\?\C:\Working\Cylance Report.xlsx"
Invoke-WebRequest -Uri $cylURL -outfile $working\cylrpt.csv
$Dups = Import-Csv 'C:\Working\cylrpt.csv' | Group-Object -Property 'Device Name' | Where-Object {$_.Count -ge 2} | ForEach-Object {$_.Group} | Select-Object @{Name="Device Name"; Expression={$_."Device Name"}},@{Name="MAC"; Expression={$_."Mac Addresses"}},Zones,@{Name="Agent"; Expression={$_."Agent Version"}},@{Name="Status"; Expression={$_."Is online"}}
$Dups | Export-Csv $working\temp\01-Duplicates.csv -NoTypeInformation
$csvtmp = Import-Csv $working\cylrpt.csv | Select-Object @{N='Device';E={$_."Device Name"}},@{N='OS';E={$_."OS Version"}},Zones,@{N='Agent';E={$_."Agent Version"}},@{N='Active';E={$_."Is Online"}},@{N='Checkin';E={[DateTime]$_."Online Date"}},@{N='Checked';E={[DateTime]$_."Offline Date"}},Policy
$csvtmp | ForEach-Object {
    if ($_.Zones -eq "") { $_.Zones = "Unzoned" }
}
$csvtmp | Export-Csv $working\cy.csv -NoTypeInformation
Import-Csv $working\cy.csv | Select-Object Device,Policy,OS,Zones,Agent,Active,Checkin,Checked | ForEach-Object {
    $_ | Export-Csv -Path $working\temp\$($_.Zones).csv -NoTypeInformation -Append
}
cd $working\temp;
Rename-Item "Unzoned.csv" -NewName "02-Unzoned.csv"
Rename-Item "Systems-Removal.csv" -NewName "03-Systems-Removal.csv"
$CSVFiles = Get-ChildItem -path $working\temp -filter *.csv
$Excel = "$working\Cylance Report.xlsx"
$Num = $CSVFiles.Count
Write-Host "Found the following Files: ($Num)"
ForEach ($csv in $CSVFiles) {
    Write-Host "Merging $($csv.Name)"
}
$EXc1 = New-Object -ComObject Excel.Application
$Exc1.SheetsInNewWorkBook = $CSVFiles.Count
$XLS = $EXc1.Workbooks.Add()
$Sht = 1
ForEach ($csv in $CSVFiles) {
    $Row = 1
    $Column = 1
    $WorkSHT = $XLS.WorkSheets.Item($Sht)
    $WorkSHT.Name = $csv.Name -Replace ".csv",""
    $File = (Get-Content $csv)
    ForEach ($line in $File) {
        $LineContents = $line -split ',(?!\s*\w+")'
        ForEach ($Cell in $LineContents) {
            $WorkSHT.Cells.Item($Row,$Column) = $Cell -Replace '"',''
            $Column++
        }
        $Column = 1
        $Row++
    }
    $Sht++
}
$Output = $Excel
$XLS.SaveAs($Output)
$EXc1.Quit()
Remove-Item *.csv
cd ..\
Found the solution:
$Zcount = Import-Csv C:\Working\cylrpt.csv | Where-Object Zones -ne "$null" | Select-Object @{N='Device';E={$_."Device Name"}},Zones | Group-Object Zones | Select-Object Name,Count
$Zcount | Export-Csv -path C:\Working\Temp\01-Summary.csv -NoTypeInformation
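For reference, the line-count idea mentioned earlier (count the lines in each per-zone CSV and subtract 1 for the header) could look roughly like this; it is only a sketch and assumes the per-zone files already exist under $working\temp:
# Sketch only: count data rows per zone file, excluding the header row
Get-ChildItem "$working\temp" -Filter *.csv | ForEach-Object {
    [pscustomobject]@{
        Zone  = $_.BaseName
        Count = (Get-Content $_.FullName | Measure-Object).Count - 1
    }
} | Export-Csv "$working\temp\01-Summary.csv" -NoTypeInformation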
I have a CSV file which contains many lines, and I want to extract the text between <STR_0.005_Long>, and µm,5.000µm.
Example line from the CSV:
Straightness(Up/Down) <STR_0.005_Long>,4.444µm,5.000µm,,Pass,2.476µm,1.968µm,25,0.566µm,0.720µm
This is the script that I am trying to write:
$arr = @()
$path = "C:\Users\georgi\Desktop\5\test.csv"
$pattern = "(?<=.*<STR_0.005_Long>,)\w+?(?=µm,5.000µm*)"
$Text = Get-Content $path
$Text.GetType() | Format-Table -AutoSize
$Text[14] | Foreach {
    if ([Regex]::IsMatch($_, $pattern)) {
        $arr += [Regex]::Match($_, $pattern)
        Out-File C:\Users\georgi\Desktop\5\test.txt -Append
    }
}
$arr | Foreach {$_.Value} | Out-File C:\Users\georgi\Desktop\5\test.txt -Append
Use a Where-Object filter with your regular expression and simply output the match to the output file:
Get-Content $path |
Where-Object { $_ -match $pattern } |
ForEach-Object { $matches[0] } |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
Of course, since you have a CSV, you could simply use Import-Csv and export the value of that particular column:
Import-Csv $path | Select-Object -Expand 'column_name' |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
Replace column_name with the actual name of the column. If the CSV doesn't have a column header you can specify one via the -Header parameter:
Import-Csv $path -Header 'col1','col2','col3',... |
Select-Object -Expand 'col2' |
Out-File 'C:\Users\georgi\Desktop\5\test.txt'
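For the sample line shown earlier, which has ten comma-separated fields, a sketch with made-up header names (purely illustrative; only the second field is wanted) might be:
# The header names below are hypothetical; 'Measured' is the 4.444µm field
Import-Csv $path -Header 'Description','Measured','Limit','Spare','Result','Max','Min','Points','StdDev','Range' |
    Select-Object -ExpandProperty 'Measured' |
    Out-File 'C:\Users\georgi\Desktop\5\test.txt'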
I am merging a lot of large CSV files, skipping the leading junk and appending the filename to each line:
Get-ChildItem . | Where-Object Name -match "Q[0-4]20[0-1][0-9].csv" |
    ForEach-Object {
        $file = $_.BaseName
        Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
            "$_,${file}" | Out-File -Append temp.csv -Encoding ASCII
        }
    }
In PowerShell this is incredibly slow even on an i7/16GB machine (~5 megabyte/minute). Can I make it more efficient or should I just switch to e.g. Python?
Get-Content / Set-Content are terrible with larger files. Streams are a good alternative when performance is key, so with that in mind let's use one to read in each file and another to write out the results.
$rootPath = "C:\temp"
$outputPath = "C:\test\somewherenotintemp.csv"
$streamWriter = [System.IO.StreamWriter]$outputPath
Get-ChildItem $rootPath -Filter "*.csv" -File | ForEach-Object{
$file = $_.BaseName
[System.IO.File]::ReadAllLines($_.FullName) |
Select-Object -Skip 3 | ForEach-Object{
$streamWriter.WriteLine(('{0},"{1}"' -f $_,$file))
}
}
$streamWriter.Close(); $streamWriter.Dispose()
We create a writer stream, $streamWriter, to output the edited lines from each file. We could read and write the files in larger batches, which would be faster, but we need to ignore a few lines and make changes to each remaining one, so processing line by line is simpler. Avoid writing anything to the console during this time, as it will just slow everything down.
What '{0},"{1}"' -f $_,$file does is quote the last "column" that is added, in case the base name contains spaces.
Measure-Command -Expression {
    Get-ChildItem C:\temp | Where Name -like "*.csv" | ForEach-Object {
        $file = $_.BaseName
        Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
            "$_,$($file)" | Out-File -Append C:\temp\t\tempe1.csv -Encoding ASCII -Force
        }
    }
} # TotalSeconds : 12,0526802 for 11415 lines
If you first put everything into an array in memory, things go a lot faster:
Measure-Command -Expression {
    $arr = @()
    Get-ChildItem C:\temp | Where Name -like "*.csv" | ForEach-Object {
        $file = $_.BaseName
        $arr += Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
            "$_,$($file)"
        }
    }
    $arr | Out-File -Append C:\temp\t\tempe2.csv -Encoding ASCII -Force
} # TotalSeconds : 0,8197193 for 11415 lines
EDIT: Fixed it so that your filename was added to each row.
To avoid -Append ruining the performance of your script, you could use a buffer array variable:
# Initialize buffer
$csvBuffer = @()
Get-ChildItem *.csv | ForEach-Object {
    $file = $_.BaseName
    $content = Get-Content $_.FullName | Select-Object -Skip 3 | ForEach-Object {
        "$_,${file}"
    }
    # Populate buffer
    $csvBuffer += $content
    # Write buffer to disk if it contains 5000 lines or more
    $csvBufferCount = $csvBuffer | Measure-Object | Select-Object -ExpandProperty Count
    if( $csvBufferCount -ge 5000 )
    {
        $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
        $csvBuffer = @()
    }
}
# Important : empty the buffer remainder
if( $csvBuffer.Count -gt 0 )
{
    $csvBuffer | Out-File -FilePath temp.csv -Encoding ASCII -Append
    $csvBuffer = @()
}
I've figured out how to compare single columns in two files, but I can't figure out how to compare two files where the first has a single column and the second has multiple columns, both containing emails.
First file.csv (contains a single column of emails):
john@email.com
jack@email.com
jill@email.com
Second file.csv (contains multiple columns of emails):
john@email.nl,john@email.eu,john@email.com
jill@email.se,jill@email.com,jill@email.us
By comparing the two I would like to output the difference, which would result in:
Output.csv
jack@email.com
Anyone able to help me? :)
Single-column comparison with output of the difference:
#Line extracts emails from list
$SubscribedMails = Import-Csv .\subscribed.csv | Select-Object -Property email
#Line extracts emails from list
$ValidEmails = Import-Csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -Property email -IncludeEqual | Where-Object {$_.SideIndicator -eq "<="} | Export-Csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Since the first file already contains one email address per line, you can read it in right away.
Take the second file and split the strings containing several addresses.
A new array with separate addresses will be generated.
Judging from your output, you only want addresses that are in the first CSV but not in the second.
Your code could look like this:
$firstFile = Get-Content 'FirstFile.csv'
$secondFile = (Get-Content 'SecondFile.csv').Split(',')
foreach ($item in $firstFile) {
    if ($item -notin $secondFile) {
        # plain strings are best written with Add-Content; Export-Csv would only emit their Length property
        $item | Add-Content output.csv
    }
}
If you want to keep your existing code, you could consider a script like:
#Line extracts emails from list
$SubscribedMails = Import-Csv .\subscribed.csv | Select-Object -Property email
Rename-Item .\users-emails.csv users-emails.csv.bk
(Get-Content .\users-emails.csv.bk).replace(',', "`r`n") | Set-Content .\users-emails.csv
#Line extracts emails from list
$ValidEmails = Import-Csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -Property email -IncludeEqual | Where-Object {$_.SideIndicator -eq "<="} | Export-Csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv
Rename-Item .\users-emails.csv.bk users-emails.csv
Or, more simply:
#Line extracts emails from list
$SubscribedMails = Import-Csv .\subscribed.csv | Select-Object -Property email
(Get-Content .\users-emails.csv).replace(',', "`r`n") | Set-Content .\users-emails.csv.bk
#Line extracts emails from list
$ValidEmails = Import-Csv .\users-emails.csv.bk | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -Property email -IncludeEqual | Where-Object {$_.SideIndicator -eq "<="} | Export-Csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv.bk
None of the suggestions so far work :(
Still hoping :)
Will delete comment when happy :p
Can you try this?
$One = (Get-Content .\FirstFile.csv).Split(',')
$Two = (Get-Content .\SecondFile.csv).Split(',')
$CsvPath = '.\Output.csv'
$Diff = @()
(Compare-Object ($One | Sort-Object) ($Two | Sort-Object) |
    Where-Object {$_.SideIndicator -eq '<='}).InputObject |
    ForEach-Object {$Diff += New-Object PSObject -Property @{email=$_}}
$Diff | Export-Csv -Path $CsvPath -NoTypeInformation
Output.csv will contain entries that exist in FirstFile but not SecondFile.