Remove row if term found in a CSV column - powershell

I need to remove a row from a table if I find a certain kind of email address in my CSV table. There are multiple email fields, but I only want to do this for one email column, not for any of them, which is what I think the code below currently does.
How can I specify that?
$data = foreach ($line in Get-Content D:\Data\info.csv) {
if ($line -Like '*@LS*' -or $line -Like '*@gmail*') {
} else {
$line
}
}
$data | Set-Content D:\Data\info.csv -Force

Use a Where-Object filter:
$file = 'D:\Data\info.csv'
(Get-Content $file) | Where-Object {
$_ -notlike '*@LS*' -and $_ -notlike '*@gmail*'
} | Set-Content $file
If you want to check a particular field instead of the entire line, use Import-Csv/Export-Csv instead of Get-Content/Set-Content:
$file = 'D:\Data\info.csv'
(Import-Csv $file) | Where-Object {
$_.FOO -notlike '*@LS*' -and $_.FOO -notlike '*@gmail*'
} | Export-Csv $file -NoType
Replace FOO with the actual field name.
Instead of the wildcard matches you could also use string operations (note that .Contains() is case-sensitive, unlike -notlike):
-not $_.FOO.Contains('@LS') -and -not $_.FOO.Contains('@gmail')
or a single regular expression:
$_.FOO -notmatch '@(LS|gmail)'
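A minimal, self-contained sketch of the per-column filter, assuming a hypothetical column named Email (standing in for FOO) and a few made-up inline rows in place of D:\Data\info.csv. Only the Email column is tested, so Bob's row survives even though his other address is a gmail one:

```powershell
# Hypothetical sample data; in practice: $rows = Import-Csv D:\Data\info.csv
$rows = @'
Name,Email,BackupEmail
Ann,ann@gmail.com,ann@work.com
Bob,bob@work.com,bob@gmail.com
Cy,cy@LS.org,cy@work.com
'@ | ConvertFrom-Csv

# Only the Email column is tested; BackupEmail is ignored
$kept = $rows | Where-Object { $_.Email -notmatch '@(LS|gmail)' }

# Then: $kept | Export-Csv D:\Data\info.csv -NoTypeInformation
$kept | ForEach-Object { $_.Name }   # outputs: Bob
```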

To complement Ansgar Wiechers' helpful answer:
Select-String allows for a concise solution (PSv3+ syntax), although it matches against whole lines; as advised by Ansgar, using Import-Csv and limiting matching to the field of interest is preferable:
(Select-String -NotMatch '@LS', '@gmail' D:\Data\info.csv).Line |
Set-Content D:\Data\info.csv -Force
Note how the Select-String call is enclosed in (...) to ensure that the input file is processed in full up front, allowing you to write back to the same file in the same pipeline.
Note, however, that this loads all resulting lines into memory at once, and that there's a small risk of data loss if writing back to the input file gets interrupted; both issues could be remedied with more effort.
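One way to remedy both issues, sketched under the assumption that a sibling temporary file (info.csv.tmp) is acceptable: stream the surviving lines to that file and replace the original only after writing has finished. Remove-MatchingLines is a hypothetical helper name, not a built-in cmdlet. Because lines are streamed, they are not all held in memory at once.

```powershell
# Remove-MatchingLines is a hypothetical helper, not a built-in cmdlet.
function Remove-MatchingLines {
    param(
        [Parameter(Mandatory)] [string]   $Path,
        [Parameter(Mandatory)] [string[]] $Pattern
    )
    # Same directory as the original, so the final Move-Item is a rename
    $tmp = "$Path.tmp"
    try {
        # Create (or truncate) the temp file so it exists even if no lines survive
        New-Item -ItemType File -Path $tmp -Force -ErrorAction Stop | Out-Null
        # Lines that match none of the patterns are streamed to the temp file
        Select-String -NotMatch -Pattern $Pattern -LiteralPath $Path -ErrorAction Stop |
            ForEach-Object Line |
            Add-Content -LiteralPath $tmp -ErrorAction Stop
        # Only reached if everything above succeeded
        Move-Item -LiteralPath $tmp -Destination $Path -Force -ErrorAction Stop
    }
    catch {
        # Leave the original untouched and clean up the partial temp file
        Remove-Item -LiteralPath $tmp -ErrorAction SilentlyContinue
        throw
    }
}
```

Usage would then be Remove-MatchingLines -Path D:\Data\info.csv -Pattern '@LS', '@gmail'. The original file is only replaced in the final Move-Item step, which shrinks the window in which an interruption can lose data.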

Related

How to strip out leading time stamp?

I have some log files.
Some of the UPDATE SQL statements are getting errors, but not all.
I need to know all the statements that are getting errors so I can find the pattern of failure.
I can sort all the log files and get the unique lines, like this:
$In = "C:\temp\data"
$Out1 = "C:\temp\output1"
$Out2 = "C:\temp\output2"
Remove-Item $Out1\*.*
Remove-Item $Out2\*.*
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$content | Where-Object {$_ -match 'STATEMENT'} | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
Works great.
But some of the logs have a date-time stamp in the first 24 characters of each line. I need to strip that out, or else all of those lines are unique.
If it helps, all the files either have the leading timestamp or they don't. The lines are not mixed within a single file.
Here is what I have so far:
# Get the log files from the last 90 days
Get-ChildItem $In -Filter *.log | Where-Object {$_.LastWriteTime -gt (Get-Date).AddDays(-90)} |
Foreach-Object {
$content = Get-Content $_.FullName
#filter and save content to a file
$s = $content | Where-Object {$_ -match 'STATEMENT'}
# strip datetime from front if exists
If (Where-Object {$s.Substring(0,1) -Match '/d'}) { $s = $s.Substring(24) }
$s | Sort-Object -Unique | Set-Content $Out1\$_
}
# merge all the files, sort unique, write to output
Get-Content $Out1\* | Sort-Object -Unique | Set-Content $Out2\output.txt
But it just writes the lines out without stripping the leading characters.
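Regex /d should be \d (\ is the escape character in general, and character-class shortcuts such as d for a digit[1] must be prefixed with it).
Also, If (Where-Object {...}) cannot work as intended: Where-Object receives no pipeline input there, so it outputs nothing and the condition is always false. Moreover, $s holds all matching lines at once, so the check has to be applied to each line individually.
Use a single pipeline that passes the Where-Object output to a ForEach-Object call where you can perform the conditional removal of the numeric prefix.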
$content |
Where-Object { $_ -match 'STATEMENT' } |
ForEach-Object { if ($_[0] -match '\d') { $_.Substring(24) } else { $_ } } |
Set-Content $Out1\$_
[1] Strictly speaking, \d matches everything that the Unicode standard considers a digit, not just the ASCII-range digits 0 to 9; to limit matching to the latter, use [0-9].
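An alternative to the if/Substring() pair, sketched under the question's assumption that a stamp starts with a digit and occupies exactly the first 24 characters, is a single anchored -replace. Lines without a stamp pass through unchanged, and a short line that happens to start with a digit no longer makes Substring(24) throw. The sample lines are made up for illustration:

```powershell
# Made-up log lines; in the question these come from Get-Content $_.FullName
$content = @(
    '2019-03-01 12:00:00.123 UPDATE STATEMENT failed',
    'UPDATE STATEMENT failed',
    'INFO nothing to see here'
)

# ^\d.{23} = a digit followed by 23 more characters, i.e. the first 24
# characters, removed only when the line starts with a digit
$unique = $content |
    Where-Object { $_ -match 'STATEMENT' } |
    ForEach-Object { $_ -replace '^\d.{23}' } |
    Sort-Object -Unique

$unique   # outputs: UPDATE STATEMENT failed
```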

Where-Object leaving blank rows

I'm again stuck on something that should be so simple. I have a CSV file in which I need to do a few string modifications and export it back out. The data looks like this:
FullName
--------
\\server\project\AOI
\\server\project\AOI\Folder1
\\server\project\AOI\Folder2
\\server\project\AOI\Folder3\User
I need to do the following:
Remove the "\\server\project" from each line but leave the rest of the line
Delete all rows which do not have a Folder (e.g., in the example above, the first row would be deleted but the other three would remain)
Delete any row with the word "User" in the path
Add a column called T/F with a value of "FALSE" for each record
Here is my initial attempt at this:
Get-Content C:\Folders.csv |
% {$_.replace('\\server\project\','')} |
Where-Object {$_ -match '\\'} |
#Removes User Folders rows from CSV
Where-Object {$_ -notmatch 'User'} |
Out-File C:\Folders-mod.csv
This works to a certain extent, except it deletes my header row and I have not found a way to add a column using Get-Content. For that, I have to use Import-Csv, which is fine, but it seems inefficient to be constantly reloading the same file. So I tried rewriting the above using Import-Csv instead of Get-Content:
$Folders = Import-Csv C:\Folders.csv
foreach ($Folder in $Folders) {
$Folder.FullName = $Folder.FullName.Replace('\\server\AOI\', '') |
Where-Object {$_ -match '\\'} |
Where-Object {$_ -notmatch 'User Files'}
}
$Folders | Export-Csv C:\Folders-mod.csv -NoTypeInformation
I haven't added the coding for adding the new column yet, but this keeps the header. However, I end up with a bunch of empty rows where the Where-Object deletes the line, and the only way I can find to get rid of them is to run the output file through a Get-Content command. This all seems overly complicated for something that should be simple.
So, what am I missing?
Thanks to TheMadTechnician for pointing out what I was doing wrong. Here is my final script (with additional column added):
$Folders= Import-CSV C:\Folders.csv
ForEach ($Folder in $Folders)
{
$Folder.FullName = $Folder.FullName.replace('\\server\project\','')
}
$Folders | Where-Object {$_.FullName -match '\\' -and $_.FullName -notmatch 'User'} |
Select-Object *,@{Name='T/F';Expression={'FALSE'}} |
Export-CSV C:\Folders.csv -NoTypeInformation
I would do this with a Table Array and pscustomobject.
#Create an empty Array
$Table = @()
#Manipulate the data
$Fullname = Get-Content C:\Folders.csv |
ForEach-Object {$_.replace('\\server\project\', '')} |
Where-Object {$_ -match '\\'} |
#Removes User Folders rows from CSV
Where-Object {$_ -notmatch 'User'}
#Define custom objects
Foreach ($name in $Fullname) {
$Table += [pscustomobject]@{'Fullname' = $name; 'T/F' = 'FALSE'}
}
#Export results to new csv
$Table | Export-CSV C:\Folders-mod.csv -NoTypeInformation
here's yet another way to do it ... [grin]
$FileList = @'
FullName
\\server\project\AOI
\\server\project\AOI\Folder1
\\server\project\AOI\Folder2
\\server\project\AOI\Folder3\User
'@ | ConvertFrom-Csv
$ThingToRemove = '\\server\project'
$FileList |
Where-Object {
# toss out any blank lines
$_ -and
# toss out any lines with "user" in them
$_ -notmatch 'User'
} |
ForEach-Object {
[PSCustomObject]@{
FullName = $_.FullName -replace [regex]::Escape($ThingToRemove)
'T/F' = $False
}
}
output ...
FullName T/F
-------- ---
\AOI False
\AOI\Folder1 False
\AOI\Folder2 False
notes ...
putting a slash in the property name is ... icky [grin]
that requires wrapping the property name in quotes every time you need to access it. try another name - perhaps "Correct".
you can test for blank array items [lines] with $_ all on its own
the [regex]::Escape() stuff is really quite handy
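For comparison, a sketch that meets all four requirements from the question in one pipeline, including dropping rows that have no folder level once the prefix is removed (the version above keeps \AOI). Inline data stands in for C:\Folders.csv, and the T/F value is the string 'FALSE' as the question asks:

```powershell
# Inline sample data standing in for Import-Csv C:\Folders.csv
$rows = @'
FullName
\\server\project\AOI
\\server\project\AOI\Folder1
\\server\project\AOI\Folder2
\\server\project\AOI\Folder3\User
'@ | ConvertFrom-Csv

# Anchored, escaped pattern so the backslashes in the prefix are literal
$prefixPattern = '^' + [regex]::Escape('\\server\project\')

# 1. strip the prefix
# 2. keep only paths that still contain a folder level (a backslash)
# 3. drop paths that contain "User"
# 4. rebuild each row with the extra T/F column
$result = $rows |
    ForEach-Object { $_.FullName -replace $prefixPattern } |
    Where-Object { $_ -match '\\' -and $_ -notmatch 'User' } |
    ForEach-Object { [pscustomobject]@{ FullName = $_; 'T/F' = 'FALSE' } }

# Then: $result | Export-Csv C:\Folders-mod.csv -NoTypeInformation
$result
```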

Can I collapse this into a single line of code using the pipeline?

I'm querying a highly structured file system. I need to look at the nodes that are at the 14th level of the tree. I've come up with the following based on some other posts on querying filesystems and my own research:
$lines = Get-ChildItem "\\ad1hfdahp001\D$\software\anthill\var\artifacts" -Recurse -Force -EA SilentlyContinue |
Where-Object { $_ -is [System.IO.DirectoryInfo] } |
Select -ExpandProperty FullName
$paths=@()
foreach ($d in $lines) {
$a = $d -split "\\"
if ($a.count -eq 14) {$paths += $d}
}
Is there a way to add the code in the foreach block (or part of it) to the first statement so that $lines only contains the paths with 14 levels? I know this is trivial, but I'm processing a huge amount of data, and I feel that adding this as a filter to the pipeline in the first statement would be much more efficient than collecting all the directories into an array and then reprocessing the array to select the 14-level entries.
Sure. Simply add
... | Where-Object { @($_ -split '\\').Count -eq 14 }
after the Select-Object.
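A small sketch of how the segment count works, using made-up UNC paths and a hypothetical depth of 6 instead of 14. Splitting on '\' also yields two empty strings for the leading \\, and the count includes them. The commented-out lines show the same filter attached directly to Get-ChildItem (using the PSv3+ -Directory switch instead of the DirectoryInfo test, with $root as a placeholder for your share), so no intermediate array is built; Get-ChildItem still walks the whole tree, though.

```powershell
# Made-up UNC paths; '\\srv\share\a\b' -split '\\' gives '', '', 'srv', 'share', 'a', 'b'
$candidates = @(
    '\\srv\share\a',
    '\\srv\share\a\b',
    '\\srv\share\a\b\c'
)

$depth = 6   # hypothetical; the question uses 14

# Keep only paths whose split produces exactly $depth elements
$matched = $candidates | Where-Object { @($_ -split '\\').Count -eq $depth }

# Applied to the real share as one pipeline (PSv3+):
# $lines = Get-ChildItem $root -Recurse -Directory -Force -ErrorAction SilentlyContinue |
#     ForEach-Object FullName |
#     Where-Object { @($_ -split '\\').Count -eq 14 }
```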

Powershell. Trying to loop using Import-csv

I am converting code that currently parses a large CSV file once for each agency. Below is what one of those lines looks like:
Import-Csv $HostList | Where-Object {$_."IP Address" -Match "^192.168.532.*" -or $_.Domain -eq "MYDOMAIN" `
-and (get-date $_.discovery_timestamp) -gt $PreviousDays} | select-object "Hostname","Domain","IP Address","discovery_timestamp","some_version" | `
Export-Csv -NoTypeInformation -Path "$out_data\OBS_${Days}_days.txt"
write-host "OBS_DONE"
I have about 30 of these.
I want to parse the CSV file only once, possibly using foreach and Import-Csv.
I thought I could do something like:
$HostFile = Import-csv .\HostList.csv
foreach ($line in $HostFile)
{
Where-Object {$_."IP Address" -Match "^172.31.52.*"}
write-host $line
#| Export-Csv -NoTypeInformation -Path H:\Case_Scripts\test.csv
}
I've tried many permutations of the above, and it never matches in the Where-Object the way the working script above does.
Any guidance and learning opportunities are appreciated.
My good man, you need to be introduced to the switch statement. It will sort the records for all of your companies in a single pass.
$CompanyA = @()
$CompanyB = @()
$CompanyC = @()
$Unknown = @()
Switch(Import-CSV .\HostList.csv){
{(($_."IP Address" -match "^192.168.532.*") -or ($_.Domain -eq "CompanyA.com")) -and ((get-date $_.discovery_timestamp) -gt $PreviousDays)} {$CompanyA += $_; Continue}
{(($_."IP Address" -match "^192.26.19.*") -or ($_.Domain -eq "CompanyB.net")) -and ((get-date $_.discovery_timestamp) -gt $PreviousDays)} {$CompanyB += $_; Continue}
{(($_."IP Address" -match "^94.8.222.*") -or ($_.Domain -eq "CompanyC.org")) -and ((get-date $_.discovery_timestamp) -gt $PreviousDays)} {$CompanyC += $_; Continue}
default {$Unknown += $_}
}
$CompanyA | Export-Csv "$out_data\CompanyA${Days}_days.txt" -NoType
$CompanyB | Export-Csv "$out_data\CompanyB${Days}_days.txt" -NoType
$CompanyC | Export-Csv "$out_data\CompanyC${Days}_days.txt" -NoType
If($Unknown.count -gt 0){Write-Host "$($Unknown.Count) Entries Did Not Match Any Company" -Fore Red
$Unknown}
That will import the CSV and, for each entry, try to match it against the criteria in each of the three lines. If an entry matches the criteria in the first script block, the action in the second script block runs (adding that entry to one of the three arrays created first). Then each array is written to its own text file in CSV format, as in your script. The ;Continue makes the switch stop trying to match once it finds a valid match and move on to the next record. If an entry matches none of them, it is added to the Unknown array by default; at the end, if there are any unmatched entries, the script warns the host and lists them.
Edit: as for Switch's -Regex parameter, its purpose is to let you do a simple regex match such as:
Switch -regex ($ArrayOfData){
".*@.+?\..{2,5}" {"It's an email address"}
"\d{3}(?:\)|\.|-)\d{3}(?:\.|-)\d{4}" {"It's a phone number"}
"\d{3}-\d{2}-\d{4}" {"Probably a social security number"}
}
What this doesn't allow for is combining conditions with -and or -or. If you use a script block as the match condition (as in my top example), you can use the -match operator, which always performs a regex match, so you can still use regular expressions without the -Regex parameter of Switch.
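A variation on the switch approach, sketched with made-up sample rows and only the columns it needs: generic lists instead of arrays avoid the copy that += performs on every addition, and the dots in the IP prefixes are escaped so that . matches only a literal dot. The bucket names and rules are hypothetical, and the timestamp condition is omitted for brevity.

```powershell
# Hypothetical sample rows standing in for Import-Csv .\HostList.csv
$rows = @'
Hostname,Domain,IP Address
h1,CompanyA.com,10.0.0.1
h2,other.net,192.26.19.5
h3,other.net,8.8.8.8
'@ | ConvertFrom-Csv

# One growable list per bucket
$buckets = @{}
foreach ($name in 'CompanyA', 'CompanyB', 'Unknown') {
    $buckets[$name] = New-Object 'System.Collections.Generic.List[object]'
}

# A single pass: each row lands in the first bucket whose condition is true
switch ($rows) {
    { $_.'IP Address' -match '^192\.168\.' -or $_.Domain -eq 'CompanyA.com' } {
        $buckets['CompanyA'].Add($_); continue
    }
    { $_.'IP Address' -match '^192\.26\.19\.' -or $_.Domain -eq 'CompanyB.net' } {
        $buckets['CompanyB'].Add($_); continue
    }
    default { $buckets['Unknown'].Add($_) }
}

# Then, for example:
# $buckets['CompanyA'] | Export-Csv "$out_data\CompanyA${Days}_days.txt" -NoTypeInformation
```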
Where-Object needs a collection of objects piped to it. In your foreach loop nothing is piped to it, so $_ in the filter is empty and nothing ever matches.
There are a couple of ways to fix this; the first one that comes to mind is to replace Where-Object with an if statement, for example:
if ($line."IP Address" -Match "^172.31.52.*") { write-output $line }
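A runnable sketch of that fix, with two made-up rows standing in for .\HostList.csv. Inside a foreach loop the current row is $line, so the condition tests $line directly, and the matching rows are collected by assigning the loop's output to a variable:

```powershell
# Hypothetical sample rows standing in for Import-Csv .\HostList.csv
$HostFile = @'
Hostname,IP Address
a,172.31.52.10
b,10.1.1.1
'@ | ConvertFrom-Csv

# Collect the output of the loop; no pipeline is involved, so $_ is not used
$matched = foreach ($line in $HostFile) {
    # Escaped dots match literal dots only
    if ($line.'IP Address' -match '^172\.31\.52\.') { $line }
}

# Then, as in the question:
# $matched | Export-Csv -NoTypeInformation -Path H:\Case_Scripts\test.csv
```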

Is there a PowerShell "string does not contain" cmdlet or syntax?

In PowerShell I'm reading in a text file. I'm then doing a Foreach-Object over the text file and am only interested in the lines that do NOT contain strings that are in $arrayOfStringsNotInterestedIn.
What is the syntax for this?
Get-Content $filename | Foreach-Object {$_}
If $arrayofStringsNotInterestedIn is an [array], you should use -notcontains:
Get-Content $FileName | foreach-object {
if ($arrayofStringsNotInterestedIn -notcontains $_) { $_ } }
or better (IMO)
Get-Content $FileName | where { $arrayofStringsNotInterestedIn -notcontains $_}
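You can use the -notmatch operator to get the lines that don't match a regex pattern; for this to work, $arrayofStringsNotInterestedIn must hold a single pattern string, because an array on the right-hand side is converted to one space-separated string.
Get-Content $FileName | foreach-object {
if ($_ -notmatch $arrayofStringsNotInterestedIn) { $_ } }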
To exclude the lines that contain any of the strings in $arrayOfStringsNotInterestedIn, you should use:
(Get-Content $FileName) -notmatch [String]::Join('|',$arrayofStringsNotInterestedIn)
The code proposed by Chris only works if $arrayofStringsNotInterestedIn contains the full lines you want to exclude.
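One refinement, sketched with a made-up exclusion list: if the strings may contain regex metacharacters such as ., ( or \, escape each one with [regex]::Escape() before joining, so that they are matched literally. Without escaping, 'foo.bar' would also exclude 'fooXbar', and 'error (42)' would not match the literal parentheses.

```powershell
# Made-up exclusion list and input lines
$arrayOfStringsNotInterestedIn = @('foo.bar', 'C:\temp', 'error (42)')
$lines = @(
    'keep this line',
    'contains foo.bar here',
    'path C:\temp\x',
    'fooXbar should stay',
    'got error (42) again'
)

# Escape each entry, then join into one alternation pattern.
# Assumes the list is non-empty: an empty pattern matches every line.
$pattern = ($arrayOfStringsNotInterestedIn | ForEach-Object { [regex]::Escape($_) }) -join '|'

# With a file this would be: (Get-Content $FileName) -notmatch $pattern
$kept = $lines -notmatch $pattern
$kept   # outputs: keep this line, fooXbar should stay
```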