Use PowerShell to grep through many large files

I have about 70 CSV files, each about 1 GB in size. In a Windows environment I need to grep through them all to find specific lines.
My search file called "input.txt" contains these strings:
CG234242424
CG234234234
CG234234235
In a Linux environment I would do this:
for line in `cat input.txt`; do grep $line *.csv >> output.txt; done;
How would I do this in PowerShell?
Background: I'm a Linux guy, and this is a one-off request from the business users for an audit.

I'd build a regular expression from the strings in the input file, and then use Select-String to check the CSV files for the presence of that pattern:
$re = (Get-Content 'input.txt' | ForEach-Object { [regex]::Escape($_) }) -join '|'
Select-String -Path '*.csv' -Pattern $re -CaseSensitive > 'output.txt'
But since PowerShell produces structured data rather than simple string output, you may want to make use of that structure:
$re = (Get-Content 'input.txt' | ForEach-Object { [regex]::Escape($_) }) -join '|'
Select-String -Path '*.csv' -Pattern $re -CaseSensitive |
Select-Object Filename, LineNumber, Line |
Export-Csv 'output.csv' -NoType
If you must process each string from the input file separately you'd do it like this:
foreach ($line in Get-Content 'input.txt') {
Select-String -Path '*.csv' -Pattern $line -SimpleMatch -CaseSensitive |
Select-Object Filename, LineNumber, Line |
Export-Csv 'output.csv' -NoType -Append
}
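As a possible variation on the loop above, Select-String also accepts an array for -Pattern, so with -SimpleMatch every line of the input file can be matched literally in a single pass over the CSV files. A minimal sketch, assuming a throwaway temp folder and sample rows that are invented here purely for illustration:

```powershell
# Hypothetical sample data in a temporary folder, for illustration only
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'input.txt') -Value 'CG234242424', 'CG234234234'
Set-Content -Path (Join-Path $dir 'a.csv') -Value 'id,value', 'CG234242424,1', 'CG999999999,2'
Set-Content -Path (Join-Path $dir 'b.csv') -Value 'id,value', 'CG234234234,3'

# -Pattern accepts an array; with -SimpleMatch each entry is matched literally
$patterns = Get-Content (Join-Path $dir 'input.txt')
$hits = Select-String -Path (Join-Path $dir '*.csv') -Pattern $patterns -SimpleMatch -CaseSensitive

# Show file name, line number and matching line for every hit
$hits | Select-Object Filename, LineNumber, Line
```

Each file is then read once rather than once per search string, which may matter with 70 files of this size.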

Related

How can I use ForEach-Object output for Select-String directly?

I have several single line xml files out of which I want to extract the contents of a particular tag <LIFNR>. I managed doing that using the following two lines of power shell:
get-content P:\Webservice\21*_*CR_d*.xml | foreach-object { $_ -replace '>', ">`r`n" } > xmls_newline.txt
get-content .\xmls_newline.txt | Select-String -Pattern '.*<\/LIFNR>'
However if I want to skip the step of creating an intermediary text file I cannot achieve the same result.
$xml = get-content P:\Webservice\21*_*CR_d*.xml | foreach-object { $_ -replace '>', ">`r`n" }
$xml | Select-String -Pattern '.*<\/LIFNR>'
or
get-content P:\Webservice\21*_*CR_d*.xml | foreach-object { $_ -replace '>', ">`r`n" } | Select-String -Pattern '.*<\/LIFNR>'
Each of these just prints the result of the string splitting in the ForEach-Object statement; the Select-String command appears to be ignored.
Can somebody explain to me what is different about the latter two attempts compared to the working solution?
Get-Content outputs each line as a separate string, whereas -replace only inserts line breaks into what is still a single string, so Select-String matches (and prints) that whole string. To emulate the behavior of Get-Content, split the string after > instead of appending a line break:
Get-Content P:\Webservice\21*_*CR_d*.xml | ForEach-Object { $_ -split '(?<=>)' } | Select-String -Pattern '.*<\/LIFNR>'
The construct (?<=...) is a lookbehind assertion. This way, PowerShell splits on a zero-length string immediately after each >.
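The difference between the two approaches can be seen on a small in-memory string. This is only a sketch; the XML content below is made up for illustration and is not taken from the asker's files:

```powershell
# Hypothetical single-line XML, for illustration only
$xml = '<ROOT><NAME>Acme</NAME><LIFNR>0000123</LIFNR></ROOT>'

# -replace returns ONE string that merely contains line breaks
$replaced = $xml -replace '>', ">`r`n"

# -split with a lookbehind returns one string per fragment
$parts = $xml -split '(?<=>)' | Where-Object { $_ }   # drop empty fragments

$single  = @($replaced | Select-String -Pattern '.*</LIFNR>')
$perPart = @($parts | Select-String -Pattern '.*</LIFNR>')

# For the single string, the reported "line" is the entire multi-line string
$single[0].Line -eq $replaced

# For the split fragments, only the fragment ending in </LIFNR> is reported
$perPart[0].Line
```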

Efficiently extract matching lines including path and line number?

The following snippet extracts only the matching lines; I also want the path and line number:
Get-ChildItem $thePath\ -Include "*.txt" -Recurse | Get-Content | Select-String -Pattern 'THE_PATTERN' | Set-Content "output.txt"
I also tried this method, and it still extracts only the matching lines:
Get-ChildItem $thePath\ -Include "*.txt" -Recurse | Get-Content | Select-String -Pattern 'THE_PATTERN' | Select-Object -ExpandProperty Line | Set-Content "output.txt"
How can I extract the path:filename:line number: matching line?
You don't need Get-Content. The path is passed down the pipeline. (Here . is bound to -Path and *.txt to -Filter, which is the faster option.)
get-childitem -recurse . *.txt | select-string hi
foo2\file3.txt:1:hi
file1.txt:1:hi
file2.txt:1:hi
First, note that Get-ChildItem -Filter is much more efficient than Get-ChildItem -Include (see help Get-ChildItem). Second, Select-String accepts files directly, so there is no need to get the content first. Then just select the properties you need and export your file. (Note that $Matches is an automatic variable, so you might not want to use that name for your own variables.)
$Patterns = Get-ChildItem $thePath -Filter "*.txt" -Recurse | Select-String -Pattern 'THE_PATTERN' | select Path,Filename,LineNumber,Line
# Export to csv (usable in excel)
$Patterns | Export-Csv output.csv -NoTypeInformation # -Delimiter ";" # the delimiter is optional and depends on your region
# Exporting txt
foreach ($Pattern in $Patterns){
('{0} : {1} : {2}' -f ($Pattern.Path),($Pattern.LineNumber),($Pattern.Line)) | Add-Content "output.txt"
}
Yes, you can get the line number and file name from the output of Select-String:
ls *.txt | % { Select-String -Path $_ -Pattern "THE_PATTERN" | select-object LineNumber, Line, Path }
You'll notice this approach is also a touch faster.
Good luck!

Delete line if it includes a specific string

I have a text file that records each computer name and the date it logged in.
04/10/2017, "PC1"
04/10/2017, "PC4"
05/10/2017, "PC3"
09/10/2017, "PC2"
I'm having issues trying to run a script that will look for any line that includes "PC2" and delete that line:
get-content "c:\file.csv" | %{if($_ -match "PC2"){$_ -replace $_, ""}} | set-content c:\file.csv
(Get-Content 'C:\File.csv') -notmatch 'PC2' | Set-Content 'C:\File1.csv'
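Note that -notmatch takes a regular expression. Since the file extension is csv, you can also process it as CSV. The parentheses let Import-Csv finish reading before Export-Csv overwrites the same file, and -NoClobber is omitted because it would refuse to overwrite the existing file. Export-Csv also writes a header row.
(Import-Csv 'C:\File.csv' -Header Logged,Computer) |
where {$_.Computer -ne 'PC2'} |
Export-Csv 'C:\File.csv' -NoTypeInformation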
(Get-Content -Path 'C:\File.csv') |
Where-Object { $_ -notlike '*PC2*' } |
Set-Content -Path 'C:\File.csv'
Here you go. This utilizes an easier-to-understand wildcard comparison operator and just filters out the lines that have the matched string.
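One thing to watch with the filters above: both -notmatch 'PC2' and -notlike '*PC2*' test for a substring, so a name such as PC20 would be removed as well. A small sketch of the difference, using log lines invented for illustration (the PC20 entry is not in the original file):

```powershell
# Hypothetical log lines, for illustration only
$lines = @(
    '04/10/2017, "PC1"'
    '09/10/2017, "PC2"'
    '10/10/2017, "PC20"'
)

# A bare PC2 also matches PC20, so both of those lines are removed
$loose = @($lines -notmatch 'PC2')

# Including the surrounding quotes removes only the PC2 line
$exact = @($lines -notmatch '"PC2"')

$loose
$exact
```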

Filtering sections of data including the starting and ending lines- PowerShell

I have a text file that looks like this:
Data I'm NOT looking for
More data that doesn't matter
Even more data that I don't
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
Ending that I don't need
Here is what the output needs to be:
File1.txt
&Start/Finally the data I'm looking for
&Data/More data that I need
&Stop/I need this too
File2.txt
&Start/Second batch of data I need
&Data/I need this too
&Stop/Okay now I'm done
I need to do this for every file in a folder (sometimes there will be multiple files that will need to be filtered.) The files names can be incrementing: ex. File1.txt, File2.txt, File3.txt.
This is what I have tried with no luck:
ForEach-Object{
$text -join "`n" -split '(?ms)(?=^&START)' -match '^&START' |
Out-File B:\PowerShell\$filename}
Thanks!
Looks like you were pretty close: your code correctly extracted the paragraphs of interest, but you still needed to filter out the lines within each paragraph that don't start with &, and to write each paragraph to its own output file:
$text -join "`n" -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $ndx=0 } { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
This creates sequentially numbered files starting with File1.txt for every paragraph of interest.
To do it for every file in a folder, with output filenames using the fixed naming scheme File<n> across all input files (and thus cumulative numbering):
Get-ChildItem -File . | ForEach-Object -Begin { $ndx=0 } -Process {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object { $_ -split '\n' -match '^&' | Out-File "File$((++$ndx)).txt" }
}
To do it for every file in a folder, with output filenames based on the input filenames and numbering per input file (PSv4+, due to use of -PipelineVariable):
Get-ChildItem -File . -PipelineVariable File | ForEach-Object {
(Get-Content -Raw $_) -split '(?m)(?=^&Start)' -match '^&Start' |
ForEach-Object {$ndx=0} { $_ -split '\n' -match '^&' | Out-File "$($File.Name)$((++$ndx)).txt" }
}
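To see what the split-and-filter steps above produce without touching any files, the same operators can be run on an in-memory string. This is a sketch only: the sample text is a shortened, made-up variant of the question's data, and the results are collected in objects (with hypothetical Name and Lines properties) instead of being written with Out-File:

```powershell
# Shortened, hypothetical version of the sample text, for illustration only
$text = @(
    "Data I'm NOT looking for"
    '&Start/first batch'
    '&Data/more'
    'stray line inside the block'
    '&Stop/end of first'
    '&Start/second batch'
    '&Stop/end of second'
    "Ending that I don't need"
) -join "`n"

# Split before every line that begins with &Start, keep only those chunks
$blocks = @($text -split '(?m)(?=^&Start)' -match '^&Start')

# Within each chunk, keep only the lines that begin with &
$ndx = 0
$result = foreach ($block in $blocks) {
    $ndx++
    [pscustomobject]@{
        Name  = "File$ndx.txt"
        Lines = @($block -split '\n' -match '^&')
    }
}

$result | ForEach-Object { $_.Name; $_.Lines }
```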
You posted a second question (against the rules) and it was deleted, but here is my quick answer to it. I hope it helps and gives you a better sense of how PowerShell works:
$InputFile = "C:\temp\test\New folder (3)\File1.txt"
# get file content
$a=Get-Content $InputFile
# loop over every line from the second to the last but one
for ($i=1; $i -lt ($a.count-1); $i++)
{
# get the part of the string between & and /, and construct the output file name
$OutFile = "$(Split-Path $InputFile)\$(($a[$i] -split '/')[0] -replace '&','').txt"
$a[0]| Out-File $OutFile #creating output file and write first line in it
$a[$i]| Out-File $OutFile -Append #write info line
$a[-1]| Out-File $OutFile -Append #write last line
}
Something like this?
$i=0
gci -path "C:\temp\ExplodeDir" -file | %{ (get-content -path $_.FullName -Raw).Replace("`r`n`r`n", ";").Replace("`r`n", "~").Split(";") | %{if ($_ -like "*Start*") {$i++; ($_ -split "~") | out-file "C:\temp\ResultFile\File$i.txt" }} }

PowerShell: adding a line to a .txt file

I have a text (.txt) file with following content:
Car1
Car2
Car3
Car4
Car5
To replace Car1 with random text I used this script:
Get-ChildItem "C:\Users\boris.magdic\Desktop\q" -Filter *.TXT |
Foreach-Object{
$content = Get-Content $_.FullName
$content | ForEach-Object { $_ -replace "Car1", "random_text" } | Set-Content $_.FullName
}
This is working ok, but now I want to add one text line under Car2 in my text file.
How can I do that?
Just chain another -replace and use a new line!
Get-ChildItem "C:\Users\boris.magdic\Desktop\q" -Filter *.TXT |
Foreach-Object{
$file = $_.FullName
$content = Get-Content $file
$content | ForEach-Object { $_ -replace "Car1", "random_text" -replace "(Car2)","`$1`r`nOtherText" } | Set-Content $file
}
First, | Set-Content $_.FullName is easy to get wrong, because $_ takes on a different value inside the nested ForEach-Object script block. One simple fix is to save the path in a variable for use later in the pipeline. You can also use the ForEach($file in (Get-ChildItem....)) construct.
The specific change to get what you want is the second -replace. We place what you want to match in parentheses so that we can reference it in the replacement string with $1. We use a backtick to ensure PowerShell does not treat it as a variable.
We can remove some redundancy as well, since -replace works on every line of the file at once:
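The effect of the backtick can be checked on a single string. A minimal sketch, assuming no variable named $1 is defined in the session and strict mode is off:

```powershell
$line = 'Car2'

# Double quotes: the backtick stops PowerShell from expanding $1 as a variable,
# so the regex engine receives the text $1 and substitutes the captured group
$a = $line -replace '(Car2)', "`$1`r`nOtherText"

# Without the backtick, PowerShell expands $1 itself (here to an empty string)
$b = $line -replace '(Car2)', "$1`r`nOtherText"

$a -eq "Car2`r`nOtherText"
$b -eq "`r`nOtherText"
```

A single-quoted replacement string such as '$1' also reaches the regex engine unexpanded, but escape sequences like `r`n are not interpreted inside single quotes, which is why the double-quoted form with a backtick is used here.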
Get-ChildItem "c:\temp" -Filter *.TXT |
Foreach-Object{
$file = $_.FullName
(Get-Content $file) -replace "Car1", "random_text" -replace "(Car2)","`$1`r`nOtherText" | Set-Content $file
}
While this works with your sample text, more complicated strings might require more finesse to ensure you make the correct changes. Note also that the replacements we are using are regex based, which this specific example does not need.
.Replace()
So if you were just doing simple replacements, we can update your original logic:
Get-ChildItem "c:\temp" -Filter *.TXT |
Foreach-Object{
$file = $_.FullName
$content = Get-Content $_.FullName
$content | ForEach-Object { $_.replace("Car1", "random_text").replace("Car2","Car2`r`nOtherText")} | Set-Content $file
}
So that is just simple text replacement chained using the string method .Replace()
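The practical difference between the two approaches shows up as soon as the search text contains regex metacharacters or differs in case. A short sketch with made-up strings:

```powershell
$text = 'Car1.0'

# -replace treats the pattern as a regex: . matches every character
$regex = $text -replace '.', '-'

# .Replace() treats it as literal text: only the dot is replaced
$literal = $text.Replace('.', '-')

# -replace ignores case by default; .Replace() is case-sensitive
$ci = 'car2' -replace 'Car2', 'X'
$cs = 'car2'.Replace('Car2', 'X')

$regex; $literal; $ci; $cs
```

If the regex operator is still preferred, [regex]::Escape() can be applied to the search text to make it literal.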