Recursive search for string within files on a drive - powershell

I am trying to search the files on a drive for a specific string with the following command:
gci -recurse -path "E:\" | select-string "searchContent" | select path
Doing so gave me an insufficient memory error. I have seen other posts recommending piping it into ForEach-Object, but I couldn't figure out how to get that to work in my scenario. Any assistance appreciated!

When you read each file as a whole (a single multiline string), your search can be much faster than testing it line by line.
Also, you could speed things up significantly if you can use a file name pattern as a filter for the Get-ChildItem cmdlet. If, for instance, you only want to search through .txt files, add -Filter '*.txt'.
In any case, append the -File switch so Get-ChildItem won't pass DirectoryInfo objects to the rest of the code.
Try:
# since we use regular expression operator `-match`, escape the word or phrase you need to find
$searchContent = [regex]::Escape('whateveryouarelookingfor')
$result = Get-ChildItem -Path 'E:\' -Recurse -File | ForEach-Object {
    # -LiteralPath: file names containing [ or ] are not treated as wildcard patterns
    if ((Get-Content -LiteralPath $_.FullName -Raw) -match $searchContent) { $_.FullName }
}
Using a foreach statement instead of ForEach-Object {..} is a bit faster, because it skips the overhead of sending each object through the pipeline:
# since we use regular expression operator `-match`, escape the word or phrase you need to find
$searchContent = [regex]::Escape('whateveryouarelookingfor')
$result = foreach ($file in (Get-ChildItem -Path 'E:\' -Recurse -File)) {
    # -LiteralPath: file names containing [ or ] are not treated as wildcard patterns
    if ((Get-Content -LiteralPath $file.FullName -Raw) -match $searchContent) { $file.FullName }
}
Now you can display the full paths of the matching files on screen:
$result
and save them to a text file on disk:
$result | Set-Content -Path ('X:\FilesContaining_{0}.txt' -f $searchContent)
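When the search covers an entire drive such as E:\, some folders (for example System Volume Information) usually cannot be enumerated, and some files may be locked or unreadable, which produces a stream of errors. Below is a minimal sketch that wraps the "read the whole file, then -match" approach above in a function that skips those items. The function name Find-FileContaining and its parameters are hypothetical, not an existing cmdlet.

```powershell
function Find-FileContaining {
    # Hypothetical helper wrapping the "read whole file, then -match" approach
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Text,
        [string]$Filter = '*'
    )
    # escape the search text so the regex operator -match treats it literally
    $pattern = [regex]::Escape($Text)
    # -ErrorAction SilentlyContinue skips folders we are not allowed to enumerate
    foreach ($file in Get-ChildItem -Path $Path -Filter $Filter -Recurse -File -ErrorAction SilentlyContinue) {
        try {
            # -ErrorAction Stop turns read failures (locked or denied files) into catchable errors
            $content = Get-Content -LiteralPath $file.FullName -Raw -ErrorAction Stop
        }
        catch {
            Write-Verbose "Could not read '$($file.FullName)': $($_.Exception.Message)"
            continue
        }
        # empty files give $null, which never matches
        if ($content -match $pattern) { $file.FullName }
    }
}
# usage: $result = Find-FileContaining -Path 'E:\' -Text 'whateveryouarelookingfor' -Filter '*.txt'
```

Adding -Verbose to the call lists the files that could not be read.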

Just assign the results to a variable, then loop over them with foreach and test each item in turn. Note that this compares each file's name, not its content:
$files = gci -recurse -path "E:\"
foreach ($fileName in $files)
{
    if ($fileName.Name -like "*searchContent*")
    {
        Write-Host $fileName.Name
    }
}
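The loop above compares file names rather than file contents. If a name match is what you need, a minimal sketch is to let Get-ChildItem do the matching through -Filter, which the file system provider applies while enumerating, instead of testing every name in a PowerShell loop. Get-FileByName is a hypothetical helper name used for illustration.

```powershell
function Get-FileByName {
    # Hypothetical helper: match on the file NAME only (not the content)
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$NamePart
    )
    # -Filter supports the simple wildcards * and ?
    Get-ChildItem -Path $Path -Recurse -File -Filter "*$NamePart*" -ErrorAction SilentlyContinue |
        Select-Object -ExpandProperty Name
}
# usage: Get-FileByName -Path 'E:\' -NamePart 'searchContent'
```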

I expect this to consume less memory, though I can't tell for sure, so let me know. The concept is the same, but it uses [System.IO.StreamReader] to read each file line by line instead of loading it whole.
Note: this keeps looking through all the files it can find; if you need the loop to stop at the first match, an extra condition has to be added.
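foreach ($file in Get-ChildItem -Recurse -Path "E:\" -File)
{
    $reader = $null
    try
    {
        $reader = [System.IO.StreamReader]::new($file.FullName)
        while (-not $reader.EndOfStream)
        {
            if ($reader.ReadLine() -match 'searchContent')
            {
                $file.FullName
                break
            }
        }
    }
    catch { Write-Warning "Skipping '$($file.FullName)': $($_.Exception.Message)" }
    finally
    {
        # release the file handle even if opening or reading the file fails
        if ($reader) { $reader.Dispose() }
    }
}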
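The note above says an extra condition is needed to stop at the first match. One way, sketched below under the assumption that the search is wrapped in a function (Find-FirstFileContaining is a hypothetical name), is to return from the function as soon as a matching line is read; the finally block still disposes the reader, so no file handle is left open.

```powershell
function Find-FirstFileContaining {
    # Hypothetical helper: returns the first file with a matching line, then stops
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Pattern   # used as a regex by -match
    )
    foreach ($file in Get-ChildItem -Path $Path -Recurse -File -ErrorAction SilentlyContinue) {
        try {
            $reader = [System.IO.StreamReader]::new($file.FullName)
        }
        catch {
            continue   # unreadable file (locked or access denied): skip it
        }
        try {
            while (-not $reader.EndOfStream) {
                if ($reader.ReadLine() -match $Pattern) {
                    # 'return' leaves the function; finally still runs and disposes the reader
                    return $file.FullName
                }
            }
        }
        finally {
            $reader.Dispose()
        }
    }
}
# usage: Find-FirstFileContaining -Path 'E:\' -Pattern 'searchContent'
```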

Related

replacing files names with split output

I am trying to use PowerShell to read file names from a directory,
then, within a for loop,
split each name on a delimiter and store the desired part in a new variable. Now I want to replace the original file names in the directory with this new value. So far I have gathered the following, with the expected output shown:
$files = Get-ChildItem -Path C:\Test
write-output $files
Directory: C:\Test
1_N04532L_LEFT.JPG
2_N04532R_RIGHT.JPG
code continues
foreach ($file in $files)
{
    $nameArray = $file -split "_"
    $newName = $nameArray[1]
    write-output $newName
}
N04532L
N04532R
Any ideas on how to accomplish this? I am not a programmer, and although there is a lot of information on this, I can't get it to work.
As both commenters already explained, there is the Rename-Item cmdlet for renaming files.
Since its -NewName parameter accepts a script block, you can use one to build each new file name:
# adding switch -File makes sure you do not also try to rename subfolders
$files = Get-ChildItem -Path 'C:\Test' -File
foreach ($file in $files) {
    $file | Rename-Item -NewName { '{0}{1}' -f ($file.BaseName -split '_')[1], $file.Extension }
}
You can shorten this by piping the results from Get-ChildItem one by one to the Rename-Item cmdlet.
Because we're piping the FileInfo objects here, we can use the $_ automatic variable:
# enclose the Get-ChildItem command in parentheses so it enumerates all files to completion
# before passing them on to the Rename-Item cmdlet.
# if you don't, files you have already renamed could be picked up and processed again.
(Get-ChildItem -Path 'C:\Test' -File) |
    Rename-Item -NewName { '{0}{1}' -f ($_.BaseName -split '_')[1], $_.Extension }
Note: when renaming files you can always run into naming collisions (for example, two files whose names would both become N04532L.JPG); in that case Rename-Item reports an error for the file it cannot rename.
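Building on that note, here is a minimal sketch that checks whether the target name already exists and skips the file with a warning instead of letting Rename-Item fail; it also skips names that have no second _-separated part. The function name Rename-BySecondToken is hypothetical, and the naming rule (keep the second _-separated part plus the extension) is the one from the question.

```powershell
function Rename-BySecondToken {
    # Hypothetical helper: renames e.g. 1_N04532L_LEFT.JPG to N04532L.JPG
    [CmdletBinding()]
    param([Parameter(Mandatory)][string]$Path)

    # @() collects all files first, so renamed files are not picked up again
    foreach ($file in @(Get-ChildItem -LiteralPath $Path -File)) {
        $parts = $file.BaseName -split '_'
        if ($parts.Count -lt 2 -or -not $parts[1]) {
            Write-Warning "Skipping '$($file.Name)': no second '_'-separated part"
            continue
        }
        $newName = '{0}{1}' -f $parts[1], $file.Extension
        $target = Join-Path -Path $file.DirectoryName -ChildPath $newName
        # avoid the naming collision instead of letting Rename-Item report an error
        if (Test-Path -LiteralPath $target) {
            Write-Warning "Skipping '$($file.Name)': '$newName' already exists"
            continue
        }
        Rename-Item -LiteralPath $file.FullName -NewName $newName
    }
}
# usage: Rename-BySecondToken -Path 'C:\Test'
```

Which of two colliding files gets renamed depends on the enumeration order; the other one is left untouched, so no file is ever overwritten.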

Powershell script to locate only files starting with specified letters and ending with .csv

cd 'A:\P\E\D'
$files = Get-ChildItem . *.CSV -rec
ForEach ($file in $files) {
    (Get-Content $file -Raw) | ForEach-Object {
        *some simple code*
    } | Set-Content $file
}
How can I modify this PowerShell script so it locates only files whose names start with the letters A/a to O/o and end with .csv in the specified directory?
I thought the solution below would work, but the test file M_K_O_X.CSV stored in that directory was not found and modified, whereas the script above does find and modify it. Is my regex expression wrong, or is the problem somewhere else? I also tried this regex: "[A-O]..CSV"
cd 'A:\P\E\D'
$files = Get-ChildItem . -rec | Where-Object { $_.Name -like "[a-oA-O]*.*.CSV" }
ForEach ($file in $files) {
    (Get-Content $file -Raw) | ForEach-Object {
        *some simple code*
    } | Set-Content $file
}
Looking at your wildcard pattern, it seems you have an extra *. that shouldn't be there:
'M_K_O_X.CSV' -like '[a-oA-O]*.*.CSV' # False
'M_K_O_X.CSV' -like '[a-oA-O]*.CSV' # True
In this case you could simply use the -Include parameter, which supports character ranges. Also, PowerShell is case-insensitive by default, so [a-oA-O]*.CSV can be reduced to [a-o]*.csv:
Get-ChildItem 'A:\P\E\D' -Recurse -Include '[a-o]*.csv' | ForEach-Object {
    ($_ | Get-Content -Raw) | ForEach-Object {
        # *some simple code*
    } | Set-Content -LiteralPath $_.FullName
}
As commented, I would use the standard wildcard -Filter to select all files with a .csv extension,
then pipe them to a Where-Object clause in which you can use the regex operator -match:
$files = Get-ChildItem -Path 'A:\P\E\D' -Filter '*.csv' -File -Recurse |
    Where-Object { $_.Name -match '^[a-o]' }
foreach ($file in $files) {
    # switch `-Raw` makes Get-Content return a single multiline string, so no need for a loop
    $content = Get-Content -Path $file.FullName -Raw
    # *some simple code manipulating $content*
    $content | Set-Content -Path $file.FullName
}
However, if these are valid CSV files, I would not recommend manipulating them as plain text. Instead, use Import-Csv -Path $file.FullName and work on the properties of each of the objects it returns.
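A minimal sketch of the Import-Csv route, assuming a hypothetical column named Code whose values should be upper-cased; substitute your own column name and logic. Convert-CsvColumnToUpper is a hypothetical helper name. Export-Csv -NoTypeInformation keeps Windows PowerShell 5.1 from writing a #TYPE line; note that Export-Csv in PowerShell 7 quotes every field by default.

```powershell
function Convert-CsvColumnToUpper {
    # Hypothetical example: upper-case every value of one (hypothetical) column, in place
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$LiteralPath,
        [Parameter(Mandatory)][string]$Column
    )
    # Import-Csv returns one object per data row; property names come from the header line.
    # @() forces an array and reads the whole file before we write back to it.
    $rows = @(Import-Csv -LiteralPath $LiteralPath)
    if ($rows.Count -eq 0) { return }   # header-only or empty file: nothing to change
    foreach ($row in $rows) {
        $row.$Column = ([string]$row.$Column).ToUpper()
    }
    # -NoTypeInformation suppresses the '#TYPE ...' first line in Windows PowerShell 5.1
    $rows | Export-Csv -LiteralPath $LiteralPath -NoTypeInformation
}
# usage inside the loop above: Convert-CsvColumnToUpper -LiteralPath $file.FullName -Column 'Code'
```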

Trying to truncate csv/txt files leaving header row only

I have several csv and txt files in a directory with data in them. I need to truncate the data from all of these files but leave the header in each.
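You can use the following script. Wrapping Get-Content in @() makes sure a file with a single line is still treated as an array, so [0] returns the whole first line instead of only its first character:
$files = dir .\* -include ('*.csv', '*.txt')
foreach ($file in $files) {
    $firstline = @(Get-Content -LiteralPath $file.FullName)[0]
    Set-Content -LiteralPath $file.FullName -Value $firstline
}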
You don't need to read the whole file just to capture its first line:
Get-ChildItem -Path 'D:\Test' -File | Where-Object { $_.Extension -match '^\.(csv|txt)$' } | ForEach-Object {
    # only read the first line using -TotalCount
    ($_ | Get-Content -TotalCount 1) | Set-Content -LiteralPath $_.FullName
}
The above could produce empty or whitespace-only files if the top line is empty or contains only whitespace.
In that case, the best option to quickly truncate these files to the top NON-EMPTY line may be:
Get-ChildItem -Path 'D:\Test' -File | Where-Object { $_.Extension -match '^\.(csv|txt)$' } | ForEach-Object {
    $newcontent = switch -Regex -File $_.FullName {
        '\S' { $_ ; break } # output the first line that is not empty or whitespace-only and exit the switch
    }
    # write back to the file
    $newcontent | Set-Content -LiteralPath $_.FullName
}
P.S. Using the -Filter parameter of Get-ChildItem would be faster, but unfortunately -Filter accepts only ONE file pattern, like '*.csv'.
If you need several patterns, you can use the -Include parameter, which accepts an array of file patterns. However, for that to work you also need to add the -Recurse switch (which searches subfolders as well) OR have the path end in \*.
-Include is not as fast as -Filter; it is about as fast as using a Where-Object clause as in the examples above.
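As a sketch of the -Include route for a single folder (no recursion), the path below ends in * so that -Include takes effect, and both *.csv and *.txt are matched in one call. Clear-DataKeepHeader is a hypothetical helper name; like the -TotalCount example above, it keeps only the very first line, even if that line is empty.

```powershell
function Clear-DataKeepHeader {
    # Hypothetical helper: keep only the first line of every .csv and .txt file in one folder
    [CmdletBinding()]
    param([Parameter(Mandatory)][string]$Folder)

    # the trailing '*' in the path is what makes -Include work without -Recurse
    $pattern = Join-Path -Path $Folder -ChildPath '*'
    $files = Get-ChildItem -Path $pattern -Include '*.csv', '*.txt' -File
    foreach ($file in $files) {
        # parentheses: finish reading before Set-Content overwrites the same file
        (Get-Content -LiteralPath $file.FullName -TotalCount 1) | Set-Content -LiteralPath $file.FullName
    }
}
# usage: Clear-DataKeepHeader -Folder 'D:\Test'
```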

Filter multiple CSV for text and create new files

I have about 2,500 CSV files, each around 20 MB in size. I am trying to filter out certain rows from each file and save them to a new file.
So, if I have:
File 1:
Row1
Row2
Row3
File 2:
Row2
Row3
and so on.
If I filter all files using "Row2" as the filter text, the new folder should contain all the files, each with only the rows that match the filter text.
Looking through some forums, I came up with the following, which might help me filter the rows, but I'm not sure how to do it recursively, and I also don't know whether this method is fast enough. Any help is appreciated.
Get-Content "C:\Path to file" | Where{$_ -match "Rowfiltertext*"} | Out-File "Path to Out file"
I'm using Windows, so I guess a PowerShell solution would be best here.
The text to be filtered will always be in the first column.
Thanks
Siddhant
Here are two fast ways of searching for a string inside (text) files:
1) using switch
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    # iterate through the lines in the file and output the ones that match the search pattern
    switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
} | Set-Content -Path $outputPath # add -PassThru to also show on screen
2) using Select-String
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv.txt'
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
Get-ChildItem -Path $sourcePath -Filter '*.csv' -File | ForEach-Object {
    ($_ | Select-String -Pattern $searchPattern).Line
} | Set-Content -Path $outputPath # add -PassThru to also show on screen
In case you want to output a new CSV file for every original file, use:
3) using switch
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv'
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    # iterate through the lines in the file and output the ones that match the search pattern
    $result = switch -Regex -File $_.FullName {
        $searchPattern { $_ }
    }
    $result | Set-Content -Path $outFile # add -PassThru to also show on screen
}
4) using Select-String
$searchPattern = [regex]::Escape('Rowfiltertext') # for safety escape regex special characters
$sourcePath = 'X:\Path\To\The\Csv\Files'
$outputPath = 'X:\FilteredCsv'
# make sure the output folder exists before writing to it
if (!(Test-Path -Path $outputPath -PathType Container)) {
    $null = New-Item -Path $outputPath -ItemType Directory
}
# if you also need to search inside subfolders, append -Recurse to the Get-ChildItem cmdlet
(Get-ChildItem -Path $sourcePath -Filter '*.csv' -File) | ForEach-Object {
    # create a full target filename for the filtered output csv
    $outFile = Join-Path -Path $outputPath -ChildPath ('New_{0}' -f $_.Name)
    ($_ | Select-String -Pattern $searchPattern).Line | Set-Content -Path $outFile # add -PassThru to also show on screen
}
Hope that helps
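The question states that the filter text is always in the first column, while the examples above match it anywhere in a row. Below is a minimal sketch that anchors the regex to the start of the line so only the first column is considered, assuming comma-delimited files whose first field is not quoted. Select-CsvRowByFirstColumn is a hypothetical helper name, and the filter text is treated as a prefix of the first field (like the question's "Rowfiltertext*").

```powershell
function Select-CsvRowByFirstColumn {
    # Hypothetical helper; assumes comma-delimited files with an unquoted first field
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][string]$OutputPath,
        [Parameter(Mandatory)][string]$FilterText
    )
    # '^' anchors the match at the start of the line, i.e. at the start of the first column,
    # so the escaped text must be a prefix of the first field (Row2 matches Row2 and Row2x)
    $pattern = '^' + [regex]::Escape($FilterText)

    if (-not (Test-Path -LiteralPath $OutputPath -PathType Container)) {
        $null = New-Item -Path $OutputPath -ItemType Directory
    }
    foreach ($file in @(Get-ChildItem -Path $SourcePath -Filter '*.csv' -File)) {
        $outFile = Join-Path -Path $OutputPath -ChildPath ('New_{0}' -f $file.Name)
        # switch -Regex -File streams the file line by line; $_ is the current line
        $result = switch -Regex -File $file.FullName {
            $pattern { $_ }
        }
        $result | Set-Content -LiteralPath $outFile
    }
}
# usage: Select-CsvRowByFirstColumn -SourcePath 'X:\Path\To\The\Csv\Files' -OutputPath 'X:\FilteredCsv' -FilterText 'Rowfiltertext'
```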
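Regarding a "fast enough method":
Get-Content is slow on large files, partly because it emits every line as a separate string object with extra properties (such as PSPath) attached.
You could use System.IO.StreamReader instead: read the complete file content into a single string, then split that string into rows, e.g.: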
# $Csv is the FileInfo object of the current file (e.g. the loop variable of a foreach over Get-ChildItem)
[System.IO.FileStream]$objFileStream = New-Object System.IO.FileStream($Csv.FullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
[System.IO.StreamReader]$objStreamReader = New-Object System.IO.StreamReader($objFileStream, [System.Text.Encoding]::UTF8)
try {
    $strFileContent = $objStreamReader.ReadToEnd()
}
finally {
    # disposing the reader also closes the underlying FileStream
    $objStreamReader.Dispose()
}
# split on CRLF or LF so files with Unix-style line endings are handled as well
[string[]]$arrFileContent = $strFileContent -split '\r?\n'
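As an alternative to ReadToEnd plus -split, .NET's [System.IO.File]::ReadLines enumerates a file one line at a time (recognising CRLF, LF and CR line endings), so the whole 20 MB string never has to be built and split. The sketch below uses a hypothetical helper name, Select-MatchingLine, and expects full paths, because .NET methods resolve relative paths against the process working directory rather than PowerShell's current location. Unlike the FileStream code above, ReadLines does not request FileShare.ReadWrite, so a file that another process is writing to may fail to open.

```powershell
function Select-MatchingLine {
    # Hypothetical helper: copy the lines of one file that match a regex into another file
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$InputFile,   # full path
        [Parameter(Mandatory)][string]$OutputFile,  # full path
        [Parameter(Mandatory)][string]$Pattern      # regular expression
    )
    # IgnoreCase mirrors the case-insensitive behaviour of PowerShell's -match operator
    $regex = [regex]::new($Pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
    $kept = [System.Collections.Generic.List[string]]::new()
    # ReadLines yields one line at a time instead of loading the whole file into a string
    foreach ($line in [System.IO.File]::ReadLines($InputFile)) {
        if ($regex.IsMatch($line)) { $kept.Add($line) }
    }
    [System.IO.File]::WriteAllLines($OutputFile, $kept.ToArray())
}
# usage: Select-MatchingLine -InputFile $Csv.FullName -OutputFile 'X:\FilteredCsv\New.csv' -Pattern '^Rowfiltertext'
```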

How do I select files with specific words in their content from a folder?

I am trying to filter files having any of the words January or February in their content.
$files = Get-ChildItem "C:\Users\Desktop\NewFolder\" -Recurse -Filter "*Support*"
$count = 0
$p = 'january', 'February'
foreach ($file in $files){
    if((Get-Content $file.FullName) | Select-String -Pattern '^%january%'){
        Write-Host "File found"
        #write-host $file.FullName
        $count++
    }
    else {
        Write-Host "File NOt found"
    }
}
Write-Host $count
Currently I am just getting "File NOt found" even though the file exists
Your issue might simply be your regex string, although the script could be improved as a whole. The percent sign is not a wildcard character in regex. Also, are you expecting the month to appear at the start of a line? That is what the anchor ^ means.
So most likely your files do not contain the literal string %january% at the start of any line, and as I mentioned, I don't think that is what you wanted.
So let's find all the files you want and filter them based on the presence of either of the words in $p (as in your example above):
$p ='january','February'
$regexPattern = ($p | ForEach-Object{[regex]::Escape($_)}) -join "|"
$files = Get-ChildItem -Path "c:\temp\" -filter "*.txt"
$files | Where-Object{Select-String -Path $_.Fullname -Pattern $regexPattern}
That will output every file object that contains the word January or February anywhere in its content.
$regexPattern ends up being a pipe-delimited string of the words in $p (january|February), i.e. a regex alternation. [regex]::Escape() is a good way to escape special regex characters in your strings, especially if you are just using examples.
You would of course need to change the -Path and -Filter accordingly as well as including -Recurse if the situation calls for it.
I think this should get you what you want:
Select-String -Path "C:\Users\Desktop\NewFolder\*Support*" -Pattern January,february
or, if you need to recurse into subfolders:
Get-ChildItem -Path "C:\Users\Desktop\NewFolder" -Include *Support* -Recurse | Select-String -Pattern January,february
(Select-String also has a -CaseSensitive switch if you should need that)
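Select-String emits one result per matching line, while the question asks which files match (and counts them). Below is a minimal sketch using the -List switch, which stops after the first match in each file, so each file is reported once and the count is simply the number of paths. -SimpleMatch treats January and February as literal text rather than regex. Get-FileWithAnyWord is a hypothetical helper name.

```powershell
function Get-FileWithAnyWord {
    # Hypothetical helper: list the files (not the lines) that contain any of the given words
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string[]]$Word,
        [string]$Include = '*'
    )
    # several -Pattern values are OR-ed; -SimpleMatch treats them as literal text;
    # -List stops after the first match in each file, so every file is reported once
    Get-ChildItem -Path $Path -Include $Include -Recurse -File |
        Select-String -Pattern $Word -SimpleMatch -List |
        Select-Object -ExpandProperty Path
}
# usage with the folder and words from the question:
# $found = @(Get-FileWithAnyWord -Path 'C:\Users\Desktop\NewFolder' -Word 'January', 'February' -Include '*Support*')
# Write-Host "$($found.Count) file(s) found"
```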