I am using the below script to search for credit card numbers inside a folder that contains many subfolders:
Get-ChildItem -rec | ?{ findstr.exe /mprc:. $_.FullName } |
select-string "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}"
However, this will return all instances found in every folder/subfolder.
How can I amend the script to skip the rest of the current folder on the first instance found? Meaning that if it finds a credit card number, it will stop processing the current folder and move on to the next folder.
Appreciate your answers and help.
Thanks in advance,
You could use this recursive function:
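function cards ($dir) {
# Recurse into each subfolder first
Get-ChildItem -Directory -LiteralPath $dir | ForEach-Object { cards $_.FullName }
# Use a foreach statement (not ForEach-Object) so that 'return' really leaves the function
foreach ($file in Get-ChildItem -File -LiteralPath $dir) {
# Name the parameters: Select-String's first positional parameter is -Pattern, not -Path
if (Select-String -LiteralPath $file.FullName -Pattern "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}" -List) {
write-host "card found in $dir"
return
}
}
}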
cards "C:\path\to\base\dir"
It'll keep going through the subdirectories of the top-level directory you specify. Whenever it gets to a directory with no subdirectories, or it's been through all the subdirectories of the current directory, it'll start looking through the files for the matching regex, but will bail out of the function (for that directory) when the first match is found.
So really what you want is the first file in every folder that has a credit card number in the contents.
Break it into two parts. Get a list of all your folders, recursively. Then, for each folder, get the list of files, non-recursively. Search each file until you find one that matches.
I don't see any easy way to do this with pipes alone. That means more traditional programming techniques.
This requires PowerShell 3.0. I've eliminated ?{ findstr.exe /mprc:. $_.FullName } because all I can see that it does is eliminate folders (and zero length files) and this already handles that.
Get-ChildItem -Directory -Recurse | ForEach-Object {
$Found = $false
# @() guarantees an array, so empty and single-file folders index safely
$Files = @($_ | Get-ChildItem -File | Sort-Object -Property Name)
for ($i = 0; ($i -lt $Files.Count) -and (-not $Found); $i++) {
$SearchResult = $Files[$i] | Select-String "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}";
if ($SearchResult) {
$Found = $true;
Write-Output $SearchResult;
}
}
}
Didn't have the time to test it fully, but I thought about something like this:
$Location = 'H:\'
$Dirs = Get-ChildItem $Location -Directory -Recurse
$Regex1 = "[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}"
$Regex2 = "[456][0-9]{15}"
Foreach ($d in $Dirs) {
$Files = Get-ChildItem $d.FullName -File
foreach ($f in $Files) {
if (($f.Name -match $Regex1) -or ($f.Name -match $Regex2)) {
Write-Host 'Match found'
break   # 'Return' would end the whole script; break moves on to the next folder
}
}
}
Here is another one, why not, the more the merrier.
I'm assuming that your Regex is correct.
Using break in the second loop will skip looking for a credit card in the remaining files if one is found and continue to the next folder.
$path = '<your path here>'
$folders = Get-ChildItem $path -Directory -rec
foreach ($folder in $folders)
{
$items = Get-ChildItem $folder.fullname -File
foreach ($i in $items)
{
if (($found = $i.FullName| select-string "[456][0-9]{15}","[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}") -ne $null)
{
break
}
}
}
I think the intention was to look inside each file for the PII data right?
If so, you need to open the file and search each line. The code you posted only runs the regex on the name of the file.
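A minimal sketch of that correction, under stated assumptions: the helper name `Find-FirstCardFile` is hypothetical, the two patterns are copied verbatim from the question, and unlike the answers above it also checks files sitting directly in the root folder. `Select-String -LiteralPath` reads the file contents, and `break` skips the remaining files in the folder once one matches.

```powershell
# Hypothetical helper: for each folder under $Path (including $Path itself),
# emit the first match found in the *contents* of its files, then skip the
# rest of that folder. Patterns are copied verbatim from the question.
function Find-FirstCardFile {
    param([string]$Path)

    $patterns = @('[456][0-9]{15}', '[456][0-9]{3}[-| ][0-9]{4} [-| ][0-9]{4}[-| ][0-9]{4}')

    $folders = @(Get-Item -LiteralPath $Path) + @(Get-ChildItem -LiteralPath $Path -Directory -Recurse)

    foreach ($folder in $folders) {
        foreach ($file in Get-ChildItem -LiteralPath $folder.FullName -File) {
            # Select-String opens the file and searches each line;
            # -List stops reading this file after its first matching line.
            $hit = Select-String -LiteralPath $file.FullName -Pattern $patterns -List
            if ($hit) {
                $hit    # MatchInfo object: Path, LineNumber, Line
                break   # leave the inner loop, i.e. go on to the next folder
            }
        }
    }
}
```

Usage would be something like `Find-FirstCardFile -Path 'C:\path\to\base\dir' | Select-Object Path, LineNumber`.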
Related
In a directory, there are files with the following filenames:
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
I want to iterate through the directory, and IF there is a filename that contains the string '_pn.mp3', I want to test if there is a similarly named file without the '_pn.mp3' in the same directory. If that file exists, I want to remove it.
In the above example, I'd want to remove:
ExampleFile.mp3
ExampleFile2.mp3
and I'd want to keep ExampleFile3.mp3
Here's what I have so far:
$pattern = "_pn.mp3"
$files = Get-ChildItem -Path $path | Where-Object {! $_.PSIsContainer}
Foreach ($file in $files) {
If($file.Name -match $pattern){
# filename with _pn.mp3 exists
Write-Host $file.Name
# search in the current directory for the same filename without _pn
<# If(Test-Path $currentdir $filename without _pn.mp3) {
Remove-Item -Force}
#>
}
}
You could use Group-Object to group all files by their BaseName (with the pattern removed), and then loop over the groups that contain more than one file. The result of grouping the files and filtering by count would look like this:
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1
Count Name Group
----- ---- -----
2 ExampleFile {ExampleFile.mp3, ExampleFile_pn.mp3}
2 ExampleFile2 {ExampleFile2.mp3, ExampleFile2_pn.mp3}
Then if we loop over these groups we can search for the files that do not end with the $pattern:
@'
ExampleFile.mp3
ExampleFile_pn.mp3
ExampleFile2.mp3
ExampleFile2_pn.mp3
ExampleFile3.mp3
'@ -split '\r?\n' -as [System.IO.FileInfo[]] | Set-Variable files
$pattern = "_pn"
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
$_.Group.Where({-not $_.BaseName.Endswith($pattern)})
}
This is how your code would look. Remove the -WhatIf switch once you are satisfied that the code does what you want.
$pattern = "_pn"   # BaseName has no extension, so the pattern must not include ".mp3"
$files = Get-ChildItem -Path $path -Filter *.mp3 -File
$files | Group-Object { $_.BaseName.Replace($pattern,'') } |
Where-Object Count -GT 1 | ForEach-Object {
$toRemove = $_.Group.Where({-not $_.BaseName.Endswith($pattern)})
Remove-Item $toRemove -WhatIf
}
I think you can get by here by adding file names into a hash map as you go. If you encounter a file with the ending you are interested in, check if a similar file name was added. If so, remove both the file and the similar match.
$ending = "_pn.mp3"
$files = Get-ChildItem -Path $path -File | Where-Object { ! $_.PSIsContainer }
$hash = @{}
Foreach ($file in $files) {
# Check if file has an ending we are interested in
If ($file.Name.EndsWith($ending)) {
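# Strip the ending by length; in Windows PowerShell, String.Split("_pn.mp3") splits on each character
$similar = $file.Name.Substring(0, $file.Name.Length - $ending.Length) + ".mp3"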
# Check if we have seen the similar file in the hashmap
If ($hash.Contains($similar)) {
Write-Host $file.Name
Write-Host $similar
Remove-Item -Force -LiteralPath $file.FullName
Remove-Item -Force -LiteralPath $hash[$similar].FullName
# Remove similar from hashmap as it is removed and no longer of interest
$hash.Remove($similar)
}
}
else {
# Add entry for file name and reference to the file
$hash.Add($file.Name, $file)
}
}
Just get a list of the files with the _pn then process against the rest.
$pattern = "*_pn.mp3"
$files = Get-ChildItem -Path "$path" -File -filter "$pattern"
Foreach ($file in $files) {
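$TestFN = $file.name -replace("_pn","")
$TestPath = Join-Path -Path $Path -ChildPath $TestFN
If (Test-Path -LiteralPath $TestPath) {
# Remove the similarly named file *without* _pn; the _pn file is kept
Remove-Item -LiteralPath $TestPath -Force
}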
} #End Foreach
I have a line of code that prints out all the files and folders whose names are similar to $filename, e.g. the keyword "abc" will also include a file/folder "abcdef"
Get-ChildItem -Path 'C:\' -Filter $filename -Recurse | %{$_.FullName}
I'd like to make it so that the search does not go into the subdirectories of folders that already match
e.g. a folder with name "abc" and subfolder "abcdef" only prints out "C:\abc"
Currently the line of code would print out "C:\abc" and "C:\abc\abcdef"
What would be the best way to do this?
This will do it.
Get-ChildItem is performed at the top level to populate the processing queue ($ProcessingQueue)
Then, a loop will run until the processing queue does not have any element left.
Each element in the queue will undergo the same process.
Each element either matches the filter, in which case it is added to the $Results variable, or it does not, in which case Get-ChildItem is called on that directory and its results are appended to the queue.
This ensures we do not process a directory tree any further once we have a match, and that recursion is only applied if the directory did not match the filter in the first place.
--
Function Get-TopChildItem($Path, $Filter) {
$Results = [System.Collections.Generic.List[String]]::New()
$ProcessingQueue = [System.Collections.Queue]::new()
ForEach ($item in (Get-ChildItem -Directory $Path)) {
$ProcessingQueue.Enqueue($item.FullName)
}
While ($ProcessingQueue.Count -gt 0) {
$Item = $ProcessingQueue.Dequeue()
if ((Split-Path -Path $Item -Leaf) -match $Filter) {   # test the folder name, not the whole path
$Results.Add($Item)
}
else {
ForEach ($el in (Get-ChildItem -Path $Item -Directory)) {
$ProcessingQueue.Enqueue($el.FullName)
}
}
}
return $Results
}
#Example
Get-TopChildItem -Path "C:\_\111" -Filter 'obj'
I have a folder with 500,000+ files. I'm trying to iterate through this folder and run some logic to determine whether we can delete unneeded files. The problem is that this process needs to run semi-regularly, and the new files that need to be deleted seem to be at the end of the list.
I put together the following list of code to sort through it all:
gci $RPT | %{
$flag = 0;
$number = ($_.Name | select-string -pattern "\d{12}" -Allmatches).Matches.Value   # no [int] cast: 12-digit IDs overflow Int32
if ($submidlist -match "^$number$"){
if ($_ -notmatch "acct\.csv|jpd\.csv|jss\.pdf|jman\.pdf|3600\.pdf|cont\.pdf|msl\.txt|pres\.pdf|tray\.pdf|qual\.pdf|zipl\.pdf"){
echo "DELETE SUBMID $_"
remove-item $RPT\$_
$count++
$totalcount++
$flag = 1;
}
}
if ($jobidlist -match "^$number$"){
if ($_ -match "acct\.csv|jpd\.csv|jss\.pdf|jman\.pdf|3600\.pdf|cont\.pdf|msl\.txt|pres\.pdf|tray\.pdf|qual\.pdf|zipl\.pdf"){
echo "DELETE JOBID $_"
remove-item $RPT\$_
$count++
$totalcount++
$flag = 1;
}
}
}
Currently, running the above script takes over 24 hours and it still doesn't make it to the end of the list. Is there a way to optimize this or reverse the order that get-childitem iterates through this folder?
function Delete-Items($List, [string]$ListName){
$DoNotDelete = @("acct.csv","jpd.csv","jss.pdf","jman.pdf","3600.pdf","cont.pdf","msl.txt","pres.pdf","tray.pdf","qual.pdf","zipl.pdf")
$List = $List | %{
"*$_*"
}
Get-ChildItem C:\TEST\56381643\ -Recurse -Include $List -Directory | %{
Get-ChildItem $_.FullName -Exclude $DoNotDelete -Recurse | %{
echo "DELETE $ListName $($_.name | select-string -pattern "\d{12}")"
Remove-Item -Path $_.FullName -WhatIf
}
}
}
#Example Usage
$JobList = @(
'098765432109'   # quoted so the leading zero is kept
'123456789012'
)
$SubmitList = @(
'234567890123'
)
Delete-Items -List $JobList -ListName JOBID
Delete-Items -List $SubmitList -ListName SUBMID
Let's go over a basic rundown of what's happening in the function.
We have an array of files not to delete.
We turn the $list numbers into wildcards by adding a * before and after each item in the array. We then only search for those directories that contain those numbers.
We then use another Get-ChildItem to get the files in each directory, excluding the ones listed in $DoNotDelete.
If you want to actually delete the files, remove the -WhatIf switch from Remove-Item.
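To see what the wildcard step produces, here is a small sketch with made-up IDs and names (hypothetical, not from the question). It applies `-like`, which uses the same wildcard syntax that `-Include` applies to each item name.

```powershell
# Turn each ID into a "contains" wildcard, as the function does with $List.
$ids = '098765432109', '234567890123'
$wildcards = $ids | ForEach-Object { "*$_*" }

# Made-up names used only to illustrate the matching.
$names = 'rpt_098765432109', 'rpt_111111111111', 'x234567890123y'

# A name is selected when it matches at least one of the wildcards.
$selected = $names | Where-Object {
    $name = $_
    @($wildcards | Where-Object { $name -like $_ }).Count -gt 0
}
$selected
```

Here `$selected` holds `rpt_098765432109` and `x234567890123y`; the name without either ID is skipped.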
I am new to Windows PowerShell. Could you please give me some code or information on how to write a program that does the following for all *.txt files in a folder:
1. Count the characters in each line of the file
2. If a line exceeds 1024 characters, create a subfolder within that folder and move the file there (that's how I will know which files have more than 1024 characters per line)
I've tried this in VB and VBA (which are more familiar to me), but I want to learn some new cool stuff!
Many thanks!
Edit: I found some part of a code that is beginning
$fileDirectory = "E:\files";
foreach($file in Get-ChildItem $fileDirectory)
{
# Processing code goes here
}
OR
$fileDirectory = "E:\files";
foreach($line in Get-ChildItem $fileDirectory)
{
if($line.length -gt 1023){
# mkdir and mv to subfolder!
}
}
If you are willing to learn, why not start here.
You can use the Get-Content cmdlet in PowerShell to get information about your files. See http://blogs.technet.com/b/heyscriptingguy/archive/2013/07/06/powertip-counting-characters-with-powershell.aspx and "Getting character count for each row in text doc".
With your second edit I did see some effort so I would like to help you.
$path = "D:\temp"
$lengthToNotExceed = 1024
$longFiles = Get-ChildItem -Path $path -Filter *.txt -File |
Where-Object {(Get-Content($_.Fullname) | Measure-Object -Maximum Length | Select-Object -ExpandProperty Maximum) -ge $lengthToNotExceed}
$longFiles | ForEach-Object{
$target = "$($_.Directory)\$lengthToNotExceed\"
If(!(Test-Path $target)){New-Item $target -ItemType Directory -Force | Out-Null}
Move-Item $_.FullName -Destination $target
}
You could make this a one-liner, but it would be unnecessarily complicated. Use Measure-Object on the array returned by Get-Content. That array is, more or less, a string array, and in PowerShell strings have a Length property, which we query.
That will return the maximum length line in the file. We use Where-Object to filter only those results with the length we desire.
Then for each file we attempt to move it to the sub directory that is in the same location as the file matched. If no sub folder exists we make it.
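A tiny sketch of the Measure-Object step in isolation, using an in-memory string array as a stand-in for what Get-Content returns (the sample lines are made up):

```powershell
# Stand-in for the string array Get-Content returns (one element per line).
$lines = 'short', ('x' * 1030), 'medium line'

# Measure-Object reads the Length property of each string and keeps the largest.
$longest = $lines | Measure-Object -Property Length -Maximum |
    Select-Object -ExpandProperty Maximum

$longest -ge 1024
```

`$longest` is 1030 here, so the comparison is `$true` and a file with these lines would be moved.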
Caveats:
You need at least PowerShell 3.0 for the -File switch. Without it, you can add another clause to the Where-Object filter: -not $_.PSIsContainer
This would perform poorly on files with a large number of lines.
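One way to soften that caveat, offered as an alternative sketch rather than the answer's method: stop reading a file as soon as one long line turns up. In PowerShell 3.0 and later, `Select-Object -First 1` stops the upstream `Get-Content` pipeline once it has its item. The function name `Test-HasLongLine` is hypothetical.

```powershell
# Returns $true as soon as any line reaches $Limit characters, without
# reading the rest of the file (Select-Object -First stops the pipeline).
function Test-HasLongLine {
    param([string]$LiteralPath, [int]$Limit = 1024)

    $hit = Get-Content -LiteralPath $LiteralPath |
        Where-Object { $_.Length -ge $Limit } |
        Select-Object -First 1
    return ($null -ne $hit)
}
```

It could replace the Measure-Object filter, e.g. `Get-ChildItem -Path $path -Filter *.txt -File | Where-Object { Test-HasLongLine -LiteralPath $_.FullName }`.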
Here's my comment above indented and line broken in .ps1 script form.
$long = @()
foreach ($file in gci *.txt) {
$f=0
gc $file | %{
if ($_.length -ge 1024) {
if (-not($f)) {
$f=1
$long += $file
}
}
}
}
$long | %{
$dest = @($_.DirectoryName, '\test') -join ''
[void](ni -type dir $dest -force)
mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
I was also mentioning labels and breaks there. Rather than $f=0 and if (-not($f)), you can break out of the inner loop with break like this:
$long = @()
foreach ($file in gci *.txt) {
:inner foreach ($line in gc $file) {
if ($line.length -ge 1024) {
$long += $file
break inner
}
}
}
$long | %{
$dest = @($_.DirectoryName, '\test') -join ''
[void](ni -type dir $dest -force)
mv $_ -dest (@($dest, '\', $_.Name) -join '') -force
}
Did you happen to notice the two different ways of calling foreach? There's the verbose foreach command, and then there's command | %{} where the iterative item is represented by $_.
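The two forms also differ in how `break` and `return` behave, which matters for the "stop at the first match" questions above. A small sketch: inside a `foreach` statement, `break` leaves the loop; inside a `ForEach-Object` script block, `return` only ends the script block for the current item, so the pipeline carries on with the next one.

```powershell
$numbers = 1..5

# foreach statement: break leaves the loop right after 3 is emitted.
$fromStatement = foreach ($n in $numbers) {
    $n
    if ($n -eq 3) { break }
}

# ForEach-Object: return only skips the rest of this script block,
# so processing continues with the next item (much like "continue").
$fromCmdlet = $numbers | ForEach-Object {
    if ($_ -eq 3) { return }
    $_
}

"statement: $fromStatement"
"cmdlet:    $fromCmdlet"
```

The statement version yields 1, 2, 3, while the pipeline version yields 1, 2, 4, 5.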
I am trying to count the files in all subfolders in a directory and display them in a list.
For instance the following dirtree:
TEST
/VOL01
file.txt
file.pic
/VOL02
/VOL0201
file.nu
/VOL020101
file.jpg
file.erp
file.gif
/VOL03
/VOL0301
file.org
Should give as output:
PS> DirX C:\TEST
Directory Count
----------------------------
VOL01 2
VOL02 0
VOL02/VOL0201 1
VOL02/VOL0201/VOL020101 3
VOL03 0
VOL03/VOL0301 1
I started with the following:
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse)
{
Write-Host $file
}
}
Now I have a question: why is my Function not recursing?
Something like this should work:
dir -recurse | ?{ $_.PSIsContainer } | %{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count }
dir -recurse lists all files and folders under the current directory and pipes (|) the result to
?{ $_.PSIsContainer } which filters directories only then pipes again the resulting list to
%{ Write-Host $_.FullName (dir $_.FullName | Measure-Object).Count } which is a foreach loop that, for each member of the list ($_) displays the full name and the result of the following expression
(dir $_.FullName | Measure-Object).Count which provides a list of files under the $_.FullName path and counts members through Measure-Object
?{ ... } is an alias for Where-Object
%{ ... } is an alias for ForEach-Object
Similar to David's solution, this will work in PowerShell v3.0 and does not use aliases, in case someone is not familiar with them
Get-ChildItem -Directory | ForEach-Object { Write-Host $_.FullName $(Get-ChildItem $_ | Measure-Object).Count}
Answer Supplement
Based on a comment about keeping your function and loop structure, I provide the following. Note: I do not condone this solution, as it is ugly and the built-in cmdlets handle this very well. However, I like to help, so here is an update of your script.
Function DirX($directory)
{
$output = @{}
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = 0
foreach($singleFile in Get-ChildItem $singleDirectory.FullName)
{
$count++
}
$output.Add($singleDirectory.FullName,$count)
}
$output | Out-String
}
For each $singleDirectory, count all items using $count (which gets reset before the next inner loop) and add each result to a hash table. At the end, output the hash table as a string. Your question, however, suggests you may want object output rather than plain text.
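For object output shaped like the table in the question, a sketch under these assumptions: the function name `Get-DirFileCount` is hypothetical, only files directly inside each folder are counted (which is how VOL02 gets 0 in the expected output), and the relative path uses the platform's separator (`\` on Windows) rather than the `/` shown in the question.

```powershell
# Hypothetical helper: one object per subfolder, with the path relative to
# $Root and the number of files directly inside it (subfolders not counted).
function Get-DirFileCount {
    param([string]$Root)

    $rootFull = (Get-Item -LiteralPath $Root).FullName.TrimEnd([char[]]'\/')
    Get-ChildItem -LiteralPath $rootFull -Directory -Recurse | ForEach-Object {
        [pscustomobject]@{
            Directory = $_.FullName.Substring($rootFull.Length + 1)
            Count     = @(Get-ChildItem -LiteralPath $_.FullName -File).Count
        }
    }
}
```

Something like `Get-DirFileCount C:\TEST | Sort-Object Directory | Format-Table` would then print one row per folder, and because the output is objects it can also be piped to `Export-Csv` or filtered with `Where-Object`.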
Well, the way you are doing it the entire Get-ChildItem cmdlet needs to complete before the foreach loop can begin iterating. Are you sure you're waiting long enough? If you run that against very large directories (like C:) it is going to take a pretty long time.
Edit: saw you asked earlier for a way to make your function do what you are asking, here you go.
Function DirX($directory)
{
foreach ($file in Get-ChildItem $directory -Recurse -Directory )
{
[pscustomobject] @{
'Directory' = $File.FullName
'Count' = (GCI $File.FullName -Recurse).Count
}
}
}
DirX D:\
The foreach loop only gets directories, since that is all we care about. Then, inside the loop, a custom object is created for each iteration with the full path of the folder and the count of the items inside it (counted recursively, because of -Recurse).
Also, please note that this will only work in PowerShell 3.0 or newer, since the -directory parameter did not exist in 2.0
Get-ChildItem $rootFolder `
-Recurse -Directory |
Select-Object `
FullName, `
@{Name="FileCount";Expression={(Get-ChildItem -LiteralPath $_.FullName -File |
Measure-Object).Count }}
My version - slightly cleaner and dumps content to a file
Original - Recursively count files in subfolders
Second Component - Count items in a folder with PowerShell
$FOLDER_ROOT = "F:\"
$OUTPUT_LOCATION = "F:\DLS\OUT.txt"
Function DirX($directory)
{
Remove-Item $OUTPUT_LOCATION -ErrorAction SilentlyContinue   # no error on the first run, when the file does not exist yet
foreach ($singleDirectory in (Get-ChildItem $directory -Recurse -Directory))
{
$count = Get-ChildItem $singleDirectory.FullName -File | Measure-Object | %{$_.Count}
$summary = $singleDirectory.FullName+" "+$count+" "+$singleDirectory.LastAccessTime
Add-Content $OUTPUT_LOCATION $summary
}
}
DirX($FOLDER_ROOT)
I modified David Brabant's solution just a bit so I could evaluate the result:
$FileCounter=gci "$BaseDir" -recurse | ?{ $_.PSIsContainer } | %{ (gci "$($_.FullName)" | Measure-Object).Count }
Write-Host "File Count=$FileCounter"
If($FileCounter -gt 0) {
# ... take some action ...
}