Using Variables with Directories & Filtering - PowerShell

I'm new to PowerShell and trying to do something pretty simple (I think). I'm trying to filter the contents of a folder down to only the files that start with e02. I created a variable for my folder path and a variable for the filtered-down version, but when I Get-ChildItem that filtered-down version, it brings back all results. I'm trying to run a loop where I'd rename these files.
File names will be something like e021234, e021235, e021236. I get new files every month with a weird extension that I convert to txt. They're always the same couple of names, and each file has its own name I'd rename it to; e021234 might become Program Alpha, for example.
Set-Location "C:\MYPATH\SAMPLE\"
$dir = "C:\MYPATH\SAMPLE\"
$dirFiltered = Get-ChildItem $dir | Where-Object { $_.BaseName -like "e02*" }
Get-ChildItem $dirFiltered |
ForEach-Object {
    $name = if ($_.BaseName -eq "e024") {"Four"}
            elseif ($_.BaseName -eq "e023") {"Three"}
    Get-ChildItem $dirFiltered | Rename-Item -NewName { $name + ".txt" }
}

There are a few things I can see that could use some adjustment.
My first thought on this is to reduce the number of places a script has to be edited when changes are needed. I suggest assigning the working directory variable first.
Next, reduce the number of times information is pulled. The Get-ChildItem cmdlet offers an integrated -Filter parameter which is usually more efficient than gathering all the results and filtering afterward. Since we can grab the filtered list right off the bat, the results can be piped directly to the ForEach block without going through the variable assignment and secondary filtering.
Then, initialize $name inside the loop so a leftover value can't cause issues. Without this, $name remains set to the last value it matched in the if/elseif statements, so a file that matches neither condition would pick up the previous file's name.
Next, make use of the fact that $name is null so that files that don't match your criteria won't be renamed to ".txt".
Finally, perform the rename operation using the $_ automatic variable representing the current object instead of pulling the information with Get-ChildItem again. The curly braces have also been replaced with parentheses, because Rename-Item is now given a literal value rather than a script block.
Updated script:
$dir = "C:\MYPATH\SAMPLE\"
Set-Location $dir
Get-ChildItem $dir -Filter "e02*" |
ForEach-Object {
    $name = $null # initialize name to prevent interference from previous runs
    $name = if ($_.BaseName -eq "e024") {"Four"}
            elseif ($_.BaseName -eq "e023") {"Three"}
    if ($name -ne $null) {
        Rename-Item $_ -NewName ($name + ".txt")
    }
}
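
One further thought: since each incoming code maps to its own friendly name (e021234 might be Program Alpha, and so on), a hashtable lookup scales better than a growing if/elseif chain, and the monthly edit is then confined to one table. A minimal sketch under that assumption - the code-to-name pairs below are made up:

# Hypothetical mapping of base names to friendly names - edit this table as files change
$friendlyNames = @{
    'e024'    = 'Four'
    'e023'    = 'Three'
    'e021234' = 'Program Alpha'
}

Get-ChildItem $dir -Filter "e02*" | ForEach-Object {
    $name = $friendlyNames[$_.BaseName]  # $null when the base name isn't in the table
    if ($name) {
        Rename-Item $_ -NewName ($name + ".txt")
    }
}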

Related

Rename files in a foreach loop with Powershell - Strange behaviour

I am renaming the files of a directory. The way to rename them is to add a letter to the beginning of the file name, depending on the option a user chooses ("F" or "O"). A possible solution to this exercise is the following:
$Path = "C:\app\SandBox\"
Get-ChildItem -Path $Path -Filter *.xlsx | ForEach-Object {
    $opt = Read-Host "Do you want modify (F) this file or not (O)?"
    $name = $_.name
    $PathFinal = $Path + $name
    if ($opt -eq "F") {
        $newName = "F" + $name
        Rename-Item -NewName $newName -Path $PathFinal
    }
    if ($opt -eq "O") {
        $newName = "F" + $name
        Rename-Item -NewName $newName -Path $PathFinal
    }
}
The loop iterates as many times as there are files in the directory.
However, when I try to change the code as follows:
$Path = "C:\app\SandBox\"
Get-ChildItem -Path $Path -Filter *.xlsx | ForEach-Object {
    $name = $_.name
    $opt = Read-Host "$name - Do you want modify (F) this file or not (O)?"
    if ($opt -eq "O") {
        $name = $_.Name
        Rename-Item -NewName "O$name" -Path $Path$name
    }
    if ($opt -eq "F") {
        $name = $_.Name
        Rename-Item -NewName "F$name" -Path $Path$name
    }
}
It turns out that, in some cases, the loop iterates one more time!
If I have two files in the folder, sometimes the loop iterates three times.
What could this be due to?
It should iterate twice, but it iterates three times. I can't think of what it could be, since the pipeline should only pass two files.
Get-ChildItem works against the file system by asking for the first file matching a given filter, and then it continues asking the OS "what's the next matching file name after fileX", until the OS says "no more files".
In this case, A.xlsx is renamed to FA.xlsx on the first iteration, and the OS is thus able to answer a later question, "what's the next matching file name after B.xlsx", with "FA.xlsx".
To enumerate only the files already in the directory when you start the script, place the call to Get-ChildItem in a subexpression or nested pipeline:
$(Get-ChildItem -Path $Path -Filter *.xlsx) | ForEach-Object { ... }
This will force PowerShell to wait for Get-ChildItem to finish executing before sending the first output item to ForEach-Object - and since Get-ChildItem is already "done" by the time the renaming starts, you don't risk seeing the same file multiple times.
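
To make that concrete, here is a hedged sketch of both equivalent forms applied to a rename loop like the one above (use one or the other, not both):

$Path = "C:\app\SandBox\"

# Option 1: the $(...) subexpression collects the listing before the loop body runs,
# so files renamed inside the loop are never re-enumerated
$(Get-ChildItem -Path $Path -Filter *.xlsx) | ForEach-Object {
    Rename-Item -Path $_.FullName -NewName ("F" + $_.Name)
}

# Option 2: snapshot the results into a variable first, then iterate
$files = Get-ChildItem -Path $Path -Filter *.xlsx
foreach ($file in $files) {
    Rename-Item -Path $file.FullName -NewName ("F" + $file.Name)
}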

Script lists all files that don't contain needed content

I'm trying to find all files in a dir, modified within the last 4 hours, that contain a string. I can't have the output show files that don't contain the needed content. How do I change this so it only lists the filename and the content found that matches the string, but not files that don't have that string? This is run as a Windows shell command. The dir has a growing list of hundreds of files, and currently the output looks like this:
File1.txt
File2.txt
File3.txt
... long long list, with none containing the needed string
(powershell "Set-Location -Path "E:\SDKLogs\Logs"; Get-Item *.* | Foreach { $lastupdatetime=$_.LastWriteTime; $nowtime = get-date; if (($nowtime - $lastupdatetime).totalhours -le 4) {Select-String -Path $_.Name -Pattern "'Found = 60.'"| Write-Host "$_.Name Found = 60"; }}")
I tried changing the location of the Write-Host but it's still printing all files.
Update:
I'm currently working on this fix. Hopefully it's what people were alluding to in comments.
$updateTimeRange = (Get-Date).AddHours(-4)
$fileNames = Get-ChildItem -Path "K:\NotFound" -Recurse -Include *.*
foreach ($file in $fileNames)
{
    #$content = Get-Content $_.FullName
    Write-Host "$($file.LastWriteTime)"
    if ($file.LastWriteTime -ge $($updateTimeRange))
    {
        #Write-Host $file.FullName
        if (Select-String -Path $file.FullName -Pattern 'Thread = 60')
        {
            Write-Host $file.FullName
        }
    }
}
If I understood you correctly, you just want to display the file name and the matched content? If so, the following will work for you:
$date = (Get-Date).AddHours(-4)
Get-ChildItem -Path 'E:\SDKLogs\Logs' |
    Where-Object -FilterScript { $date -lt $_.LastWriteTime } |
    Select-String -Pattern 'Found = 60.' |
    ForEach-Object -Process {
        '{0} {1}' -f $_.FileName, $_.Matches.Value
    }
Get-Date doesn't strictly need to be in a variable, but calling it again and again for every file can become computationally expensive. Instead, capture it once in $date before the expression and compare against that value.
Typically, and for best practice, you always want to filter as far left as possible in your command. In this case we swap your if statement for a Where-Object to filter as the objects are passed down the pipeline. Luckily for us, Select-String returns the file name of a match found, and the matched content so we just reference it in our Foreach-Object loop; could also use a calculated property instead.
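The calculated-property variant mentioned above might look like this (a sketch; FileName and Matches are properties of the MatchInfo objects Select-String emits):

$date = (Get-Date).AddHours(-4)
Get-ChildItem -Path 'E:\SDKLogs\Logs' |
    Where-Object { $date -lt $_.LastWriteTime } |
    Select-String -Pattern 'Found = 60.' |
    Select-Object FileName, @{ Name = 'Match'; Expression = { $_.Matches.Value } }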
As for your quoting issues, you may have to double quote or escape the quotes within the PowerShell.exe call for it to run properly.
Edit: swapped the double quotes for single quotes so you can wrap the entire expression in just PowerShell.exe -Command "expression here" without the need for escaping; this works as long as the pattern you're searching for doesn't contain single quotes.
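For example, the call from cmd.exe could then follow this quoting pattern (a sketch, not the asker's exact command):

powershell.exe -Command "Get-ChildItem -Path 'E:\SDKLogs\Logs' | Select-String -Pattern 'Found = 60.'"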

Compress File per file, same name

I hope you are all safe in this time of COVID-19.
I'm trying to generate a script that goes to the directory and compresses each file to .zip with the same name as the file, for example:
sample.txt -> sample.zip
sample2.txt -> sample2.zip
but I'm having difficulties; I'm not that used to PowerShell, and I'm learning as I improve this script. In the end it will be a script that deletes files older than X days, compresses the files, and uploads them over FTP. I've already managed the part that deletes files older than X days; now I'm stuck on this one.
My latest attempt at the moment:
param
(
    #Future accept input
    [string] $InputFolder,
    [string] $OutputFolder
)
#test folder
$InputFolder = "C:\Temp\teste"
$OutputFolder = "C:\Temp\teste"
$Name2 = Get-ChildItem $InputFolder -Filter '*.csv' | select Name
Set-Variable SET_SIZE -Option Constant -Value 1
$i = 0
$zipSet = 0
Get-ChildItem $InputFolder | ForEach-Object {
    $zipSetName = ($Name2[1]) + ".zip "
    Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
    $i++
    $Name2++
    if ($i -eq $SET_SIZE) {
        $i = 0
        $zipSet++
    }
}
You can simplify things a bit, and it looks like most of the issues arise because in your script example $Name2 will contain a different set of items than Get-ChildItem $InputFolder returns in the loop (i.e. the loop may see objects other than .csv files).
The best way to deal with things is to use variables with the full file object (i.e. you don't need to use |select name). So I get all the CSV file objects right away and store in the variable $CsvFiles.
We can additionally use the special variable $_ inside the ForEach-Object, which represents the current object. We can also use $_.BaseName to give us the name without the extension (assuming that's what you want; otherwise use $_.Name to get a zip named like xyz.csv.zip).
So a simplified version of the code can be:
$InputFolder = "C:\Temp\teste"
$OutputFolder = "C:\Temp\teste"

#Get files to process
$CsvFiles = Get-ChildItem $InputFolder -Filter '*.csv'

#loop through all files to zip
$CsvFiles | ForEach-Object {
    $zipSetName = $_.BaseName + ".zip"
    Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
}
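
If you later want the param block from your original attempt to take effect, a sketch along these lines keeps defaults but lets callers override them (the default paths below are just the test folder from the question):

param
(
    # Defaults are placeholders; pass real folders when invoking the script
    [string] $InputFolder = "C:\Temp\teste",
    [string] $OutputFolder = "C:\Temp\teste"
)

Get-ChildItem $InputFolder -Filter '*.csv' | ForEach-Object {
    Compress-Archive -Path $_.FullName -DestinationPath (Join-Path $OutputFolder ($_.BaseName + '.zip'))
}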

Referencing targeted object in ForEach-Object in Powershell

I am fairly new to Powershell. I am attempting to re-write the names of GPO backup folders to use their friendly name rather than their GUID by referencing the name in each GPO backup's 'gpresult.xml' file that is created as part of the backup. However, I do not understand how I can reference the specific object (in this case, the folder name) that is being read into the ForEach-Object loop in order to read into the file beneath this folder.
function Backup_GPO {
    $stamp = Get-Date -UFormat "%m%d"
    New-Item -ItemType "directory" -Name $stamp -Path \\dc-16\share\GPO_Backups -Force | Out-Null # create new folder to specify day backup is made
    Backup-GPO -All -Path ("\\dc-16\share\GPO_Backups\" + "$stamp\")
    Get-ChildItem -Path \\dc-16\share\GPO_Backups\$stamp | ForEach-Object {
        # I want to reference the current folder here
        [xml]$file = Get-Content -Path (folder that is being referenced in for loop)\gpresult.xml
        $name = $file.GPO.Name
    }
I'm coming from Python, where if I want to reference the object I'm currently iterating on, I can do so very simply -
for object in list:
print(object)
How do you reference the currently in-use object in Powershell's ForEach-Object command?
I'm coming from Python, where if I want to reference the object I'm currently
iterating on, I can do so very simply -
for object in list:
print(object)
The direct equivalent of that in PowerShell is the foreach statement (loop):
$list = 1..3 # create a 3-element array with elements 1, 2, 3
foreach ($object in $list) {
    $object # expression output is *implicitly* output
}
Note that you cannot directly use a foreach statement in a PowerShell pipeline.
In a pipeline, you must use the ForEach-Object cmdlet instead, which - somewhat confusingly - can also be referred to as foreach, via an alias; it is only the parsing mode that distinguishes between the statement and the cmdlet's alias.
You're using the ForEach-Object cmdlet in the pipeline, where different rules apply.
Script blocks ({ ... }) passed to pipeline-processing cmdlets such as ForEach-Object and Where-Object do not have an explicit iteration variable the way that the foreach statement provides.
Instead, by convention, such script blocks see the current pipeline input object as automatic variable $_ - or, more verbosely, as $PSItem.
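A minimal illustration of the two spellings of the same convention:

1..3 | ForEach-Object { "Processing item $_" }       # $_ is the current input object
1..3 | ForEach-Object { "Processing item $PSItem" }  # $PSItem, same thing spelled out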
While the foreach statement and the ForEach-Object cmdlet operate the same on a certain level of abstraction, there's a fundamental difference:
The foreach statement operates on collections collected up front in memory, in full.
The ForEach-Object cmdlet operates on streaming input, object by object, as each object is being received via the pipeline.
This difference amounts to the following trade-off:
Use the foreach statement for better performance, at the expense of memory usage.
Use the ForEach-Object cmdlet for constant memory use and possibly also for the syntactic elegance of a single pipeline, at the expense of performance - however, for very large input sets, this may be the only option (assuming you don't also collect a very large dataset in memory on output).
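A quick, admittedly unscientific way to observe this trade-off yourself is to time both forms with Measure-Command:

$data = 1..100000

# foreach statement: the whole collection sits in memory, iteration is fast
(Measure-Command {
    foreach ($n in $data) { $null = $n * 2 }
}).TotalMilliseconds

# ForEach-Object cmdlet: objects stream through the pipeline one at a time
(Measure-Command {
    $data | ForEach-Object { $null = $_ * 2 }
}).TotalMilliseconds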
Inside the ForEach-Object scriptblock, the current item being iterated over is copied to $_:
Get-ChildItem -Filter gpresult.xml | ForEach-Object {
    # `$_` is a FileInfo object, `$_.FullName` holds the absolute file system path
    [xml]$file = Get-Content -LiteralPath $_.FullName
}
If you want to specify a custom name, you can either specify a -PipelineVariable name:
Get-ChildItem -Filter gpresult.xml -PipelineVariable fileinfo | ForEach-Object {
    # `$fileinfo` is now a FileInfo object, `$fileinfo.FullName` holds the absolute file system path
    [xml]$file = Get-Content -LiteralPath $fileinfo.FullName
}
or use a foreach loop statement, much like for object in list in python:
foreach ($object in Get-ChildItem -Filter gpresult.xml)
{
    [xml]$file = Get-Content -LiteralPath $object.FullName
}
Another way...
$dlist = Get-ChildItem -Path "\\dc-16\share\GPO_Backups\$stamp"
foreach ($dir in $dlist) {
    # reference the current folder through the loop variable $dir
    [xml]$file = Get-Content -Path (Join-Path -Path $dir.FullName -ChildPath 'gpresult.xml')
    $name = $file.GPO.Name
}
Here's my solution. It's annoying that "$_" inside a double-quoted string doesn't expand to the full path; assigning $gpath = $_.FullName first makes it easier to join the two strings together on the next line with Get-Content. I get a gpreport.xml file when I try Backup-GPO. Apparently you can't use relative paths like .\gpo_backups\ with Backup-GPO.
mkdir c:\users\js\gpo_backups\
get-gpo -all | where displayname -like '*mygpo*' |
    backup-gpo -path c:\users\js\gpo_backups\

Get-ChildItem -Path .\GPO_Backups\ | ForEach-Object {
    $gpath = $_.FullName
    [xml]$file = Get-Content -Path "$gpath\gpreport.xml"
    $file.GPO.Name
}

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name] = $i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select @{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"

foreach ($incFile in $includeFiles) {
    foreach ($oFile in $otherFiles) {
        if ($incFile.Name -ieq $oFile.Name) {
            write "$($incFile.Name) clash"
            write "* $($incFile.FullName)"
            write "* $($oFile.FullName)"
            write "`n"
        }
    }
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count-gt1-and$_-like'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
    'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
    'file2.cs': @('c:\cs\somewhere\file2.cs'),
    'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it were +=$_.FullName then the first file would go into the hashtable as a string, and the += next time would do string concatenation, which is no use to me. This forces every file into a one-item array, so the first file in the hashtable starts an array. The least-code way to get this result is with +=@(..), but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
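One way to avoid the array churn entirely (untested, same caveat as above) is to keep a generic List in the hashtable, since List.Add appends in place instead of rebuilding the array; the later $_.Count checks still work because lists expose Count too:

$h = @{}
gci 'c:\s' -R -file -Filt *.cs | % {
    if (!$h.ContainsKey($_.Name)) {
        # a List grows in place; += on an array copies the whole array each time
        $h[$_.Name] = New-Object System.Collections.Generic.List[string]
    }
    $h[$_.Name].Add($_.FullName)
}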
After that, using $h.Values isn't going over every file a second time; it's going over every array in the hashtable - one per unique filename. That has to happen to check the array size and prune the non-duplicates, but the -and operation short-circuits: when the Count -gt 1 test fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_]-like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"

$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
    % { $includeList[$_.Name] = $_.DirectoryName }

Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
    ? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
    % { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"

Get-ChildItem -Path $base -Filter $extension -Recurse |
    Group-Object Name |
    Where-Object { ($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique)) } |
    ForEach-Object {
        $filename = $_.Name
        ($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object {
            "'{0}' has file with same name as '{1}'" -f (Split-Path $_), $filename
        }
    }
Collect all the files with the extension filter $extension. Group the files based on their names. Then of those groups find every group where there are more than one of that particular file and one of the group members is at least in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what its worth this is what I used for testing to create a bunch of files. (I know the folder 9 is empty)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"

# First script
Measure-Command {
    [string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
    ls $root -Include $filter -Recurse -Filter *.cs |
        Where-Object { $_.FullName -notlike "$includes*" }
}

# Second script
Measure-Command {
    $filter2 = ls $includes -Filter *.cs -Recurse
    ls $root -Recurse -Filter *.cs |
        Where-Object { $filter2.name -eq $_.name -and $_.FullName -notlike "$includes*" }
}
In my first script, I get all the include file names into a string array. Then I use that string array as the -Include parameter on Get-ChildItem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\s -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory