PowerShell: Select-Object: Select half of files - powershell

Windows 10, version 10.0.19042.868
Hello,
I have a PowerShell script that selects the first N files and moves them into another folder:
Get-ChildItem -File *.txt | Sort-Object Name | Select-Object -First 25000 | Move-Item -Destination C:\txtFilesToProcess\
But how can I select half of the files returned by Get-ChildItem?

Assign the output of Get-ChildItem to a variable so you know the number of files in advance.
This was the accepted solution:
$files = Get-ChildItem -File *.txt | Sort-Object Name
$files | Select-Object -First ($files.Count / 2) | Move-Item -Destination C:\txtFilesToProcess\
Alternative way using a single pipeline (more for academic purposes as it makes the code harder to understand):
,(Get-ChildItem -File *.txt) | ForEach-Object {
$_ | Select-Object -First ($_.Count / 2) | Move-Item -Destination C:\txtFilesToProcess\ -WhatIf
}
The parentheses (aka the grouping operator) around the Get-ChildItem command collect all output of the command in an array before the next pipeline command runs.
Additionally, the comma operator is required to prevent enumeration of the array elements (see this Q&A for details). It creates an array that contains the output array from Get-ChildItem as a single element.
Now ForEach-Object operates on the whole Get-ChildItem output array, so we can use $_.Count to get the total number of files.
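A detail worth checking in both variants is the division: $files.Count / 2 yields a fractional value for an odd count, and PowerShell rounds halves to the nearest even integer when it converts that value for -First (2.5 becomes 2, 3.5 becomes 4). A minimal sketch that makes the rounding explicit, using a hypothetical integer array in place of the file list:

```powershell
# Hypothetical stand-in for the sorted file list; any array behaves the same way.
$items = 1..5

# Round down explicitly, so an odd count puts the extra element in the second half.
$half = [int][math]::Floor($items.Count / 2)

$firstHalf  = $items | Select-Object -First $half   # 1, 2
$secondHalf = $items | Select-Object -Skip $half    # 3, 4, 5
```

Use [math]::Ceiling instead of [math]::Floor if the first half should receive the extra element.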

Related

Using Powershell, how to return a list of files based on the existence of duplicate files with a different naming convention?

There are multiple .webp files in a project folder. Some .webp files are the original pictures and some serve as thumbnails (their size is different). The naming convention is: original files are just called NAME.webp and thumbnails are NAME-thumb.webp.
I am trying to return all .webp files based on whether the corresponding thumbnail exists. So if picture SAMPLE.webp has a SAMPLE-thumb.webp, don't add this file to the list. But if SAMPLE.webp doesn't have a corresponding SAMPLE-thumb.webp, then do add it to the list.
This is what I've tried so far:
$example = Get-ChildItem -File $dir\*.webp |
Group-Object { $_.BaseName } |
Where-Object { $_.Name -NotContains "-thumb" } |
ForEach-Object Group
You can get this without the grouping with a Where-Object and testing paths.
Get-ChildItem -File $dir\*.webp |
Where-Object {$_.Name -notmatch "-thumb" -and -not(Test-Path ($_.FullName -replace '\.webp$','-thumb.webp'))}
This should get you a list of all the files that do not have a corresponding thumbnail file.
You can do the following:
(Get-ChildItem $dir\*.webp -File |
Group-Object {$_.BaseName -replace '-thumb$'} |
Where Count -eq 1).Group
Grouping requires a common key. Removing the trailing -thumb from the BaseName property creates one. If a file has no matching filename / filename-thumb counterpart, the resulting GroupInfo has a Count of 1.
Using the syntax ().Group returns all file objects. If you want to run code against each file, you may use Foreach-Object instead:
Get-ChildItem $dir\*.webp -File |
Group-Object {$_.BaseName -replace '-thumb$'} |
Where Count -eq 1 | Foreach-Object {
$_.Group
}
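Note that with the grouping approach a thumbnail whose original is missing also lands in a group of one, so it is returned as well. If only originals should be reported, one possible variation is sketched below. The sample names are hypothetical and stand in for the Get-ChildItem output:

```powershell
# Hypothetical sample names standing in for the files in $dir.
$names = 'a.webp', 'a-thumb.webp', 'b.webp', 'c-thumb.webp'

# Base names that have a thumbnail, with the trailing "-thumb" removed.
$withThumb = $names |
    ForEach-Object { [System.IO.Path]::GetFileNameWithoutExtension($_) } |
    Where-Object { $_ -match '-thumb$' } |
    ForEach-Object { $_ -replace '-thumb$' }

# Keep originals only: not a thumbnail itself, and no thumbnail recorded for it.
$missingThumb = $names | Where-Object {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($_)
    $base -notmatch '-thumb$' -and $withThumb -notcontains $base
}

$missingThumb   # b.webp
```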

How do I delete all .webp files in a directory based on if the duplicate .jpeg file (with the same .BaseName) has changed in the last 24 hours?

So, I have a directory of pictures for a website. There are pictures that are the same with both a .jpeg extension and a .webp extension. I want to write a PowerShell script that finds all the existing .jpeg files that changed in the last 24 hours, and then find the respective .webp file and delete the .webp file.
I've tried this to get all the .webp files that can be deleted but it doesn't seem to work:
$images = Get-ChildItem -Path $dir\*.jpg, $dir\*.webp |
Group-Object { $_.BaseName } |
Where-Object {($_.CreationTime -gt (Get-Date).AddDays(-1)) -or ($_.Group.Extension -notcontains '.jpg')} |
ForEach-Object Group
I think this is easier:
Get a list of base names of the .jpg files that were last modified since the start of yesterday. Next, get the files in the same directory with the .webp extension whose BaseName matches one of those .jpg base names, then remove them.
$dir = 'D:\Test'
$refdate = (Get-Date).AddDays(-1).Date
$jpegs = (Get-ChildItem -Path $dir -Filter '*.jpg' -File | Where-Object { $_.LastWriteTime -gt $refdate }).BaseName
Get-ChildItem -Path $dir -Filter '*.webp' -File | Where-Object { $jpegs -contains $_.BaseName } | Remove-Item -WhatIf
Once you are satisfied that the correct files are getting deleted, remove the safety -WhatIf switch and run again.
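One thing to be aware of in the code above: (Get-Date).AddDays(-1).Date is midnight at the start of yesterday, which is not the same as "24 hours ago". A small sketch of the difference (the variable names are only illustrative):

```powershell
$now = Get-Date

# Exactly 24 hours before now.
$last24Hours = $now.AddDays(-1)

# Yesterday at 00:00:00, because .Date drops the time-of-day part.
$startOfYesterday = $now.AddDays(-1).Date

"24 hours ago      : $last24Hours"
"Start of yesterday: $startOfYesterday"
```

Drop the .Date call if the cut-off should be a strict 24-hour window.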
Theo's helpful answer shows you an alternative approach; if you want to stick with the Group-Object approach:
$webpFilesToDelete =
Get-ChildItem -Path $dir\*.jpg, $dir\*.webp |
Group-Object BaseName | Where-Object Count -eq 2 |
ForEach-Object {
# Test the creation date of the *.jpg file.
if ($_.Group[0].CreationTime -gt (Get-Date).AddDays(-1)) {
$_.Group[1] # Output the corresponding *.webp file
}
}
Note that, as in your own attempt, .CreationTime is used. If the files may be updated again after creation and that is the timestamp you care about, use .LastWriteTime instead.
The command relies on the fact that Group-Object sorts the elements of the groups it creates by the sort criteria, which in this case, due to grouping by a string property, means lexical sorting in which .jpg files are listed before .webp files.
Therefore, for groups that have 2 elements (Where-Object Count -eq 2), implying that for the given base name both a .jpg and a .webp file exist, $_.Group[0] refers to the .jpg file, and $_.Group[1] to the .webp file.
As for what you tried:
$_.CreationTime yields $null, because $_ in your command refers to the group-information object at hand (an instance of Microsoft.PowerShell.Commands.GroupInfo), as output by Group-Object, and this type has no such property.
Also, since you're using Where-Object, you're simply filtering groups, so that any group that passes the filter tests is passed through as-is, and ForEach-Object Group then outputs both files in the group.
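If you would rather not depend on the order of the elements inside each group, a possible variation picks the two files by their Extension instead of by index. This is only a sketch: the scratch folder and the sample file names are hypothetical and exist only so that the snippet runs on its own.

```powershell
# Hypothetical scratch folder with sample files (left in the temp directory afterwards).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
'a.jpg', 'a.webp', 'b.webp' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

$cutoff = (Get-Date).AddDays(-1)

$webpFilesToDelete =
    Get-ChildItem -Path (Join-Path $dir '*.jpg'), (Join-Path $dir '*.webp') |
    Group-Object BaseName |
    ForEach-Object {
        # Select each file by extension rather than by its position in the group.
        $jpg  = $_.Group | Where-Object Extension -eq '.jpg'
        $webp = $_.Group | Where-Object Extension -eq '.webp'
        if ($jpg -and $webp -and $jpg.CreationTime -gt $cutoff) {
            $webp   # Output the .webp file that belongs to a recently created .jpg
        }
    }

# Preview only; remove -WhatIf once the selection looks right.
$webpFilesToDelete | Remove-Item -WhatIf
```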
Thanks for your helpful answers and the clarification! I initially thought that I needed .LastWriteTime, but I needed .CreationTime. This serves my needs. Here is the result:
$refdate = (Get-Date).AddDays(-1).Date
$jpegs = (Get-ChildItem -Path $dir -Filter '*.jpg' -File | Where-Object { $_.CreationTime -gt $refdate }).BaseName
$webp = Get-ChildItem -Path $dir -Filter '*.webp' -File | Where-Object { $jpegs -contains $_.BaseName }

Filenames matching regex but in folder with wildcard

This is a follow-up question to: PowerShell concatenate output of Get-ChildItem
This code works fine:
Get-ChildItem -Path "D:\Wim\TM1\TI processes" -Filter "*.vue" -Recurse -File |
Where-Object { $_.BaseName -match '^[0-9]+$' } |
ForEach-Object { ($_.FullName -split '\\')[-2,-1] -join '\' } |
Out-File D:\wim.txt
But I would need to restrict the search to certain folders only, basically with this filter: D:\Wim\TM1\TI processes\*}vues (so all subfolders ending in }vues).
If I add that wildcard condition I get no result. Without the restriction, I get the correct result. Is this possible please?
The idea is to get rid of the 3rd line in the first output (which was a copy/paste by me) and also to minimize the number of folders to look at.
You can nest two Get-ChildItem calls:
An outer Get-ChildItem -Directory -Recurse call to filter directories of interest first,
an inner Get-ChildItem -File call that, for each directory found, examines and processes the files of interest.
Get-ChildItem -Path "D:\Wim\TM1\TI processes" -Filter "*}vues" -Recurse -Directory |
ForEach-Object {
Get-ChildItem -LiteralPath $_.FullName -Filter "*.vue" -File |
Where-Object { $_.BaseName -match '^[0-9]+$' } |
ForEach-Object { ($_.FullName -split '\\')[-2,-1] -join '\' }
} | Out-File D:\wim.txt
Note: The assumption is that all *.vue files of interest are located directly in each *}vues folder.
As for what you tried:
Given that you're limiting items being enumerated to files (-File), your directory-name wildcard pattern *}vues never gets to match any directory names and, in the absence of files matching that pattern, returns nothing.
Generally, with -Recurse it is conceptually cleaner not to append the wildcard pattern directly to the -Path argument, so as to better signal that the pattern will be matched in every directory in the subtree.
In your case you would have noticed your attempt to filter doubly, given that you're also using the -Filter parameter.
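For reference, the expression that builds each output line in the pipeline above keeps only the last two path components. A quick way to see what it does, using a hypothetical folder name abc}vues:

```powershell
# Hypothetical full path of one matching file.
$fullName = 'D:\Wim\TM1\TI processes\abc}vues\123.vue'

# Split on backslashes (escaped, because -split takes a regex),
# take the last two components, and join them again.
$tail = ($fullName -split '\\')[-2, -1] -join '\'

$tail   # abc}vues\123.vue
```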

Copy files listed in a txt document, keeping multiple files of the same name in PowerShell

I have a bunch of lists of documents generated in powershell using this command:
Get-ChildItem -Recurse |
Select-String -Pattern "acrn164524" |
group Path |
select Name > test.txt
In this example it generates a list of files containing the string acrn164524 the output looks like this:
Name
----
C:\data\logo.eps
C:\data\invoice.docx
C:\data\special.docx
InputStream
C:\datanew\special.docx
I have been using
Get-Content "test.txt" | ForEach-Object {
Copy-Item -Path $_ -Destination "c:\destination\" -Recurse -Container -Force
}
However, this is an issue if two or more files have the same name, and it also throws a bunch of errors for any lines in the file that are not a path.
Sorry if I was not clear enough: I would like to keep files with the same name by appending something to the end of the file name.
You seem to want the files, not the output of Select-String. So let's keep the files.
Get-ChildItem -Recurse -File | Where-Object {
$_ | Select-String acrn164524 -Quiet
} | Select-Object -ExpandProperty FullName | Out-File test.txt
Here:
-File will make Get-ChildItem only return actual files. Think about using a filter like *.txt to reduce the workload more.
-Quiet will make Select-String return $true or $false, which is perfect for Where-Object.
Instead of Select-Object -ExpandProperty X in order to retrieve an array of raw property values (as opposed to an array of PSObjects, which is what Select-Object would normally do), it's simpler to use ForEach-Object X instead.
Get-ChildItem -Recurse -File | Where-Object {
$_ | Select-String acrn164524 -Quiet
} | ForEach-Object FullName | Out-File test.txt
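The commands above produce a clean list of paths but do not yet handle files that share a name. One possible way to do the copy step is sketched below. The function name Copy-ListedFile and the _1, _2 suffix scheme are assumptions made for illustration, and the destination folder is assumed to exist already:

```powershell
function Copy-ListedFile {
    param(
        [Parameter(Mandatory)] [string] $ListFile,     # text file with one path per line
        [Parameter(Mandatory)] [string] $Destination   # existing target folder
    )

    Get-Content -LiteralPath $ListFile |
        ForEach-Object { $_.Trim() } |
        # Skip blank lines, headers, and anything else that is not an existing file.
        Where-Object { $_ -and (Test-Path -LiteralPath $_ -PathType Leaf) } |
        ForEach-Object {
            $source = Get-Item -LiteralPath $_
            $target = Join-Path $Destination $source.Name
            $n = 1
            # On a name collision, append _1, _2, ... before the extension.
            while (Test-Path -LiteralPath $target) {
                $newName = '{0}_{1}{2}' -f $source.BaseName, $n, $source.Extension
                $target  = Join-Path $Destination $newName
                $n++
            }
            Copy-Item -LiteralPath $source.FullName -Destination $target
        }
}
```

It could then be called as Copy-ListedFile -ListFile test.txt -Destination C:\destination, which would leave two files named special.docx as special.docx and special_1.docx.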

How to retrieve recursively any files with a specific extensions in PowerShell?

For a specific folder, I need to list all files with extension .js even if nested in subfolders at any level.
The result for the output console should be a list of file names with no extension line by line to be easily copy and pasted in another application.
At the moment I am trying this, but in output console I get several meta information and not a simple list.
Get-ChildItem -Path C:\xx\x -Recurse -File | sort length -Descending
Could you please provide me some hints?
If sorting by Length is not a necessity, you can use the -Name parameter to have Get-ChildItem return just the name, then use [System.IO.Path]::GetFileNameWithoutExtension() to remove the path and extension:
Get-ChildItem -Path .\ -Filter *.js -Recurse -File -Name| ForEach-Object {
[System.IO.Path]::GetFileNameWithoutExtension($_)
}
If sorting by length is desired, drop the -Name parameter and output the BaseName property of each FileInfo object. You can pipe the output (in both examples) to clip, to copy it into the clipboard:
Get-ChildItem -Path .\ -Filter *.js -Recurse -File| Sort-Object Length -Descending | ForEach-Object {
$_.BaseName
} | clip
If you want the full path, but without the extension, substitute $_.BaseName with:
$_.FullName.Remove($_.FullName.Length - $_.Extension.Length)
The simple option is to use the .Name property of the FileInfo item in the pipeline and then remove the extension:
Get-ChildItem -Path "C:\code\" -Filter *.js -r | % { $_.Name.Replace( ".js","") }
There are two methods for filtering files: globbing with a wildcard, or using a regular expression (regex).
Warning: The globbing method has the drawback that it also matches files which should not be matched, like *.jsx.
# globbing with Wildcard filter
# the error action prevents the output of errors
# (ie. directory requires admin rights and is inaccessible)
Get-ChildItem -Recurse -Filter '*.js' -ErrorAction 'SilentlyContinue'
# filter by Regex
Where-Object { $_.Name -Match '.*\.js$' }
You then can sort by name or filesize as needed:
# sort the output
Sort-Object -Property 'Length'
Format it as a simple list of path and filename:
# format output
Format-List -Property 'DirectoryName','Name'
To remove the file extension, map the result with ForEach-Object:
ForEach-Object { $_.Name.Replace( ".js", "") }
Putting it all together, there is also a very short version, which you should not use in scripts, because it's hardly readable:
ls -r | ? { $_.Name -match '.*\.js$' } | sort Length | % { $_.Name.Replace( ".js", "") } | fl
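For scripts, a more readable form of the same idea might look like the sketch below. The function name Get-JsBaseName is hypothetical. The wildcard filter does the coarse selection and the exact Extension comparison excludes look-alikes such as .jsx:

```powershell
function Get-JsBaseName {
    param([string] $Path = '.')   # folder to search; defaults to the current directory

    Get-ChildItem -LiteralPath $Path -Recurse -File -Filter '*.js' -ErrorAction SilentlyContinue |
        Where-Object Extension -eq '.js' |     # exact extension match
        Sort-Object Length -Descending |       # largest files first
        ForEach-Object BaseName                # file name without extension
}
```

For example, Get-JsBaseName -Path C:\code | clip would copy the resulting list to the clipboard.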
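If you like brevity, you can remove the ForEach-Object and quotes. -Path defaults to the current directory, so you can omit it. Note that this version sorts the BaseName strings by their own length (the number of characters in the name), not by file size, because only strings reach Sort-Object.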
(Get-ChildItem -Filter *.js -Recurse).BaseName | Sort length -Descending
The above answers work fine. However, in Windows PowerShell there is an alias called ls, the same as on Linux, so another shorter command that works too would be ls -Filter *.exe
Use BaseName for the file name without the file extension.
Get-ChildItem -Path ".\*.js" | Sort-Object Length -Descending | ForEach-Object {
$_.BaseName
}
I always used Cygwin for this in the past. My last employer locked down our environments and it wasn't available. I like to review the latest files I've modified often. I created the following environment variable named LatestCode to store the script. I then execute it with: iex $env:LatestCode
Here is the script: get-childitem "." -recurse -include *.ts, *.html , *.sass, *.java, *.js, *.css | where-object {$_.mode -notmatch "d"} | sort lastwritetime -descending | Select-Object -First 25 | format-table lastwritetime, fullname -autosize