Getting the most recent objects in a folder excluding certain strings - PowerShell

I have folders that contain files, for the sake of the question, named as follows:
a-001.txt
a-002.txt
b-001.txt
b-002.txt
d-001.txt
d-002.txt
Now I am using PowerShell to initially order these files so that the top of the list is the most recent file in the folder:
d-002.txt
b-002.txt
a-001.txt
a-002.txt
b-001.txt
d-001.txt
EDIT: I then store the X most recent files in a variable. However, I want to ignore anything that starts with "a" if I already have one beginning with "a" in my array, while still ensuring I end up with the X most recent files. I.e. from above, if X were 4 I would want to end up with:
d-002.txt
b-002.txt
a-001.txt
b-001.txt
This is a simple example; the folders I am dealing with contain thousands of files with more complex naming conventions, but the logic is the same. How can I handle this in PowerShell?

Leaving out the logic for any other Sort-Object and Select-Object criteria, as you already have that addressed, I present the following.
Get-ChildItem $somePath | Select-Object *,@{Label="Prefix";Expression={(($_.Name) -Split "-",2)[0]}} | Group-Object Prefix | ForEach-Object{
    $_.Group | Select-Object -First 1 -Property FullName
}
What happens here is that we add a property called "Prefix" to the output of Get-ChildItem. Now, your criteria might be more complicated, but given the sample I assumed the files were being grouped by the part of the name before the first "-". So we take every file name and build its prefix based on that. The magic comes from Group-Object, which groups all the items; we then just select the first one from each group. In your case, with the input already sorted newest-first, that would be the newest file for each prefix. Let me know if you are having trouble integrating this.
Aside from the grouping logic, any sorting and whatnot would need to exist before the Select-Object in the example above.
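For completeness, here is a rough sketch of how the date sort and the top-X selection could sit around that grouping. It keeps at most one file per prefix; the use of LastWriteTime as "most recent" and the $x variable are my assumptions, not part of the original answer:
$x = 4
Get-ChildItem $somePath |
    Sort-Object LastWriteTime -Descending |
    Select-Object *, @{Label="Prefix"; Expression={ ($_.Name -Split "-", 2)[0] }} |
    Group-Object Prefix |
    ForEach-Object { $_.Group | Select-Object -First 1 } |
    Select-Object -First $x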
FYI for other readers
There were issues with the OP's actual data, so the above code didn't work exactly as-is. We worked it out in chat and, using the same logic, were able to address the OP's concern. The test data in the question and my answer work as intended.

Related

Sort files by the number at the end of the file name

I need help with a tricky task I have.
I have a list of *.xml files which have the same ending.
The names below show only the naming convention rule.
EEEEEEEEEE-JJJ-ZZZZZ_DDDDD-XML--0000001629358212.xml
EEEEEEEEEE-JJJ-OOOOOO-XML--0000001533506936.xml
EEEEEEEEEE-JJJ-AAAAAA-XML--0000001572627196.xml
Filename length may differ, but it's important for me to sort by this number at the end.
With SQL syntax it would be easier, but I need a PS solution.
Sort-Object by LastWriteTime works better than other simple approaches, but when a few files share the same hh:mm timestamp PS mixes up the order.
At the beginning of the chain of steps that should happen to these files, I remove the time stamp from the beginning of each file name.
I was able to do it with this:
dir *EEEEEEEEEE*.xml | Rename-Item -NewName {$_.name.substring($_.BaseName.IndexOf("EEEEEEEEE"))}
But I'm unable to write something similar for sorting.
Maybe someone can advise how to solve it? Maybe you have more experience with PS Substring.
Thanks in advance.
Read and follow Sort-Object, scrolling down to the -Property parameter:
… The Property parameter's value can be a calculated property. To
create a calculated property, use a hash table.
Valid keys for a hash table are as follows:
expression - <string> or <script block>
ascending or descending - <boolean>
For more information, see about_Calculated_Properties.
Read the Examples section as well. Use
Get-ChildItem -File -Filter "*EEEEEEEEEE*-*.xml" |
Sort-Object -Property @{Expression = { ($_.BaseName -split '-')[-1] }}
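If the trailing numbers could ever differ in length (the samples above all have the same number of digits, so a plain string sort happens to work), casting to a numeric type keeps the sort correct. A small variation on the same idea, assuming the last '-'-separated token is always the number:
Get-ChildItem -File -Filter "*EEEEEEEEEE*-*.xml" |
    Sort-Object -Property @{Expression = { [long](($_.BaseName -split '-')[-1]) }}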
Thank you guys for the help.
I used this line below and it worked in my case.
GCI EEEEEEEEEE*.xml | Sort {$_.Name.substring($_.Name.Length - 20,14)}
It reads only the number at the end of the file name and sorts the files in the way I need.
Best regards

PowerShell "if more than one, then delete all but one"

Is there a way to do something like this in Powershell:
"If more than one file includes a certain set of text, delete all but one"
Example:
"...Cam1....jpg"
"...Cam2....jpg"
"...Cam2....jpg"
"...Cam3....jpg"
Then I would want one of the two "...Cam2....jpg" deleted, while the other one should stay.
I know that I can use something like
gci *Cam2* | del
but I don't know how I can make one of these files stay.
Also, for this to work, I need to look through all the files to see if there are any duplicates, which defeats the purpose of automating this process with a Powershell script.
I searched for a solution to this for a long time, but I just can't find something that is applicable to my scenario.
Get the files into a collection and use the range operator to select a subset of its elements. To remove all but the first element, start from index one. Like so:
$cams = gci "*cam2*"
if($cams.Count -gt 1) {
    $cams[1..($cams.Count - 1)] | Remove-Item
}
Expanding on the idea of commenter boxdog:
# Find all duplicately named files.
$dupes = Get-ChildItem c:\test -file -recurse | Group-Object Name | Where-Object Count -gt 1
# Delete all duplicates except the 1st one per group.
$dupes | ForEach-Object { $_.Group | Select-Object -Skip 1 | Remove-Item -Force }
I've split this up into two sub-tasks to make it easier to understand. It is also a good idea to always separate directory iteration from file deletion, to avoid inconsistent results.
The first statement uses Group-Object to group the files by name. It outputs a Count property containing the number of files per group. Then Where-Object is used to keep only the groups that contain more than one file, which will be the dupes. The result is stored in the variable $dupes, which is an array that looks like this:
Count Name      Group
----- ----      -----
    2 file1.txt {C:\test\subdir1\file1.txt, C:\test\subdir2\file1.txt}
    2 file2.txt {C:\test\subdir1\file2.txt, C:\test\subdir2\file2.txt}
The second statement uses ForEach-Object to iterate over all groups of duplicates. From the Group-Object call of the first statement we got a Group property that contains an array of file objects. Using Select-Object -Skip 1 we select all but the first element of this array, and these are passed to Remove-Item to delete the files.
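If you want to see what would be removed before actually deleting anything, the same pipeline can be run as a dry run with -WhatIf (a minimal sketch):
# Preview the deletions without performing them
$dupes | ForEach-Object { $_.Group | Select-Object -Skip 1 | Remove-Item -Force -WhatIf }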

Count files recursively in several paths, with exclusions and eliminating redundant files in PowerShell

I am trying to create a set of instructions in PowerShell on how to:
Retrieve files from several non-related folders and count them, with exceptions for certain files and/or subfolders implemented
Give me the last modified (most recent) file
Remove duplicate files based on name, date-time, and file size, not just name (files with the same name but different content can exist in several folders), because a file could be covered twice by redundant wildcards/folders in the backup parameters, which would mean the exact same file in the same path gets counted twice or more and ruins my file count.
What I have done so far (example of a browser profile path, after I enter it):
(The parameters, in order: bookmarks and 'Current Session' are files, databases\* is a folder, extensions\manifest.json covers many JSON files inside a folder, and 'Local Storage\*' is a folder. The exclusions filter out the HTTP* files inside Local Storage and the HTTP* subfolders inside Databases.)
GCI bookmarks, 'Current Session', databases\*, extensions\manifest.json, 'Local Storage\*' -Recurse | ? { $_.FullName -inotmatch 'Local Storage\\http* | Databases\\http*'} | Get-Unique | measure-object -line
This already filters the files I want from the ones I don't want, counts them, and removes duplicates, BUT it also removes many JSON files that share a name across different folders, without taking file size into account (though I think it still differentiates dates).
Bottom line, what I want is the capability that command-line RAR and 7-Zip have of knowing exactly what to include in the archive: we give an input of files and folders, we may by mistake include a subfolder already covered by a previous wildcard, we program exceptions (-x! in the case of 7-Zip), and the program knows exactly which files to include and exclude, without compressing the same file twice.
This is so I can know whether a new backup is necessary relative to the previous one (different number of files, or a more recently modified file). I know about the "update" function in RAR and 7-Zip, but it's not what I want.
Speaking of the most recently written file, is there some sort of "parallel piping"? A recursive file search that can feed its results to two commands down the chain, instead of doing a (long) scan for the file count and then repeating the scan to find the most recent file?
What I mean is:
THIS: one SCAN whose output feeds both FILE COUNT and MOST RECENT FILE, instead of THIS: SCAN --> FILE COUNT ; SCAN --> MOST RECENT FILE (two separate scans).
I've done almost all the work, but I hit a wall. All I'm missing is the removal of redundant files (e.g. the exact same file in the same path being counted twice or more due to redundant parameters entered, though I still want same-named files in different folders to be counted); and while at it I wouldn't mind getting the last modified file as well, so I don't have to repeat the same scan again (PowerShell can be very slow sometimes).
This last point is less important but it would be nice if it worked though.
Any help you can give me on this would be greatly appreciated.
Thanks for reading :-)
Something along the lines of:
#generate an example list with the exact same files listed more than once, and possibly files by the same name in sub-folders
$lst = ls -file; $lst += ls -file -recurse
$UniqueFiles = ($lst | sort -Property FullName -Unique) #remove exact dupes
$UniqueFiles = ($UniqueFiles| sort -Property Directory,Name) #make it look like ls again
# or most recent file per filename
$MostRecent = ($lst | sort -Property LastWriteTime -Descending | group -Property Name | %{$_.group[0]})
Although I don't understand how file size plays in, unless you want files with the same size and name to be listed only once regardless of where they live in the folder tree. In that case you may want to group by hash value, so that even a file with a different name will still be listed only once.
$MostRecentSameSize = ($lst | sort -Property LastWriteTime -Descending | group -Property @{exp='Name'},@{exp='Length'} | %{$_.group[0]})
# or by hash
$MostRecentByHash = ($lst | sort -Property LastWriteTime -Descending | group -Property @{exp={(Get-FileHash $_ -a md5).hash}} | %{$_.group[0]})
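As for the "parallel piping" part of the question: once the single recursive scan has been captured in a variable, both the count and the newest file can be taken from that variable without rescanning. A minimal sketch of that idea (the variable names are only illustrative):
# One scan, stored once
$scan = Get-ChildItem -File -Recurse
# Both results come from the in-memory list, so no second scan is needed
$fileCount  = $scan.Count
$mostRecent = $scan | Sort-Object LastWriteTime -Descending | Select-Object -First 1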

Rename Files with Index(Excel)

Anyone have any ideas on how to rename files by finding an association with an index file?
I have a file/folder structure like the following:
Folder name = "Doe, John EO11-123"
Several files under this folder
The index file (MS Excel) has several columns. It contains the names in 2 columns (First and Last). It also has a column containing the number EO11-123.
What I would like to do is maybe write a script to look at the folder names in a directory, compare/find an associated value in the index file (like that number EO11-123) and then rename all the files under the folder using a 4th column value in the index.
So,
Folder name = "Doe, John EO11-123", index column1 contains same value "EO11-123", use column2 value "111111_000000" and rename all the files under that directory folder to "111111_000000_0", "111111_000000_1", "111111_000000_2" and so on.
Is this possible with PowerShell or VBScript?
Ok, I'll answer your questions in your comment first. Importing the data into PowerShell allows you to make an array in PowerShell that you can match against, or better yet make a HashTable to reference for your renaming purposes. I'll get into that later, but it's way better than trying to have PowerShell talk to Excel and use Excel's search functions, because this way it's all in PowerShell and there are no third-party application dependencies. As for importing, that script is a function that you can load into your current session, so you run that function and it will automatically take care of the import for you (it opens Excel, then opens the XLS(x) file, saves it as a temp CSV file, closes Excel, imports that CSV file into PowerShell, and then deletes the temp file).
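(If you're curious, the mechanism described above boils down to something like the following rough sketch; this is not the actual Import-XLS.ps1 script, the function name is made up, and the real script handles far more edge cases:)
function Convert-XlsToObjects {
    param([string]$Path)
    # Temp CSV that Excel will write and Import-Csv will read back in
    $tempCsv = Join-Path $env:TEMP ([IO.Path]::GetRandomFileName() + '.csv')
    $excel = New-Object -ComObject Excel.Application
    $excel.Visible = $false
    $excel.DisplayAlerts = $false
    try {
        $workbook = $excel.Workbooks.Open($Path)
        $workbook.SaveAs($tempCsv, 6)   # 6 = xlCSV file format
        $workbook.Close($false)
    }
    finally {
        $excel.Quit()
        [void][Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
    }
    Import-Csv $tempCsv
    Remove-Item $tempCsv
}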
Now, you did not state what your XLS file looks like, so I'm going to assume it's got a header row, and looks something like this:
FirstName | Last Name | Identifier | FileCode
Joe       | Shmoe     | XA22-573   | JS573
John      | Doe       | EO11-123   | JD123
If that's not your format, you'll need to either adapt my code, or your file, or both.
So, how do we do this? First, download, save, and if needed unblock the Import-XLS script. Then we will dot source that file to load the function into the current PowerShell session. Once we have the function we will run it and assign the results to a variable. Then we can make an empty hashtable, and for each record in the imported array create an entry in the hashtable where the 'Identifier' property (in your example above, the one that has the value "EO11-123" in it) is the Key and the entire record is the Value. So, so far we have this:
#Load function into current session
. C:\Path\To\Import-XLS.ps1
$RefArray = Import-XLS C:\Path\To\file.xls
$RefHash = @{}
$RefArray | ForEach{ $RefHash.Add($_.Identifier, $_) }
Now you should be able to reference the identifier to access any of the properties for the associated record such as:
PS C:\> $RefHash['EO11-123'].FileCode
JD123
Now, we just need to extract that name from the folder, and rename all the files in it. Pretty straightforward from here.
Get-ChildItem c:\Path\to\Folders -Directory | Where{$_.Name -match "(?<= )(\S+)$"} |
    ForEach{
        $Files = Get-ChildItem $_.FullName
        $NewName = $RefHash[$Matches[1]].FileCode
        For($i = 0; $i -lt $Files.Count; $i++){
            $Files[$i] | Rename-Item -NewName "${NewName}_$i"
        }
    }
Edit: Ok, let's break down the rename process here. There is a lot of piping here, so I'll try and take it step by step. First off we have Get-ChildItem that gets a list of folders for the path you specify. That part's straightforward enough. Then it pipes to a Where statement that filters the results, checking each one's name to see if it matches the Regular Expression "(?<= )(\S+)$". If you are unfamiliar with how regular expressions work you can see a fairly good breakdown of it at https://regex101.com/r/zW8sW1/1. What that does is match any folders that have more than one "word" in the name, and capture the last "word". It saves that in the automatic variable $Matches, and since it captured text, that gets assigned to $Matches[1]. Now the code breaks down here because your CSV isn't laid out like I had assumed, and you want the files named differently. We'll have to make some adjustments on the fly.
So, those folders that pass the filter will get piped into a ForEach loop (which I had a typo in previously and had a ( instead of {, that's fixed now). So for each of those folders it starts off by getting a list of files within that folder and assigning them to the variable $Files. It also sets up the $NewName variable, but since you don't have a column in your CSV named 'FileCode' that line won't work for you. It uses the $Matches automatic variable that I mentioned earlier to reference the hashtable that we set up with all of the Identifier codes, and then looks at a property of that specific record to set up the new name to assign to the files. Since what you want and what I assumed are different, and your CSV has different properties, we'll rework both the previous Where statement and this line a little bit. Here's how that bit of the script will now read:
Get-ChildItem c:\Path\to\Folders -Directory | Where{$_.Name -match "^(.+?), .*? (\S+)$"} |
    ForEach{
        $Files = Get-ChildItem $_.FullName
        $NewName = $Matches[2] + "_" + $Matches[1]
That now matches the folder name in the Where statement and captures 2 things. The first thing it grabs is everything at the beginning of the name before the comma. Then it skips everything until it gets to the last piece of text at the end of the name and captures everything after the last space. New breakdown on RegEx101: https://regex101.com/r/zW8sW1/2
So you want the ID_LName, which can be gotten from the folder name; there's really no need to even use your CSV file at this point, I don't think. We build the new name of the files based off the automatic $Matches variable, using the second capture group and the first capture group and putting an underscore between them. Then we just iterate through the files with a For loop based on how many files were found. So we start with the first file in the array $Files (record 0), add that to the $NewName with an underscore, and use that to rename the file.
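Putting those adjusted pieces together, the whole loop would look roughly like this (a sketch that assumes the "LastName, FirstName ID" folder naming from the question; adjust the regex if your real folder names differ):
Get-ChildItem C:\Path\to\Folders -Directory | Where{ $_.Name -match "^(.+?), .*? (\S+)$" } |
    ForEach{
        $Files = Get-ChildItem $_.FullName
        # e.g. "Doe, John EO11-123" -> "EO11-123_Doe"
        $NewName = $Matches[2] + "_" + $Matches[1]
        For($i = 0; $i -lt $Files.Count; $i++){
            $Files[$i] | Rename-Item -NewName "${NewName}_$i"
        }
    }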

Comparing two text files and only keeping unique values

All,
I am VERY new to PowerShell and am attempting to write a script, and I have run into an issue.
I currently have two text files. For argument's sake, the first can be called required.txt and the second can be called exist.txt.
I have a script which queries a server, determines a list of all existing groups, and writes these to a text file. At the same time, the customer has a list of new groups they wish to create. I want to compare the new list (required.txt) with the existing list (exist.txt) and have anything which doesn't exist piped out to a new text file, which is then picked up and imported by another process.
I've got the scripting done to gather the list from the server; I just need to know how to do the comparison between the existing and required lists.
Any suggestions welcome.
Richard
You don't have to use that many variables:
$FinalGroups = Compare-Object (Get-Content .\required.txt) (Get-Content .\existing.txt) |
    Where {$_.SideIndicator -eq "<="} |
    Select -ExpandProperty InputObject |
    Sort
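Since the result has to be picked up by another process, you could then write it out to a text file, e.g. (the output file name here is just an example):
$FinalGroups | Set-Content .\groups_to_create.txt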