Compress File per file, same name - powershell

I hope you are all safe in this time of COVID-19.
I'm trying to generate a script that goes to the directory and compresses each file to .zip with the same name as the file, for example:
sample.txt -> sample.zip
sample2.txt -> sample2.zip
but I'm having difficulties, I'm not that used to powershell, I'm learning and improving this script. In the end it will be a script that deletes files older than X days, compresses files and makes them upload in ftp .. the part of excluding with more than X I've already managed it for days, now I grabbed a little bit on this one.
Last try at moment.
param
(
#Future accept input
[string] $InputFolder,
[string] $OutputFolder
)
#test folder
$InputFolder= "C:\Temp\teste"
$OutputFolder="C:\Temp\teste"
$Name2 = Get-ChildItem $InputFolder -Filter '*.csv'| select Name
Set-Variable SET_SIZE -option Constant -value 1
$i = 0
$zipSet = 0
Get-ChildItem $InputFolder | ForEach-Object {
$zipSetName = ($Name2[1]) + ".zip "
Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
$i++;
$Name2++
if ($i -eq $SET_SIZE) {
$i = 0;
$zipSet++;
}
}

You can simplify things a bit, and it looks like most of the issues are because in your script example $Name2 will contain a different set of items than the Get-ChildItem $InputFolder will return in the loop (i.e. may have other objects other than .csv files).
The best way to deal with things is to use variables with the full file object (i.e. you don't need to use |select name). So I get all the CSV file objects right away and store in the variable $CsvFiles.
We can additionally use the special variable $_ inside the ForEach-Object which represents the current object. We also can use $_.BaseName to give us the name without the extension (assuming that's what you want, otherwise use $_Name to get a zip with the name like xyz.csv).
So a simplified version of the code can be:
$InputFolder= "C:\Temp\teste"
$OutputFolder="C:\Temp\teste"
#Get files to process
$CsvFiles = Get-ChildItem $InputFolder -Filter '*.csv'
#loop through all files to zip
$CsvFiles | ForEach-Object {
$zipSetName = $_.BaseName + ".zip"
Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
}

Related

Find files with partial name match and remove desired file

I have a little over 12000 files that I need to sort through.
18-100-00000-LOD-H.pdf
18-100-00000-LOD-H-1C.pdf
21-200-21197-LOD-H.pdf
21-200-21197-LOD-H-1C.pdf
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
I need a way to go through all the files and delete the LOD-H version of the files.
EX:
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
With the partial match being the 5 digit code I need a script that would delete the LOD-H case of the partial match.
So far this is what I have but it won't work because I need to supply values for the pattern but since there isn't one set pattern and more like multiple patterns I don't know what to supply it with
$source = "\\Summerhall\GLUONPREP\Market Centers\~Pen Project\Logos\ALL Office Logos"
$destination = "C:\Users\joshh\Documents\EmptySpace"
$toDelete = "C:\Users\joshh\Documents\toDelete"
$allFiles = #(Get-ChildItem $source -File | Select-Object -ExpandProperty FullName)
foreach($file in $allFiles) {
$content = Get-Content -Path $file
if($content | Select-String -SimpleMatch -Quiet){
$dest = $destination
}
else{
$dest = $toDelete
}
}
Any help would be super appreciated, even links to something similar or even links to documentation so I can start piecing a script of my own would be super helpful.
Thank you!
This should work for what you need:
# Get a list of the files with -1C preceeding the extension
$1cFiles = #( ( Get-ChildItem -File "${source}/*-LOD-H-1C.pdf" ).Name )
# Retreive files that match the same pattern without 1C, and iterate over them
Get-ChildItem -File "${source}/*-LOD-H.pdf" | ForEach-Object {
# Get the name of the file if it had the -1C suffix preceeding the .ext
$useName = $_.Name.Insert($_.Name.LastIndexOf('.pdf'), '-1C')
# If the -1C version of the file exists, remove the current (non-1C) file
if( $1cFiles -contains $useName ) {
Remove-Item -Force $_
}
}
Basically, look for the 1C files in $toDelete, then iterate over the non-1C files in $toDelete, removing the non-1C file if adding -1C before the file extension matches an existing file with 1C in the name.

How to select [n] Items from a CSV list to assign them to a variable and afterwards remove those items and save the file using PowerShell

I'm parsing a CSV file to get the names of folders which I need to copy to another location. Because there are hundreds of them, I need to select the first 10 or so and run the copy routine but to avoid copying them again I'm removing them from the list and saving the file.
I'll run this on a daily scheduled task to avoid having to wait for the folders to finish copying. I'm having a problem using the 'Select' and 'Skip' options in the code (see below), if I remove those lines the folders are copied (I'm using empty folders to test) but if I have them in, then nothing happens when I run this in PowerShell.
I looked around in other questions about similar issues but did not find anything that answers this particular issue selecting and skipping rows in the CSV.
$source_location = 'C:\Folders to Copy'
$folders_Needed = gci $source_location
Set-Location -Path $source_location
$Dest = 'C:\Transferred Folders'
$csv_name = 'C:\List of Folders.csv'
$csv_Import = Get-Content $csv_name
foreach($csv_n in $csv_Import | Select-Object -First 3){
foreach ($folder_Tocopy in $folders_Needed){
if("$folder_Tocopy" -contains "$csv_n"){
Copy-Item -Path $folder_Tocopy -Destination $Dest -Recurse -Verbose
}
}
$csv_Import | Select-Object -Skip 3 | Out-File -FilePath $csv_name
}
It should work with skip/first as in your example, but I cannot really test it without your sample data. Also, it seems wrong that you write the same output to the csv file at every iteration of the loop. And I assume it's not a csv file but actually just a plain text file, a list of folders? Just folder names or full paths? (I assume the first.)
Anyways, here is my suggested update to the script (see comments):
$source_location = 'C:\Folders to Copy'
$folders_Needed = Get-ChildItem $source_location
$Dest = 'C:\Transferred Folders'
$csv_name = 'C:\List of Folders.csv'
$csv_Import = #(Get-Content $csv_name)
# optional limit
# set this to $csv_Import.Count if you want to copy all folders
$limit = 10
# loop over the csv entries
for ($i = 0; $i -lt $csv_Import.Count -and $i -lt $limit; $i++) {
# current line in the csv file
$csv_n = $csv_Import[$i]
# copy the folder(s) which name matches the csv entry
$folders_Needed | where {$_.Name -eq $csv_n} | Copy-Item -Destination $Dest -Recurse -Verbose
# update the csv file (skip all processed entries)
$csv_Import | Select-Object -Skip ($i + 1) | Out-File -FilePath $csv_name
}

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = #{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select #{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=#{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=#($_.FullName)};$h.Values|?{$_.Count-gt1-and$_-like'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
'fileA.cs': #('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': #('c:\cs\somewhere\file2.cs'),
'file3.cs': #('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses #() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string and the += next time would do string concatenation and that's no use to me. This forces the first file in the hashtable to start an array by forcing every file to be a one item array. The least-code way to get this result is with +=#(..) but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=#($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=#()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using h.Values isn't going over every file for a second time, it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the not-duplicates, but the -and operation short circuits - when the Count -gt 1 fails, the so the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=#{};$s=#{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=#()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_]-like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = #{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object $_.Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files with the extension filter $extension. Group the files based on their names. Then of those groups find every group where there are more than one of that particular file and one of the group members is at least in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what its worth this is what I used for testing to create a bunch of files. (I know the folder 9 is empty)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include files into a string array. Then i use that string array as a include param on the get-childitem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\s\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory

Powershell - Assigning unique file names to duplicated files using list inside a .csv or .txt

I have limited experience with Powershell doing very basic tasks by itself (such as simple renaming or moving files), but I've never created one that has the need to actually extract information from inside a file and apply that data directly to a file name.
I'd like to create a script that can reference a simple .csv or text file containing a list of unique identifiers and have it assign those to a batch of duplicated files (they all have the same contents) that share a slightly different name in the form of a 3-digit number appended as the prefix of a generic name.
For example, let's say my list of files are something like this:
001_test.txt
002_test.txt
003_test.txt
004_test.txt
005_test.txt
etc.
Then my .csv contains an alphabetical list of what I would like those to become:
Alpha.txt
Beta.txt
Charlie.txt
Delta.txt
Echo.txt
etc.
I tried looking at similar examples, but I'm failing miserably trying to tailor them to get it to do the above.
EDIT: I didn't save what I already modified, but here is the baseline script I was messing with:
$file_server = Read-Host "Enter the file server IP address"
$rootFolder = 'C:\TEMP\GPO\source\5'
Get-ChildItem -LiteralPath $rootFolder -Directory |
Where-Object { $_.Name -as [System.Guid] } |
ForEach-Object {
$directory = $_.FullName
(Get-Content "$directory\gpreport.xml") |
ForEach-Object { $_ -replace "99.999.999.999", $file_server } |
Set-Content "$directory\gpreport.xml"
# ... etc
}
I think this is to replace a string inside a file though. I need to replace the file name itself using a list from another file (that is not getting renamed), while not changing the contents of the files that are being renamed.
So you want to rename similar files with those listed in a text file. Ok, here's what you are going to need for my solution (alias listed in parenthesis): Get-Content (GC), Get-ChildItem (GCI), Where (?), Rename-Item, ForEach (%)
$NewNames = GC c:\temp\Namelist.txt #Path, including file name, to list of new names
$Name = "dog.txt" #File name without the 001_ prefix
$Path = "C:\Temp" #Path to search
$i=0
GCI $path | ?{$_.Name -match "\d{3}_$Name"}|%{Rename-Item $_.FullName $NewNames[$i];$i++}
Tested as working. That gets your list of new names and saves it as an array. Then it defines your file name, path, and sets $i to 0 as a counter. Then for each file that matches your pattern it renames it based off of item number $i in the array of new names, and then increments $i up one number and moves to the next file.
I haven't tested this, but it should be pretty close. It assumes you have a CSV with a column named FileNames and that you have at least as many names in that list as there are on disk.
$newNames = Import-Csv newfilenames.csv | Select -ExpandProperty FileNames
$existingFiles = Get-ChildItem c:\someplace
for ($i = 0; $i -lt $existingFiles.count; $i++)
{
Rename-Item -Path $existingFiles[$i].FullName -NewName $newNames[$i]
}
Basically, you create two arrays and using a basic for loop steping through the list of files on disk and pull the name from the corresponding index in the newNames array.
Does your CSV file map the identifiers to the file names?
Identifier,NewName
001,Alpha
002,Beta
If so, you'll need to look up the identifier before renaming the file:
# Define the naming convention
$Suffix = '_test'
$Extension = 'txt'
# Get the files and what to rename them to
$Files = Get-ChildItem "*$Suffix.$Extension"
$Csv = Import-Csv 'Names.csv'
# Rename the files
foreach ($File in $Files) {
$NewName = ($Csv | Where-Object { $File.Name -match '^' + $_.Identifier } | Select-Object -ExpandProperty NewName)
Rename-Item $File "$NewName.$Extension"
}
If your CSV file is just a sequential list of filenames, logicaldiagram's answer is probably more along the lines of what you're looking for.

Renaming a new folder file to the next incremental number with powershell script

I would really appreciate your help with this
I should first mention that I have been unable to find any specific solutions and I am very new to programming with powershell, hence my request
I wish to write (and later schedule) a script in powershell that looks for a file with a specific name - RFUNNEL and then renames this to R0000001. There will only be one of such 'RFUNELL' files in the folder at any time. However when next the script is run and finds a new RFUNNEL file I will this to be renamed to R0000002 and so on and so forth
I have struggled with this for some weeks now and the seemingly similar solutions that I have come across have not been of much help - perhaps because of my admittedly limited experience with powershell.
Others might be able to do this with less syntax, but try this:
$rootpath = "C:\derp"
if (Test-Path "$rootpath\RFUNNEL.txt")
{ $maxfile = Get-ChildItem $rootpath | ?{$_.BaseName -like "R[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"} | Sort BaseName -Descending | Select -First 1 -Expand BaseName;
if (!$maxfile) { $maxfile = "R0000000" }
[int32]$filenumberint = $maxfile.substring(1); $filenumberint++
[string]$filenumberstring = ($filenumberint).ToString("0000000");
[string]$newName = ("R" + $filenumberstring + ".txt");
Rename-Item "$rootpath\RFUNNEL.txt" $newName;
}
Here's an alternative using regex:
[cmdletbinding()]
param()
$triggerFile = "RFUNNEL.txt"
$searchPattern = "R*.txt"
$nextAvailable = 0
# If the trigger file exists
if (Test-Path -Path $triggerFile)
{
# Get a list of files matching search pattern
$files = Get-ChildItem "$searchPattern" -exclude "$triggerFile"
if ($files)
{
# store the filenames in a simple array
$files = $files | select -expandProperty Name
$files | Write-Verbose
# Get next available file by carrying out a
# regex replace to extract the numeric part of the file and get the maximum number
$nextAvailable = ($files -replace '([a-z])(.*).txt', '$2' | measure-object -max).Maximum
}
# Add one to either the max or zero
$nextAvailable++
# Format the resulting string with leading zeros
$nextAvailableFileName = 'R{0:000000#}.txt' -f $nextAvailable
Write-Verbose "Next Available File: $nextAvailableFileName"
# rename the file
Rename-Item -Path $triggerFile -NewName $nextAvailableFileName
}