I am looking for a line to add to the script below so that it pulls the same file information but also includes an MD5 hash.
The hash could come from certutil, but we have no means of downloading that module, so I am looking for a way that uses PowerShell without an additional update of PowerShell.
We want to compare two disks for missing files, even when a file might be located in a different place on the other disk.
cls
$filPath="G:/"
Set-Location -path $filPath
Get-ChildItem -Path $filPath -recurse |`
foreach-object{
$Item=$_
$Path =$_.FullName
$ParentS=($_.FullName).split("/")
$Parent=$ParentS[@($ParentS.Length-2)]
$Folder=$_.PSIsContainer
#$Age=$_.CreationTime
#$Age=$_.ModifiedDate
$Modified=$_.LastWriteTime
$Type=$_.Extension
$Path | Select-Object `
@{n="Name";e={$Item}},`
@{n="LastModified";e={$Modified}},`
@{n="Extension";e={$Type}},`
@{n="FolderName";e={if($Parent){$Parent}else{$Parent}}},`
@{n="filePath";e={$Path}}`
} | Export-csv Q:/lpdi/fileDump.csv -NoTypeInformation
Possible answer here: (Thanks Guenther)
@{name="Hash";expression={(Get-FileHash -Algorithm MD5 -Path $Path).hash}}
In this script it meets the file-hash condition and also records the name of the file, which gives a way to find the file in its folder and to know, based on the hash, that it matches another file in another location.
I'm not sure what goes into the file hash itself. If it includes the name of the file, the hash will differ for renamed copies. If it covers only the file's content and the path doesn't matter, it should meet the requirement. I'm also not sure how to include it in the code above.
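On the question of what goes into the hash: an MD5 file hash is computed from the file's bytes only, so the name and folder do not affect it. A minimal Python sketch to illustrate this; the helper name md5_of and the temporary file names are made up for the example and are not part of the original script:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file's content (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 8 KiB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content_different_names():
    """Write identical bytes under two names in two folders and hash both."""
    with tempfile.TemporaryDirectory() as root:
        first = Path(root, "disk_a", "report.txt")
        second = Path(root, "disk_b", "moved", "renamed.txt")
        for target in (first, second):
            target.parent.mkdir(parents=True)
            target.write_bytes(b"same bytes in both files")
        return md5_of(first), md5_of(second)


if __name__ == "__main__":
    hash_a, hash_b = same_content_different_names()
    print(hash_a == hash_b)  # True: name and folder are not part of the hash
```

So two copies of the same file should produce the same hash no matter where they live, which is what the disk comparison needs.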
Your code could be simplified so you don't need all those 'in-between' variables.
Also, the path separator character in Windows is a backslash (\), not a forward slash (/), so this part of your code, $ParentS=($_.FullName).split("/"), does not do what you expect.
Try
$SourcePath = 'G:\'
Get-ChildItem -Path $SourcePath -File -Recurse | ForEach-Object {
# remove the next line if you do not want console output
Write-Host "Processing file '$($_.FullName)'.."
$md5 = ($_ | Get-FileHash -Algorithm MD5).Hash
$_ | Select-Object @{Name = 'Name'; Expression = { $_.Name }},
    @{Name = 'LastModified'; Expression = { $_.LastWriteTime }},
    @{Name = 'Extension'; Expression = { $_.Extension }},
    @{Name = 'FolderName'; Expression = { $_.Directory.Name }},
    @{Name = 'FilePath'; Expression = { $_.FullName }},
    @{Name = 'FileHash'; Expression = { $md5 }}
} | Export-Csv -Path 'Q:/lpdi/fileDump.csv' -NoTypeInformation
Because getting hash values is a time consuming process I've added a Write-Host line, so you know the script did not 'hang'..
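Once both disks have been dumped to CSV this way, the comparison the question asks for comes down to looking up each hash from one dump in the other. A rough Python sketch, assuming both dumps carry the FilePath and FileHash columns produced above; the function names and the sample rows are invented for illustration:

```python
import csv
import io


def missing_by_hash(source_rows, target_rows):
    """Return source rows whose FileHash does not occur anywhere in target_rows."""
    target_hashes = {row["FileHash"] for row in target_rows}
    return [row for row in source_rows if row["FileHash"] not in target_hashes]


def read_dump(text):
    """Parse CSV text (as written by Export-Csv) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Invented sample data standing in for the two exported files.
DISK_A = '''"Name","FilePath","FileHash"
"a.txt","G:\\docs\\a.txt","AAA111"
"b.txt","G:\\docs\\b.txt","BBB222"
'''
DISK_B = '''"Name","FilePath","FileHash"
"a-copy.txt","H:\\elsewhere\\a-copy.txt","AAA111"
'''

if __name__ == "__main__":
    for row in missing_by_hash(read_dump(DISK_A), read_dump(DISK_B)):
        print(row["FilePath"])  # prints G:\docs\b.txt
```

Because the lookup is by hash only, a.txt counts as present on the second disk even though it was renamed and moved there.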
Edit: Okay so, here is my workaround as promised.
Before we start, requirements are:
Have python 3.8 or above installed and registered in windows PATH
edit the ps1 file variables accordingly
edit the python file variables accordingly
bypass powershell script execution policies
There are 4 files in the working directory (different from your target directory):
addMD5.ps1 (static)
addMD5.py (static)
fileDump-original.csv (auto-generated)
fileDump-modified.csv (auto-generated)
Here are the contents of those 4 files:
addMD5.ps1
$targetDir="C:\Users\USERname4\Desktop\myGdrive"
$workingDir="C:\Users\USERname4\Desktop\myWorkingDir"
$pythonName="addMD5.py"
$exportName = "fileDump-original.csv"
Set-Location -path $workingDir
if (Test-Path $exportName)
{
Remove-Item $exportName
}
Get-ChildItem -Path $targetDir -recurse |`
foreach-object{
$Item=$_
$Path =$_.FullName
$ParentS=($_.FullName).split("/")
$Parent=$ParentS[@($ParentS.Length-2)]
$Folder=$_.PSIsContainer
#$Age=$_.CreationTime
#$Age=$_.ModifiedDate
$Modified=$_.LastWriteTime
$Type=$_.Extension
$Path | Select-Object `
@{n="Name";e={$Item}},`
@{n="LastModified";e={$Modified}},`
@{n="Extension";e={$Type}},`
@{n="FolderName";e={if($Parent){$Parent}else{$Parent}}},`
@{n="filePath";e={$Path}}`
} | Export-csv $exportName -NoTypeInformation
python $pythonName
addMD5.py
import csv
import hashlib
import os


def get_md5(fname):
    """Return the MD5 hex digest of a file, read in 8 KiB chunks."""
    file_hash = hashlib.md5()
    with open(fname, "rb") as f:
        chunk = f.read(8192)
        while chunk:
            file_hash.update(chunk)
            chunk = f.read(8192)
    return file_hash.hexdigest()


file1name = "fileDump-original.csv"
file2name = "fileDump-modified.csv"

# newline="" lets the csv module handle line endings itself;
# opening with "w" overwrites any previous output file.
with open(file1name, newline="") as src, open(file2name, "w", newline="") as dst:
    # The csv module copes with commas inside quoted paths,
    # which a plain split(",") would not.
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    # header row: append the new column name
    writer.writerow(next(reader) + ["md5"])
    for row in reader:
        inner_file_name = row[4]  # the "filePath" column
        # folders listed by Get-ChildItem have no content to hash
        md5 = get_md5(inner_file_name) if os.path.isfile(inner_file_name) else ""
        writer.writerow(row + [md5])
fileDump-original.csv
"Name","LastModified","Extension","FolderName","filePath"
"test1.txt","20-Jun-21 12:50:44 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt"
"test2.txt","20-Jun-21 12:50:37 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt"
fileDump-modified.csv
"Name","LastModified","Extension","FolderName","filePath","md5"
"test1.txt","20-Jun-21 12:50:44 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","C:\Users\USERname4\Desktop\myGdrive\test1.txt","d659c1bc0a3010b0bdd45d9a8fee3196"
"test2.txt","20-Jun-21 12:50:37 PM",".txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","C:\Users\USERname4\Desktop\myGdrive\test2.txt","d55749658669d28f8549d94cd01b72ba"
Related
I have a base folder as:
D:\St\Retail\AMS\AMS\FTP-FromClient\AMS
It contains various folders of dates:
2022-04-01
2022-04-02
...
...
2022-02-02
2021-05-05
2019-04-12
Each of these folders contains its own files. I want to retrieve all the file names inside every folder whose name starts with 2022-04. So if the base name is '2022-04', I need to retrieve all the files inside folders like '2022-04-01', '2022-04-02', '2022-04-03'. This is what I tried:
cls
$folerPath = 'D:\St\Retail\AMS\AMS\FTP-FromClient\AMS'
$files = Get-ChildItem $folerPath
[System.Collections.ArrayList]$data = @()
foreach ($f in $files) {
$a = Get-ChildItem $f.FullName
foreach ($inner in $a) {
echo $inner.FullName
$outfile = $inner.FullName -match '*2022-04*'
$datepart = $inner.FullName.split('\')[-1]
if ($outfile) {
$data.add($datepart + '\' + $inner.Name.Trim())
}
}
}
My final $data should contain something like this:
2022-04-01/abc.txt
2022-04-02/cde.pdf
2022-04-03/e.xls
You can do this by first collecting the directories you want to explore and then loop over these to get the files inside.
Using a calculated property you can output in whatever format you like:
$folderPath = 'D:\St\Retail\AMS\AMS\FTP-FromClient\AMS'
$data = Get-ChildItem -Path $folderPath -Filter '2022-04*' -Directory | ForEach-Object {
$dir = $_.Name
(Get-ChildItem -Path $_.FullName -File |
Select-Object @{Name = 'FolderFile'; Expression = {'{0}\{1}' -f $dir, $_.Name}}).FolderFile
}
After this, $data would be a string array with this content:
2022-04-01\abc.txt
2022-04-02\cde.pdf
2022-04-03\e.xls
By using wildcards for both directory and file name, you only need a single Get-ChildItem call:
$folderPath = 'D:\St\Retail\AMS\AMS\FTP-FromClient\AMS'
$folderDate = '2022-04'
[array] $data = Get-ChildItem "$folderPath/$folderDate*/*" -File | ForEach-Object{
# Join-Path's implicit output will be captured as an array in $data.
Join-Path $_.Directory.Name $_.Name
}
$data will be an array of file paths like this:
2022-04-01\abc.txt
2022-04-02\cde.pdf
2022-04-03\e.xls
Notes:
[array] $data makes sure that the variable always contains an array. Otherwise PowerShell would output a single string value when only a single file is found. This could cause problems, e. g. when you want to iterate over $data by index, you would iterate over the characters of the single string instead.
To make this answer platform-independent I'm using forward slashes in the Get-ChildItem call which work as path separators under both Windows and *nix platforms.
Join-Path is used to make sure the output paths use the expected default path separator (either / or \) of the platform.
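For comparison, the same single-wildcard idea can be sketched in Python with pathlib. This is only an illustration under assumptions: the function name and the temporary folder layout are invented, and the real base folder would be passed in instead.

```python
import tempfile
from pathlib import Path


def files_in_matching_folders(base, folder_prefix):
    """Return 'folder/file' strings for files in folders starting with folder_prefix."""
    found = []
    # One wildcard for the folder level, one for the file level.
    for path in sorted(Path(base).glob(folder_prefix + "*/*")):
        if path.is_file():
            # Path joins with the platform's default separator.
            found.append(str(Path(path.parent.name, path.name)))
    return found  # always a list, even for zero or one match


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for rel in ("2022-04-01/abc.txt", "2022-04-02/cde.pdf", "2021-05-05/old.txt"):
            target = Path(base, rel)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        print(files_in_matching_folders(base, "2022-04"))
```

Returning a list unconditionally plays the same role as the [array] cast: callers can index or iterate the result without special-casing a single match.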
I am trying to create directories and subdirectories based on the names of existing files. After that I want to move those files into the corresponding directories. I have already come pretty far, also with the help of here and here, but I am failing at some point.
Existing test files                Folder structure
(actually about 5000 files)        (how it should look afterwards)
MM0245AK625_G03_701.txt            MM\MM0245\625\G03\MM0245AK625_G03_701.txt
MM0245AK830_G04_701.txt            MM\MM0245\830\G04\MM0245AK830_G04_701.txt
VY0245AK_G03.txt                   VY\VY0245\VY0245AK_G03.txt
VY0245AK_G03_701.txt               VY\VY0245\G03\VY0245AK_G03_701.txt
VY0245AK625_G03.txt                VY\VY0245\625\VY0245AK625_G03.txt
VY0245AK625_G03_701.txt            VY\VY0245\625\G03\VY0245AK625_G03_701.txt
VY0345AK625_G03_701.txt            VY\VY0345\625\G03\VY0345AK625_G03_701.txt
Code for creating those files is at the end of this post.
As you can see, the files do match some kind of pattern, but not consistently. I use multiple copies of my code with different 'parameters' to sort each type of file pattern, but there has to be a more streamlined way.
Existing code
$dataPath = "$PSScriptRoot\Test"
#$newDataPath = "$PSScriptRoot\"
Get-ChildItem $dataPath -Filter *.txt | % {
$g1 = $_.BaseName.Substring(0, 2)
$g2 = $_.BaseName.Substring(0, 6)
$g3 = $_.BaseName.Substring(8, 3)
$g4 = $_.BaseName.Substring(12, 3)
$path = "$DataPath\$g1\$g2\$g3\$g4"
if (-not (Test-Path $path)) {
New-Item -ItemType Directory -Path $path
}
Move-Item -Path $_.FullName -Destination $path
}
This code also creates directories in the 3rd $g3 layer for files in "the shorter format", e.g. XX0000AK_G00.txt. This file should however not be moved further than layer $g2. Of course the code above is not capable of doing this, so I tried it with regex below.
This is an alternative idea (not worked out further than creating directories), but I failed to continue after Select-Object -Unique. I am failing to use $Matches[1] in New-Item, because I can only Select-Object -Unique the variable $_, not $Matches[1] or even the subdirectory "$($Matches[1])$($Matches[2])". The following code is my attempt.
cd $PSScriptRoot\Test
# Create Folder Layer 1
Get-ChildItem |
% {
$_.BaseName -match "^(\w{2})(\d{4})AN(\d{3})?_(G\d{2})(_\d{3})?$" | Out-Null
$Matches[1]
"$($Matches[1])$($Matches[2])"
} |
Select-Object -Unique |
% {
New-Item -ItemType directory $_
} | Out-Null
I am fairly new to powershell, please don't be too harsh :) I also don't have a programming background, so please excuse the use of incorrect wording.
new-item $dataPath\MM0245AK830_G04_701.txt -ItemType File
new-item $dataPath\VY0245AK_G03.txt -ItemType File
new-item $dataPath\VY0245AK_G03_701.txt -ItemType File
new-item $dataPath\VY0245AK625_G03.txt -ItemType File
new-item $dataPath\VY0245AK625_G03_701.txt -ItemType File
new-item $dataPath\VY0345AK625_G03_701.txt -ItemType File
i am truly bad at complex regex patterns [blush], so this is done with simple string ops, mostly.
what the code does ...
fakes reading in some files
when you have tested this and it works as needed on all your test files, replace the entire #region/#endregion block with a Get-ChildItem call.
iterates thru the collection
splits the BaseName on ak & saves it for later use
checks for the two short file layouts
checks for 1 _ versus 2
builds the $Dir string for each of those 2 filename layouts
builds the long file name $Dir
uses the previous $Dir stuff to build the $FullDest for each file
shows the various stages for each file
that last section would be replaced with your mkdir & Move-Item commands.
the code ...
#region >>> fake reading in files
# when ready to use the real things, use Get-ChildItem
$InStuff = @'
MM0245AK625_G03_701.txt
MM0245AK830_G04_701.txt
VY0245AK_G03.txt
VY0245AK_G03_701.txt
VY0245AK625_G03.txt
VY0245AK625_G03_701.txt
VY0345AK625_G03_701.txt
'@ -split [System.Environment]::NewLine |
ForEach-Object {
[System.IO.FileInfo]$_
}
#endregion >>> fake reading in files
foreach ($IS_Item in $InStuff)
{
$BNSplit_1 = $IS_Item.BaseName -split 'ak'
if ($BNSplit_1[-1].StartsWith('_'))
{
if (($BNSplit_1[-1] -replace '[^_]').Length -eq 1)
{
$Dir = '{0}\{1}' -f $IS_Item.BaseName.Substring(0, 2),
$IS_Item.BaseName.Substring(0, 6)
}
else
{
$Dir = '{0}\{1}\{2}' -f $IS_Item.BaseName.Substring(0, 2),
$IS_Item.BaseName.Substring(0, 6),
$IS_Item.BaseName.Split('_')[1]
}
}
else
{
$Dir = '{0}\{1}\{2}\{3}' -f $IS_Item.BaseName.Substring(0, 2),
$IS_Item.BaseName.Substring(0, 6),
$BNSplit_1[-1].Split('_')[0],
$BNSplit_1[-1].Split('_')[1]
}
$FullDest = Join-Path -Path $Dir -ChildPath $IS_Item
#region >>> show what was done with each file
# replace this block with your MkDir & Move-Item commands
$IS_Item.Name
$Dir
$FullDest
'depth = {0}' -f ($FullDest.Split('\').Count - 1)
'=' * 20
#endregion >>> show what was done with each file
}
the output ...
MM0245AK625_G03_701.txt
MM\MM0245\625\G03
MM\MM0245\625\G03\MM0245AK625_G03_701.txt
depth = 4
====================
MM0245AK830_G04_701.txt
MM\MM0245\830\G04
MM\MM0245\830\G04\MM0245AK830_G04_701.txt
depth = 4
====================
VY0245AK_G03.txt
VY\VY0245
VY\VY0245\VY0245AK_G03.txt
depth = 2
====================
VY0245AK_G03_701.txt
VY\VY0245\G03
VY\VY0245\G03\VY0245AK_G03_701.txt
depth = 3
====================
VY0245AK625_G03.txt
VY\VY0245\625\G03
VY\VY0245\625\G03\VY0245AK625_G03.txt
depth = 4
====================
VY0245AK625_G03_701.txt
VY\VY0245\625\G03
VY\VY0245\625\G03\VY0245AK625_G03_701.txt
depth = 4
====================
VY0345AK625_G03_701.txt
VY\VY0345\625\G03
VY\VY0345\625\G03\VY0345AK625_G03_701.txt
depth = 4
====================
I would first split each file BaseName on the underscore. Then use a regex to split the first part into several array elements, combine that with a possible second part of the split in order to create the destination folder path for the files.
$DataPath = "$PSScriptRoot\Test"
$files = Get-ChildItem -Path $DataPath -Filter '*_*.txt' -File
foreach ($file in $files) {
$parts = $file.BaseName -split '_'
# regex your way to split the first part into path elements (remove empty items)
$folders = [regex]::Match($parts[0], '(?i)^(.{2})(\d{4})[A-Z]{2}(\d{3})?').Groups[1..3].Value | Where-Object { $_ -match '\S'}
# the second part is a merge with the first part
$folders[1] = $folders[0] + $folders[1]
# if there was a third part after the split on the underscore, add $parts[1] (i.e. 'Gxx') to the folders array
if ($parts.Count -gt 2) { $folders += $parts[1] }
# join the array elements with a backslash (i.e. [System.IO.Path]::DirectorySeparatorChar)
# and join all that to the $DataPath to create the full destination for the file
$target = Join-Path -Path $DataPath -ChildPath ($folders -join '\')
# create the folder if that does not yet exist
$null = New-Item -Path $target -ItemType Directory -Force
# move the file to that (new) directory
$file | Move-Item -Destination $target -WhatIf
}
The -WhatIf switch makes the code not move anything to the new destination; it only displays where each file would go. Once you are happy with that information, remove -WhatIf and run the code again.
After moving, your file structure will look like this:
D:\TEST
+---MM
|   \---MM0245
|       +---625
|       |   \---G03
|       |           MM0245AK625_G03_701.txt
|       |
|       \---830
|           \---G04
|                   MM0245AK830_G04_701.txt
|
\---VY
    +---VY0245
    |   |   VY0245AK_G03.txt
    |   |
    |   +---625
    |   |   |   VY0245AK625_G03.txt
    |   |   |
    |   |   \---G03
    |   |           VY0245AK625_G03_701.txt
    |   |
    |   \---G03
    |           VY0245AK_G03_701.txt
    |
    \---VY0345
        \---625
            \---G03
                    VY0345AK625_G03_701.txt
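The name-to-folder rule can also be captured in a single regular expression with optional groups. The following Python sketch is only an illustration: the pattern is inferred from the seven sample names in the question, and the names NAME_PATTERN and target_folder are invented here.

```python
import re
from pathlib import PureWindowsPath

# Two letters, four digits, two letters, optional three digits, then "_Gxx"
# and an optional "_nnn" suffix. Pattern inferred from the sample names only.
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]{2})(?P<number>\d{4})[A-Z]{2}(?P<size>\d{3})?"
    r"_(?P<group>G\d{2})(?P<suffix>_\d{3})?$",
    re.IGNORECASE,
)


def target_folder(base_name):
    """Return the relative destination folder for a file base name, or None."""
    match = NAME_PATTERN.match(base_name)
    if match is None:
        return None
    parts = [match["prefix"], match["prefix"] + match["number"]]
    if match["size"]:
        parts.append(match["size"])
    # The Gxx level is only used when a third "_nnn" part is present.
    if match["suffix"]:
        parts.append(match["group"])
    return PureWindowsPath(*parts)


if __name__ == "__main__":
    for name in ("MM0245AK625_G03_701", "VY0245AK_G03", "VY0245AK625_G03"):
        print(name, "->", target_folder(name))
```

Names that do not fit the pattern return None, so they can be reported or skipped rather than moved to a wrong place.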
I have a little over 12000 files that I need to sort through.
18-100-00000-LOD-H.pdf
18-100-00000-LOD-H-1C.pdf
21-200-21197-LOD-H.pdf
21-200-21197-LOD-H-1C.pdf
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
I need a way to go through all the files and delete the LOD-H version of the files.
EX:
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
The partial match is the 5-digit code, and I need a script that deletes the LOD-H file of each matching pair.
This is what I have so far, but it won't work: Select-String needs a pattern, and since there isn't one fixed pattern but many different ones, I don't know what to supply.
$source = "\\Summerhall\GLUONPREP\Market Centers\~Pen Project\Logos\ALL Office Logos"
$destination = "C:\Users\joshh\Documents\EmptySpace"
$toDelete = "C:\Users\joshh\Documents\toDelete"
$allFiles = @(Get-ChildItem $source -File | Select-Object -ExpandProperty FullName)
foreach($file in $allFiles) {
$content = Get-Content -Path $file
if($content | Select-String -SimpleMatch -Quiet){
$dest = $destination
}
else{
$dest = $toDelete
}
}
Any help would be super appreciated, even links to something similar or even links to documentation so I can start piecing a script of my own would be super helpful.
Thank you!
This should work for what you need:
# Get a list of the files with -1C preceding the extension
$1cFiles = @( ( Get-ChildItem -File "${source}/*-LOD-H-1C.pdf" ).Name )
# Retrieve files that match the same pattern without 1C, and iterate over them
Get-ChildItem -File "${source}/*-LOD-H.pdf" | ForEach-Object {
    # Get the name of the file if it had the -1C suffix preceding the .ext
    $useName = $_.Name.Insert($_.Name.LastIndexOf('.pdf'), '-1C')
    # If the -1C version of the file exists, remove the current (non-1C) file
    if( $1cFiles -contains $useName ) {
        # Pipe the item so Remove-Item receives its full path
        $_ | Remove-Item -Force
    }
}
Basically, look for the 1C files in $source, then iterate over the non-1C files in $source, removing a non-1C file if adding -1C before its file extension gives the name of an existing 1C file.
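The pairing logic can be tried out on plain file names before anything is deleted. A small Python sketch under that assumption; files_to_delete is a made-up helper that only returns names and never touches the disk, and the last sample name is invented to show an unpaired file:

```python
def files_to_delete(names):
    """Return the '-LOD-H.pdf' names that also have a '-LOD-H-1C.pdf' sibling."""
    present = {name.lower() for name in names}  # Windows file names ignore case
    doomed = []
    for name in names:
        if not name.lower().endswith("-lod-h.pdf"):
            continue
        # Insert "-1C" in front of the extension to get the sibling's name.
        sibling = name[: -len(".pdf")] + "-1C.pdf"
        if sibling.lower() in present:
            doomed.append(name)
    return doomed


if __name__ == "__main__":
    sample = [
        "18-100-00000-LOD-H.pdf",
        "18-100-00000-LOD-H-1C.pdf",
        "21-200-21197-LOD-H.pdf",
        "21-200-21197-LOD-H-1C.pdf",
        "99-999-99999-LOD-H.pdf",  # invented: has no -1C sibling, so it is kept
    ]
    for name in files_to_delete(sample):
        print(name)
```

Reviewing the returned list first is a cheap safeguard when 12000 files are involved.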
I'm trying to copy logs in numerical order, and I want my Output.txt to record the last file copied. However, I'm running into a problem: when my script goes from log_9.txt to log_10.txt, the value written to my text file stays at log_9.txt even though all the files are copied.
dir c:\PS1 *.bat | ForEach {
$variable = "$($_.Name) 'n$(Get-content $_.FullName)"
Set-Content -Value $variable -Path c:\PS1\Output.txt
$pull = Get-Content C:\PS1\Output.txt
copy-item $source\$pull -Destination $dest -Verbose
}
}
The following command shows you how you sort the base name (file name without extension) of your input files first lexically, by the text before the _, and then numerically, by the number following the _:
# The input simulates dir (Get-ChildItem) output.
@{ BaseName = 'log_10' }, @{ BaseName ='log_9' }, @{ BaseName = 'log_2' } |
Sort-Object { ($_.BaseName -split '_')[0] }, { [int] ($_.BaseName -split '_')[-1] }
The above yields the following - note the correct numerical sorting:
Name                           Value
----                           -----
BaseName                       log_2
BaseName                       log_9
BaseName                       log_10
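The reason log_9 "wins" over log_10 in a plain sort is that text is compared character by character, and '9' sorts after '1'. A short Python sketch of the same two-part sort key; the function name log_sort_key is invented for this example:

```python
def log_sort_key(base_name):
    """Split 'log_10' into ('log', 10) so the number part sorts numerically."""
    text, _, number = base_name.rpartition("_")
    # Lower-case the text part so 'Log_10' and 'log_9' group together.
    return (text.lower(), int(number))


if __name__ == "__main__":
    names = ["log_10", "log_9", "log_2"]
    print(sorted(names))                    # text order: log_10, log_2, log_9
    print(sorted(names, key=log_sort_key))  # numeric order: log_2, log_9, log_10
```

With such a key, the last element of the sorted list is the most recent log, which is the value the question wants written to the output file.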
I would really appreciate your help with this
I should first mention that I have been unable to find any specific solutions and I am very new to programming with powershell, hence my request
I wish to write (and later schedule) a PowerShell script that looks for a file with a specific name, RFUNNEL, and renames it to R0000001. There will only be one such RFUNNEL file in the folder at any time. However, the next time the script runs and finds a new RFUNNEL file, I want it to be renamed to R0000002, and so on.
I have struggled with this for some weeks now and the seemingly similar solutions that I have come across have not been of much help - perhaps because of my admittedly limited experience with powershell.
Others might be able to do this with less syntax, but try this:
$rootpath = "C:\derp"
if (Test-Path "$rootpath\RFUNNEL.txt")
{ $maxfile = Get-ChildItem $rootpath | ?{$_.BaseName -like "R[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"} | Sort BaseName -Descending | Select -First 1 -Expand BaseName;
if (!$maxfile) { $maxfile = "R0000000" }
[int32]$filenumberint = $maxfile.substring(1); $filenumberint++
[string]$filenumberstring = ($filenumberint).ToString("0000000");
[string]$newName = ("R" + $filenumberstring + ".txt");
Rename-Item "$rootpath\RFUNNEL.txt" $newName;
}
Here's an alternative using regex:
[cmdletbinding()]
param()
$triggerFile = "RFUNNEL.txt"
$searchPattern = "R*.txt"
$nextAvailable = 0
# If the trigger file exists
if (Test-Path -Path $triggerFile)
{
# Get a list of files matching search pattern
$files = Get-ChildItem "$searchPattern" -exclude "$triggerFile"
if ($files)
{
# store the filenames in a simple array
$files = $files | select -expandProperty Name
$files | Write-Verbose
# Get next available file by carrying out a
# regex replace to extract the numeric part of the file and get the maximum number
$nextAvailable = ($files -replace '([a-z])(.*).txt', '$2' | measure-object -max).Maximum
}
# Add one to either the max or zero
$nextAvailable++
# Format the resulting string with leading zeros
$nextAvailableFileName = 'R{0:000000#}.txt' -f $nextAvailable
Write-Verbose "Next Available File: $nextAvailableFileName"
# rename the file
Rename-Item -Path $triggerFile -NewName $nextAvailableFileName
}
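Both answers boil down to the same step: find the highest number already used, add one, and pad with zeros. A compact Python sketch of just that step, assuming a .txt extension as in the answers above; NUMBERED and next_name are names invented for the example:

```python
import re

# "R" followed by exactly seven digits, e.g. R0000012.txt
NUMBERED = re.compile(r"^R(\d{7})\.txt$", re.IGNORECASE)


def next_name(existing_names):
    """Return the next free 'Rnnnnnnn.txt' name given the names already present."""
    numbers = [int(m.group(1)) for m in map(NUMBERED.match, existing_names) if m]
    # default=0 covers the first run, when no numbered file exists yet.
    return "R{:07d}.txt".format(max(numbers, default=0) + 1)


if __name__ == "__main__":
    print(next_name(["RFUNNEL.txt"]))                  # R0000001.txt
    print(next_name(["R0000001.txt", "R0000009.txt"]))  # R0000010.txt
```

Converting to an integer before taking the maximum avoids relying on text order, and RFUNNEL.txt itself is ignored because it contains no digits.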