How do I remove a "base" portion of a path? - powershell

I have a series of folders in a source directory that I need to recreate in another environment. The source folders look like this:
C:\temp\stuff\folder01\
C:\temp\stuff\folder02\
C:\temp\stuff\folder03\
C:\temp\stuff\folder02\dog
C:\temp\stuff\folder02\cat
I need to remove the base portion of the path - C:\temp\stuff - and grab the rest of the path to concatenate with the destination base folder in order to create the folder structure somewhere else.
I have the following script. The problem seems to be with the variable $DIR. $DIR never gets assigned the expected "current path minus the base path".
The variable $tmp is assigned something close to the expected folder name. It contains the folder name minus vowels, is split across multiple lines, and includes a bunch of leading whitespace. For example, a folder named folder01 would look like the following:
f
ld
r01
Here's the PowerShell script:
$SOURCES_PATH = "C:\temp\stuff"
$DESTINATION_PATH = "/output001"
Get-ChildItem "$SOURCES_PATH" -Recurse -Directory |
Foreach-Object {
$tmp = $_.FullName.split("$SOURCES_PATH/")
$DIR = $_.FullName.split("$SOURCES_PATH/")[1]
$destination = "$DESTINATION_PATH/$DIR"
*** code omitted ***
}
I suspect that $DIR appears to be unassigned because the [1] element is whitespace, but I don't know why there is whitespace or what's happening to the folder name.
What's going on and how do I fix this?

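String.Split("somestring") will split on every occurrence of any of the characters in "somestring" (Windows PowerShell converts the string argument to a character array), which is why you're seeing the paths being split into many more parts than you're expecting. Newer PowerShell versions built on .NET Core may bind to a string-separator overload instead, so the result can differ there, but the separator still would not match because it ends in a forward slash while the paths use backslashes.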
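To see the character-by-character splitting in isolation, here is a small sketch that passes the separators as an explicit character array, which is effectively what the string argument turns into in the case described above. The sample path comes from the question; the exact fragments you get depend on which characters occur in your own base path (here "f" is itself a separator because it appears in "stuff").

```powershell
$fullName = 'C:\temp\stuff\folder01'

# Every character of "C:\temp\stuff/" acts as its own separator.
$separators = [char[]]'C:\temp\stuff/'
$pieces = $fullName.Split($separators)

# Most pieces are empty strings; only runs of non-separator characters survive.
$nonEmpty = $pieces | Where-Object { $_ -ne '' }
$nonEmpty -join ', '    # old, r01
```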
I'd suggest using -replace '^base' to remove the leading part of the path:
$SOURCES_PATH = "C:\temp\stuff"
$DESTINATION_PATH = "/output001"
Get-ChildItem "$SOURCES_PATH" -Recurse -Directory |Foreach-Object {
# This turns "C:\temp\stuff\folder01" into "\folder01"
$relativePath = $_.FullName -replace "^$([regex]::Escape($SOURCES_PATH))"
# This turns "\folder01" into "/folder01"
$relativeWithForwardSlash = $relativePath.Replace('\','/')
# Concatenate destination root and relative path
$rootedByDestination = "${DESTINATION_PATH}${relativeWithForwardSlash}"
# Create/Copy $rootedByDestination here
}
-replace is a regular expression operator, which is why I run [regex]::Escape() against the input path: it escapes the backslashes so that they are matched literally :)
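As a quick check of the escaping step without touching the file system, the following sketch applies the same -replace to one of the sample paths from the question. The variable names ($sourceRoot, $fullName, $pattern, $relative) are made up for this illustration.

```powershell
# Sample values taken from the question; nothing here reads the disk.
$sourceRoot = 'C:\temp\stuff'
$fullName   = 'C:\temp\stuff\folder02\dog'

# [regex]::Escape turns 'C:\temp\stuff' into 'C:\\temp\\stuff', so -replace
# treats the backslashes as literal characters rather than regex escapes.
$pattern = '^' + [regex]::Escape($sourceRoot)

# Strip the root, then drop the leading separator that is left behind.
$relative = ($fullName -replace $pattern, '').TrimStart('\')

$relative                     # folder02\dog
$relative.Replace('\', '/')   # folder02/dog
```

Anchoring the pattern with ^ means only a leading occurrence of the root is removed, even if the same text happens to appear again deeper in the path.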

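Consider replacing the source path with an empty string. Then you can either concatenate what's left onto the destination path, or use [System.IO.Path]::Combine to take care of the concatenation and any separator drama. Trim the leading backslash from the remainder first, because Path.Combine returns its second argument unchanged when that argument is rooted.
Source:
Get-ChildItem $SOURCES_PATH -Recurse -Directory | ForEach-Object {
# String.Replace is case-sensitive, so $SOURCES_PATH must match the casing of FullName
$relativePath = $_.FullName.Replace($SOURCES_PATH, '').TrimStart('\')
$destination = [System.IO.Path]::Combine($DESTINATION_PATH, $relativePath)
}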
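If the casing of the base path might not match, or a sibling folder shares the same prefix (for example C:\temp\stuffing), a prefix check is a little safer than a plain Replace. This is only a sketch: Get-RelativePart is a hypothetical helper name, not a built-in cmdlet, and it assumes Windows-style backslash separators.

```powershell
function Get-RelativePart {
    param(
        [Parameter(Mandatory)][string]$FullName,
        [Parameter(Mandatory)][string]$Root
    )
    # Normalise the root so it ends with exactly one backslash; this stops
    # 'C:\temp\stuff' from matching 'C:\temp\stuffing'.
    $prefix = $Root.TrimEnd('\') + '\'
    if ($FullName.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)) {
        return $FullName.Substring($prefix.Length)
    }
    # Not under the root: hand the path back unchanged.
    return $FullName
}

Get-RelativePart -FullName 'c:\TEMP\stuff\folder02\cat' -Root 'C:\temp\stuff'   # folder02\cat
Get-RelativePart -FullName 'C:\temp\stuffing\x' -Root 'C:\temp\stuff'           # C:\temp\stuffing\x
```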

Related

Powershell - how to return results of filenames that don't have a "partner"

I'm attempting to find files in a folder of filenames that look like the following:
C:\XMLFiles\
in.blahblah.xml
out.blahblah.xml
in.blah.xml
out.blah.xml
I need to return only the files that do not have their "counterpart". This folder contains thousands of files with randomized "center" portions of the file names; the commonality is in/out and ".xml".
Is there a way to do this in Powershell? It's an odd ask.
Thanks.
Your question is a little vague. I hope I got it right. Here is how I would do it.
$dir = 'my_dir'
$singleFiles = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem $dir -Filter '*.xml' | ForEach-Object {
if ($_.BaseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
$oppositeFileName = if ($Matches.prefix -eq 'in') {
'out'
}
else {
'in'
}
$oppositeFileName += $Matches.rest + $_.Extension
$oppositeFileFullName = Join-Path $_.DirectoryName -ChildPath $oppositeFileName
if ($singleFiles.Contains($oppositeFileFullName)) {
$singleFiles.Remove($oppositeFileFullName) | Out-Null
}
else {
$singleFiles.Add($_.FullName) | Out-Null
}
}
}
$singleFiles
I get all the XML files from the directory and iterate over the results. For each file I check whether its base name (the file name without the directory path or the extension) matches a regex. The regex says: match if the name starts with in or out, followed by a dot and at least one more character.
The $Matches automatic variable contains the matched groups. Based on these groups I build the name of the counterpart file: e.g., if I'm currently on in.abc, I build out.abc.
After that, I build the absolute path of the counterpart file and check whether it is already in the HashSet. If it is, I remove it, because that means I have already iterated over that file and the pair is complete. Otherwise, I add the current file.
The resulting HashSet will contain the files that do not have a counterpart.
Tell me if you need a more detailed explanation and I will go line by line. It could be refactored a bit, but it does the job.
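For comparison, here is a two-pass variant of the same idea that works on plain file names, so it can be tried without a folder of real files. It is a sketch rather than a replacement for the script above: it first loads every name into a case-insensitive HashSet, then keeps the names whose counterpart is missing. The sample names (including out.other.xml) are invented for the illustration.

```powershell
$names = 'in.blahblah.xml', 'out.blahblah.xml', 'in.blah.xml', 'out.other.xml'

# First pass: remember every name for fast lookups.
$all = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
foreach ($name in $names) { [void]$all.Add($name) }

# Second pass: keep a name only when its in/out counterpart is absent.
$unpaired = foreach ($name in $names) {
    if ($name -match '^(?<prefix>in|out)(?<rest>\..+)$') {
        $other = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
        if (-not $all.Contains($other + $Matches.rest)) { $name }
    }
}

$unpaired   # in.blah.xml, out.other.xml
```

The single-pass add/remove version needs less memory on very large folders; the two-pass version is easier to reason about because each name is judged independently.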

Windows cmd command for stripping versions from filenames?

Need Windows cmd command to rename files to names without version numbers, e.g.:
filename.exa.1 => filename.exa
filename_a.exb.23 => filename_a.exb
filename_b.exc.4567 => filename_b.exc
Filenames are variable in number of characters, and the primary extension is always 3 characters.
I once had a Solaris script "stripv" to accomplish this. I could enter "stripv *" in a directory and get a nice clean set of non-versioned files. If the command would result in duplicate filenames because multiple versions exist, then it would just skip the operation altogether.
TIA
Don't know how to do it in CMD, but here is some Powershell that would work for you:
# Quick way to get an array of filenames. You could also create a proper array,
# or read each line into an array from a file.
$filepaths = @"
C:\full\path\to\filename.exa.1
C:\full\path\to\filename_a.exb.23
\\server\share\path\to\filename_b.exc.4567
"@ -Split "`r?`n"
# For each path in $filepaths
$filepaths | Foreach-Object {
$path = $_
# Split-Path -Leaf gets only the filename
# -Replace expression just means to match on the ".number" at the end of the
# filename and replace it with an empty string (effectively removing it)
$newFilename = ( Split-Path -Leaf $path ) -Replace '\.\d+$', ''
# Warning output
Write-Warning "Renaming '${path}' to '${newFilename}'"
# Rename the file to the new name
Rename-Item -Path $path -NewName $newFilename
}
Basically, this code creates an array of full paths to files. For each path, it strips the filename from the full path and replaces the .number pattern at the end with nothing, which removes it from the filename. Now that we have the new filename, we use Rename-Item to rename the file to the new name.
Supply the folder name to this script block's $Folder variable, and it will enumerate the items within that folder, locate the last '.' character within the file name, and rename it as everything prior to the '.'.
E.g.: Filename.123.wrcrw.txt.123 would be renamed as Filename.123.wrcrw.txt or in your case, your files would lose the extraneous characters from the final '.' onwards. If the new name for the file already exists, it will write a warning stating that it could not rename the file, and continue on without trying.
$Folder = "C:\ProgramData\Temp"
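# Only consider files whose names contain a '.', otherwise Substring would throw
Get-ChildItem -Path $Folder -File | Where-Object { $_.Name -match '\.' } | ForEach-Object {
# Everything before the final '.' becomes the new name
$NewName = $_.Name.Substring(0, $_.Name.LastIndexOf('.'))
if (!(Test-Path (Join-Path $Folder $NewName)))
{
# Use FullName so the rename does not depend on how $_ converts to a string
Rename-Item -Path $_.FullName -NewName $NewName
}
else
{
Write-Warning "$($_.Name) cannot be renamed, $NewName already exists."
}
}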
This should effectively mimic the behaviour you described for stripv *. This could easily be turned into a function with name stripv and added to your PowerShell profile to make it available at the command-line interactively and used in the same way as your Solaris script.
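The question also asks that a file be skipped when several versions would collapse to the same name. One way to check for that up front is to group the candidate names by their stripped form and keep only the groups with a single member. The sketch below works on names only and renames nothing; filename_a.exb.24 and notes.txt are invented entries added to show the skipped cases.

```powershell
$names = 'filename.exa.1', 'filename_a.exb.23', 'filename_a.exb.24',
         'filename_b.exc.4567', 'notes.txt'

$plan = $names |
    Where-Object { $_ -match '\.\d+$' } |                 # only names ending in .<digits>
    Group-Object { $_ -replace '\.\d+$', '' } |           # group by the version-free name
    Where-Object { $_.Count -eq 1 } |                     # skip names that would collide
    ForEach-Object {
        [pscustomobject]@{ OldName = $_.Group[0]; NewName = $_.Name }
    }

$plan | Format-Table -AutoSize
```

Each row of $plan could then be passed to Rename-Item; filename_a.exb is left out because two versions map to it, and notes.txt is ignored because it has no numeric suffix.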

Get a specific folder name from path

How to get the 4th folder name and store it in a variable while looping through the files stored in a parent folder. For example, if the path is
C:\ParentFolder\Subfolder1\subfolder2\subfolder3\file.extension
C:\ParentFolder\Subfolder1\subfolder2\subfolder4\file.extension
C:\ParentFolder\Subfolder1\subfolder2\subfolder5\file.extension
then the name subfolder2 should be stored in a variable. Can anyone please help me with this?
get-childitem $DirectorNane -Recurse | Where-Object{!($_.PSIsContainer)} | % {
$filePath = $_.FullName
#Get file name
$fileName = Split-Path -Path $filePath -Leaf
}
Thanks in advance!
You can use the -split operator on the $filePath variable to get what you want.
$split = $filePath -split '\\'
$myvar = $split[3]
We split on a double backslash because -split takes a regular expression, in which a single backslash is the escape character; the first backslash escapes the second so that it matches a literal backslash. We can then index into the resulting array, stored in $split, to pick out the part of the path we want.
Additionally, you can solve this with a one liner using the following code:
$myvar = $filepath.split('\')[3]
This would ensure that you're always getting the fourth element in the array, but is a little less flexible since you can't specify what exactly you want based on conditions with additional scripting.
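A short sketch of the indexing, using the first sample path from the question. Counting from the front depends on how deep the parent folder sits; counting from the end with a negative index depends only on how many levels lie between the folder you want and the file.

```powershell
$filePath = 'C:\ParentFolder\Subfolder1\subfolder2\subfolder3\file.extension'

# -split takes a regex, so the backslash is escaped.
$parts = $filePath -split '\\'

$parts[3]     # subfolder2  (index 0 is the drive, 'C:')
$parts[-3]    # subfolder2  (index -1 is the file name)
```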
If you are asking how to get the parent directory of the directory containing a file, you can call Split-Path twice. Example:
$filePath = "C:\ParentFolder\Subfolder1\subfolder2\subfolder3\file.extension"
$parentOfParent = Split-Path (Split-Path $filePath)
# $parentOfParent now contains "C:\ParentFolder\Subfolder1\subfolder2"

Only convert files with the string "DUPLICATE" in the name

I'm trying to make a script that converts PDFs to TIF.
It copies the right files from one folder to another (thanks to the community's previous help).
Next it converts all of the PDFs to TIFF.
Lastly it renames the .tiff files to .tif.
What I want to do now is to convert only the PDFs with "DUPLICATE" in their file name to TIFF, and finally remove "DUPLICATE" from the new TIFF's file name.
Does anyone know how to do that?
gci X:\IT\PDFtoTIFF\1 -filter {VKF*} | Move-Item -destination X:\IT\PDFtoTIFF\2
$tool = 'C:\Program Files (x86)\GPLGS\gswin32c.exe'
$pdfs = get-childitem . -recurse | where {$_.Extension -match "pdf"}
foreach($pdf in $pdfs)
{
$tiff = $pdf.FullName.split('.')[0] + '.tiff'
if(test-path $tiff)
{
"tiff file already exists " + $tiff
}
else
{
'Processing ' + $pdf.Name
$param = "-sOutputFile=$tiff"
& $tool -q -dNOPAUSE -sDEVICE=tiffg4 $param -r300 $pdf.FullName -c quit
}
}
Dir *.tiff | rename-item -newname { $_.name -replace ".tiff",".tif" }
More details:
The script needs to work like this:
All files in the folder \itgsrv028\invoices$\INST that start with vkf need to be moved to this folder: \itgsrv028\invoices$\INST\V3
(This is currently working in the script)
Only convert the files with “Duplicaat” in their name to TIFF
Rename VKF_320150309DUPLICAAT.Tiff to 320150309.tif
Example:
These files in the folder:
VKF_320150309.PDF
VKF_320150309DUPLICAAT.PDF
Need to become:
VKF_320150309.PDF
VKF_320150309DUPLICAAT.PDF
320150309.TIF (Converted from: VKF_320150309DUPLICAAT.PDF)
About using only "DUPLICAAT": You have to change your filtering a bit, to include a match for "DUPLICAAT" in there, like this:
$pdfs = get-childitem . -recurse | where {$_.Extension -match "pdf" -and $_.basename -match "DUPLICAAT"}
About building a new name for the TIFF: You can use group placeholders in a regular expression to retrieve your valuable part from the middle of known characters. With your VKF_320150309DUPLICAAT.PDF as an example, you can convert it to a proper TIFF file name with this construction:
$tiff="$($pdf.directory)\$($pdf.basename -replace "VKF_([\w\s]+)DUPLICAAT",'$1').tiff"
This combines three things: the -replace operator applied to a string, $(expression) subexpressions that are replaced by their evaluated values inside a double-quoted string, and a literal path separator and extension in that same string. It resolves as follows:
The whole value is an expandable string, as indicated by the surrounding double quotes.
The first $(expression) evaluates to $pdf.directory, which contains the path to the parent folder without a trailing backslash. With $pdf equal to X:\IT\PDFtoTIFF\2\VKF_320150309DUPLICAAT.PDF this returns X:\IT\PDFtoTIFF\2.
The second $(expression) evaluates $pdf.basename -replace "VKF_([\w\s]+)DUPLICAAT",'$1'. With the same PDF this is "VKF_320150309DUPLICAAT" -replace "VKF_([\w\s]+)DUPLICAAT",'$1'. The parenthesized group in the pattern captures "320150309"; that value is available as $1, which replaces the whole matched region. Thus the name is stripped of both "VKF_" and "DUPLICAAT" in one go.
The two resulting strings are joined with a backslash in between and a trailing .tiff, giving X:\IT\PDFtoTIFF\2\320150309.tiff.
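The capture-group replacement can be tried on the base name alone, as in this sketch. It uses the sample name from the question and builds a .tif name directly, which is an assumption about the final extension you want; the variable names are illustrative.

```powershell
$baseName = 'VKF_320150309DUPLICAAT'

# -match is case-insensitive by default, so 'Duplicaat' would also be found.
$isDuplicate = $baseName -match 'DUPLICAAT'

# ([\w\s]+) captures the part between the fixed prefix and suffix; '$1' is in
# single quotes so PowerShell passes it to the regex engine untouched.
$newBase = $baseName -replace '^VKF_([\w\s]+)DUPLICAAT$', '$1'

$isDuplicate          # True
"${newBase}.tif"      # 320150309.tif
```

A name that does not match the pattern, such as VKF_320150309, comes back from -replace unchanged, which is why the filter on "DUPLICAAT" is applied first.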
Hope this would help you in building better scripts that play with strings in Powershell.

Powershell Copying files with varying folders in the path

Apologies up front for my lack of knowledge of PowerShell; I'm very new to the language. I need to copy some files located in a certain path to another similar path. For example:
C:\TEMP\Users\<username1>\Documents\<varyingfoldername>\*
C:\TEMP\Users\<username2>\Documents\<varyingfoldername>\*
C:\TEMP\Users\<username3>\Documents\<varyingfoldername>\*
C:\TEMP\Users\<username4>\Documents\<varyingfoldername>\*
etc....
to
C:\Files\Users\<username1>\Documents\<varyingfoldername>\*
C:\Files\Users\<username2>\Documents\<varyingfoldername>\*
C:\Files\Users\<username3>\Documents\<varyingfoldername>\*
C:\Files\Users\<username4>\Documents\<varyingfoldername>\*
etc....
So basically all files and directories from path one need to be copied to the second path for each one of the different paths. The only known constant is the first part of the path like C:\TEMP\Users...... and the first part of the destination like C:\Files\Users.....
I can get all the different paths and files by using:
gci C:\TEMP\[a-z]*\Documents\[a-z]*\
but I am not sure how to then pass what's found in the wildcards so I can use them when I do the copy. Any help would be appreciated here.
This should work:
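Get-ChildItem "C:\TEMP\Users\*\Documents\*" | ForEach-Object {
$old = $_.FullName
$new = $_.FullName.Replace("C:\TEMP\Users\","C:\Files\Users\")
# Make sure the parent folder exists in the destination before copying
New-Item -ItemType Directory -Path (Split-Path $new) -Force | Out-Null
# Copy rather than move, since the source should stay in place
Copy-Item $old $new -Recurse
}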
For additional complexity in matching folder levels, something like this should work:
Get-ChildItem "C:\TEMP\*\Documents\*" -File | ForEach-Object {
$old = $_.FullName
$pathArray = $old.Split("\") # Splits the path into an array
$new = [system.String]::Join("\", $pathArray[0..1]) # Creates a starting point, in this case C:\Temp
$new += "\" + $pathArray[4] # Appends another folder level, you can change the index to match the folder you're after
$new += "\" + $pathArray[6] # You can repeat this line to keep matching different folders
Copy-Item -Recurse -Force $old $new
}
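To reuse what the wildcards matched, another option is to capture those parts with named groups in a regular expression and format them into the destination. This is a sketch on a single made-up path (the user alice and the folder Reports are placeholders), not a complete copy script.

```powershell
$path = 'C:\TEMP\Users\alice\Documents\Reports\q1.txt'

# [^\\]+ matches one path segment; the named groups keep the variable parts.
$pattern = '^C:\\TEMP\\Users\\(?<user>[^\\]+)\\Documents\\(?<folder>[^\\]+)'

if ($path -match $pattern) {
    $dest = 'C:\Files\Users\{0}\Documents\{1}' -f $Matches.user, $Matches.folder
}

$dest   # C:\Files\Users\alice\Documents\Reports
```

Inside a ForEach-Object loop, $path would be $_.FullName and $dest could be handed to Copy-Item as the destination folder.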