Download multiple large files asynchronously with PowerShell

I have four large OS installation media I need to download. This will take a long time if I wait for each download to finish before moving to the next one.
Before downloading, I want to check if the media is already present.
The solution is likely a combination of a hash table, Test-Path and Invoke-WebRequest, but I just can't crack it.
So in pseudocode, something along the lines of:
Check if file1 exists
if false, then download it and check file2
if true, just check file2
check if file2 exists...
So check whether each file exists and, if not, start downloading all the ones that are missing.
I'm not very experienced with PS so all help is much appreciated, thank you very much! Researching the answer was fun but I feel I'm missing a keyword here...

There is a fairly simple way to do async downloads using the WebClient class, although it's probably not available on older versions of PowerShell. See the example below:
$files = @(
    @{url = "https://github.com/Microsoft/TypeScript/archive/master.zip"; path = "C:\temp\TS.master.zip"}
    @{url = "https://github.com/Microsoft/calculator/archive/master.zip"; path = "C:\temp\calc.master.zip"}
    @{url = "https://github.com/Microsoft/vscode/archive/master.zip"; path = "C:\temp\Vscode.master.zip"}
)
$workers = foreach ($f in $files) {
    $wc = New-Object System.Net.WebClient
    Write-Output $wc.DownloadFileTaskAsync($f.url, $f.path)
}
# wait until all files are downloaded
# $workers.Result
# or just check the status and then do something else
$workers | select IsCompleted, Status
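To tie this back to the check-before-download part of the question: here is a minimal, untested sketch (assuming the same $files array of url/path pairs as above) that skips anything already on disk and waits for the remaining downloads to finish:
# skip files that are already present, start async downloads only for the missing ones
$workers = foreach ($f in $files) {
    if (Test-Path $f.path) {
        Write-Host "Skipping $($f.path), already downloaded"
        continue
    }
    $wc = New-Object System.Net.WebClient
    $wc.DownloadFileTaskAsync($f.url, $f.path)
}
# block until every outstanding download has finished
if ($workers) { [System.Threading.Tasks.Task]::WaitAll($workers) }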

Related

PowerShell: Go through all files (PDFs) in a directory and move them based on what's written in the first 6 bytes

I am currently trying to write a PowerShell script that does the following:
Go through all PDF files in the directory the script is in
Check the first few bytes of those PDF files
If those bytes say something along the lines of "PK", move them to a different location
If the bytes say something else (e.g. %PDF-1.4), don't move them at all and go on to the next one.
Context: We have around 70k PDF files that can't be opened. After checking them with a certain tool, it looks like around 99% of those are damaged and the remaining 1% are zip files.
The first bytes of a zipped PDF file start with "PK"; the first bytes of a broken PDF file start with %PDF-1.4, for example.
I need to unzip all zip files and relocate them. Going through 70k PDF files by hand is kinda painful, so I'm looking for a way to automate it.
I know I'm supposed to provide a code sample, but the truth is that I am absolutely lost. I have written a few PowerShell scripts before, but I have no idea how to do something like this.
So, if anyone could kindly point me in the right direction or give me a useful function, I would really appreciate it a lot.
You can use Get-Content to get your first 6 bytes as you asked.
We can then tie that into a loop over all the documents and a simple if statement to decide what to do next, e.g. move the file to another directory.
EDITED BASED ON YOUR COMMENT:
$pdfDirectory = 'C:\Temp\struktur_id_1225\ext_dok'
$newLocation = 'C:\Path\To\New\Folder'
Get-ChildItem "$pdfDirectory" -Filter "*.pdf" | foreach {
    if((Get-Content $_.FullName | select -first 1 ) -like "%PDF-1.5*"){
        $HL7 = $_.FullName.replace("ext_dok","MDM")
        $HL7 = $HL7.replace(".pdf",".hl7")
        move $_.FullName $newLocation;
        move $HL7 $newLocation
    }
}
Try using the above, which is also a bit easier to edit.
$pdfDirectory will need to be set to the folder containing the PDF Files
$newLocation will obviously be the new directory!
And you will still need to change the -like "%PDF-1.5*" to suit your search!
It should do the rest for you, give it a shot
Another Edit
I have mimicked your folder structure on my computer, and placed a few PDF files and matching HL7 files and the script is working perfectly.
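For the original "PK" versus "%PDF" test from the question, reading the raw bytes is a bit more reliable than comparing the first line as text. A rough sketch (untested; note that -Encoding Byte is Windows PowerShell syntax, PowerShell 7+ uses -AsByteStream instead, and $zipLocation is a hypothetical destination for the zip-disguised files):
$pdfDirectory = 'C:\Temp\struktur_id_1225\ext_dok'
$zipLocation = 'C:\Path\To\Zipped\Files'
Get-ChildItem "$pdfDirectory" -Filter "*.pdf" | foreach {
    # read only the first two bytes of the file
    $bytes = Get-Content $_.FullName -Encoding Byte -TotalCount 2
    # 0x50 0x4B is "PK" (a zip); a real/broken PDF starts with 0x25 0x50 ("%P" from "%PDF-1.x")
    if ($bytes[0] -eq 0x50 -and $bytes[1] -eq 0x4B) {
        move $_.FullName $zipLocation
    }
}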
Get-Content is not suited for PDFs; you'd want to use iTextSharp to read PDFs.
Download iTextSharp (found in Releases) and put the itextsharp.dll somewhere easy to find (i.e. the folder your script is located in).
You can install the .nupkg by using Install-Package, or simply use an archive tool to extract the contents of the .nupkg file (it's basically a .zip file).
The code below adds every word on page 1 of each PDF, separated by whitespace, to an array. You can then test whether the array contains your keyword.
Add-Type -Path "C:\path\to\itextsharp.dll"
$pdfs = Get-ChildItem "C:\path\to\pdfs" *.pdf
foreach ($pdf in $pdfs) {
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $pdf.FullName
    $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader,1).Split("")
    foreach($line in $text) {
        # do your test here
    }
}

How to download multiple files with PowerShell

Okay, so, here is what I ended up editing from my original answer. Kudos to @Matt for pointing out that I should be more descriptive with my answer and explain clearly what my edits were, so that other users might be able to benefit from my answer in the future. Users like @Matt are a great part of this community and put emphasis on keeping the standards high here.
The first thing I edited/added is the ability to delete the previous logs from each run. Since this script will be scheduled, it is important to remove the previous logs in order to avoid using too much disk space. This can be noted under the comment "delete log files from prev run":
# delete log files from prev Run
Remove-Item C:\alerts\logs\*.*
The next thing I edited/added is the ability to switch between host names. I did this to prevent the overwriting of the files. You can see this under the comment "change filename to prevent overwriting of log file". I accomplished this by checking the index of $url in the foreach loop and seeing whether it was at the position where I needed to change the host name. I suspect there is a much more intuitive way to do this, and I would love it if someone chimed in with a better way, as it's driving me crazy that I don't know one. It should be noted that there are a total of 44 URLs I'm downloading from, hence the magic numbers (11, 22, 33) where I change the host name. Again, if you know a better way, please don't hesitate to let me know.
If ($urls.IndexOf($url) -eq 11) {
    $currentDir = "goxsd1704"
}
ElseIf ($urls.IndexOf($url) -eq 22) {
    $currentDir = "goxsd1705"
}
ElseIf ($urls.IndexOf($url) -eq 33) {
    $currentDir = "goxsd1706"
}
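As an aside, and purely as a sketch rather than part of the original script: since the 44 URLs appear to be ordered in blocks of 11 per host, the same mapping could be done with integer division into a small array of host names, which avoids the hard-coded index checks:
$hostNames = "goxsd1703", "goxsd1704", "goxsd1705", "goxsd1706"
for ($i = 0; $i -lt $urls.Count; $i++) {
    # every block of 11 URLs belongs to the next host name
    $currentDir = $hostNames[[math]::Floor($i / 11)]
    $url = $urls[$i]
    # ... build the file name and download as in the full script below
}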
The next thing I edited/added, thanks to @Matt for the recommendation, is the try/catch blocks, which are clearly noted in the code. I should have had these to start with, as by not having them I was assuming that the script was always going to work. Rookie mistake and point taken. With that being said, these are all my edits. The code is working fine, but improvement is always possible. Thank you for your time and answers.
# set date
$date = Get-Date -UFormat "%Y-%m-%d-%H_EST"
# delete log files from prev Run
Remove-Item C:\alerts\logs\*.*
# setup download links
$urls = "http://subdomain.domain.com:portnumber/LogA/API_DBG_CS_Logs/dbg_a.$date.log"
function DownloadFMOSLogs()
{
    try
    {
        # assign working dir to currentDir
        $currentDir = "goxsd1703"
        # instantiate web-client.
        $wc = New-Object System.Net.WebClient
        # loop through each url
        foreach ($url in $urls)
        {
            # change filename to prevent overwriting of log file
            If ($urls.IndexOf($url) -eq 11) {
                $currentDir = "goxsd1704"
            }
            ElseIf ($urls.IndexOf($url) -eq 22) {
                $currentDir = "goxsd1705"
            }
            ElseIf ($urls.IndexOf($url) -eq 33) {
                $currentDir = "goxsd1706"
            }
            # get file name
            $fileName = $url.SubString($url.LastIndexOf('/')+1)
            # create target file name
            $targetFileName = "C:\alerts\logs\" + $currentDir + "_" + $fileName
            $wc.DownloadFile($url, $targetFileName)
            Write-Host "Downloaded $url to file location $targetFileName"
        }
    } catch [System.Net.WebException],[System.IO.IOException]
    {
        "An error occurred. Files were not downloaded."
    }
}
DownloadFMOSLogs
Write-Host ""
Write-Host "Download of application log files has successfully completed!"
Invoke-WebRequest is a good way in PowerShell to download files, and the -OutFile parameter will write the response straight to disk (see the Invoke-WebRequest documentation).
Have a go with Invoke-WebRequest -Uri $link -OutFile $targetFileName
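For example, a rough sketch of how that could slot into the loop from the script above (untested; -UseBasicParsing just avoids the Internet Explorer dependency on older Windows PowerShell versions):
foreach ($url in $urls) {
    $fileName = $url.SubString($url.LastIndexOf('/') + 1)
    $targetFileName = "C:\alerts\logs\" + $currentDir + "_" + $fileName
    Invoke-WebRequest -Uri $url -OutFile $targetFileName -UseBasicParsing
}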
You have a couple of problems and an issue or two.
1. $urls is not an array like you think it is. It is actually one whole string. Try something like this instead:
$urls = "http://subdomain.domain.com:port/LogA/API_DBG_CS_Logs/dbg_a.$date.log",
"http://subdomain.domain.com:port/LogA/API_DBG_CS_Logs/dbg_b.$date.log"
The variable will expand in that string just fine. The issue before was that you were concatenating the string starting from the first part because of the order of operations: when you add an array to a string on the left-hand side, the array gets converted to a space-delimited string. Have a look at a smaller example which is exactly what you tried to do.
"hello" + 2,2 + "there"
You could have made what you had work if you had wrapped each piece in a set of brackets first, so each element of the array is built on its own.
("hello" + 2),("there" + 2)
2. This code might make sense elsewhere, but as others have pointed out you have a useless loop over the lines of a file: foreach($line in Get-Content .\hosts.txt). If you don't use it, get rid of it.
3. You don't really use $targetDir to its full potential. If you are going to use the working directory of the script, at least use some absolute paths. Side note: the comments don't really match what is happening, which is likely related to point 2 above.
# prepend host to file name to keep file names diff.
$targetFilePath = [io.path]::combine($pwd,"test.txt")
# download the files.
$wc.DownloadFile($link, $targetFilePath)
You should try to make that unique somehow, since the files will overwrite each other as you have it coded.
I would also wrap that in a try block in case the download fails, so you can report properly on that. As of now you assume it will work every time.
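A minimal sketch of both suggestions, assuming $links holds the URLs (the variable name is a guess, since the original code isn't shown in full here):
$wc = New-Object System.Net.WebClient
foreach ($link in $links) {
    # derive a unique file name from the URL so downloads don't overwrite each other
    $fileName = [System.IO.Path]::GetFileName(([uri]$link).LocalPath)
    $targetFilePath = [io.path]::combine($pwd, $fileName)
    try {
        $wc.DownloadFile($link, $targetFilePath)
        Write-Host "Downloaded $link to $targetFilePath"
    }
    catch {
        Write-Warning "Failed to download $link : $($_.Exception.Message)"
    }
}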

Jenkins + PowerShell - Converting Zip File to Bytes, Transfer Bytes, Then Reconstruct Bytes to Zip

As this complex problem now has a viable solution, I wanted to update this to accommodate the solution so that others can benefit from this.
Problem
I am working with Jenkins and PowerShell to perform a series of actions. One such action is to gather up and consolidate all associated log files into a single folder named so that it can be associated with the originating machine/server, zip the folder up, transfer it back to the Jenkins workspace environment somehow, store the zips in a single master folder, and convert said master folder into an artifact which can be downloaded at a later time/date for debugging. However, since this action tends to involve the two-hop exchange, I have hit a number of walls that have inhibited my ability to perform simple file transfers. So I am turning to the community to seek assistance from those who hopefully have more experience doing this kind of thing.
The only way (that I have found so far) to perform many of the actions I need is to connect to the target machine through an Invoke-Command; however, it did not take long for me to start running into walls. One such wall is the issue of file transfer between Jenkins and the target machine. I have found that by creating an object that equals the invoked command (e.g. $returnMe = Invoke-Command {}), I am able to return objects from the action to be stored in the variable. This has given me a possible resolution to the problem of returning an item to Jenkins through the session... but now brings forward the question:
Question
Using PowerShell, is it possible to zip a folder up, then convert that zip file into an object to be passed, and then reconstruct that zip file using the contents of the object? If so, how?
Resolution:
A special shout out to @ArcSet for assistance in getting this to work right.
Alright, so it is extremely possible to do this; however, it takes a series of key steps to make it happen. It is essentially a four-to-five-step process:
1) Gather the files into the target folder and zip the folder
2) Convert the zip file into bytes that can be easily transferred
3) Pass those bytes back through a special custom object
4) Parse the object to ensure excess bytes did not find their way into the file
5) Convert the bytes back into a zip file on Jenkins
Below is the code (with comments) that shows how this was achieved:
# Parse Through Each Server One By One
ForEach ($server in $servers) {
    # Create a Variable to Take in the Invoke-Command and Receive What Is Returned
    $packet = Invoke-Command -ComputerName $server -ArgumentList $server -Credential $creds -ScriptBlock {
        # Function for Creating a Zip File
        Function Create-ZipFromFolder ([string]$Source, [string]$SaveAs) {
            Add-Type -Assembly "System.IO.Compression.FileSystem"
            [IO.Compression.ZipFile]::CreateFromDirectory($Source, $SaveAs)
        }
        # Function for Converting Zip to Bytes For File Transfer
        Function Get-BytesFromFile($Path) {
            return [System.IO.File]::ReadAllBytes($Path)
        }
        # Absorb and Mask the Server IP For Logging Purposes and File Naming
        $maskedAddress = $args[0] -Replace ('.+(?=\.\d+$)', 'XX.XX.XX')
        # Construct the Path For Consolidated Log Files
        $masterLogFolderPath = ("C:\Logs\RemoteLogs\$maskedAddress")
        # Series of Code to Create Log Folder, Consolidate Files, And Delete Unnecessary Data For Cleanup
        # Establish What to Save the Zip As. You Will Want the Path Included to Prevent It From Finding the Path of Least Resistance Upon Saving
        $zipFile = ($masterLogFolderPath + ".zip")
        Try {
            # Here is Where We Call Our Compression Function
            Create-ZipFromFolder -Source $masterLogFolderPath -SaveAs $zipFile
            # Next We Convert the Zip to Bytes
            [byte[]]$FileAsBytes = Get-BytesFromFile -Path $zipFile
        }
        Catch {
            Write-Error $_.Exception.Message
        }
        # Now We Return the New Object to Jenkins
        return New-Object -TypeName PSCustomObject -Property @{Response=$FileAsBytes}
    }
    # Function to Convert the Bytes Back Into a Zip File
    Function Create-FileFromBytes([byte[]]$Bytes, [string]$SaveAs) {
        # It Was Discovered That, Depending Upon the Environment, Extra Bytes Were Sometimes Added
        # These Next Lines Will Help to Remove Those
        For (($k = 0), ($kill = 0); $kill -lt 1; $k++) {
            If ($Bytes[$k] -gt 0) {
                $kill = 1
                # Truncate the Excess Bytes
                $newByteArray = ($Bytes)[$k..(($Bytes).Length - 1)]
            }
        }
        # Reconstruct the Zip File
        [System.IO.File]::WriteAllBytes($SaveAs, $newByteArray)
    }
    Try {
        # Recompute the Masked Address Locally ($maskedAddress Above Was Set Inside the Remote ScriptBlock, So It Is Not Visible Here)
        $maskedAddress = $server -Replace ('.+(?=\.\d+$)', 'XX.XX.XX')
        # Call the Function to Begin Reconstruction
        Create-FileFromBytes -SaveAs "$env:Workspace\MasterLog\$maskedAddress.zip" -Bytes $packet.Response
    }
    Catch {
        Write-Error $_.Exception.Message
    }
}
Now this is just a base skeleton without all the heavy stuff, but these are the key pieces that put everything together. For the most part, following this formula will return the desired results. I hope this helps others who find themselves in a similar situation.
Thank you all again for your assistance
So first you will need to create a zip file.
You will need to load the System.IO.Compression.FileSystem assembly first by using Add-Type.
Use the CreateFromDirectory method to build and save the zip file.
Use the System.IO.File class to read all the bytes from the zip. At this point you can send this data to another computer. Finally, use the System.IO.File class to turn the byte array back into a file.
function Create-ZipFromFolder([string]$Source,[string]$SaveAs){
    Add-Type -assembly "system.io.compression.filesystem"
    [io.compression.zipfile]::CreateFromDirectory($Source, $SaveAs)
    return $SaveAs
}
function Get-BytesFromFile([string]$File){
    return [System.IO.File]::ReadAllBytes($File)
}
function Create-FileFromBytes([byte[]]$Bytes, [string]$SaveAs){
    return [System.IO.File]::WriteAllBytes($SaveAs, $Bytes)
}
[byte[]]$FileAsBytes = Get-BytesFromFile -File (Create-ZipFromFolder -Source "C:\ZipFolder" -SaveAs "C:\Test\test.zip")
Create-FileFromBytes -SaveAs "C:\ZipFolder\Test.zip" -Bytes $FileAsBytes

iTextSharp to merge PDF files in PowerShell

I have a folder which contains thousands of PDF files. I need to filter through these files based on file name (which will group them into 2 or more PDFs) and then merge those 2 or more PDFs into 1 PDF.
I'm OK with grouping the files, but I'm not sure of the best way to then merge these into 1 PDF. I have researched iTextSharp but have been unable to get it to work in PowerShell.
Is iTextSharp the best way of doing this? Any help with the code for this would be much appreciated.
Many thanks
Paul
Have seen a few of these PowerShell-tagged questions that are also tagged with itextsharp, and always wondered why answers are given in .NET, which can be very confusing unless the person asking the question is proficient in PowerShell to begin with. Anyway, here's a simple working PowerShell script to get you started:
$workingDirectory = Split-Path -Parent $MyInvocation.MyCommand.Path;
$pdfs = ls $workingDirectory -recurse | where {-not $_.PSIsContainer -and $_.Extension -imatch "^\.pdf$"};
[void] [System.Reflection.Assembly]::LoadFrom(
    [System.IO.Path]::Combine($workingDirectory, 'itextsharp.dll')
);
$output = [System.IO.Path]::Combine($workingDirectory, 'output.pdf');
$fileStream = New-Object System.IO.FileStream($output, [System.IO.FileMode]::OpenOrCreate);
$document = New-Object iTextSharp.text.Document;
$pdfCopy = New-Object iTextSharp.text.pdf.PdfCopy($document, $fileStream);
$document.Open();
foreach ($pdf in $pdfs) {
    $reader = New-Object iTextSharp.text.pdf.PdfReader($pdf.FullName);
    $pdfCopy.AddDocument($reader);
    $reader.Dispose();
}
$pdfCopy.Dispose();
$document.Dispose();
$fileStream.Dispose();
To test:
Create an empty directory.
Copy code above into a Powershell script file in the directory.
Copy the itextsharp.dll to the directory.
Put some PDF files in the directory.
Not sure how you intend to group/filter the PDFs based on file name, or if that's your intention (couldn't tell if you meant to just pick out PDFs by extension), but that shouldn't be too hard to add; a rough sketch of one way to do it follows.
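For instance, if the grouping key is the part of the file name before the first underscore (purely an assumed naming convention), something like this could produce one merged PDF per group, reusing the same iTextSharp calls as above:
$groups = $pdfs | Group-Object { $_.BaseName.Split('_')[0] };
foreach ($group in $groups) {
    # hypothetical output name: one "<groupkey>-merged.pdf" per group
    $outputPath = [System.IO.Path]::Combine($workingDirectory, "$($group.Name)-merged.pdf");
    $fileStream = New-Object System.IO.FileStream($outputPath, [System.IO.FileMode]::Create);
    $document = New-Object iTextSharp.text.Document;
    $pdfCopy = New-Object iTextSharp.text.pdf.PdfCopy($document, $fileStream);
    $document.Open();
    foreach ($pdf in $group.Group) {
        $reader = New-Object iTextSharp.text.pdf.PdfReader($pdf.FullName);
        $pdfCopy.AddDocument($reader);
        $reader.Dispose();
    }
    $pdfCopy.Dispose();
    $document.Dispose();
    $fileStream.Dispose();
}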
Good luck. :)

VBScript or PowerShell script to delete files older than x number of days, following shortcuts/links

I'm trying to create a script to delete all files in a folder and its subfolders that are older than 45 days. I know how to do this; the problem is that the parent folder in question has several links to other folders within itself. How do I prevent the script from deleting the links (as a link is a file), and instead treat the links like folders and look "inside" them for files older than 45 days?
If that's not possible, then is it possible to create a dynamic variable or array so that the script looks inside each folder I need it to and deletes any files older than 45 days? If so, how do I do that?
Currently my only other option would be to create a separate script for each folder (or create code for each script in one file) and either call them individually or use yet another script to call each script.
For reference, this is in a Windows Server 2008 R2 environment
I can't work out a full solution right now. If I get time I'll come back and edit with one. Essentially I would create a function that would call itself recursively for folders and for links where the .TargetPath was a folder. The creation of the recursive function is pretty standard fare. The only slightly opaque part is getting the .TargetPath of a .lnk file:
$sh = New-Object -COM WScript.Shell
$sc = $sh.CreateShortcut('E:\SandBox\ScriptRepository.lnk')
$targetPath = $sc.TargetPath
That is the PS way. The VBScript version is pretty much the same with a different variable naming convention and a different method for COM object instantiation.
So here is a more complete solution. I have not set up test folders and files to test it completely, but it should be pretty much what you need:
function Remove-OldFile{
    param(
        $Folder
    )
    $sh = New-Object -COM WScript.Shell
    foreach($item in Get-ChildItem $Folder){
        if ($item.PSIsContainer){
            Remove-OldFile $item.FullName
        }elseif($item.Extension -eq '.lnk'){
            Remove-OldFile $sh.CreateShortcut($item.FullName).TargetPath
        }else{
            if(((Get-Date) - $item.CreationTime).Days -gt 45){
                $item.Delete()
            }
        }
    }
}
Remove-OldFile C:\Scripts
Just for completeness, here is an untested off the cuff VBS solution. I warn you that it may have some syntax errors, but the logic should be fine.
RemoveOldFiles "C:\Scripts"
Sub RemoveOldFiles(strFolderPath)
Dim oWSH : Set oWSh = CreateObject("WScript.Shell")
Dim oFSO : Set oFSO = CreateObject("Scripting.FileSystemObject")
For Each oFolder in oFSO.GetFolder(strFolderPath).SubFolders
RemoveOldFiles oFolder.Path
Next
For Each oFile in oFSO.GetFolder(strFolderPath).Files
if LCase(oFSO.GetExtensionName(oFile.Name)) = "lnk" Then
RemoveOldFiles oWSH.CreateShortcut(oFile.Path).TargetPath
Else
If DateDiff("d", oFile.DateCreated, Date) > 45 Then
oFSO.DeleteFile(oFile)
End If
End If
Next
End Sub
Very high level answer:
Loop through all files in the current folder.
If `file.name` ends with `.lnk` (we have a link/shortcut), get the path of the shortcut with `.TargetPath`.
You can now pass `.TargetPath` the same way you would pass the name of a subdirectory when you find one, to continue recursing through the directory tree.