I was trying to find a way in PowerShell to move a file based on its size, and could not find exactly what I was looking for. I found how to move only files of a certain size, and how to do other if/then statements, but not how to move a file to different locations based on its size.
Why did I need/want to do this? An exe I am running creates an output file even if it has no data, so sometimes the file is empty and sometimes it has data. When it has data I need it sent to someone; when it's empty I just want it in a backup folder for reference.
This part let me find files based on their size (-le is "less than or equal to"):
$BlankFiles = Get-ChildItem c:\test\*.rej | Where-Object { $_.Length -le 0kb }
This part let me check whether an empty file exists. After lots of reading, I went with System.IO.File over Test-Path. Exists() expects a path string, so pass the file's FullName:
[System.IO.File]::Exists($BlankFiles.FullName)
Putting this all in an IF/ELSE statement was the problem I struggled with. The answer I came up with is below.
I am mainly posting this because I could not find this exact scenario, and in case anyone sees a problem with this approach that I missed.
Here is the solution I came up with, and in all the tests I did it appears to work as intended. Note: I only need to do this on one file at a time, which is why this works and why I left out recursion and loops.
If the file is blank, it moves it to a backup folder with the date appended. If it has data, it makes a copy with the date appended in the backup folder, and moves the file, again with the date appended, to a different location that is accessible to the necessary users.
I thought about checking how many lines are in the file instead of its size, but it appears that when the file is blank it sometimes contains a line break and sometimes doesn't. So I went with the size method instead.
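$fndate = (Get-Date).ToString("MMddyyyy")
# The .rej file the exe just wrote (skip the dated copies this script leaves in C:\test)
$file = Get-ChildItem C:\test\*.rej -Exclude 'blankfiles-*', 'realfiles-*' | Select-Object -First 1
# $BlankFiles only holds the file when it is empty
$BlankFiles = $file | Where-Object { $_.Length -le 0kb }
If ([System.IO.File]::Exists($BlankFiles.FullName)) {
    # Empty output: move it to the backup location with the date appended
    Move-Item $file.FullName "C:\test\blankfiles-$fndate.rej"
}
ElseIf ($file) {
    # Output with data: keep a dated backup copy, then move the original file
    # (not C:\test\*.rej, which would also match the copy) to the users' folder
    Copy-Item $file.FullName "C:\test\realfiles-$fndate.rej"
    Move-Item $file.FullName "C:\user\accessible\realfiles-$fndate.rej" -Force
}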
If anyone sees any issues with doing it this way or has better suggestions, please let me know. As I mentioned, in my tests it appears to be working wonderfully, and I thought I would share.
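One thing worth double-checking, given the line-break observation above: a "blank" file that contains a single line break is 1 or 2 bytes long, not 0, so a 0-byte size test would send it down the "has data" branch. Below is a sketch of a check that treats both zero-byte and whitespace-only files as blank, assuming the .rej files are small text files. Test-BlankFile is a hypothetical helper name.

```powershell
# Hypothetical helper: returns $true for a zero-byte file or a file that
# contains only whitespace (for example a lone CR/LF written by the exe).
function Test-BlankFile {
    param([Parameter(Mandatory)] [string] $Path)

    $item = Get-Item -LiteralPath $Path
    if ($item.Length -eq 0) { return $true }   # truly empty

    # Small file: read it whole and check whether anything but whitespace is present
    $text = Get-Content -LiteralPath $Path -Raw
    return [string]::IsNullOrWhiteSpace($text)
}
```

It could be used in place of the Length comparison, e.g. `if (Test-BlankFile $file.FullName) { ... }`.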
I know there are a lot of questions asked/answered related to this but my question has twists.
So I'm comparing 2 folders that have a huge amount of data (over 20 GB, and it can go up to 40 GB), one of them being OneDrive.
I'm trying to compare them and find the missing files along with the ones that are newer. I can accomplish either one, but whichever I try, because the folders are huge, it takes a long time and sometimes even crashes. On top of that, when you run the script, it tries to download the files from OneDrive (even though they are present when you run Test-Path).
I found a post that does both (link below), but I'm wondering if there is an easier way to accomplish this without downloading the files or putting them in a variable.
Thank you everyone in advance!
https://serverfault.com/questions/532065/how-do-i-diff-two-folders-in-windows-powershell/637776?newreg=b08ad3ef3c8e45d48ac0d17676a28df4
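You can try Compare-Object, but you have to get all the child items first, like this:
$gci1 = Get-ChildItem -Recurse -File "Path to Folder 1"
$gci2 = Get-ChildItem -Recurse -File "Path to Folder 2"
# Compare on these properties rather than on each item's default string form
Compare-Object $gci1 $gci2 -Property Name, LastWriteTime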
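A sketch of the comparison the question asks for (missing files and newer files), assuming it is enough to match files by their path relative to each root folder and compare LastWriteTime. Compare-FolderTree is a hypothetical name. It indexes one tree in a hashtable and streams the other, and it only reads directory metadata, never file contents. I have not verified whether enumerating a OneDrive folder this way avoids hydrating cloud-only files, so try it on a small folder first.

```powershell
# Hypothetical helper: report files in $Source that are missing from $Target,
# or newer than the same relative path in $Target. Only metadata is read.
function Compare-FolderTree {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Target
    )
    $src = (Resolve-Path -LiteralPath $Source).ProviderPath.TrimEnd('\', '/')
    $dst = (Resolve-Path -LiteralPath $Target).ProviderPath.TrimEnd('\', '/')

    # Index the target tree once: relative path -> FileInfo
    $index = @{}
    foreach ($f in Get-ChildItem -LiteralPath $dst -Recurse -File) {
        $index[$f.FullName.Substring($dst.Length)] = $f
    }

    # Stream the source tree and emit one object per difference
    foreach ($f in Get-ChildItem -LiteralPath $src -Recurse -File) {
        $rel = $f.FullName.Substring($src.Length)
        $other = $index[$rel]
        if (-not $other) {
            [pscustomobject]@{ Path = $rel; Status = 'Missing' }
        }
        elseif ($f.LastWriteTimeUtc -gt $other.LastWriteTimeUtc) {
            [pscustomobject]@{ Path = $rel; Status = 'Newer' }
        }
    }
}
```

For example, `Compare-FolderTree -Source 'D:\Data' -Target "$env:OneDrive\Data" | Export-Csv diff.csv -NoTypeInformation` (placeholder paths). Run it again with the arguments swapped to see differences in the other direction.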
Right now I have a program that automatically moves files from one folder to another, but only once.
So if the same file lands in that folder again, it shouldn't be moved. The application runs every 30 minutes, so what I have now is: if LastWriteTime is older than 30 minutes, don't move it.
# Collect the matching PDFs written within the last 30 minutes ($date holds the time 30 minutes ago)
$olderthan = @(Get-ChildItem -Path "$src\$_.pdf" | Where-Object { $_.LastWriteTime -ge $date })
if (-not $olderthan) {
    # Nothing written in the last 30 minutes, so move no file
    $timesall = @(Get-ChildItem -Path "$src\$_.pdf" | Select-Object -Property BaseName)
    write-LogRecord -Typ WARNING "'$($timesall.BaseName)' file(s) are not being moved because they're older than 30 minutes"
    $timesall = 0
} else {
    # Move file
}
And yes it works, but are there other, better ways to do it?
Thanks in advance!
The alternative to inspecting file attributes is to track which files you have already processed. I'll assume that the files do not continue to live in the destination folder (otherwise you can use Test-Path to see whether a file already exists there before moving it).
To me, the most straightforward tracking system would be to create a parallel folder where you put files with the same names. Assuming the file has not been submitted before, you would copy A.txt into your destination and create an A.txt in your tracking folder (which could be an empty file, or not; see below). Your test is then whether a file with the same name exists in your tracking folder.
Note: this method lets you easily reprocess a file by removing it from your tracking folder. It also keeps working when the scheduler does not fire, for whatever reason.
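A minimal sketch of the tracking-folder idea, assuming the files arrive in one flat folder and an empty marker file per name is enough. Move-Once and its parameters are hypothetical names; it moves rather than copies, to match the question.

```powershell
# Hypothetical helper: move each file from $Source to $Destination only if a
# marker with the same name does not yet exist in the $Tracking folder.
function Move-Once {
    param(
        [Parameter(Mandatory)] [string] $Source,       # folder the files arrive in
        [Parameter(Mandatory)] [string] $Destination,  # where new files are moved
        [Parameter(Mandatory)] [string] $Tracking      # parallel folder of markers
    )
    New-Item -ItemType Directory -Path $Tracking -Force | Out-Null

    foreach ($file in Get-ChildItem -LiteralPath $Source -File) {
        $marker = Join-Path $Tracking $file.Name
        if (Test-Path -LiteralPath $marker) {
            continue   # submitted before: leave it where it is
        }
        Move-Item -LiteralPath $file.FullName -Destination $Destination
        # Empty marker; deleting it later allows the file to be reprocessed
        New-Item -ItemType File -Path $marker | Out-Null
        $file.Name     # report what was moved
    }
}
```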
If you need more complex options, like accommodating a file that has changed, you could store fingerprint information, such as size and a hash, in your tracking file. Your test could then also inspect those.
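One possible marker format (an assumption, not the only choice) is the text "size hash", using Get-FileHash with SHA256. Test-AlreadyProcessed and Write-Marker are hypothetical helpers; Write-Marker has to be pointed at a readable copy of the file, so call it just before the move or with the moved file's new path.

```powershell
# Hypothetical helpers: store "<size> <SHA256>" in the marker and treat a file
# as already processed only if both still match.
function Get-Fingerprint {
    param([string] $FilePath)
    $item = Get-Item -LiteralPath $FilePath
    $hash = (Get-FileHash -LiteralPath $FilePath -Algorithm SHA256).Hash
    "$($item.Length) $hash"
}

function Write-Marker {
    param([string] $FilePath, [string] $MarkerPath)
    Set-Content -LiteralPath $MarkerPath -Value (Get-Fingerprint $FilePath)
}

function Test-AlreadyProcessed {
    param([string] $FilePath, [string] $MarkerPath)
    if (-not (Test-Path -LiteralPath $MarkerPath)) { return $false }
    # An empty marker reads as $null, so coerce to string before trimming
    $saved = "$(Get-Content -LiteralPath $MarkerPath -Raw)".Trim()
    return $saved -eq (Get-Fingerprint $FilePath)
}
```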
Lastly, at some point you'd probably want to groom your tracking folder. Using LastWriteTime to remove everything older than, say, one month (or whatever is right for your circumstances) would keep your tracking folder from getting too big. You could run this after every transfer, or on a separate schedule.
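The grooming step could look like this sketch; Clear-TrackingFolder is a hypothetical name, and the 30-day default just mirrors the "say, 1 month" above.

```powershell
# Hypothetical helper: delete marker files last written before the cutoff.
function Clear-TrackingFolder {
    param(
        [Parameter(Mandatory)] [string] $Tracking,
        [int] $Days = 30   # adjust to your circumstances
    )
    $cutoff = (Get-Date).AddDays(-$Days)
    Get-ChildItem -LiteralPath $Tracking -File |
        Where-Object { $_.LastWriteTime -lt $cutoff } |
        Remove-Item -Force
}
```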
I am currently trying to write a PowerShell script that does the following:
Go through all PDF files in the directory the script is in
Check the first few bytes of those PDF files
If those bytes say something along the lines of "PK", move the file to a different location
If the bytes say something else (e.g. PDF1.4), don't move it at all and go to the next one.
Context: We have around 70k PDF files that can't be opened. After checking them with a certain tool, it looks like around 99% of them are damaged and the remaining 1% are ZIP files.
The first bytes of a zipped PDF file are "PK"; the first bytes of a broken PDF file are, for example, PDF1.4.
I need to unzip all the ZIP files and relocate them. Going through 70k PDF files by hand is painful, so I'm looking for a way to automate it.
I know I'm supposed to provide a code sample, but the truth is that I am absolutely lost. I have written a few PowerShell scripts before, but I have no idea how to do something like this.
So if anyone could kindly point me in the right direction or give me a useful function, I would really appreciate it.
You can use Get-Content to read the beginning of each file (the code below reads its first line, which is where the header bytes are).
We can then tie that into a loop over all the documents and use a simple if statement to decide what to do next, e.g. move the file to another directory.
EDITED BASED ON YOUR COMMENT:
$pdfDirectory = 'C:\Temp\struktur_id_1225\ext_dok'
$newLocation = 'C:\Path\To\New\Folder'
Get-ChildItem $pdfDirectory -Filter "*.pdf" | ForEach-Object {
    # Read only the first line of the file, where the PDF header is
    if ((Get-Content $_.FullName -TotalCount 1) -like "%PDF-1.5*") {
        # Matching HL7 file: same name in the MDM folder, with a .hl7 extension
        $HL7 = $_.FullName.Replace("ext_dok", "MDM")
        $HL7 = $HL7.Replace(".pdf", ".hl7")
        Move-Item $_.FullName $newLocation
        Move-Item $HL7 $newLocation
    }
}
Try using the above, which is also a bit easier to edit.
$pdfDirectory will need to be set to the folder containing the PDF files.
$newLocation will obviously be the new directory!
And you will still need to change the -like "%PDF-1.5*" to suit your search!
It should do the rest for you; give it a shot.
Another Edit
I have mimicked your folder structure on my computer, and placed a few PDF files and matching HL7 files and the script is working perfectly.
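The question asks about the first bytes rather than the first line, so here is a sketch that reads raw bytes with System.IO.File.OpenRead and moves the files that start with the ZIP signature "PK" (0x50 0x4B). Get-FileSignature and Move-ZipDisguisedPdf are hypothetical names. It does not unzip anything; that would be a separate step.

```powershell
# Hypothetical helper: return the first $Count bytes of a file as ASCII text.
function Get-FileSignature {
    param([Parameter(Mandatory)] [string] $Path, [int] $Count = 4)
    $buffer = New-Object byte[] $Count
    $stream = [System.IO.File]::OpenRead($Path)   # pass a full path
    try {
        $read = $stream.Read($buffer, 0, $Count)
    }
    finally {
        $stream.Dispose()
    }
    # Decode only the bytes actually read (short files return fewer characters)
    [System.Text.Encoding]::ASCII.GetString($buffer, 0, $read)
}

# Hypothetical helper: move every *.pdf that is really a ZIP archive.
function Move-ZipDisguisedPdf {
    param(
        [Parameter(Mandatory)] [string] $Directory,
        [Parameter(Mandatory)] [string] $Destination
    )
    foreach ($pdf in Get-ChildItem -LiteralPath $Directory -Filter *.pdf -File) {
        # -ceq: case-sensitive, the ZIP signature is exactly "PK"
        if ((Get-FileSignature $pdf.FullName 2) -ceq 'PK') {
            Move-Item -LiteralPath $pdf.FullName -Destination $Destination
            $pdf.Name
        }
        # Anything else (e.g. "%PDF") is left where it is
    }
}
```

For the folder the script lives in, something like `Move-ZipDisguisedPdf -Directory $PSScriptRoot -Destination 'D:\zipped'` (placeholder destination).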
Get-Content is not suited for PDFs; you'd want to use iTextSharp to read them.
Download iTextSharp (found in Releases) and put itextsharp.dll somewhere easy to find (e.g. the folder your script is located in).
You can install the .nupkg by using Install-Package, or simply extract the contents of the .nupkg file with an archive tool (it's basically a .zip file).
The code below adds every word on page 1 of each PDF, separated by whitespace, to an array. You can then test whether the array contains your keyword.
Add-Type -Path "C:\path\to\itextsharp.dll"
$pdfs = Get-ChildItem "C:\path\to\pdfs" -Filter *.pdf
foreach ($pdf in $pdfs) {
    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $pdf.FullName
    # Extract the text of page 1; Split() with no arguments splits on whitespace
    $words = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, 1).Split()
    $reader.Close()
    foreach ($word in $words) {
        # do your test here
    }
}
Using Powershell to Strip Content from PDF While Keeping PDF Format.
My Task:
I have been attempting to perform what would be a simple task if the documents were not in PDF format. I have a bunch of PDFs that have unwanted data before the bulk of the usable data starts; this is anything that comes before ‘%PDF’ in the documents. I needed a script that pulls all the desired data and exports it to a new file. That part was super easy.
The Problem:
The exported data appears to be formatted correctly, except that it doesn’t open as a PDF anymore. I can open it in Notepad++ and it looks identical to one that was cleaned manually and works. Examining the raw code of the PowerShell-altered PDF, the ‘lines’ appear to be much shorter than they should be.
$Path = 'C:\FileLocation'
$Output = '.\MyFile.pdf'
$LineArr = @()
$Target = Get-ChildItem -Path $Path -Filter *.pdf -Recurse -ErrorAction SilentlyContinue | Get-Content -Encoding default | Out-String -stream
$Target.Where({ $_ -like '*%PDF*' }, 'SkipUntil') | ForEach-Object{
If ($_.contains('%PDF')){
$LineArr += "%" + $_.Split('%')[1]
}
else{
$LineArr += $_
}
}
$LineArr | Out-File -Encoding Default -FilePath $Output
I understand the PDF format doesn't really use lines, so that might be where the problem is being created: the PDF format is probably being broken either when the data is initially put into an array or when it is being written. Is there a way to retain the PDF format while it is modified and then saved? I'm probably missing something simple.
So I was about to start looking at iTextSharp and decided to give an older language, WinBatch, a try first. (Bleh!) I almost made a screen scraper to do the work, but the shame of taking that route got the better of me, so the function library was the next stop.
This is just a little blurb I spit out, with no error checking or logging at this point. All that will be added later, along with file searches. All in all, it manages to clear all the unwanted extras from the PDF while keeping the exact format that PDFs require.
strPDFdoco = "C:\TestPDFs\Test.pdf"
strPDFString = "%%PDF"
strPDFendString = "%%%%END"
If FileExist(strPDFdoco)
strPDFName = ItemExtract(-1, strPDFdoco, "\")
strFixedPDFFullPath = ("C:\TestPDF\Fixed\": strPDFName)
strCurrentPDFFileSize = FileSize(strPDFdoco) ; Get size of PDF file
hndOldPDFFile = BinaryAlloc(strCurrentPDFFileSize) ; Allocate memory for reading PDF file
BinaryRead(hndOldPDFFile, strPDFdoco) ; Read PDF file
strStartIndex = BinaryIndexEx(hndOldPDFFile, 0, strPDFString, #FWDSCAN, #FALSE) ; Find start point for copy
strEndIndex = BinaryEodGet(hndOldPDFFile) ; find eof
strCount = strEndIndex - strStartIndex
strWritePDF = BinaryWriteEx( hndOldPDFFile, strStartIndex, strFixedPDFFullPath, 0, strCount)
BinaryFree(hndOldPDFFile)
ENDIF
Now that I have an idea how this works, making a tool to do this in PS sounds more doable. There's a PS function out there in the wild called Get-HexDump that might be a good base for educating myself on bits and hex in PS. Since this works in WinBatch, I assume there is some sort of equivalent in AutoIt, and it could be reproduced in most basic languages.
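A possible PowerShell equivalent of the WinBatch routine above, sketched with .NET's System.IO.File rather than Get-HexDump: read the file as a byte array, find the first "%PDF", and write everything from there to the end of the file. Because the data never passes through Get-Content/Out-File as text, nothing is split into lines or re-encoded, which is the likely cause of the broken output in the question. Remove-PdfPrefix is a hypothetical name. Pass full paths, since .NET resolves relative paths against the process working directory rather than the current PowerShell location. The byte-by-byte scan is slow in PowerShell if the junk before the header is large.

```powershell
# Hypothetical helper: copy $Path to $OutputPath starting at the first "%PDF",
# byte for byte. Returns the number of junk bytes removed.
function Remove-PdfPrefix {
    param(
        [Parameter(Mandatory)] [string] $Path,        # PDF with junk before %PDF
        [Parameter(Mandatory)] [string] $OutputPath   # cleaned copy
    )
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    $marker = [System.Text.Encoding]::ASCII.GetBytes('%PDF')

    # Linear scan for the first occurrence of the marker
    $start = -1
    for ($i = 0; $i -le $bytes.Length - $marker.Length; $i++) {
        $match = $true
        for ($j = 0; $j -lt $marker.Length; $j++) {
            if ($bytes[$i + $j] -ne $marker[$j]) { $match = $false; break }
        }
        if ($match) { $start = $i; break }
    }
    if ($start -lt 0) { throw "No %PDF header found in $Path" }

    # Write from the marker to the end of the file, unchanged
    $stream = [System.IO.File]::Create($OutputPath)
    try { $stream.Write($bytes, $start, $bytes.Length - $start) }
    finally { $stream.Dispose() }
    $start
}
```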
There appear to be a lot of people out there trying to clear crud from before the header and after the end of their PDF docos. Hopefully this helps; I've got half a million to hit with whatever script I morph this into. I might update this with a PS version if I decide to go that route again, and if I remember.
I have a folder with x number of web log files, and I need to prep them for bulk import into SQL.
For that I have to run preplog.exe on each one of them.
I want to create a PowerShell script to do this for me. The problem I'm having is that preplog.exe has to be run in CMD, and I need to enter the input path and the output path.
For Example:
D:\>preplog c:\blah.log > out.log
I've been playing with Foreach, but I haven't had any luck.
Any pointers will be much appreciated.
I would guess...
Get-ChildItem "C:\Folder\MyLogFiles" | Foreach-Object { preplog $_.FullName | Out-File "preplog.log" -Append }
FYI, it is good practice on this site to post your non-working code so at least we have some context. Here I assume you're logging into one file in the current directory.
Additionally, you've said you need to run it in CMD but you've tagged PowerShell; it pays to be specific. I've assumed PowerShell because it's a LOT easier to script.
I've also had to assume that the folder contains ONLY your log files; otherwise you will need to include a Where-Object statement to filter the items.
In short, I've made a lot of assumptions, which means this may not be an accurate answer, so keep all this in mind for your next question =)
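If you want one output file per log, like the CMD example, rather than a single combined log, a sketch along these lines could work. It assumes preplog.exe is on PATH and writes its result to standard output; Invoke-PrepLog and the .prepped.log suffix are hypothetical. -Encoding ascii is also an assumption: Windows PowerShell's Out-File writes UTF-16 by default, whereas CMD's > does not re-encode, so pick whatever your SQL import expects. -Filter *.log takes care of non-log files in the folder.

```powershell
# Hypothetical helper: run preplog on every *.log in $LogFolder and write
# each result to its own file in $OutputFolder.
function Invoke-PrepLog {
    param(
        [Parameter(Mandatory)] [string] $LogFolder,
        [Parameter(Mandatory)] [string] $OutputFolder
    )
    # @() collects the file list first, so new output files are never picked up
    foreach ($log in @(Get-ChildItem -LiteralPath $LogFolder -Filter *.log -File)) {
        $outFile = Join-Path $OutputFolder ($log.BaseName + '.prepped.log')
        # Equivalent of: preplog <input> > <output>
        preplog $log.FullName | Out-File -FilePath $outFile -Encoding ascii
        $outFile
    }
}
```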