Tool/Script to Automate File Naming From Metadata - powershell

I have a folder containing 1000+ PDF files with generic names (i.e. DOC (1)) that I need to rename with the order number, which is listed inside the file itself. I've had to manually open each document, look for the order number, then change the file name.
I have a spreadsheet list of all of the order numbers contained in that folder. I've used the search function of the file explorer to look up each individual order #, then renaming whichever file comes up. This has reduced the task time substantially but it's still very time-consuming.
Does anyone know of a macro or scrip that I can build to automate the task? Again, I would need something to grab the order number line by line from the spreadsheet, then go to the file explorer to search for any files with that specific term, and finally rename whichever file comes up with said term.
Any and all relevant help is greatly appreciated.
Thank you in advance for your time!

Related

Rename a group of .txt files based on content that appears after a specific string in the text file?

I have a folder with hundreds of text files in it. The file names have zero meaning. I am trying to extract a variable number that appears after a string in each file and and rename the files using it. I'm somewhat of a Powershell newbie (an old timer form the DOS world) but I have written many useful scripts with it. I've searched high an low, this one has me stumped.
Any and all suggestions are welcome, or I'd happily pay someone for the snippet of code. -Ed

how to move file after reading the file in ibm datastage

i have 1 folder which has 4 files, they are sales_jan, sales_feb, debt_jan, debt_feb.I created specific job for each sales and debt. The thing is, if i already run the job previously for sales_jan only and then there comes sales_feb after that, i dont wanna repeat reading the sales_jan again, i only want to read the newest file added that hasn't been processed. For reading the file, i pass the pattern of the specific file (ex. sales_*) but if i use it like that, then the stage will reprocessed the sales_jan again although it already has. I want to move the file already been read into another folder. How do i exactly do it in ibm datastage? if there's no way to do it, what's your suggestion for my problem. Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
Since you're using wildcards for the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per-file.
you might make a flag for every file which already read by your job. For example add a maxdate field for each file. When the first file max date is less than the second file or new file Then read the latest file. It can be done by using simple linux command in sequence or tranformer. Just like Ray mentioned before

Is it possible to extract metadata such as Content Created date from files - I can't get this with PowerShell

I need to extract the "Content Created" date out of thousands of files, but haven't been able to find a way to do this using PowerShell / other Command Line utility.
Does someone out there know a way to obtain this metadata? If so, please can you advise me. Thanks.
I've looked at various resources online, including this site, but haven't been successful thus far.
Here's a screenshot explaining what I'm trying to do.
I've been unable to find a native powershell cmdlet which does what you want. However, I found this article: Use PowerShell to Find Metadata from Photograph Files and the script it used: get file meta data function.
The article talks about image files, but the function is not specific for image files.
I tested it out on a folder containing a Word and an Excel file and the returned Metadata from the Word file contains the Content Created date. The Excel file does not contain/return that value. This is not unexpected as the Details tab of properties for the Excel file does not contain a Content Created value so it seems to be specific for Word files, and maybe some other file or document types.
Update:
You write that you need to extract this info from thousands of files, but if those files are anything but Word-files you probably won't be able to do that.
As far as I can tell this should work with the file types exposing the type of metadata you want. However, it seems that the ContentCreated property is unique to Word. I tried adding a text file (.txt), Acrobat PDF (.pdf), MS Access (.mdb), Excel (.xlxs) and a Word doc (.docx) file to my test folder and the only one that has/returns that metadata property is the Word file.
You should also be aware that the script seems to return metadata localized, so for me to programatically get the info i wanted I had to pipe the output of the script to Select-Object -Property Name,'InnehÄll skapat' (which is the Swedish name for Content created). So if you're running on a non-english system you may need to check what the output looks like before creating your Select-Object statement.
PowerQuery in Excel 2013 or later (data tab). Connect to data> Folder.

Extracting file names from an online data server in Matlab

I am trying to write a script that will allow me to download numerous (1000s) of data files from a data server (e.g, http://hydro1.sci.gsfc.nasa.gov/thredds/catalog/GLDAS_NOAH10SUBP_3H/2011/345/). Unfortunately, the names of the files in each directory are not formatted in a similar way (the time that they were created were appended to the end of the file name). I need to be able to specify the file name to subset the data (I have a special tool for these data types) and download it. I cannot find a function in matlab that will extract the file names.
I have looked at URLREAD, but it downloads everything including html code.
Thanks for your help!
You can easily parse the link.
x=urlread(url)
links=regexp(x,'<a href=''([^>]+)''>','tokens')
Reads every link, you have to filter all unwanted links.
For example this gets all grb files:
a=regexp(x,'<a href=''([^>]+.grb)''>','tokens')

C# folder and subfolder

Upon numerous searches, I am here to see if someone has any idea on how I should go about tackling this issue.
I have a folder with sub-folders. The sub-folder containers each has files of different file types e.g. pdf, png, jpeg, tiff, avi and word documents.
My goal is to write a code in C# that will go into the subfolder, and combined all the files into one pdf using the name of the folder. The only exception is that a file such as avi will not be pdf'ed in which case I want a nudge as to which folder it is and possibly file name. I am trying to use the form approach, so that you can copy in the folder pathname and also destination of the created pdf.
Thanks.
to start, create a FolderBrowserDialog to get the root folder. Alternatively just make a textbox in which you paste the folder name ( less preferred since the first method gives you nicer error-handling straight out of the box )
In order to iterate through, see How to: Iterate Through a Directory Tree
To find the filetype, check System.IO.FileInfo.Extension for each file you iterate through. Add those to list with the data you need. ( hint, create a list of objects in which your object reflects the data you need such as path, type etc,... ). If its an avi don't toss it in the list but flash a warning (messagebox?) instead.
From here the original question gets fuzzy. What exactly do you need in the pdf. Just the filenames and locations or do you actually want to throw the actual contents of the file in pdf?