Does exiftool require the complete file for extracting metadata - exiftool

This question is about extraction of metadata only.
Is it required for exiftool to get a complete file for propperly working?
Scenario:
I want to extract the metadata of a 20 GB video file. Do I need to provide exiftool with the complete file (via stdin), or is it enough to provide it with a certain amount of bytes.
Motivation:
I am programatically (golang) calling exiftool in a streaming context and want to have the extraction as fast as possible. Magic numbers for filetypes work with the first 33 bytes and I am wondering if that is possible with the exiftool metadata as well.

The answer depends upon the file and the location of the metadata within that file.
There are a couple of threads on the subject on the ExifTool forums (link 1, link 2) and Phil Harvey, the author, says that about half the time the in the case of MP4/MOV videos, the metadata is at the end of the file.
Using the -fast option might help. I've done some quick tests using cURL and a large image file (see the second to the last example under Piping Examples) and in that case cURL didn't download the whole image file, just enough to extra the metadata. It might be different with a video file though, as I haven't tested that situation.

Related

Is it possible to extract metadata such as Content Created date from files - I can't get this with PowerShell

I need to extract the "Content Created" date out of thousands of files, but haven't been able to find a way to do this using PowerShell / other Command Line utility.
Does someone out there know a way to obtain this metadata? If so, please can you advise me. Thanks.
I've looked at various resources online, including this site, but haven't been successful thus far.
Here's a screenshot explaining what I'm trying to do.
I've been unable to find a native powershell cmdlet which does what you want. However, I found this article: Use PowerShell to Find Metadata from Photograph Files and the script it used: get file meta data function.
The article talks about image files, but the function is not specific for image files.
I tested it out on a folder containing a Word and an Excel file and the returned Metadata from the Word file contains the Content Created date. The Excel file does not contain/return that value. This is not unexpected as the Details tab of properties for the Excel file does not contain a Content Created value so it seems to be specific for Word files, and maybe some other file or document types.
Update:
You write that you need to extract this info from thousands of files, but if those files are anything but Word-files you probably won't be able to do that.
As far as I can tell this should work with the file types exposing the type of metadata you want. However, it seems that the ContentCreated property is unique to Word. I tried adding a text file (.txt), Acrobat PDF (.pdf), MS Access (.mdb), Excel (.xlxs) and a Word doc (.docx) file to my test folder and the only one that has/returns that metadata property is the Word file.
You should also be aware that the script seems to return metadata localized, so for me to programatically get the info i wanted I had to pipe the output of the script to Select-Object -Property Name,'InnehÄll skapat' (which is the Swedish name for Content created). So if you're running on a non-english system you may need to check what the output looks like before creating your Select-Object statement.
PowerQuery in Excel 2013 or later (data tab). Connect to data> Folder.

Troubleshooting "no writeable tags set" error

I'm trying to (ultimately) modify a batch of files but getting stuck in the basics as I try to modify a single file before running a batch command.
If someone could help me troubleshoot the command I'm inputting, that would be fantastic. I'm sure it's something very simple.
Thanks a lot for any help you can provide!
Here's the abbreviated image exif data:
-ExifToolVersion=10.10
-FileName=2018_11_13_1.jpeg
-Directory=.
-FileSize=2.8 MB
-FileModifyDate=2019:07:12 15:40:38-07:00
-FileAccessDate=2019:07:12 15:40:38-07:00
-FileInodeChangeDate=2019:07:23 10:38:02-07:00
-FilePermissions=rw-rw-r--
-FileType=JPEG
-FileTypeExtension=jpg
-MIMEType=image/jpeg
[...]
-ModifyDate=2018:11:13 12:00:53
[...]
-DateTimeOriginal=2018:11:13 12:00:53
-CreateDate=2018:11:13 12:00:53
My current input is: exiftool "-FileModifyDate<$filename00000" ./2018_11_13_1.jpeg
And the error message is:
Warning: No writable tags set from 2018_11_13_1.jpeg
0 image files updated
1 image files unchanged
And the exif data is, of course, unchanged.
I've confirmed that I can write a value to this tag, so there's definitely something going wrong in pulling from the filename.
( Continued from How to compensate for incomplete date/time info in filename )
The problem here is that you are trying to write from a tag named filename00000. If you check the example in the other post, you will see that there is a space after Filename. This sets it apart so that exiftool knows which is a tag name and which is other data.
There is possibly an additional problem here, though. Your filename has an extra number that is not the date. When exiftool tries to write the time stamp from the filename, it is going to end up with a value of "2018:11:13 10:00:00", which might become especially problematic if that last digit hits a value of 3 or more, resulting in a timestamp of "2018:11:13 30:00:00".
I would suggest using exiftool's Advanced Formatting Feature (a fancy way of saying that you can use perl code in the command) to strip the excess data. Something like
exiftool "-FileModifyDate<${filename;s/^(.*\d{4}_\d\d_\d\d).*/$1/} 000000" ./2018_11_13_1.jpeg
Though take note, if the filenames are in any other format, then it would require a different command.

Extracting file names from an online data server in Matlab

I am trying to write a script that will allow me to download numerous (1000s) of data files from a data server (e.g, http://hydro1.sci.gsfc.nasa.gov/thredds/catalog/GLDAS_NOAH10SUBP_3H/2011/345/). Unfortunately, the names of the files in each directory are not formatted in a similar way (the time that they were created were appended to the end of the file name). I need to be able to specify the file name to subset the data (I have a special tool for these data types) and download it. I cannot find a function in matlab that will extract the file names.
I have looked at URLREAD, but it downloads everything including html code.
Thanks for your help!
You can easily parse the link.
x=urlread(url)
links=regexp(x,'<a href=''([^>]+)''>','tokens')
Reads every link, you have to filter all unwanted links.
For example this gets all grb files:
a=regexp(x,'<a href=''([^>]+.grb)''>','tokens')

Details of header info for a video file format

I need to append 2 video files by appending their NSMutableData as I have already done this with audio files and it is done correctly but not with video files.
It may be because data bytes contain some header info and I will need to remove these bytes from the 2nd video but I don't know that how many bytes should I remove?
You haven't told us the file format in question, but generally:
Look up the specification for the file format in question and find out what the header looks like. Then code a solution based on this information.
There are many resources on the net with file format information, but one place you might want to look is http://www.wotsit.org/.

How to read a file from the disk if less than X days old, if older, refetch the html file

I wish to read an html file off of the internet and cache it. Then when I go back, because I'm debugging, I don't want to hammer the servers with the numerous requests I'll need. I don't want to get my IP banned for slamming the server over and over again just because I'm debugging. So my code needs to look something like:
if ((file > days_old) || !(file exists))
fetch html file from internet
save file to disk
else
read it from the disk
Because there will be multiple files, I'll need to include a variable name in the file name so the file is unique and I can easily look it up again.
I just learned Perl this semester and we only learned the basics & a bit of regex, once I get this I should be mostly fine.
Thanks!
Use an existing module:
Cache::Cache
HTTP::Cache::Transparent
If you really want to implement your own, you'll want to look at the If-Modified-Since and ETag HTTP headers to determine when to re-fetch a file, rather than an arbitrary days_old number you suck out of your thumb. You will also have to generate a unique filename, preferably with a hash function, while retaining the original URL to cater for hash collisions.