I am learning how PNG works and trying to create a simple PNG decoder in "pure" C++.
My problem is that I don't know how zlib headers are stored across multiple IDAT chunks in a PNG. The first IDAT chunk looks fine - a normal "CM" and "CINFO" - but when I read the next IDAT chunk the zlib header looks strange: the "CM" can be a random number rather than the default 8, and the "CINFO" can be above 7, which I have read marks the stream as corrupted/not acceptable. So where can I find some information about this? I didn't find anything about handling multiple IDAT chunks on the web. (Well, I did find something here on Stack Overflow, but it doesn't describe how zlib headers are stored in multiple IDAT chunks, so it doesn't answer my question.)
I have read RFC 1950 about zlib:
https://www.rfc-editor.org/rfc/rfc1950
There is only one zlib header, in the first chunk. The series of IDAT chunks is a single zlib stream, broken up into pieces.
You need to read the PNG spec more carefully.
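To make that concrete, here is a minimal sketch (assuming zlib itself does the decompression and you have already parsed out the IDAT chunk payloads): the payloads of all IDAT chunks are concatenated in file order and handed to a single inflate session, so the only zlib header is the first two bytes of the combined stream.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <zlib.h>

// Inflate the concatenated payloads of all IDAT chunks as ONE zlib stream.
// `idat_payloads` holds the raw data of each IDAT chunk, in file order.
std::vector<uint8_t> inflate_idat(const std::vector<std::vector<uint8_t>>& idat_payloads)
{
    // Concatenate first: IDAT chunk boundaries have no meaning to zlib.
    std::vector<uint8_t> compressed;
    for (const auto& p : idat_payloads)
        compressed.insert(compressed.end(), p.begin(), p.end());

    z_stream zs{};                       // zero-init sets zalloc/zfree/opaque to null
    if (inflateInit(&zs) != Z_OK)
        throw std::runtime_error("inflateInit failed");

    zs.next_in  = compressed.data();
    zs.avail_in = static_cast<uInt>(compressed.size());

    std::vector<uint8_t> out;
    uint8_t buf[16384];
    int ret;
    do {
        zs.next_out  = buf;
        zs.avail_out = sizeof(buf);
        ret = inflate(&zs, Z_NO_FLUSH);
        if (ret != Z_OK && ret != Z_STREAM_END) {
            inflateEnd(&zs);
            throw std::runtime_error("inflate failed");
        }
        out.insert(out.end(), buf, buf + (sizeof(buf) - zs.avail_out));
    } while (ret != Z_STREAM_END);

    inflateEnd(&zs);
    return out;   // filtered scanlines; the PNG per-scanline filters still need undoing
}
```

You could also feed each IDAT payload to inflate() incrementally instead of concatenating; the important point is that inflateInit() is called once, so the zlib header is read only once, at the start of the first chunk's data.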
Related
In the PNG spec, uncompressed blocks include two pieces of header information:
LEN is the number of data bytes in the block. NLEN is the one's complement of LEN.
Why would the file include the one's complement of a value? How would this be used and/or for what purpose?
Rather than inventing a new compression type for PNG, its authors decided to use an existing industry standard: zlib.
The link you provided does not point to the official PNG specification at http://www.w3.org/TR/PNG/ but only to one part of it: the DEFLATE compression scheme. NLEN is not mentioned in the official spec; it only says that the default compression is done according to zlib (https://www.rfc-editor.org/rfc/rfc1950), and therefore DEFLATE (https://www.rfc-editor.org/rfc/rfc1951).
As to "why": zlib predates present-day high-speed internet connections, and at the time it was invented, private internet communication was still done over voice-line modems. Only a few institutions could afford dedicated landlines just for data; the rest of the world was connected via dial-up. Because of this, data transmission was highly susceptible to corruption. A corrupted simple text document might still be usable, but in compressed data literally every single bit counts.
Apart from outright data corruption, a dumb (or badly configured) transmission program might try to interpret certain bytes, for instance changing Carriage Return (0x0D) into Newline (0x0A), which was a common option at the time. "One's complement" is the inversion of every single bit, from 0 to 1 and vice versa. If either LEN or NLEN happened to be garbled or changed by the transmission software, its one's complement would no longer match.
Effectively, the presence of both LEN and NLEN doubles the level of protection against transmission errors: if they do not match, there is an error. This adds another layer of error checking on top of zlib's ADLER32 and PNG's own per-chunk CRC.
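To make that concrete, here is a hypothetical check (not taken from any particular decoder) showing how LEN/NLEN would be validated when reading a stored (BTYPE=00) DEFLATE block; `p` is assumed to point at the byte-aligned four-byte header:

```cpp
#include <cstdint>
#include <stdexcept>

// Validate the LEN/NLEN pair at the start of a stored (uncompressed) DEFLATE block.
// Both fields are 16-bit little-endian values; `p` points at the first header byte.
uint16_t stored_block_length(const uint8_t* p)
{
    uint16_t len  = static_cast<uint16_t>(p[0] | (p[1] << 8));
    uint16_t nlen = static_cast<uint16_t>(p[2] | (p[3] << 8));

    // NLEN must be the one's complement of LEN; anything else means corruption.
    if (nlen != static_cast<uint16_t>(~len))
        throw std::runtime_error("stored block: LEN/NLEN mismatch");

    return len;  // number of literal data bytes that follow
}
```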
Using perl, what is the best way to determine whether a file is a PDF?
Apparently, not all PDFs start with %PDF. See the comments on this answer: https://stackoverflow.com/a/941962/327528
Detecting a PDF is not hard, but there are some corner cases to be aware of.
All conforming PDFs contain a one-line header identifying the PDF specification to which the file conforms. Usually it's %PDF-1.N where N is a digit between 0 and 7.
The third edition of the PDF Reference has an implementation note that Acrobat viewers require only that the header appears within the first 1024 bytes of the file. (I've seen some cases where a job-control prefix was added to the start of a PDF file, so '%PDF-1.' wasn't the first seven bytes of the file.)
A subsequent implementation note from the third edition (PDF 1.4) states that Acrobat viewers will also accept a header of the form %!PS-Adobe-N.n PDF-M.m, but note that this isn't part of the ISO 32000:2008 (PDF 1.7) specification.
If the file doesn't begin immediately with %PDF-1.N, be careful: I've seen a case where a zip file containing a PDF was mistakenly identified as a PDF because that part of the embedded file wasn't compressed, so a check for the PDF file trailer is a good idea.
The end of a PDF will contain a line with '%%EOF'. The third edition of the PDF Reference has an implementation note that Acrobat viewers require only that the %%EOF marker appears within the last 1024 bytes of the file.
Two lines above the %%EOF should be the 'startxref' token, and the line in between should be a number giving the byte offset from the start of the file to the last cross-reference table.
In sum, read the first and last 1 KB of the file into a byte buffer, check that the relevant identifying byte-string tokens are approximately where they are supposed to be, and if they are, then you have a reasonable expectation that you have a PDF file on your hands.
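For illustration, a rough sketch of that first-and-last-1 KB check (written in C++ here rather than Perl, but the same seek-and-scan logic carries over directly to Perl's open/seek/read; tighten the token checks as you see fit):

```cpp
#include <algorithm>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Heuristic only: "%PDF-" somewhere in the first 1 KB and "%%EOF"
// somewhere in the last 1 KB. Not a full validation.
bool looks_like_pdf(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;

    std::vector<char> head(1024);
    f.read(head.data(), static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(f.gcount()));

    f.clear();
    f.seekg(0, std::ios::end);
    std::streamoff size = f.tellg();
    f.seekg(size > 1024 ? size - 1024 : 0);

    std::vector<char> tail(1024);
    f.read(tail.data(), static_cast<std::streamsize>(tail.size()));
    tail.resize(static_cast<std::size_t>(f.gcount()));

    auto contains = [](const std::vector<char>& buf, const char* token) {
        return std::search(buf.begin(), buf.end(), token, token + std::strlen(token)) != buf.end();
    };
    return contains(head, "%PDF-") && contains(tail, "%%EOF");
}
```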
The module PDF::Parse has a method called IsaPDF, which
Returns true, if the file could be parsed and is a PDF-file.
I have a bunch (about 1200) of jpg/jpeg files, which have a filename pattern of IMG-YYYYMMDD-WA####.jpg or .jpeg. None of them have any EXIF data. I would like to (batch) add EXIF dates (created, modified, ...) using the date pattern in the filename. Time doesn't really matter to me.
I have searched this (and other) forums, but I cannot find anything related to ADDING these dates to JPEG files. I was hoping someone here could help me out.
EDIT: Using Linux (Mint 17.1)
This should not be difficult to write. What you need to create is a filter that:
Removes the existing JPEG file APPn header
Inserts an EXIF header with the date.
You would not need to mess with the compressed data at all. You're going to need to read a bit of the JPEG standard, just enough to get an idea of the block structure. Do a byte-by-byte copy until you hit an APPn marker. The APPn markers have byte counts, so you know how much to skip over. Insert your own EXIF marker into the stream, then copy the rest of the data.
You'll need to read the EXIF standard to figure out how to format the header.
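To give a feel for that block structure, here is a rough sketch (illustration only: it just lists the segments and does not build the EXIF payload, which you would still have to format according to the EXIF/TIFF spec) of walking the markers up to the start of the compressed data:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Walk the JPEG segment headers up to start-of-scan (SOS).
// Each segment is: 0xFF, a marker byte, then a 2-byte big-endian length that
// counts the two length bytes plus the payload (but not the marker itself).
// A new EXIF APP1 segment would be inserted right after SOI, and any
// existing APPn segments can be skipped instead of copied.
void list_segments(const std::vector<uint8_t>& jpeg)
{
    if (jpeg.size() < 2 || jpeg[0] != 0xFF || jpeg[1] != 0xD8) {
        std::printf("not a JPEG (no SOI marker)\n");
        return;
    }
    std::size_t pos = 2;  // skip SOI
    while (pos + 4 <= jpeg.size() && jpeg[pos] == 0xFF) {
        unsigned marker = jpeg[pos + 1];
        unsigned len = (unsigned(jpeg[pos + 2]) << 8) | jpeg[pos + 3];
        std::printf("marker 0xFF%02X, %u payload bytes at offset %zu\n",
                    marker, len - 2, pos + 4);
        if (marker == 0xDA)  // SOS: entropy-coded data follows
            break;
        pos += 2 + len;      // 2 marker bytes + length-prefixed segment
    }
}
```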
I'm trying to read the binary content of a text file (which contains the compressed version of a different text file). The first two characters (01111011 and 00100110) are correct (going by the values that were originally put there during the compression process).
However, when it gets to the third character, it should be reading 10010111 (again, going by what was added during the compression process), but instead it reads 10000000010100 (aka 8212). Does anyone know what is causing this discrepancy, or how to fix it? Thanks!
The Java FileReader should not be used to read binary data from files, since it reads a character at a time using the default encoding (which is most likely not very good for binary reading).
Instead, use FileInputStream, which has read methods that read actual raw bytes without any encoding applied.
I have a file that only contains the mdat atom in a MP4 container. The data in the mdat contains AVC data. I know the encoding parameters for the data. The format does not appear to be in the Annex B byte stream format. I am wondering how I would go about parsing this. I have tried searching for the slice header, but have not had much luck.
Is it possible to parse the slices without the NAL's?
AVC NAL units are stored in the following format in the mdat section:
[4 bytes] = NAL length, network order;
[NAL bytes]
In short, the start codes are simply replaced by lengths.
Be careful! The NAL length field is not required to be 4 bytes! The AvcConfigurationBox ('moov/trak/mdia/minf/stbl/stsd/avc1/avcC') contains a field 'lengthSizeMinusOne' specifying the size of the length field (the field is lengthSizeMinusOne + 1 bytes long), but 4 is the usual value.
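For what it's worth, a minimal sketch of walking that layout (assuming a 4-byte length field, i.e. lengthSizeMinusOne == 3); as noted below, you still need the SPS/PPS from 'avcC' to actually interpret the slice headers:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Walk the length-prefixed NAL units in an 'mdat' payload.
// Assumes a 4-byte big-endian length field; in general the field is
// lengthSizeMinusOne + 1 bytes, as given in the 'avcC' box.
void walk_nal_units(const std::vector<uint8_t>& mdat)
{
    std::size_t pos = 0;
    while (pos + 4 <= mdat.size()) {
        uint32_t nal_len = (uint32_t(mdat[pos])     << 24) |
                           (uint32_t(mdat[pos + 1]) << 16) |
                           (uint32_t(mdat[pos + 2]) << 8)  |
                            uint32_t(mdat[pos + 3]);
        pos += 4;
        if (nal_len == 0 || nal_len > mdat.size() - pos)
            break;                               // truncated, or data is not length-prefixed
        unsigned nal_type = mdat[pos] & 0x1F;    // low 5 bits of the first NAL byte
        std::printf("NAL unit: type %u, %u bytes\n", nal_type, nal_len);
        pos += nal_len;
    }
}
```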
I found what michael was talking about defined in section 5.2.3 of ISO 14496-15.
Sebastian's answer refers to section 5.2.4.1.1 and 5.3.4.1.2.
You will not be able to parse the slices in the 'mdat' box without copies of the SPS and PPS from the 'avcC' box (defined in section 5.2.4.1.1).