conversion from jpeg to png algorithm - png

On a linux machine, you can change file name from "abc.jpeg" to "abc.png" by just doing rename. I am still able to open my pic in image viewer.
I wanted to know if the compression technique also changes when we change the name, or just the name changes and the image viewer itself opens it the way it likes?

The file name has nothing whatsoever to do with the contents of a file stream. Every decoder I have ever encountered will decode the stream based upon its contents; and not the file name. Some decodes will warn if the file name does not correspond with the stream type.
To your question, renaming does not change the contents of the file on any system I have ever seen.

As pointed out by yacc in comments, renaming that file won't change the compression method, if we want a true png we will have to convert it.
The hex signature of image/file is checked before opening it in the image viewer. Changing name does not change this signature and hence, even though your file name is .png, it is actually a jpeg and your viewer knows it by the hex signature.
https://en.wikipedia.org/wiki/List_of_file_signatures

Related

doc or docx: Is there safeway to identify the type from 'requests' in python3?

1) How can I differentiate doc and docx files from requests?
a) For instance, if I have
url='https://www.iadb.org/Document.cfm?id=36943997'
r = requests.get(url,timeout=15)
print(r.headers['content-type'])
I get this:
application/vnd.openxmlformats-officedocument.wordprocessingml.document
This file is a docx.
b) If I have
url='https://www.iadb.org/Document.cfm?id=36943972'
r = requests.get(url,timeout=15)
print(r.headers['content-type'])
I get this
application/msword
This file is a doc.
2) Are there other options?
3) If I save a docx file as doc or vice-versa may I have recognition problems (for instance, for conversion to pdf?)? Is there any kind of best practice for dealing with this?
The mime headers you get appear to be the correct ones: What is a correct mime type for docx, pptx etc?
However, the sending software can only go on what file its user selected – and there still are a lot of people sending files with the wrong extension. Some software can handle this, others cannot. To see this in action, change the name of a PNG image to end with JPEG instead. I just did on my Mac and Preview still is able to open it. When I press ⌘+I in the Finder it says it is a JPEG file, but when opened in Preview it gets correctly identified as a "Portable Network Graphics" file. (Your OS may or may not be able to do this.)
But after the file is downloaded, you can unambiguously differ between a DOC and a DOCX file, even if the author got its extension wrong.
A DOC file starts with a Microsoft OLE Header, which is quite complicated structure. A DOCX file, on the other hand, is a compound file format containing lots of smaller XML files, compressed together using a standard ZIP file compression. Therefore, this file type always will start with the two characters PK.
This check is compatible with Python 2.7 and 3.x (only one needs the decode):
import sys
if len(sys.argv) == 2:
print ('testing file: '+sys.argv[1])
with open(sys.argv[1], 'rb') as testMe:
startBytes = testMe.read(2).decode('latin1')
print (startBytes)
if startBytes == 'PK':
print ('This is a DOCX document')
else:
print ('This is a DOC document')
Technically it will confidently state "This is a DOC document" for anything that does not start with PK, and, conversely, it will say "This is a DOCX document" for any zipped file (or even a plain text file that happens to start with those two characters). So if you further process the file based on this decision, you may find out it's not a Microsoft Word document after all. But at least you will have tried with the proper decoder.

merging PDFs with Ghostscipt ignoring outline and using pdfmark instead

I am using a Batch script to merge different PDFs in one complete file.
%gsc% -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=letter -dEPSFitPage -o %dsk%%zus%%ext% %mfd% %pth%tmp\pdfmarks
%dsk%%zus%%ext%: Path and name of final (complete) document
%mfd%: Path and name of docs to be merged (c:\test\1.pdf c:\test\2.pdf ...)
%pth%tmp = path to the pdfmarks file
Additionally, I am creating a pdfmark document inside the script which gs uses to create the bookmarks. But unfortunately, some of the docs I am merging, have already their own bookmarks and I did not yet find a solution how to ignore those. GS should only use the bookmarks inside the pdfmarks file.
How can this be done?
Firstly; you are not 'merging' PDF files when you use Ghotscript's pdfwrite device. The process is described in detail here
The important point is that the way the input file(s) are constructed has no bearing on the way the output file is constructed. If any other software you use relies on the file being constructed in a particular fashion it may not work on the output PDF file.
The -dEPSFitPage switch only has any effect when the input is an EPS file. If you want to 'fit' PostScript or PDF files then you need to use -dPDFFitPage, -dPSFitPage or just -dFitPage. However, all of these rely on you first selecting a media size, and then preventing it being altered by setting -dFIXEDMEDIA. For EPS files you would more normally use -dEPSCrop which sets the media size to the EPS declared BoundingBox.
You can prevent the PDF interpreter reading the Outlines tree (which you are calling Bookmarks) and then creating a pdfmark from it to pass to the pdfwrite device by using the -dNO_PDFMARK_OUTLINES switch which oddly isn't documented, presumably an oversight.

How to convert .SFF file format to .BMP or .PNG or .JPG?

I need to convert my SFF file to PDF, then i need verify the document. i.e SFF file and converted file.
For that, I think to convert SFF file to image file and PDF file to image file.
Then comparing the both file using image processing.
To do this method:
Im searching for a program to convert SFF to BMP
Does anyone know such a program or has another idea how to do the job?
Thank you in advance...
Looks like you need reaConvertor. It appears to be a matured tool you can rely on. There is an online version of the tool here
I think:
https://github.com/Sonderstorch/sfftools
will do what you need (convert sff -> tiff/jpeg/..) and then you can use imageMagic (for example) to go to PDF.
Clearly not a current well used image format, however if you have legacy.sff Structured Fax Format, they are similar (not exactly identical) to a Monochrome G4 format.
By far the simplest programmable method to convert is using IrfanView which can Read Modify and Resave as other formats in batches.
Out put can be any other modern image type including Mono.BMP, G4.fax or as PDF (with or without GhostScript)

Generate PNG file with pre-known CRC

Is it possible to create a PNG file with a predefined CRC? (kind of a programming challenge..)
I have a python script to generate hex codes with the target CRC, but I'm not sure how to make a valid PNG out of it.
BTW - it may be that I'm talking nonsense, but it sounds possible on theory (right?)
You can use spoof.c to do that, either at the level of a PNG chunk or at the level of the entire file. (Note that a PNG file does not contain a CRC of the whole thing, only CRCs of the chunks.)

load unix executable file to ascii

I am simply trying to load ascii files with two columns of data (spectral data).
They were saved originally as .asc.
I need to open and edit them using text editor before I can load them into Matlab to erase the headers, but some of them somehow got converted to unix executable foramt with the .asc extension. And others are plain text docs also with the same extension. I have no idea why they got saved with the same extension and with my same manipulation as different kind formats.
When I use the load command in Matlab, the plain text docs load normally as expected but the ones saved as unix executable kinds give me this error:
Error using load Unable to read file filename.asc: No such file or
directory.
How can I either resave them (still with the same extension) or otherwise load them to be read by Matlab as standard two column data matrixes?
Thanks!
If these are truly plain text files, try renaming the file from xxx.asc to xxx.txt. Then, see if you are able to edit them as desired.