Accuracy of the iTextSharp PDF reader (iText)

I have been able to read PDF files successfully in my .NET code using iTextSharp's PdfReader. But it sometimes misreads characters, and there is no fixed set of characters that it misreads (my observation). I suspect there is some limitation on what it can extract with 100% accuracy. Does anyone have their own observations on the limitations of iTextSharp's PdfReader?

Related

Convert pixel array to DNG in Swift ... without libtiff/libjpeg?

I have raw image data, a pixel array (actually a single string of data), that I want to convert to the most basic and minimal DNG format/file. I have seen various approaches using libtiff and libjpeg.
Is there a simpler and more straightforward way to do this in Swift using Core Graphics, Core Image, or yet another Swift API?
The image data is still in the original Bayer-matrix format (the CFA pattern is RGGB), without any tags.
I downloaded the 100-plus pages of the latest Adobe DNG format specification, but I cannot find the actual byte order or structure of a DNG file in there... I am pretty lost with this document.
Any help and hints are highly appreciated.

Understanding WebP encoder options

I'm currently experimenting with the WebP encoder (no WIC) in a 64-bit Windows environment. My samples are 10 JPEG stock photos of landscapes and houses, already optimized with jpegtran. I do this because my goal is to optimize the images of a whole website whose images have already been compressed in Photoshop using the Save for Web command, with various quality values, and then optimized with jpegtran.
I found that using values smaller than -q 85 has a visible impact on the quality of the WebP images, so I'm working with values above 90, where the difference is smaller. I also concluded that I have to use -jpeg_like, because without it the output is sometimes larger than the original, which is not acceptable. I also use -m 6 -f 100 -strong, because I really don't mind how long the encoder takes and I'm trying to achieve the smoothest result; I tried several values for these and concluded that -m 6 -f 100 -strong gives the best output with regard to quality and size.
I also tried -preset photo, avoiding any parameter other than -q, but the output gets bigger.
What I don't understand from https://developers.google.com/speed/webp/docs/cwebp#options are the options -sns and -segments, which seem to have a great impact on the output size. Sometimes the output is bigger and sometimes smaller for the same options, but I haven't yet worked out why, or how to use them properly.
I also don't understand the -sharpness option, which has no impact on the output size, at least for me.
My approach is less a scientific method than trial and error, and if anybody knows how to use these options for this kind of input, and can explain how to get optimal results, I would appreciate the feedback.
-strong and -sharpness only change the strength of the filtering recorded in the header of the compressed bitstream; the filter is applied at decoding time. That's why you don't see a change in file size for these.
-sns controls the choice of filtering strength and quantization values within each segment. A segment is just a group of macroblocks in the picture that are believed to share similar properties regarding complexity and compressibility. A complex photo should likely use the maximum allowed 4 segments (which is the default).
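As an illustration, a typical invocation combining the flags discussed above might look like this (the quality, SNS, and segment values here are examples to experiment with, not recommendations):
    cwebp -q 90 -m 6 -f 100 -strong -jpeg_like -sns 70 -segments 4 input.jpg -o output.webp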

How to extract digits (numbers) from an image using Matlab

At work I have to record a lot of data from PNG images. Every time, I have to manually transcribe the digits (e.g. mean\SD 101.1\11) into an Excel sheet and then read the sheet with Matlab. Would it be possible for Matlab to read the digits directly from the PNG image, so that a lot of work could be saved?
I know it might involve pattern recognition, but I still hope someone has done this before.
You can make use of Optical Character Recognition (OCR). The code for it is available here
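As a minimal sketch of the idea, using the ocr function from the Computer Vision Toolbox (which may differ from the linked code; the file name and character set below are assumptions for illustration):
    % Read the screenshot and run OCR, restricted to digit-like characters
    I = imread('measurements.png');                   % hypothetical input image
    results = ocr(I, 'CharacterSet', '0123456789.\');
    disp(results.Text)                                % recognized text, e.g. '101.1\11'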

Which 2D barcode has the highest data capacity/density?

If you wanted to encode 2 MB of data in a 2D barcode, which barcode would be a good starting point or recommendation?
There are lots of different types of 2D barcodes out today: Aztec codes, MaxiCode, PDF417, Microsoft HCCB, VeriCode, etc., all unique in their own way.
I guess, in a nutshell, my question is: which barcode would make a good starting point for encoding 2 MB of data?
I tried reading through the QR code international standard; it turns out that even at version 40-L, the most data you can encode in a QR code is:
1) numeric data: 7,089 characters
2) alphanumeric data: 4,296 characters
3) 8-bit byte data: 2,953 characters
4) Kanji data: 1,817 characters
which is a far cry from the roughly 17 million bits that make up 2 MB.
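To put the gap in numbers, a rough back-of-the-envelope calculation:
    2 MB = 2 × 1,048,576 bytes = 2,097,152 bytes = 16,777,216 bits
    2,097,152 bytes ÷ 2,953 bytes per version-40L symbol ≈ 711 QR codes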
My goal was to create something like this:
http://realestatemobilemarketingsolutions.com/wp-content/uploads/2012/07/real-estate-mobile-marketing.png
After you scan the barcode, you can view photos of the house/property on your phone; you don't have to walk in or wait for an open home. 20 photos at 100 KB each is about 2 MB.
Even if you could create a single 2D barcode that encodes the whole thing, the user wouldn't be able to scan it in one go; no cellphone imager supports that kind of resolution. Your best bet is a QR code with a URL in it.
Things like Data Matrix and QR codes are extensible. There is a limit to how much data can be encoded into one block, but you CAN create a code that has multiple blocks. Indeed, if you look at this page, you'll see a discussion of using pages full of 2D barcodes as a form of data backup. They were able to fit up to 1/2 MB of raw data into a single page, at 600 dpi, which requires a scanner (not a smartphone) to decode; at that density your 2 MB would fill four such pages.
From what I've been reading, Data Matrix tends to have less overhead and therefore packs more payload data into a square inch at a given DPI. You would need a mobile app capable of shooting multiple images (tiles) of a very large code and either:
compositing the individual images into one large one for decoding OR
decoding each of the smaller blocks and reconstructing the original data from the pieces
I know of no app which will do that.
I've pondered providing bulk data via 2D barcodes. I was considering publishing a mobile app in a magazine and providing a way for people to "download" the app from the magazine, without needing a website / FTP site where they could download it. I'd first need to provide an app that could decode such a monster, and then the end user would have to be patient enough to scan the whole thing. Good luck with that.
I MIGHT be able to provide a large 2D barcode containing a .torrent file and then use an existing BitTorrent app to download the resulting app; I have a .torrent for a recent Linux live DVD where the .torrent is < 32 KB.
But a chunk of data (an app or images) in the megabyte range or larger is really not feasible through this channel.
VOICEYE code is the highest-density 2D code I have been able to find. It works well too, but the code-making software is too expensive to just experiment with (around $500).
How about using some variant of DataGlyphs, which has a lot in common with steganography? In other words, you use a greyscale image to also store your data...
I have developed a reader for JAB Codes that can read a whole audio file from a barcode. JAB Codes have very high capacity due to their polychrome nature.
More on this here

Use SVG plots when Publish-ing to HTML

This question was posted on Matlab Central (http://goo.gl/MkU8P) but hasn't gotten any answer. I thought someone here might have a lead.
I use publish() quite a bit to produce online class notes. It's working well but, frankly, I find the PNG plots to be of awful quality. I've tried setting figureSnapMethod='antialiased', which is a bit better, but I still find the result fuzzy.
What would be awesome is to produce SVG plots. Is there any way to do that? I suppose it would have to involve plot2svg somehow (also from the File Exchange), since SVG output seems to be supported only for Simulink models. Or perhaps someone has a better solution using some other high-quality format.
Before using publish you should set some options to change imageFormat. By default it is set to PNG, which is not good for reports, papers, etc. What you need is a vector-graphics format, such as EPS or EMF (in that case it does not matter how far you zoom in; it never becomes fuzzy or low-resolution).
The documentation for imageFormat says: any image format that your installed version of Microsoft Office can import, including 'png', 'jpg', 'bmp', and 'tiff'. If 'figureSnapMethod' is 'print', then you can also specify 'eps', 'epsc', 'eps2', 'ill', 'meta', and 'pdf'. For HTML output, any format publishes successfully; for LaTeX output, any format publishes successfully.
I suggest you use EPS; it is the best. And remember, SVG is not supported by publish.
You should add this option to your command:
'imageFormat','eps'
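A fuller sketch (the script name is a placeholder; per the documentation quoted above, color EPS ('epsc') requires the 'print' figure-snap method):
    % Publish class notes to HTML with vector (color EPS) figures
    publish('class_notes.m', ...
        'format','html', ...
        'figureSnapMethod','print', ...
        'imageFormat','epsc');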