iText: Insert PDF bytes into an opened document

Currently, our team uses Image.getInstance(byte[]) to get an Image object from PNG bytes, but later we will receive a PDF file instead of a PNG. Given the PDF bytes, is there any way to convert them into PNG bytes or to insert them directly into an opened document?
Something like this:
void insertIntoDocument(byte[] pdfBytes, Document document){}

Related

Convert base64 String into PDF file in Flutter

I am trying to show a PDF file, but I receive it from the server as a Base64 string. Is there any way to show the Base64 string directly in a PDF viewer or WebView without saving it to a file?
Check this : https://stackoverflow.com/a/55599926/305135
(Following code is copied from link above)
This should convert base64 encoded pdf data into a byte array.
import 'dart:convert';
// base64.decode already returns the decoded bytes (a Uint8List, which is a List<int>).
List<int> pdfDataBytes = base64.decode(pdfBase64);
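For readers coming from the iText question at the top, the same decode step can be sketched in Java using only the standard library. This is a minimal sketch, not part of the linked answer: the class and method names are hypothetical, and the header check relies only on the fact that a PDF file starts with the ASCII marker "%PDF-".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decodes Base64 text and checks for the PDF header.
final class Base64PdfDecoder {
    // A PDF file starts with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Decodes Base64 text into raw bytes; throws IllegalArgumentException on invalid input. */
    static byte[] decode(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }

    /** Returns true when the bytes begin with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        if (data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in payload; a real server response would carry a full PDF.
        String sample = Base64.getEncoder()
                .encodeToString("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        byte[] pdfBytes = decode(sample);
        System.out.println(looksLikePdf(pdfBytes));
    }
}
```

Checking the header before handing the bytes to a viewer or library is a cheap way to catch a payload that was truncated or was never a PDF in the first place.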
The pdf and the image plugins seem to suit your needs for displaying the PDF.
The code should be roughly like so:
import 'package:pdf/pdf.dart';
import 'package:image/image.dart';
...
Image img = decodeImage(pdfDataBytes);
PdfImage image = PdfImage(
    pdf,
    image: img.data.buffer.asUint8List(),
    width: img.width,
    height: img.height);
// Display it somehow
...
At first I was doing the same thing as you, but I didn't find any appropriate solution for converting the Base64 string into a PDF file.
I think you can get the byte buffer and then convert it into a PDF file.
I have answered how to parse blob data to pdf in this question :
How to convert ByteBuffer to pdf

How to load an SVG and a referenced PNG from a REST service at the same time?

I'm trying to create a web service in PHP that can deliver an SVG with reference to a PNG raster image. Both the data for the SVG as well as the binary PNG image come from a MySQL database on the server.
Option A: Encode the PNG data in base-64 and embed it directly in the SVG, such as:
<image xlink:href="data:image/png;base64,..."/>
Concerns: a roughly 30% heavier load than sending the PNG as pure binary, and a noticeable delay when loading it with Postman (or is that just Postman?).
Option B: Call the PNG data as binary and save it as a file on the file system, then call the SVG file, which would then reference the physical PNG file.
Concerns: Involvement of the file system (which implies I need to start managing physical files, expiration dates etc).
Is there perhaps another way that an SVG can reference the binary data on the fly without it having to be on the file system?
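The size concern under Option A can be checked with a little arithmetic: padded Base64 emits 4 characters for every 3 input bytes, so the encoded form is about one third larger than the raw bytes before any HTTP compression. The sketch below uses only the Java standard library; the class name and the 300,000-byte stand-in size are hypothetical.

```java
import java.util.Base64;

// Hypothetical helper illustrating the Base64 size overhead.
final class Base64Overhead {
    /** Length of the padded Base64 encoding of rawLength bytes: 4 characters per 3-byte group. */
    static long encodedLength(long rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] png = new byte[300_000]; // stand-in for PNG data; the size is made up
        String encoded = Base64.getEncoder().encodeToString(png);
        System.out.println("raw bytes:     " + png.length);
        System.out.println("encoded chars: " + encoded.length());
        System.out.println("predicted:     " + encodedLength(png.length));
    }
}
```

With these numbers, 300,000 raw bytes become 400,000 Base64 characters, which matches the overhead described in the question.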
To accomplish something similar, I use CBOR. In my case I send the data for SVGs, plus additional data about each file, as binary files, which are much smaller than XML, text, or JSON. I compress the SVG with LZString first and add it, along with the additional data attributes, to a JSON object. Then I convert the JSON object to CBOR. I think CBOR can carry your binary data without any need for Base64 conversion; more information about it is at cbor.io.
I found a PHP library for CBOR here: https://github.com/2tvenom/CBOREncode
This may not be the way to go at all for you, but I thought I'd throw it out there just in case.

what options for quartz PDFDocument.dataRepresentationWithOptions()

Does anyone know what options are available to you in the following method?
// pdf:PDFDocument
pdf.dataRepresentationWithOptions(options: [NSObject : AnyObject])
I'm trying to take a PDF, open it up, search for a specific tag per PDF specs and insert additional tags immediately after that tag.
After editing with PDFDocument methods, I was hoping to convert it to a string so I could search the whole file, so I thought I would convert it to a data representation first and from there to a string. But I suppose I could also save it to a file and re-open it from there, so that I don't have to convert from the PDFDocument object directly.

itextsharp PDF to text dump

I am looking for a way to actually get the contents of the file itself, in its text format, dumped. E.g.: i don't want a dictionary object, i don't want some sort of extractionstrategy option, i just want the same text document that itextsharp uses to parse... the WHOLE thing as a string or stringbuilder...
I have not yet found a way to do this with any tool whatsoever. My problem is that I am trying to read a dynamic PDF into a C# application, and we all know that those darn dynamic PDFs can't be parsed by iTextSharp (AcroForm and AcroFields always come up empty). So I figured that if I can get the actual text dump of the entire file, I can see what it looks like and parse it myself for this specific task (e.g. make a class for each document I know I can receive, and build a map there based on what I see).
If anyone can help me do that or, even better, find a way in C# to extract the XML source of the PDF (like clicking the XML Source tab in LiveCycle), it would be greatly appreciated.
Thanks!
Matt
If you are looking for the actual operators and commands of each page in the raw text format, try the following code:
var reader = new PdfReader("test.pdf");
int intPageNum = reader.NumberOfPages;
// Page numbers are 1-based.
for (int i = 1; i <= intPageNum; i++)
{
    // Raw content stream of the page: operators and operands, not extracted text.
    byte[] contentBytes = reader.GetPageContent(i);
    File.WriteAllBytes("page-" + i + ".txt", contentBytes);
}
reader.Close();
I am looking for a way to actually get the contents of the file
itself, in its text format, dumped. E.g.: i don't want a dictionary
object, i don't want some sort of extractionstrategy option, i just
want the same text document that itextsharp uses to parse... the WHOLE
thing as a string or stringbuilder...
Unfortunately, the data that iTextSharp parses is not yet text: the operators in that data are given in a textual format, but the actual glyphs may be given in a completely arbitrary ad-hoc encoding. That being said, a standard encoding is often used because it is the simplest solution for the components involved. You cannot count on that in general, though. The answer by VahidN shows you how to access the starting points for that content; quite often, though, the page content data he extracts contains only references to resources that are stored in different objects.
my problem is that i am trying to read a dynamic PDF into a C#
application... and we all know that those darn dynamic PDFs can't be
parsed by iTextSharp (AcroForm and AcroFields always comes up empty),
This sounds as if you actually have a completely different task at hand. Dynamic forms and their contents are not part of the page content but instead stored in a separate XML Forms Architecture stream.
iText in Action, 2nd edition, chapter 8 gives you some information on how to access the XFA stream data. For a first glimpse, look at the sample XfaMovie.cs.
You might also want to look at the iText XML Worker project for easier manipulation of XFA streams.
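If you just want to dump the text, try this:
// Requires: using System.Text; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
PdfReader reader = new PdfReader(pdfFileName);
StringBuilder text = new StringBuilder();
int nPages = reader.NumberOfPages;
// Page numbers are 1-based.
for (int i = 1; i <= nPages; i++)
{
    text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
}
reader.Close();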

PDF file stored as BLOB, view in a webpage perl

I have code that displays a blob from a local Oracle database. I store both JPG and PDF files as blobs. I can view the JPG file, but not the PDF. I have tried changing
$self->content_type('image/jpg')
to
$self->content_type('application/pdf').
The blob does have data: I checked the length and it is 184546.
All I get when I click the link for the pdf file is a blank page with the title GETIMAGPAGE(application/pdf).
Any help or pointers would be greatly appreciated.
Also, how can we set content_type to handle two different MIME types? In my case that means both images and PDFs, depending on what we get.
File::MMagic can recognize the type of data using magic numbers.
use File::MMagic;
my $magic = File::MMagic->new;
$self->content($blob);
$self->content_type($magic->checktype_contents($blob));
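The idea behind magic-number detection can be sketched without any library: inspect the first few bytes of the blob and pick the content type from them. The Java sketch below is an illustration only, not how File::MMagic is implemented; the class and method names are hypothetical, and it covers just the PDF, JPEG, and PNG signatures.

```java
// Hypothetical helper: picks a MIME type from the leading bytes of a blob.
final class BlobTypeSniffer {
    /** Returns a MIME type based on well-known file signatures, or a generic fallback. */
    static String contentTypeOf(byte[] blob) {
        if (startsWith(blob, 0x25, 0x50, 0x44, 0x46, 0x2D)) {
            return "application/pdf"; // "%PDF-"
        }
        if (startsWith(blob, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg"; // JPEG start-of-image marker
        }
        if (startsWith(blob, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png"; // 8-byte PNG signature
        }
        return "application/octet-stream"; // unknown: let the client decide
    }

    /** Compares the leading bytes of data against prefix, treating bytes as unsigned. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfLike = {0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x34};
        System.out.println(contentTypeOf(pdfLike));
    }
}
```

Deriving the content type from the data itself lets a single handler serve both images and PDFs, which is what the last part of the question asks for.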
If you don't want to require a native/plugin PDF reader, perhaps FlexPaper might fit your needs.