Extract PDF formatting - iPhone

Hi guys, I'm working on an app whose main job is PDF editing.
I understand Apple doesn't provide any API for editing PDFs, but my requirements call for it.
So I thought of extracting the whole contents of the PDF file and creating a new PDF after editing. Now I need to know how to extract the PDF formatting (headers, footers, images, highlighting, ...).
I'm using the Tj operator to extract the PDF text. Which operators should I use to extract the other information in the PDF file?
Thanks in advance.

Images are painted on the page using the Do operator. Its operand is the image name in the resources dictionary (in its XObject subdictionary). The Do operator also paints form XObjects (self-contained vector graphics), and these are also stored in the resources dictionary. The Subtype key in the image/form XObject dictionary gives you the object type: "Image" for images and "Form" for form XObjects.
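To illustrate the Do/Subtype relationship, here is a minimal Python sketch. It assumes you already have the page's decoded content stream as text and have turned the page's /Resources /XObject dictionary into a plain dict mapping each name to its Subtype (both are hypothetical inputs; your own PDF parsing code would produce them). The tokenizer is deliberately simplified and is not a full PDF parser.

```python
import re

# Literal strings are blanked out first so text such as "(/Im1 Do)" is not
# mistaken for an operator; nested parentheses are not handled.
STRING_RE = re.compile(r"\((?:\\.|[^\\()])*\)")
TOKEN_RE = re.compile(r"/[^\s/\[\]()<>{}%]+|[^\s/\[\]()<>{}%]+")


def find_xobject_uses(content: str, xobjects: dict) -> list:
    """Return (name, subtype) for every `/Name Do` in a decoded content stream.

    `xobjects` maps names from the page's /Resources /XObject dictionary to
    their /Subtype ("Image" or "Form").
    """
    tokens = TOKEN_RE.findall(STRING_RE.sub(" ", content))
    uses = []
    for prev, tok in zip(tokens, tokens[1:]):
        # Do takes exactly one operand: the XObject's name.
        if tok == "Do" and prev.startswith("/"):
            name = prev[1:]
            uses.append((name, xobjects.get(name, "Unknown")))
    return uses


content = "q 200 0 0 100 50 600 cm /Im1 Do Q BT /F1 12 Tf 72 700 Td (Title) Tj ET q /Fm0 Do Q"
print(find_xobject_uses(content, {"Im1": "Image", "Fm0": "Form"}))
# [('Im1', 'Image'), ('Fm0', 'Form')]
```

The `cm` operator just before each `Do` sets the transformation matrix, which is where an image's on-page position and size come from.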
The other elements are plain vector graphics and text; PDF files do not have headers, footers, paragraphs, etc. as standalone objects. What you see visually as a page header is, inside the PDF file, just plain text painted at the top of the page.
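Since a header is only text that happens to sit near the top edge, recovering "headers" and "footers" is a matter of position. A heuristic sketch, assuming you have already tracked each text run's baseline (for example from the Td/Tm operands around your Tj calls) in unrotated default user space. The `TextRun` type and the 72 pt margin are hypothetical choices, not anything defined by PDF.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    x: float
    y: float    # baseline in default user space (origin at the bottom-left)
    text: str


def classify_runs(runs, page_height, margin=72.0):
    """Heuristically label text runs as header, body or footer by position.

    A baseline within `margin` points of the top edge counts as header text,
    within `margin` points of the bottom edge as footer text.
    """
    result = {"header": [], "body": [], "footer": []}
    for run in runs:
        if run.y >= page_height - margin:
            result["header"].append(run.text)
        elif run.y <= margin:
            result["footer"].append(run.text)
        else:
            result["body"].append(run.text)
    return result


runs = [TextRun(72, 750, "Annual Report"), TextRun(72, 400, "Body text"), TextRun(300, 30, "Page 1")]
print(classify_runs(runs, page_height=792))  # 792 pt = US Letter height
# {'header': ['Annual Report'], 'body': ['Body text'], 'footer': ['Page 1']}
```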
Highlights can be plain semi-transparent yellow rectangles (these are no different from other rectangles on the page) or highlight annotations (these are available in the page's Annots array).
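For the annotation case, a highlight annotation has Subtype "Highlight", and its QuadPoints array holds 8 numbers (x1 y1 x2 y2 x3 y3 x4 y4) per highlighted quadrilateral, so a multi-line highlight yields several boxes. A sketch, assuming the Annots array has already been resolved into plain Python dicts (a hypothetical input format):

```python
def highlight_boxes(annots):
    """Bounding boxes (llx, lly, urx, ury) of /Highlight annotations."""
    boxes = []
    for annot in annots:
        if annot.get("Subtype") != "Highlight":
            continue
        quads = annot.get("QuadPoints")
        if not quads:
            boxes.append(tuple(annot["Rect"]))  # fall back to the annotation rectangle
            continue
        for i in range(0, len(quads) - 7, 8):
            xs = quads[i:i + 8:2]      # x1, x2, x3, x4
            ys = quads[i + 1:i + 8:2]  # y1, y2, y3, y4
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


annots = [
    {"Subtype": "Highlight", "Rect": [70, 690, 210, 712],
     "QuadPoints": [72, 710, 208, 710, 72, 692, 208, 692]},
    {"Subtype": "Link", "Rect": [0, 0, 10, 10]},
]
print(highlight_boxes(annots))  # [(72, 692, 208, 710)]
```

Highlights drawn as plain filled rectangles will not show up here; they are only visible as ordinary rectangle-painting operators in the content stream.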


Can Access display multiline captions in Access 365 form datasheet view?

I have read about using VBA to concatenate terms together using VbCrLf; I personally used Ctrl-Enter to create a second line in the caption field in the properties box.
But after I press Ctrl-Enter, the datasheet view of my form only shows the first line of my multi-line caption.
This form is meant to recreate the functionality our owner is looking for from a current Excel spreadsheet (the ability to sort on various columns), so I can't just use a report.
Please tell me I'm missing something obvious such as a caption height property value or something. The multiline caption will be very useful to help maintain appropriate column widths for the data.

Whilst you can display multiple lines of content within the datasheet view for a table by increasing the row height of each record, a more appropriate solution might be to use a text box on a form to display the data, where the height of the text box can be predefined in the design of the form and scroll bars can be displayed.

There is no way to adjust the column headers in the specific "datasheet" form that I was trying to use. It's a nice quick option that works for 95% of your uses. But if you need more control (like me and others on the internet), the only solution is to create the form as a "Tabular" form in the Form Wizard. This type of form also goes by other names in Access (just to be confusing).
It is also described as a continuous form, likely because "Continuous Forms" is the value of the form's Default View property when you dive into the details.
It's more work but you have full control over the size, format, etc. of your column headers when creating/designing a tabular form.

Slack thumbnail URL from the Slack API for any file type (image, PDF, audio, video, etc.)

I need to extract the thumbnail URL for any particular file.
When calling https://slack.com/api/files.list,
I saw that for images the thumbnail keys are JSON keys such as thumb_64, thumb_80 and thumb_360.
For PDFs the key is different,
and similarly other file types use different JSON keys for their thumbnails.
So do I have to find the thumbnail key manually for every file type? Is there a shortcut to find the thumbnail key for all file types?
I need to pass the thumbnail URL to another service no matter what kind of file it is.
Thanks!

The documentation for the file object type gives some information on this. But as you have already noticed, it doesn't mention PDF or other file types.
If a thumbnail is available for the file, the URL to a 64x64 pixel will be returned as the thumb_64 prop.
The thumb_80 prop (when present) contains the URL of an 80x80 thumb. Unlike the 64px thumb, this size is guaranteed to be 80x80, even when the source image was smaller (it's padded with transparent pixels).
A variable sized thumb will be returned as thumb_360, with its longest size no bigger than 360 (although it might be smaller depending on the source size). Dimensions for this thumb are returned in thumb_360_w and thumb_360_h.
In the case where the original image was an animated gif with dimensions greater than 360 pixels, we also created an animated thumbnail and pass it as thumb_360_gif.
Depending on the original file's size, you may even find a thumb_480, thumb_720, thumb_960, or thumb_1024 property.
I did try looking through the Slack client's JS code (search for thumb_pdf in client-boot-imports.XXXX.min.js) and found a code block that seems to define the possible thumb_* keys. While it is not conclusive, you can look for these fields in the API response and fall back to a default image for each supported format.
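That "look for thumb_* fields, else fall back" approach can be written once instead of per file type. A minimal sketch, assuming `file_obj` is one entry of the "files" array in the files.list JSON, already decoded to a dict. The preference order is an arbitrary choice, thumb_pdf is only the key name spotted in the client JS (not documented), and the example URLs are placeholders.

```python
import re

# URL-valued thumb_* keys (thumb_64, thumb_360_gif, thumb_pdf, ...), but not
# the dimension keys such as thumb_360_w / thumb_360_h.
THUMB_KEY_RE = re.compile(r"^thumb_(?!.*_[wh]$)[a-z0-9_]+$")

# Documented image sizes, in the order we would like to use them.
PREFERRED = ("thumb_360", "thumb_480", "thumb_80", "thumb_64")


def _is_url(value):
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def pick_thumbnail(file_obj, default=None):
    """Return a thumbnail URL from a Slack file object, or `default`."""
    for key in PREFERRED:                 # 1. documented sizes first
        if _is_url(file_obj.get(key)):
            return file_obj[key]
    for key in sorted(file_obj):          # 2. any other thumb_* URL
        if THUMB_KEY_RE.match(key) and _is_url(file_obj[key]):
            return file_obj[key]
    return default                        # 3. caller's generic image


image = {"filetype": "png", "thumb_64": "https://example.com/64.png",
         "thumb_360": "https://example.com/360.png", "thumb_360_w": 360}
pdf = {"filetype": "pdf", "thumb_pdf": "https://example.com/page1.png",
       "thumb_pdf_w": 909, "thumb_pdf_h": 1286}
print(pick_thumbnail(image))  # https://example.com/360.png
print(pick_thumbnail(pdf))    # https://example.com/page1.png
print(pick_thumbnail({"filetype": "zip"}, default="https://example.com/generic.png"))
```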

iTextSharp: change the order of objects in existing PDF

I'm dealing with a corporate report generating system that generates documents with stamps and signatures.
The sad thing is that the system is not able to place images below existing text and tables, so the JPG stamps overlapping the text look really odd and unrealistic. The system does not support images with a transparency channel either.
I'm trying to fix things by first printing reports to PDF and then using iTextSharp to send all the images to the back (below the text and other vector content). Finally, the results are sent to a hardware printer.
All the images are stored in resources (XObjects).
The problem is that I have no idea how to manipulate PDF-objects z-order (creation order) with iTextSharp.
The current version (a C# COM object/assembly) works as follows:
Build the list of references to existing images (reference, image bytes, image CTM) in a page loop with parser.ProcessContent()
Execute KillIndirect() on any reference found
Replace them with writer.AddDirectImageSimple() and blank image (with transparent mask)
Insert previously stored image bytes as images (taking CTM into account) with stamper, in GetUnderContent mode.
I wonder if there is a simpler solution without blank images, excess references, etc.
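In PDF, z-order is simply painting order: operators later in a page's content stream paint over earlier ones. One possible alternative to swapping in blank images (an idea, not something the question or iTextSharp confirms) is to reorder the content stream itself so that image-drawing blocks come first. The Python sketch below shows only the reordering. It assumes each stamp is drawn in its own top-level `q ... /Name Do Q` block and that no top-level operators between blocks change the graphics state; reading and writing the stream with iTextSharp is not shown, and inline images (BI/ID/EI) are not handled.

```python
import re

# Simplified tokenizer: literal strings (no nested parentheses), dictionary
# delimiters, hex strings, array brackets, names, numbers and operators.
TOKEN_RE = re.compile(
    r"\((?:\\.|[^\\()])*\)"
    r"|<<|>>"
    r"|<[0-9A-Fa-f\s]*>"
    r"|[\[\]]"
    r"|/[^\s/\[\]()<>{}%]*"
    r"|[^\s/\[\]()<>{}%]+"
)


def split_top_level(tokens):
    """Split tokens into top-level units: balanced q...Q blocks and the runs between them."""
    units, current, depth = [], [], 0
    for tok in tokens:
        if tok == "q" and depth == 0 and current:
            units.append(current)  # close the run of operators before this block
            current = []
        current.append(tok)
        if tok == "q":
            depth += 1
        elif tok == "Q" and depth > 0:
            depth -= 1
            if depth == 0:
                units.append(current)
                current = []
    if current:
        units.append(current)
    return units


def is_image_block(unit, image_names):
    """True for a q...Q block that paints one of the named image XObjects."""
    return unit[0] == "q" and any(
        tok == "Do" and unit[i - 1].startswith("/") and unit[i - 1][1:] in image_names
        for i, tok in enumerate(unit) if i > 0
    )


def images_to_back(content, image_names):
    """Move image q...Q blocks to the start of the stream so they are painted first."""
    units = split_top_level(TOKEN_RE.findall(content))
    images = [u for u in units if is_image_block(u, image_names)]
    others = [u for u in units if not is_image_block(u, image_names)]
    return " ".join(tok for unit in images + others for tok in unit)


content = "BT /F1 12 Tf 72 700 Td (Signed) Tj ET q 100 0 0 50 60 680 cm /Im1 Do Q"
print(images_to_back(content, {"Im1"}))
# q 100 0 0 50 60 680 cm /Im1 Do Q BT /F1 12 Tf 72 700 Td (Signed) Tj ET
```

Because each image keeps its own `q ... cm ... Q` wrapper, its position is preserved; only its place in the painting order changes.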

Text not fitting into form fields (iTextSharp)

I created a .PDF file using Adobe Acrobat Pro. The file has several text fields. Using iTextSharp, I'm able to populate all the fields and mail out the .PDF.
One thing is bugging me: some of the text will not "fit" in the text box. In Adobe, if I type more than fits in the allocated height, the scroll bar kicks in; this happens when the font size is NOT set to auto and multi-line is allowed.
However, when I attempt to set the following properties:
//qSize is float and set to 15;
//auto size of font is not being set here.
pdfFormFields.SetFieldProperty("notification_desc", "textsize", qSize, null);
// set multiline
pdfFormFields.SetFieldProperty("notification_desc", "setfflags", PdfFormField.FF_MULTILINE, null);
//fill the field
pdfFormFields.SetField("notification_desc", complaintinfo.OWNER_DESC);
However, after compiling and stamping, the scroll bar does not appear in the final .PDF.
I'm not sure if this is the right approach. I'm thinking that perhaps I should create a table and fill it with the text, but the documentation makes little or no reference to scroll bars...

When you flatten a document, you remove all interactivity. Expecting working scroll bars on a flattened form is similar to expecting working scroll bars on printed paper. That's why you don't get many responses to your question: it's kind of absurd.
When you fill out a rectangle with text, all text that doesn't fit will be omitted. That's why some people set the font size to 0. In this case, the font size will be adapted to make the text fit. I don't know if that's an option for you as you clearly state that the font size must be 15 pt.
If you can't change the font size, you shouldn't expect the AcroForm form field to adapt itself to the content. ISO-32000-1 is clear about that: the coordinates of a text field are fixed.
Your only alternative is to take control over how iText should fill the field. I made an example showing how to do this in the context of my book: MovieAds.java/MovieAds.cs. In this example, I ask the field for its coordinates:
AcroFields.FieldPosition f = form.GetFieldPositions(TEXT)[0];
This object gives you the page number f.page and a Rectangle f.position. You can use these variables in combination with ColumnText to add the content exactly the way you want to (and to check if all content has been added).
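The "check if all content has been added" idea can be sketched independently of iText: lay words into the field's rectangle line by line and hand back whatever did not fit, so the caller can decide what to do with it (another page, a continuation field, a warning). A Python sketch under the same assumptions as before (average glyph width of half the font size, 1.2 line height); the rectangle values are made up and stand in for `f.position`.

```python
def layout_in_rect(text, rect, font_size, avg_char_width=0.5, leading=1.2):
    """Place as many words as fit in rect=(llx, lly, urx, ury).

    Returns (placed, overflow): placed is a list of (x, baseline_y, line_text)
    from top to bottom; overflow is the text that did not fit ("" if none).
    """
    llx, lly, urx, ury = rect
    max_chars = max(1, int((urx - llx) / (font_size * avg_char_width)))
    line_height = font_size * leading
    max_lines = int((ury - lly) / line_height)
    words = text.split()
    placed, i = [], 0
    while i < len(words) and len(placed) < max_lines:
        line = words[i]
        i += 1
        # Greedily add words while the line stays within max_chars.
        while i < len(words) and len(line) + 1 + len(words[i]) <= max_chars:
            line += " " + words[i]
            i += 1
        baseline = ury - (len(placed) + 1) * line_height
        placed.append((llx, baseline, line))
    return placed, " ".join(words[i:])


rect = (36, 700, 236, 740)  # hypothetical field position
text = "The customer reported that the delivery arrived two days late and the box was damaged"
placed, overflow = layout_in_rect(text, rect, 15)
for x, y, line in placed:
    print(x, y, line)
print("did not fit:", overflow)
```

A non-empty `overflow` is the signal that the 15 pt text needs more room than the field provides.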
I hope you understand that:
it's only normal that there are no scroll bars on a flattened form,
the standard way of filling out fields clips content that doesn't fit,
you need to do more programming if you want a custom result.
For more info: please consult "iText in Action - Second Edition".

How to slice a text or HTML string into pages with the iPhone SDK?

How can I slice a text (HTML) string into a number of pages so that the text can be read like a book?
Thanks for suggestions.

Assuming you are happy recognising only a subset of HTML markup without CSS (here I assume tags only plus for font size changes (with other attributes ignored), <img> tags for images with all but src,width,height ignored and accurate width and height mandatory with all other tags/attributes ignored):-
Convert the HTML to XHTML with TidyLib, which seems to have an MIT license - http://tidy.sourceforge.net/#source
SAX parse the XHTML output of TidyLib using NSXMLParser into a custom object model (unless you are exclusively using later versions of iPhone OS with a public built-in DOM parser API, in which case just use a DOM object model).
Set up a state machine with a caret position at top left of page and initial font size and formatting, page number of 1, maximum height of glyphs/images in current line of zero, and empty list of page boundaries.
For each run of text or image in the object model, apply the preceding font size/format modifications; measure the text using iPhone text measurement calls, reducing the text length (trimming to the nearest space or hyphen) until it fits on the current line, then resetting the caret to the line beginning and continuing for line wraps; finally apply the following font size and formatting changes. Over-count the width and height of text by some factor where this is found to be required to prevent page overflow in the actual page rendering engine (UIWebView; you will have to experiment to find out what the factors in the rendering engine are). Record each page boundary in the list.
Convert the objects between page boundaries to simplified XHTML for each page. You may wish to add some CSS at this point, for example to format link colours. You will need to convert local references to anchors on another page so that they load the correct page. Perhaps add a page header/footer with page numbers (subtract their size from the page height in the earlier steps).
Save XHTML as set of files.
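The core of the state machine described in the steps above (caret, line wraps, over-count factor, list of page boundaries) can be sketched in a few lines. This Python version handles plain words only: images, font changes and the XHTML conversion are left out. `measure` is a hypothetical stand-in for the iPhone text measurement calls, here a fixed 10 units per character, and the 1.1 over-count factor is an arbitrary placeholder for the value you would find by experiment.

```python
def paginate(words, page_width, page_height, measure, line_height, overcount=1.1):
    """Greedy pagination of a word list.

    Returns (start, end) word-index ranges, one per page. `measure(text)` must
    return the rendered width of `text`; widths and line heights are inflated
    by `overcount` as a safety margin against the real renderer.
    """
    if not words:
        return []
    pages, page_start = [], 0
    y, line = 0.0, ""          # caret's offset from the page top, current line
    step = line_height * overcount
    for i, word in enumerate(words):
        candidate = word if not line else line + " " + word
        if not line or measure(candidate) * overcount <= page_width:
            line = candidate
            continue
        y, line = y + step, word    # wrap: this word starts a new line
        if y + step > page_height:  # new line would overflow: record a page boundary
            pages.append((page_start, i))
            page_start, y = i, 0.0
    pages.append((page_start, len(words)))
    return pages


words = ["aa", "bb", "cc", "dd", "ee", "ff"]
measure = lambda s: len(s) * 10.0  # hypothetical fixed-width measurement
print(paginate(words, 50, 20, measure, 10, overcount=1.0))  # [(0, 4), (4, 6)]
print(len(paginate(words, 50, 20, measure, 10)))            # 6 pages with a 10% over-count
```

The second call shows why over-counting makes pages smaller than strictly necessary: the extra margin pushes words onto new lines and pages sooner, which is the price of tolerating a renderer that wraps a little differently.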
In essence this will work as long as the source HTML is specially prepared to use a subset of HTML for your app. Any old HTML will not do, though it might perhaps not be completely useless to give a rough idea for previews in some instances for some files.
The description above assumes you throw away formatting like ALIGN= and tables. It really is a very basic approach and will not reproduce complex pages as originally designed! It might well not suit you!
Perhaps the files should be pre-processed before reaching the iPhones in the field, but if the iPhone OS / WebView line-wrapping/text-positioning behaviour changes, the best position for page breaks may change. So you may need to cut your pages smaller than you think they need to be, to allow for some unexpected growth when the rendering engine changes. Hmm. Perhaps not an easy task!
I haven't even tried to analyse HTML tables... HTML in its unrestricted full glory is, of course, enormously, probably unmanageably, complex.