Adding Title to Description section (in file Properties) of .png metadata for use as Alt Text when using Report Layout (or print layout) - qgis

I have developed multiple reports with great success; however, I would now like to add a unique Title to the Description in the metadata of each exported .png file. When I review the file Properties of each exported image, there appears to be no Description information such as Title, Subject, Rating, Tags or Comments. I only need the Title in the Description portion of the metadata (this is not the file name).
Additionally, I would like to automate the generation of unique Titles based on attributes in the various layers, and have each unique Title written as the Title of the corresponding exported .png. I am familiar with the automation syntax; however, I am not familiar with writing text (e.g. the newly automated Title) into the Title field of the Description. My intent is to extract the Title of each .png so it can be used as the Alt Text when the image is inserted into a PDF or MS Word document. When generating reports in QGIS I often export hundreds of individual .png files, so any automation would greatly improve this process. For those not familiar with Alt Text: it describes the image for visually impaired readers who use a screen reader.
I have searched on numerous keywords related to Alt Text, metadata, etc. for leads to a solution but have had no success. I suspect this is beyond my abilities at this point, since the metadata labeling and export process appears to be a black hole.

Related

heading and sub-heading extraction from PDF

I am currently working on extracting text from PDFs. My current issue is distinguishing headings and sub-headings in the extracted text. I am working with iTextSharp and using bold text information to detect headings. The font size cannot always be trusted. I also tried PDFBox.
1) Is there any method to identify headings and sub-headings in a PDF?
2) Do Adobe or PDF-XChange Editor provide any API for this?
For example:
I need to extract
"Tourism in 2040:
Bringing an additional one million visitors
per year to paradise" as heading
"Executive Summary" as sub-heading
Even though this can be extracted using bold text information, it fails in many cases. That's why I am looking for APIs.
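A PDF normally stores positioned text runs rather than a document structure (tagged PDFs are the exception), so a common workaround is to combine several signals instead of relying on bold alone. The sketch below assumes your extractor (iTextSharp, PDFBox or similar) can report each text run's font size and a bold flag; the `Span` class, the thresholds and the function names are hypothetical and would need tuning on your documents. It compares each run's size with the median size (usually body text) and then merges consecutive heading lines such as the three-line "Tourism in 2040" title.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Span:
    """One run of text as an extractor might report it (hypothetical shape)."""
    text: str
    font_size: float
    bold: bool


def classify_spans(spans, heading_ratio=1.5, subheading_ratio=1.15):
    """Label each span 'heading', 'subheading' or 'body' by relative font size.

    Bold promotes a body-sized span to 'subheading'; bold words inside body
    paragraphs would be misclassified, so treat the output as candidates.
    """
    body_size = median(s.font_size for s in spans)
    labels = []
    for s in spans:
        ratio = s.font_size / body_size
        if ratio >= heading_ratio:
            labels.append("heading")
        elif ratio >= subheading_ratio or s.bold:
            labels.append("subheading")
        else:
            labels.append("body")
    return labels


def merge_headings(spans, labels):
    """Join consecutive spans that share a non-body label (multi-line headings)."""
    merged = []
    for span, label in zip(spans, labels):
        if merged and label != "body" and merged[-1][1] == label:
            merged[-1] = (merged[-1][0] + " " + span.text, label)
        else:
            merged.append((span.text, label))
    return merged
```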

Space length limitation

I have a Word document that is a form.
I am trying to complete it. Here is a screenshot of how it looks:
When I type in the grey box there is a length limit, and once I reach it, Word won't let me type any more.
I am not sure what kind of field it is, but I want to insert an image or a table into it and I can't.
How can I do this?
The field you are trying to enter information into is a Legacy Text Form Field in Word 2010. In order to have a data entry area within the form that will accept text, tables, and images, delete this field and replace it with a Rich Text Content Control. This control is found on Word's Developer Tab:
Instructions for Displaying Word Developer Tab (if needed)
Like the legacy form fields, content controls allow manual or programmatic entry of data as well as the ability to restrict editing of the data within the content control. Gregory K. Maxey has posted an incredibly detailed tutorial on creating forms with content controls, programming the content entry via VBA (Visual Basic for Applications) and restricting editing of the control's contents (all of which is available using the Rich Text Content Control):
Create Forms with Content Controls by Gregory K. Maxey
The same author also has an additional posting on content controls where he provides links to and offers explanations of more advanced content control abilities such as data mapping:
Content Controls (Additional Information) by Gregory K. Maxey
Lastly, Microsoft also provides some guidance on programming content controls via .NET (which I think may be beyond the scope of your question, but which I include for future readers):
MSDN: How to Add Content Controls to Word Documents
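To confirm that the grey box is a legacy text form field, you can look inside the .docx: a legacy field's options are stored in a `w:ffData` element of `word/document.xml`, and its Maximum length setting (most likely the limit you are hitting) appears as `w:textInput/w:maxLength`. This standard-library Python sketch lists those fields; the function name is hypothetical, and it assumes the form is saved as .docx (an older .doc file would need converting first).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def legacy_text_fields(docx_path):
    """Return (name, max_length) for every legacy text form field in a .docx.

    max_length is None when the field has no length limit.
    """
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    fields = []
    for ff in root.iter(W + "ffData"):
        text_input = ff.find(W + "textInput")
        if text_input is None:
            continue  # check box or drop-down field, not a text field
        name_el = ff.find(W + "name")
        max_el = text_input.find(W + "maxLength")
        name = name_el.get(W + "val") if name_el is not None else None
        max_len = int(max_el.get(W + "val")) if max_el is not None else None
        fields.append((name, max_len))
    return fields
```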

Field Detection using iText

Using Adobe Acrobat, if you choose Add or Edit Fields... from the Forms menu on a file with no fields, you get a pop-up with the message:
Currently, there are no form fields in this PDF. Do you want Acrobat to detect form fields for you?
Is there a way of accomplishing this sort of field detection using iText?
Not out of the box, but the API exists for you to build your own.
Adobe Acrobat is a PDF renderer and as such it can actually "look" at a PDF as a human does. It "sees" a line with text "near" it and can say with a fair amount of certainty that the line represents a field and the text represents the field's label. Same with circles and squares for radio buttons and check boxes. This document actually describes all of the shapes that Adobe Acrobat searches for.
Adobe's technology, however, assumes that a human will confirm and fix any problems that occur, usually using Adobe's technology:
After running the auto field detection process on a form, check it to make sure the correct fields have been created.
So even if iText supported this, you'd still have to open the PDF in Adobe Acrobat to check and fix things anyway.
But if you want to build your own, you could use something like this or this to get at the lines, and this to get at the text.
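Once the line segments and text positions have been extracted (for example with iText's content-parsing API), the "line with text near it" idea can be sketched as a nearest-label search. This is a hypothetical, much-simplified heuristic, not Adobe's algorithm: the `Line` and `Label` classes, the 40-unit tolerance and the function name are assumptions, and every proposal would still need the human check described above.

```python
import math
from dataclasses import dataclass


@dataclass
class Line:
    """Horizontal rule on the page, in PDF user-space units (y grows upward)."""
    x0: float
    x1: float
    y: float


@dataclass
class Label:
    """A text fragment, located by the lower-right corner of its bounding box."""
    text: str
    x_right: float
    y_bottom: float


def propose_fields(lines, labels, max_gap=40.0):
    """Pair each line with the closest label that ends to its left.

    Returns (label_text, line) candidates; lines with no label within
    max_gap units are skipped.
    """
    proposals = []
    for line in lines:
        best, best_dist = None, max_gap
        for lab in labels:
            if lab.x_right > line.x0 + 1:  # label must end left of the line start
                continue
            dist = math.hypot(line.x0 - lab.x_right, line.y - lab.y_bottom)
            if dist <= best_dist:
                best, best_dist = lab, dist
        if best is not None:
            proposals.append((best.text, line))
    return proposals
```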

How do I automate converting PDF to HTML?

I work for a publisher and am trying to extract content from our fully laid out PDFs. I've tried pdftohtml, pdftotext, pdfminer, and other Python-based approaches to getting the content, as well as saving to Word, HTML, XML, etc. from the original Acrobat files.
I need not just the text but also its formatting, because, for example, I need all the blue text in the document.
When I save to HTML, Word, etc. from Acrobat, the resulting files contain screenshots of the pages, not the laid out text. When I extract text using different Python modules I get the text but lose the text formatting.
The only solution I've found is to manually copy and paste from the PDF into a Word document, then save it as HTML. I'm hoping to automate this.
Why does copying from Acrobat into Word achieve what I can't do by other means? Has anybody come across this problem before?
Maybe you can consider another method. The software at https://pdfapi.codeplex.com/ can convert PDF files to HTML directly via MVS. If you are able to use MVS, I think it would be useful for converting the text in your PDF files to HTML while keeping the formatting perfectly. Of course, this is just a suggestion; you can give it a try.
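Whatever tool is used, the key requirement for "all the blue text" is per-character fill colour, which plain text extraction discards. Assuming your extractor can hand you each character with its fill colour as an RGB triple with components from 0 to 1 (other colour spaces such as CMYK would need converting first, and the thresholds below are arbitrary), the remaining step is to group consecutive blue characters. The function names are hypothetical.

```python
def is_blue(rgb, min_blue=0.5, max_other=0.35):
    """Treat a colour as blue when B is strong and R and G are weak."""
    r, g, b = rgb
    return b >= min_blue and r <= max_other and g <= max_other


def blue_runs(chars):
    """Group consecutive blue characters into strings.

    chars: iterable of (character, (r, g, b)) pairs in reading order.
    """
    runs, current = [], []
    for ch, rgb in chars:
        if is_blue(rgb):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:  # text that ends while still blue
        runs.append("".join(current))
    return runs
```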

Converting large amounts of text and dynamic data into PDF

I have a three-page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I converted this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it to the client. This all worked, but because of the large amount of data stored, the placeholders were too small and the text was truncated or hidden.
My second attempt was to create an MVC 2 View without a master page, pass the model to the view, take the HTML representation of the View, pass it to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was the <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored, so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 that would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not ideal either: I have never worked with VSTO, so I don't know what it would take to open Word, manipulate the text, add dynamic data and then convert the result to PDF.
Any suggestion is highly welcome.
One thing that I've done in the past is to save the Word file as a DOCX and unzip it, since a DOCX is just a renamed zip file. Within the archive, open /word/document.xml and you'll see your document. There are a lot of odd XML tags in there, but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
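The placeholder-swapping steps can be sketched in Python with only the standard library; this is a swapped-in equivalent of the SharpZipLib/DotNetZip approach, and the function name is hypothetical. One caveat: Word often splits typed text across several `w:r` runs, so check `document.xml` to make sure each placeholder appears as one unbroken string. The final Save-As-PDF step still needs Word automation and is not shown.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx_template(template_path, output_path, values):
    """Copy a .docx, replacing {PLACEHOLDER} text in word/document.xml.

    values maps placeholder names (without braces) to replacement strings.
    Every other archive member is copied unchanged, in the original order.
    """
    with zipfile.ZipFile(template_path) as src, \
         zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the replacement cannot break the XML.
                    xml = xml.replace("{" + key + "}", escape(value))
                data = xml.encode("utf-8")
            dst.writestr(item, data)
```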
The other route is to fully utilize iTextSharp and build the document directly from Paragraph, PdfPTable and other objects. It takes a lot longer to set up but gives you the most control.
Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having too much data? If the Word template can "hold" the data in 3 pages, it should fit in 3 PDF pages.
I used to use iTextSharp to create my PDFs, but I almost always ended up building the PDF document from scratch myself (not really a <200-line solution). Have you considered another library? I recently switched to MigraDoc/PDFSharp: it is much simpler to use than iText, with lots of examples and documentation.
Just my two cents
The Word document object model is quite easy to understand: a document is essentially a series of paragraphs and tables. Using the Open XML SDK, you can iterate through each paragraph or table in the Word document and retrieve its content and styles. You can then generate the PDF document on the fly from that information. This works under MVC too.
But if your Word document contains complex elements, this approach will take more time to implement. Also, it only works with Word 2007 and 2010 (.docx) files.
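The Open XML SDK is a .NET library; to illustrate the same paragraph/table walk without it, here is a Python sketch that reads `word/document.xml` directly with the standard library. It is a swapped-in technique, not the SDK's API, and the function names are hypothetical. It reports only paragraph style names and plain text, and skips content controls, images and the other complex elements mentioned above.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _text(el):
    """Concatenate the text of all w:t elements under el."""
    return "".join(t.text or "" for t in el.iter(W + "t"))


def walk_body(docx_path):
    """Yield (kind, style, content) for each top-level paragraph or table.

    Paragraphs give ('paragraph', style_id_or_None, text);
    tables give ('table', None, rows), where rows is a list of cell-text lists.
    """
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    for child in root.find(W + "body"):
        if child.tag == W + "p":
            style_el = child.find(f"{W}pPr/{W}pStyle")
            style = style_el.get(W + "val") if style_el is not None else None
            yield ("paragraph", style, _text(child))
        elif child.tag == W + "tbl":
            rows = [[_text(cell) for cell in row.findall(W + "tc")]
                    for row in child.findall(W + "tr")]
            yield ("table", None, rows)
```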
Also, as far as I know, the HTML-to-PDF options currently available in the iTextSharp library work only with a known set of tags.
Another option is to use commercially available .NET components. There are a lot of good solutions available, for example Syncfusion.