render text with ground truth meta data

I'm working on optical character recognition. We need to automatically generate rendered word images, and for each image we need the location (bounding box) of every character in it. This metadata about the rendered image is called ground truth.
How can I do that?

I found a C rendering API called Pango, which has a function named pango_layout_iter_get_char_extents() that can be used for that.
https://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-iter-get-char-extents

Related

TYPO3 image rendering

I have TYPO3 10 and want to render the images in specific dimensions. I have a content element with only images, and these pictures are rendered with the dimension 495px x 331px. How can I change that? I've uploaded a much bigger original picture. The preview images are always in these dimensions. If I click on the image to enlarge it, it shows a much bigger picture.
The content element is a pictures only element with two columns.
When I set one column, the preview pictures are bigger. How is TYPO3 calculating the dimensions of the pictures?

I assume you want to know how to style the default content type "image".
This content type usually renders a gallery view (grid) of images.
First of all you have these options by default:
Selecting "Number of columns: 1" would make the image use a bigger space.
If you want to directly influence the HTML output, here are some pointers:
The default rendering engine of TYPO3 is fluid_styled_content.
Here is a guide on how to override templates for a specific content type:
https://docs.typo3.org/c/typo3/cms-fluid-styled-content/10.4/en-us/Configuration/OverridingFluidTemplates/Index.html
That specific content type uses a GalleryProcessor:
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/fluid_styled_content/Configuration/TypoScript/ContentElement/Image.typoscript
It renders this template:
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/fluid_styled_content/Resources/Private/Templates/Image.html
... which uses some partials to render the media.
Media/Gallery sets up the layout for the image grid:
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/fluid_styled_content/Resources/Private/Partials/Media/Gallery.html
Media/Rendering/Image finally renders the image. The value of dimension has been calculated by the GalleryProcessor:
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/fluid_styled_content/Resources/Private/Partials/Media/Rendering/Image.html

AEM6.4: Meaning of values in image map properties

AEM offers a plugin for creating image maps in its internal in-place editor. After configuration, the values are stored in the following format:
[rect(89,92,356,368)"/content/sites/we-retail/us"|"_blank"|"fdfdfdfdf"|(0.2,0.2004,0.8,0.8017)]
The first parentheses define the coordinates of the chosen shape.
The content within the first pair of quotation marks defines the target page, and the second pair defines how to open it in the browser. The third pair of quotation marks contains alternative text for when images are not displayed.
What I don't know is the meaning of the values in the second parentheses. Does anyone know what these values stand for?

From the WCM core components Image model, they are called relative coordinates.
They are not standard HTML attributes and are instead populated as data attributes of the area tag within the image component.
See code below:
<area shape="${area.shape}" coords="${area.coordinates}" href="${area.href}"
target="${area.target}" alt="${area.alt}" data-cmp-hook-image="area"
data-cmp-relcoords="${area.relativeCoordinates}">
Since the map coordinates are fixed and do not change when the image is scaled for different screen sizes, the image component’s JavaScript uses these relative coordinates to adjust the coordinates of the map area whenever the image size changes. This is handled by the resizeAreas() function within the component’s clientlib.
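To make the idea concrete, here is a minimal Python sketch of how relative coordinates could be mapped back to pixel coordinates for a given rendered size. It is not the component's actual resizeAreas() code; the function names parse_relcoords and scale_area are hypothetical, and it assumes that values at even positions are fractions of the image width and values at odd positions are fractions of the image height. The numbers from the question are consistent with that reading for an image of roughly 445 x 459 pixels (for example 89 / 445 = 0.2 and 368 / 459 ≈ 0.8017).

```python
def parse_relcoords(value):
    """Parse a string such as "(0.2,0.2004,0.8,0.8017)" into a list of floats."""
    return [float(part) for part in value.strip("()").split(",")]


def scale_area(relcoords, width, height):
    """Map relative coordinates onto an image rendered at width x height pixels.

    Assumption: even positions are x fractions, odd positions are y fractions.
    """
    return [
        round(rel * (width if index % 2 == 0 else height))
        for index, rel in enumerate(relcoords)
    ]


rel = parse_relcoords("(0.2,0.2004,0.8,0.8017)")
print(scale_area(rel, 445, 459))  # [89, 92, 356, 368], the rect from the question
print(scale_area(rel, 890, 918))  # the same area on an image rendered twice as large
```

Storing fractions rather than pixels is what lets the same map definition follow the image across breakpoints.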

Drawing graphical objects (boxes and lines) inside a structured iText(Sharp) document (Chapters and Sections)

I'm creating a PDF document using iTextSharp. I generate all of my content in a C# List<Chapter>, where each Chapter contains one or more Sections and the Chapters have not yet been added to the document. I then enumerate through my List<Chapter> to generate a table of contents at the start of the document, and then add the Chapters to the document after my TOC.
That works great when my Sections contain text and images, but now I need to generate a Section containing boxes and lines. I don't want to draw my boxes and lines into an image and drop the image into the Section, that won't look as good as if I have actual PDF boxes and lines.
The Sections containing graphical elements can be intermixed with Sections containing text, so I need a way to add some kind of element to a Section such that that graphical Section works like text Sections in terms of going onto a new page only if necessary.
What's the best way to do this? I feel like it somehow involves PdfTemplates but I'm not sure how. Or maybe I need to create a PdfPTable and create my graphical elements in an IPdfPCellEvent?

You are on the right track when you want to involve PdfTemplate elements. PdfTemplate is an iText object that corresponds with the concept of Form XObjects in the PDF specification. We chose another name because the word Form is somewhat misleading (people confuse it with form fields, interactive forms, etc).
The content stream of a page in PDF is a sequence of PDF syntax, consisting of operands and operators. An XObject is an object that is external to this content stream. The content of an XObject is stored inside the PDF document only once, but it can be reused many times, on the same page or on different pages.
There are different types of XObjects, but Image XObjects and Form XObjects are the most important ones.
Image XObjects are used when we work with raster images. You are absolutely right when you write: "I don't want to draw my boxes and lines into an image and drop the image into the Section, that won't look as good as if I have actual PDF boxes and lines."
Form XObjects are used when we want to reuse PDF syntax. This is what you need: you want to define moveTo(), lineTo(), curveTo(), stroke(), fill(),... operations, and you want these lines and shapes to be stored as vector data.
The solution to your problem is to draw lines and shapes to a PdfTemplate object and to wrap the PdfTemplate object inside an Image object. When you add that Image object to a Section or a Chapter, it will be added as a Form XObject. You don't have to fear that it will be degraded into a raster image.
You can find some examples of this technique on the official web site. For instance in the answer to the question
How to generate 2D barcode as vector image?
Here we create a PdfTemplate with a bar code and we return it as an Image object. The screen shot that shows you the internals of the resulting PDF proves that the bar code is added as a vector image.
public Image createBarcode(PdfContentByte cb, String text,
        float mh, float mw) throws BadElementException {
    BarcodePDF417 pf = new BarcodePDF417();
    // Encode the text passed in by the caller.
    pf.setText(text);
    Rectangle size = pf.getBarcodeSize();
    // The template (a Form XObject) is sized to the scaled barcode.
    PdfTemplate template = cb.createTemplate(
            mw * size.getWidth(), mh * size.getHeight());
    pf.placeBarcode(template, BaseColor.BLACK, mh, mw);
    // Wrapping the template in an Image keeps the content as vector data.
    return Image.getInstance(template);
}
To create a PdfTemplate object, you need a PdfContentByte instance (e.g. using writer.getDirectContent()) and use the createTemplate() method passing a width and a height as parameters. Then you draw content to the PdfTemplate and turn it into an Image object using Image.getInstance().
You'll find more info on drawing lines and shapes in the chapter on Absolute positioning of lines and shapes and in the example section of Chapter 3 and Chapter 14 of my book.

Text region extraction by finding co-ordinates of text from an image

I am developing image processing software that extracts (crops) and enhances a single-page form from an image taken with a cellphone camera. The form has no rectangular border that would simplify extraction. It is black text on a white background, but nothing apart from that is fixed. Some text will be present that verifies the image is of the required form. So my questions are these:
1) Can I search for a specific regular expression using the Leptonica library itself, or do I have to shift focus to other libraries such as the Tesseract API to do this? So far I have not found anything of this sort.
2) Suppose I know the text at the top left corner and the bottom right corner, and I find it successfully. Can I get the coordinates of the text I am searching for and then crop the image accordingly?

Leptonica doesn't do anything with text, it's an image processing library.
To get the position of the text, add tessedit_create_hocr 1 to your Tesseract config file (or set this option in whichever way you configure Tesseract if you're using it as a library).
The result is no longer a text file, but a UTF-8-encoded HTML file (note: it's not valid XML). Its format is self-explanatory. It will contain the positions and dimensions of all words on all pages in pixels, as found in the input image. You need to parse that HTML, find the words you're looking for, and then get the bounding boxes of those words.
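The parsing step could look roughly like the following Python sketch, which uses only the standard library. It assumes the usual hOCR convention that each word is a span with class ocrx_word whose title attribute starts with "bbox x0 y0 x1 y1". The sample markup, the class HocrWordParser and the helpers parse_hocr_words, find_word and crop_box are made up for illustration and should be checked against the output of your Tesseract version.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr_words(hocr):
    parser = HocrWordParser()
    parser.feed(hocr)
    return parser.words


def find_word(words, pattern):
    """Return the bbox of the first word fully matching the regex, or None."""
    for text, bbox in words:
        if re.fullmatch(pattern, text):
            return bbox
    return None


def crop_box(words, top_left_pattern, bottom_right_pattern):
    """Crop rectangle spanning from one anchor word to another, or None."""
    top_left = find_word(words, top_left_pattern)
    bottom_right = find_word(words, bottom_right_pattern)
    if top_left is None or bottom_right is None:
        return None
    return (top_left[0], top_left[1], bottom_right[2], bottom_right[3])


# Hypothetical hOCR fragment, for illustration only.
SAMPLE_HOCR = """
<div class='ocr_page' title='bbox 0 0 800 1000'>
 <span class='ocr_line' title='bbox 40 50 230 80'>
  <span class='ocrx_word' title='bbox 40 50 160 80; x_wconf 93'>Application</span>
  <span class='ocrx_word' title='bbox 170 50 230 80; x_wconf 91'>Form</span>
 </span>
 <span class='ocr_line' title='bbox 500 700 640 730'>
  <span class='ocrx_word' title='bbox 500 700 640 730; x_wconf 88'>Signature</span>
 </span>
</div>
"""

words = parse_hocr_words(SAMPLE_HOCR)
print(crop_box(words, r"Application", r"Sign\w+"))  # (40, 50, 640, 730)
```

The regular-expression search from the first question is done here on the recognized words rather than in Leptonica, and the resulting rectangle can then be passed to whatever image library performs the crop.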

Multiple UILabels to display decorated XML markup

I have a block of content (stored in XML) that I want to put in a UIScrollView. Certain parts of this text will be formatted with different fonts, sizes, and colors. Altogether, it mostly reads as a paragraph with word wrapping.
I've built my NSXMLParser code, and I have separated all the data. I'm ready to apply my decorations and add these elements as UILabels.
However, I'm looking for a solution to ease the inherent difficulties of string height/width calculations and all of that arithmetic to make these UILabels line up with word wrapping nicely. [keeping track of your last X and Y coordinates, knowing when to insert manual line breaks, how to best vertically display a line that has 2 different sized fonts]
The XML markup can easily be converted to HTML, and thus UIWebView, but I hear that is slower to load.
Is the UIWebView going to be the best class for this? I wish there were one that did all of this with UILabels so that I can use these elements for touch events. (I assume that I cannot use an HTML element to trigger a touch event.)

You should probably use a UIWebView. You can use an HTML anchor for touchable elements. The delegate will give you the option of doing something other than loading a web page when the user touches the element. You can use a made-up URL format to uniquely identify each element.
Aside from that, you may want to use a custom control that draws all the text, rather than a series of UILabels. The UILabels will probably make it difficult to do line wrapping.
