Context:
extract text from a pdf
using the IEventListener - TextRenderInfo
a pdf document with more than one page
c# .net core program
Issue:
To calculate the exact X,Y position of a text I use this code:
var textMatrix = textRenderInfo.GetTextMatrix().Multiply(textRenderInfo.GetGraphicsState().GetCtm());
float X = textMatrix.Get(6);
float Y = textMatrix.Get(7);
This works for the first page. On subsequent pages, however, the CTM appears to be raised to the power of the page number, i.e. Power(ctm, pagenumber), so the resulting X,Y is wrong.
To clarify: I have a document with a date repeated at exactly the same location on every page, so its text matrix is the same on every page. But the CTM looks like this for page 1:
{0,05 0 0
0 0,05 0
0 0 1}
For page 2:
{0,0025000002 0 0
0 0,0025000002 0
0 0 1}
For page 3:
{0,000125 0 0
0 0,000125 0
0 0 1}
Etc ...
So it looks as if each value is raised to the power of the page number. Could this be a bug?
Could this be a bug?
More likely a case of incorrect API usage...
Unfortunately you don't show your pivotal code. I assume, though, that you re-use the same PdfCanvasProcessor for all pages. Have you considered the note in the ProcessPageContent documentation?
/// <summary>Processes PDF syntax.</summary>
/// <remarks>
/// Processes PDF syntax.
/// <strong>Note:</strong> If you re-use a given
/// <see cref="PdfCanvasProcessor"/>
/// , you must call
/// <see cref="Reset()"/>
/// </remarks>
/// <param name="page">the page to process</param>
public virtual void ProcessPageContent(PdfPage page)
(PdfCanvasProcessor.cs)
I.e.
Note: If you re-use a given PdfCanvasProcessor, you must call Reset() [between ProcessPageContent calls]
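To illustrate why the missing Reset() produces exactly the numbers in the question, here is a minimal sketch in Python. It is a toy model, not the iText API: ToyCanvasProcessor, ctm_scale and process_page_content are hypothetical names, and it assumes each page's content applies a 0.05 scale that is never undone before the next page is processed.

```python
# Toy model of a content-stream processor. This is NOT the iText API;
# it only mimics how graphics state carries over between pages.
class ToyCanvasProcessor:
    def __init__(self):
        self.ctm_scale = 1.0  # uniform scale part of the CTM, identity at start

    def reset(self):
        # Back to the identity CTM, as a fresh processor would have.
        self.ctm_scale = 1.0

    def process_page_content(self, page_scale):
        # The page content multiplies its own scale onto the current CTM.
        self.ctm_scale *= page_scale
        return self.ctm_scale


def ctm_per_page(page_scales, reset_between_pages):
    processor = ToyCanvasProcessor()
    seen = []
    for page_scale in page_scales:
        if reset_between_pages:
            processor.reset()
        seen.append(processor.process_page_content(page_scale))
    return seen


pages = [0.05, 0.05, 0.05]  # assumed: every page scales by 0.05
reused = ctm_per_page(pages, reset_between_pages=False)
fresh = ctm_per_page(pages, reset_between_pages=True)
print(reused)  # the scale compounds from page to page
print(fresh)   # the scale is the same on every page
```

Without the reset the scale compounds (0.05, then roughly 0.0025, then roughly 0.000125), which matches the pattern reported for pages 1 to 3.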
I am trying to change the color value of a textlabel. I am doing so using:
script.Parent.Parent.toggled2.SurfaceGui.SIGN.TextColor3.R = 0
script.Parent.Parent.toggled2.SurfaceGui.SIGN.TextColor3.G = 255
script.Parent.Parent.toggled2.SurfaceGui.SIGN.TextColor3.B = 0
Basically, it navigates from the script to a button (a part, the parent of the script), then to the group the button is in, then to a part holding the text (in this case toggled2), then to the SurfaceGui inside it, and finally to the TextLabel (which is named SIGN). It then tries to modify the TextColor3 property three times, once each for the R, G and B values.
Why won't it let me alter the value? Do I have to do something like :new() or .new()?
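The R, G and B components of a Color3 are read-only, so to change the TextColor3 property you have to assign a whole new Color3 object. Note that Color3.new expects components in the range 0 to 1; for values in the range 0 to 255, use Color3.fromRGB:
local sign = script.Parent.Parent.toggled2.SurfaceGui.SIGN
sign.TextColor3 = Color3.fromRGB(0, 255, 0)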
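The same idea (an immutable colour value that is replaced as a whole, with components on a 0 to 1 scale) can be sketched outside Roblox. The Python below is only an analogy: Color, from_rgb and Label are hypothetical stand-ins, not Roblox types.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Color:
    """Immutable colour with components in the range 0..1 (stand-in type)."""
    r: float
    g: float
    b: float

    @staticmethod
    def from_rgb(r, g, b):
        # Convert 0..255 components to the 0..1 scale.
        return Color(r / 255, g / 255, b / 255)


class Label:
    """Stand-in for a text label that owns a colour property."""
    def __init__(self):
        self.text_color = Color(1.0, 1.0, 1.0)


label = Label()

# Writing a single component fails because the value is immutable.
try:
    label.text_color.g = 1.0
    component_write_failed = False
except FrozenInstanceError:
    component_write_failed = True

# Replacing the whole value works.
label.text_color = Color.from_rgb(0, 255, 0)
print(component_write_failed, label.text_color)
```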
I have this implementation, using it in a page level 2 submenu. Each level 2 menu has multiple subpages. Each subpage has one image. So this implementation produces an image from each page for each submenu. For example, a submenu with 2 subpages will have 2 images (one from each subpage).
1 = FILES
1 {
  references {
    table = pages
    fieldName = media
    data = levelmedia:-1, slide
  }
  begin = 0
  maxItems = 2
  renderObj = COA
  renderObj {
    2 = IMAGE
    2 {
      file {
        //params = -sharpen 50 +profile "*" -quality 100
        import.data = file:current:uid
        treatIdAsReference = 1
        width.optionSplit = 300c|*|400c
        height.optionSplit = 350c|*|450c
      }
    }
  }
}
Would like to have images cropped in different sizes such that image 1 is cut to different dimensions from image 2 and so on.
My ImageMagick installation works perfectly. Am actually cropping single images with it without a hitch.
Without the optionSplit above, the images are cut to size nicely. Unfortunately with the optionSplit it simply outputs the images in their original sizes.
How can I produce different image sizes? My understanding from the manuals is that optionSplit is the way to go. I read in articles that sourceCollection for responsive images uses optionSplit. I imagine another way would be to use an image register counter and a CASE to determine how to cut image 1, 2, 3 and so on, but I am not familiar with register counters (maybe someone can show me how to do this?). Yet another way would be to use a file/image index number, but I have searched the manuals for hours for such a pointer and found nothing listed that would help with processing. Does anybody know a way to do this?
Rendering two consecutive images with different parameters will be difficult in TypoScript:
Your optionSplit cannot succeed, because inside the renderObj you always have only one file. That is a drawback of every renderObj.
On the other hand, there is no property optionSplit. The functionality is built into any wrap property.
Therefore one way to handle it in TypoScript would be to concatenate the elements, split them again, and then use different options in the split's renderObj to handle each one separately.
Or implement a counter with a register variable, then evaluate the register to set different values.
Handling it in Fluid would be easier: you could use an iterator with the f:for ViewHelper, and then do an f:if (for two cases) or an f:switch (for more cases) on {iterator.index} to render individual versions.
Building on @Bernd's answer, and on the fact that in a TMENU each page (as an item) is delivered as an object in each iteration, it is possible to achieve such image rendering in one of two ways:
First,
Through the use of two register entries: register:count_menuItems, which holds the total number of items you will be processing, and register:count_MENUOBJ, which holds the index of the item currently being iterated (it starts at 1). These two can be used in conjunction with a CASE object to process each image to one's liking. If a page has multiple images, there are two more register entries one can use: register:FILES_COUNT (which starts counting at 0) and register:FILES_NUM_CURRENT. There is no need to implement a register counter, since these register entries are themselves counters.
Secondly,
There's a much easier and far less time-consuming way that uses a wrap, as explained by @Bernd, as follows:
NO = 1
NO {
  1 = LOAD_REGISTER
  1 {
    width.cObject = TEXT
    width.cObject.stdWrap.wrap = 100c||200c
    height.cObject = TEXT
    height.cObject.stdWrap.wrap = 300c||400c
  }
  2 = FILES
  2 {
    # Get the images related to the current page
    references {
      table = pages
      fieldName = media
    }
    # Render each image and wrap it as appropriate
    renderObj = IMG_RESOURCE
    renderObj {
      file {
        treatIdAsReference = 1
        import.data = file:current:uid
        width = {REGISTER:width}
        width.insertData = 1
        height = {REGISTER:height}
        height.insertData = 1
      }
    }
    stdWrap {
      wrap = <img src="|" />
    }
  }
}
As you can see, this code is being used in a TMENU and processes each image based on different rules defined in segment 1 and stored by the LOAD_REGISTER. The trick is in the wraps. stdWrap's wrap already contains optionSplit. So by storing the desired pattern, the stdWrap will process the correct value to be stored for each iteration.
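To make the effect of a pattern such as 100c||200c concrete, here is a minimal sketch in Python of how a "||"-only optionSplit pattern is distributed over menu items. It is a simplification, not TYPO3 code: option_split_simple is a hypothetical helper, it deliberately ignores the |*| first/middle/last syntax, and it assumes the documented rule that the last sub-part is repeated for any remaining items.

```python
def option_split_simple(pattern, item_count):
    """Resolve a '||'-only optionSplit pattern for item_count items.

    Simplified sketch: the |*| (first/middle/last) syntax is not handled.
    """
    if "|*|" in pattern:
        raise ValueError("this sketch does not handle the |*| syntax")
    parts = pattern.split("||")
    # Item i gets sub-part i; once the sub-parts run out, the last one repeats.
    return [parts[min(index, len(parts) - 1)] for index in range(item_count)]


widths = option_split_simple("100c||200c", 3)
heights = option_split_simple("300c||400c", 3)
for position, (width, height) in enumerate(zip(widths, heights), start=1):
    print(position, width, height)
```

Under these assumptions the first menu item would get 100c by 300c, and every later item 200c by 400c.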
It has worked for me. Hope it helps someone.
Is there any point checking to see if a value has already been assigned to a variable or is it better to just simply assign the value? For example, if X is going to equal 1, is there any point checking to see if X already equals 1? Example code below:
if X != 1 {
X = 1
}
I ask this question because I'm looping through a bunch of child sprites and changing their alpha values to 0, and most of them are already set to 0. So I'm trying to see whether there is any benefit in checking each child's alpha value first (I can't see one).
parent.enumerateChildNodes(withName: "*", using: {
node, stop in
// if node.alpha != 0 {
node.alpha = 0
// }
})
Just set the value normally.
What even is the point of checking whether the value is already 0 before setting it? What difference does it make? After the line of code:
node.alpha = 0
No matter what value alpha has before, it will always be 0 after the above line!
If you are worried about performance, don't be, until you actually encounter a performance problem.
Setting alpha is just like setting any other variable. It doesn't do much apart from setting the value. It won't immediately change the alpha of the sprite on the screen. It will only do it in the next frame.
Say you do this a bunch of times:
for _ in 0...10000 {
node.alpha = 0
node.alpha = 1
}
The node's alpha on the screen won't be flashing like crazy. Eventually it will be 1 so the node will be drawn with alpha = 1 in the next frame.
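The behaviour described above, where assignments only store a value and drawing happens once per frame, can be sketched with a toy model in Python. ToyNode and ToyRenderer are hypothetical stand-ins, not SpriteKit classes, and the model assumes the loop finishes before the next frame is rendered.

```python
class ToyNode:
    """Stand-in for a sprite node: alpha is just a stored value."""
    def __init__(self):
        self.alpha = 1.0


class ToyRenderer:
    """Stand-in for the render loop: reads alpha only when a frame is drawn."""
    def __init__(self):
        self.drawn_alphas = []

    def render_frame(self, node):
        self.drawn_alphas.append(node.alpha)


node = ToyNode()
renderer = ToyRenderer()

# Equivalent of the Swift loop over 0...10000 (10001 iterations).
for _ in range(10001):
    node.alpha = 0.0
    node.alpha = 1.0

# Only now is a frame drawn, so only the final value is ever seen.
renderer.render_frame(node)
print(renderer.drawn_alphas)
```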
I am experimenting with TYPO3 and Fluid and at the moment I am in trouble. It is about a backendlayout I created in TYPO3.
It consists of two content areas: "left-column" and "right-column".
Bringing them to the frontend via Fluid was no problem. But then I created four content elements (text and image) within "left-column". I wanted to wrap each of these content elements in a Bootstrap wrapper, e.g. text in "col-md-8" and image in "col-md-4".
Unfortunately, I have not found any hints or documentation how to do this. Maybe someone can help me with that issue and tell me how to customize the wrappers of my content elements. Is it possible to do it via Fluid at all?
Backend layouts are used to map columns to your template, but they don't allow you to decide how each of them will be displayed. There are several solutions, but lately my favorite is the Grid Elements extension.
It allows you to create sub-containers for content elements, so you can add any combination of Bootstrap's grid layout (e.g. 2 columns - 4-8 or 3 columns - 3-3-3 etc.) and then wrap it with Bootstrap classes.
Sample for mentioned 2 columns - 4-8 Grid Element record:
Title: 2 columns: 4-8 or whatever you want ;)
Alias: 2_columns_4_8 (must be unique)
Grid Configuration:
backend_layout {
  colCount = 2
  rowCount = 1
  rows {
    1 {
      columns {
        1 {
          name = Left
          colPos = 221
        }
        2 {
          name = Right
          colPos = 222
        }
      }
    }
  }
}
Finally in your TypoScript template add rendering definition like this:
tt_content.gridelements_pi1.20.10.setup {
  2_columns_4_8 < .default
  2_columns_4_8 {
    wrap = <div class="row">|</div>
    columns {
      221 < .default
      221.wrap = <div class="col-sm-4">|</div>
      222 < .default
      222.wrap = <div class="col-sm-8">|</div>
    }
  }
}
(in the sample observe where and how alias and also colPos values are used later in TypoScript)
Hint: don't waste time creating every possible combination of columns at the beginning. Instead, create one ad hoc when required; usually you need only a few of them.
P.S. TYPO3 is written uppercase, always!
I have a problem. I am using iTextSharp (versions 5.5.3 to 5.5.6).
I have one PDF file with one page.
On the page I have 4 text fields (rotation: 0, 90, 180, 270) and one red polygon.
The page rotation is set to 270.
I want to flatten the page.
In code, I set
stamper.AnnotationFlattening = true;
stamper.FormFlattening = true;
After flattening, two of my text fields are rotated incorrectly.
Image 1 of the original PDF:
Image 2 with error:
The PDF:
http://www.pdf-archive.com/2015/08/20/wyslac/
There is an issue in iText(Sharp) when flattening form fields with existing appearances rotated by means of their Matrix attribute if pdfStamper.AcroFields.GenerateAppearances is true.
The original
after flattening with GenerateAppearances == true looks like this:
Workaround
As the document already has appearance streams, you can switch off GenerateAppearances:
stamper.AcroFields.GenerateAppearances = false;
stamper.AnnotationFlattening = true;
stamper.FormFlattening = true;
The result you get now:
The issue
If GenerateAppearances == true, then iTextSharp (when flattening forms) first looks whether a field already has an appearance. If the field has one, iTextSharp only attempts to neatly fit the existing appearance into the rectangle of the form field. Unfortunately it (a) ignores the existing form field Matrix entry and (b) replaces it with a new matrix doing the fitting. If the appearance was rotated by means of its Matrix, that rotation is lost and instead the value is stretched to fit into the falsely oriented rectangle.
if (acroFields.GenerateAppearances) {
    if (appDic == null || as_n == null) {
        [...]
    } else if (as_n.IsStream()) {
        PdfStream stream = (PdfStream) as_n;
        PdfArray bbox = stream.GetAsArray(PdfName.BBOX);
        PdfArray rect = merged.GetAsArray(PdfName.RECT);
        if (bbox != null && rect != null) {
            float rectWidth = rect.GetAsNumber(2).FloatValue - rect.GetAsNumber(0).FloatValue;
            float bboxWidth = bbox.GetAsNumber(2).FloatValue - bbox.GetAsNumber(0).FloatValue;
            float rectHeight = rect.GetAsNumber(3).FloatValue - rect.GetAsNumber(1).FloatValue;
            float bboxHeight = bbox.GetAsNumber(3).FloatValue - bbox.GetAsNumber(1).FloatValue;
            float widthCoef = Math.Abs(bboxWidth != 0 ? rectWidth / bboxWidth : float.MaxValue);
            float heightCoef = Math.Abs(bboxHeight != 0 ? rectHeight / bboxHeight : float.MaxValue);
            if (widthCoef != 1 || heightCoef != 1)
            {
                NumberArray array = new NumberArray(widthCoef, 0, 0, heightCoef, 0, 0);
                stream.Put(PdfName.MATRIX, array);
                MarkUsed(stream);
            }
        }
    }
}
(PdfStamperImp method FlatFields)
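To see what this fitting step does to a rotated field, here is a small sketch in Python that mirrors the arithmetic of the excerpt above. The function name fitting_matrix and the example BBox, Rect and rotation matrix are hypothetical illustration values, not taken from the PDF in question.

```python
import sys


def fitting_matrix(bbox, rect):
    """Mirror the scale-to-fit arithmetic from the FlatFields excerpt.

    bbox and rect are [llx, lly, urx, ury]. Returns the replacement Matrix
    as six numbers, or None when no replacement would be written.
    """
    rect_width = rect[2] - rect[0]
    bbox_width = bbox[2] - bbox[0]
    rect_height = rect[3] - rect[1]
    bbox_height = bbox[3] - bbox[1]
    width_coef = abs(rect_width / bbox_width if bbox_width != 0 else sys.float_info.max)
    height_coef = abs(rect_height / bbox_height if bbox_height != 0 else sys.float_info.max)
    if width_coef != 1 or height_coef != 1:
        return [width_coef, 0, 0, height_coef, 0, 0]
    return None


# Hypothetical field rotated by 90 degrees: the appearance is drawn in a
# 100 x 20 box, while the widget rectangle on the page is 20 x 100.
bbox = [0, 0, 100, 20]
rect = [50, 50, 70, 150]
original_matrix = [0, 1, -1, 0, 20, 0]  # assumed rotation Matrix of the appearance

new_matrix = fitting_matrix(bbox, rect)
print(original_matrix, "->", new_matrix)
```

With these values the rotation terms of the assumed original matrix are gone from the replacement, and the appearance is squeezed horizontally (factor 0.2) and stretched vertically (factor 5) instead of being rotated.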
The background
The reason why iTextSharp ignores the appearance Matrix is that appearance generation in the course of form filling is not supposed to use such matrix values:
For non-rich text fields, the appearance stream—which, like all appearance streams, is a form XObject—has the contents of its form dictionary initialized as follows:
The resource dictionary (Resources) shall be created using resources from the interactive form dictionary’s DR entry (see Table 218).
The lower-left corner of the bounding box (BBox) is set to coordinates (0, 0) in the form coordinate system. The box’s top and right coordinates are taken from the dimensions of the annotation rectangle (the Rect entry in the widget annotation dictionary).
All other entries in the appearance stream’s form dictionary are set to their default values (see 8.10, “Form XObjects”).
(section 12.7.3.3 "Variable Text" of ISO 32000-1)
This means for the Matrix:
Matrix array (Optional) An array of six numbers specifying the form matrix, which maps form space into user space (see 8.3.4, "Transformation Matrices"). Default value: the identity matrix [1 0 0 1 0 0].
(Table 95 – Additional Entries Specific to a Type 1 Form Dictionary - in section 8.10.2 "Form Dictionaries", ibidem)
Thus, form field appearance streams created during fill-in according to the specification can be assumed to have an identity Matrix value, and hence no rotation.
So the issue in iTextSharp actually merely reflects the assumption that form field appearances are generated due to form fill-in.