iText - Can't read the content of a PD4ML-generated PDF

I have an issue reading PDF content with iText. I have tested all the different techniques. They all work with standard PDF documents, but I have one PDF document that I need to amend, and I can't get its content.
This document was generated by PD4ML. It can be read in Acrobat Reader, but not in OpenOffice.
For example, this code:
PdfReader reader = new PdfReader(src);
FileOutputStream out = new FileOutputStream(result);
out.write(reader.getPageContent(1));  // raw content stream of page 1
out.close();
reader.close();
produces this output:
q Q q 29.18088 102.1433 536.9282 675.0511 re W n /Cs1 cs 1 1 1 sc 29.18088
775.5042 m 574.5602 775.5042 l 574.5602 -2599.312 l 29.18088 -2599.312 l h
f Q q 43.26609 761.4189 m 560.475 761.4189 l 560.475 -2572.832 l 43.26609
-2572.832 l h W n 29.18088 102.1433 536.9282 675.0511 re W n q 24.78997 0 0 22.53634 51.71722 733.2485
cm /Im1 Do Q /Cs1 cs 0.2 0.2 0.2 sc /Cs1 CS 0.2 0.2 0.2 SC 0.5 w 2 J 2 Tr
q 0.5634084 0 0 0.5634084 29.18088 711.2756 cm BT 20 0 0 20 40 0 Tm /G1 1
Tf [ <0033> 1 <004800550049> 1 <00520055005000440051004600480003> 1 <0044005100470003>
But when I try to extract the text, nothing shows up: the text items are there, but they are not displayed, as if the text were encoded differently.
This code:
PdfReader reader = new PdfReader(src);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PrintWriter out = new PrintWriter(new FileOutputStream(result));
TextExtractionStrategy strategy;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
    out.println(strategy.getResultantText());
}
out.flush();
out.close();
reader.close();
only produces spaces. The same happens with LocationTextExtractionStrategy.
The command
PdfContentReaderTool.listContentStream(new File(src), out);
produces:
==============Page 1====================
- - - - - Dictionary - - - - - -
(/Parent=Dictionary of type: /Pages, /Contents=Stream, /Type=/Page, /Resources=Dictionary, /MediaBox=[0, 0, 595.29, 841.89])
Subdictionary /Parent = (/Type=/Pages, /MediaBox=[0, 0, 595.29, 841.89], /Count=6, /Kids=[2 0 R, 14 0 R, 26 0 R, 30 0 R, 34 0 R, 38 0 R])
Subdictionary /Resources = (/XObject=Dictionary, /ProcSet=[/PDF, /Text, /ImageB, /ImageC, /ImageI], /ColorSpace=Dictionary, /Font=Dictionary)
Subdictionary /XObject = (/Im1=Stream of type: /XObject)
Subdictionary /ColorSpace = (/Cs1=[/ICCBased, 12 0 R])
Subdictionary /Font = (/G2=Dictionary of type: /Font, /G1=Dictionary of type: /Font)
Subdictionary /G2 = (/BaseFont=/HCNQGU+font000000001c036002, /DescendantFonts=[50 0 R], /Type=/Font, /Encoding=/Identity-H, /Subtype=/Type0, /ToUnicode=Stream)
Subdictionary /G1 = (/BaseFont=/HCZCBJ+font000000001c036002, /DescendantFonts=[43 0 R], /Type=/Font, /Encoding=/Identity-H, /Subtype=/Type0, /ToUnicode=Stream)
- - - - - XObject Summary - - - - - -
------ /Im1 - subtype = /Image = 9148 bytes ------
Content Stream - - - - - -
q Q q 29.18088 102.1433 536.9282 675.0511 re W n /Cs1 cs 1 1 1 sc 29.18088
775.5042 m 574.5602 775.5042 l 574.5602 -2599.312 l 29.18088 -2599.312 l h
f Q q 43.26609 761.4189 m 560.475 761.4189 l 560.475 -2572.832 l 43.26609
-2572.832 l h W n 29.18088 102.1433 536.9282 675.0511 re W n q 24.78997 0 0 22.53634 51.71722 733.2485
cm /Im1 Do Q /Cs1 cs 0.2 0.2 0.2 sc /Cs1 CS 0.2 0.2 0.2 SC 0.5 w 2 J 2 Tr
q 0.5634084 0 0 0.5634084 29.18088 711.2756 cm BT 20 0 0 20 40 0 Tm /G1 1
But the Text Extraction part is empty.
Any idea why I can't read the text? Is there something else I could do or test before getting the text?
Any pointers are welcome.
Gilles
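The hex strings in the TJ array of the page-1 content stream, such as <0033>, are two-byte codes: both fonts use /Encoding /Identity-H, so a text extractor has to map each code to Unicode through the font's /ToUnicode stream, and output consisting only of spaces suggests that this mapping does not yield usable characters for this file. As a quick diagnostic, the standard-library Java sketch below reads the codes from that fragment and adds a constant offset. GlyphIdDecoder is a hypothetical helper, not an iText class, and the offset of 29 is an assumption: it holds for fonts that keep the standard Macintosh glyph order (glyph 3 is the space, glyph 36 is A), which is not guaranteed for other PD4ML output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic helper (not part of iText). Decodes the hex strings
 * of a TJ array as two-byte codes (Identity-H) and turns each code into a
 * character by adding a fixed offset. ASSUMPTION: offset 29 only works if the
 * subset font keeps the standard Macintosh glyph order (glyph 3 = space).
 */
public class GlyphIdDecoder {

    private static final Pattern HEX_STRING = Pattern.compile("<([0-9A-Fa-f]+)>");

    public static String decode(String tjArray, int offset) {
        StringBuilder text = new StringBuilder();
        Matcher m = HEX_STRING.matcher(tjArray);
        while (m.find()) {
            String hex = m.group(1);
            // Identity-H: every code is exactly two bytes, i.e. four hex digits
            for (int i = 0; i + 4 <= hex.length(); i += 4) {
                int glyphId = Integer.parseInt(hex.substring(i, i + 4), 16);
                text.append((char) (glyphId + offset));
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Fragment copied from the page-1 content stream above
        String tj = "[ <0033> 1 <004800550049> 1 <00520055005000440051004600480003> 1 <0044005100470003>";
        System.out.println("[" + decode(tj, 29) + "]");
    }
}
```

With that offset the fragment decodes to "Performance and ", which suggests the text is present in the content stream and the problem lies in the code-to-Unicode mapping rather than in the page content itself.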

Related

Different result for same value in sigmoid function [MATLAB]

I am trying to compute the sigmoid function. When I compute it without a loop, I do not get the expected result; using a for loop, I get the correct result.
The question is: why am I getting different results?
%This is a hypothesis of AND operation
x1 = randi([0,1],[1,4]);
x2 = randi([0,1],[1,4]);
features = [x1;x2]';
theta = [-30 20 20];
x = [ones(length(features),1) features]
z = x * theta';
pred = zeros(length(z),1);
pred = 1 / (1 + exp(-z))
y1 = (pred >= 0.5)
fprintf('Using loop\n')
for i= 1:length(z)
pred(i) = 1 / (1 + exp(-z(i)));
end
pred
y1 = (pred >= 0.5)
Output:
x =
1 1 1
1 0 1
1 0 0
1 0 1
pred =
1.0e-13 *
0 0 0.9358 0
y1 =
0 0 0 0
Using loop
pred =
1.0000 0.0000 0.0000 0.0000
y1 =
1 0 0 0
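In MATLAB, / is matrix right division (mrdivide): with the scalar 1 on the left and the column vector 1 + exp(-z) on the right, 1 / (1 + exp(-z)) solves the linear system pred * (1 + exp(-z)) = 1 in a least-squares sense instead of dividing element by element, which is why the result does not contain the four sigmoid values. The element-wise operator is ./, i.e. pred = 1 ./ (1 + exp(-z)). For comparison, the Java sketch below computes the element-wise result that the loop produces, using the four rows of x from the output above; the class and method names are illustrative.

```java
public class SigmoidCheck {

    // Logistic function of a single value
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Element-wise sigmoid of x * theta', the analogue of 1 ./ (1 + exp(-z))
    public static double[] predict(double[][] x, double[] theta) {
        double[] pred = new double[x.length];
        for (int r = 0; r < x.length; r++) {
            double z = 0.0;
            for (int c = 0; c < theta.length; c++) {
                z += x[r][c] * theta[c];
            }
            pred[r] = sigmoid(z);
        }
        return pred;
    }

    public static void main(String[] args) {
        double[][] x = { {1, 1, 1}, {1, 0, 1}, {1, 0, 0}, {1, 0, 1} };
        double[] theta = { -30, 20, 20 };
        for (double p : predict(x, theta)) {
            System.out.printf("%.4f -> %d%n", p, p >= 0.5 ? 1 : 0);
        }
    }
}
```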

Calculate a 2D homogeneous perspective transformation matrix from 4 points in MATLAB

I've got coordinates of 4 points in 2D that form a rectangle and their coordinates after a perspective transformation has been applied.
The perspective transformation is calculated in homogeneous coordinates and defined by a 3x3 matrix M. If the matrix is not known, how can I calculate it from the given points?
The calculation for one point would be:
| M11 M12 M13 | | P1.x | | w*P1'.x |
| M21 M22 M23 | * | P1.y | = | w*P1'.y |
| M31 M32 M33 | | 1 | | w*1 |
To calculate all points simultaneously I write them together in one matrix A and analogously for the transformed points in a matrix B:
| P1.x P2.x P3.x P4.x |
A = | P1.y P2.y P3.y P4.y |
| 1 1 1 1 |
So the equation is M*A=B and this can be solved for M in MATLAB by M = B/A or M = (A'\B')'.
But it's not that easy. I know the coordinates of the points after the transformation, but I don't know the exact B, because of the factor w, which is not necessarily 1 after a homogeneous transformation. In homogeneous coordinates every nonzero multiple of a vector represents the same point, and I don't know which multiple I'll get.
To take account of these unknown factors I write the equation as M*A=B*W
where W is a diagonal matrix with the factors w1...w4 for every point in B on the diagonal. So A and B are now completely known and I have to solve this equation for M and W.
If I could rearrange the equation into the form x*A=B or A*x=B, where x would be something like M*W, I could solve it, and knowing the solution for M*W might already be enough. However, despite trying every possible rearrangement, I didn't manage to do so. Then it hit me that grouping M and W into a single product (M*W) is not possible, since one is a 3x3 matrix and the other a 4x4 matrix. And here I'm stuck.
Also M*A=B*W does not have a single solution for M, because every multiple of M is the same transformation. Writing this as a system of linear equations one could simply fix one of the entries of M to get a single solution. Furthermore there might be inputs that have no solution for M at all, but let's not worry about this for now.
What I'm actually trying to achieve is some kind of vector graphics editing program where the user can drag the corners of a shape's bounding box to transform it, while internally the transformation matrix is calculated.
And actually I need this in JavaScript, but if I can't even solve this in MATLAB I'm completely stuck.
OpenCV has a neat function called getPerspectiveTransform that does this. The source code for this function is available on GitHub with this description:
/* Calculates coefficients of perspective transformation
* which maps (xi,yi) to (ui,vi), (i=1,2,3,4):
*
* c00*xi + c01*yi + c02
* ui = ---------------------
* c20*xi + c21*yi + c22
*
* c10*xi + c11*yi + c12
* vi = ---------------------
* c20*xi + c21*yi + c22
*
* Coefficients are calculated by solving linear system:
* / x0 y0 1 0 0 0 -x0*u0 -y0*u0 \ /c00\ /u0\
* | x1 y1 1 0 0 0 -x1*u1 -y1*u1 | |c01| |u1|
* | x2 y2 1 0 0 0 -x2*u2 -y2*u2 | |c02| |u2|
* | x3 y3 1 0 0 0 -x3*u3 -y3*u3 |.|c10|=|u3|,
* | 0 0 0 x0 y0 1 -x0*v0 -y0*v0 | |c11| |v0|
* | 0 0 0 x1 y1 1 -x1*v1 -y1*v1 | |c12| |v1|
* | 0 0 0 x2 y2 1 -x2*v2 -y2*v2 | |c20| |v2|
* \ 0 0 0 x3 y3 1 -x3*v3 -y3*v3 / \c21/ \v3/
*
* where:
* cij - matrix coefficients, c22 = 1
*/
This system of equations is smaller as it avoids solving for W and M33 (called c22 by OpenCV). So how does it work? The linear system can be recreated by the following steps:
Start with the equation for one point:
| c00 c01 c02 | | xi | | w*ui |
| c10 c11 c12 | * | yi | = | w*vi |
| c20 c21 c22 | | 1 | | w*1 |
Convert this to a system of equations, solve for ui and vi, and eliminate w. You get the formulas for the projective transformation:
c00*xi + c01*yi + c02
ui = ---------------------
c20*xi + c21*yi + c22
c10*xi + c11*yi + c12
vi = ---------------------
c20*xi + c21*yi + c22
Multiply both sides with the denominator:
(c20*xi + c21*yi + c22) * ui = c00*xi + c01*yi + c02
(c20*xi + c21*yi + c22) * vi = c10*xi + c11*yi + c12
Distribute ui and vi:
c20*xi*ui + c21*yi*ui + c22*ui = c00*xi + c01*yi + c02
c20*xi*vi + c21*yi*vi + c22*vi = c10*xi + c11*yi + c12
Assume c22 = 1:
c20*xi*ui + c21*yi*ui + ui = c00*xi + c01*yi + c02
c20*xi*vi + c21*yi*vi + vi = c10*xi + c11*yi + c12
Collect all cij on the left hand side:
c00*xi + c01*yi + c02 - c20*xi*ui - c21*yi*ui = ui
c10*xi + c11*yi + c12 - c20*xi*vi - c21*yi*vi = vi
And finally convert to matrix form for four pairs of points:
/ x0 y0 1 0 0 0 -x0*u0 -y0*u0 \ /c00\ /u0\
| x1 y1 1 0 0 0 -x1*u1 -y1*u1 | |c01| |u1|
| x2 y2 1 0 0 0 -x2*u2 -y2*u2 | |c02| |u2|
| x3 y3 1 0 0 0 -x3*u3 -y3*u3 |.|c10|=|u3|
| 0 0 0 x0 y0 1 -x0*v0 -y0*v0 | |c11| |v0|
| 0 0 0 x1 y1 1 -x1*v1 -y1*v1 | |c12| |v1|
| 0 0 0 x2 y2 1 -x2*v2 -y2*v2 | |c20| |v2|
\ 0 0 0 x3 y3 1 -x3*v3 -y3*v3 / \c21/ \v3/
This is now in the form of Ax=b and the solution can be obtained with x = A\b. Remember that c22 = 1.
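The derivation translates almost line by line into code. The Java sketch below is a minimal, non-authoritative version (not OpenCV's actual implementation): it fills the 8x8 matrix from the two collected equations per point, solves it with plain Gaussian elimination with partial pivoting, and appends c22 = 1. The sample data are the corners of the unit square mapped by M = [1 2 3; 4 5 6; 7 8 1], so the recovered matrix should be M again. Class and method names are illustrative.

```java
public class PerspectiveFromFourPoints {

    // Solves a*x = b for a square a with partial pivoting (a and b are modified).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int pivot = col;  // row with the largest |a[r][col]|
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("singular system (degenerate points?)");
            }
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double tmp = b[col]; b[col] = b[pivot]; b[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {  // eliminate below the pivot
                double f = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];  // back substitution
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    /** 3x3 matrix mapping (x[i], y[i]) to (u[i], v[i]), i = 0..3, with c22 = 1. */
    public static double[][] getPerspective(double[] x, double[] y, double[] u, double[] v) {
        double[][] a = new double[8][];
        double[] b = new double[8];
        for (int i = 0; i < 4; i++) {
            // c00*xi + c01*yi + c02 - c20*xi*ui - c21*yi*ui = ui
            a[i] = new double[] { x[i], y[i], 1, 0, 0, 0, -x[i] * u[i], -y[i] * u[i] };
            b[i] = u[i];
            // c10*xi + c11*yi + c12 - c20*xi*vi - c21*yi*vi = vi
            a[i + 4] = new double[] { 0, 0, 0, x[i], y[i], 1, -x[i] * v[i], -y[i] * v[i] };
            b[i + 4] = v[i];
        }
        double[] c = solve(a, b);  // [c00 c01 c02 c10 c11 c12 c20 c21]
        return new double[][] { { c[0], c[1], c[2] }, { c[3], c[4], c[5] }, { c[6], c[7], 1.0 } };
    }

    public static void main(String[] args) {
        // Unit square corners and their images under M = [1 2 3; 4 5 6; 7 8 1]
        double[] x = { 0, 0, 1, 1 };
        double[] y = { 0, 1, 0, 1 };
        double[] u = { 3.0, 5.0 / 9.0, 0.5, 0.375 };
        double[] v = { 6.0, 11.0 / 9.0, 1.25, 0.9375 };
        for (double[] row : getPerspective(x, y, u, v)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```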
Should have been an easy question. So how do I get M*A=B*W into a solvable form? It's just matrix multiplications, so we can write this as a system of linear equations. For example: M11*A11 + M12*A21 + M13*A31 = B11*W11 + B12*W21 + B13*W31 + B14*W41. And every system of linear equations can be written in the form Ax=b, or, to avoid confusion with variables already used in my question: N*x=y. That's all.
An example according to my question: I generate some input data with a known M and W:
M = [
1 2 3;
4 5 6;
7 8 1
];
A = [
0 0 1 1;
0 1 0 1;
1 1 1 1
];
W = [
4 0 0 0;
0 3 0 0;
0 0 2 0;
0 0 0 1
];
B = M*A*(W^-1);
Then I forget about M and W, which leaves 13 unknowns to solve for: the 9 entries of M and the 4 factors w1...w4. I rewrite M*A=B*W as a system of linear equations, and from there into the form N*x=y. In N, each column holds the coefficients of one unknown:
N = [
A(1,1) A(2,1) A(3,1) 0 0 0 0 0 0 -B(1,1) 0 0 0;
0 0 0 A(1,1) A(2,1) A(3,1) 0 0 0 -B(2,1) 0 0 0;
0 0 0 0 0 0 A(1,1) A(2,1) A(3,1) -B(3,1) 0 0 0;
A(1,2) A(2,2) A(3,2) 0 0 0 0 0 0 0 -B(1,2) 0 0;
0 0 0 A(1,2) A(2,2) A(3,2) 0 0 0 0 -B(2,2) 0 0;
0 0 0 0 0 0 A(1,2) A(2,2) A(3,2) 0 -B(3,2) 0 0;
A(1,3) A(2,3) A(3,3) 0 0 0 0 0 0 0 0 -B(1,3) 0;
0 0 0 A(1,3) A(2,3) A(3,3) 0 0 0 0 0 -B(2,3) 0;
0 0 0 0 0 0 A(1,3) A(2,3) A(3,3) 0 0 -B(3,3) 0;
A(1,4) A(2,4) A(3,4) 0 0 0 0 0 0 0 0 0 -B(1,4);
0 0 0 A(1,4) A(2,4) A(3,4) 0 0 0 0 0 0 -B(2,4);
0 0 0 0 0 0 A(1,4) A(2,4) A(3,4) 0 0 0 -B(3,4);
0 0 0 0 0 0 0 0 1 0 0 0 0
];
And y is:
y = [ 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 1 ];
Notice the equation described by the last row in N whose solution is 1 according to y. That's what I mentioned in my question, you have to fix one of the entries of M to get a single solution. (We can do this because every multiple of M is the same transformation.) And with this equation I'm saying M33 should be 1.
We solve this for x:
x = N\y
and get:
x = [ 1.00000; 2.00000; 3.00000; 4.00000; 5.00000; 6.00000; 7.00000; 8.00000; 1.00000; 4.00000; 3.00000; 2.00000; 1.00000 ]
which are the solutions for [ M11, M12, M13, M21, M22, M23, M31, M32, M33, w1, w2, w3, w4 ]
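The same 13-unknown system can be set up programmatically. The Java sketch below (class and method names are hypothetical; the solver is a plain textbook Gaussian elimination, not a library call) builds N from A and B as in the listing, one row per component of M*A(:,j) - w_j*B(:,j), plus the normalization row M33 = 1. B is the example's M*A*W^-1 written out numerically, so the result should match the x given above.

```java
public class PerspectiveWithScales {

    // Gaussian elimination with partial pivoting (a and b are modified).
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int col = 0; col < n; col++) {
            int p = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[p][col])) p = r;
            }
            if (Math.abs(a[p][col]) < 1e-12) throw new ArithmeticException("singular system");
            double[] rowTmp = a[col]; a[col] = a[p]; a[p] = rowTmp;
            double tmp = b[col]; b[col] = b[p]; b[p] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
                b[r] -= f * b[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    /** Solves M*A = B*W (A, B are 3x4) with M33 = 1; returns [M11..M33, w1..w4]. */
    public static double[] solveMandW(double[][] A, double[][] B) {
        double[][] n = new double[13][13];
        double[] y = new double[13];
        for (int j = 0; j < 4; j++) {          // point j
            for (int r = 0; r < 3; r++) {      // component r
                int row = 3 * j + r;
                for (int k = 0; k < 3; k++) {
                    n[row][3 * r + k] = A[k][j];  // coefficient of M(r,k)
                }
                n[row][9 + j] = -B[r][j];         // coefficient of w_j
            }
        }
        n[12][8] = 1.0;  // fix the scale: M33 = 1
        y[12] = 1.0;
        return solve(n, y);
    }

    public static void main(String[] args) {
        double[][] A = { {0, 0, 1, 1}, {0, 1, 0, 1}, {1, 1, 1, 1} };
        // B = M*A*W^-1 for M = [1 2 3; 4 5 6; 7 8 1], W = diag(4, 3, 2, 1)
        double[][] B = {
            { 0.75, 5.0 / 3.0, 2.0, 6.0 },
            { 1.5, 11.0 / 3.0, 5.0, 15.0 },
            { 0.25, 3.0, 4.0, 16.0 }
        };
        System.out.println(java.util.Arrays.toString(solveMandW(A, B)));
    }
}
```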
W is not needed once M has been calculated. For a generic point (x, y), the corresponding w is obtained along the way when solving for x' and y':
| M11 M12 M13 | | x | | w * x' |
| M21 M22 M23 | * | y | = | w * y' |
| M31 M32 M33 | | 1 | | w * 1 |
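Applying M to a new point therefore needs no stored w: w is the third component of M*[x; y; 1], and x' and y' follow by dividing the first two components by it. A minimal Java sketch (the class and method names are illustrative):

```java
public class ApplyHomography {

    /** Maps (x, y) through the 3x3 matrix m and returns {x', y', w}. */
    public static double[] apply(double[][] m, double x, double y) {
        double wx = m[0][0] * x + m[0][1] * y + m[0][2];
        double wy = m[1][0] * x + m[1][1] * y + m[1][2];
        double w  = m[2][0] * x + m[2][1] * y + m[2][2];
        if (w == 0.0) {
            throw new ArithmeticException("point maps to infinity (w = 0)");
        }
        return new double[] { wx / w, wy / w, w };
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2, 3}, {4, 5, 6}, {7, 8, 1} };
        double[] p = apply(m, 0, 1);
        System.out.printf("x' = %.4f, y' = %.4f, w = %.1f%n", p[0], p[1], p[2]);
    }
}
```

For the point (0, 1) and the example M this gives w = 9 and (x', y') = (5/9, 11/9). This w refers to the normalized point (x', y', 1), so it differs from w2 = 3, which only scaled the stored column of B in the example.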
To solve this in JavaScript, I could use the Numeric JavaScript library, whose solve function solves Ax=b.

Matlab: Block Tridiagonal with Non-Square Vectors

I'm trying to create a block diagonal matrix that repeats the following matrix along the diagonal:
base = [a b c d e f 0;
0 g h i j k l];
I need the resulting matrix to look like this...
[a b c d e f 0 0 0;
0 g h i j k l 0 0;
0 0 a b c d e f 0;
0 0 0 g h i j k l];
except that it needs to be n rows tall.
I have tried using the kron function, but it shifts each consecutive row too many elements to the right.
How can I accomplish what I need in a way where I can select n arbitrarily?
You can do it pretty fast with a 2D convolution:
n = 4; %// desired number of rows in result. Should be a multiple of size(base,1)
T = eye(n-1);
T(2:size(base,1):end,:) = 0;
result = conv2(base,T);
Example: with
base =
0.7497 0.3782 0.4470 0.5118 0.6698 0.3329 0
0 0.9850 0.5638 0.9895 0.4362 0.4545 0.8578
and n=4 the result is
result =
0.7497 0.3782 0.4470 0.5118 0.6698 0.3329 0 0 0
0 0.9850 0.5638 0.9895 0.4362 0.4545 0.8578 0 0
0 0 0.7497 0.3782 0.4470 0.5118 0.6698 0.3329 0
0 0 0 0.9850 0.5638 0.9895 0.4362 0.4545 0.8578
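The conv2 trick works because convolving base with a kernel that has ones only on its diagonal adds one copy of base per nonzero entry, shifted down and right by that entry's offset. The Java sketch below spells this out with an explicit full 2-D convolution (class and method names are illustrative). It assumes n is a multiple of size(base,1) and uses a kernel of size n - size(base,1) + 1 with ones at diagonal offsets 0, size(base,1), 2*size(base,1), ...; for a two-row base this is exactly the T left by eye(n-1) with T(2:size(base,1):end,:) = 0. The letters a to l are replaced by 1 to 12 so the pattern is easy to check.

```java
public class ConvBlockDiagonal {

    /** Full 2-D convolution, same output size as MATLAB's default conv2(a, k). */
    static double[][] conv2Full(double[][] a, double[][] k) {
        int ar = a.length, ac = a[0].length, kr = k.length, kc = k[0].length;
        double[][] out = new double[ar + kr - 1][ac + kc - 1];
        for (int i = 0; i < ar; i++) {
            for (int j = 0; j < ac; j++) {
                for (int p = 0; p < kr; p++) {
                    for (int q = 0; q < kc; q++) {
                        out[i + p][j + q] += a[i][j] * k[p][q];
                    }
                }
            }
        }
        return out;
    }

    /** n rows of shifted copies of base; n must be a multiple of base.length. */
    public static double[][] shiftedCopies(double[][] base, int n) {
        int rows = base.length;
        int size = n - rows + 1;
        double[][] t = new double[size][size];
        for (int i = 0; i < size; i += rows) {
            t[i][i] = 1.0;  // one copy of base, shifted by (i, i)
        }
        return conv2Full(base, t);
    }

    public static void main(String[] args) {
        double[][] base = { {1, 2, 3, 4, 5, 6, 0}, {0, 7, 8, 9, 10, 11, 12} };
        for (double[] row : shiftedCopies(base, 4)) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```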
The easy way is to use repeated out-of-bounds assignment. MATLAB will automatically pad any missing entries with 0 in those cases. Here's how:
%// Some test variables
a = rand; g = rand;
b = rand; h = rand;
c = rand; i = rand;
d = rand; j = rand;
e = rand; k = rand;
f = rand; l = rand;
%// base matrix
base = [
a b c d e f 0;
0 g h i j k l];
%// use out-of-bounds assignment
n = 3;  %// number of extra copies (output holds n+1 copies of base)
output = base;
for ii = 1:n
output(end+1:end+size(base,1), size(base,1)*ii+1:end+size(base,1)) = base;
end
The hard way is the faster way (relevant for when n is large and/or you need to repeat this very often). Figure out the pattern behind which indices would be filled in the final matrix by which values in the original matrix, then generate a list of those indices and assign those values to those indices:
[b1,b2] = size(base);
[ii,jj,vv] = find(base);
inds = bsxfun(@plus, (ii + (n+1)*b1*(jj-1)).', (0:n).'*b1*(1 + (n+1)*b1));
output = zeros( (n+1)*b1, b2+n*b1 );
output(inds) = repmat(vv.', n+1, 1)
I'll leave it as an exercise for you to figure out what happens here exactly :)
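The index formula of the fast version can be checked by reimplementing it directly. In the Java sketch below (names illustrative), copy t of base(ii, jj) is written to the 1-based column-major linear index ii + (n+1)*b1*(jj-1) + t*b1*(1 + (n+1)*b1), which is what the bsxfun call computes for all nonzero entries and all t = 0..n at once; the flat column-major array is then converted back to rows. As in the answer, n is the number of extra copies.

```java
public class LinearIndexBlockDiagonal {

    public static double[][] build(double[][] base, int n) {
        int b1 = base.length, b2 = base[0].length;
        int height = (n + 1) * b1, width = b2 + n * b1;
        double[] flat = new double[height * width];  // column-major, like MATLAB
        for (int ii = 1; ii <= b1; ii++) {
            for (int jj = 1; jj <= b2; jj++) {
                double v = base[ii - 1][jj - 1];
                if (v == 0.0) continue;  // find() only returns nonzero entries
                for (int t = 0; t <= n; t++) {
                    int ind = ii + height * (jj - 1) + t * b1 * (1 + height);
                    flat[ind - 1] = v;   // 1-based MATLAB index -> 0-based
                }
            }
        }
        double[][] out = new double[height][width];
        for (int k = 0; k < flat.length; k++) {
            out[k % height][k / height] = flat[k];  // column-major -> row-major
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] base = { {1, 2, 3, 4, 5, 6, 0}, {0, 7, 8, 9, 10, 11, 12} };
        for (double[] row : build(base, 1)) {  // 1 extra copy -> 4 x 9
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```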

2D matrix of matrices

I have two diagonal matrices. I am trying to build a larger block diagonal matrix from them. For example if I had this:
D = diag(zeros(3,1)+1)
D =
1 0 0
0 1 0
0 0 1
and...
E = diag(zeros(2,1)+2, -1) + diag(zeros(2,1)+2, +1) + diag(zeros(3,1)+4)
E =
4 2 0
2 4 2
0 2 4
I have an equation that says A*U = X
Where A is
[E D 0
D E D
0 D E]
This is for 3x3. 5x5 would look like this:
A =
[E D 0 0 0
D E D 0 0
0 D E D 0
0 0 D E D
0 0 0 D E]
A is therefore a block tridiagonal matrix built from these blocks. I need to produce a 40x40 version, which would of course take a very long time to do manually.
How can I define that? I haven't figured out how to use blkdiag to construct.
I solved this on my own manually because I could never find a Matlab function to help me.
A = zeros(Distance_Resolution^2);  % preallocate the full matrix
for n = 1:Distance_Resolution
    idx = (n-1)*Distance_Resolution+1 : n*Distance_Resolution;  % rows/cols of block n
    A(idx, idx) = A1;                      % diagonal block
    if n < Distance_Resolution
        nxt = idx + Distance_Resolution;   % rows/cols of block n+1
        A(nxt, idx) = A2;                  % block below the diagonal
        A(idx, nxt) = A2;                  % block right of the diagonal
    end
end
This produces the block tridiagonal matrix described above, of size Distance_Resolution^2 x Distance_Resolution^2. I defined A1 and A2 with help from the poster above (Fo is just a constant here):
vector = zeros(Distance_Resolution,1) - Fo;
A2 = diag(vector);
A1 = toeplitz([1+4*Fo, -Fo, zeros(1,Distance_Resolution-2)]);
This is a workable code snippet, but I am still looking for a smarter way to code it.
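The pattern [E D 0; D E D; 0 D E] is a sum of two Kronecker products: kron(eye(k), E) places E on the block diagonal, and kron(T, D), with T a k x k matrix of ones on the first sub- and superdiagonal, places D next to it. In MATLAB that would be something like A = kron(eye(k), E) + kron(diag(ones(k-1,1),1) + diag(ones(k-1,1),-1), D). The Java sketch below builds the same matrix explicitly for the 3x3 example; the class and method names are illustrative.

```java
public class BlockTridiagonal {

    /** Kronecker product, same layout as MATLAB's kron(p, q). */
    static double[][] kron(double[][] p, double[][] q) {
        int pr = p.length, pc = p[0].length, qr = q.length, qc = q[0].length;
        double[][] out = new double[pr * qr][pc * qc];
        for (int i = 0; i < pr; i++)
            for (int j = 0; j < pc; j++)
                for (int k = 0; k < qr; k++)
                    for (int l = 0; l < qc; l++)
                        out[i * qr + k][j * qc + l] = p[i][j] * q[k][l];
        return out;
    }

    /** A = kron(eye(k), e) + kron(t, d), t = ones on the first sub/superdiagonal. */
    public static double[][] build(double[][] e, double[][] d, int k) {
        double[][] eye = new double[k][k];
        double[][] t = new double[k][k];
        for (int i = 0; i < k; i++) {
            eye[i][i] = 1.0;
            if (i + 1 < k) {
                t[i][i + 1] = 1.0;
                t[i + 1][i] = 1.0;
            }
        }
        double[][] a = kron(eye, e);
        double[][] b = kron(t, d);
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                a[i][j] += b[i][j];
        return a;
    }

    public static void main(String[] args) {
        double[][] d = { {1, 0, 0}, {0, 1, 0}, {0, 0, 1} };
        double[][] e = { {4, 2, 0}, {2, 4, 2}, {0, 2, 4} };
        for (double[] row : build(e, d, 3)) {  // 9 x 9, layout [E D 0; D E D; 0 D E]
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

For the asker's case, E corresponds to A1, D to A2, and k to Distance_Resolution.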

pspice -> ngspice conversion -- Absolute Value Function

I am trying to port an existing PSPICE model (the HP Memristor Model) to ngspice... is there an absolute value function for ngspice?
Original PSPICE Model:
.SUBCKT modelmemristor plus minus PARAMS:
+phio=0.95 Lm=0.0998 w1=0.1261 foff=3.5e-6
+ioff=115e-6 aoff=1.2 fon=40e-6 ion=8.9e-6
+aon=1.8 b=500e-6 wc=107e-3
G1 plus internal value={sgn(V(x))*(1/V(dw))^2*0.0617* (V(phiI)*exp(-V(B)*V(sr))-(V(phiI)+abs(V(x)))* exp(-V(B)*V(sr2)))}
Esr sr 0 value={sqrt(V(phiI))}
Esr2 sr2 0 value={sqrt(V(phiI)+abs(V(x)))}
Rs internal minus 215
Eg x 0 value={V(plus)-V(internal)}
Elamda Lmda 0 value={Lm/V(w)}
Ew2 w2 0 value={w1+V(w)- (0.9183/(2.85+4*V(Lmda)-2*abs(V(x))))}
EDw dw 0 value={V(w2)-w1}
EB B 0 value={10.246*V(dw)}
ER R 0 value={(V(w2)/w1)*(V(w)-w1)/(V(w)-V(w2))}
EphiI phiI 0 value= {phio-abs(V(x))*((w1+V(w2))/(2*V(w)))- 1.15*V(Lmda)*V(w)*log(V(R))/V(dw)}
C1 w 0 1e-9 IC=1.2
R w 0 1e8MEG
Ec c 0 value={abs(V(internal)-V(minus))/215}
Emon1 mon1 0 value={((V(w)-aoff)/wc)-(V(c)/b)}
Emon2 mon2 0 value={(aon-V(w))/wc-(V(c)/b)}
Goff 0 w value={foff*sinh(stp(V(x))*V(c)/ioff)* exp(-exp(V(mon1))-V(w)/wc)}
Gon w 0 value={fon*sinh(stp(-V(x))*V(c)/ion)* exp(-exp(V(mon2))-V(w)/wc)}
.ENDS modelmemristor
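Here is the model ported to ngspice. abs() is kept unchanged, the behavioral sources use vol= and cur= instead of value=, and stp() is replaced by u():
* test memristor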
xmemr 1 0 modelmemristor phio=0.95 Lm=0.0998 w1=0.1261 foff=3.5e-6 ioff=115e-6 aoff=1.2 fon=40e-6 ion=8.9e-6 aon=1.8 b=500e-6 wc=107e-3
.SUBCKT modelmemristor plus minus phio=0.95 Lm=0.0998 w1=0.1261 foff=3.5e-6 ioff=115e-6 aoff=1.2 fon=40e-6 ion=8.9e-6 aon=1.8 b=500e-6 wc=107e-3
G1 plus internal cur={sgn(V(x))*(1/V(dw))^2*0.0617* (V(phiI)*exp(-V(B)*V(sr))-(V(phiI)+abs(V(x)))* exp(-V(B)*V(sr2)))}
Esr sr 0 vol={sqrt(V(phiI))}
Esr2 sr2 0 vol={sqrt(V(phiI)+abs(V(x)))}
Rs internal minus 215
Eg x 0 vol={V(plus)-V(internal)}
Elamda Lmda 0 vol={Lm/V(w)}
Ew2 w2 0 vol={w1+V(w)- (0.9183/(2.85+4*V(Lmda)-2*abs(V(x))))}
EDw dw 0 vol={V(w2)-w1}
EB B 0 vol={10.246*V(dw)}
ER R 0 vol={(V(w2)/w1)*(V(w)-w1)/(V(w)-V(w2))}
EphiI phiI 0 vol= {phio-abs(V(x))*((w1+V(w2))/(2*V(w)))- 1.15*V(Lmda)*V(w)*log(V(R))/V(dw)}
C1 w 0 1e-9 IC=1.2
R w 0 1e8MEG
Ec c 0 vol={abs(V(internal)-V(minus))/215}
Emon1 mon1 0 vol={((V(w)-aoff)/wc)-(V(c)/b)}
Emon2 mon2 0 vol={(aon-V(w))/wc-(V(c)/b)}
Goff 0 w cur={foff*sinh(u(V(x))*V(c)/ioff)* exp(-exp(V(mon1))-V(w)/wc)} $ stp -> u
Gon w 0 cur={fon*sinh(u(-V(x))*V(c)/ion)* exp(-exp(V(mon2))-V(w)/wc)} $ stp -> u
.ENDS modelmemristor
Regards
Holger