Indentation misplaced while creating PDF using perl module PDF::API2 - perl

I have data in an array and I can write it into a PDF using PDF::API2. The problem is that during the writing process the indentation (spaces) is not exactly the same as in the array.
In array format:
ATOM 1 N MET A 0 24.277 8.374 -9.854 1.00 38.41 N 0.174
ATOM 38 OE2 GLU A 4 37.711 19.692 -12.684 1.00 28.70 O 0.150
In PDF format:
ATOM 1 N MET A 0 24.277 8.374-9.8541.0038.41 N 0.174
ATOM 38 OE2 GLU A 4 37.71119.692-12.684 1.00 28.70 O 0.150
My code:
my $pdf = PDF::API2->new(-file => "/home/httpd/cgi-bin/new.pdf");
$pdf->mediabox("A4");
my $page = $pdf->page;
my $fnt = $pdf->corefont('Arial', -encoding => 'latin1');
my $txt = $page->text;
$txt->textstart;
$txt->font($fnt, 8);
$txt->translate(100, 800);
$j1 = 0;
for ($i = 0; $i < scalar(@ar_velz); $i++) {   # data to write into the PDF
    $txt->lead(10);
    $txt->section("$ar_velz[$i]", 500, 800);  # writing each array element
    if ($j1 == 75) {                          # start a new page every 75 lines
        $page = $pdf->page;
        $fnt = $pdf->corefont('Arial', -encoding => 'latin1');
        $txt = $page->text;
        $txt->textstart;
        $txt->font($fnt, 8);
        $txt->lead(10);
        $txt->translate(100, 800);
        $j1 = 0;
    }
    $j1++;
}
$txt->textend;
$pdf->save;
$pdf->end();

That happens because Arial is not a mono-spaced font. The characters all have different widths; a blank space, in particular, is usually not very wide. If you want the spacing to stay intact, you need to use a mono-spaced font, such as Courier.
$fnt = $pdf->corefont('Courier',-encoding => 'latin1');
That is also why PDF::API2 includes an advancewidth method in its PDF::API2::Content class. You can use it to check whether a block of text is too wide to fit into a line, and manually wrap it if needed. Of course, for your table, that alone doesn't help.
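For illustration, here is a minimal sketch (not from the original post) of measuring a line with advancewidth before writing it; the width limit and sample string below are made up:
#!/usr/bin/perl
# Minimal sketch: measure a string with advancewidth() before writing it.
# $max_width and $string are illustrative values, not from the question.
use strict;
use warnings;
use PDF::API2;

my $pdf  = PDF::API2->new();
my $page = $pdf->page();
my $txt  = $page->text();
my $fnt  = $pdf->corefont('Courier', -encoding => 'latin1');
$txt->font($fnt, 8);

my $string    = 'ATOM 1 N MET A 0 24.277 8.374 -9.854 1.00 38.41 N 0.174';
my $max_width = 400;                          # usable line width in points
my $width     = $txt->advancewidth($string);  # width at the current font and size

printf "Line is %.1fpt wide; it %s fit into %dpt\n",
    $width, ($width > $max_width ? "will not" : "will"), $max_width;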
An alternative to the mono-spaced font might be to use PDF::Table, which can create tables inside a PDF::API2 document.
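As a rough sketch of that approach (option names vary between PDF::Table releases, so check the documentation of your installed version; the sample rows and the whitespace split are assumptions based on the question's data):
#!/usr/bin/perl
# Rough sketch only: lay the rows out as table cells instead of relying on spaces.
use strict;
use warnings;
use PDF::API2;
use PDF::Table;

my $pdf = PDF::API2->new(-file => 'new.pdf');
$pdf->mediabox('A4');
my $page = $pdf->page;

# Sample data standing in for @ar_velz from the question
my @ar_velz = (
    'ATOM 1 N MET A 0 24.277 8.374 -9.854 1.00 38.41 N 0.174',
    'ATOM 38 OE2 GLU A 4 37.711 19.692 -12.684 1.00 28.70 O 0.150',
);
my @rows = map { [ split ' ', $_ ] } @ar_velz;   # one cell per column

PDF::Table->new->table(
    $pdf, $page, \@rows,
    x         => 50,
    w         => 500,
    start_y   => 800,    # newer PDF::Table releases call these y and h
    start_h   => 750,
    font      => $pdf->corefont('Helvetica'),
    font_size => 8,
);
$pdf->save;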

Related

How to count the number of elements in parts of a text file using a loop in Perl?

I'm looking for a way to write a Perl script that counts the elements in my text file, part by part. For example, my text file has this form:
ID Position Potential Jury agreement NGlyc result
(PART 1)
NP_073551.1_HCoV229Egp2 23 NTSY 0.5990 (8/9) +
NP_073551.1_HCoV229Egp2 62 NTSS 0.7076 (9/9) ++
NP_073551.1_HCoV229Egp2 171 NTTI 0.5743 (5/9) +
...
(PART 2)
QJY77946.1_NA 20 NGTN 0.7514 (9/9) +++
QJY77946.1_NA 23 NTSH 0.5368 (5/9) +
QJY77946.1_NA 51 NFSF 0.7120 (9/9) ++
QJY77946.1_NA 62 NTSS 0.6947 (9/9) ++
...
(PART 3)
QJY77954.1_NA 20 NGTN 0.7694 (9/9) +++
QJY77954.1_NA 23 NTSH 0.5398 (5/9) +
QJY77954.1_NA 51 NFSF 0.7121 (9/9) ++
...
(PART N°...)
As you can see, the ID is the same within each part (one ID for PART 1, another for PART 2, and so on). Only the columns Position / Potential / Jury agreement / NGlyc result change. My main goal is to count the lines with Potential >= 0.7.
With this in mind, I'm looking for output like this:
Part 1:
1 (one value 0.7 >=)
Part 2:
2 (two values 0.7 >=)
Part 3:
2 (two values 0.7 >=)
Part N°:
X numbers of values 0.7 >=
This output tells me the number of positive values (>= 0.7) for each ID.
The pseudocode, I believe, would be something like this:
foreach ID in LIST
foreach LINE in FILE
if (ID is in LINE)
... count the line ...
end foreach LINE
end foreach ID
I'm looking for any suggestion (a package or a script idea) or comment that would help me write a better script.
Thanks! Best!
To count the number of lines in each part that match some condition on a certain column, you can simply loop over the lines, skip the header, parse the part number, and use an array to count the matching lines for each part.
After that you can loop over the counts recorded in the array and print them out in your specific format.
#!/usr/bin/perl
use strict;
use warnings;

my $part = 0;
my @cnt_part;                       # per-part counters, indexed by part number

while (my $line = <STDIN>) {
    if ($. == 1) {                  # skip the header line
        next;
    } elsif ($line =~ m{^\(PART (\d+)\)}) {
        $part = $1;                 # remember which part we are in
    } else {
        my @cols = split(m{\s+}, $line);
        if (@cols == 6) {
            my $potential = $cols[3];
            if (0.7 <= $potential) {
                $cnt_part[$part]++;
            }
        }
    }
}

for (my $i = 1; $i <= $#cnt_part; $i++) {
    print "Part $i:\n";
    print "$cnt_part[$i] (values 0.7 <=)\n";
}
To run it, just pipe the entire file through the Perl script:
cat in.txt | perl count.pl
and you get an output like this:
Part 1:
1 (values 0.7 <=)
Part 2:
2 (values 0.7 <=)
Part 3:
2 (values 0.7 <=)
If you also want to display the counts as words, you can use Lingua::EN::Numbers (see this program) and you get an output very similar to the one in your post:
Part 1:
1 (one values 0.7 <=)
Part 2:
2 (two values 0.7 <=)
Part 3:
2 (two values 0.7 <=)
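A small sketch of the Lingua::EN::Numbers part (assumed usage, reduced to the essentials):
#!/usr/bin/perl
# Sketch: turn a count into its English word form with num2en().
use strict;
use warnings;
use Lingua::EN::Numbers qw(num2en);

my $count = 2;
printf "%d (%s values 0.7 <=)\n", $count, num2en($count);   # prints: 2 (two values 0.7 <=)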
All the code in this post is also available here.

Option to cut values below a threshold in papaja::apa_table

I can't figure out how to selectively print values in a table above or below some value. What I'm looking for is known as "cut" in Revelle's psych package. MWE below.
library("psych")
library("psychTools")
derp <- fa(ability, nfactors=3)
print(derp, cut=0.5) #removes all loadings smaller than 0.5
derp <- print(derp, cut=0.5) #apa_table still doesn't print like this
Question is, how do I add that cut to an apa_table? Printing apa_table(derp) prints the entire table, including all values.
The print-method from psych does not return the formatted loadings but only the table of variance accounted for. You can, however, get the result you want by manually formatting the loadings table:
library("psych")
library("psychTools")
derp <- fa(ability, nfactors=3)
# Class `loadings` cannot be coerced to data.frame or matrix
class(derp$Structure)
[1] "loadings"
# Class `matrix` is supported by apa_table()
derp_loadings <- unclass(derp$Structure)
class(derp_loadings)
[1] "matrix"
# Remove values below "cut"
derp_loadings[derp_loadings < 0.5] <- NA
colnames(derp_loadings) <- paste("Factor", 1:3)
apa_table(
derp_loadings
, caption = "Factor loadings"
, added_stub_head = "Item"
, format = "pandoc" # Omit this in your R Markdown document
, format.args = list(na_string = "") # Don't print NA
)
*Factor loadings*
Item Factor 1 Factor 2 Factor 3
---------- --------- --------- ---------
reason.4 0.60
reason.16
reason.17 0.65
reason.19
letter.7 0.61
letter.33 0.56
letter.34 0.65
letter.58
matrix.45
matrix.46
matrix.47
matrix.55
rotate.3 0.70
rotate.4 0.73
rotate.6 0.63
rotate.8 0.63

Delete rows with characters in a cell array

I need some basic help. I have a cell array:
TITLE 13122423
NAME Bob
PROVIDER James
and many more rows with text...
234 456 234 345
324 346 234 345
344 454 462 435
and many MANY (>4000) more with only numbers
text
text
and more text and mixed entries
Now what I want is to delete all the rows where the first column contains a character, and end up with only the rows containing numbers (rows 44 to 46 in this example).
I tried to use
rawdataTruncated(strncmp(rawdataTruncated(:, 1), 'A', 1), :) = [];
but then I would need to go through the whole alphabet, right?
Given data of the form:
C = {'FIRSTX' '350.0000' '' '' ; ...
'350.0000' '0.226885' '254.409' '0.755055'; ...
'349.9500' '0.214335' '254.41' '0.755073'; ...
'250.0000' 'LASTX' '' '' };
You can remove any row that has character strings containing letters using isstrprop, cellfun, and any like so:
index = ~any(cellfun(@any, isstrprop(C, 'alpha')), 2);
C = C(index, :)
C =
2×4 cell array
'350.0000' '0.226885' '254.409' '0.755055'
'349.9500' '0.214335' '254.41' '0.755073'

How to read a file containing numbers in Octave using textscan

I am trying to import data from a text file named xMat.txt, which has data in the following format: 200 space-separated elements per line, over some 767 lines.
This is how xMat.txt looks:
386.0 386.0 388.0 394.0 402.0 413.0 ... .0 800.0 799.0 796
801.0 799.0 799.0 802.0 802.0 80 ... 399.0 397.0 394.0 391
.
.
.
This is my file - for reference.
When I try to read the file using
file = fopen('xMat.txt','r')
c = textscan(file,'%f');
I get the output as:
> c = { [1,1] =
> 386
> 386
> 388
> 394
> 402
> 413
> 427
> 442
> 458
> 473
> 487
> 499
> 509
> 517
> 524 ... in column format
What I need is a matrix of size (767X200). How can I do this?
I wouldn't use textscan in this case because your text file is purely numeric. Your text file contains 767 rows of 200 numbers per row where each number is delimited by a space. You couldn't get it to be any better suited for use with dlmread (MATLAB doc, Octave doc). dlmread can do this for you in one go:
c = dlmread('xMat.txt');
c will contain a 767 x 200 array for you that contains the data stored in the text file xMat.txt. Hopefully you can dump textscan in this case because what you're really after is trying to read your data into Octave... and dlmread does the job for you quite nicely.

Null character appearing when I print a file

I have code where I read a file and remove a block of lines if a certain keyword matches. If I see the keyword THERMST, I delete the line before it and all lines until I reach a $:
QNODE "CExtHrn - Heater_Bidon" 1.0 T884 TOTAL
THERMST "CExtHrn" 0 2.500000E+01 3.000000E+01 883 ID 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 "Heater_Bidon"
NAME2 Heater_ CExtHrn - Heater_Bidon
NAME Heater_ 40097 40170 1
TABTYPE 884 TABLE OPERATION
TABDATA 884 885 INTERP
TABDATA 884 883 THERMST
TABTYPE 885 QNODE TIME
TABDATA 885 2.000000E+01 0.000000E+00
$
However, for some obscure reason, when I print to a new file I get several null characters on a certain line. The weird thing is that this line is not related to the line I just changed. If I don't modify the file (by commenting out the following lines), I don't get any null characters.
# We delete the last 2 lines and skip the rest of the qnode/thermst definition
splice @INPF1_OUT, -2;
# Skipping the lines until the next comment line.
$ii++ until substr($INPF1_IN[$ii], 0, 1) eq '$';
$ii = $ii - 1;
Any idea what this could be? The null characters are causing problems for what I do with the file.
Here is what the line should be:
NAME winte_T 101269 101270 1
Here is what it prints in the new file:
NAME winte_T ULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNUL 101269 101270 1
You can see that the line that causes the error is not related to the one that should be modified.
Thank you, the code is below.
#!/bin/perl
use strict;
use Text::ParseWords;

# $INPF1 (the input file name) is set earlier in the full script
open(INPF1_in, '<', $INPF1)
    or die "Not able to open : $INPF1";
my @INPF1_IN = <INPF1_in>;
close INPF1_in;

my @INPF1_OUT;    # Output INPF1
my $cardno = 1;
my $ii     = 0;

until ($ii > $#INPF1_IN) {
    my $INPF_line = $INPF1_IN[$ii];
    push(@INPF1_OUT, $INPF_line);    # Adding line
    chomp($INPF_line);
    if ($INPF_line eq "-1") {
        $cardno++;
    }
    if ($cardno == 9) {
        my @line = parse_line(" ", 0, $INPF_line);    # parsing the line elements
        if ($line[0] eq "THERMST") {                  # If Thermostat
            # We delete the last 2 lines and skip the rest of the qnode/thermst definition
            splice @INPF1_OUT, -2;
            $ii++ until substr($INPF1_IN[$ii], 0, 1) eq '$';
            $ii = $ii - 1;    # Skipping the lines until the next comment line.
        }
    }
    $ii++;
}

open(INPF1_out, '>', $INPF1);
print INPF1_out $_ foreach @INPF1_OUT;
close INPF1_out;
I may be misreading your code, but it looks like you're trying to do something very simple in Perl in a very hard way.
If I'm reading it right, what you're trying to do is take an input record format and conditionally print certain lines. Perl has a very good tool for this, called the range operator.
I think you will be able to accomplish what you want with something considerably simpler.
#!/bin/perl
use strict;
use warnings;

while (<DATA>) {
    print unless ( m/^THERMST/ ... m/^\$$/ );
}
__DATA__
QNODE "CExtHrn - Heater_Bidon" 1.0 T884 TOTAL
THERMST "CExtHrn" 0 2.500000E+01 3.000000E+01 883 ID 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 "Heater_Bidon"
NAME2 Heater_ CExtHrn - Heater_Bidon
NAME Heater_ 40097 40170 1
TABTYPE 884 TABLE OPERATION
TABDATA 884 885 INTERP
TABDATA 884 883 THERMST
TABTYPE 885 QNODE TIME
TABDATA 885 2.000000E+01 0.000000E+00
$
This is an example based on the data you've given so far. If you can show a bit more of what you're trying to accomplish, I'm pretty sure you can extract the information you need without having to iterate through the elements of an array of words; Perl can do better than that.
(I am guessing a bit, as it's not entirely clear where you're getting $cardno from. However, this should be quite easy to modify to suit your needs.)
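Since you also want the QNODE line immediately before THERMST removed, here is a sketch (untested against your full input format) that extends the same idea with a one-line lookbehind buffer; like the range-operator version above, it also drops the terminating "$" line:
#!/bin/perl
# Sketch only: drop the line before THERMST plus everything up to and including "$".
use strict;
use warnings;

my $prev;           # one-line lookbehind buffer
my $skipping = 0;   # true while inside a THERMST block

while (my $line = <DATA>) {
    if ($skipping) {
        $skipping = 0 if $line =~ m/^\$\s*$/;   # end of the block
        next;
    }
    if ($line =~ m/^THERMST\b/) {
        undef $prev;                            # discard the buffered QNODE line
        $skipping = 1;
        next;
    }
    print $prev if defined $prev;
    $prev = $line;
}
print $prev if defined $prev;                   # flush the last buffered line

__DATA__
QNODE "CExtHrn - Heater_Bidon" 1.0 T884 TOTAL
THERMST "CExtHrn" 0 2.500000E+01 3.000000E+01 883 ID 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 "Heater_Bidon"
NAME2 Heater_ CExtHrn - Heater_Bidon
NAME Heater_ 40097 40170 1
TABTYPE 884 TABLE OPERATION
TABDATA 884 885 INTERP
TABDATA 884 883 THERMST
TABTYPE 885 QNODE TIME
TABDATA 885 2.000000E+01 0.000000E+00
$
NAME winte_T 101269 101270 1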