Inverting PDF colors (negative) using Perl PDF::API2 - perl

As the title says, I'm trying to write a script that inverts the colors of a batch of PDFs using Perl and PDF::API2.
I'm not very familiar with Perl. With the help of ChatGPT, I modified a short script I found here on Stack Overflow, in this post:
how to change all colours in a PDF to their respective complimentary colours; how to make a PDF negative
The code I ended up with is the following:
use strict;
use warnings;
use PDF::API2;
use PDF::API2::Basic::PDF::Utils;
my $dirname = '.';
my $filename;
opendir(DIR, $dirname) or die "Could not open $dirname\n";
mkdir("inverted") unless -d "inverted";
while ($filename = readdir(DIR)) {
    print("$filename\n");
    next unless $filename =~ /\.pdf$/; # Skip files that are not PDFs
    my $pdf = PDF::API2->open($filename);
    for my $n (1..$pdf->pages()) {
        my $p = $pdf->openpage($n);
        $p->{Group} = PDFDict();
        $p->{Group}->{CS} = PDFName('DeviceRGB');
        $p->{Group}->{S} = PDFName('Transparency');
        my $gfx = $p->gfx(1); # prepend
        $gfx->fillcolor('white');
        $gfx->rect($p->mediabox());
        $gfx->fill();
        $gfx = $p->gfx(); # append
        $gfx->egstate($pdf->egstate->blendmode('Difference'));
        $gfx->fillcolor('white');
        $gfx->rect($p->mediabox());
        $gfx->fill();
    }
    $pdf->saveas("inverted/$filename");
}
closedir(DIR);
It partially works: some pages are inverted correctly, but on others the first half of the page is not inverted and stays white, as in this picture:
Page partially inverted
I'd like to fix this; I really need a simple script that does this job. I've also written a script that then joins the resulting PDF files into a single PDF. If anyone has an idea how to fix it, I'd be grateful. I could also upload the result to GitHub if anyone needs it. (This question has been asked before, but I haven't found a script in Python or any other language that does the job well, except for a couple that rely on Docker and Node.js to install.)
I've tried working with ChatGPT to fix the issue, but it has no idea how to do this (yes, I know I shouldn't rely on it, but this is the first time I've used Perl).

I am debugging this and am confident it has to do with the rotation of the pages, but I do not understand the details of the problem yet. However, I have this workaround for the test file:
Rotate it 90 degrees with pdftk, apply the Perl script, then rotate it back 90 degrees with pdftk:
$ pdftk test.pdf cat 1-endLeft output test2.pdf
$ # run perl script to invert the colors in test2.pdf
$ pdftk test2.pdf cat 1-endRight output test3.pdf
After this test3.pdf seems to be correctly inverted. This workaround might also work for the other files you have.

Related

Vcf to Bayescan format - perl script not recognising populations

I am trying to convert a .vcf file into the correct format for BayeScan. I have tried using PGDSpider as recommended, but my .vcf file is too big, so I get a memory issue.
I then found a Perl script on GitHub that may be able to convert my file even though it is really big. The script can be found here. However, it does not correctly identify the number of populations I have: it only finds 1 population, whereas I have 30.
The top of my population file looks like this, following the example format in the Perl script.
index01_barcode_10_PA-1-WW-10 pop1
index02_barcode_29_PA-5-Ferm-19 pop2
index01_barcode_17_PA-1-WW-17 pop1
index02_barcode_20_PA-5-Ferm-10 pop2
index03_barcode_16_PA-7-CA-14 pop3
I have also tried the script with a sorted population file.
I have no experience with perl language so I am struggling to work out why the script is not working.
I think it is to do with this section of the script but cannot be sure:
# read and process pop file
while (<POP>){
    chomp $_;
    @line = split /\t/, $_;
    $pops{$line[0]} = $line[1];
}
close POP;
# Get populations and sort them
my @upops = sort { $a cmp $b } uniq ( values %pops );
print "found ", scalar @upops, " populations\n";
Apologies, as I am not sure how to make this a reproducible example, but I am hoping someone could at least help me understand what this part of the code is doing and whether there is a way to adapt it. Is the problem that my individual names include _ and -?
Thank you so much for your advice and help in advance :)
Firstly, thank you to @toolic for his help and guidance :)
Whilst trying to create a reproducible example it started working, and I think the problem was how I made my populations file.
Previously I used: paste sample_names pops | column -s $'\t' -t > pop_file.txt
to output the file printed in the question.
However, it works if I simply use: paste sample_names pops > pop_file.txt
I have also given the full path to the .vcf file instead of a path relative to the current directory.
I hope this helps anyone who comes across this issue in the future :)
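One plausible explanation, offered as an assumption rather than something confirmed in the thread: column -t pads the columns with spaces instead of tabs, so split /\t/ finds only one field per line, every sample gets an undefined population, and uniq sees a single value. The sketch below compares tab-separated and space-padded lines; count_pops is a hypothetical helper and is not part of the GitHub script.

```perl
use strict;
use warnings;

# Hypothetical helper: count distinct populations when each line is split
# on the given separator (field 0 = sample name, field 1 = population).
sub count_pops {
    my ($sep, @lines) = @_;
    my %pops;
    for my $l (@lines) {
        my @f = split $sep, $l;
        # A missing second field is recorded as an empty population name
        $pops{ $f[0] } = defined $f[1] ? $f[1] : '';
    }
    my %seen;
    $seen{$_}++ for values %pops;
    return scalar keys %seen;
}

# Sample names taken from the question; the separators are the assumption.
my @tab_lines = (
    "index01_barcode_10_PA-1-WW-10\tpop1",
    "index02_barcode_29_PA-5-Ferm-19\tpop2",
    "index03_barcode_16_PA-7-CA-14\tpop3",
);
my @space_lines = map { my $c = $_; $c =~ s/\t/  /; $c } @tab_lines;

print "tab file, split on tab:          ", count_pops(qr/\t/,  @tab_lines),   "\n";
print "padded file, split on tab:       ", count_pops(qr/\t/,  @space_lines), "\n";
print "padded file, split on whitespace: ", count_pops(qr/\s+/, @space_lines), "\n";
```

If the population file may be space-padded, splitting on whitespace instead of a literal tab is one way to make the reader tolerant of both layouts, provided sample names never contain spaces.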

About searching recursively in Perl

I have a Perl script that I, well, mostly pieced together from questions on this site. I've read the documentation on some parts to better understand it. Anyway, here it is:
#!/usr/bin/perl
use File::Find;
my $dir = '/home/jdoe';
my $string = "hard-coded pattern to match";
find(\&printFile, $dir);
sub printFile
{
    my $element = $_;
    if(-f $element && $element =~ /\.txt$/)
    {
        open my $in, "<", $element or die $!;
        while(<$in>)
        {
            if (/\Q$string\E/)
            {
                print "$File::Find::name\n";
                last; # stops looking after match is found
            }
        }
    }
}
This is a simple script that, similar to grep, looks recursively through directories for a matching string and prints the location of each file that contains it. It works, but only if the file is located in my home directory. If I change the hard-coded search to look in a different directory (that I have permissions in), for example /admin/programs, the script no longer seems to do anything: no output is displayed, even when I know it should be matching at least one file (tested by making a file in /admin/programs with the hard-coded pattern). Why am I experiencing this behavior?
Also, might as well disclaim that this isn't a really useful script (heck, this would be so easy with grep or awk!), but understanding how to do this in Perl is important to me right now. Thanks
EDIT: Found the problem. A simple oversight in that the files in the directory I was looking for did not have .txt as extension. Thanks for helping me find that.
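The oversight described in the edit is easier to spot when the filename filter is a parameter rather than a hard-coded .txt test. The following is a minimal sketch under that assumption; files_containing and the temporary demo files are hypothetical and only illustrate how the extension filter silently hides matches.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return the paths of regular files under $dir whose
# name matches $name_re and whose contents contain the literal $string.
sub files_containing {
    my ($dir, $string, $name_re) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && $_ =~ $name_re;
        open my $in, '<', $_ or return;    # skip unreadable files
        while (my $line = <$in>) {
            if (index($line, $string) >= 0) {
                push @hits, $File::Find::name;
                last;                      # one match per file is enough
            }
        }
        close $in;
    }, $dir);
    @hits = sort @hits;
    return @hits;
}

# Demo: two files with the same content but different extensions.
my $dir = tempdir(CLEANUP => 1);
for my $name ('notes.txt', 'notes.log') {
    open my $out, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$out} "hard-coded pattern to match\n";
    close $out;
}

my @txt_only = files_containing($dir, 'pattern to match', qr/\.txt\z/);
my @any_name = files_containing($dir, 'pattern to match', qr/./);
print scalar(@txt_only), " match(es) with the .txt filter, ",
      scalar(@any_name), " without it\n";
```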
I was able to get the desired output using the code you pasted by making a few changes, such as adding:
use strict;
use warnings;
You should always use them, as they report various errors in your code that you might otherwise miss.
Next I changed the line:
my $dir = './home/jdoe'; ##'./admin/programs'
The . signifies the current directory. If you still face problems, try using the absolute path instead of a relative path. Do let me know if this solves your problem.
This script works fine without any issue. The one thing we cannot see in this script is the pattern. You can share the pattern and tell us what you expect it to match, so that we can validate it.
You could also run your program in debug mode, i.e.
perl -d your_program
That takes you into the debugger, which has a lot of options for inspecting the flow. Type n at the debugger prompt to execute the next statement (s steps into subroutine calls); after each step the debugger shows the line it will execute next.

How can I get perl to correctly pass a command line argument with multiple arguments and complex file paths (spaces and symbols)?

I have a small perl script which collects file paths from an excel file and passes them through the command line to perltex which then compiles a pdf based on the files and paths chosen.
My problem is that the moment I introduce more complex file paths (which is necessary based on the network setup of the final user pool) perltex fails to find the file paths, cutting them at the space.
An MWE is as follows:
#!/usr/bin/perl
use strict;
use warnings;
use 5.14.2;
use Text::Template;
use Spreadsheet::Read;
use Spreadsheet::ParseXLSX;
use utf8;
use charnames qw( :full :short );
use autodie;
my $row = 5;
my $col = 15;
my $File = "C:/Users/me/Desktop/Reporting-Static/Input-test1.xlsm";
my $parser = Spreadsheet::ParseXLSX->new();
my $workbook = $parser->parse($File);
my $worksheet = $workbook->worksheet("Input");
my $cell = $worksheet->get_cell($row, $col);
my $Filename = $cell->Value();
my $texfile = "C:/Users/me/Desktop/Reporting-Static/file.tex";
# can't find this file if there are spaces in the address
system("perltex", "--latex=pdflatex", "--nosafe", "--jobname=$Filename", "$texfile");
if ( $? == -1 )
{
    print "command failed: $!\n";
}
else
{
    printf "command exited with value %d", $? >> 8;
}
exit;
However, the moment I change the folder name to one with spaces eg. "Reporting Static" it fails to find the tex file.
I have read several other posts regarding this on stack exchange and other websites but for whatever reason the proposed solutions do not appear to work for me. I have tried
my $texfile = "C:/Users/me/Desktop/Reporting Static/file.tex";
my $texfile = C:/Users/me/Desktop/"Reporting Static"/file.tex;
my $texfile = "\"C:/Users/me/Desktop/"Reporting Static"/file.tex\"";
my $texfile = "\"C:/Users/me/Desktop/Reporting Static/file.tex\"";
my $texfile = "C:/Users/me/Desktop/Reporting^ Static/file.tex";
my $texfile = "C:/Users/me/Desktop/Reporting\^ Static/file.tex";
As well as a few other combinations or variations of the above, all without success. I have also tried replacing the double quotes with single quotes so that Perl doesn't interpolate the contents.
I have also tried manually typing all of the above into the command prompt to check whether there was a small issue with the way Perl passed the commands to the command line, but still no luck.
I am aware that I can use the 'dir /X ~1 c:\' command to find system name allocations that avoid spaces, but the idea is that the filename and location will be dynamic and change as a function of department and site, so I would prefer to avoid writing a script that finds this pathname and uses it to replace all locations containing spaces or other special characters.
The final idea that I had is that this problem could be connected to the way that perltex passes its arguments, yet I am unable to find any documentation (that I can follow...) on the specifics of how this particular aspect of the file functions.
So my questions are: is there something I am missing, not mentioned in the other answers that I have read, regarding how to correctly pass these paths to perltex? Is there perhaps some sort of incompatibility in how I'm trying to go about this? Is this more probably linked to perltex as opposed to Perl or cmd? Or is there something completely different that I am unaware of that is stopping this from working?
EDIT:
From the cmd prompt, perltex returns an "unable to find path X, please enter another file location" message. Until now I hadn't really tested retyping the path: by entering 'C:/Users/me/Desktop/"Reporting Static"/file.tex' (no quotes at the beginning) it is subsequently accepted and runs. But initially passing it this path does not work, suggesting that some internal perltex code handles the initial path differently from the same path re-entered after an error... not quite sure what to make of this.
EDIT:
The contents of @latexcmdline that I extracted:
$VAR1 = [
    'pdflatex',
    '--jobname=--',
    '\\makeatletter\\def\\plmac@tag{AYNNNUVKQVJGZKKPGPTH}\\def\\plmac@tofile{Perl.topl}\\def\\plmac@fromfile{Perl.frpl}\\def\\plmac@toflag{Perl.tfpl}\\def\\plmac@fromflag{Perl.ffpl}\\def\\plmac@doneflag{Perl.dfpl}\\def\\plmac@pipe{Perl.pipe}\\makeatother\\input C:/Users/me/Desktop/PERLTEST/Perl',
    'Modules/RevuedeProjetDB.tex'
];
This was done by inserting
use File::Slurp;
use Data::Dumper;
write_file 'C:\Users\me\Desktop\PERLTEST\mydump.log', Dumper( \@latexcmdline );
before the exec command.
Update
I initially recommended that you should use String::ShellQuote but that module is for Linux only so I deleted my answer when I realised that your question was about the Windows system
It seems that there's also a Win32::ShellQuote which does the same thing for Windows, so I am renewing my suggestion
As I said before, the issue is that perltex itself doesn't properly handle paths containing whitespace, even if they are correctly passed as a single element of @ARGV. I believe the solution is to pass the path including enclosing quotes, although I have never been able to test this properly as I have no LaTeX installation
Unfortunately, even if I pass qq{"$texfile"}, the quotes are still stripped before they reach the target program, so they must be protected in some way
You need the quote_system function from that module, which will prepare a list of strings so that they retain any quotation marks
Using a parameter of quote_system(qq{"$texfile"}) produces the correct result in my tests. It is the equivalent of passing qq{"\\"$texfile\\""} but less ugly
So your system call should be like this (with no modification to perltex.pl)
I have applied the same principle to $Filename as it may well be that this also contains whitespace
use Win32::ShellQuote 'quote_system';
system(quote_system(
    'perltex',
    '--latex=pdflatex',
    '--nosafe',
    qq{--jobname="$Filename"},
    qq{"$texfile"},
));
Okay, well I have a solution of sorts
The issue, as I suspected, is that, although the path is passed as a single string to perltex.pl, the latter doesn't handle paths with spaces properly after it has received them
The temporary fix is to hack perltex.pl
Line 82 of my version of perltex.pl (there is no version number in the source) reads
$latexcmdline[$firstcmd] = "\\input $option";
If you change that to
$latexcmdline[$firstcmd] = qq{\\input "$option"};
then all should be well. However this is a solid fix only when it is distributed by the author of perltex. Meanwhile I am looking for a nicer solution from the calling side
There are two steps to solving this problem.
1) Work out how to get the correct arguments into an external program.
2) Work out how to do that from a Perl program.
For step 1, I find a program like this to be useful.
#!/usr/bin/perl
use strict;
use warnings;
print "Received ", scalar @ARGV, " arguments:\n";
for (1 .. @ARGV) {
    print "$_: $ARGV[$_ - 1]\n";
}
It just explains what arguments it receives on the command line. You can use this in place of "perltex" for testing purposes.
You'll see that if you give it an argument that contains spaces, it is interpreted by the called program as multiple arguments. The way round that is to quote the argument that contains a space. And I seem to remember that Windows insists on double quotes (for reasons that I can never remember).
So I think that you want this:
system('perltex', '--latex=pdflatex', '--nosafe', "--jobname=\"$Filename\"", "\"$texfile\"");
I've double-quoted both of the filenames. Of course, those escaped double-quote characters look really ugly, and Perl gives us qq(...) to make that look nicer.
system('perltex', '--latex=pdflatex', '--nosafe', qq(--jobname="$Filename"), qq("$texfile"));
If that's not quite right, then the program I showed earlier will make it easier to find the solution.
Update: Borodin's comment below about this making no difference to $texfile is accurate. The fact that we're passing a list to system() means that the shell isn't involved at all.
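The point about list-form calls bypassing the shell can be checked with a small sketch like the one below. It starts a child Perl process through a list-form pipe open (the same no-shell mechanism as list-form system()) and reports what the child received. The helper name args_seen_by_child is hypothetical, and the result described here is what I would expect on a Unix-like system; behaviour on Windows, which is what the question is about, may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: run a tiny child Perl process with the given
# arguments (list form, so no shell parses them) and return the list of
# arguments the child actually saw in @ARGV.
sub args_seen_by_child {
    my @args = @_;
    my $child_code = 'print "$_\n" for @ARGV';
    # '--' stops perl from treating our arguments as its own switches
    open my $pipe, '-|', $^X, '-e', $child_code, '--', @args
        or die "Cannot start child: $!";
    chomp(my @seen = <$pipe>);
    close $pipe;
    return @seen;
}

my @seen = args_seen_by_child(
    '--nosafe',
    'C:/Users/me/Desktop/Reporting Static/file.tex',
);
print scalar(@seen), " arguments received\n";
print "$_\n" for @seen;
```

If the path arrives as a single argument here, any remaining splitting has to happen inside the called program, which is consistent with the perltex.pl diagnosis given in the other answer.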

Change output filename from WGET when using input file option

I have a perl script that I wrote that gets some image URLs, puts the urls into an input file, and proceeds to run wget with the --input-file option. This works perfectly... or at least it did as long as the image filenames were unique.
I have a new company sending me data and they use a very TROUBLESOME naming scheme. All files have the same name, 0.jpg, in different folders.
for example:
cdn.blah.com/folder/folder/202793000/202793123/0.jpg
cdn.blah.com/folder/folder/198478000/198478725/0.jpg
cdn.blah.com/folder/folder/198594000/198594080/0.jpg
When I run my script with this, wget works fine and downloads all the images, but they are titled 0.jpg.1, 0.jpg.2, 0.jpg.3, etc. I can't just count them and rename them because files can be broken, not available, whatever.
I tried running wget once for each file with -O, but it's embarrassingly slow: starting the program, connecting to the site, downloading, and ending the program. Thousands of times. It's an hour vs minutes.
So, I'm trying to find a way to change the output filenames from wget without it taking so long. The original approach works so well that I don't want to change it too much unless necessary, but I am open to suggestions.
Additional:
LWP::Simple is too simple for this. Yes, it works, but very slowly. It has the same problem as running individual wget commands: each get() or getstore() call makes the system reconnect to the server. Because the files are so small (60 kB on average) and there are so many to process (1851 for this one test file alone), the connection time is considerable.
The filename I will be using can be found with /\/(\d+)\/(\d+.jpg)/i, where the filename will simply be $1$2, giving 2027931230.jpg. Not really important for this question.
I'm now looking at LWP::UserAgent with LWP::ConnCache, but it times out and/or hangs on my PC. I will need to adjust the timeout and retry values. The first run of the code downloaded 693 images (43 MB) in just a couple of minutes before it hung. Using LWP::Simple, I only got 200 images in 5 minutes.
use LWP::UserAgent;
use LWP::ConnCache;
chomp(my @filelist = <INPUTFILE>);
my $browser = LWP::UserAgent->new;
$browser->conn_cache(LWP::ConnCache->new());
foreach (@filelist) {
    # Skip lines that do not look like .../<digits>/<digits>.jpg
    next unless /\/(\d+)\/(\d+\.jpg)/i;
    my $newfilename = $1 . $2;
    my $response = $browser->mirror($_, $folder . $newfilename);
    die 'response failure' if $response->is_error();
}
LWP::Simple's getstore function allows you to specify a URL to fetch from and the filename to store the data from it in. It's an excellent module for many of the same use cases as wget, but with the benefit of being a Perl module (i.e. no need to outsource to the shell or spawn off child processes).
use LWP::Simple;
# Grab the filename from the end of the URL
my $filename = (split '/', $url)[-1];
# If the file exists, increment its name
while (-e $filename)
{
    $filename =~ s{ (\d+)[.]jpg }{ $1+1 . '.jpg' }ex
        or die "Unexpected filename encountered";
}
getstore($url, $filename);
The question doesn't specify exactly what kind of renaming scheme you need, but this will work for the examples given by simply incrementing the filename until the current directory doesn't contain that filename.
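The renaming rule the question describes (last numeric folder plus image name) can be sketched without any network access. The helper local_name_for below is hypothetical; it only derives the local filename and would still need to be combined with whichever download call is used.

```perl
use strict;
use warnings;

# Hypothetical helper: build a local filename from the last numeric folder
# and the image name, e.g. .../202793123/0.jpg becomes 2027931230.jpg.
# Returns undef (in scalar context) when the URL does not fit that shape.
sub local_name_for {
    my ($url) = @_;
    return unless $url =~ m{/(\d+)/(\d+\.jpg)\z}i;
    return $1 . $2;
}

# Example URLs copied from the question
my @urls = (
    'cdn.blah.com/folder/folder/202793000/202793123/0.jpg',
    'cdn.blah.com/folder/folder/198478000/198478725/0.jpg',
    'cdn.blah.com/folder/folder/198594000/198594080/0.jpg',
);

for my $url (@urls) {
    my $name = local_name_for($url);
    print defined $name ? "$url -> $name\n" : "skipped $url\n";
}
```

Because the folder number is part of the name, two different URLs map to different files, so no existence check or counter is needed for the examples given.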

Convert Perl Script to VBA Script

I have a working Perl script that I would like to convert to VBA to run as an Excel macro so that it can easily be shared with other PCs. I have a shell script (where I pass the parameters) driving the Perl script.
I used the Perl script to read a specified column range from every row of a fixed-width file (below, the start is 54,63), compare that data with another file, and print the difference. I'd pass parameters in the shell script as runpro.pl filea.txt fileb.csv > myoutput.txt
Any assistance would be great! Especially if someone can point me in the right direction since the code is fairly simple. Thanks!
#!/usr/bin/perl
#Perl Script runpro.pl
#***************************************************
use strict;
use warnings;
my ($fa, $fb) = @ARGV;
# First pass: collect the codes from the fixed-width file
@ARGV = $fa;
my %codes;
while(<>) {
    s/[\r\n]+\z//;
    $_ = substr($_, 54, 63); # 63 characters starting at offset 54
    s/\s+\z//;
    next if $_ eq "";
    $codes{$_} =1;
}
# Second pass: collect the keys from the CSV file
@ARGV = $fb;
my %descrip;
while(<>) {
    s/[\r\n]+\z//;
    s/,.*//;
    s/"//g;
    $descrip {$_} = 1 if s/^1234//;
}
# Print every code that has no matching entry in the CSV file
for (sort keys %codes) {
    print "$_\n" unless ($descrip{$_});
}
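One detail worth checking before porting: Perl's substr takes an offset and a length, so substr($_, 54, 63) reads 63 characters starting at zero-based offset 54, not columns 54 through 63. Whether that is the intent is not clear from the question, so the sketch below only contrasts the two readings; the columns helper and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the 1-based, inclusive column range
# $first..$last from a fixed-width line.
sub columns {
    my ($line, $first, $last) = @_;
    return substr($line, $first - 1, $last - $first + 1);
}

# 130 characters of made-up sample data
my $line = join '', map { $_ % 10 } 1 .. 130;

my $by_offset  = substr($line, 54, 63);    # offset 54, length 63
my $by_columns = columns($line, 54, 63);   # columns 54 through 63

print "offset/length reading: ", length($by_offset),  " characters\n";
print "column-range reading:  ", length($by_columns), " characters\n";
```

In VBA, Mid(text, start, length) uses a 1-based start position, so whichever reading is correct, the start value needs adjusting by one when the logic is ported.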
A couple of points:
1) VBA is very different from Perl - so things that are one-liners in one will be tricky in the other
2) If you haven't used VBA in Excel, I suggest you start by "recording" a macro: first make the "Developer" tab in the ribbon visible, then select "record macro", and start doing things like opening files and importing them (fixed width). After you stop the recording you will see the syntax for doing these things - that should help a lot
3) You will have to decide how you want to pass arguments to VBA - cells on a worksheet, dialog box... There is no such thing as "running VBA from the command line".
I wonder if you really need / want VBA or if you would be better off compiling a standalone program (.exe). Is this meant to run on PC (windows), Mac OS, or both? See for example this earlier question - maybe that's what you actually need (if not what you asked for...)?