How to locate code causing corrupt binary output in Perl - perl

I have a relatively complex Perl program that manages various pages and resources for my sites. Somewhere along the line I messed up something in a library of several thousand lines that provides essential services to most of the different scripts for the system so that scripts within my codebase that output PDF or PNG files no longer can output those files validly. If I rewrite the scripts that do the output to avoid using that library, they work, but I'd like to figure out what I broke within my library that is causing it to hurt binary output.
For example, in one snippet of code, I open a PDF file (or any sort of file -- it detects the mime type automatically) and then print it directly:
#Figure out MIME type.
use File::MimeInfo::Magic;
$mimeType = mimetype($filename);
my $fileData;
open (resource, $filename);
foreach my $self (<resource>) { $fileData .= $self; }
close (resource);
print "Content-type: " . $mimeType . "\n\n";
print $fileData;
exit;
This worked great, but at some point while editing the suspect library I mentioned, I did something that broke it and I'm stumped as to what I did. I had been playing with different utf8 encoding functions, but as far as I can tell, I removed all of my experimental code and the problem remains. Merely loading that library, without calling any of its functions, breaks the script's ability to output the file.
The output is actually being corrupted visibly, if I open it in a text editor. If I compare the source file that is opened by the code given above and the output, the source file and the output file have many differences despite there being no processing in the code above before output (those links are to a sample PDF that was run through the broken code).
I've tried retracing my steps for days and cannot find what is wrong in the problematic library -- I hadn't used this function in awhile and I wrote a lot of new code since I last tested it, so it is hard to know precisely where the problem is. My hope is someone may be able to look at the corrupted output file in comparison to the source file and at least point me in the direction of what I should be looking for that could cause such a result. I feel like I'm looking for a needle in the haystack.

Related

use matlab to open a file with an outside program and execute 'save as'

Alright, here's what I'm dealing with (you can skip to TLDR if all you need to see is what I want to run):
I'm having an issue with file formatting for a nasty conglomeration of several ancient programs I've strung together. I have some data in .CSV format, and I need to put it into .SPC format. I've tried a set of proprietary MATLAB programs called 'GS tools' for fast and easy conversion, but fast and easy doesn't look like its gonna happen here since there are discrepancies in how .spc files are organized now and how they were organized back when my ancient programs were written.
If I could find the source code for the old programs I could probably alter the GS tools code to write my .spc files appropriately, but all I can find are broken links circa 2002 and earlier. Seeing as I don't know what my programs are looking for, I have no choice but to try resaving my data with other programs until one of them produces something workable.
I found my Cinderella program: if I open the data I have in a program called Spekwin and save the file with a .spc extension... viola! Everything else runs on those files. The problem is that I have hundreds of these files and I'd like to automate the conversion process.
I either need to extract the writing rubric Spekwin uses for .spc files (I believe that info is stored in a dll file within the program, but I'm not sure if that actually makes sense) and use it as a rule to write a file from my input data, or I need a piece of code that will open a file with Spekwin, tell Spekwin to save that file under the .spc extension, and terminate Spekwin.
TLDR: Need a command that tells the computer to open a file with a certain program, save that file under a different extension through that program (essentially open*.csv>save as>*.spc), then terminate the program.
OR--I need a way to tell MATLAB to write a file according to rules specified by a .dll, but I'm not sure I fully understand what that entails.
Of course I'm open to suggestions on other ways to handle this.

LIne-by-line file-io not working as expected in Windows

I'm using Perl 5.16.1 from Strawberry in a Windows environment. I have a Perl script reading very large text files. The smallest text file is 30M. When reading files that do not have a line feed at the end of the very last line I get very peculiar results. It may not happen all the time but when it does It's as though it is reading cached data from the I/O system for another file that I previously opened with the Perl script. If I manually edit the file and add a line feed it's fine. I added a line counter and some inline code to display what happens when I'm near the end of the file to make sure I wasn't going nuts. To try and fix I tried adding this to my script:
open (SS_LOG, ">>", $SSFile) or die "Can't open $SSFile\r\n $!\r\n";
print SS_LOG "\r\n";
close SS_LOG;
but it does nothing. The file stays the same size. I'm also storing data in large arrays.
Has anyone else seen anything like this?
Try unbuffering your output:
SS_LOG->autoflush(1);

Perl Extracting XML Tag Attribute Using Split Or Regex

I am working on a file upload system that also parses the files that are uploaded and generates another file based on info inside the file uploaded. The files being uploaded as XML files. I only need to parse the first XML tag in each file and only need to get the value of the single attribute in the tag.
Sample XML:
<LAB title="lab title goes here">...</LAB>
I am looking for a good way of extracting the value of the title attribute using the Perl split function or using Regex. I would use a Perl XML parser if I had the ability to install Perl modules on the server I am hosting my code on, however I do not have that ability.
This XML is located in an XML file, that I am opening and then attempting to parse out the attribute value. I have tried using both Split and Regex to no luck. However, I am not very familiar with Perl or regular expressions.
This is he basic outline my code so far:
open(LAB, "<", "path-to-file-goes-here") or die "Unable to open lab.\n";
foreach my $line (<LAB>) {
my #pieces = split(/"(.*)"/, $line);
foreach my $piece (#pieces) {
print "$piece\n";
}
}
I have tried using split to match against title alone using
/title/
Or match against the = character or the " character using
/\=/ or /\"/
I have also tried doing similar things using regex and have had no luck as well. I am not sure if I am just not using the proper expression or if this is not possible using split/regex. Any help on the matter would be much appreciated, as I am admittedly a novice at Perl still. If this type of question has been answered elsewhere, I apologize. I did some searching and could not find a solution. Most threads suggest using an XML parsing Perl module, which I would if I had the privileges to install them.
"But I can't use CPAN" is a quick way to get yourself downvoted on the Perl tag (though it wasn't I who did so). There are many ways that you can use CPAN, even if you don't have root. In fact you can have your own Perl even if you don't have root. While I highly recommend some of those options, for now, the easiest way to do this is just to download some Pure Perl modules, and included them in your codebase. Mojolicious has a very small, but very useful XML/DOM parser called Mojo::DOM which is a likely candidate for this kind of process.

Printing Multiple Files in one Print Job

I have a program that will output timesheets as separate .xps files into a folder. I am looking for a way to use the command line to print all of these files at the same time. Since there could be hundreds of these files it is also important that they be printed as one print job so that other documents aren't printed in the middle of them.
I have been searching on and off for about two months for a way to do this. So far I have come up with nothing. I would appreciate any advice on how to do this.
Thanks.
I ended up converting my files to PDFs. Then I downloaded pdftk which is a command-line tool that can be used to combine multiple PDF files into one. Now I can just run pdftk and then print the combined file. It's unfortunate that there is not a better way to handle this in Windows but at least it works.

Trouble reading text from a pdf in Perl

I am trying to read the text content of a pdf file into a Perl variable. From other SO questions/answers I get the sense that I need to use CAM::PDF. Here's my code:
#!/usr/bin/perl -w
use CAM::PDF;
my $pdf = CAM::PDF->new('1950-01-01.pdf');
print $pdf->numPages(), " pages\n\n";
my $text = $pdf->getPageText(1);
print $text, "\n";
I tried running this on this pdf file. There are no errors reported by Perl. The first print statement works; it prints "2 pages" which is the correct number of pages in this document.
The next print statement does not return anything readable. Here's what the output looks like in Emacs:
2 pages
^A^B^C^D^E^C^F^D^G^H
^D^A^K^L^C^M^D^N^C^M^O^D^P^C^Q^Q^C ^D^R^K^M^O^D ^A^B^C^D^E
^F^G^G^H^E
^K^L
^M^N^E^O^P^E^O^Q^R^S^E
.... more lines with similar codes ....
Is there something I can do to make this work? I don't understand pdf files too well, but I thought that because I can easily copy and paste the text from the PDF file using Acrobat, it must be recognized as text and not an image, so I hoped this meant I could extract it with Perl.
Any guidance would be much appreciated.
PDFs can have different kinds of content. A PDF may not have any readable text at all, only bitmaps and graphical content, for example. The PDF you linked to, has compressed data in it. Open it with a text editor, and you will see that the content is in a "/Filter/FlateDecode" block. Perhaps CAM::PDF doesn't support that. Google FlateDecode for a few ideas.
Looking further into that PDF, i see that it also uses embedded subsets of fonts, with custom encodings. Even if CAM::PDF handles the compression, the custom encoding may be what's throwing it off. This may help: Web page from a software company, describing the problem
I'm fairly certain that the issue isn't with your perl code, it is with the PDF file. I ran the same script on one of my own PDF files, and it works just fine.