Is there a simple way to effectively cat a filestream without writing to disk?

I'm working on a system to scan remote files for viruses. I'm downloading as a stream and would like to avoid saving unscanned files to disk for obvious reasons.
I can use clamscan to scan the stream, but I'm not sure how to generate that stream on the command line. Both echo and the shell have the potential of playing games with what is actually output (quoting, word splitting, escape sequences) if I did something like the following:
system("echo $data | clamscan -");
Are there any elegant solutions to achieving this that I am missing? Obviously I could probably filter the file dump with some stream editor before it hits clamscan, but that is definitely not elegant and prone to error, I would think.

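You could use popen(). However, it has its limitations: it gives you a one-way pipe, so you can either write to the child's standard input or read its standard output, but not both. Anything more sophisticated will require you to set up the pipes and spawn the processes yourself.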
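A minimal sketch of the pipe approach, written in Python with subprocess.Popen as a stand-in for whatever language the original system() call lives in. It assumes, as the question states, that clamscan - reads the data from standard input, and that the download arrives as an iterable of byte chunks (hypothetical; adapt it to your HTTP client). No shell is involved, so nothing in the data gets interpreted, and the data only ever lives in memory and in the pipe.

```python
import subprocess
import threading
from typing import Iterable, List, Sequence, Tuple


def scan_chunks(chunks: Iterable[bytes],
                cmd: Sequence[str] = ("clamscan", "-")) -> Tuple[int, bytes]:
    """Stream byte chunks into cmd's stdin; return (exit status, stdout).

    The command is started without a shell, so the data is never parsed
    by sh or echo, and nothing is written to disk.
    """
    proc = subprocess.Popen(list(cmd), stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    captured: List[bytes] = []

    # Drain stdout in a background thread: if the child produced a lot of
    # output while we were still feeding stdin, both sides could block.
    reader = threading.Thread(target=lambda: captured.append(proc.stdout.read()))
    reader.start()
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    except BrokenPipeError:
        pass  # the child stopped reading early; its exit status says why
    finally:
        try:
            proc.stdin.close()  # EOF tells the scanner the stream is complete
        except BrokenPipeError:
            pass
    reader.join()
    proc.stdout.close()
    return proc.wait(), captured[0]
```

Usage would look like status, report = scan_chunks(response_chunks), where response_chunks is your download stream. Per the clamscan man page, exit status 0 means no virus was found and 1 means a virus was found; check that against the ClamAV version you run.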

Related

Checking if a file is still open

I use ffmpeg in a batch job to reduce the size of a video file and convert it. Meanwhile, I'd like to use a Perl script to check whether the conversion of this video is done.
Does the -t operator check that?
Or does a simple executable check with -x do the trick? Or something else?
Thank you!
It's inadvisable to argue with people whose help you're getting for free.
It's quite possible to examine which file handles are open and by which processes (for example with lsof on Unix-like systems), but the method varies according to the operating system. And it sounds like you're running ffmpeg on a remote system, so it's even less straightforward.
The usual method would be cooperative locking, but ffmpeg doesn't do that.
If you're running a batch job, then the obvious way is for the job to create a flag file once the ffmpeg run is complete. Then you need only wait for that file to exist to be sure that ffmpeg has finished.
Please don't be overconfident in future, or you will get only the answers that you deserve.
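A minimal sketch of the flag-file idea in Python, assuming the batch step can be wrapped in a small script. The names run_then_flag and wait_for_flag and the .done suffix are hypothetical. The flag is written under a temporary name and then renamed into place, so a watcher never sees a half-created flag. On the Perl side, the equivalent check is simply a file-existence test (-e) on the flag path.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Sequence


def run_then_flag(cmd: Sequence[str], flag: Path) -> int:
    """Run the conversion command; create `flag` only if it exits with status 0."""
    status = subprocess.run(list(cmd)).returncode
    if status == 0:
        # Write under a temporary name, then rename into place, so a watcher
        # never sees a flag file that exists but is still being written.
        tmp = flag.with_name(flag.name + ".tmp")
        tmp.write_text("done\n")
        os.replace(tmp, flag)
    return status


def wait_for_flag(flag: Path, timeout: float = 3600.0, poll: float = 1.0) -> bool:
    """Poll for the flag file; True once it exists, False if `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if flag.exists():
            return True
        time.sleep(poll)
    return flag.exists()
```

For example, run_then_flag(["ffmpeg", "-i", "input.avi", "output.mp4"], Path("output.mp4.done")) in the batch job, with the watcher polling for output.mp4.done. Creating the flag only on a zero exit status means a failed conversion never looks finished.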

Reduce relocatable win32 Perl to as few files and bytes as possible

I'm trying to use a Perl program on a Windows HTCondor computing cluster. On Windows, HTCondor copies all dependencies into a temporary directory (used as a chroot of sorts) and then deletes that directory after the specified outputs have been moved to a designated place.
If I take only perl.exe and perl514.dll, make a job like perl -e "print qq/hello\n/" and tell the cluster to run it 200 times, each replication ends up taking about 15 seconds, which is acceptable overhead. Almost all of that time is spent repeatedly copying the files over the network and then deleting them. By comparison, echo_hello.bat run 200 times takes more like two seconds per replication.
The problem is that when I try to use my full-blown Perl distribution of 55 MB and 2,289 files, a single "hello" rep takes something like four minutes of copying and deleting, which is unacceptable. When I try to do many runs, the disks on the machines grind to a halt trying to handle the file operations of all the reps concurrently, so it doesn't work at all. I don't know how long it would eventually take to finish, because I gave up after half an hour with no jobs finished.
I figured PAR::Packer might fix the issue, but nope. I tried print_hello.exe created like this: pp -o print_hello.exe -e "print qq/hello\n/". It still makes things grind to a halt, apparently by swamping the file system. I think a PAR::Packer executable creates a ton of temporary files as it extracts the files it needs from the archive, and I think the Windows file system totally chokes when there are a bunch of concurrent small file operations.
So how can I go about cutting the Perl I built down to something like 6 MB and a dozen files? I'm really only using a tiny number of core modules and don't need most of the crap in bin and lib, but I have no idea how to rip stuff out in a sane way.
Is there an automated way to strip away unneeded files and modules?
I know Tcl has a bunch of facilities for packing files into a single uncompressed archive that can then be accessed through a "virtual filesystem" without expanding it. Is there some way to do this with Perl itself, sort of like PAR? The problem is that PAR compresses everything and then has to extract it to temporary files, rather than working directly through a virtual filesystem layer (if I understand correctly).
My usage of Perl is actually as a scripting layer embedded in a simulation. So I'm really running my_simulation.exe, which depends on perl514.dll, but you get the idea. I also cannot realistically do anything to the HTCondor cluster other than use it. So there's no need to think outside the box about what I should be using instead of Perl or what I could administratively tweak in Windows and HTCondor, thanks.
You can use Module::ScanDeps to get a list of the actual dependencies of your Perl program. It was terrible that PAR::Packer took a significant amount of time to unpack the whole application, so I decided to build the executable myself.
Here is my ready-to-use script, which gathers Perl dependencies into a directory; it might help you reduce the number of Perl modules, e.g. by manually removing some dependencies after copying.
In theory (I have never tried it), your next step could be to merge all pure-Perl dependencies into a single file (like deps.pm), although that might be non-trivial because of Perl's autoload magic and some other tricks.
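As a rough illustration of gathering only the modules a program actually loads, here is a Python sketch. It assumes you have already dumped Perl's %INC hash for a representative run as tab-separated lines (the module key such as File/Spec.pm, a tab, then the full path), for example from an END block you add to your script that prints "$_\t$INC{$_}\n" for each key; that dump format is an assumption of this sketch, not Module::ScanDeps output. It copies each file to the same relative path under a new lib directory.

```python
import shutil
from pathlib import Path
from typing import Dict


def parse_inc_dump(text: str) -> Dict[str, str]:
    """Parse lines of the form 'File/Spec.pm<TAB>/full/path/File/Spec.pm'."""
    inc: Dict[str, str] = {}
    for line in text.splitlines():
        if "\t" in line:
            key, path = line.split("\t", 1)
            inc[key] = path
    return inc


def copy_inc(inc: Dict[str, str], dest: Path) -> int:
    """Copy each loaded module to dest/<key>, keeping the @INC-relative layout."""
    count = 0
    for key, path in sorted(inc.items()):
        target = dest / key  # %INC keys always use forward slashes
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        count += 1
    return count
```

This only sees modules loaded on the code paths your run exercised, and it does not pick up XS shared libraries under auto/, so treat the copied tree as a starting point to test against, not a finished trimmed distribution.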
You can list the modules that are needed by your program using the very nice ListDependencies module.
To my knowledge it isn't downloadable anywhere, but it is simple to copy and paste into your own ListDependencies.pm file.
You should read the POD documentation within the module for usage instructions.

Command line CSV viewer with column-alignment for LARGE files

I would like to view my CSV files in a column-aligned format from the command line, with something like less, but my CSV files are sometimes gigabytes big, and I'm using a little computer (Netbook, 1GB RAM, 8GB HD, 1GHz processor), so I don't want to waste a lot of memory or processing power viewing the file.
I mention that I'd like to use something like less because I would like to be able to navigate around within the file.
cat FILE | column -s, -t | less is one thought, but cat is still going to try to print the whole file and I'm not sure how much buffering the pipes will use (if any) or what sort of caching less employs.
This question is similar to this other question, but I'm specifically interested in viewing large files using minimal resources preferably already on the machine. I don't presently use VI or EMACS, and think they'd both be overkill here. VI, for instance, would be a 27MB install for a utility acting merely as a viewer.
First of all, less can open oversized files. Second, both vim (which I use with the Largefile plugin and with files over 8 GB) and emacs can do it.
But... most of the time, viewing a big file in an 80x40 (or slightly bigger) terminal is useless, so you should filter it with something like (f)grep or process it with awk. If you only want the start or the end, there are head and tail.
HTH
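As far as I know, column -t reads its entire input before printing anything, because it needs every row to compute the column widths, which is exactly what hurts on a gigabyte file. A lower-memory alternative, sketched in Python: size the columns from the first few rows only, then stream the rest of the file through with those widths, so memory use is bounded by the sample size rather than the file size. The sample size of 1000 is an arbitrary assumption, and later cells wider than anything in the sample simply push their row out of alignment. Python's csv module also copes with quoted fields that contain commas, which column -s, does not.

```python
import csv
import itertools
import sys
from typing import Iterable, Iterator, List


def align_csv(lines: Iterable[str], sample: int = 1000, gap: str = "  ") -> Iterator[str]:
    """Yield CSV rows padded into columns, sizing columns from the first `sample` rows only."""
    rows = csv.reader(lines)
    head = list(itertools.islice(rows, sample))  # the only rows held in memory
    widths: List[int] = []
    for row in head:
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(len(cell))
            else:
                widths[i] = max(widths[i], len(cell))
    # Re-emit the sampled rows, then stream the rest with the same widths.
    for row in itertools.chain(head, rows):
        cells = [cell.ljust(widths[i]) if i < len(widths) else cell
                 for i, cell in enumerate(row)]
        yield gap.join(cells).rstrip()


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as f:
        for line in align_csv(f):
            print(line)
```

Saved as, say, align_csv.py (a hypothetical name), it would be used as python align_csv.py big.csv | less -S, where -S makes less chop long lines instead of wrapping them so the columns stay lined up.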
Check out the tail and head commands.
Or, even better, download the Vim source and compile it. That should be easy enough. The version 5.8 source is 1 MB compressed (4 MB after decompressing). Enjoy.

Reading a multi-gigabyte file quickly in Perl

We are currently reading the file line by line, which makes reading and processing the whole file slow.
We need to read the file faster and get on with our commands.
What I tried using fork and an array only displays the first set of lines and does not proceed to the other sets.
Please help.
Reading a large file takes a fair bit of time; disks are slow, after all. Before you start looking at Perl, first try (assuming you're on a Unix-like system):
time cat /path/to/your/large/file >/dev/null
The output will tell you how long it takes just to read that file from disk without doing anything to it. Alternatively, open the file in your favorite text editor and time how long it takes to load. Once you have that time, compare it to how long your Perl program takes to read the file. Unless the Perl program takes significantly longer, you're not likely to be able to do anything about it, because the time is being spent getting the data from disk rather than processing it.
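The same comparison can also be made from inside a program. This Python sketch (a stand-in for the Perl code in question; the function names are made up) times a bare chunked read, roughly what cat > /dev/null does, against a bare line-by-line read of the same file. If the line-by-line figure is close to the raw one, the reading loop is not the bottleneck. Run each measurement more than once: after the first pass the file may sit in the operating system's page cache, which can make later reads much faster than a cold read from disk.

```python
import time
from pathlib import Path


def seconds_to_read_raw(path: Path, chunk_size: int = 1 << 20) -> float:
    """Time reading the file in 1 MiB chunks with no processing at all."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start


def seconds_to_read_lines(path: Path) -> float:
    """Time iterating over the same file line by line, still without processing."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in f:
            pass
    return time.perf_counter() - start
```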
Of course, that's assuming that you actually do need to read the entire file. If you can get by with only reading specific parts of it, then you could create an index file and use that to jump directly to the part that's of interest, but you haven't provided enough information for us to tell whether that would apply to your case or not.
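To make the index-file idea concrete, here is a minimal Python sketch, assuming a plain text file with newline-terminated lines. It records the byte offset of every Nth line in one sequential pass; afterwards you can seek straight to the block containing the line you want instead of reading everything before it. The spacing of 1000 lines is an arbitrary choice, and the offsets list could be saved to disk (for example as JSON) to serve as the index file.

```python
from pathlib import Path
from typing import Iterator, List


def build_line_index(path: Path, every: int = 1000) -> List[int]:
    """Return the byte offsets of lines 0, every, 2*every, ... in one pass."""
    offsets: List[int] = []
    pos = 0
    with open(path, "rb") as f:
        for n, line in enumerate(f):
            if n % every == 0:
                offsets.append(pos)
            pos += len(line)
    return offsets


def lines_from(path: Path, index: List[int], start: int, every: int = 1000) -> Iterator[bytes]:
    """Yield lines starting at 0-based line number `start`, using the index to seek."""
    block = start // every  # IndexError if start is past the end of the file
    with open(path, "rb") as f:
        f.seek(index[block])  # jump directly to the nearest indexed line
        skip = start - block * every
        for n, line in enumerate(f):
            if n >= skip:
                yield line
```

The index only stays valid while the file is unchanged; if the file is appended to or rewritten, it has to be rebuilt.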
If you need more specific help, please provide a better description of what you mean to accomplish and a small, runnable piece of Perl code which shows how you're currently reading and processing the file so that we can see whether you're doing anything particularly inefficient that can be improved on.

ExtAudioFileSeek and ExtAudioFileWrite together on the same file

I have a situation where I can save a post-processing pass through the audio by taking some manipulated buffers from the end of the track and writing them to the beginning of my output file.
I originally thought I could do this by resetting the write pointer using ExtAudioFileSeek, and was about to implement it when I saw this line in the docs:
Ensure that the file you are seeking in is open for reading only. This function’s behavior with files open for writing is undefined.
Now I know I could close the file for writing then reopen it, but the process is a little more complicated than that. Part of the manipulation I am doing is reading from buffers that are in the file I am writing to. The overall process looks like this:
1. Read buffers from the end of the read file.
2. Read buffers from the beginning of the write file.
3. Process the buffers.
4. Write the buffers back to the beginning of the write file, overwriting the buffers I read in step 2.
Logically, this can be done in one pass with no problem. Programmatically, how can I achieve the same thing without corrupting my data, becoming less efficient (the opposite of my goal) or potentially imploding the universe?
Yes, using a single audio file for both reading and writing may, as you put it, implode the universe, or at least lead to other nastiness. I think that the key to solving this problem is in step 4, where you should write the output to a new file instead of trying to "recycle" the initial write file. After your processing is complete, you can simply scrap the intermediate write file.
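The write-to-a-new-file approach can be sketched with plain byte files standing in for audio files. This Python sketch does not use ExtAudioFile at all: the file names, the buffer size n and the process callback are hypothetical placeholders for your own decoding and DSP. It reads the last n bytes of the read file and the first n bytes of the intermediate write file, writes the processed result to a fresh output file, and then copies the rest of the intermediate file after it, so no file is ever open for reading and writing at the same time. With real audio you would count in frames rather than bytes.

```python
import shutil
from pathlib import Path
from typing import Callable


def rebuild_with_new_head(read_path: Path, write_path: Path, out_path: Path,
                          n: int, process: Callable[[bytes, bytes], bytes]) -> None:
    # Step 1: read buffers from the end of the read file.
    with open(read_path, "rb") as src:
        size = src.seek(0, 2)  # seek() returns the new absolute position
        src.seek(max(0, size - n))
        tail = src.read(n)
    # Steps 2-4: read the head of the intermediate file, process it, and
    # write the result to a new file, followed by the untouched remainder.
    with open(write_path, "rb") as mid, open(out_path, "wb") as out:
        head = mid.read(n)
        out.write(process(tail, head))
        shutil.copyfileobj(mid, out)  # continues from offset n of the intermediate file
```

Once the new file has been written, write_path.unlink() scraps the intermediate file. Note that process does not have to return exactly n bytes; whatever it returns simply replaces the first n bytes of the intermediate data.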
Or have I misunderstood the problem?
Oh, and also, you should use ExtAudioFileWriteAsync instead of ExtAudioFileWrite for your writes if you are doing this in realtime. Otherwise the I/O load will cause audio dropouts.