How to ignore an error/exception while processing a list? - kdb

I'm learning kdb+/q. Without loss of generality, say I have a list of file paths and want to open each one, but I want q to open whichever files exist: if one of the file paths doesn't exist, processing should continue with the others.
I know protected evaluation lets me handle the error: @[open_many_files_func; files; errhandler].
But how do I keep it from failing in the middle, so it continues with the rest?

You can use each to iterate over the files, applying the protected evaluation to one file at a time (the file argument is elided, making a projection):
@[open_many_files_func; ; errhandler] each files
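For instance, a minimal sketch (read0 here stands in for your open_many_files_func; the handler prints the error and returns an empty list, so each moves on to the next file):
readOne: {@[read0; x; {-1 "failed: ",x; ()}]}
contents: readOne each `:file1`:file2`:file3`:file4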

If it's just the existence of the file that you're checking for (and not that the function fails due to a problem with the file), then you could also use the key function to check the existence of the file.
q)system "ls"
"file1"
"file2"
"file4"
q)b: a where count each a: key each `:file1`:file2`:file3`:file4
`:file1`:file2`:file4
Once you have the list of files, you can just do
open_many_files_func each b
https://code.kx.com/q/ref/key/

Related

rename file when using DD name

In a 'C' language LP64 compiled program, which will run in Batch, TSO and z/OS UNIX, when opening a PDS(E) member using the following notation (recommended in order to allow file disposition to be used):-
hFile = fopen("DD:CONFIG(COPY)", "w");
fclose(hFile);
I am surprised to discover that the following does not appear to work:-
rename("DD:CONFIG(COPY)","DD:CONFIG(MAIN)");
Failing as it does with an errno of ENOENT (EDC5129I No such file or directory.)
The documentation for rename says:-
The rename() function renames memory files and DASD data sets. It also renames individual members of PDSs (and PDSEs)
If instead I do:-
rename("//'MYUSER.CONFIG(COPY)'","//'MYUSER.CONFIG(MAIN)'");
the rename() works.
Alternatively if I do:-
rename("//'MYUSER.CONFIG(COPY)'","DD:CONFIG(MAIN)");
if fails with an errno of EINVAL (EDC5121I Invalid argument.)
Why does it not accept the same file name notation that is used for fopen?
The reason this is important is because the rename() cannot succeed while the PDSE is being browsed by someone. Whereas, using the DD: notation allows an fopen() for write to succeed when the PDSE is being browsed because the DISP=SHR coded on the DD name in the JCL is adopted by the fopen().
So, I suppose the real question is - how can my program rename a PDSE member in a way that will succeed when the PDSE is also being browsed by someone?
The technique required to rename a dataset is different from the technique to rename a member inside a PDS/PDSE...I'd wager that the system rename() function you're calling is just getting this wrong. In z/OS, there are lots of combinations that functions like rename() have to handle, and it's not unusual to find some that don't work as you expect.
Certainly it's worth a call to IBM Support to see if there's something else going on here...what you're trying to do seems like it should work, so I think there's something to be said for treating it like a bug or documentation error.
Beyond that, as you suggest, you can either use the form of rename that works, or you can replace the system's rename function with something that actually works properly.
One simple way would be to create the rename() as you show it:
rename("//'MYUSER.CONFIG(COPY)'","//'MYUSER.CONFIG(MAIN)'");
You can get the DSN for a DDNAME using the fldata() function, so it's not hard to create a rename like this on the fly given an open file handle. Beware that the form of rename may allocate the file you specify with DISP=OLD, and hence cause problems if some other task has the file allocated. Also, if this is supposed to be commercial quality code, as a customer, my eyebrows would go up if I found out you needed to launch some external program because you couldn't figure out how to rename a PDS/PDSE member - but that might just be me.
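For instance, a rough, untested sketch of that approach (rename_member() and the buffer sizes are mine, not from the post, and it inherits the DISP=OLD caveat just mentioned):

#include <stdio.h>
#include <string.h>

/* Sketch: look up the DSN behind an already-open DD with fldata(), then
   rename a member using the "//'DSN(MEMBER)'" form that is known to work. */
int rename_member(FILE *fp, const char *oldmem, const char *newmem)
{
    fldata_t info;
    char fname[FILENAME_MAX];
    char from[80], to[80];

    if (fldata(fp, fname, &info) != 0)
        return -1;                          /* could not query attributes */

    sprintf(from, "//'%s(%s)'", info.__dsname, oldmem);
    sprintf(to,   "//'%s(%s)'", info.__dsname, newmem);
    return rename(from, to);                /* may allocate with DISP=OLD */
}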
The other alternative is to write your own "rename()" function...unfortunately, it most likely would need to be assembler language if you want it to be efficient. As others suggest, you might spawn off a shell, REXX or TSO command, but of course, that means creating a new process, etc etc etc just to rename the PDS/PDSE member. Keep in mind also that some of these approaches might also have issues with trying to allocate the input file with DISP=OLD.
If that's too slow for your needs, the way to do what you want is to call a small assembler routine that invokes the system STOW service against your DDNAME to do your rename. The flow would be something like this:
1) Create a 16-byte area containing the old and new member names. They're 8 characters each and blank padded.
2) Get the address of an open DCB that describes the file you're looking at. You can get the DCB address from the FILE structure, I believe - or you could just open a second DCB to the DDNAME you have allocated.
3) Call the system STOW service with the parameters that tell it to rename a PDS/PDSE member:
STOW dcb,area_from_step1,C
In the STOW macro above, the "directory option" of "C" tells STOW that you want to rename an existing member. The area_from_step1 has the current and new member names - the system searches the directory for the current name and rewrites it with the new member name in place.
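For concreteness, step 1 might look like this in C (a sketch; it assumes member names of at most 8 characters, and the STOW call itself still needs the assembler stub):

#include <string.h>

/* Build the 16-byte STOW rename area: current member name in bytes 0-7,
   new member name in bytes 8-15, both blank padded. */
void build_stow_area(char area[16], const char *oldmem, const char *newmem)
{
    memset(area, ' ', 16);
    memcpy(area, oldmem, strlen(oldmem));        /* e.g. "COPY" */
    memcpy(area + 8, newmem, strlen(newmem));    /* e.g. "MAIN" */
}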
To be honest, what I describe above is exactly what the system runtime should be doing, but if it's not and IBM doesn't want to fix it, then you might prefer to do this sort of thing "by hand".
Not sure if this will work, but since you have the dataset already allocated, perhaps you could "call" (for some value of call) IEHPROGM from your program, constructing the proper SYSIN before making the call?
Here's a link to the IBM example for IEHPROGM (mind any break):
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.idau100/u1354.htm
--Scott

Perl: Is it better to clobber a file or remove it and open a new one?

For example,
#!/usr/bin/perl
open FILE1, '>out/existing_file1.txt';
open FILE2, '>out/existing_file2.txt';
open FILE3, '>out/existing_file3.txt';
versus
#!/usr/bin/perl
if (-d 'out') {
system('rm -f out/*');
}
open FILE1, '>out/new_file1.txt';
open FILE2, '>out/new_file2.txt';
open FILE3, '>out/new_file3.txt';
In the first example, we clobber the files (truncate them to zero length). In the second, we clean the directory and then create new files.
The second method (where we clean the directory) seems redundant and unnecessary. The only advantage to doing this (in my mind) is that it resets permissions, as well as the change date.
Which is considered the best practice? (I suspect the question is pedantic, and the first example is more common.)
Edit: The reason I ask is because I have a script that will parse data and write output files to a directory - each time with the same filename/path. This script will be run many times, and I'm curious whether at the start of the script I should partially clean the directory (of the files I am writing to) or just let the file handle '>' clobber the files for me, and take no extra measures myself.
Other than the permissions issue you mentioned, the only significant difference between the two methods is if another process has one of the output files open while you do this. If you remove the file and then recreate it, the other process will continue to see the data in the original file. If you clobber the file, the other process will see the file contents change immediately (although if it's using buffered I/O, it may not notice it until it needs to refill the buffer).
Removing the files will also update the modification time of the containing directory.
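If you want to see the difference for yourself, here is a small sketch (the out/ file name is from the question; the inode is what another process's open handle stays attached to):

#!/usr/bin/perl
use strict;
use warnings;

# Clobbering truncates in place (same inode); unlink+recreate makes a new
# file that a process holding the old one open never sees.
open my $fh, '>', 'out/existing_file1.txt' or die $!;
close $fh;
my $inode = (stat 'out/existing_file1.txt')[1];

open $fh, '>', 'out/existing_file1.txt' or die $!;   # clobber
close $fh;
printf "clobber:  %d -> %d\n", $inode, (stat 'out/existing_file1.txt')[1];

unlink 'out/existing_file1.txt' or die $!;           # remove, then recreate
open $fh, '>', 'out/existing_file1.txt' or die $!;
close $fh;
printf "recreate: %d -> %d\n", $inode, (stat 'out/existing_file1.txt')[1];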

how to use perl Archive::Zip to recursively walk archive files?

I have a small perl script that I use to search archives for members matching a name. I'd like to enhance this so that if it finds any members in the archive that are also archives (zip, jar, etc) it will then recursively scan those, looking for the original desired pattern.
I've looked through the "Archive::Zip" documentation, and I thought I saw how to do this. I noticed the "fh()" and "readFromFileHandle()" methods. However, in my testing, it appears that the "fh()" call on an archive member returns the file handle for the containing archive, not the member. Perhaps I'm doing it wrong, but I would appreciate an example of how to do this.
You can't read the contents of any sort of archive member (whether it is text, picture, or another archive) without extracting it from the archive file.
Once you have identified a member that you want to view, you must call extractMember (or, more likely, extractMemberWithoutPaths if the file is to be temporary) to extract it to a disk file. Then you can create a new Archive::Zip object and read the new file while keeping the old one open.
You will presumably want to unlink the archive file once you have catalogued its contents.
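Put together, the extract-and-recurse approach might look like this (a sketch; scan_zip, the temp directory handling, and the patterns are mine):

#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use File::Temp qw(tempdir);

my $tmpdir = tempdir(CLEANUP => 1);

# Report members matching $pattern; extract and recurse into any member
# that is itself an archive, then unlink the temporary copy.
sub scan_zip {
    my ($path, $pattern) = @_;
    my $zip = Archive::Zip->new;
    return unless $zip->read($path) == AZ_OK;
    for my $member ($zip->members) {
        my $name = $member->fileName;
        print "$path: $name\n" if $name =~ /$pattern/;
        if ($name =~ /\.(?:zip|jar)$/i) {
            (my $base = $name) =~ s{.*/}{};
            my $extracted = "$tmpdir/$base";
            next unless $zip->extractMemberWithoutPaths($member, $extracted) == AZ_OK;
            scan_zip($extracted, $pattern);
            unlink $extracted;
        }
    }
}

scan_zip('outer.zip', qr/wanted_name/);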
Edit
I hadn't come across the Archive::Zip::MemberRead module before. It appears you were on the right track with readFromFileHandle. I would guess that it should work like this, but it would be awkward for me to test it at present.
use Archive::Zip;
use Archive::Zip::MemberRead;

my $zip = Archive::Zip->new;
$zip->read('myfile.zip');
my $zipfh = Archive::Zip::MemberRead->new($zip, 'archive/path/to/member.zip');
my $newzip = Archive::Zip->new;
$newzip->readFromFileHandle($zipfh);

Delete multiple files with names containing a substring efficiently

I would like to delete multiple files whose names contain a substring. Say, for example, I would like to delete all the files containing the substring my. Assume that my directory contains 4 files: photo.jpg, myPhoto.jpg, beachMyPhoto.jpg, anyPhoto.jpg. Since the search term is my, the files I am interested in deleting are myPhoto.jpg and beachMyPhoto.jpg (case-insensitive).
My proposed solution (which I know how to implement) is to use the NSFileManager class, call contentsOfDirectoryAtPath:error: to read all the directory contents, and then loop over the results searching for a hit. If a hit is found, I delete that file.
What I don't like about my proposed solution is that it is not very efficient, especially if the directory contains many files and only a few of them are hits. Is there a more efficient way to do this?
If you don't want a big array loaded into memory, you can try -[NSFileManager enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:]. Since you only want the immediate contents of the directory, you would invoke -[NSDirectoryEnumerator skipDescendants] for each directory that it returns.
If your concern is iterating over all of the items in the directory, testing for your match pattern, well that's unavoidable. Any technique you would hope to use has to somehow iterate over all of the items in the directory and test for a match. The only question is whether that iteration is exposed to you or not. In Cocoa, it is. You could drop down to the glob() function if you want an alternative where it isn't.
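For example, with glob() (a sketch; the directory is a placeholder, and the bracket classes handle the case-insensitive "my" match, since there is no case-folding flag):

#include <glob.h>
#include <stdio.h>

/* Let glob() do the iteration and matching in one call. */
int main(void)
{
    glob_t g;
    if (glob("/path/to/dir/*[mM][yY]*", 0, NULL, &g) == 0) {
        for (size_t i = 0; i < g.gl_pathc; i++)
            printf("would delete: %s\n", g.gl_pathv[i]);  /* or remove() */
        globfree(&g);
    }
    return 0;
}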

How to detect changing directory size in Perl

I am trying to find a way of monitoring directories in Perl, in particular the size of a directory, and upon detecting a change in directory size, perform a particular action.
The issue I have is with large files that require a noticeable amount of time to copy into this directory, i.e. > 100MB. What happens (in Windows, not Unix) is that the system reserves enough disk space for the entire file even though the copy is still in progress. This causes problems for me, because my script will try to perform an action on a file that has not finished copying over. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.
Are there any accurate methods of detecting directory size changes in Perl?
Edit: Some points to clarify:
- My Perl script is only monitoring a particular directory and, upon detecting a new file or a new directory, performs an action on it. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple GB) and my program fires before this copy completes
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of enumerating them all on every poll
Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy-waits until it detects a change. The change in file/directory size is not in my control.
You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?
You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.
EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
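A minimal File::ChangeNotify sketch (the directory is a placeholder):

#!/usr/bin/perl
use strict;
use warnings;
use File::ChangeNotify;

my $watcher = File::ChangeNotify->instantiate_watcher(
    directories => ['/path/to/watch'],
);

# Blocks until something is created, modified, or deleted.
while ( my @events = $watcher->wait_for_events ) {
    for my $event (@events) {
        printf "%s: %s\n", $event->type, $event->path;
    }
}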
As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?
There are potentially several solutions:
1) Use flock to detect whether the file is still in use by another process (test that it works properly on your OS, file system, and Perl version); a sketch follows this list.
2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.
3) Change the poll interval to a non-busy time on the server and take the directory offline while your process completes.
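Option 1 might look like this (a sketch; whether a failed non-blocking lock really means "still being copied" depends on the OS and file system, as noted above):

#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Probe with a non-blocking exclusive lock; failure suggests another
# process still has the file open/locked.
sub in_use {
    my ($path) = @_;
    open my $fh, '<', $path or return 1;   # can't open yet: treat as busy
    my $busy = !flock($fh, LOCK_EX | LOCK_NB);
    close $fh;                             # releases the lock if we held it
    return $busy;
}

print in_use('incoming/bigfile.dat') ? "busy\n" : "ready\n";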
Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code (see the sketch after this list) if you know about:
Either glob or opendir / readdir / closedir to iterate through the files in a directory
The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
The stat function or file size operator -s file to obtain the size of a file
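For example (a sketch; du_bytes is my name for it, and it makes no attempt to guard against symlink loops):

#!/usr/bin/perl
use strict;
use warnings;

# Sum the sizes of all regular files under $dir, recursively.
sub du_bytes {
    my ($dir) = @_;
    my $total = 0;
    opendir my $dh, $dir or return 0;
    for my $entry (readdir $dh) {
        next if $entry eq '.' || $entry eq '..';
        my $path = "$dir/$entry";
        if    (-f $path) { $total += -s $path }         # add file size
        elsif (-d $path) { $total += du_bytes($path) }  # recurse
    }
    closedir $dh;
    return $total;
}

print du_bytes('.'), "\n";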
There is a nice module called File::Monitor; it will detect new files, deleted files, changes in size, and any other attribute that can be checked with stat. It will then output the files for you.
http://metacpan.org/pod/File::Monitor
You set up a baseline scan, then set up a callback for each item you are looking for, so you can see new changes via:
$monitor->watch( {
    name     => 'somedir',
    recurse  => 1,
    callback => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            # Do stuff
        },
    },
} );
If you need to go deeper than one level, just apply it to whatever level you need. After this is done and it finds new files, you can trigger your application to do what you want with the files.