How exactly does unlink work? - perl

I'm having a little difficulty understanding how exactly this works.
It seems that unlink() will remove the inode which refers to the file's data, but won't actually delete the data. If this is the case,
a) what happens to the data? Presumably it doesn't stick around forever, or people would be running out of disk space all the time. Does something else eventually get around to deleting data without associated inodes, or what?
b) if nothing happens to the data: how can I actually delete it? If something automatically happens to it: how can I make that happen on command?
(Auxiliary question: if the shell commands rm and unlink do essentially the same thing, as I've read on other questions here, and Perl unlink is just another call to that, then what's the point of a module like File::Remove, which seems to do exactly the same thing again? I realize "there's more than one way to do it", but this seems to be a case of "more than one way to say it", with "it" always referring to the same operation.)
In short: can I make sure deleting a file actually results in its disk space being freed up immediately?

Each inode on your disk has a reference count: it knows how many places refer to it. A directory entry is a reference. Multiple references to the same inode can exist. unlink removes a reference. When the reference count reaches zero, the inode is no longer in use and may be deleted. This is how many things work, such as hard linking and snapshots.
In particular - an open file handle is a reference. So you can open a file, unlink it, and continue to use it - it'll only be actually removed after the file handle is closed (provided the reference count drops to zero, and it's not open/hard linked anywhere else).
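You can watch this happen from Perl itself. A minimal sketch, assuming Unix semantics (on Windows, unlinking a file that is held open typically fails):

use strict;
use warnings;

# Create a file, then unlink it while the handle is still open.
open my $fh, '+>', 'scratch.tmp' or die "open: $!";
unlink 'scratch.tmp' or die "unlink: $!";   # the name is gone immediately
print {$fh} "still here\n";                 # ...but the handle still works
seek $fh, 0, 0;
print scalar <$fh>;                         # reads back "still here"
close $fh;                                  # refcount hits zero: space is freed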

unlink() removes a link (a name, if you like, but technically a record in some directory file) to the data (referred to by an inode). Once there is no link left to the data, the system automatically frees the associated space. The number of links to an inode is tracked in the inode itself. You can observe the actual number of links to a file with ls -li, for example:
789994 drwxr-xr-x+ 29 john staff 986 11 nov 2010 SCANS
23453 -rw-r--r--+ 1 erik staff 460 19 mar 2011 SQL.java
This means that inode 789994 has 29 links to it and that inode 23453 has only one.
SQL.java is an entry in the current directory which points to inode 23453. If you remove that record from the directory (the unlink system call, or the rm command), the count drops to 0 and the system frees the corresponding space: a count of 0 means there is no link/name left through which to access the data, so it can be freed.
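From Perl you can read both numbers straight out of the inode with stat: field 1 is the inode number and field 3 is the link count (using the SQL.java entry from the listing above):

my ($inode, $nlink) = (stat 'SQL.java')[1, 3];
print "inode $inode has $nlink link(s)\n";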

Removing the link just means the space is no longer reserved for a certain file name. The data will be erased/destroyed when something else is allocated to that space. This is why people write zeroes or random data to a drive after deleting sensitive data like financial records.
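If you really need the bytes gone and not just unreferenced, you can overwrite them before unlinking. A rough sketch only (the file name is hypothetical; this defeats casual recovery, not forensics, and on journaling, copy-on-write, or SSD storage the filesystem may not overwrite in place at all):

use strict;
use warnings;

my $file = 'secret.dat';                  # hypothetical name
my $size = -s $file;
die "no such file: $file" unless defined $size;
open my $fh, '+<', $file or die "open: $!";
binmode $fh;
print {$fh} "\0" x $size;                 # zero the old contents (chunk this for huge files)
close $fh or die "close: $!";
unlink $file or die "unlink: $!";         # now drop the last link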

rename file when using DD name

In a 'C' language LP64 compiled program, which will run in Batch, TSO and z/OS UNIX, when opening a PDS(E) member using the following notation (recommended in order to allow file disposition to be used):-
hFile = fopen("DD:CONFIG(COPY)", "w");
fclose(hFile);
I am surprised to discover that the following does not appear to work:-
rename("DD:CONFIG(COPY)","DD:CONFIG(MAIN)");
Failing as it does with an errno of ENOENT (EDC5129I No such file or directory.)
The documentation for rename says:-
The rename() function renames memory files and DASD data sets. It also renames individual members of PDSs (and PDSEs)
If instead I do:-
rename("//'MYUSER.CONFIG(COPY)'","//'MYUSER.CONFIG(MAIN)'");
the rename() works.
Alternatively if I do:-
rename("//'MYUSER.CONFIG(COPY)'","DD:CONFIG(MAIN)");
if fails with an errno of EINVAL (EDC5121I Invalid argument.)
Why does it not accept the same file name notation that is used for fopen?
The reason this is important is that rename() cannot succeed while the PDSE is being browsed by someone. Using the DD: notation, on the other hand, allows an fopen() for write to succeed while the PDSE is being browsed, because the DISP=SHR coded on the DD name in the JCL is adopted by the fopen().
So, I suppose the real question is - how can my program rename a PDSE member in a way that will succeed when the PDSE is also being browsed by someone?
The technique required to rename a dataset is different from the technique required to rename a member inside a PDS/PDSE... I'd wager that the system rename() function you're calling is just getting this wrong. In z/OS, there are lots of combinations that functions like rename() have to handle, and it's not unusual to find some that don't work as you expect.
Certainly it's worth a call to IBM Support to see if there's something else going on here...what you're trying to do seems like it should work, so I think there's something to be said for treating it like a bug or documentation error.
Beyond that, as you suggest, you can either use the form of rename that works, or you can replace the system's rename function with something that actually works properly.
One simple way would be to create the rename() as you show it:
rename("//'MYUSER.CONFIG(COPY)'","//'MYUSER.CONFIG(MAIN)'");
You can get the DSN for a DDNAME using the fldata() function, so it's not hard to create a rename like this on the fly given an open file handle. Beware that this form of rename() may allocate the file you specify with DISP=OLD, and hence cause problems if some other task has the file allocated. Also, if this is supposed to be commercial-quality code, as a customer my eyebrows would go up if I found out you needed to launch some external program because you couldn't figure out how to rename a PDS/PDSE member - but that might just be me.
The other alternative is to write your own "rename()" function... unfortunately, it most likely would need to be in assembler language if you want it to be efficient. As others suggest, you might spawn off a shell, REXX or TSO command, but of course that means creating a new process and so on, just to rename the PDS/PDSE member. Keep in mind also that some of these approaches might have issues with trying to allocate the input file with DISP=OLD.
If that's too slow for your needs, the way to do what you want is to call a small assembler routine that invokes the system STOW service against your DDNAME to do your rename. The flow would be something like this:
You'd create a 16-byte area containing the old and new member names. They're 8 characters each and blank padded.
You'd need the address of an open DCB that describes the file you're looking at. You can get the DCB address from the FILE structure, I believe - or you could just open a second DCB to the DDNAME you have allocated.
You'd call the system STOW service with the parameters that tell it to rename a PDS/PDSE member:
STOW dcb,area_from_step1,C
In the STOW macro above, the "directory option" of "C" tells STOW that you want to rename an existing member. The area_from_step1 has the current and new member names - the system searches the directory for the current name and rewrites it with the new member name in place.
To be honest, what I describe above is exactly what the system runtime should be doing, but if it's not and IBM doesn't want to fix it, then you might prefer to do this sort of thing "by hand".
Not sure if this will work, but since you have the dataset already allocated, perhaps you could "call" (for some value of call) IEHPROGM from your program, constructing the proper SYSIN before making the call?
Here's a link to the IBM example for IEHPROGM (mind any line break):
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.idau100/u1354.htm
--Scott

INode File System, what is the extra space in a data block used for?

So, I am currently learning about the INode file system and am asked to write a simple file system using Inodes.
So far, I understand that there is an inode table that maps inodes to data blocks through direct/indirect pointers.
Let's assume data gets written into a file and the data is stored in two blocks. Say each block is 512 bytes, and the file fills one block completely but only 200 bytes of the second block. What happens to the rest of the space in that data block? Is it reserved for that file only, or do other files use this block?
Depending on the file system, that leftover space (slack space, a form of internal fragmentation) is usually simply wasted: it stays reserved for that file and no other file can use it. I think the Reiser File System actually reclaimed this area with its "tail packing" scheme, but I could be wrong.
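To put numbers on the example in the question (512-byte blocks, a file of 512 + 200 = 712 bytes), a quick Perl check:

use POSIX qw(ceil);

my ($file_size, $block_size) = (712, 512);
my $blocks = ceil($file_size / $block_size);      # 2 blocks allocated
my $slack  = $blocks * $block_size - $file_size;  # 312 bytes wasted in the last block
print "$blocks blocks allocated, $slack bytes of slack\n";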
Creating your own File System can be a challenging experience, but also an enjoyable experience. I have created a few myself and worked on another. If you are creating your own file system, you can have it do whatever you wish.
Look at the bottom of this page for a few that I am working on/with. The LeanFS in particular, uses Inodes as well. The SFS is a very simple file system. Each is well documented so that you can research and decide what you would like to do.

Automatically running a script to read particular information from a .txt file? (Perl script, or suggest)

My scenario: text file(s) will keep arriving in a folder. I need to detect each new text file and read particular information from it (formats like "word: info", or a word with a column of info under it, etc.). This process needs to keep running indefinitely.
Problem: how should I go about doing this? I guess I should use a Perl script, but where do I go from there? I am getting ideas, and also help on the internet, but I thought asking here might make my thoughts clearer.
Kindly help, please suggest a path to do this.
Regards,
Chirayu
First thing: you want a daemon process, so you may want to have a look at Proc::Daemon.
Second thing: you need to read and parse your file. Parsing depends on its format, and your question is not really clear about that.
Finally, you may want to consider moving (or renaming) a newly detected file while processing it, and then (possibly) deleting it once it has been processed; this depends on your requirements. Alternatively, you may want to move newly detected files into an archive directory after processing them.
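A rough sketch of that flow, assuming a daemonized poller; the directory names, the 5-second interval, and the process() stub are all placeholders:

use strict;
use warnings;
use Proc::Daemon;
use File::Copy qw(move);

Proc::Daemon::Init();                        # detach and run in the background

sub process {
    my ($path) = @_;
    # stub: read the file and extract whatever you need here
}

while (1) {
    for my $file (glob 'incoming/*.txt') {
        (my $base = $file) =~ s{^incoming/}{};
        move($file, "work/$base") or next;   # claim the file by moving it
        process("work/$base");
        move("work/$base", "archive/$base"); # keep it once processed
    }
    sleep 5;                                 # poll interval, tune to taste
}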
One approach might be to have a Perl process that regularly (say every 5 seconds, every 5 minutes or every 5 hours, your call really) scans said directory and, as soon as any new text file appears, spawns a child process to process it.
The child process might be another Perl script which gets the name of the text file as its argument, reads the file, detects the word you mention and then extracts the information you are interested in (and then does whatever you consider necessary with that information).
One thing to look out for is what to do with the text files once they are processed. Are they supposed to stay around? Then you need to keep track of which of them you have processed, so they do not get processed again if your master process (the one that scans the directory and spawns Perl children) has to be restarted (due to either a crash or a deliberate restart).
If the text files are supposed to disappear once they are processed, it could be a good idea either to let the children remove them after completion or to let the master process remove them, provided the master process always waits for the children to complete before it continues. The drawback with a master process waiting for children to complete is that the children then cannot run in parallel but have to run in strict sequence (not necessarily a drawback, depending on your situation).
(If you have the master process always waiting for the child process to finish, you can actually skip having child processes altogether and create a subroutine in the master program which reads and processes the text file.)
High level description but hope it helps.
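To make the high-level description a little more concrete, the master's spawning step might look roughly like this (the worker script name is made up):

use strict;
use warnings;

my @new_files = @ARGV;                       # stand-in for the directory scan
for my $file (@new_files) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                         # child: hand the file to a worker
        exec 'perl', 'process_one.pl', $file
            or die "exec failed: $!";
    }
}
1 while wait != -1;                          # master waits for all children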
What is the operating system you are using?
On Windows, you can use Win32::ChangeNotify and on Linux, you can use Linux::Inotify2 to be notified of changes to the contents of a directory.
Your script can simply wait to be notified and take action when notified instead of polling the contents of the directory which will either waste resources or potentially miss some changes.
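For instance, on Linux, a minimal Linux::Inotify2 watcher might look like this (the directory path is a placeholder):

use strict;
use warnings;
use Linux::Inotify2;

my $inotify = Linux::Inotify2->new or die "inotify init failed: $!";
$inotify->watch('/path/to/dir', IN_CLOSE_WRITE | IN_MOVED_TO, sub {
    my $event = shift;
    print "finished file: ", $event->fullname, "\n";   # react to the new file here
});
1 while $inotify->poll;    # block until events arrive, forever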

A rotating log file in perl

I have implemented a log file that stores the CPU and memory state of a process every minute. I have limited the maximum size of the file to 3MB (that's enough for my purpose).
The script will be called by a cron job every minute; it will log the details for that minute and rename the file to "Log_.log".
When the size reaches 3MB - 100 bytes, I reset the file pointer to the beginning, overwrite the first entry in the log file, and rename the file to "Log_<0+some offset>.log".
As I am renaming the file every minute to track the file pointer position, is this a good/efficient way to do it?
I do not want to maintain more than one log file for this purpose.
Another option for me is to maintain the file pointer position in a file, but... another file! I'm not interested in maintaining one if this option is good :)
Thanks in Advance.
Are you an engineer? This is a nice example of a simple task solved by a perfectly working but overly complex solution.
Unless the content you put in takes exactly as many bytes as the content you take out, writing into the middle of a file means everything after your writing position has to be rewritten to disk. Appending is much cheaper.
Renaming the file to store the pointer works, but it's not very elegant, and it makes things more complex (for one, your process needs write permission on the directory the file resides in, rather than just write access to two files).
Unless disk space is an issue (and really, it rarely is), your approach is less efficient than, say, appending everything to a file and rotating the file when it reaches its maximum size. That way you always have the last 3MB of logs available, plus at most 3MB more in your current file. It also makes parsing the files a lot easier: no recalculating of file pointer positions.
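A sketch of the append-and-rotate idea; the file name, and keeping just one old generation, are arbitrary choices:

use strict;
use warnings;

my $log = 'stats.log';
my $max = 3 * 1024 * 1024;    # 3MB cap

# Rotate before the current file can exceed the limit.
rename $log, "$log.old" if -e $log && -s $log >= $max;

open my $fh, '>>', $log or die "open: $!";
print {$fh} scalar localtime, " cpu=... mem=...\n";   # your one-minute sample
close $fh;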
Update to answer your comment:
Renaming a file every minute (or even every second) shouldn't slow down your system significantly, don't worry about that.
Our concerns are mainly with why you think you need to rename the file. It's not better technically, it's not better from a logical point of view, and it makes a lot of other (future) tasks harder. You could store the file pointer in a separate file, or at the end of your file, and there are better^H^H^H^H^H^H simpler solutions that don't require the file pointer at all.
I'm confused why you would rename your file. What does this accomplish?
Are the log entries fixed size? Or variable size?
If the entries are fixed size, then there is no trouble in re-writing the existing file from the start: you won't ever have incomplete entries in your file, and if you are writing a counter or timestamps to the file, it should be clear where the 'cursor' is located.
If the entries are variable size, then you should probably not begin re-writing the file from the beginning without somehow making it clear where the 'cursor' is located in the file, and write code that is resilient to reading truncated log entries.
Can you re-use existing tools such as RRDtool?

How to detect changing directory size in Perl

I am trying to find a way of monitoring directories in Perl, in particular the size of a directory, and upon detecting a change in directory size, perform a particular action.
The issue I have is with large files that take a noticeable amount of time to copy into this directory, i.e. > 100MB. What happens (in Windows, not Unix) is that the system reserves enough disk space for the entire file even though the copy is still in progress. This causes problems for me, because my script will try to perform an action on a file that has not finished copying over. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.
Are there any accurate methods of detecting directory size changes in Perl?
Edit: Some points to clarify:
- My Perl script is only monitoring a particular directory, and upon detecting a new file or a new directory, perform an action on this new file or directory. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple GB) and my program fires before this copy completes
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of
Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy waits until it detects a change. The change in file/directory size is not in my control.
You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?
You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.
EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
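A minimal File::ChangeNotify loop might look like this (the directory path is a placeholder):

use strict;
use warnings;
use File::ChangeNotify;

my $watcher = File::ChangeNotify->instantiate_watcher(
    directories => ['/path/to/dir'],
);

while (my @events = $watcher->wait_for_events) {
    for my $event (@events) {
        printf "%s: %s\n", $event->type, $event->path;   # e.g. "create: /path/to/dir/foo"
    }
}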
As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?
There are potentially several solutions:
1) Use flock to detect if the file is still in use by another process; see the sketch after this list (test whether it works properly on your OS, file system, and Perl version).
2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.
3) Change the poll interval to a non-busy time on the server and take the directory offline while your process completes.
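Here is a sketch of option 1. Keep in mind that flock is advisory: it only detects writers that also take locks, and a plain network copy usually does not, so test this against your actual workflow:

use strict;
use warnings;
use Fcntl qw(:flock);

# Returns true if we can grab an exclusive, non-blocking lock,
# i.e. no cooperating process currently holds the file.
sub file_is_free {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    my $ok = flock $fh, LOCK_EX | LOCK_NB;
    flock $fh, LOCK_UN if $ok;
    close $fh;
    return $ok;
}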
Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code (see the sketch after this list) if you know about:
Either glob or opendir / readdir / closedir to iterate through the files in a directory
The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
The stat function or file size operator -s file to obtain the size of a file
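For instance, a bare-bones recursive du along those lines (no symlink handling; sizes in bytes):

use strict;
use warnings;

sub du {
    my ($dir) = @_;
    my $total = 0;
    opendir my $dh, $dir or die "opendir $dir: $!";
    for my $entry (readdir $dh) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        if    (-d $path) { $total += du($path) }   # recurse into subdirectories
        elsif (-f $path) { $total += -s $path }    # add regular file sizes
    }
    closedir $dh;
    return $total;
}

print du($ARGV[0] // '.'), " bytes\n";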
There is a nice module called File::Monitor. It will detect new files, deleted files, changes in size and any other attribute that can be checked with stat. It will then report the changed files to you.
http://metacpan.org/pod/File::Monitor
You set up a baseline scan, then set up a callback for each event you are looking for, so you can see new changes, for example:
$monitor->watch( {
    name     => 'somedir',
    recurse  => 1,
    callback => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            # Do stuff
        },
    },
} );
If you need to go deeper than one level, recurse will cover whatever depth you need. Once this is running and it finds new files, you can trigger your application to do whatever you want with them.