How does a operating system create a unique file handle? - operating-system

While creating a file, you provide the name of a file to operating system and
it creates file and returns you the handle.
I am curious to know how it is created.
Does the operating system do some kind of hashing on file name to create a unique file handle
or does it increase count of created files and return it as a file handle?

No, it's an index into an array inside the OS kernel, unique to that one process. File handles are usually just small integers.

File handles are only unique within one process at a given time. On Linux I think that it is a simple counter that is incremented. I think that Windows returns the memory address to an information block about the file (the information block's structure is internal to the operating system, so it's not possible to deal with it directly).

The file handle (file descriptor) is just a number which is unique for that particular process. For instance, standard input, output and error has fds (0, 1, 2). In linux you can check the process' file descriptor in /proc/PID/fd where PID is process id.
Here is an example:
$ pidof executable
4711
$ ls -l /proc/4711/fd
...
$ # Use 'lsof' to show file descriptor opened by this process:
$ lsof -p 4711
...

Related

Core dump: how to determine version of crashed application

I need to strictly bind every core file generated by system to certain bin version of crashed application. I can specify core-name pattern in sysctl.conf:kernel.core_pattern, but there is no way to put bin version here.
How can I put the version of crashed program into core file (revision number) or any other way to determine version of crashed bin?
I'm using qmake VERSION variable in .pro file, which contains revision number from SVN. Its available by QCoreApplication::applicationVersion(), in my every bin by flag --version.
Assuming your app can get far enough to print out its version number without a core dump, you can write a small program (python would probably be easiest) that is invoked by a core dump. The program would read stdin, dump it to a file, then rename the file based on the version number.
From man 5 core:
Piping core dumps to a program
Since kernel 2.6.19, Linux supports an alternate syntax for the
/proc/sys/kernel/core_pattern file. If the first character of this
file is a pipe symbol (|), then the remainder of the line is inter‐
preted as a program to be executed. Instead of being written to a disk
file, the core dump is given as standard input to the program. Note
the following points:
* The program must be specified using an absolute pathname (or a path‐
name relative to the root directory, /), and must immediately follow
the '|' character.
* The process created to run the program runs as user and group root.
* Command-line arguments can be supplied to the program (since Linux
2.6.24), delimited by white space (up to a total line length of 128
bytes).
* The command-line arguments can include any of the % specifiers
listed above. For example, to pass the PID of the process that is
being dumped, specify %p in an argument.
If you call your script /usr/local/bin/dumper, then
echo "| /usr/local/bin/dumper %E" > /proc/sys/kernel/core_pattern
The dumper should copy stdin to a file, then try to run the program named on its command line to extract a version number and use that to rename the file.
Something like this might work (I haven't tried it, so use at extreme risk:)
#!/usr/bin/python
import sys,os,subprocess
from subprocess import check_output
CORE_FNAME="/tmp/core"
with open(CORE_FNAME,"f") as f:
while buf=sys.stdin.read(10000):
f.write(buf)
pname=sys.argv[1].replace('!','/')
out=subprocess.check_output([pname, "--version"])
version=out.split('\n')[0].split()[-1]
os.rename(CORE_FNAME, CORE_FNAME+version)
The really big risk of doing this is recursive core dumps that may crash your system. Be sure to use ulimit to only allow core dumps from processes that can print out their own versions without core dumping.
It would be a good idea to change the script to re-run the program to get the version info only if it is the program you are expecting.

Limit to number of files to cp in parallel

Im running the gsutil cp command in parallel (with the -m option) on a directory with 25 4gb json files (that i am also compressing with the -z option).
gsutil -m cp -z json -R dir_with_4g_chunks gs://my_bucket/
When I run it, it will print out to terminal that it is copying all but one of the files. By this I mean that it prints one of these lines per file:
Copying file://dir_with_4g_chunks/a_4g_chunk [Content-Type=application/octet-stream]...
Once the transfer for one of them is complete, it says that it'll be copying the last file.
The result of this is that there is one file that only starts to copy only when one of the others finishes copying, significantly slowing down the process
Is there a limit to the number of files I can upload with the -m option? Is this configurable in the boto config file?
I was not able to find the .boto file on my Mac (as per jterrace's answer above), instead I specified these values using the -o switch:
gsutil -m -o "Boto:parallel_thread_count=4" cp directory1/* gs://my-bucket/
This seemed to control the rate of transfer.
From the description of the -m option:
gsutil performs the specified operation using a combination of
multi-threading and multi-processing, using a number of threads and
processors determined by the parallel_thread_count and
parallel_process_count values set in the boto configuration file. You
might want to experiment with these value, as the best value can vary
based on a number of factors, including network speed, number of CPUs,
and available memory.
If you take a look at your .boto file, you should see this generated comment:
# 'parallel_process_count' and 'parallel_thread_count' specify the number
# of OS processes and Python threads, respectively, to use when executing
# operations in parallel. The default settings should work well as configured,
# however, to enhance performance for transfers involving large numbers of
# files, you may experiment with hand tuning these values to optimize
# performance for your particular system configuration.
# MacOS and Windows users should see
# https://github.com/GoogleCloudPlatform/gsutil/issues/77 before attempting
# to experiment with these values.
#parallel_process_count = 12
#parallel_thread_count = 10
I'm guessing that you're on Windows or Mac, because the default values for non-Linux machines is 24 threads and 1 process. This would result in copying 24 of your files first, then the last 1 file afterward. Try experimenting with increasing these values to transfer all 25 files at once.

perl - disk name on Linux

What module would you recommend to get a disk name on Linux? I've done some search on CPAN but all modules I've found are too old. In Bash I can use something like:
disk_name=$(df |grep -w '/'|awk '{print $1}'|cut -d/ -f3)
echo $disk_name
sda6
Please help me to understand how to do same in Perl.
Thanks.
The "proper" way to list mounted disks on Linux is through the getmntent() system call, which can be accessed from Perl using the Quota module:
use Quota;
Quota::setmntent();
while (my ($dev, $path, $type, $opts) = Quota::getmntent()) {
print "The root device is $dev.\n" if $path eq "/";
}
Quota::endmntent();
As a bonus, using the Quota module to list device mount points should be fairly portable to other Unixish systems, which parsing various system files or the output of df may not be. Unfortunately, this seemingly basic module is not included in the standard Perl distribution, so you have to get it from CPAN (or from your distro's package repository — for example, Debian / Ubuntu have the libquota-perl package).
Ps. Simply splitting the device name on / and taking the third element (as your cut command does) is not a safe way to turn, say, /dev/sdb1 into sdb1. Some issues with it are that:
Not all block devices have to live under /dev — it's really just a convention.
Even if the device file is under /dev, it might be in a subdirectory of it. For example, my root filesystem is on the device /dev/disk/by-uuid/627f8512-f037-4c6c-9892-6130090c0e0f.
Sometimes, the device name might not even be an actual filesystem path: for example, virtual or in-memory filesystems such as tmpfs are often mounted with the device name none, but it's possible to use any device name with them.
If you do want to get rid of the /dev/ part, I'd suggest a conservative approach using a regexp, for example like this:
if ($dev =~ m(^/dev/(.*)$)s) {
print "The directory $path is mounted from device $1 under /dev.\n";
} else {
print "The directory $path is not mounted from a device under /dev.\n"
}
What you're describing is not the disk name but the device name of the block device representing the partition mounted at root (/). On a regular computer it would normally be something like /dev/sdXN or /dev/hdXN with X being the disk number (primary hard drive is usually A, secondary is B, etc.) and N is the partition number on that device.
Provided you're always running on a unix system, you can try reading /etc/mtab file, which lists all mounted partitions, or the special file /proc/mounts, which pretty much does the same. You'll need to parse it afterwards to find the one you need and get the device name from it.
Alternatively, you can just run df as a process and get its input into perl, something like
open(DF, "df|");
#mount_points = <DF>;
close(DF);
and then iterate over the data to find what you need. I'm not aware of any modules of the top of my head that would do the job for you, but the code seems pretty simple to me anyway.
P.S. Note that Max OS X, while being a derivative of BSD, doesn't have the same file structure and therefore this approach wouldn't work. On Mac OS X, you can read file /etc/fstab.hd, which contains similar info but in a slightly different format.
One way to do just what you are doing in the question
df / | perl -ne 'm"^/\w+/(\w+)";print "$1\n" if defined $1;'
but using a CPAN library to do it is probably better.

Why does grep hang when run against the / directory?

My question is in two parts :
1) Why does grep hang when I grep all files under "/" ?
for example :
grep -r 'h' ./
(note : right before the hang/crash, I note that I see some "no such device or address" messages , regarding sockets....
Of course, I know that grep shouldn't run against a socket, but I would think that since sockets are just files in Unix, it should return a negative result, rather than crashing.
2) Now, my follow up question : In any case -- how can I grep the whole filesystem? Are there certain *NIX directories which we should leave out when doing this ? In particular, I'm looking for all recently written log files.
As #ninjalj said, if you don't use -D skip, grep will try to read all your device files, socket files, and FIFO files. In particular, on a Linux system (and many Unix systems), it will try to read /dev/zero, which appears to be infinitely long.
You'll be waiting for a while.
If you're looking for a system log, starting from /var/log is probably the best approach.
If you're looking for something that really could be anywhere in your file system, you can do something like this:
find / -xdev -type f -print0 | xargs -0 grep -H pattern
The -xdev argument to find tells it to stay within a single filesystem; this will avoid /proc and /dev (as well as any mounted filesystems). -type f limits the search to ordinary files. -print0 prints the file names separated by null characters rather than newlines; this avoid problems with files having spaces or other funny characters in their names.
xargs reads a list of file names (or anything else) on its standard input and invokes the specified command on everything in the list. The -0 option works with find's -print0.
The -H option to grep tells it to prefix each match with the file name. By default, grep does this only if there are two or more file names on its command line. Since xargs splits its arguments into batches, it's possible that the last batch will have just one file, which would give you inconsistent results.
Consider using find ... -name '*.log' to limit the search to files with names ending in .log (assuming your log files have such names), and/or using grep -I ... to skip binary files.
Note that all this depends on GNU-specific features. Some of these options might not be available on MacOS (which is based on BSD) or on other Unix systems. Consult your local documentation, and consider installing GNU findutils (for find and xargs) and/or GNU grep.
Before trying any of this, use df to see just how big your root filesystem is. Mine is currently 268 gigabytes; searching all of it would probably take several hours. A few minutes spent (a) restricting the files you search and (b) making sure the command is correct will be well worth the time you spend.
By default, grep tries to read every file. Use -D skip to skip device files, socket files and FIFO files.
If you keep seeing error messages, then grep is not hanging. Keep iotop open in a second window to see how hard your system is working to pull all the contents off its storage media into main memory, piece by piece. This operation should be slow, or you have a very barebones system.
Now, my follow up question : In any case -- how can I grep the whole filesystem? Are there certain *NIX directories which we should leave out when doing this ? In particular, Im looking for all recently written log files.
Grepping the whole FS is very rarely a good idea. Try grepping the directory where the log files should have been written; likely /var/log. Even better, if you know anything about the names of the files you're looking for (say, they have the extension .log), then do a find or locate and grep the files reported by those programs.

How can I extract DLL file from memory dump?

I have a memory dump (unmanaged process) .
How can I extract (using windbg) one of the dlls loaded into the process ? I mean actually saving the dll file into the disk
You can use the sos.dll inside windbg directory.
First, load the sos.dll in windbg:
.load clr10\sos.dll
Then use !sam OR !SaveAllModule to extract the modules on specific disk location:
!sam c:\notepad
To extract a DLL without using SOS, use the .writemem extension as follows:
discover the module start and end addresses using lmvm dllname
example output for ieframe:
start end module name
61370000 61fb8000 ieframe
calculate the length = end-start: ? 61fb8000 - 61370000
output: Evaluate expression: 12877823 = 00c48000
then save the DLL as follows:
.writemem C:\tmp\mydll.dll 61370000 L?00c48000
This is unlikely to give you the exact DLL as it was loaded from disk, fixing this up is non-trivial.
(Partly based on this article)
Yes, it's true. calc.exe will also pull up its multi user language interface information and attach it in memory, as will a lot of Windows programs like mspaint, photoviewer, etc.