POSIX sh: Best solution to create a unique temporary directory

Currently, the only POSIX-compliant way I know of to create a unique temporary directory is to create a unique file with the mkstemp macro provided by m4 and then replace that file with a directory:
tmpdir="$(printf "mkstemp(tmp.)" | m4)"
unlink "$tmpdir"
mkdir "$tmpdir"
This seems rather hacky though, and I also don't know how safe/secure it is.
Is there a better/more direct POSIX-compliant way to create a unique temporary directory in a shell script, or is this as good as it gets?
The mktemp command is out of the question because it is not defined in POSIX.

I'd expect using unlink/mkdir to be statistically safe as the window of opportunity for another process to create the directory is likely to be small. But a simple fix is just to retry on failure:
while
    tmpdir="$(printf "mkstemp(tmp.)" | m4)"
    unlink "$tmpdir"
    ! mkdir "$tmpdir"
do : ; done
Similarly, we could simply attempt to create a directory directly without creating a file first. Directory creation is atomic so there is no race condition. We do have to pick a name that doesn't exist but, as above, if we fail we can just try again.
For example, using a simple random number generator:
mkdtemp.sh
#!/bin/sh
# initial entropy, the more we can get the better
random=$(( $(date +%s) + $$ ))
while
    # C standard rand(), without truncation
    # cf. https://en.wikipedia.org/wiki/Linear_congruential_generator
    random=$(( (1103515245*random + 12345) % 2147483648 ))
    # optionally, shorten name a bit
    tmpdir=$( printf "tmp.%x" $random )
    # loop until new directory is created
    ! mkdir "$tmpdir" 2>&-
do : ; done
printf %s "$tmpdir"
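Called from another script, it might be used like this (assuming the file above is saved as mkdtemp.sh and is executable; the cleanup trap is an optional extra):
tmpdir=$(./mkdtemp.sh) || exit 1
# remove the directory again when the calling script exits
trap 'rm -rf "$tmpdir"' EXIT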
Notes:
%s (seconds since epoch) is not a POSIX-standard format option to date; you could use something like %S%M%H%j instead (see the sketch after these notes)
POSIX says "Only signed long integer arithmetic is required" which I believe means at least 2^31


What order does find(1) list files in?

On extfs, if there are only file creations and no deletions in a directory, I would expect find . -type f to list the files either in chronological order of creation (or mtime) or, if not, at least in reverse chronological order, depending on how a directory's contents are traversed.
But that isn't the behavior I'm seeing.
The following code, for example, creates a fresh set of directories and files:
#!/bin/bash -u
for i in a/ a/{1,2,3,4,5} b/ b/{1,2,3,4,5}; do
    if echo "$i" | egrep -q "/$"; then
        echo "Creating dir $i"
        mkdir -p "$i"
    else
        echo "Creating file $i"
        touch "$i"
    fi
    sleep 0.500
done
Output of the above snippet:
Creating dir a/
Creating file a/1
Creating file a/2
Creating file a/3
Creating file a/4
Creating file a/5
Creating dir b/
Creating file b/1
Creating file b/2
Creating file b/3
Creating file b/4
Creating file b/5
However, find lists files in a somewhat random order. For example, a/2 doesn't follow a/1, and b/2 doesn't follow b/1:
$ find . -type f
./a/1
./a/3
./a/4
./a/2 <----
./a/5
./b/1
./b/3
./b/4
./b/2 <----
./b/5
Any idea why this should happen?
My main problem is: I have a very large volume storing 100s of 1000s of files. I need to traverse these files and directories in the order of their creation/modification (mtime) and pipe each file to another process for further processing. But I don't necessarily want to first create a temporary list of this large set of files and then sort it based on mtime before piping it to my process.
find lists objects in the order that they are reported by the underlying filesystem implementation. You can tell ls to show you this "raw" order by passing it the -f option.
The order could be anything at all -- alphabetical, by mtime, by atime, by length of name, by permissions, or something completely different. The ordering can even vary from one listing to the next.
It's common for filesystems to report in an order that reflects the filesystem's strategy for allocating directory slots to files. If this is some sort of hash-based strategy based on filename then the order can appear nonsensical. This is what happens with widely-used Linux and BSD filesystem implementations. Since you mention extfs this is probably what causes the ordering you're seeing.
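As an illustration, here is the default sorted listing next to the raw order for the a/ directory created above (the output is hypothetical, chosen to match the find listing above):
$ ls a          # sorted by name, ls's default
1  2  3  4  5
$ ls -f a       # -f disables sorting: raw order as reported by the filesystem
.  ..  1  3  4  2  5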
So, if you need the output from find to be ordered in a particular way then you'll have to create that order yourself. Maybe based on something like:
find . -type f -exec ls -ltr --time-style=+%s {} \; | sort -n -k6
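If GNU find is available (likely, given extfs), a lighter-weight sketch avoids running ls once per file; note that -printf and %T@ are GNU extensions, so this is not portable:
# print "mtime-in-seconds path" per file, sort numerically on the timestamp,
# then drop the timestamp column, leaving the paths in mtime order
find . -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-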

gsutil cp: concurrent execution leads to local file corruption

I have a Perl script which calls 'gsutil cp' to copy a selected file from GCS to a local folder:
$cmd = "[bin-path]/gsutil cp -n gs://[gcs-file-path] [local-folder]";
$output = `$cmd 2>&1`;
The script is called via HTTP and hence can be initiated multiple times (e.g. by double-clicking on a link). When this happens, the local file can end up being exactly double the correct size, and hence obviously corrupt. Three things appear odd:
1) gsutil seems not to be locking the local file while it is writing to it, allowing another thread (in this case another instance of gsutil) to write to the same file.
2) The '-n' seems to have no effect. I would have expected it to prevent the second instance of gsutil from attempting the copy action.
3) The MD5 signature check is failing: normally gsutil deletes the target file if there is a signature mismatch, but this is clearly not always happening.
The files in question are larger than 2MB (typically around 5MB) so there may be some interaction with the automated resume feature. The Perl script only calls gsutil if the local file does not already exist, but this doesn't catch a double-click (because of the time lag for the GCS transfer authentication).
gsutil version: 3.42 on FreeBSD 8.2
Anyone experiencing a similar problem? Anyone with any insights?
1) You're right, I don't see a lock in the source.
2) This can be caused by a race condition - Process 1 checks and sees the file is not there. Process 2 checks and sees the file is not there. Process 1 begins its copy. Process 2 begins its copy. The docs say the -n check is a HEAD operation before the actual transfer -- that's not atomic with the transfer itself.
3) No input on this.
You can fix the issue by having your script maintain an atomic lock of some sort on the file prior to initiating the transfer - i.e. your check would be something along the lines of:
use Lock::File qw(lockfile);

if (my $lock = lockfile("$localfile.lock", { blocking => 0 })) {
    ... perform transfer ...
    undef $lock;
}
else {
    die "Unable to retrieve $localfile, file is locked";
}
1) gsutil doesn't currently do file locking.
2) -n does not protect against other instances of gsutil run concurrently with an overlapping destination.
3) Hash digests are calculated on the bytes as they are being downloaded as a performance optimization. This avoids a long-running computation once the download completes. If the hash validation succeeds, you're guaranteed that the bytes were written successfully at one point. But if something (even another instance of gsutil) modifies the contents in-place while the process is running, the digesters will not detect this.
Thanks to Oesor and Travis for answering all points between them. As an addendum to Oesor's suggested solution, I offer this alternative for systems lacking Lock::File:
use Fcntl ':flock';    # import LOCK_* constants

# if lock file exists ...
if (-e $lockFile)
{
    # abort if lock file still locked (or sleep and re-check)
    abort() if !unlink($lockFile);
    # otherwise delete local file and download again
    unlink($filePath);
}

# if file has not been downloaded already ...
if (!-e $filePath)
{
    $cmd = "[bin-path]/gsutil cp -n gs://[gcs-file-path] [local-dir]";
    abort() if !open(LOCKFILE, ">$lockFile");
    flock(LOCKFILE, LOCK_EX);
    my $output = `$cmd 2>&1`;
    flock(LOCKFILE, LOCK_UN);
    unlink($lockFile);
}
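If the locking is easier to handle outside Perl, the same check-then-act race can also be closed in a small shell wrapper around the gsutil call, using mkdir as the atomic test-and-set. This is only a sketch; the bracketed paths are placeholders exactly as in the snippets above:
#!/bin/sh
# mkdir either creates the lock directory or fails, atomically, so exactly
# one concurrent invocation wins; the others exit without copying.
lockdir="[local-folder]/.gsutil-cp.lock"
if mkdir "$lockdir" 2>/dev/null; then
    trap 'rmdir "$lockdir"' EXIT
    [bin-path]/gsutil cp -n "gs://[gcs-file-path]" "[local-folder]"
else
    echo "copy already in progress, skipping" >&2
    exit 1
fi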

Utility/Tool to get hash value of a data block in ext3

I have been searching for a utility/tool that can provide the md5sum (or any other checksum) of a data block inside the ext3 inode structure.
The requirement is to verify whether certain data blocks get zeroed, after a particular operation.
I am new to file systems and do not know if any existing tool can do the job, or I need to write this test utility myself.
Thanks...
A colleague provided a very elegant solution. Here is the script.
It takes the name of the file as a parameter and assumes the filesystem block size is 4K.
A further extension of this idea:
If you know the data blocks associated with the file (stat ), you can use the 'skip' option of 'dd' to build small files, each one block long, and then take the md5sum of those blocks. This way you can get the md5sum directly from the block device. Not something you would want to do every day, but a nice analytical trick; a sketch of this variant follows the script below.
==================================================================================
#!/bin/bash
absname=$1
testdir="/root/test/"
mdfile="md5"
statfile="stat"
blksize=4096

fname=$(basename "$absname")
# file size in bytes (more robust than parsing ls output)
fsize=$(wc -c < "$absname")
numblk=$(( fsize / blksize ))
x=1

# Create the test directory, if it does not exist already
if [[ ! -d $testdir ]]; then
    mkdir -p "$testdir"
fi

# Create multiple files from the test file, each 1 block sized
while [[ $x -le $numblk ]]; do
    (( s = x - 1 ))
    dd if="$absname" of="$testdir$fname$x" bs=$blksize count=1 skip=$s
    md5sum "$testdir$fname$x" >> "$testdir$mdfile"
    (( x = x + 1 ))
done
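For the block-device variant described above, the idea would look roughly like this. The device name and block number here are hypothetical and would come from something like the block list printed by debugfs's stat command for the file:
#!/bin/sh
# Hypothetical values: substitute the real block device and the physical
# block number reported for the file.
dev=/dev/sda1
blk=123456
blksize=4096

# Read exactly one filesystem block straight off the device and hash it.
dd if="$dev" bs="$blksize" skip="$blk" count=1 2>/dev/null | md5sum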

Why isn't this command taking the diff of two directories?

I am asked to diff two directories using Perl but I think something is wrong with my command,
$diff = system("sudo diff -r '/Volumes/$vol1' '/Volumes/$vol2\\ 1/' >> $diff.txt");
It doesn't display any output. Can someone help me with this? Thanks!
It seems that you want to store all differences in a string.
If this is the case, the command in the question is not going to work for a few reasons:
It's hard to tell whether it's intended or not, but the $diff variable is being used to set the filename storing the differences. Perhaps this should be diff.txt, not $diff.txt
The result of the diff command is saved in $diff.txt, so nothing is displayed on STDOUT. This can be remedied by omitting the >> $diff.txt part. If the result also needs to be stored in a file, consider the tee command:
sudo diff -r dir1/ dir2/ | tee diff.txt
When the return value of a system call is assigned to a variable, it will be 0 upon success. To quote the documentation:
The return value is the exit status of the program as returned by the wait call.
This means that $diff won't store the differences, but the command's exit status. A more sensible approach would be to use backticks, which lets $diff store whatever the command writes to STDOUT:
my $diff = `sudo diff -r dir1/ dir2/ | tee diff.txt`; # Not $diff.txt
Is it a must to use the sudo command? Avoid using it if even remotely possible:
my $diff = `diff -r dir1/ dir2/ | tee diff.txt`; # Not $diff.txt
A final recommendation
Let a good CPAN module take care of this task, as backtick calls can only go so far. Some have already been suggested here; they may be well worth a look.
Is sudo diff being prompted for a password?
If possible, take out the sudo from the invocation of diff, and run your script with sudo.
"It doesn't display and output." -- this is becuase you are saving the differences to a file, and then (presumably) not doing anything with that resulting file.
However, I expect "diff two directories using Perl" does not mean "use system() to do it in the shell and then capture the results". Have you considered doing this in the language itself? For example, see Text::Diff. For more nuanced control over what constitutes a "difference", you can simply read in each file and craft your own algorithm to perform the comparisons and compile the similarities and differences.
You might want to check out Test::Differences for a more flexible diff implementation.

How can I remove a file based on its creation date time in Perl?

My webapp is hosted on a unix server using MySQL as database.
I wrote a Perl script to run a backup of my database. The Perl script is inside the cgi-bin folder and it is working. I only need to set up the cronjob to run the Perl script once a day.
The backups are stored in a folder named db_backups. However, I also want to add a command to my Perl script to remove any files inside the db_backups folder that are older than, say, 10 days.
I have searched high and low for unix commands and cannot find anything that matches what I needed.
if (-M $file > 10) { unlink $file }
or, coupled with File::Find::Rule
my $ten_days_ago = time() - 10 * 86400;
my @to_delete = File::Find::Rule->file()
                                ->mtime("<=$ten_days_ago")
                                ->in("/path/to/db_backup");
unlink @to_delete;
On Unix you can't, because the file's creation date is not stored in the filesystem.
You may want to check out stat, and -M (modification time)/-C (inode change time)/-A (access time) if you want a simple expression with relative timestamps (how long ago).
I have searched high and low for unix commands
and cannot find anything that matches what I needed.
Check out find(1) and xargs(1). Warning: these commands may change your life at the shell prompt.
$ find /path/to/backup -type f -mtime +10 -print0 | xargs -0 echo rm -f
When you're confident that will Do What You Want (tm), remove the echo. It says, roughly, starting in /path/to/backup, descend looking for plain files whose mtime is greater than 10 days, and print their names to xargs, which will pass those names to rm in batches.
(-print0 and its complement -0 are GNU extensions -- you mentioned you were on Linux -- which let you deal with whitespace in filenames safely.)
You should be able to do it without resorting to Unix commands. Loop through the files in your directory, use stat on each file to get its last modification time, then use unlink on the file to delete it if it's older than what you want.