Utility/tool to get the hash value of a data block in ext3

I have been searching for a utility/tool that can provide the md5sum (or any unique checksum) of a data block referenced by an ext3 inode.
The requirement is to verify whether certain data blocks get zeroed after a particular operation.
I am new to file systems and do not know whether an existing tool can do the job or whether I need to write this test utility myself.
Thanks...

A colleague provided a very elegant solution. Here is the script.
It takes the name of the file as a parameter and assumes a file-system block size of 4 KiB.
A further extension of this idea:
If you know the data blocks associated with the file (stat), you can use the 'skip' option of the 'dd' command to build small files, each one block in length, and then take the md5sum of those blocks. This way you can get the md5sums directly from the block device. Not something you would want to do every day, but a nice analytical trick; a sketch of this variant appears after the script below.
==================================================================================
#!/bin/bash
# Split the given file into block-sized pieces and record the md5sum of each block.
absname=$1
testdir="/root/test/"
mdfile="md5"
blksize=4096
fname=$(basename "$absname")
fsize=$(stat -c %s "$absname")
# Round up so a trailing partial block is also checksummed
numblk=$(( (fsize + blksize - 1) / blksize ))
x=1
# Create the test directory, if it does not exist already
if [[ ! -d $testdir ]]; then
    mkdir -p "$testdir"
fi
# Create multiple files from the test file, each one block in size,
# and append each block's md5sum to $testdir$mdfile
while [[ $x -le $numblk ]]; do
    (( s = x - 1 ))
    dd if="$absname" of="$testdir$fname$x" bs="$blksize" count=1 skip="$s"
    md5sum "$testdir$fname$x" >> "$testdir$mdfile"
    (( x = x + 1 ))
done
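For the "directly from the block device" variant mentioned above, here is a rough sketch, assuming the file system lives on /dev/sdXN with a 4096-byte block size and that debugfs (from e2fsprogs) is available; the path given to debugfs is relative to the file system root, not the mount point, and the device and path below are placeholders:
device=/dev/sdXN          # placeholder: block device holding the ext3 file system
blksize=4096
# debugfs prints the block numbers used by the file's inode
for blk in $(debugfs -R "blocks /path/inside/fs" "$device" 2>/dev/null); do
    dd if="$device" bs="$blksize" skip="$blk" count=1 2>/dev/null | md5sum
done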

Related

POSIX sh: Best solution to create a unique temporary directory

Currently, the only POSIX-compliant way of creating a unique directory (that I know of) is to create a unique file using the mkstemp() macro exposed by m4 and then replace this file with a directory:
tmpdir="$(printf "mkstemp(tmp.)" | m4)"
unlink "$tmpdir"
mkdir "$tmpdir"
This seems rather hacky though, and I also don't know how safe/secure it is.
Is there a better/more direct POSIX-compliant way to create a unique temporary directory in a shell script, or is this as good as it gets?
The mktemp command is out of the question because it is not defined in POSIX.
I'd expect using unlink/mkdir to be statistically safe as the window of opportunity for another process to create the directory is likely to be small. But a simple fix is just to retry on failure:
while
    tmpdir="$(printf "mkstemp(tmp.)" | m4)"
    unlink "$tmpdir"
    ! mkdir "$tmpdir"
do : ; done
Similarly, we could simply attempt to create a directory directly without creating a file first. Directory creation is atomic so there is no race condition. We do have to pick a name that doesn't exist but, as above, if we fail we can just try again.
For example, using a simple random number generator:
mkdtemp.sh
#!/bin/sh
# initial entropy, the more we can get the better
random=$(( $(date +%s) + $$ ))
while
    # C standard rand(), without truncation
    # cf. https://en.wikipedia.org/wiki/Linear_congruential_generator
    random=$(( (1103515245*random + 12345) % 2147483648 ))
    # optionally, shorten the name a bit
    tmpdir=$( printf "tmp.%x" "$random" )
    # loop until a new directory is created
    ! mkdir "$tmpdir" 2>&-
do : ; done
printf %s "$tmpdir"
Notes:
%s (seconds since the epoch) is not a POSIX-standard format option to date; you could use something like %S%M%H%j instead
POSIX says "Only signed long integer arithmetic is required", which I believe means at least 2^31
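A possible usage sketch, assuming the script above is saved as mkdtemp.sh alongside the calling script and that the caller wants the directory removed on exit:
tmpdir=$(./mkdtemp.sh) || exit 1
# clean up the temporary directory when the calling script exits
trap 'rm -rf "$tmpdir"' EXIT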

ExifTool - check if all files have the same number of channels

I need your help with ExifTool. I am trying to check whether all of my .wav files have the same number of channels via their metadata. How should I proceed? Should I print out the tags first and then write a script to check whether they are all the same, or is there a better way?
Thank you for your help.
You'd have to do it externally (via a file, your shell's control structures, or something similar).
e.g., in bash:
if [ "$(exiftool -NUMCHANNELS a.wav)" == "$(exiftool -NUMCHANNELS b.wav)" ]
then
echo match
fi
To see whether many files all match, you could do something like
exiftool -q -NUMCHANNELS *.wav | sort -u | wc -l
and verify that the output is 1.
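A slightly more verbose sketch along the same lines, assuming bash and using exiftool's -s3 option (print tag values only) to read the NumChannels tag; it reports any file whose channel count differs from the first file's:
first=""
for f in *.wav; do
    ch=$(exiftool -s3 -NumChannels "$f")
    if [ -z "$first" ]; then
        first=$ch
    elif [ "$ch" != "$first" ]; then
        echo "mismatch: $f has $ch channels (expected $first)"
    fi
done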

What order does find(1) list files in?

On extfs, if there are only file creations and no deletions in a directory, I would expect find . -type f to list the files either in chronological order of creation (or mtime) or, failing that, at least in reverse chronological order, depending on how a directory's contents are traversed.
But that isn't the behavior I'm seeing.
The following code, for example, creates a fresh set of directories and files:
#!/bin/bash -u
for i in a/ a/{1,2,3,4,5} b/ b/{1,2,3,4,5}; do
    if echo "$i" | grep -Eq "/$"; then
        echo "Creating dir $i"
        mkdir -p "$i"
    else
        echo "Creating file $i"
        touch "$i"
    fi
    sleep 0.500
done
Output of the above snippet:
Creating dir a/
Creating file a/1
Creating file a/2
Creating file a/3
Creating file a/4
Creating file a/5
Creating dir b/
Creating file b/1
Creating file b/2
Creating file b/3
Creating file b/4
Creating file b/5
However, find lists the files in a somewhat random order. For example, a/2 doesn't follow a/1, and b/2 doesn't follow b/1:
$ find . -type f
./a/1
./a/3
./a/4
./a/2 <----
./a/5
./b/1
./b/3
./b/4
./b/2 <----
./b/5
Any idea why this happens?
My main problem is this: I have a very large volume storing hundreds of thousands of files. I need to traverse these files and directories in the order of their creation/modification (mtime) and pipe each file to another process for further processing, but I don't want to first build a temporary list of this large set of files and then sort it by mtime before piping it to my process.
find lists objects in the order that they are reported by the underlying filesystem implementation. You can tell ls to show you this "raw" order by passing it the -f option.
The order could be anything at all -- alphabetical, by mtime, by atime, by length of name, by permissions, or something completely different. The ordering can even vary from one listing to the next.
It's common for filesystems to report entries in an order that reflects the filesystem's strategy for allocating directory slots to files. If this is some sort of hash-based strategy keyed on the filename, the order can appear nonsensical. This is what happens with widely used Linux and BSD filesystem implementations. Since you mention extfs, this is probably what causes the ordering you're seeing.
So, if you need the output from find to be ordered in a particular way, you'll have to create that order yourself. Maybe based on something like:
find . -type f -exec ls -ltr --time-style=+%s {} \; | sort -n -k6
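If you have GNU find, a leaner sketch of the same idea feeds the files to your downstream process in mtime order without running ls once per file (this assumes file names contain no newlines; process_one is only a placeholder for your own command):
find . -type f -printf '%T@ %p\n' | sort -n | cut -d' ' -f2- |
while IFS= read -r file; do
    process_one "$file"    # placeholder for the per-file processing step
done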

OpenStack Swift client for the fastest synchronisation of a huge number of files?

I have a folder with a lot of files (~50k, 3 GB). I need to sync this folder recursively to my container in OpenStack Swift-like storage.
I have tried the Cyberduck CLI (duck), but it crashes on the huge file list during the prepare step.
I am trying to use the supload utility, but it is so slow :(
Can somebody recommend a better approach (perhaps a better CLI) for this situation?
You should use the official python-swiftclient package and simply:
# load your openstack credentials
source openrc.sh
cd path_to_directory_you_want_to_sync
# upload all the files recursively keeping good paths
swift upload --changed your_container *
Swift does not support rsync-like synchronization, but I use this little script to delete from the container the files you have deleted locally and to upload new files, without asking swift to compare each file:
#!/bin/bash
# Usage: ./swift_sync.sh your_container directory_to_sync
cd "$2" || exit 1
diff <(find * -type f -print | sort) <(swift list "$1" | sort) | while IFS= read -r x; do
    if [[ $x == \>* ]]; then
        echo "Need to delete ${x:2}"
        swift delete "$1" "${x:2}"
    elif [[ $x == \<* ]]; then
        echo "Need to upload ${x:2}"
        swift upload "$1" "${x:2}"
    fi
done
cd -
Use it with:
./swift_sync.sh your_container directory_to_sync
Try http://rclone.org/docs/
It has a "sync" operation and a "bandwidth limit" option.
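For example, a minimal sketch, assuming an rclone remote named swift-remote has already been configured for your Swift storage (the remote name, local path and bandwidth figure are placeholders):
# sync the local folder into the container, capping bandwidth at 10 MiB/s
rclone sync --bwlimit 10M /path/to/local/folder swift-remote:your_container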
If that does not solve your speed problem, the root cause is not on the client side.

File movement issue on NFS file system on Unix box

Currently there are 4.5 million files in a single directory on an NFS file system. As a result, any read or write operation on that directory causes a huge delay.
In order to overcome this problem, all the files in that directory will be moved into different directories based on their year of creation.
Apparently, the find command that we are using with the -ctime option is not working because of the huge number of files.
We tried listing the files by year of creation and then feeding the list to a script that moves them in a for loop, but even this failed, as ls -lrt simply hung.
Is there any other way to tackle this problem?
Please help.
Script contents:
1) filelist.sh
ls -tlr|awk '{print $8,$9,$6,$7}'|grep ^2011|awk '{print $2,$1,$3,$4}' 1>>inboundstore_$1.txt 2>>Error_$1.log
ls -tlr|awk '{print $8,$9,$6,$7}'|grep ^2011|wc -l 1>>count_$1.log
2) filemove.sh
INPUT_FILE=$1       ## text file which has the list of files from the previous script
FINAL_LOCATION=$2   ## destination directory
if [ -r $INPUT_FILE ]
then
    for file in `cat $INPUT_FILE`
    do
        echo "TIME OF FILE COPY OF [$file] IS : `date`" >> xyz/IBSCopyTime.log
        mv $file $FINAL_LOCATION
    done
else
    echo "$INPUT_FILE does not exist"
fi
Use a readdir iterator: process the directory entries as they are read, rather than building and sorting a complete listing of 4.5 million names first (which is what ls -lrt has to do before it can print anything).
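One way to act on that advice from the shell, as a rough sketch: GNU find reads the directory with readdir() and streams entries without sorting, so each file can be moved into a per-year directory as it is encountered. /huge/dir and /target are placeholders, the year is taken from the file's mtime (%TY; %CY would use ctime, matching your -ctime attempt), and file names are assumed to contain no newlines or leading blanks:
find /huge/dir -maxdepth 1 -type f -printf '%TY %p\n' |
while read -r year file; do
    mkdir -p "/target/$year"
    mv "$file" "/target/$year/"
done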