How to compare the content of a tarball with a folder - diff

How can I compare a tar file (already compressed) of the original folder with the original folder?
First I created archive file using
tar -kzcvf directory_name.zip directory_name
Then I tried to compare using
tar -diff -vf directory_name.zip directory_name
But it didn't work.

--compare (-d) is more handy for that.
tar --compare --file=archive-file.tar
works if you run it from the directory where archive-file.tar was created. To compare the archive against files located elsewhere (e.g. the folder now lives under /some/where/), use the -C parameter:
tar --compare --file=archive-file.tar -C /some/where/
If you want to see tar working, use -v. Without -v, only differences and errors (such as missing files or folders) are reported.
Tip: This works with compressed tar.bz2/tar.gz archives, too.
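A minimal sketch of the whole round trip, assuming GNU tar. The folder name demo and the file names are made up for illustration. After one file is changed, tar should report at least a size difference for it:

```shell
#!/bin/bash
# Sketch: build a small archive, change one file, then compare.
# "demo", "a.txt" and "b.txt" are illustrative names only.
compare_demo() {
    work=$(mktemp -d) || return 1
    mkdir "$work/demo"
    echo "one" > "$work/demo/a.txt"
    echo "two" > "$work/demo/b.txt"
    tar -czf "$work/demo.tar.gz" -C "$work" demo
    echo "changed content" > "$work/demo/a.txt"
    # -C points tar at the directory that contains the folder to compare.
    # tar exits non-zero when it finds differences, hence "|| true".
    LC_ALL=C tar --compare --file="$work/demo.tar.gz" -C "$work" 2>&1 || true
    rm -rf "$work"
}
compare_demo
```

The unchanged b.txt produces no output, because -v is not given.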

It should be --diff
Try this (without the last directory_name):
tar --diff -vf directory_name.zip
The problem is that --diff only looks for differences in files that exist in the tar file. So, if a new file is added to the folder, --diff does not report it.
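One way to cover that gap is to compare the archive listing with a listing of the folder. This is only a sketch: files_not_in_archive is a hypothetical helper name, and it assumes the archive was created from the parent of the folder, so member names look like directory_name/sub/file.

```shell
#!/bin/sh
# files_not_in_archive ARCHIVE FOLDER (hypothetical helper)
# Prints paths that exist under FOLDER but are not members of ARCHIVE.
files_not_in_archive() {
    archive=$1
    folder=$2
    list=$(mktemp) || return 1
    # tar lists directories with a trailing slash; strip it so names match find.
    tar -tf "$archive" | sed 's,/$,,' | LC_ALL=C sort > "$list"
    # comm -13 keeps only the lines unique to the second input (the folder).
    (cd "$(dirname "$folder")" && find "$(basename "$folder")") |
        LC_ALL=C sort | LC_ALL=C comm -13 "$list" -
    rm -f "$list"
}
```

Called as files_not_in_archive directory_name.zip directory_name, it prints one path per line for every file added since the archive was made.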

The method of pix is very slow for large compressed tar files, because it extracts each file individually. I use the tar --diff method, looking for files with a different modification time, and extract and diff only these. The files are extracted into a folder base.orig, where base is either the top-level folder of the tar file or the given comparison folder. This results in diffs that include the date of the original file.
Here is the script:
#!/bin/bash
set -o nounset
# Print usage
if [ "$#" -lt 1 ] ; then
echo 'Diff a tar (or compressed tar) file with a folder'
echo 'difftar-folder.sh <tarfile> [<folder>] [strip]'
echo default for folder is .
echo default for strip is 0.
echo 'strip must be 0 or 1.'
exit 1
fi
# Parse parameters
tarfile=$1
if [ "$#" -ge 2 ] ; then
folder=$2
else
folder=.
fi
if [ "$#" -ge 3 ] ; then
strip=$3
else
strip=0
fi
# Get path prefix if --strip is used
if [ "$strip" -gt 0 ] ; then
prefix=$(tar -t -f "$tarfile" | head -1)
else
prefix=
fi
# Original folder
if [ "$strip" -gt 0 ] ; then
orig=${prefix%/}.orig
elif [ "$folder" = "." ] ; then
orig=${tarfile##*/}
orig=./${orig%%.tar*}.orig
elif [ "$folder" = "" ] ; then
orig=${tarfile##*/}
orig=${orig%%.tar*}.orig
else
orig=$folder.orig
fi
echo $orig
mkdir -p "$orig"
# Make sure tar uses english output (for Mod time differs)
export LC_ALL=C
# Search all files with a deviating modification time using tar --diff
tar --diff -a -f "$tarfile" --strip $strip --directory "$folder" | grep "Mod time differs" | while read -r file ; do
# Substitute ': Mod time differs' with nothing
file=${file/: Mod time differs/}
# Check if file exists
if [ -f "$folder/$file" ] ; then
# Extract original file
tar -x -a -f "$tarfile" --strip $strip --directory "$orig" "$prefix$file"
# Compute diff
diff -u "$orig/$file" "$folder/$file"
fi
done

To ignore differences in some or all of the metadata (user, time, permissions), you can pipe the result to awk:
tar --compare --file=archive-file.tar -C /some/where/ | awk '!/Mode/ && !/Uid/ && !/Gid/ && !/time/'
That should output only the true differences between the tar and the directory /some/where/
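Note that broad patterns such as /Mode/ or /time/ also drop lines for files whose names happen to contain those strings (for example Model/time.txt). A slightly stricter sketch, assuming GNU tar's English messages "Mode differs", "Uid differs", "Gid differs" and "Mod time differs"; filter_metadata is a made-up name:

```shell
#!/bin/sh
# filter_metadata (hypothetical name): drop tar's metadata-only messages,
# matching the message at the end of the line rather than anywhere in it.
filter_metadata() {
    awk '!/: (Mode|Uid|Gid|Mod time) differs$/'
}
# Typical use:
#   LC_ALL=C tar --compare --file=archive-file.tar -C /some/where/ 2>&1 | filter_metadata
```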

I recently needed a better compare than what "tar --diff" produced so I made this short script:
#!/bin/bash
tar tf "$1" | while IFS= read -r ; do
if [ "${REPLY%/}" = "$REPLY" ] ; then
tar xOf "$1" "$REPLY" | diff -u - "$REPLY"
fi
done

The easy way is to write:
tar df file
This compares the archive with the current working directory and reports any files that have changed or been removed.
tar df file -C path/folder
This compares the archive with the given folder.

How to remove some text in long filename from bunch of files in directory

Can't boot my Windows PC today and I am on 2nd OS Linux Mint. With my limited knowledge on Linux and shell scripts, I really don't have an idea how to do this.
I have a bunch of files in a directory generated from my system, need to remove the last 12 characters from the left of ".txt"
Sample filenames:
filename1--2c4wRK77Wk.txt
filename2-2ZUX3j6WLiQ.txt
filename3-8MJT42wEGqQ.txt
filename4-sQ5Q1-l3ozU.txt
filename5--Way7CDEyAI.txt
Desired result:
filename1.txt
filename2.txt
filename3.txt
filename4.txt
filename5.txt
Any help would be greatly appreciated.
Here is a programmatic way of doing this while still trying to account for pesky edge cases:
#!/bin/sh
set -e
find . -name "filename*" > /tmp/filenames.list
while read -r FILENAME; do
NEW_FILENAME="$(
echo "$FILENAME" | \
awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}' | \
awk -F '/' '{print $NF}' | \
awk -F '-' '{print $1}'
)"
EXTENSION="$(echo "$FILENAME" | awk -F '.' '{print $NF}')"
if [[ "$EXTENSION" == "backup" ]]; then
continue
else
cp "$FILENAME" "${FILENAME}.backup"
fi
if [[ -z "$EXTENSION" ]]; then
mv "$FILENAME" "$NEW_FILENAME"
else
mv "$FILENAME" "${NEW_FILENAME}.${EXTENSION}"
fi
done < /tmp/filenames.list
Create a List of Files to Edit
First up create a list of files that you would like to edit (assuming that they all start with filename) and under the current working directory (.):
find . -name "filename*" > /tmp/filenames.list
If they don't start with filename fret not you could always use a find command like:
find . -type f > /tmp/filenames.list
Iterate over a list of files
To accomplish this we use a while read loop:
while read -r LINE; do
# perform action
done < file
If you are able to use bash, you could instead feed the loop with process substitution:
while read -r LINE; do
# perform action
done < <(
find . -type f
)
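The practical difference between piping into the loop and the < <( ... ) form (bash calls it process substitution) is where the loop body runs. A small bash-only sketch, with made-up variable names, showing that variables set inside a piped loop are lost while those set in the redirected loop survive:

```shell
#!/bin/bash
# With a pipe, bash runs the loop body in a subshell, so "piped" stays 0.
piped=0
printf 'a\nb\nc\n' | while read -r LINE; do piped=$((piped + 1)); done

# With process substitution, the loop runs in the current shell.
direct=0
while read -r LINE; do direct=$((direct + 1)); done < <(printf 'a\nb\nc\n')

echo "piped=$piped direct=$direct"   # prints "piped=0 direct=3" in bash
```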
Create a rename variable
Next, we create a variable NEW_FILENAME and using awk we strip off the file extension and any trailing spaces using:
awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}'
We could just use the following though if you know for certain that there aren't multiple periods in the filename:
awk -F '.' '{print $1}'
The leading ./ is stripped off via
awk -F '/' '{print $NF}'
although this could have been easily done via basename
With the following command, we strip everything after the first -:
awk -F '-' '{print $1}'
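To see the three stages working together on one of the sample names, here is a small sketch. derive_name is a hypothetical helper that simply chains the awk commands described above:

```shell
#!/bin/sh
# derive_name PATH (hypothetical helper): chain the three awk stages.
derive_name() {
    echo "$1" |
        awk -F '.' '{$NF=""; gsub(/ /, "", $0); print}' |  # drop the extension
        awk -F '/' '{print $NF}' |                          # drop the leading ./
        awk -F '-' '{print $1}'                             # keep text before the first -
}
derive_name './filename1--2c4wRK77Wk.txt'   # prints "filename1"
```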
Creating backups
Feel free to remove this if you deem unnecessary:
if [[ "$EXTENSION" == "backup" ]]; then
continue
else
cp "$FILENAME" "${FILENAME}.backup"
fi
One thing that we definitely don't want is to make backups of backups. The above logic accounts for this.
Renaming the files
One thing that we don't want to do is append a period to a filename that doesn't have an extension. This accounts for that.
if [[ -z "$EXTENSION" ]]; then
mv "$FILENAME" "$NEW_FILENAME"
else
mv "$FILENAME" "${NEW_FILENAME}.${EXTENSION}"
fi
Other things of note
Odds are that your Linux Mint installation has a bash shell, so you could simplify some of these commands. For instance, you could use parameter expansion: echo "$FILENAME" | awk -F '.' '{print $NF}' would become "${FILENAME##*.}"
[[ is not defined in POSIX sh so you will likely just need to replace [[ with [, but review this document first:
https://mywiki.wooledge.org/BashFAQ/031
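Taking the parameter expansion idea further, the whole rename can be done without awk. This is a sketch rather than a drop-in replacement: strip_id and rename_in_dir are made-up names, and it assumes every file to rename ends in a hyphen, 11 more characters and .txt, as in the samples.

```shell
#!/bin/sh
# strip_id PATH (hypothetical helper): remove the 12 characters before ".txt".
strip_id() {
    base=${1%.txt}                            # drop the extension
    printf '%s.txt\n' "${base%????????????}"  # drop the last 12 characters
}

# rename_in_dir DIR (hypothetical helper): rename every matching file in DIR.
rename_in_dir() {
    for f in "$1"/*-???????????.txt; do
        [ -e "$f" ] || continue               # the glob matched nothing
        mv -- "$f" "$(strip_id "$f")"
    done
}
```

Running rename_in_dir . in the directory with the sample files should leave filename1.txt through filename5.txt. No backups are made here, so try it on a copy first.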
From the pattern of the filenames, it looks like the part before the first "-" is the name you want to keep. After changing to the directory where the files are located, use the following command to rename them:
for srcFile in `ls -1`; do fileN=`echo $srcFile | cut -d"-" -f1`; targetFile="$fileN.txt"; mv $srcFile $targetFile; done
If that observation is wrong, the following command removes exactly 12 characters before .txt (rev reverses the name, cut -c17- drops the 4 characters of .txt plus the 12 unwanted ones, and the second rev restores the order):
for srcFile in `ls -1`; do fileN=`echo $srcFile | rev | cut -c17- | rev`; targetFile="$fileN.txt"; mv $srcFile $targetFile; done
In ls -1, a pattern can be added to filter the files in the current directory if required.

Talend multiple build jobs

We are using the open source Talend Studio and we have more than 50 jobs.
Each build generates a zip file containing all its artifacts (.bat, .sh, context, jar files).
Is there a way to build multiple jobs at once from the Studio or the command line (Talend open source tool)?
In the "build job" window, there is a double arrow in the left,
Click on it, and you get the job tree, select all jobs or what you want, and you will get a single zip file containing all your jobs each one in a separate folder.
Not an ideal solution but you can use a small script to split the whole zip into separate job zips:
ZIP=test.zip # path to your all-in-one zip file
ROOT=$(basename $ZIP .zip)
DEST=./dest
rm -rf $DEST # be careful with this one!
mkdir -p $DEST
unzip $ZIP
find $ROOT -mindepth 1 -maxdepth 1 -type d ! -name lib|while read JOBPATH
do
JOB=$(basename $JOBPATH)
echo "job: $JOB"
DJOB="$DEST/$JOB"
mkdir -p "$DJOB"
cp -R "$JOBPATH" "$DJOB/$JOB"
cp $ROOT/jobInfo.properties $DJOB # here you should replace job=<proper job name> and jobId, but not sure you really need it
mkdir -p "$DJOB/lib"
RUNFILE="${JOBPATH}/${JOB}_run.sh"
LIBS=$(grep "^java" "$RUNFILE"|cut -d' ' -f 5)
IFS=':' read -ra ALIB <<< "$LIBS"
for LIB in "${ALIB[@]}"; do
if [ "$LIB" = "." -o "$LIB" = "\$ROOT_PATH" ]; then continue; fi
echo "$LIB"
done|grep "\$ROOT_PATH/../lib"|cut -b 19-|while read DEP
do
cp "$ROOT/lib/$DEP" "$DJOB/lib/"
done
(cd $DJOB ; zip -r -m ../$JOB.zip .)
rmdir $DJOB
done

Bourne Shell Script

I'm attempting to write a script in the Bourne shell that will do the following:
Read in a filename
If the file does not exist in the target directory, it will display a message to the user stating such
If the file exists in the target directory, it will be moved to a /trash folder
If the file exists in the target directory, but a file of the same name is in the /trash folder, it will still move the file to the /trash directory, but will attach a _bak extension to the file.
My use of the Bourne shell is minimal, so here's what I have so far. Any pointers or tips would be greatly appreciated, thanks!
#!/bin/sh
#Scriptname: Trash Utility
source_dir=~/p6_tmp
target_dir=~/trash
echo "Please enter the filename you wish to trash:"
read filename
if [ -f $source_dir $filename]
then mv "$filename" "$target_dir"
else
echo "$filename does not exist"
fi
The original Bourne shell does not expand ~ to $HOME. POSIX shells do, but only when the ~ is unquoted. Switching to $HOME works in either case (or change the shebang to a shell which supports tilde expansion, such as #!/bin/bash).
To refer to a file in a directory, join them with a slash:
if [ -f "$source_dir/$filename" ]
Notice also the required space before the terminating ] token.
To actually move the file you tested for, use the same expression for the source argument to mv:
mv "$source_dir/$filename" "$target_dir"
As a general design, a script which takes a command-line parameter is much easier to integrate into future scripts than one which does interactive prompting. Most modern shells offer file name completion and history mechanisms, so a noninteractive script also tends to be more usable (you practically never need to transcribe a file name manually).
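A minimal sketch of that noninteractive design, written for plain sh. The function name trash_file and its argument order are my own choices, not part of the question. It also covers the _bak case from the requirements:

```shell
#!/bin/sh
# trash_file SOURCE_DIR TRASH_DIR FILENAME (hypothetical helper)
# Moves SOURCE_DIR/FILENAME into TRASH_DIR, adding _bak if the name is taken.
trash_file() {
    src=$1/$3
    dest=$2/$3
    if [ ! -f "$src" ]; then
        echo "$3 does not exist in $1" >&2
        return 1
    fi
    mkdir -p "$2"
    # If the trash already holds a file of that name, add a _bak suffix.
    if [ -e "$dest" ]; then
        dest=${dest}_bak
    fi
    mv "$src" "$dest"
}
# In a script, the file name would come from the command line, e.g.:
#   trash_file "$HOME/p6_tmp" "$HOME/trash" "$1"
```

A third file of the same name would overwrite the existing _bak copy, so a real tool would probably want a counter or timestamp instead.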
A Bash Solution:
#!/bin/bash
source_dir="$HOME/p6_tmp"
target_dir="$HOME/trash"
echo "Please enter the filename you wish to trash:"
read filename
if [ -f "${source_dir}/${filename}" ]
then
if [ -f "${target_dir}/${filename}" ]
then
mv "${source_dir}/${filename}" "${target_dir}/${filename}_bak"
else
mv "${source_dir}/${filename}" "$target_dir"
fi
else
echo "The file ${source_dir}/${filename} does not exist"
fi
Here's the completed script. Thanks again to all who helped!
#!/bin/sh
#Scriptname: Trash Utility
#Description: This script will allow the user to enter a filename they wish to send to the trash folder.
source_dir=~/p6_tmp
target_dir=~/trash
echo "Please enter the file you wish to trash:"
read filename
if [ -f "$source_dir/$filename" ]
then
if [ -f "$target_dir/$filename" ]
then mv "$source_dir/$filename" "$target_dir/$(basename "$filename")_bak"
date "+%Y-%m-%d %T - Trash renamed ~/$(basename "$source_dir")/$filename to ~/$(basename "/$target_dir")/$(basename "$filename")_bak" >> .trashlog
else mv "$source_dir/$filename" "$target_dir"
date "+%Y-%m-%d %T - Trash moved ~/$(basename "/$source_dir")/$filename to ~/$(basename "/$target_dir")/$filename" >> .trashlog
fi
else
date "+%Y-%m-%d %T - Trash of ~/$(basename "/$source_dir")/$filename does not exist" >> .trashlog
fi

Linux bash script to copy directory and sub directories

I want to copy all the files and sub directories from a directory to a different directory. However I only want to copy them if they are not already in the destination directory or if the timestamp on the source directory file is newer than the timestamp on the destination directory. I am having troubles getting into all of the sub directories. I am able to get down one level but not the next. For example, with directory /a/b/c, I am able to get to sub directory b but not to c. The only way I could see doing this was with a recursive function. My code is below.
#!/bin/bash
SOURCEDIR=/home/kyle/Smaug/csis252
DESTDIR=/home/kyle/Desktop/csis252
copy() {
local DIRECTORY=$1
for FILE in `ls $DIRECTORY`
do
if [ -f $DIRECTORY/$FILE ]
then
echo $FILE file
cp $DIRECTORY/$FILE $DESTDIR/$DIRECTORY/$FILE
fi
if [ -d $FILE ]
then
echo $FILE directory
mkdir $DESTDIR/$DIRECTORY/$FILE
copy $DIRECTORY/$FILE
fi
done
}
cd $SOURCEDIR
copy .
I forgot the $DIRECTORY in the second if statement. I feel stupid now but sometimes it just takes someone else to read through to find stuff like this. I did not use the cp -r $SOURCEDIR $DESTDIR because this would copy everything and sometimes I do not want that. I tried the cp -ur $SOURCEDIR $DESTDIR but it would only copy new stuff over, not update the existing stuff. The final version of my code is below.
#!/bin/bash
SOURCEDIR=/home/kyle/Smaug/csis252
DESTDIR=/home/kyle/Desktop/csis252
copy() {
local DIRECTORY=$1
for FILE in `ls $DIRECTORY`
do
if [ ! -f $DESTDIR/$DIRECTORY/$FILE ] && [ ! -d $DESTDIR/$DIRECTORY/$FILE ]
then
if [ -f $DIRECTORY/$FILE ]
then
echo "$DIRECTORY/$FILE copied"
cp $DIRECTORY/$FILE $DESTDIR/$DIRECTORY/$FILE
fi
if [ -d $DIRECTORY/$FILE ]
then
echo "$DIRECTORY/$FILE directory made"
mkdir $DESTDIR/$DIRECTORY/$FILE
copy $DIRECTORY/$FILE
fi
else
if [ $DIRECTORY/$FILE -nt $DESTDIR/$DIRECTORY/$FILE ] && [ ! -d $DIRECTORY/$FILE ]
then
cp $DIRECTORY/$FILE $DESTDIR/$DIRECTORY/$FILE
echo "$DIRECTORY/$FILE updated"
fi
fi
done
}
cd $SOURCEDIR
copy .
Simpler would be
cp -ur $SOURCEDIR $DESTDIR
-r recursively copies folders and subfolders
-u updates, copies only when the source is newer
if [ -f $DIRECTORY/$FILE ]
...
if [ -d $FILE ]
You forgot $DIRECTORY/ in your -d check. This isn't a problem for the top-level directories, because when DIRECTORY is ., [ -d dir ] and [ -d ./dir ] will always give the same result, but for subdirectories it does matter.
Note: you may want to look at pre-written programs that do this. cp (at least the GNU version) or rsync can probably avoid the need for any custom script, and also handle special files (special file name characters, or special file types) better than any script will.
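As a sketch of the cp route, assuming GNU cp (the -u option is not available in every cp). sync_newer is a made-up wrapper name. The trailing /. copies the contents of the source directory rather than nesting the directory itself inside the destination, and -p keeps timestamps so that later runs compare correctly:

```shell
#!/bin/sh
# sync_newer SRC DEST (hypothetical helper, GNU cp assumed)
# Copies files that are missing in DEST or newer in SRC, recursively.
sync_newer() {
    mkdir -p "$2"
    # "SRC/." means "the contents of SRC", not SRC itself.
    cp -Rup "$1"/. "$2"/
}
# Example with the paths from the question:
#   sync_newer /home/kyle/Smaug/csis252 /home/kyle/Desktop/csis252
```

Files that are newer in the destination are left alone.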

comparing two directories with separate diff output per file

I need to see what has changed between two directories that contain different versions of a software's source code. I have found a way to get a single .diff file, but how can I obtain a separate diff for each changed file in the two directories? I need this because the "main" diff is about 6 MB and I want something handier.
I came across this problem too, so I ended up with a few lines of shell script. It takes three arguments: the source and destination directories (as used for diff) and a target folder (which should exist) for the output.
It's a bit hacky, but maybe it would be useful for someone. So use with care, especially if your paths have special characters.
#!/bin/sh
DIFFARGS="-wb"
LANG=C
TARGET=$3
SRC=`echo $1 | sed -e 's/\//\\\\\\//g'`
DST=`echo $2 | sed -e 's/\//\\\\\\//g'`
if [ ! -d "$TARGET" ]; then
echo "'$TARGET' is not a directory." >&2
exit 1
fi
diff -rqN $DIFFARGS "$1" "$2" | sed "s/Files $SRC\/\(.*\?\) and $DST\/\(.*\?\) differ/\1/" | \
while read file
do
if [ ! -d "$TARGET/`dirname \"$file\"`" ]; then
mkdir -p "$TARGET/`dirname \"$file\"`"
fi
diff $DIFFARGS -N "$1/$file" "$2/$file" > "$TARGET"/"$file.diff"
done
If you want to compare source code, it is better to commit it to a version control system such as svn.
After you have done so, do a diff of your uploaded code and redirect it to file.diff:
svn diff --old svn:url1 --new svn:url2 > file.diff
A bash for loop will work for you. The following will diff two directories with C source code and produce a separate diff for each file.
for FILE in $(find <FIRST_DIR> -name '*.[ch]'); do DIFF=<DIFF_DIR>/$(echo $FILE | grep -o '[-_a-zA-Z0-9.]*$').diff; diff -u $FILE <SECOND_DIR>/$FILE > $DIFF; done
When applying these diffs with patch, use the -p level that matches the paths on the lines starting with +++.
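A variant of the loop that works with relative paths and keeps the directory layout in the output folder could look like the sketch below. per_file_diffs is a hypothetical name. It only walks the first directory, so files that exist only in the second one are not reported:

```shell
#!/bin/sh
# per_file_diffs FIRST_DIR SECOND_DIR OUT_DIR (hypothetical helper)
# Writes OUT_DIR/<relative path>.diff for every file that differs.
per_file_diffs() {
    first=$1
    second=$2
    out=$3
    (cd "$first" && find . -type f) | while IFS= read -r rel; do
        rel=${rel#./}
        other=$second/$rel
        [ -f "$other" ] || other=/dev/null    # file missing in SECOND_DIR
        cmp -s "$first/$rel" "$other" && continue
        mkdir -p "$out/$(dirname "$rel")"
        # diff exits with status 1 when the files differ; that is expected here.
        diff -u "$first/$rel" "$other" > "$out/$rel.diff" || true
    done
}
```

Unchanged files produce no output file, so the output folder contains only the diffs worth reading.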