Failed to open the file - sed

I am trying to figure out the picture date of files in a folder structure. Some of the folder names contain whitespace. I tried adding quotes, but it doesn't work.
Can anyone give me a hint?
find . -name "*.jpg" -or -name "*.JPG" >> files.txt
sed -e "s/\(.*\)/'\1'/" files.txt >> files2.txt
for fn in `cat files2.txt`; do
DATEI=$( echo "$fn" | cut -c 3-)
EXIF=$(/usr/bin/exiv2 -pa --grep DateTimeOriginal "'"$PWD$DATEI | awk -F" " '{print $4","$5}')
if [ -z "$EXIF" ]
then
:
else
echo "$PWD$DATEI,$EXIF" >> ausgabe.csv
fi
done
echo "DONE!"
EDIT: This is the output that I get:
'/volume1/Intern/path/to/images/IMG_4206.jpg': Failed to open the file

I take it your result is supposed to look like
"/path/to/photo1.jpg","2017:01:15","22:19:15"
"/path/to/another/photo.JPG","2017:01:15","22:10:01"
The absolute path to the photo, then the DateTimeOriginal date/time, all in quotes.
exiv2 can actually take multiple photos as file arguments, so the whole process can be simplified to a pipeline of two commands:
# Need this for the fileglob
shopt -s globstar extglob
exiv2 -pa -g DateTimeOriginal **/*.#(jpg|JPG) |
awk -v pwd="$PWD/" -v dq='"' -v OFS=',' '{
fn = substr($0, 1, match($0, / *Exif\.Photo/)-1)
print dq pwd fn dq, dq $(NF-1) dq, dq $NF dq
}'
The shell options, globstar and extglob, enable the **/*.#(jpg|JPG) expression, which returns all files ending in jpg or JPG for the whole directory tree.
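To preview what that glob matches before running exiv2, you can expand it with printf (a quick check; assumes bash 4+ for globstar):
shopt -s globstar extglob
printf '%s\n' **/*.#(jpg|JPG)    # lists every matching file, one per line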
exiv2 only produces output for the files that contain DateTimeOriginal data. The intermediate output looks something like this (some whitespace removed):
dir1/photo1.jpg Exif.Photo.DateTimeOriginal Ascii 20 2017:01:22 10:20:36
dir1/photo3.jpg Exif.Photo.DateTimeOriginal Ascii 20 2017:01:22 10:20:36
dir with space/photo2.JPG Exif.Photo.DateTimeOriginal Ascii 20 2017:01:22 10:20:38
dir with space/photo4.JPG Exif.Photo.DateTimeOriginal Ascii 20 2017:01:22 10:40:09
photo5.jpg Exif.Photo.DateTimeOriginal Ascii 20 2017:01:24 20:06:38
photo6.JPG Exif.Photo.DateTimeOriginal Ascii 20 2017:01:22 10:00:55
This would be straightforward with awk, were it not for the paths with spaces as mentioned in the question. The exiv2 output is space separated, and there doesn't seem to be an option to get tabs, so some awk trickery is required:
The path of the current directory, followed by a slash, is passed into the command using -v pwd="$PWD/".
To avoid messy escaping, we define the double quote with -v dq='"'.
The output field separator is set to a comma with -v OFS=','.
To get the filename, we search for the index of a series of spaces followed by Exif.Photo, then we assign a substring that ends just before that index to fn (see the short demo after this list).
To print quoted and comma separated, we use our dq variable, prepend the filename with the path from pwd, and use $(NF-1) and $NF to get the second to last and last field, respectively.
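To see that filename extraction in isolation, here is a quick sketch run on one sample line (the spacing is illustrative):
echo 'dir with space/photo2.JPG  Exif.Photo.DateTimeOriginal  Ascii  20  2017:01:22 10:20:38' |
awk '{ print substr($0, 1, match($0, / *Exif\.Photo/)-1) }'
This prints dir with space/photo2.JPG: match() returns the position of the first run of spaces before Exif.Photo, and substr() keeps everything before it.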
The result is something like
"/home/benjamin/tinker/space dir/dir1/photo1.jpg","2017:01:22","10:20:36"
"/home/benjamin/tinker/space dir/dir1/photo3.jpg","2017:01:22","10:20:36"
"/home/benjamin/tinker/space dir/dir with space/photo2.JPG","2017:01:22","10:20:38"
"/home/benjamin/tinker/space dir/dir with space/photo4.JPG","2017:01:22","10:40:09"
"/home/benjamin/tinker/space dir/photo5.jpg","2017:01:24","20:06:38"
"/home/benjamin/tinker/space dir/photo6.JPG","2017:01:22","10:00:55"
To get this into a file, a redirection > ausgabe.csv has to be appended to the command.
As for why your command didn't work: let's look at a single file as an example. After the sed step, you have something like './photo5.jpg'. Now, you use cut -c 3-, which gives you /photo5.jpg'.
In your EXIF line, you add another single quote, so now exiv2 is looking for a file literally called '/photo5.jpg', including the single quotes – which doesn't exist.
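For what it's worth, a minimal fix of your loop (an untested sketch) is to drop the sed quoting step entirely and read the find output line by line; while read keeps names with spaces intact (names with newlines would still break), and the shell's own quoting protects the argument to exiv2:
find . -name "*.jpg" -or -name "*.JPG" > files.txt
while IFS= read -r fn; do
DATEI=${fn#.}   # strip only the leading ".", keeping the slash
EXIF=$(/usr/bin/exiv2 -pa --grep DateTimeOriginal "$PWD$DATEI" | awk '{print $4","$5}')
if [ -n "$EXIF" ]; then
echo "$PWD$DATEI,$EXIF" >> ausgabe.csv
fi
done < files.txt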

Related

How to Find & Replace a String Within Files with Find / Grep / Sed

I have a folder of 500 *.INI files that I need to manually edit. Within each INI file, I have the line Source =. I would like that line to become Source = C:\software\{filename}.
For instance, a dx4.ini file would need to be fixed to become: Source = C:\software\dx4
Is there a quick way to do this with Find, Grep, or Sed functions?
You can try it with sed.
For example:
Input file contents:
file.txt
Source =
some lines..
script:
# double the backslashes: sed strips one level in the replacement text
newstring='Source = C:\\software\\dx4'
oldstring='Source ='
# edit the file in place instead of echoing a command substitution,
# which would collapse the file to a single line
sed -i "s/$oldstring/$newstring/g" file.txt
After running the above commands
output:
Source = C:\software\dx4
some lines..
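The question asks for each file's own name rather than a hard-coded dx4, so the same substitution can be wrapped in a loop over all the INI files. A sketch along those lines, assuming GNU sed for -i:
for f in *.INI; do
base=${f%.INI}   # file name without the extension
sed -i "s/^Source =\$/Source = C:\\\\software\\\\$base/" "$f"
done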
If you want to edit a file in a script, I think ed is the way to go. Combined with a shell for loop:
for file in *.INI; do
base=$(basename "$file" .INI)
ed -s "$file" <<EOF
/^Source =/s/=/= C:\\\\software\\\\$base/
w
EOF
done
(This does assume that filenames contain no newlines or ampersands.)
With GNU awk for the 3rd arg to match(), gensub(), and "inplace" editing:
awk -i inplace '
match($0,/(.*Source = C:\\software\\){filename}(.*)/,a) {
fname = gensub(/\..*/,"",1,FILENAME)
$0 = a[1] fname a[2]
}
1' *.INI
The above assumes you're running in a UNIX environment, though your use of the term folder instead of directory, and a path that starts with C: and contains backslashes, makes me suspicious. If you're on Windows, then save the part between the two single quotes (exclusive) in a file named foo.awk and execute it as awk -i inplace -f foo.awk *.INI, or however you normally execute commands like this on Windows.
find . -name '*.INI' -type f > stack
while IFS= read -r line
do
sed -i "s#Source =#Source = C:\\\\software\\\\dx4#" "${line}"
done < stack
Assuming that a} you have sed with "-i" (the in-place edit flag, which AFAIK is not always portable) and b} your sed copes with the doubled backslash escapes, I think that will work.
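If your sed lacks -i, a portable fallback is to write to a temporary file and move it back; a small sketch using the dx4.ini example from the question:
sed "s#Source =#Source = C:\\\\software\\\\dx4#" dx4.ini > dx4.ini.tmp && mv dx4.ini.tmp dx4.ini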

Remove everything in a line before comma

I have multiple files with lines like:
foo, 123456
bar, 654321
baz, 098765
I would like to remove everything on each line before (and including) the comma.
The output would be:
123456
654321
098765
I attempted to use the following after seeing something similar on another question, but the user didn't leave an explanation, so I'm not sure how the wildcard would be handled:
find . -name "*.csv" -type f | xargs sed -i -e '/*,/d'
Thank you for any help you can offer.
METHOD 1:
If it's always the 2nd column you want, you can do this with awk -- this command is actually splitting the rows on the whitespace rather than the comma, so it gets your second column -- the numbers, but without the leading space:
awk '{print $2}' < whatever.csv
METHOD 2:
Or to get everything after the comma (including the space):
sed -e 's/^.*,//g' < whatever.csv
METHOD 3:
If you want to find all of the .csv files and get the output of all of them together, you can do:
sed -e 's/^.*,//g' `find . -name '*.csv' -print`
METHOD 4:
Or the same way you were starting to -- with find and xargs:
find . -name '*.csv' -type f -print | xargs sed -e 's/^.*,//'
METHOD 5:
Making all of the .csv files into .txt files, processed in the way described above, you can make a brief shell script. Like this:
Create a script "bla.sh":
#!/bin/sh
for infile in `find . -name '*.csv' -print` ; do
outfile=`echo "$infile" | sed -e 's/\.csv$/.txt/'`
echo "$infile --> $outfile"
sed -e 's/^.*,//g' < "$infile" > "$outfile"
done
Make it executable by typing this:
chmod 755 bla.sh
Then run it:
./bla.sh
This will create a .txt output file with everything after the comma for each .csv input file.
ALTERNATE METHOD 5:
Or if you need them to be named .csv, the script could be updated like this -- this just makes an output file named "file-new.csv" for each input file named "file.csv":
#!/bin/sh
for infile in `find . -name '*.csv' -print` ; do
outfile=`echo "$infile" | sed -e 's/\.csv$/-new.csv/'`
echo "$infile --> $outfile"
sed -e 's/^.*,//g' < "$infile" > "$outfile"
done
Something like this should work for a single file. Let's say the input is 'yourfile' and you want the output to go to 'outfile'.
sed 's/^.*,//' < yourfile > outfile
The syntax to do a search-and-replace is s/input_pattern/replacement/
The ^ anchors the input pattern to the beginning of the line.
A dot . matches any single character; .* matches a string of zero or more of any character.
The , matches the comma.
The replacement pattern is empty, so whatever matched the input_pattern will be removed.
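You can see the effect on a single sample line; note the leading space survives, because the pattern stops at the comma:
echo 'foo, 123456' | sed 's/^.*,//'
This prints " 123456" (with the leading space).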

how to use the name of the input file in sed replace

I have several files in which I want to replace a certain word with the name of the file itself.
For example, I have 2 files named test1.txt and test2.txt.
Both files are identical and look like
bla1,bla2,temp
bla2,bla3,temp
With sed I want to replace the word temp with the name of the file itself,
so after the sed operation I have 2 different files:
test1.txt, which looks like:
bla1,bla2,test1
bla2,bla3,test1
test2.txt, which looks like:
bla1,bla2,test2
bla2,bla3,test2
So my question: how do I use the actual name of the input file itself as part of the replace command?
sed "s/temp/ ??filename??/ ??? " *.txt
Thanks for your suggestions.
I'm not sure you can reference the filename using sed, although I could be wrong. You would probably use a shell hack. A better approach to substitute all occurrences of temp with the filename would be the following awk script:
$ awk '{gsub(/temp/,FILENAME)}1' file
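Note this writes to stdout rather than changing the file; to update the file itself, one option (a sketch) is a temporary file:
awk '{gsub(/temp/,FILENAME)}1' file > file.tmp && mv file.tmp file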
Use awk; awk has the FILENAME variable:
awk '{sub(/temp/,FILENAME)}7' yourfile
awk 'BEGIN{FS=OFS=","} {$NF=FILENAME}1' file
The difference between this and the sub() solutions is that this will work even if the word "temp" exists elsewhere in your file, e.g. if "bla1" contains the word "temperature".
If you need to strip ".txt" from the file name as it appears from your posted desired output, tweak it to:
awk 'BEGIN{FS=OFS=","} {t=FILENAME; sub(/\.txt$/,"",t); $NF=t}1' file
You can probably edit FILENAME itself but I find it best not to mess with the builtin variables if you don't have to.
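If your awk is GNU awk 4.1 or later, you can also have it rewrite the files directly with its "inplace" extension; a sketch combining that with the tweak above:
awk -i inplace 'BEGIN{FS=OFS=","} {t=FILENAME; sub(/\.txt$/,"",t); $NF=t}1' *.txt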
You could do it with a little bit of bash to help you out, if that's available.
find . -name "test*.txt" -type f | awk -F '/' '{print $2;}' | while read file; do sed -i "s|temp|$file|" ./$file; done
That's a kind of hacky adaptation of a script I have to do something similar. It can undoubtedly be shortened.
There is no sed internal variable for the file name, so for a generic process you need the shell to supply it:
for FileName in MyFileShellFilter
do
sed "s|,temp$|,${FileName}|" "${FileName}"
done
(MyFileShellFilter is a placeholder for your file glob, e.g. *.txt.) Just be careful with the file names used: they normally don't contain \, but they could contain &, which has a special meaning in the s/// replacement. I use | as the separator to allow / in file names, but for this reason no unescaped | is allowed in the file names (normally there isn't).
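As a small illustration of the separator choice: a replacement containing / is harmless with s|||, whereas it would terminate an s/// expression early. For example, with a hypothetical value dir/test1:
echo 'bla1,bla2,temp' | sed "s|,temp$|,dir/test1|"
This prints bla1,bla2,dir/test1.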
with xargs:
printf "%s\n" *.txt | xargs -I FILE -L 1 sed 's/temp/FILE/' FILE
The filename cannot contain newlines, slashes, ampersands, or single quotes.

perl -pe to manipulate filenames

I was trying to do some quick filename cleanup at the shell (zsh, if it matters). Renaming files. (I'm using cp instead of mv just to be safe)
foreach f (\#*.ogg)
cp $f `echo $f | perl -pe 's/\#\d+ (.+)$/"\1"/'`
end
Now, I know there are tools to do stuff like this, but for personal interest I'm wondering how I can do it this way. Right now, I get an error:
cp: target `When.ogg"' is not a directory
Where 'When.ogg' is the last part of the filename. I've tried adding quotes (see above) and escaping the spaces, but nonetheless this is what I get.
Is there a reason I can't use the output of a perl one-liner as the final argument to another command line tool?
It looks like you have a space in the file names being processed, so each of your cp command lines evaluates to something like
cp \#nnnn When.ogg When.ogg
When the cp command sees more than two arguments, the last one must be a target directory name for all the files to be copied to - hence the error message. Because your source filename ($f) contains a space it is being treated as two arguments - cp sees three args, rather than the two you intend.
If you put double quotes around the first $f that should prevent the two 'halves' of the name from being treated as separate file names:
cp "$f" `echo ...
This is what you need in bash, hope it's good for zsh too.
cp "$f" "`echo $f | perl -pe 's/\#\d+ (.+)$/\1/'`"
If the filename contains spaces, you also have quote the second argument of cp.
I often use
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n"
The -l chomps the input.
The -e check is to avoid accidentally renaming all the files to one name. I've done that a couple of times.
In zsh, that would be
foreach f (...)
echo "$f" | perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
end
or
find . -maxdepth 1 -name '...' \
| perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
or
find . -maxdepth 1 -name '...' -exec \
perl -e'for (@ARGV) {
$o=$_; s/.../.../; $n=$_;
rename $o,$n if !-e $n;
}' {} +
The last supports file names with newlines in them.

help using command line to extract snippets of data on stdout

I would like the option of extracting the following string/data:
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
=or=
25
myproxy
sample
But it would help if I could see both.
From this output using cut or perl or anything else that would work:
Found 3 items
drwxr-xr-x - foo_hd foo_users 0 2011-03-16 18:46 /work/foo/processed/25
drwxr-xr-x - foo_hd foo_users 0 2011-04-05 07:10 /work/foo/processed/myproxy
drwxr-x--- - foo_hd testcont 0 2011-04-08 07:19 /work/foo/processed/sample
Doing a cut -d" " -f6 will get me foo_users, testcont. I tried increasing the field to higher values and I'm just not able to get what I want.
I'm not sure if cut is good for this or something like perl?
The base directories will remain static /work/foo/processed.
Also, I need the first line Found Xn items removed. Thanks.
You can do a substitution from the beginning of the line to the first occurrence of /, non-greedily:
$ your_command | ruby -ne 'print $_.sub(/.*?\/(.*)/,"/\\1") if /\//'
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
Or you can find a unique separator (field delimiter) to split on. For example, the time portion is unique, so you can split on that and take the last element (here the 2nd):
$ ruby -ne 'print $_.split(/\s+\d+:\d+\s+/)[-1] if /\//' file
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
With awk,
$ awk -F"[0-9][0-9]:[0-9][0-9]" '/\//{print $NF}' file
/work/foo/processed/25
/work/foo/processed/myproxy
/work/foo/processed/sample
perl -lanF"\s+" -e 'print $F[-1] unless /^Found/' file
Here is an explanation of the command-line switches used:
-l: remove line break from each line of input, then add one back on print
-a: auto-split each line of input into an #F array
-n: loop through each line of input
-F: the regexp pattern to use for the auto-split (with -a)
-e: the perl code to execute (for each line of input if using -n or -p)
If you want to just output the last portion of your directory path, and the basedir is always '/work/foo/processed', I would do this:
perl -nle 'print $1 if m|/work/foo/processed/(\S+)|' file
Try this out :
<Your Command> | grep -P -o '[\/\.\w]+$'
OR if the directory '/work/foo/processed' is always static then:
<Your Command>| grep -P -o '\/work\/foo\/processed\/.+$'
-o : Show only the part of a matching line that matches PATTERN.
-P : Interpret PATTERN as a Perl regular expression.
In this example, the last word in the input will be matched.
(The word can also contain dots, so file names like 'text_file1.txt' can be matched.)
Of course, you can change the pattern as per your requirements.
If you know the columns will be the same, and you always list the full path name, you could try something like:
ls -l | cut -c79-
which keeps everything from the 79th character to the end of each line. That might work in this exact case, but I think it would be better to take the basename of the last field. You could easily do this in awk or perl. Respond if this is not what you want and I'll add the awk and perl versions.
take the output of your ls command and pipe it to awk
your command | awk -F'/' '{print $NF}'
your_command | perl -pe 's#.*/##'