How to change the encoding of files in batch

I have many files encoded in Japanese (Shift JIS), and I need to convert them to UTF-8.
With VSCode, or other editors such as Sublime or Emacs, I can open each file as Shift JIS and then save it as UTF-8.
How can I change the encoding of all files under a folder, including its subfolders?

Here is the shell script:
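#!/bin/bash
# Convert every regular file under TOP_DIR from Shift JIS to UTF-8, in place.

encode()
{
    local tmp
    tmp=$(mktemp) || return 1
    # Only overwrite the original if iconv succeeds, so a failed conversion
    # does not leave a truncated file behind
    if iconv -f SHIFT_JIS -t UTF-8 "$1" > "$tmp"; then
    # if iconv -f iso8859-15 -t utf8 "$1" > "$tmp"; then
        cat "$tmp" > "$1"
    else
        echo "FAILED $1" >&2
    fi
    rm -f "$tmp"
}

walk()
{
    local path
    # Glob instead of parsing ls, so names with spaces survive
    for path in "$1"/*; do
        [ -e "$path" ] || continue    # empty directory: skip the literal pattern
        if [ -d "$path" ]; then
            echo "DIR $path"
            walk "$path"
        else
            echo "FILE $path"
            encode "$path"
        fi
    done
}

if [ $# -ne 1 ]; then
    echo "USAGE: $0 TOP_DIR" >&2
    exit 1
else
    walk "$1"
fi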
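The recursion can also be left to find, which already walks subfolders and passes file names safely. This is a minimal sketch, assuming an iconv that accepts the SHIFT_JIS encoding name; convert_tree is a hypothetical helper name, not an existing command.

```shell
#!/bin/bash
# convert_tree (hypothetical helper name): re-encode every regular file under a
# directory from Shift JIS to UTF-8 in place, letting find do the recursion.
convert_tree() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            tmp=$(mktemp) || exit 1
            # Overwrite the original only if iconv converted the whole file
            if iconv -f SHIFT_JIS -t UTF-8 "$f" > "$tmp"; then
                cat "$tmp" > "$f"   # writing through > keeps the file permissions
            else
                echo "skipped (not valid Shift JIS?): $f" >&2
            fi
            rm -f "$tmp"
        done
    ' sh {} +
}
```

Usage: convert_tree ./docs. Run it only once per tree and keep a backup: files that are already UTF-8 may either be skipped or silently mangled, depending on their bytes.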

Related

Bash or Python efficient substring matching and filtering

I have a set of files in a directory, some of whose names are likely to share substrings that are not known in advance. This is a sorting exercise: I want to move the files with the longest common substring together into a subdirectory named after the number of letters in that match, then proceed to shorter matches until no matches of 2 or more letters remain. Extensions are ignored, matching is case-insensitive, and special characters are ignored.
Example:
AfricanElephant.jpg
elephant.jpg
grant.png
ant.png
el_gordo.tif
snowbell.png
Processing from the longest matches down to the shortest should result in:
./8/AfricanElephant.jpg and ./8/elephant.jpg
./3/grant.png and ./3/ant.png
./2/snowbell.png and ./2/el_gordo.tif
I am completely lost on an efficient Bash or Python way to do what seems like a complex sort.
I found some awk code that is almost there:
{
count=0
while ( match($0,/elephant/) ) {
count++
$0=substr($0,RSTART+1)
}
print count
}
where temp.txt contains the list of files, and the script is invoked as, e.g.:
awk -f test_match.awk temp.txt
The drawbacks are that (a) it is hardwired to look for "elephant" (I don't know how to make it take an input string, rather than a file, plus a test string to count against), and
(b) I really just want to call a Bash function that does the sort as specified.
If I had that, I could wrap some Bash script around this core awk to make it work.
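For drawback (a), awk's -v option can pass a shell string into an awk variable, and the input string can be piped in instead of read from a file. A minimal sketch; count_occurrences is a hypothetical name, and it uses index() (a plain substring search) instead of match() so that characters like "." in the test string are taken literally. Note that -v still processes backslash escapes in the value.

```shell
# count_occurrences STRING PATTERN (hypothetical helper): print how many times
# PATTERN occurs in STRING, case-insensitively, counting overlapping matches
count_occurrences() {
    printf '%s\n' "$1" | awk -v pat="$2" '{
        count = 0
        s = tolower($0)
        p = tolower(pat)
        # index() returns the 1-based position of p in s, or 0 if absent
        while (p != "" && (i = index(s, p)) > 0) {
            count++
            s = substr(s, i + 1)   # step one past the match start, like RSTART+1 above
        }
        print count
    }'
}
```

Usage: count_occurrences AfricanElephant elephant prints 1.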
function longest_common_substrings () {
    local file file1 base1 base2 file_to_move directory_to_move_to
    shopt -s nocasematch
    # Score every ordered pair of regular files by the length of their longest common substring.
    # Assumes file names contain no whitespace (results are parsed as space-separated fields).
    for file1 in *; do
        for file in *; do
            [[ -f "$file1" && -f "$file" && "$file" != "$file1" ]] || continue
            base1=$(basename "$file" | cut -d. -f1)     # name up to the first dot
            base2=$(basename "$file1" | cut -d. -f1)
            printf '%s %s ' "$file" "$file1"
            "$HOME/Scripts/longest_common_substring.sh" "$base1" "$base2" | tr -d '\n' | wc -c | awk '{$1=$1;print}'
        done
    done | sort -k3,3nr | awk '$3 >= 2 { print $1, $3 }' > /tmp/filesort_substring.txt
    # Numeric sort puts the longest matches first, so each file moves to the directory of its best match
    while read -r file_to_move directory_to_move_to; do
        if [[ -f "$file_to_move" ]]; then
            mkdir -p "$directory_to_move_to"
            \gmv -b "$file_to_move" "$directory_to_move_to"    # gmv: GNU mv, e.g. from Homebrew coreutils
        fi
    done < /tmp/filesort_substring.txt
    shopt -u nocasematch
}
where $HOME/Scripts/longest_common_substring.sh is
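#!/bin/bash
# Print the longest common substring of $1 and $2 (case-insensitive)
shopt -s nocasematch
if ((${#1}>${#2})); then
    long=$1 short=$2
else
    long=$2 short=$1
fi
lshort=${#short}
score=0
for ((i=0;i<lshort-score;++i)); do
    # only try substrings longer than the best match found so far
    for ((l=score+1;l<=lshort-i;++l)); do
        sub=${short:i:l}
        # quote $sub so glob characters in it (*, ?, [) are matched literally
        [[ $long != *"$sub"* ]] && break
        subfound=$sub score=$l
    done
done
if ((score)); then
    echo "$subfound"
fi
shopt -u nocasematch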
Credit for computing the match goes to the original longest-common-substring script, which I found elsewhere on this site.
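Since the pairwise loop runs the helper once per pair, the same search can be done as a Bash function that prints the length directly, avoiding a separate process plus tr and wc per pair. A sketch only: lcs_len is a hypothetical name, and unlike the script above it also lowercases and strips non-alphanumeric characters, as the question asks. Extensions must still be removed by the caller, as the cut -d. -f1 step does.

```shell
# lcs_len A B (hypothetical helper): print the length of the longest common
# substring of A and B, ignoring case and non-alphanumeric characters
lcs_len() {
    local a b long short lshort score=0 i l sub
    a=$(printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    b=$(printf '%s' "$2" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]')
    if (( ${#a} > ${#b} )); then long=$a short=$b; else long=$b short=$a; fi
    lshort=${#short}
    for ((i = 0; i < lshort - score; ++i)); do
        # extend the substring starting at i while it still occurs in $long
        for ((l = score + 1; l <= lshort - i; ++l)); do
            sub=${short:i:l}
            [[ $long == *"$sub"* ]] || break
            score=$l
        done
    done
    echo "$score"
}
```

In the pairwise loop it would replace the helper-script pipeline, e.g. printf '%s %s %s\n' "$file" "$file1" "$(lcs_len "$base1" "$base2")".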

How do I fix 'command not found' when I try to egrep from a variable?

I wanted to write a program that prints every line of the file mydata that contains all of the given factors. I egrep the first factor from mydata and save the result in a variable a. Then I egrep the next factor from a and save the result to a again, until all the factors are processed. But when I run the program, it reports
"command not found" at line 14.
if [ $# -eq 0 ]
then
echo -e "Usage: phoneA searchfor [...searchfor]\n(You didn't tell me what you want to search for.)"
else
a=""
for i in $*
do
if [ -z "$a" ]
then
a=$(egrep "$i" mydata)
else
a=$("$a" | egrep "$i")
fi
done
awk -f display.awk "$a"
fi
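I expected every line containing all the factors to be printed on the screen in the format I defined in display.awk.
$a contains data, not a command: "$a" | egrep "$i" tries to run the contents of $a as a command name, which is where "command not found" comes from. You need to write that data into the pipe, and likewise pipe it into awk at the end instead of passing it as a file name.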
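if [ $# -eq 0 ]
then
    printf '%s\n' "Usage: phoneA searchfor [...searchfor]" "(You didn't tell me what you want to search for.)" >&2
    exit 1
fi
a=$(cat mydata)
for i in "$@"; do
    # feed the lines kept so far into egrep; stop as soon as nothing matches
    a=$(printf '%s\n' "$a" | egrep "$i") || break
done
# display.awk reads the matching lines from stdin, not from a file named by $a
[ -n "$a" ] && printf '%s\n' "$a" | awk -f display.awk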
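The variable can also be dropped entirely: a single awk pass can keep only the lines that match every pattern. A minimal sketch; match_all is a hypothetical name, and the patterns are treated as extended regular expressions, like egrep. They are passed as awk arguments (and removed from ARGV so awk does not open them as files) rather than through -v, which would process backslash escapes.

```shell
# match_all FILE PATTERN... (hypothetical helper): print the lines of FILE
# that match every PATTERN
match_all() {
    local file=$1
    shift
    awk '
        BEGIN {
            # ARGV[1..ARGC-2] are the patterns, ARGV[ARGC-1] is the file
            n = ARGC - 2
            for (i = 1; i <= n; i++) { p[i] = ARGV[i]; delete ARGV[i] }
        }
        { for (i = 1; i <= n; i++) if ($0 !~ p[i]) next; print }
    ' "$@" "$file"
}
```

The output can then go straight to the formatter: match_all mydata "$@" | awk -f display.awk.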

How to run a Unix shell block (e.g. an if statement) from Perl

How can I run a shell if block from Perl?
For example:
location="/home/shon";
if [[ -f $location/sample.txt ]]
then
echo "file found...."
else
echo "Error in getting file"
exit 255
fi
As suggested, this code is simple to translate into Perl. If you have something more complex, you can spawn a shell to run it: put the shell code in a quoted heredoc so that Perl does not interpolate the shell variables.
system 'bash', '-c', <<'END_SHELL_CODE';
location="/home/shon"
if [[ -f $location/sample.txt ]]; then
echo "file found...."
else
echo "Error in getting file"
exit 255
fi
END_SHELL_CODE
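Alternatively, do the test directly in Perl:
#!/usr/bin/perl
use strict;
use warnings;

my $location = "/home/shon/sample.txt";
if (-f $location) {
    print "file found....\n";
}
else {
    # same behavior as the shell version: report the error and exit with status 255
    print STDERR "Error in getting file\n";
    exit 255;
}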

How to delete unused org-mode attachment files from disk

In org-mode, after deleting many headlines that had attachment files, the unreferenced files remain on disk in my data subdirectory.
Is there a function or script that finds all unreferenced files and cleans them up?
I faced the same problem today after messing with org-capture-templates and then deleting a bunch of entries that did not come out the way I wanted.
I wrote this script, which gets the job done (for me).
#!/bin/sh
## Location where org-mode stores attachments
datadir="$HOME/Dropbox/Documents/Org/data"
orgdir="$HOME/Dropbox/Documents/Org/"
echo "The following files appear orphaned:"
# Use each attachment directory's name as its ID; if no .org file mentions it, list its files
files=$(find "$datadir" -type f |
    perl -ne 'print "$1\n" if /([^\/]*)\/[^\/]*$/' |
    sort -u |
    while read -r id; do
        grep -qiR --include "*.org" "$id" "$orgdir" ||
            find "$datadir" -ipath "*$id*" -type f
    done)
echo "$files"
if [ -z "$files" ]; then
    echo "Nothing to do!"
    exit
fi
echo "Delete? y/[n]"
read -r delete
case $delete in
    y)
        echo "$files" |
        while IFS= read -r fn; do
            rm -- "$fn"
        done
        echo "Done."
        ;;
    *)
        echo "Not deleting anything!"
        ;;
esac
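The detection step can also be pulled out into a function that only lists candidates, which makes it easy to review the list before deleting anything. A sketch; list_orphans is a hypothetical name, and it assumes the same heuristic as the script: an attachment directory counts as referenced if its name appears, case-insensitively and as a fixed string, in some .org file under ORGDIR.

```shell
# list_orphans DATADIR ORGDIR (hypothetical helper): print attachment files whose
# directory name is not mentioned in any .org file under ORGDIR
list_orphans() {
    local datadir=$1 orgdir=$2 dir id
    # one line per directory that directly contains attachment files
    find "$datadir" -type f -exec dirname {} \; | sort -u |
    while IFS= read -r dir; do
        id=${dir##*/}    # last path component is used as the search key
        if ! grep -qiRF --include='*.org' -- "$id" "$orgdir"; then
            find "$dir" -maxdepth 1 -type f
        fi
    done
}
```

Usage: list_orphans "$datadir" "$orgdir" > /tmp/orphans.txt, then inspect the file before removing anything.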

Renaming and Moving Files in Bash or Perl

Hi, I'm completely new to Bash and Stack Overflow.
I need to move a set of files (all contained in the same folder) to a target folder where files with the same name could already exist.
In case a specific file exists, I need to rename the file before moving it, by appending for example an incremental integer to the file name.
The extensions should be preserved (in other words, the appended incremental integer should go before the extension). The file names may contain dots in the middle.
Originally, I was thinking about comparing the two folders to get a list of the existing files (I did this with "comm"), but then I got a bit stuck. I think I'm just doing things in the most complicated way possible.
Any hint to do this in the "bash way"? It's OK if it is done in a script other than bash script.
If you don't mind renaming the files that already exist, GNU mv has the --backup option:
mv --backup=numbered * /some/other/dir
Here is a Bash script:
source="/some/dir"
dest="/another/dir"
find "$source" -maxdepth 1 -type f -printf "%f\n" | while IFS= read -r file
do
    suffix=
    if [[ -e "$dest/$file" ]]
    then
        suffix=".new"
    fi
    # to make active, comment out the next line and uncomment the line below it
    echo 'mv' "\"$source/$file\"" "\"$dest/$file$suffix\""
    # mv "$source/$file" "$dest/$file$suffix"
done
The suffix is added blindly. If files named like "foo.new" exist in both directories, the result will be one file named "foo.new" and a second named "foo.new.new". That might look silly, but it is correct in that nothing is overwritten. However, if the destination already contains "foo.new.new" (and "foo.new" is in both source and destination), then "foo.new.new" will be overwritten.
You can change the if above to a loop in order to deal with that situation. This version also preserves extensions:
source="/some/dir"
dest="/another/dir"
find "$source" -maxdepth 1 -type f -printf "%f\n" | while IFS= read -r file
do
    suffix=
    count=
    ext=
    base="${file%.*}"
    if [[ $file =~ \. ]]
    then
        ext=".${file##*.}"
    fi
    # try base.ext, base.1.ext, base.2.ext, ... until the name is free
    while [[ -e "$dest/$base$suffix$count$ext" ]]
    do
        (( count+=1 ))
        suffix="."
    done
    # to make active, comment out the next line and uncomment the line below it
    echo 'mv' "\"$source/$file\"" "\"$dest/$base$suffix$count$ext\""
    # mv "$source/$file" "$dest/$base$suffix$count$ext"
done
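The naming logic from the loop above can be isolated in a function, which makes it easy to test and reuse with any mv invocation. A minimal sketch; unique_name is a hypothetical name. Like the loop, it inserts .1, .2, ... before the last extension only.

```shell
# unique_name DEST FILE (hypothetical helper): print a path under DEST for FILE
# that does not exist yet, e.g. f2.txt -> f2.1.txt -> f2.2.txt, z1 -> z1.1
unique_name() {
    local dest=$1 file=$2 base ext="" n=0 candidate
    base=${file%.*}
    [[ $file == *.* ]] && ext=".${file##*.}"
    candidate="$dest/$file"
    while [[ -e $candidate ]]; do
        (( n += 1 ))
        candidate="$dest/$base.$n$ext"
    done
    printf '%s\n' "$candidate"
}
```

Usage: mv -n -- "$source/$file" "$(unique_name "$dest" "$file")". The -n flag keeps mv from overwriting a file that appears between the check and the move.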
Since the OP said this can be Perl, not just Bash, here we go:
NEW SOLUTION: (paying attention to extension)
~/junk/a1$ ls
f1.txt f2.txt f3.txt z1 z2
~/junk/a1$ ls ../a2
f1.txt f2.1.txt f2.2.txt f2.3.txt f2.txt z1
# I split the one-liner into multiple lines for readability
$ perl5.8 -e '{use strict; use warnings; use File::Copy; use File::Basename;
my @files = glob("*"); # assume current directory
foreach my $file (@files) {
    my $file_base2 = basename($file);
    my ($file_base, $ext) = ($file_base2 =~ /(.+?)([.][^.]+$)?$/);
    $ext = "" unless defined $ext;   # files without an extension, e.g. z1
    my $new_file_base = "../a2/$file_base";
    my $new_file = $new_file_base . $ext;
    my $counter = 1;
    while (-e $new_file) {
        $new_file = "$new_file_base." . $counter++ . $ext;
    }
    copy($file, $new_file)
        || die "could not copy $file to $new_file: $!\n";
} }'
~/junk/a1> ls ../a2
f1.1.txt f1.txt f2.1.txt f2.2.txt f2.3.txt f2.4.txt f2.txt f3.txt
z1 z1.1 z2
OLD SOLUTION: (not paying attention to extension)
~/junk/a1$ ls
f1 f2 f3
~/junk/a1$ ls ../a2
f1 f2 f2.1 f2.2 f2.3
# I split the one-liner into multiple lines for readability
$ perl5.8 -e '{use strict; use warnings; use File::Copy; use File::Basename;
my @files = glob("*"); # assume current directory
foreach my $file (@files) {
    my $file_base = basename($file);
    my $new_file_base = "../a2/$file_base";
    my $new_file = $new_file_base;
    my $counter = 1;
    while (-e $new_file) { $new_file = "$new_file_base." . $counter++; }
    copy($file,$new_file)
        || die "could not copy $file to $new_file: $!\n";
} }'
~/junk/a1> ls ../a2
f1 f1.1 f2 f2.1 f2.2 f2.3 f2.4 f3
I feel bad posting this without testing it, but it is late and I have work in the morning. My attempt would look something like this:
## move files from src to dst
## inserting ~XX into any name between base and extension
## where a name collision would occur
src="$1"
dst="$2"
case "$dst" in
    /*) :;;                 # absolute dest is fine
    *) dst=$(pwd)/$dst;;    # relative needs to be fixed up
esac
cd "$src" || exit 1
find . -type f | while IFS= read -r x; do
    x=${x#./}               # trim off the ./
    t=$x                    # initial target
    d=$(dirname "$x")       # relative directory
    b=$(basename "$x")      # initial basename
    case "$b" in
        *.*) ext=.${b##*.}; bb=${b%.*};;   # extension (with dot), basename without it
        *)   ext=;          bb=$b;;        # no extension
    esac
    zz=0                    # initial numeric
    while [ -e "$dst/$t" ]; do
        # target exists, so try constructing a new target name
        t="$d/$bb~$zz$ext"
        zz=$((zz + 1))
    done
    echo mv "./$x" "$dst/$t"
done
Overall the strategy is to get each name from the source path, break it into parts, and, for any collision, iterate over names of the form "base~XX.extension" until we find one that doesn't collide.
Obviously I have prepended the mv command with an echo because I'm a coward. Remove that at your own (files') peril.
If you don't need an incremental suffix, rsync can do the job:
rsync --archive --backup --suffix=.sic src/ dst
Update:
This version uses find, sed and sort to manage numbered backup files:
#!/bin/bash
src="${1}"
dst="${2}"
if test ! -d "${src}" -o ! -d "${dst}" ;then
    echo Usage: $0 SRC_DIR DST_DIR >&2
    exit 1
fi
# rsync renames each overwritten destination file to NAME~ (its default backup suffix)
rsync --archive --backup "${src}/" "${dst}/"
new_name() {
    local dst=$1 prefix=$2 suffix=$3 max
    # highest N among existing PREFIX.N.SUFFIX files (empty if there are none)
    max=$(find "$dst" -type f -regex ".*${prefix}\.[0-9][0-9]*\.${suffix}\$" \
        | sed 's/.*\.\([0-9]*\)\..*/\1/' | sort -n | tail -n 1)
    echo "${prefix}.$((max + 1)).${suffix}"
}
# swap BACKUP-extension/real-extension (assumes every file has an extension)
find "$dst" -name "*~" | while IFS= read -r backup_file; do
    file=${backup_file%\~}
    prefix=${file%.*}
    suffix=${file##*.}
    mv -- "$backup_file" "$(new_name "$dst" "$prefix" "$suffix")"
done