iterating a list of strings with whitespace in POSIX sh

I have two lists of strings (each string contains whitespace) and want to read them element by element. Both lists have the same length, so I take the indices of one of them via "${!argument[@]}" and try to read the corresponding elements of both lists. But it fails:
arguments="--a 100 --b 200" "--a 100 --b 200 --c"
settings="without_c" "with_c"
for index in "${!argument[@]}"
do
setting=${setting[$index]}
argument=${argument[$index]}
done
Gives the following error:
alp@ubuntu:~$ sh toy.sh
toy.sh: 1: toy.sh: --a 100 --b 200 --c: not found
toy.sh: 2: toy.sh: with_c: not found
toy.sh: 3: toy.sh: Bad substitution

Your first line reads
arguments="--a 100 --b 200" "--a 100 --b 200 --c"
This is a special case of the following structure:
V=X P
(with V being arguments, X being "--a 100 --b 200" and P being --a 100 --b 200 --c).
The semantics of such a statement is to execute the program P in an environment where the environment variable V is set to X.
In your case, it means that you ask the shell to execute the "program" --a 100 --b 200 --c and such a program does not exist. This is what the error message says.
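A minimal sketch of the V=X P form described above, using an arbitrary variable name DEMO_VAR (not from the question): the assignment only affects the environment of the single command that follows it. The first line of the question has exactly this shape, except that its "command" is the string --a 100 --b 200 --c, which does not exist.

```sh
#!/bin/sh
# V=X P: run program P with environment variable V set to X, for that
# one command only. DEMO_VAR is an arbitrary name chosen for this sketch.
DEMO_VAR=hello sh -c 'echo "child sees: $DEMO_VAR"'
# Back in the calling shell the assignment has not persisted
# (assuming DEMO_VAR was not already set beforehand).
echo "caller sees: ${DEMO_VAR:-<unset>}"
```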
In your title, you say that you want to process a list, but you are using a programming language (POSIX shell) that does not support lists; the only list-like structure it has is the positional parameters ($1, $2, ...). You are not using bash, as you claim in your comment, because if you were using bash, the error message would be different.
Of course, even in bash, the first line would be incorrect, because an array assignment in bash (what you call a list is called an array in bash) follows the syntax name=(val1 val2 ...).
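For comparison, here is a sketch of what the loop from the question could look like in bash (not POSIX sh), assuming the variable names are made consistent (arguments and settings). It uses the name=(val1 val2 ...) syntax and "${!name[@]}" to get the indices:

```bash
#!/bin/bash
# bash only: two parallel arrays, walked by index.
arguments=("--a 100 --b 200" "--a 100 --b 200 --c")
settings=("without_c" "with_c")
for index in "${!arguments[@]}"      # expands to the indices 0 1
do
    setting=${settings[$index]}
    argument=${arguments[$index]}
    printf '%s: %s\n' "$setting" "$argument"
done
```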

I like to do "mock" 2 dimensional arrays in POSIX sh (/bin/sh) using something like the following. Note, you'll need to pick a delimiter (in this case |) that is not used in your strings.
#!/bin/sh
## Save this script as testloop.sh
for STRING in "red|little corvette" "green|hip, funky onions" "blue|high, high moon"
do
IFS='|' read -r COLOUR THING <<-_EOF_
$STRING
_EOF_
echo "The colour of the $THING is $COLOUR"
done
When you run that, you get:
sh testloop.sh
The colour of the little corvette is red
The colour of the hip, funky onions is green
The colour of the high, high moon is blue
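The same "mock 2D array" records can also be split without read and a here-document, using POSIX parameter expansion. This is a sketch that assumes exactly one | per record (print_colours is a hypothetical name):

```sh
#!/bin/sh
# Split each "colour|thing" record with parameter expansion.
print_colours() {
    for STRING in "red|little corvette" "green|hip, funky onions" "blue|high, high moon"
    do
        COLOUR=${STRING%%"|"*}   # text before the first '|'
        THING=${STRING#*"|"}     # text after the first '|'
        echo "The colour of the $THING is $COLOUR"
    done
}
print_colours
```

The quoted "|" keeps the delimiter literal inside the pattern; with more than two fields you would repeat the two expansions on the remainder.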
For your example, you would use something similar:
#!/bin/sh
## Save this script as loopArguments.sh
for STRING in "--a 100 --b 200|without_c|1st" "--a 100 --b 200 --c|with_c|2nd" "--a 300 --b 400|without_c|3rd"
do
IFS='|' read -r LINE CSTATE NUMBER <<-_EOF_
$STRING
_EOF_
echo "The $NUMBER line is $LINE and it is $CSTATE"
done
which will do this:
sh loopArguments.sh
The 1st line is --a 100 --b 200 and it is without_c
The 2nd line is --a 100 --b 200 --c and it is with_c
The 3rd line is --a 300 --b 400 and it is without_c
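If you would rather keep the two lists separate, as in the question, one POSIX way is to put one list in the positional parameters and iterate the other with for, using shift to advance the first list in step. A sketch, assuming both lists have the same length (walk_pairs is a hypothetical name; the values come from the question):

```sh
#!/bin/sh
# Walk two "lists" in lockstep without arrays (POSIX sh).
walk_pairs() {
    set -- "without_c" "with_c"          # settings list -> $1, $2, ...
    for ARGS in "--a 100 --b 200" "--a 100 --b 200 --c"
    do
        SETTING=$1                       # current setting
        shift                            # advance to the next setting
        printf '%s: %s\n' "$SETTING" "$ARGS"
    done
}
walk_pairs
```

Inside a function, set -- only replaces the function's own positional parameters, so the script's arguments are left untouched.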

Related

sh: can't return one result after comparing 2 files

As an example, I'll use different inputs to keep my files private and to avoid long text; they have the following form:
INPUT1.cfg:
TC # aa # D317
TC # bb # D314
TC # cc # D315
TC # dd # D316
INPUT2.cfg:
BL;nn;3
LY;ww;3
LO;xx;3
TC;vv;3
TC;dd;3
OD;pp;3
TC;aa;3
What I want to do is iterate over the names (column 2) in the rows of input1 and compare each one with the names (column 2) in the rows of input2. If they match, the line from INPUT2 should be written to an output file; otherwise the script should report that the table is not found. Here is my attempt:
#!/bin/bash
input1="input1.cfg";
input2="input2.cfg"
cat $input1|while read line
do
TableNameIN=`echo $line|cut -d"#" -f2`
cat $input2| while read line
do
TableNameOUT=`echo $line|cut -d";" -f2`
if echo "$TableNameOUT" | grep -q $TableNameIN;
then echo "$line" >> output.txt
else
echo "Table $TableNameIN not found"
fi
done
done
This is what I get as a result:
Table bb not found
Table bb not found
Table bb not found
Table cc not found
Table cc not found
Table cc not found
I manage to write the matching lines, but the problem with my code is that it prints "table not found" for each row, whereas I only want to print it once, after all the lines have been compared.
Here is the output I want to get:
Table bb not found
Table cc not found
Can anyone help me with this? PS: I don't want to use awk, because this is only part of my code and I already use sh.
Assumptions:
for file input2.cfg, the 2nd column (table name) is unique
input2.cfg is not so large that we run the risk of using up all memory by storing input2.cfg in an associative array (otherwise we could store the table names from input1.cfg, assuming this is the smaller file, in the array and swap the processing order of the two files)
there are no explicit requirements for the data to be sorted (otherwise we may need to add a sort or two)
a bash solution is sufficient (based on the #!/bin/bash shebang in the OP's current code)
There are many ways to slice and dice this one (awk being my preference, but the OP doesn't want to use awk). For this particular answer I'll pull the awk steps out into separate bash commands.
NOTE: While we could use a set of nested loops (as in the OP's code), I've opted to use an associative array to store input2.cfg, thus eliminating the need to repeatedly scan input2.cfg.
#!/usr/bin/env bash
input1=input1.cfg
input2=input2.cfg
> output.txt # clear out the target file
# load ${input2} into an associative array
unset lines
typeset -A lines # associative array for storing contents of ${input2}
while read -r line
do
x="${line%;*}" # use parameter expansion
tabname="${x#*;}" # to parse out table name
lines["${tabname}"]="${line}" # add to array
done < "${input2}"
# process ${input1}
while read -r c1 c2 tabname rest_of_line
do
[[ -v lines["${tabname}"] ]] && # if tabname has an entry in our array
echo "${lines[${tabname}]}" >> output.txt && # then dump the associated line (from ${input2}) to output.txt
continue # process next line from ${input1}
echo "Table ${tabname} not found" # otherwise print 'not found' message
done < "${input1}"
# display contents of output.txt
echo "++++++++++++++++ output.txt"
cat output.txt
echo "++++++++++++++++"
This generates the following:
Table bb not found
Table cc not found
++++++++++++++++ output.txt
TC;aa;3
TC;dd;3
++++++++++++++++
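Since the question title says sh, here is a POSIX sh sketch that keeps the OP's nested-loop structure but fixes the repeated message with a found flag: the "not found" line is printed once per table, only after the whole of input2 has been scanned. The function name compare_tables is hypothetical, and the input formats are assumed to match the samples in the question.

```sh
#!/bin/sh
# Usage (hypothetical): compare_tables input1.cfg input2.cfg output.txt
# input1 lines look like "TC # aa # D317", input2 lines like "TC;aa;3".
compare_tables() {
    : > "$3"                                    # truncate the output file
    while IFS='#' read -r _ name _              # split "TC # aa # D317" on '#'
    do
        name=$(printf '%s' "$name" | tr -d ' ') # " aa " -> "aa"
        found=0                                 # reset the flag for each table
        while IFS= read -r line2
        do
            name2=${line2#*;}                   # "TC;aa;3" -> "aa;3"
            name2=${name2%%;*}                  # "aa;3"    -> "aa"
            if [ "$name2" = "$name" ]; then
                printf '%s\n' "$line2" >> "$3"
                found=1
            fi
        done < "$2"
        # Report once per table, after input2 has been fully scanned.
        [ "$found" -eq 1 ] || echo "Table $name not found"
    done < "$1"
}
```

Reading the files with < redirections instead of cat | while keeps the loops in the current shell, so found is still set when it is tested. Unlike the associative-array version, this rescans input2 once per table, which is fine for small files.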

Textscan from end of line

I am trying to read a very large file. Each line contains some integers, but the number of integers per line is not known. I just want to extract the last n items, and I couldn't find the right syntax for this.
Example:
lineA='10 200 300 400 500';
lineB='300 400 500 550';
pA=textscan(lineA,'%u %u %u');
pB=textscan(lineB,'%u %u %u');
The results should be:
pA={[300]} {[400]} {[500]}
pB={[400]} {[500]} {[550]}
Currently I don't know the size of each line, and I want to avoid having to find it out. In this example I just read single lines, but in my actual script I read a file with 10e6 lines using the syntax textscan(fid,format,10e6).

Variable not being recognized after "read"

Edit: Resolved. See answer.
Background:
I'm writing a shell script that performs some extra actions required on our system when someone resizes a database.
The script is written in ksh (a requirement), and the OS is Solaris 5.10.
The problem is with one of the checks, which verifies there's enough free space on the underlying OS.
Problem:
The check reads the df -k line for root, which is what this step checks, and writes it to a file. I then "read" the contents into variables that I use in calculations.
Unfortunately, when I try to run an arithmetic operation on one of the variables, I get an error indicating that it is null, and a debug output line I've placed after that line confirms that it is null... It lost its value.
I've tried every method of doing this I could find online; they work when I run them manually, but not inside the script file.
(* The file does have #!/usr/bin/ksh)
Code:
df -k | grep "rpool/ROOT" > dftest.out
RPOOL_NAME=""; declare -i TOTAL_SIZE=0; USED_SPACE=0; AVAILABLE_SPACE=0; AVAILABLE_PERCENT=0; RSIGN=""
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
This is the result:
DBResize.sh[11]: TOTAL_SIZE=/1024: syntax error
I'm tearing my hair out at this point; any help would be appreciated.
The code you posted cannot produce the output you posted. Most obviously, the error is signalled at line 11 but you posted fewer than 11 lines of code. The previous lines may matter. Always post complete code when you ask for help.
More concretely, the declare command doesn't exist in ksh, it's a bash thing. You can achieve the same result with typeset (declare is a bash equivalent to typeset, but not all options are the same). Either you're executing this script with bash, or there's another error message about declare, or you've defined some additional commands including declare which may change the behavior of this code.
None of this should have an impact on the particular problem that you're posting about, however. The variables created by read remain assigned until the end of the subshell, i.e. until the code hits a ), the end of a pipe (left-hand side of the pipe only in ksh), etc.
About the use of declare or typeset, note that you're only declaring TOTAL_SIZE as an integer. For the other variables, you're just assigning a value which happens to consist exclusively of digits. It doesn't matter for the code you posted, but it's probably not what you meant.
One thing that may be happening is that grep matches nothing, and therefore read reads an empty line. You should check for errors. Use set -e in scripts to exit at the first error. (There are cases where set -e doesn't catch errors, but it's a good start.)
Another thing that may be happening is that df is splitting its output onto multiple lines because the first column containing the filesystem name is too large. To prevent this splitting, pass the option -P.
Using a temporary file is fragile: the code may be executed in a read-only directory, another process may want to access the same file at the same time... Here a temporary file is useless. Just pipe directly into read. In ksh (unlike most other sh variants including bash), the right-hand side of a pipe runs in the main shell, so assignments to variables in the right-hand side of a pipe remain available in the following commands.
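If the script ever has to run under a shell other than ksh, a portable variant of "pipe directly into read" is to feed read from a here-document containing a command substitution: read then runs in the current shell in ksh, bash and dash alike. A sketch, where fake_df is a hypothetical stand-in for df -Pk | grep "rpool/ROOT" and the line it prints is made-up sample data:

```sh
#!/bin/sh
# fake_df stands in for: df -Pk | grep "rpool/ROOT"   (sample data only)
fake_df() {
    printf '%s\n' 'rpool/ROOT/example 1048576 524288 524288 50% /'
}
read -r RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN <<EOF
$(fake_df)
EOF
# If grep matched nothing, TOTAL_SIZE is empty: fail early instead of
# hitting an arithmetic syntax error later.
if [ -z "$TOTAL_SIZE" ]; then
    echo "no matching df line" >&2
    exit 1
fi
TOTAL_SIZE=$((TOTAL_SIZE / 1024))   # 1024-byte blocks -> MiB
echo "$RPOOL_NAME $TOTAL_SIZE"
```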
It doesn't matter in this particular script, but you can use a variable without $ in an arithmetic expression. Using $ substitutes a string, which can have confusing results: e.g. with a='1+2', $(($a*3)) expands to 7, because the shell evaluates 1+2*3. Not using $ uses the numerical value (in ksh, with a='1+2', $((a*3)) expands to 9; in some sh implementations you get an error because a's value is not numeric).
#!/usr/bin/ksh
set -e
typeset -i TOTAL_SIZE=0 USED_SPACE=0 AVAILABLE_SPACE=0 AVAILABLE_PERCENT=0
df -Pk | grep "rpool/ROOT" | read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=TOTAL_SIZE/1024))
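The point about $ in arithmetic expressions can be checked with a short sketch. The $ form behaves the same in any POSIX shell; the form without $ is only run under bash here, since (as noted above) some sh implementations reject a non-numeric value:

```sh
#!/bin/sh
a='1+2'
# With $, the string "1+2" is pasted into the expression first,
# so the shell evaluates 1+2*3 = 7 (multiplication binds tighter).
with_dollar=$(($a*3))
echo "with \$:    $with_dollar"
# Without $, the shell looks the variable up itself; bash (like ksh)
# evaluates the value "1+2" as an expression, giving 3*3 = 9.
if [ -n "${BASH_VERSION:-}" ]; then
    without_dollar=$((a*3))
    echo "without \$: $without_dollar"
fi
```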
Strange... when I remove your "declare" line, your original code seems to work perfectly well (at least with ksh on Linux).
The code:
#!/bin/ksh
df -k | grep "/home" > dftest.out
read RPOOL_NAME TOTAL_SIZE USED_SPACE AVAILABLE_SPACE AVAILABLE_PERCENT RSIGN < dftest.out
\rm dftest.out
echo $RPOOL_NAME $TOTAL_SIZE $USED_SPACE $AVAILABLE_SPACE $AVAILABLE_PERCENT $RSIGN
((TOTAL_SIZE=$TOTAL_SIZE/1024))
print $TOTAL_SIZE
The result:
32962416 5732492 25552588 19% /home
5598
These are the values that a plain df -k returns. The variables do keep their values.
For those interested, I have figured out that it is not possible to use "read" the way I was using it.
The variable values assigned by "read" simply "do not last".
To remedy this, I have applied the less-than-ideal solution of using the standard "while read" form: inside the loop, I echo selected variables into a variable file.
Once that file was created, I just "loaded" (sourced) it.
(pseudo code:)
LOOP START
echo "VAR_A=\"$VAR_A\"; VAR_B=\"$VAR_B\"" > somefile.out
LOOP END
. ./somefile.out

How to use fishshell to add numbers to files

I have a very simple mp3 player that plays audio files in the order of their file names, and the rule is that each file name must begin with a 3-digit number, such as:
001file.mp3
002file.mp3
003file.mp3
I want to write a fish shell function sortmp3 that adds numbers to the files in a directory. Say directory myfiles contains the files:
aaa.mp3
bbb.mp3
ccc.mp3
When I run sortmp3 myfiles, the file names will be changed to:
001aaa.mp3
002bbb.mp3
003ccc.mp3
But my questions are:
How do I generate sequential numbers?
How do I make sure each number has exactly 3 digits?
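I would write this, which makes no assumptions about how many files there are in a directory (it works on the files in the current directory, via set -l files *, so run it from inside the directory):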
function sortmp3
set -l files *
set -l i
for i in (seq (count $files))
echo mv $files[$i] (printf "%03d%s" $i $files[$i])
end
end
Remove the "echo" once you're happy with the commands it prints.
You can generate sequential numbers with the seq tool - an external program.
This will only take care of the first part, it won't pad to three characters.
To do that, there's a variety of choices:
printf '%s\n' 00(seq 0 99) | rev | cut -c 1-3 | rev
printf '%s\n' 00(seq 0 99) | sed 's/^.*\(...\)$/\1/'
The 00(seq 0 99) part generates the numbers "0" to "99" with two zeroes prepended, i.e. from "000" to "0099". The later parts of the pipeline keep only the last three characters, removing the superfluous zeroes again.
Or with the next fish version, you can use the new string tool:
string sub -s -3 -- 00(seq 0 99)
Depending on your specific situation, you can use the "seq" command to generate sequential numbers or the "math" command to increment a counter. To format a number with a predictable number of leading zeros, use the "printf" command:
set idx 12
printf '%03d' $idx
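For readers not using fish, the same printf '%03d' idea can be sketched in POSIX sh. This is a hypothetical counterpart (sortmp3_sh is not from the question) that, like the fish version with echo, only prints the mv commands; it relies on the shell sorting glob results alphabetically:

```sh
#!/bin/sh
# Dry run: print "mv old 00Nold" for each file in directory $1.
sortmp3_sh() {
    (
        cd "$1" || exit 1                 # subshell: caller's cwd unchanged
        i=1
        for f in *; do
            [ -e "$f" ] || continue       # empty dir: '*' stays literal
            printf 'mv %s %03d%s\n' "$f" "$i" "$f"
            i=$((i + 1))
        done
    )
}
```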

creating a hash with regex matches in perl

Let's say I have a file like the one below, and I want to store all the decimal numbers in a hash:
hello world 10 20
world 10 10 10 10 hello 20
hello 30 20 10 world 10
I was looking at this, and the following worked fine:
> perl -lne 'push @a,/\d+/g;END{print "@a"}' temp
10 20 10 10 10 10 20 30 20 10 10
Then what I need is to count the number of occurrences of each match.
For this, I think it would be better to store all the matches in a hash and increment a counter for each key.
So I tried:
perl -lne '$a{$1}++ for ($_=~/(\d+)/g);END{foreach(keys %a){print "$_.$a{$_}"}}' temp
which gives me an output of:
10.4
20.7
Can anybody tell me where I went wrong?
The output I expect is:
10.7
20.3
30.1
Although I can do this in awk, I would like to do it only in Perl.
Also, the order of the output is not a concern for me.
$a{$1}++ for ($_=~/(\d+)/g);
This should be
$a{$_}++ for ($_=~/(\d+)/g);
and can be simplified to
$a{$_}++ for /\d+/g;
The reason for this is that /\d+/g creates a list of matches, which is then iterated over by for. The current element is in $_. I imagine $1 would contain whatever was left in there by the last match, but it's definitely not what you want to use in this case.
Another option would be this:
$a{$1}++ while ($_=~/(\d+)/g);
This does what I think you expected your code to do: loop over each successful match as the matches happen. Thus the $1 will be what you think it is.
Just to be clear about the difference:
The single argument for loop in Perl means "do something for each element of a list":
for (@array)
{
#do something to each array element
}
So in your code, a list of matches was built first, and only after the whole list of matches was found did you have the opportunity to do something with the results. $1 got reset on each match as the list was being built, but by the time your code was run, it was set to the last match on that line. That is why your results didn't make sense.
On the other hand, a while loop means "check if this condition is true each time, and keep going until the condition is false". Therefore, the code in a while loop will be executed on each match of a regex, and $1 has the value for that match.
Another time this difference is important in Perl is file processing. for (<FILE>) { ... } reads the entire file into memory first, which is wasteful. It is recommended to use while (<FILE>) instead, because then you go through the file line by line and keep only the information you want.