For example, given a phone number like 9191234567, how can I split it into two parts, with the first part containing the three leading digits 919 and the second containing the remaining seven digits 1234567? After that, I want to store the two parts in two different variables in ksh.
Can this be done with sed?
You could try this :
echo "9191234567" | sed 's/^\([0-9]\{3\}\)\([0-9]\{7\}\)$/\1 \2/'
To store each part in a separate variable, you could do this :
phone="9191234567"
part1=$(echo $phone | sed 's/^\([0-9]\{3\}\)[0-9]\{7\}$/\1/')
part2=$(echo $phone | sed 's/^[0-9]\{3\}\([0-9]\{7\}\)$/\1/')
Or even more concise :
read part1 part2 <<< $(echo "9191234567" | sed 's/^\([0-9]\{3\}\)\([0-9]\{7\}\)$/\1 \2/')
cut should work
echo '9191234567' | cut --characters 1-3,4- --output-delimiter ' '
919 1234567
echo 9191234567 | sed 's/^\([0-9]\{3\}\)\([0-9]*\)/\1-\2/'
Will print 919-1234567
Using bash
$ phone=9191234567
$ regex="^([0-9]{3})([0-9]{7})$"
$ [[ $phone =~ $regex ]] && part1="${BASH_REMATCH[1]}" && part2="${BASH_REMATCH[2]}"
$ echo $part1
919
$ echo $part2
1234567
Pure ksh, take number, print as two separate strings, separated by white space.
function split_at_third {
typeset number=$1 a b
b=${number#???} && a=${number%$b}
print $a $b
}
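The sed and bash answers also check that the input really is ten digits, which the parameter-expansion version does not. A minimal sketch of how that check could be added with a plain case statement, assuming POSIX-style parameter expansion (the names phone, area and rest are illustrative only):

```shell
# Sketch only: the variable names are illustrative, not taken from the answers above.
phone=9191234567

case $phone in
    *[!0-9]*|'')
        # Empty, or contains something other than a digit
        echo "not a digit string: $phone" >&2
        ;;
    ??????????)
        rest=${phone#???}        # drop the first three characters -> 1234567
        area=${phone%"$rest"}    # drop that remainder from the end -> 919
        printf '%s %s\n' "$area" "$rest"
        ;;
    *)
        echo "expected 10 digits: $phone" >&2
        ;;
esac
```

No external process is started, so this should behave the same in ksh, bash and other POSIX shells.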
I have a problem with replacing a string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the Svc field (everything up to the next |) and swap it with the Stm field (everything up to the next |).
My attempts so far only replaced individual characters, which is not my goal.
awk -F'|' -v OFS='|' '{a=b=0;
for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
t=$a;$a=$b;$b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
The code exchanges the Stm and Svc columns, no matter which one comes first.
If a Perl solution is okay (this assumes exactly one column matches each search term):
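For readers who find the one-liner dense, here is a spelled-out sketch of the same idea. The function name swap_stm_svc is hypothetical, and the guard that leaves a line untouched when either field is missing is an addition, not part of the original answer:

```shell
# Hypothetical wrapper: reads lines on stdin, swaps the Stm= and Svc= fields.
swap_stm_svc() {
    awk -F'|' -v OFS='|' '
    {
        a = b = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Stm=/) a = i    # remember where Stm= sits
            if ($i ~ /^Svc=/) b = i    # remember where Svc= sits
        }
        # Swap only when both fields were found (added safeguard)
        if (a && b) { t = $a; $a = $b; $b = t }
        print
    }'
}

printf '%s\n' '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | swap_stm_svc
```

Because the positions are looked up by name, the order of Stm and Svc in the input does not matter.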
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", @F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane split input line on |, see also Perl flags -pe, -pi, -p, -w, -d, -i, -t?
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F get index of columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy How can I swap two Perl variables
print join "|", @F print the modified array
You need to use capture groups and backreferences in a string substitution.
The following swaps the two fields:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm=[^|]*\)\(.*|\)\(Svc=[^|]*\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the fields appear in the opposite order.
The format of MAC addresses varies with the platform.
E.g. on HPUX I could get something like:
0:0:c:7:ac:1e
While Linux gives me
00:00:0c:07:ac:1e
I used to use awk in a Korn shell script on CentOS 5 to format this to 00000c07ac1e, as shown below.
MAC="0:0:c:7:ac:1e"
echo $MAC | awk -F: '{printf( "%02s%02s%02s%02s%02s%02s\n", $1,$2,$3,$4,$5,$6)}'
Unfortunately, our admin server is now Ubuntu 14 LTS, with a newer version of awk that no longer supports zero padding in the %s format, and I get an undesired 0 0 c 7ac1e
So I now switched to perl and do:
echo $MAC | perl -ne '{@A=split(":"); printf( "%02s%02s%02s%02s%02s%02s", @A)}'
As this may break too in upcoming releases I am looking for a more robust but still compact way to format the string.
Your Perl snippet will not break in future releases. This is basic functionality. Changing it would break many, many programs. (Plus, Perl has a mechanism for introducing backwards-incompatible changes without breaking existing programs.)
Cleaned up:
echo "$MAC" | perl -ne'@F=split(/:/); printf("%02s%02s%02s%02s%02s%02s\n", @F)'
Shorter:
echo "$MAC" | perl -ne'printf "%02s%02s%02s%02s%02s%02s\n", split /:/'
Without the repetition:
echo "$MAC" | perl -ple'$_ = join "", map sprintf("%02s", $_), split /:/'
There's -a if you want something more awkish:
echo "$MAC" | perl -F: -aple'$_ = join "", map sprintf("%02s", $_), @F'
A bit long, but it should be pretty robust:
awk -F: '{for(i=1;i<=NF;i++){while(length($i)<2)$i=0$i;printf "%s",$i;}print ""}'
How it works
1. Loop through the fields.
2. While the field is less than 2 characters long, add zeros to the front.
3. Print the field.
4. Print a newline character at the end.
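The same four steps can also be sketched without awk or Perl, using only POSIX shell features, which sidesteps the question of which awk version is installed. The function name pad_mac and its variables are hypothetical:

```shell
# Hypothetical helper: prints the MAC given as $1 with every field
# left-padded with zeros to two characters and the colons removed.
pad_mac() {
    mac=$1
    out=
    old_ifs=$IFS
    IFS=:
    set -f                  # no filename expansion while splitting
    set -- $mac             # unquoted on purpose: split on colons
    set +f
    IFS=$old_ifs
    for field in "$@"; do                    # 1. loop through the fields
        while [ "${#field}" -lt 2 ]; do      # 2. pad short fields
            field=0$field
        done
        out=$out$field                       # 3. collect the field
    done
    printf '%s\n' "$out"                     # 4. newline at the end
}

pad_mac "0:0:c:7:ac:1e"
```

The variables are not local (plain POSIX sh has no local), so in a larger script it may be safer to call the function inside a command substitution.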
If you were dealing with a number rather than hex, you could use %.Xd to indicate you want at least X digits.
$ awk -F: '{printf( "%.2d%.2d\n", $1, $2)}' <<< "0:23"
0023
^^
two digits
From The GNU Awk User’s Guide #5.5.3 Modifiers for printf Formats:
.prec
A period followed by an integer constant specifies the precision to
use when printing. The meaning of the precision varies by control
letter:
%d, %i, %o, %u, %x, %X
Minimum number of digits to print.
In this case, you need a more general approach to deal with each one of the blocks of the MAC address. You can loop through the elements and add a 0 in case their length is just 1:
awk -F: '{for (i=1;i<=NF;i++) #loop through the elements
{
if (length($i)==1) #if length is 1
printf("0") #add a 0
printf ("%s", $i) #print the rest
}
print "" #print a new line at the end
}' <<< "0:0:c:7:ac:1e"
This returns:
00000c07ac1e
^^ ^^ ^^
^^ ^^ ^^
Note awk '...' <<< "$MAC" is the same as echo "$MAC" | awk '...'.
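The %.Xd precision shown earlier applies to decimal numbers. For hex fields, one option is to let the shell's own printf do the numeric conversion, since it accepts 0x-prefixed arguments for %x. A sketch under that assumption; normalize_mac is a hypothetical name, and note that, unlike pure string padding, this also lowercases the digits:

```shell
# Hypothetical helper: treats each field as a hex number and reprints it
# as two lowercase hex digits, with the colons removed.
normalize_mac() {
    mac=$1
    out=
    old_ifs=$IFS
    IFS=:
    set -f                  # no filename expansion while splitting
    set -- $mac             # unquoted on purpose: split on colons
    set +f
    IFS=$old_ifs
    for field in "$@"; do
        # printf fails on non-hex input; stop instead of printing garbage
        out=$out$(printf '%02x' "0x$field") || return 1
    done
    printf '%s\n' "$out"
}

normalize_mac "0:0:c:7:ac:1e"
```

Whether normalizing the case is desirable depends on what consumes the result, so string padding may still be the better fit when the input must be preserved exactly.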
I am trying to numerically sort a series of files output by the ls command which match the pattern either ABCDE1234A1789.RST.txt or ABCDE12345A1789.RST.txt by the '789' field.
In the example patterns above, ABCDE is the same for all files, 1234 or 12345 are digits that vary but are always either 4 or 5 digits in length. A1 is the same length for all files, but value can vary so unfortunately it can't be used as a delimiter. Everything after the first . is the same for all files. Something like:
ls -l *.RST.txt | sort -k +9.13 | awk '{print $9} ' > file-list.txt
will match the shorter filenames but not the longer ones because of the variable length of characters before the field I want to sort by.
Is there a way to accomplish sorting all files without first padding the shorter-length files to make them all the same length?
Perl to the rescue!
perl -e 'print "$_\n" for sort { substr($a, -11, 3) cmp substr($b, -11, 3) } glob "*.RST.txt"'
If your perl is more recent (5.10 or newer), you can shorten it to
perl -E 'say for sort { substr($a, -11, 3) cmp substr($b, -11, 3) } glob "*.RST.txt"'
Because of the parts of the filename which you've identified as unchanging, you can actually build a key which sort will use:
$ echo ABCDE{99999,8765,9876,345,654,23,21,2,3}A1789.RST.txt \
| fmt -w1 \
| sort -tE -k2,2n --debug
ABCDE2A1789.RST.txt
_
___________________
ABCDE3A1789.RST.txt
_
___________________
ABCDE21A1789.RST.txt
__
etc.
What this does is tell sort to separate the fields on character E, then use the 2nd field numerically. --debug arrived in coreutils 8.6, and can be very helpful in seeing exactly what sort is doing.
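The same idea of handing sort an explicit key can be aimed at the three digits just before .RST.txt, which is the field the question asks about. A sketch, assuming one filename per line with no embedded newlines; the function name sort_by_suffix_digits is hypothetical:

```shell
# Hypothetical helper: reads filenames on stdin, one per line, and sorts
# them by the three digits immediately before ".RST.txt".
sort_by_suffix_digits() {
    # Prefix each name with its key, sort on the key, then drop the key.
    sed 's/^.*\([0-9]\{3\}\)\.RST\.txt$/\1 &/' |
        sort -k1,1n |
        cut -d' ' -f2-
}

printf '%s\n' ABCDE1234A1789.RST.txt ABCDE12345A1123.RST.txt ABCDE9876A2456.RST.txt |
    sort_by_suffix_digits
```

Because the key is anchored to the end of the name, the variable length of the leading digits no longer matters.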
The conventional way to do this in bash is to extract your sort field. Except for the sort command, the following is implemented in pure bash alone:
sort_names_by_first_num() {
shopt -s extglob
for f; do
first_num="${f##+([^0-9])}";
first_num=${first_num%%[^0-9]*};
[[ $first_num ]] && printf '%s\t%s\n' "$first_num" "$f"
done | sort -n | while IFS='' read -r name; do name=${name#*$'\t'}; printf '%s\n' "$name"; done
}
sort_names_by_first_num *.RST.txt
That said, newline-delimiting filenames (as this question seems to call for) is a bad practice: Filenames on UNIX filesystems are allowed to contain newlines within their names, so separating them by newlines within a list means your list is unable to contain a substantial subset of the range of valid names. It's much better practice to NUL-delimit your lists. Doing that would look like so:
sort_names_by_first_num() {
shopt -s extglob
for f; do
first_num="${f##+([^0-9])}";
first_num=${first_num%%[^0-9]*};
[[ $first_num ]] && printf '%s\t%s\0' "$first_num" "$f"
done | sort -n -z | while IFS='' read -r -d '' name; do name=${name#*$'\t'}; printf '%s\0' "$name"; done
}
sort_names_by_first_num *.RST.txt
I'm trying to extract data/urls (in this case - someurl) from a file that contains them within some tag ie.
xyz>someurl>xyz
I don't mind using either awk or sed.
I think the best, easiest, way is with cut:
$ echo "xyz>someurl>xyz" | cut -d'>' -f2
someurl
With awk can be done like:
$ echo "xyz>someurl>xyz" | awk 'BEGIN { FS = ">" } ; { print $2 }'
someurl
And with sed is a little bit more tricky:
$ echo "xyz>someurl>xyz" | sed 's/\(.*\)>\(.*\)>\(.*\)/\2/g'
someurl
We capture three blocks, something1>something2>something3, and print the second one.
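When the line is already in a shell variable, the middle block can also be taken with parameter expansion alone, without starting cut, awk or sed. A small sketch; the variable names line, rest and url are illustrative:

```shell
# Sketch only: variable names are illustrative.
line='xyz>someurl>xyz'

rest=${line#*>}     # drop everything up to and including the first '>'
url=${rest%%>*}     # drop everything from the next '>' onwards
printf '%s\n' "$url"
```

This picks the text between the first and second '>' only, so it assumes the URL itself contains no '>'.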
grep was born to extract things:
kent$ echo "xyz>someurl>xyz"|grep -Po '>\K[^>]*(?=>)'
someurl
you could kill a fly with a bomb of course:
kent$ echo "xyz>someurl>xyz"|awk -F\> '$0=$2'
someurl
If your grep supports the -P option, you can use lookahead and lookbehind assertions to identify the URL.
$ echo "xyz>someurl>xyz" | grep -oP '(?<=xyz>).*(?=>xyz)'
someurl
This is just a sample to get you started not the final answer.
I have a value such as james,adam,john and I am trying to turn it into James,Adam,John (the first character of each name should be uppercase).
echo 'james,adam,john' | sed 's/\<./\u&/g'
does not work on all systems: it works on one system but not on another.
A="james adam john"
B=( $A )
echo "${B[@]^}"
This throws a syntax error, so I am currently doing it with a long while loop, which is too lengthy.
Is there a shorter way to do this?
There are many ways to define "beginning of a name". This method chooses any letter after a word boundary and transforms it to upper case. As a side effect, this will also work with names such as "Sue Ellen", or "Billy-Bob".
echo "james,adam,john" | perl -pe 's/(\b\pL)/\U$1/g'
With Perl:
echo "james,adam,john" | \
perl -ne 'print join(",", map{ ucfirst } split(/,/))'
You can use awk like this to capitalize first letter of every word in your input:
echo "james,adam,john" | awk 'BEGIN { RS=","; FS=""; ORS=","; OFS=""; }
{ $1=toupper($1); print $0; }'
OUTPUT
James,Adam,John
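A variant of the awk approach that keeps the default record separator and does not rely on an empty FS, whose behavior POSIX leaves unspecified. It is only a sketch: the function name capitalize_names is hypothetical, and it uppercases just the first character of each comma-separated name:

```shell
# Hypothetical helper: reads comma-separated names on stdin and
# uppercases the first character of each name.
capitalize_names() {
    awk -F, -v OFS=, '{
        for (i = 1; i <= NF; i++)
            $i = toupper(substr($i, 1, 1)) substr($i, 2)
        print
    }'
}

printf '%s\n' 'james,adam,john' | capitalize_names
```

Only toupper and substr are used, so this should also work on systems where the sed \u extension is unavailable.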
Same method as TLP but with GNU sed:
echo "james,adam,john,sue ellen,billy-bob" | sed -r 's/\b(.)/\u\1/g'
output:
James,Adam,John,Sue Ellen,Billy-Bob
If only the first letter of each comma-separated name should be capitalized, use this instead:
echo "james,adam,john,sue ellen,billy-bob" | sed 's/[^,]*/\u&/g'
output:
James,Adam,John,Sue ellen,Billy-bob