Converting hex to decimal in awk or sed

I have a list of numbers, comma-separated:
123711184642,02,3583090366663629,639f02012437d4
123715942138,01,3538710295145500,639f02afd6c643
123711616258,02,3548370476972758,639f0200485732
I need to split the 4th column into three as below:
123711184642,02,3583090366663629,639f02,0124,37d4
123715942138,01,3538710295145500,639f02,afd6,c643
123711616258,02,3548370476972758,639f02,0048,5732
And convert the digits in the last two columns into decimal:
123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322

Here's a variation on Jonathan's answer:
awk $([[ $(awk --version) = GNU* ]] && echo --non-decimal-data) -F, '
BEGIN {OFS = FS}
{
$6 = sprintf("%d", "0x" substr($4, 11, 4))
$5 = sprintf("%d", "0x" substr($4, 7, 4))
$4 = substr($4, 1, 6)
print
}'
I included a rather contorted way of adding the --non-decimal-data option if it's needed.
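The same detection can be written as a separate, more readable step; a sketch (assuming only GNU Awk needs, and accepts, the flag):

```shell
# Pass --non-decimal-data only when the awk in PATH is GNU Awk;
# other implementations would reject the option.
nd=
awk --version 2>/dev/null | grep -q '^GNU' && nd=--non-decimal-data
printf '123711184642,02,3583090366663629,639f02012437d4\n' |
awk $nd -F, 'BEGIN {OFS = FS}
{
    $6 = sprintf("%d", "0x" substr($4, 11, 4))
    $5 = sprintf("%d", "0x" substr($4, 7, 4))
    $4 = substr($4, 1, 6)
    print
}'
```

With GNU Awk this prints the first sample line split and converted: 123711184642,02,3583090366663629,639f02,292,14292.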
Edit
Just for the heck of it, here's the pure-Bash equivalent:
saveIFS=$IFS
IFS=,
while read -r -a line
do
printf '%s,%s,%d,%d\n' "${line[*]:0:3}" "${line[3]:0:6}" "0x${line[3]:6:4}" "0x${line[3]:10:4}"
done
IFS=$saveIFS
The "${line[*]:0:3}" (quoted *) works similarly to AWK's OFS in that it causes Bash's IFS (here a comma) to be inserted between array elements on output. We can take further advantage of that feature by inserting array elements as follows which more closely parallels my AWK version above.
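The joining behaviour of the quoted * can be seen in isolation (a minimal sketch; the array name arr is just for illustration):

```shell
#!/bin/bash
# With a quoted *, Bash joins array elements using the first character
# of IFS, here a comma.
arr=(123711184642 02 3583090366663629)
IFS=,
echo "${arr[*]}"
# prints 123711184642,02,3583090366663629
```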
saveIFS=$IFS
IFS=,
while read -r -a line
do
line[6]=$(printf '%d' "0x${line[3]:10:4}")
line[5]=$(printf '%d' "0x${line[3]:6:4}")
line[4]=$(printf '%s' "${line[3]:0:6}")
printf '%s\n' "${line[*]}"
done
IFS=$saveIFS
Unfortunately, Bash doesn't allow printf -v (which is similar to sprintf()) to make assignments to array elements, so printf -v "line[6]" ... doesn't work.
Edit: As of Bash 4.1, printf -v can now make assignments to array elements. Example:
printf -v 'line[6]' '%d' "0x${line[3]:10:4}"
The quotes around the array reference are needed to prevent possible filename matching. If a file named "line6" existed in the current directory and the reference weren't quoted, printf would create (or update) a plain variable named line6 instead of the array element. Only the file's name comes into play, and only tangentially; its contents are irrelevant.
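A minimal sketch of the Bash 4.1 behaviour, using the sample record's fields:

```shell
#!/bin/bash
# Bash >= 4.1: printf -v can assign straight into an array element.
line=(123711184642 02 3583090366663629 639f02012437d4)
printf -v 'line[4]' '%d' "0x${line[3]:6:4}"   # hex 0124
echo "${line[4]}"                             # prints 292
```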

Foreword
In this answer I address converting hex numbers by AWK in general, not specifically in the case of the question.
In the following examples the first field (i.e. $1) of each record given to the interpreter is converted. Only hexadecimal digits are allowed in the input, not the "0x" prefix.
By GNU Awk, arbitrarily large hex values can be converted simply
If gawk is compiled with the GNU MPFR and GMP libraries, it can do arbitrary-precision arithmetic when the option -M is used.
gawk -M '{print strtonum("0x" $1)}'
By AWK portably
Using --non-decimal-data with gawk is not recommended according to the GNU Awk User's Guide. Using strtonum() is not portable either; as far as I know it is supported only by gawk. So let's look at the alternatives:
By user-defined function
Supposedly the most portable way of doing the conversion is with a user-defined awk function [reference]:
function parsehex(V,OUT)
{
if(V ~ /^0x/) V=substr(V,3);
for(N=1; N<=length(V); N++)
OUT=(OUT*16) + H[substr(V, N, 1)]
return(OUT)
}
BEGIN { for(N=0; N<16; N++)
{ H[sprintf("%x",N)]=N; H[sprintf("%X",N)]=N } }
{ print parsehex($1) }
Note: If your AWK interpreter supports only 32-bit integers, you can convert larger hex numbers by replacing return(OUT) with return(sprintf("%.0f", OUT)); I could convert 0xFFFFFFFFFFFFF = 2^52-1 this way. The function ignores a possible "0x" prefix.
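For instance, inlining the function (with OUT and N made local via extra parameters, an assumption for hygiene):

```shell
# Pure-POSIX awk: no interpreter hex support is needed, because the
# function does the base-16 arithmetic itself.
printf 'FF\n0x1A\n' | awk '
function parsehex(V, OUT, N) {
    if (V ~ /^0x/) V = substr(V, 3)
    for (N = 1; N <= length(V); N++)
        OUT = (OUT * 16) + H[substr(V, N, 1)]
    return OUT
}
BEGIN { for (N = 0; N < 16; N++) { H[sprintf("%x", N)] = N; H[sprintf("%X", N)] = N } }
{ print parsehex($1) }'
# prints 255 and 26
```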
By calling shell's printf
You could use this
awk '{cmd="printf %d 0x" $1; cmd | getline decimal; close(cmd); print decimal}'
but it is relatively slow, as it starts a subshell for every record. The following is faster if you have many newline-separated hexadecimal numbers to convert:
awk 'BEGIN{cmd="printf \"%d\n\""}{cmd=cmd " 0x" $1}END{while ((cmd | getline dec) > 0) { print dec }; close(cmd)}'
There might be a problem if very many arguments are accumulated for the single printf command.
These methods also limit how large a hex number can be converted. I could convert 0xFFFFFFFFFFFFFFF = 2^60-1 on my system.
By using AWK's printf (or sprintf)
In my experience the following works in Linux:
awk -Wposix '{ printf "%d\n", "0x" $1 }'
I tested it with gawk, mawk and original-awk on Ubuntu Linux 20.04. gawk requires -Wposix here. original-awk prints a warning message about the option, but you can hide it with the redirection 2>/dev/null in the shell. If you don't want to do that, you can pass -Wposix only to GNU Awk like this:
awk -Wversion 2>/dev/null | ( unset -v IFS; read -r word _; [ "$word" = GNU ] && exit 0 || exit 1 ) && posix_option="-Wposix" || posix_option=""
awk $posix_option '{ printf "%d\n", "0x" $1 }'
Note: Yet again, the implementation of your interpreter limits the maximum hex value that can be converted this way. E.g. mawk on my system has a maximum integer of 2147483647; this is reported on the standard error output of mawk -Wversion (at least for version 1.3.4). You can convert larger hex numbers by replacing printf "%d\n", "0x" $1 with printf "%.0f\n", "0x" $1; I could convert 0xFFFFFFFFFFFFF = 2^52-1 this way.

This seems to work:
awk -F, '{ p1 = substr($4, 1, 6);
p2 = ("0x" substr($4, 7, 4)) + 0;
p3 = ("0x" substr($4, 11, 4)) + 0;
printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, p2, p3;
}'
For your sample input data, it produces:
123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322
Concatenating '0x' with the 4-digit hex string and then adding 0 forces awk to treat the value as a hexadecimal number.
You can simplify this to:
awk -F, '{ p1 = substr($4, 1, 6);
p2 = "0x" substr($4, 7, 4);
p3 = "0x" substr($4, 11, 4);
printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, p2, p3;
}'
The strings prefixed with 0x are forced to integer when presented to printf() and the %d format.
The code above works beautifully with the native awk on MacOS X 10.6.5 (version 20070501); sadly, it does not work with GNU gawk 3.1.7. That, it seems, is permitted behaviour according to POSIX (see the comments below). However, gawk has a non-standard function strtonum that can be used to bludgeon it into performing correctly - pity that bludgeoning is necessary.
gawk -F, '{ p1 = substr($4, 1, 6);
p2 = "0x" substr($4, 7, 4);
p3 = "0x" substr($4, 11, 4);
printf "%s,%s,%s,%s,%d,%d\n", $1, $2, $3, p1, strtonum(p2), strtonum(p3);
}'

printf "%d\n", strtonum( "0x" $1 )

This might work for you (GNU sed & printf):
sed -r 's/(....)(....)$/ 0x\1 0x\2/;s/.*/printf "%s,%d,%d" &/e' file
Split off the last eight characters, prefix each four-character group with the hex identifier 0x, and then evaluate the whole line using printf.
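For example, on the first sample line (GNU sed only; the e flag hands the pattern space to the shell):

```shell
# The first substitution turns the line into a printf command, and the
# e flag executes it; printf(1) understands the 0x prefix.
echo '123711184642,02,3583090366663629,639f02012437d4' |
sed -r 's/(....)(....)$/ 0x\1 0x\2/;s/.*/printf "%s,%d,%d" &/e'
```

With GNU sed this prints 123711184642,02,3583090366663629,639f02,292,14292.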

cat all_info_List.csv| awk 'BEGIN {FS="|"}{print $21}'| awk 'BEGIN {FS=":"}{p1=$1":"$2":"$3":"$4":"$5":"; p2 = strtonum("0x"$6); printf("%s%02X\n",p1,p2+1) }'
The above command reads "all_info_List.csv", a file whose field separator is "|".
It then takes field 21 (a MAC address) and splits it using the field separator ":".
It assigns to the variable "p1" the first 5 bytes of the MAC address, so for the address "11:22:33:44:55:66", p1 would be "11:22:33:44:55:".
p2 is assigned the decimal value of the last byte: "0x66" assigns 102 to p2.
Finally, printf joins p1 and p2, converting p2 back to hex after adding one to it.

--- My 5 Cents
I just want to add my 5 cents in case this topic is still of interest; from the comments in the thread
I take it that it is. Hope it helps:
Challenge: convert a hex number to decimal on an Apple M1 laptop running the latest macOS (2022)
With the following versions on MacOS
% uname -a
Darwin macbook 22.1.0 Darwin Kernel Version 22.1.0: Sun Oct 9 20:15:09 PDT 2022; root:xnu-8792.41.9~2/RELEASE_ARM64_T6000 arm64 arm Darwin
% gawk --version
GNU Awk 5.2.1, API 3.2, (GNU MPFR 4.1.0-p13, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2022 Free Software Foundation.
--- gawk -Wposix needed
% echo "116B" | gawk '{p = ("0x" substr($1, 1, 4)) +0; printf("%d\n", p )}'
0
% echo "116B" | gawk -Wposix '{p = ("0x" substr($1, 1, 4)) +0; printf("%d\n", p )}'
4459
--- Some simplifications also work
% echo "116B" | gawk -Wposix '{p = "0x" substr($1, 1, 4); printf("%d\n", p )}'
4459
% echo "116B" | gawk -Wposix '{printf("%d\n", "0x" substr($1, 1, 4))}'
4459
--- Checking...
% echo "4459" | gawk '{printf("%X\n", $1 )}'
116B
--- This form is what I was looking for
% echo "00:11:6BX" | gawk -Wposix '{printf("%d\n", "0x" substr($1, 1, 2) substr($1, 4, 2) substr($1, 7, 2))}'
4459

this should be a cleaner approach than perl, python or printf (the long hex string below is wrapped for display; it is a single argument to echo):
echo 0x7E07E30EAAC59DB8EB9FDAD2EE818EA7AEB70192DAE552AD06B9FE
593BE89BC258483EA07C972B0FE7BA0D7B6CAC6DF338571F49CABB
DD195629411CDF0F88858EC39F01AE181E60A4F0DAF5F4F0E86991
82243BDF159AB588F11E3FF68E799509128EA7BA957B62DF103D0E
B2C3195DA1CCDFDD0CAF0E9958C1AF3E2B6993AA74C255B711BE38
DB031B26A596EFE19051A864000FB99F161923F12C2F9F40F18B6E
064CCCAE4C0776D0EB815947A30AB68B1CF12CA6622CAECA530221
2C27FD1579178363FE2E87B1F02FC0FDFFF |
gawk -nMbe '$++NF = +$!_' OFS='\n\n'
1 0x7E07E30EAAC59DB8EB9FDAD2EE818EA7AEB70192DAE552AD06B9FE
593BE89BC258483EA07C972B0FE7BA0D7B6CAC6DF338571F49CABB
DD195629411CDF0F88858EC39F01AE181E60A4F0DAF5F4F0E86991
82243BDF159AB588F11E3FF68E799509128EA7BA957B62DF103D0E
B2C3195DA1CCDFDD0CAF0E9958C1AF3E2B6993AA74C255B711BE38
DB031B26A596EFE19051A864000FB99F161923F12C2F9F40F18B6E
064CCCAE4C0776D0EB815947A30AB68B1CF12CA6622CAECA530221
2C27FD1579178363FE2E87B1F02FC0FDFFF
2 985801769662049290799836483751359680713382803597807741
342261221390727037343867491391068497002991150267570021
888625408701957708383236015057159917981445085171196540
056449671723413767151987807183076995694938175592905407
706727043644590485574826597324100590757487981303537403
481578192766548120367625144822345612103264180960846560
558546717739085751660018602037450619797709845938562717
870137791128285871274530893277287577788311030033741131
093413810677239057304751530532826551215693481438241043
55789791231
in case you're wondering, this number is a Mersenne prime to the power of another Mersenne prime :
8191 ^ 127
And the 2 primes closest to it should be
8191 ^ 127 - ( 16 + 512 )
8191 ^ 127 + ( 1450 )

Perl version, with a tip of the hat to @Jonathan:
perl -F, -lane '$p1 = substr($F[3], 0, 6); $p2 = substr($F[3], 6, 4); $p3 = substr($F[3], 10, 4); printf "%s,%s,%s,%s,%d,%d\n", @F[0..2], $p1, hex($p2), hex($p3)' file
-a turns on autosplit mode, to populate the @F array
-F, changes the autosplit separator to , (default is whitespace)
The substr() indices are 1 less than their awk equivalents, since Perl strings are indexed from 0.
Output:
123711184642,02,3583090366663629,639f02,292,14292
123715942138,01,3538710295145500,639f02,45014,50755
123711616258,02,3548370476972758,639f02,72,22322

Related

perl or awk: zero proof division with perl or awk

I have to add a field showing the difference in percentage between 2 fields in a file like:
BI,1266,908
BIL,494,414
BKC,597,380
BOOM,2638,654
BRER,1453,1525
BRIG,1080,763
DCLE,0,775
The output should be:
BI,1266,908,-28.3%
BIL,494,414,-16.2%
BKC,597,380,-36.35%
BOOM,2638,654,-75.2%
BRER,1453,1525,5%
BRIG,1080,763,-29.4%
DCLE,0,775,-
Note the zero in the last row. Either of these fields could be zero. If a zero is present in either field, N/A or - is acceptable.
What I'm trying --
Perl:
perl -F, -ane 'if ($F[2] > 0 || $F[3] > 0){print $F[0],",",$F[1],",",$F[2],100*($F[2]/$F[3])}' file
I get Illegal division by zero at -e line 1, <> line 2. If I change the || to && it prints nothing.
In awk:
awk '$2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
Just prints the file.
$ awk -F, '$2 == 0 || $3 == 0 { printf("%s,-\n", $0); next }
{ printf("%s,%.2f%%\n", $0, 100 * ($3 / $2) - 100) }' input.csv
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,-
How it works: if the second or third column equals 0, append a - field to the line. Otherwise, calculate the percentage difference and append that.
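Checking the arithmetic on the first row by hand: 100 * (908 / 1266) - 100 ≈ -28.28, which matches:

```shell
# One-row check of the percentage formula.
echo 'BI,1266,908' |
awk -F, '$2 == 0 || $3 == 0 { printf("%s,-\n", $0); next }
         { printf("%s,%.2f%%\n", $0, 100 * ($3 / $2) - 100) }'
# prints BI,1266,908,-28.28%
```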
Your perl's main issue was confusing awk's 1-based column indexes with perl's 0-based column indexes.
perl -F, -ane 'print "$1," if /(.+)/;if ($F[1] > 0 && $F[2] > 0){printf ("%.2f%", ((100*$F[2]/$F[1])-100)) } else {print "-"};print "\n"' file
The $1 here refers to the capture group (.+) which means "The whole line but the linefeed". The rest is probably self-explanatory if you understand the awk.
You're not telling awk that the fields are separated by commas, so it assumes the default (whitespace). $2 is therefore never greater than zero: each line contains only one space-separated field, so $2 is null. Change it to:
$ awk 'BEGIN{FS=OFS=","} $2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
BI,1266,908,908(71.72%)
BIL,494,414,414(83.81%)
BKC,597,380,380(63.65%)
BOOM,2638,654,654(24.79%)
BRER,1453,1525,1525(104.96%)
BRIG,1080,763,763(70.65%)
DCLE,0,775
and then tweak it for your desired output:
$ awk 'BEGIN{FS=OFS=","} {$4=($2 && $3 ? sprintf("%.2f%", (($3/$2)-1)*100) : "N/A")} 1' file
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,N/A

printf zero padded string

The format of MAC addresses varies with the platform.
E.g. on HPUX I could get something like:
0:0:c:7:ac:1e
While Linux gives me
00:00:0c:07:ac:1e
I used to use awk in a kornshell script on CentOS5 to format this to 00000c07ac1e like shown below.
MAC="0:0:c:7:ac:1e"
echo $MAC | awk -F: '{printf( "%02s%02s%02s%02s%02s%02s\n", $1,$2,$3,$4,$5,$6)}'
Unfortunately, our admin server is now Ubuntu 14 LTS with a newer version of awk that no longer supports zero padding in the %s format, and I get an undesired "0 0 c 7ac1e".
So I now switched to perl and do:
echo $MAC | perl -ne '{@A=split(":"); printf( "%02s%02s%02s%02s%02s%02s", @A)}'
As this may break too in upcoming releases I am looking for a more robust but still compact way to format the string.
Your Perl snippet will not break in future releases. This is basic functionality; changing it would break many, many programs. (Plus, Perl has a mechanism for introducing backwards-incompatible changes without breaking existing programs.)
Cleaned up:
echo "$MAC" | perl -ne'@F=split(/:/); printf("%02s%02s%02s%02s%02s%02s\n", @F)'
Shorter:
echo "$MAC" | perl -ne'printf "%02s%02s%02s%02s%02s%02s\n", split /:/'
Without the repetition:
echo "$MAC" | perl -ple'$_ = join ":", map sprintf("%02s", $_), split /:/'
There's -a if you want something more awkish:
echo "$MAC" | perl -F: -aple'$_ = join ":", map sprintf("%02s", $_), @F'
A bit long, but it should be pretty robust:
awk -F: '{for(i=1;i<=NF;i++){while(length($i)<2)$i=0$i;printf "%s",$i;}print ""}'
How it works
1. Loop through the fields.
2. While a field is less than 2 characters long, add zeros to the front.
3. Print the field.
4. Print a newline character at the end.
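Running the loop on the HP-UX style address from the question:

```shell
# Each short field is zero-padded, then all fields are printed
# back-to-back with a final newline.
echo '0:0:c:7:ac:1e' |
awk -F: '{for(i=1;i<=NF;i++){while(length($i)<2)$i=0$i;printf "%s",$i;}print ""}'
# prints 00000c07ac1e
```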
If you were dealing with a number rather than hex, you could use %.Xd to indicate you want at least X digits.
$ awk -F: '{printf( "%.2d%.2d\n", $1, $2)}' <<< "0:23"
0023
^^
two digits
From The GNU Awk User’s Guide #5.5.3 Modifiers for printf Formats:
.prec
A period followed by an integer constant specifies the precision to
use when printing. The meaning of the precision varies by control
letter:
%d, %i, %o, %u, %x, %X
Minimum number of digits to print.
In this case, you need a more general approach to deal with each one of the blocks of the MAC address. You can loop through the elements and add a 0 in case their length is just 1:
awk -F: '{for (i=1;i<=NF;i++) #loop through the elements
{
if (length($i)==1) #if length is 1
printf("0") #add a 0
printf ("%s", $i) #print the rest
}
print "" #print a new line at the end
}' <<< "0:0:c:7:ac:1e"
This returns:
00000c07ac1e
^^ ^^ ^^
Note awk '...' <<< "$MAC" is the same as echo "$MAC" | awk '...'.

sed/awk/cut/grep - Best way to extract string

I have a results.txt file that is structured in this format:
Uncharted 3: Javithaxx l Rampant l Graveyard l Team Deathmatch HD (D1VpWBaxR8c)
Matt Darey feat. Kate Louise Smith - See The Sun (Toby Hedges Remix) (EQHdC_gGnA0)
The Matrix State (SXP06Oax70o)
Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)
I want to create a new file extracting the YouTube URL ID given in the last characters of each line, like "8aOdRACuXiU".
I'm trying to build a URL like this in a new file:
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Note, I appended the &hd=1 to the string that I am trying to produce. I have tried using the Linux rev and cut commands, but rev munges my data. The hard part here is that each line in my text file has entries with parentheses, and I only care about the data between the last set of parentheses. Each line has a variable length, so that isn't helpful either. What about using grep and .$ for the end of the line?
In summary, I want to extract the youtube ID from results.txt and export it to a new file in the following format: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Using awk:
awk '{
v = substr( $NF, 2, length( $NF ) - 2 )
printf "%s%s%s\n", "http://www.youtube.com/watch?v=", v, "&hd=1"
}' infile
It yields:
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
$ sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!' results.txt
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Here, .*(\(.*\)) looks for the last occurrence of a pair of parentheses, and captures the characters inside those parentheses. The captured group is then inserted into the URL using \1.
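The greediness matters when a line contains several parenthesised groups; a small check (using a shortened stand-in line):

```shell
# Greedy .* consumes up to the LAST "(", so only the final group is captured.
echo 'Some Title (remix) (2013-02-08) (8aOdRACuXiU)' |
sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!'
# prints http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
```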
Using a perl one-liner :
perl -lne 'printf "http://www.youtube.com/watch?v=%s&hd=1\n", $& if /[^\(]+(?=\)$)/' file.txt
Or multi-line version :
perl -lne '
printf(
"http://www.youtube.com/watch?v=%s&hd=1\n",
$&
) if /[^\(]+(?=\)$)/
' file.txt

divide each line in equal part

I would be happy if anyone could suggest a command (a sed or AWK one-liner) to divide each line of a file into an equal number of parts. For example, divide each line into 4 parts.
Input:
ATGCATHLMNPHLNTPLML
Output:
ATGCA THLMN PHLNT PLML
This should work using GNU sed:
sed -r 's/(.{4})/\1 /g'
-r is needed to use extended regular expressions
.{4} matches every four characters, and the parentheses ( ) capture them
\1 refers to the captured group; the replacement appends a space behind it
g makes sure that the replacement is done as many times as possible on each line
A test; this is the input and output in my terminal:
$ echo "ATGCATHLMNPHLNTPLML" | sed -r 's/(.{4})/\1 /g'
ATGC ATHL MNPH LNTP LML
I suspect awk is not the best tool for this, but:
gawk --posix '{ l = sprintf( "%d", 1 + (length()-1)/4);
gsub( ".{"l"}", "& " ) } 1' input-file
If you have a POSIX-compliant awk you can omit the --posix, but it is necessary for GNU awk, and since that seems to be the most commonly used implementation I've given the solution in terms of gawk.
This might work for you (GNU sed):
sed 'h;s/./X/g;s/^\(.*\)\1\1\1/\1 \1 \1 \1/;G;s/\n/&&/;:a;/^\n/bb;/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta;s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta;:b;s/\n//g' file
Explanation:
h copy the pattern space (PS) to the hold space (HS)
s/./X/g replace every character in the PS with the same single character (in this case X)
s/^\(.*\)\1\1\1/\1 \1 \1 \1/ split the line into 4 parts (space separated)
G append a newline followed by the contents of the HS to the PS
s/\n/&&/ double the newline (to be later used as markers)
:a introduce a loop namespace
/^\n/bb if we reach a newline we are done and branch to the b namespace
/^ /s/ \(.*\n.*\)\n\(.\)/\1 \n\2/;ta; if the first character is a space add a space to the real line at this point and repeat
s/^.\(.*\n.*\)\n\(.\)/\1\2\n/;ta any other character just bump along and repeat
:b;s/\n//g all done just remove the markers and print out the result
This works for any length of line; however, if the line is not exactly divisible by 4, the last portion will contain the remainder as well.
perl
perl might be a better choice here:
export cols=4
perl -ne 'chomp; $fw = 1 + int length()/$ENV{cols}; while(/(.{1,$fw})/gm) { print $1 . " " } print "\n"'
This re-calculates field-width for every line.
coreutils
A GNU coreutils alternative, field-width is chosen based on the first line of infile:
cols=4
len=$(( $(head -n1 infile | wc -c) - 1 ))
fw=$(echo "scale=0; 1 + $len / $cols" | bc)
cut_arg=$(paste -d- <(seq 1 $fw $len) <(seq $fw $fw $len) | head -c-1 | tr '\n' ',')
Value of cut_arg is in the above case:
1-5,6-10,11-15,16-
Now cut the line into appropriate chunks:
cut --output-delimiter=' ' -c $cut_arg infile

Swap two columns - awk, sed, python, perl

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, to print $2, $1, then a range to the end of the file - but I don't know how to do the range part, and I can't print $2, $1, $3...$280! Most of the column swap answers I've seen here are specific to small files with a manageable number of columns, so I need something that doesn't depend on specifying every column number.
The file is tab delimited:
Affy-id chr 0 pos NA06984 NA06985 NA06986 NA06989
You can do this by swapping values of the first two fields:
awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
I tried perreal's answer with Cygwin on a Windows system, on a tab-separated file. It didn't work, because the default separator is whitespace.
If you encounter the same problem, try this instead:
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file
The input separator is defined by -F $'\t' and the separator for output by OFS=$'\t'.
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file > output_file
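A quick check with the header row from the question (the $'\t' quoting assumes bash, ksh or zsh):

```shell
# Swap the first two tab-separated columns; the rest follow unchanged.
printf 'Affy-id\tchr\t0\tpos\n' |
awk -F $'\t' '{ t = $1; $1 = $2; $2 = t; print }' OFS=$'\t'
# the output starts with the swapped pair: chr, Affy-id, ...
```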
Try this, more relevant to your question:
awk '{printf("%s\t%s\n", $2, $1)}' inputfile
This might work for you (GNU sed):
sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
Have you tried using the cut command? E.g.
cut -c10-20,1-9,21- myhugefile > myrearrangedhugefile
(Note that cut always emits the selected ranges in ascending order, regardless of how they are listed, so it cannot actually swap columns.)
This is also easy in perl:
perl -pe 's/^(\S+)\t(\S+)/$2\t$1/;' file > outputfile
You could do this in Perl:
perl -F\\t -nlae 'print join("\t", @F[1,0,2..$#F])' inputfile
The -F switch specifies the delimiter. In most shells you need to precede a backslash with another backslash to escape it. In sufficiently recent versions of Perl, -F automatically implies -n and -a, so they can be dropped.
For your problem you wouldn't need -l, because the last column appears last in the output. But in a different situation, if the last column needed to appear between other columns, the newline character would have to be removed. The -l switch takes care of this.
The "\t" in join can be changed to anything else to produce a different delimiter in the output.
2..$#F specifies a range from 2 until the last column. As you might have guessed, inside the square brackets, you can put any single column or range of columns in the desired order.
No need to call anything else but your shell:
bash> while read col1 col2 rest; do
echo $col2 $col1 $rest
done <input_file
Test:
bash> echo "first second a b c d e f g" |
while read col1 col2 rest; do
echo $col2 $col1 $rest
done
second first a b c d e f g
Maybe even with "inlined" Python, as in a Python script within a shell script, but only if you want to do some more scripting with Bash beforehand or afterwards. Otherwise it is unnecessarily complex.
Content of script file process.sh:
#!/bin/bash
# inline Python script
read -r -d '' PYSCR << EOSCR
from __future__ import print_function
import codecs
import sys
encoding = "utf-8"
fn_in = sys.argv[1]
fn_out = sys.argv[2]
# print("Input:", fn_in)
# print("Output:", fn_out)
with codecs.open(fn_in, "r", encoding) as fp_in, \
codecs.open(fn_out, "w", encoding) as fp_out:
for line in fp_in:
# split into two columns and rest
col1, col2, rest = line.split("\t", 2)
# swap columns in output
fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
EOSCR
# ---------------------
# do setup work?
# e. g. list files for processing
# call python script with params
python3 -c "$PYSCR" "$inputfile" "$outputfile"
# do some more processing
# e. g. rename outputfile to inputfile, ...
If you only need to swap the columns for a single file, then you can also just create a single Python script and statically define the filenames. Or just use an answer above.
awk swapping sans temp-variable :
echo '777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541' |
mawk '1; ($1 = $2 substr(_, ($2 = $1)^_))^_' FS=':' OFS=':'
777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541
317 647 14423 262927714037 :777777744444444464449: 0x2A29D5A1BAA7A95541