delete a column with awk or sed

I have a file with three columns. I would like to delete the 3rd column (in-place editing). How can I do this with awk or sed?
123 abc 22.3
453 abg 56.7
1236 hjg 2.3
Desired output
123 abc
453 abg
1236 hjg

Try this short thing:
awk '!($3="")' file
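Note that assigning to $3 forces awk to rebuild the record with the output field separator: the assignment's value "" is false, and ! turns the pattern true, so every line is printed. On the sample data this should give (each line keeps a trailing space where the third field was):
$ awk '!($3="")' file
123 abc
453 abg
1236 hjg
It also doesn't edit the file in place; for that, redirect and move, e.g. awk '!($3="")' file > tmp && mv tmp file.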

With GNU awk for inplace editing, \s/\S, and gensub() to delete
1) the FIRST field:
awk -i inplace '{sub(/^\S+\s*/,"")}1' file
or
awk -i inplace '{$0=gensub(/^\S+\s*/,"",1)}1' file
2) the LAST field:
awk -i inplace '{sub(/\s*\S+$/,"")}1' file
or
awk -i inplace '{$0=gensub(/\s*\S+$/,"",1)}1' file
3) the Nth field where N=3:
awk -i inplace '{$0=gensub(/\s*\S+/,"",3)}1' file
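To preview any of these without touching the file, just drop -i inplace; for example, the N=3 version should produce:
$ awk '{$0=gensub(/\s*\S+/,"",3)}1' file
123 abc
453 abg
1236 hjg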
Without GNU awk you need a match()+substr() combo or multiple sub()s + vars to remove a middle field. See also Print all but the first three columns.

This might work for you (GNU sed):
sed -i -r 's/\S+//3' file
If you want to delete the white space before the 3rd field:
sed -i -r 's/(\s+)?\S+//3' file
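Dropping -i previews the change (GNU sed assumed, since -r and \s/\S are GNU extensions); on the sample data this should yield:
$ sed -r 's/(\s+)?\S+//3' file
123 abc
453 abg
1236 hjg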

It seems you could simply go with
awk '{print $1 " " $2}' file
This prints the first two fields of each line in your input file, separated by a space.
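For example:
$ awk '{print $1 " " $2}' file
123 abc
453 abg
1236 hjg
To rewrite the file itself you would still need to redirect to a temporary file and move it back, as in the other answers.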

Try using cut... it's fast and easy.
First, you have repeated spaces; you can squeeze those down to a single space between columns, if that's what you want, with tr -s ' '.
If each pair of columns already has just one delimiter between them, you can use cut -d ' ' -f-2 to print fields (columns) <= 2.
For example, if your data is in a file input.txt you can do one of the following:
cat input.txt | tr -s ' ' | cut -d ' ' -f-2
Or, if you find it easier to reason about this as removing the 3rd column, you can write the following:
cat input.txt | tr -s ' ' | cut -d ' ' --complement -f3
cut is pretty powerful, you can also extract ranges of bytes, or characters, in addition to columns
An excerpt from the man page on the syntax for specifying the list range:
Each LIST is made up of one range, or many ranges separated by commas.
Selected input is written in the same order that it is read, and is
written exactly once. Each range is one of:
N N'th byte, character or field, counted from 1
N- from N'th byte, character or field, to end of line
N-M from N'th to M'th (included) byte, character or field
-M from first to M'th (included) byte, character or field
so you could also have asked for the specific columns 1 and 2 with...
cat input.txt | tr -s ' ' | cut -d ' ' -f1,2
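On the sample data either pipeline should print:
$ cat input.txt | tr -s ' ' | cut -d ' ' -f-2
123 abc
453 abg
1236 hjg
Note that --complement is a GNU extension, so it is not available in BSD/macOS cut.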

Try this:
awk '$3="";1' file.txt > new_file && mv new_file file.txt
or
awk '{$3="";print}' file.txt > new_file && mv new_file file.txt

Try
awk '{$3=""; print $0}'

If you're open to a Perl solution...
perl -ane 'print "$F[0] $F[1]\n"' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the following perl code
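On the sample data from the question this should print:
$ perl -ane 'print "$F[0] $F[1]\n"' file
123 abc
453 abg
1236 hjg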

Related

Convert TSV to CSV, where TSV field has commas in it

I have a TSV with fields that look like:
name location 1,2,3,4,5
When I use sed 's/\w/,/g'
I end up with a CSV where 1,2,3,4 and 5 are considered separate entries.
I would like it to be '1 2 3 4 5'
I've tried converting commas to white space before running the above command using
sed 's/,/\w/g'
however, when converting the whitespace back to commas, it includes single spaces as well as the tabs. So what is the regex for just a single whitespace character?
Desired output:
name, location,1 2 3 4 5,
As mentioned in a comment, CSV usually deals with occurrences of its separator character in values by enclosing the value in quotes, so I suggest you simply deal with this by enclosing every value in quotes:
sed -E 's/([^\t]*)(\t|$)/"\1",/g'
This leaves a trailing comma, as in your sample output; if you want to avoid it you can use the following:
sed -E 's/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'
If your original data contains ", you will however need to escape those, which you can achieve by adding the following substitution before the other(s):
s/"/\\"/g
(Strict RFC 4180 CSV escapes a double quote by doubling it instead, i.e. s/"/""/g.)
As Ed Morton suggests, we can also strip the trailing empty fields:
s/\t+$//
In conclusion I'd use the following:
sed -E 's/"/\\"/g;s/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'
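A quick sanity check on the sample line (printf is used here just to fabricate a tab-separated test line):
$ printf 'name\tlocation\t1,2,3,4,5\n' | sed -E 's/"/\\"/g;s/\t+$//;s/^/"/;s/\t/","/g;s/$/"/'
"name","location","1,2,3,4,5"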
Either replace tabs with "," and enclose lines between double quotes, or replace commas with spaces and tabs with commas. In both cases you'll get valid CSV.
$ cat file
name location 1,2,3,4,5
$
$ sed 's/\t/","/g; s/^\|$/"/g' file
"name","location","1,2,3,4,5"
$
$ sed 's/,/ /g; s/\t/,/g' file
name,location,1 2 3 4 5
And in awk:
$ awk -v OFS="," '{for(i=1;i<=NF;i++)if($i~/,/)$i="\"" $i "\"";$1=$1}1' file
name,location,"1,2,3,4,5"
Explained:
$ awk -v OFS="," '{ # set the output field separator to a comma *
for(i=1;i<=NF;i++) # loop all fields
if($i~/,/) # if comma in field
$i="\"" $i "\"" # surround with quotes **
$1=$1 # rebuild record
}1' file # output
* If there are spaces within the records, set the input field separator to a tab with awk -F"\t".
** Also, if there are quotes in the fields with commas, they should probably be doubled or escaped.
Depending on your real requirements:
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) $i="\""$i"\""} 1' file
"name","location","1,2,3,4,5"
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $1=$1} 1' file
name,location,1 2 3 4 5
$ awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(OFS," ",$i); $(NF+1)=""} 1' file
name,location,1 2 3 4 5,
$ echo 'a"b' | awk -F'\t' -v OFS=',' '{for (i=1;i<=NF;i++) { gsub(/"/,"\"\"",$i); $i="\""$i"\"" } } 1'
"a""b"
sed 's/\t/","/g; s/^\|$/"/g' file
doesn't work in MacOS
Instead use
sed 's/\t/","/g;s/^/"/;s/$/"/' file for MacOS.

tcsh & sed: no output

I’m trying to replace the 3rd column of a file with itself plus the value of column 2 (without any space). I get the proper values for variables c and a, but then sed doesn't give any output. Any clue?
#!/bin/tcsh
setenv c `cat lig_mod.pdb | awk '{print $3}'`
echo $c
setenv a `cat lig_mod.pdb | awk '{print $3=$3$2}'`
echo $a
sed -i "" 's/^'"${c}"'$/^'"${a}"'$/g' lig_mod.pdb
Even though awk is usually better for column parsing, this sed one-liner can work for you as well:
sed -i 's/ \(\w*\) \(\w*\) / \1 \2\1 /1' lig_mod.pdb
The 1 at the end of the s command denotes the occurrence you want to change, which for the 2nd and 3rd columns is the first; you could use it for any pair of adjacent columns.
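For example, on a made-up four-column line (the values here are purely hypothetical, not from the actual lig_mod.pdb; GNU sed shown, since \w is a GNU extension):
$ echo 'ATOM C1 LIG 1.234' | sed 's/ \(\w*\) \(\w*\) / \1 \2\1 /1'
ATOM C1 LIGC1 1.234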

How can I remove lines in which length of a field exceeds some threshold using awk or sed?

I have a file test.txt similar to this:
aa:bbbbbb:22.3
a:bb:33.2
a:bbbb:22.3
aaaa:bb:39.9
I know how to count and sort them like:
awk -F ':' '{print $2}' test.txt | awk '{print length($0),$0}' | sort -nr
Now I want to remove the 1st and 3rd lines from the file because the length of the second field (containing "b") in those lines is larger than 3. How can I do that using awk/sed? Thanks.
With awk:
This will output the lines whose 2nd field is longer than 3 characters:
$ awk -F: 'length($2)>3' file
aa:bbbbbb:22.3
a:bbbb:22.3
To do the opposite:
$ awk -F: 'length($2)<=3' file
a:bb:33.2
aaaa:bb:39.9
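If you want test.txt updated in place, GNU awk 4.1+ can do it (assuming gawk is available):
$ awk -i inplace -F: 'length($2)<=3' test.txt
or, portably:
$ awk -F: 'length($2)<=3' test.txt > tmp && mv tmp test.txt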
Code for sed:
sed '/^[^:]*:[^:]\{1,3\}:/!d' file
This deletes every line unless its second field is 1 to 3 characters long; adjust the \{1,3\} interval for a different threshold. (A pattern like /.*:..:.*/ only matches second fields of exactly two characters, and . can also match the : delimiter, so anchoring with [^:] is safer.)
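On the sample file:
$ sed '/^[^:]*:[^:]\{1,3\}:/!d' test.txt
a:bb:33.2
aaaa:bb:39.9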

Sed or awk: how to call line addresses from separate file?

I have 'file1' with (say) 100 lines. I want to use sed or awk to print lines 23, 71 and 84 (for example) to 'file2'. Those 3 line numbers are in a separate file, 'list', with each number on a separate line.
When I use either of these commands, only line 84 gets printed:
for i in $(cat list); do sed -n "${i}p" file1 > file2; done
for i in $(cat list); do awk 'NR==x {print}' x=$i file1 > file2; done
Can a for loop be used in this way to supply line addresses to sed or awk?
This might work for you (GNU sed):
sed 's/.*/&p/' list | sed -nf - file1 >file2
Use list to build a sed script.
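With list containing 23, 71 and 84, the first sed emits:
23p
71p
84p
and the second sed reads that as its script from stdin via -f -.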
Since you are using > inside the loop, the file gets overwritten on every iteration, so only the last line survives; inside the loop you would need >>. Better practice is to redirect with > once, outside the loop, so the file is not opened for writing on every iteration.
However, you can do everything in awk without a for loop:
awk 'NR==FNR{a[$1]++;next}FNR in a' list file1 > file2
You have to use >> (append to the file), but you are overwriting the file each time. That is why you only ever get line 84 in file2.
Try:
for i in $(cat list); do sed -n "${i}p" file1 >> file2; done
With sed:
sed -n $(sed -e 's/^/-e /' -e 's/$/p/' list) input
Given the example input, the inner command creates a string like this:
-e 23p
-e 71p
-e 84p
so the outer sed then prints the given lines.
You can avoid running sed/awk in a for/while loop altogether:
# store all line numbers in a variable, separated by pipes
lines=$(echo $(<list) | sed 's/ /|/g')
# print lines of specified line numbers and store output
awk -v lineS="^($lines)$" 'NR ~ lineS' file1 > out
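With list containing 23, 71 and 84 this builds:
$ echo "$lines"
23|71|84
so the awk pattern effectively becomes NR ~ /^(23|71|84)$/.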

AWK/SED. How to remove parentheses in simple text file

I have a text file looking like this:
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02) ... and so on.
I would like to modify the file by removing all the parentheses and putting each pair on a new line,
so that it looks like this:
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
...
A simple way to do that?
Any help is appreciated,
Fred
I would use tr for this job:
cat in_file | tr -d '()' > out_file
With the -d switch it just deletes any characters in the given set.
To add newlines you could pipe it through two tr invocations:
cat in_file | tr -d '(' | tr ')' '\n' > out_file
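One caveat: with that pipeline every pair after the first keeps its leading space. If that matters, deleting the space together with the opening parenthesis avoids it (a small variation on the same idea):
cat in_file | tr -d '( ' | tr ')' '\n' > out_file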
As was said, almost:
sed 's/[()]//g' inputfile > outputfile
or in awk:
awk '{gsub(/[()]/,""); print;}' inputfile > outputfile
This would work -
awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' inputfile > outputfile
Test:
[jaypal:~/Temp] cat file
(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)
[jaypal:~/Temp] awk -v FS="[()]" '{for (i=2;i<=NF;i+=2) print $i }' file
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
This might work for you (note the g flag on the first substitution, so every separator between pairs is replaced, not just the first):
echo "(-9.1744438E-02,7.6282293E-02) (-9.1744438E-02,7.6282293E-02)" |
sed 's/) (/\n/g;s/[()]//g'
-9.1744438E-02,7.6282293E-02
-9.1744438E-02,7.6282293E-02
Guess we all know this, but just to emphasize: simple special-purpose tools are usually faster than awk or sed for the same job. For instance, try not to use sed/awk where grep or tr can suffice.
In this particular case, I created a 100000-line file, each line containing "(" as well as ")" characters, then ran
$ /usr/bin/time -f%E -o log cat file | tr -d "()"
and again,
$ /usr/bin/time -f%E -ao log sed 's/[()]//g' file
And the results were:
05.44 sec : Using tr
05.57 sec : Using sed
cat in_file | sed 's/[()]//g' > out_file
Due to formatting issues, it is not entirely clear from your question whether you also need to insert newlines.