sed + count words on field 4

I used the following awk command to count all the words that appear in field 4:
awk '{print $4}' file | awk '{print NF}' | grep -c 1
How can we do the same in sed?
Example of file:
1 2 3 4
1 2
1 2 3 4 5
1 2
1 2 3
1 2 3 4
For this file, sed should return 3 (there are three words in field 4).
yael

First of all, your awk is quite inefficient. Try this:
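awk 'NF>=4{c++} END{print c+0}' file
Why do you want it in sed, BTW? This is what awk does well. If you really want it in sed, I guess something like this:
sed '/^\s*\S*\s*\S*\s*\S*\s*$/d' file | wc -l
awk explanation: on every line that has at least four fields, increment c. At the end, print c; adding 0 makes it print 0 instead of an empty line when no line matched. Testing NF>=4 rather than $4 itself matters when the fourth field is 0, which awk would treat as false.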
sed explanation: delete each line that matches the regexp, then count the lines left in sed's output with wc -l. Apart from leading and trailing whitespace, the regexp allows at most two runs of whitespace, which means a matching line has at most 3 fields; only lines with 4 or more fields survive. Note that \s and \S are GNU sed extensions.
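If your sed lacks GNU's \s and \S, the same delete-and-count idea can be written with POSIX character classes. This is a sketch; the function name count_field4_lines is hypothetical:

```shell
# Count lines that have at least four whitespace-separated fields.
# Lines with at most three fields match the regexp and are deleted;
# wc -l counts whatever survives (some wc versions pad with blanks).
count_field4_lines() {
    sed '/^[[:space:]]*[^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*$/d' "$1" |
        wc -l
}
```

With the example file from the question, count_field4_lines file should report 3.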

cut can also be used:
cut -f 5 -d' ' file | wc -w
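Select the 5th column (the first one is empty due to the leading blank on each line of the original file; if your lines have no leading blank, use -f 4). The delimiter is a single blank, so this only works when words are separated by exactly one blank. Lines that have too few fields print an empty line, which wc -w does not count (a line with no blank at all would be printed whole unless you add -s).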
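Because cut treats every single blank as a delimiter, runs of blanks shift the field numbers. A sketch that first strips leading blanks and squeezes runs with tr -s, then uses cut -s so lines without any blank are skipped; it assumes blanks only (no tabs), and the function name count_field4_words is hypothetical:

```shell
# Count words in field 4 of a blank-separated file.
#   sed    : drop leading blanks so the first word is field 1
#   tr -s  : squeeze runs of blanks into one blank
#   cut -s : skip lines that contain no delimiter at all
#   wc -w  : empty field-4 output lines contribute no words
count_field4_words() {
    sed 's/^ *//' "$1" | tr -s ' ' | cut -s -d ' ' -f 4 | wc -w
}
```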

This might work for you:
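sed 's/ *[^ ]*/&/4;t;d;' file | sed -n '$='
s/ *[^ ]*/&/4 replaces the fourth blank-separated field with itself, which succeeds only on lines that have at least four fields; t then branches to the end of the script, so the line is printed, and d deletes every other line. The second sed, -n '$=', prints the line number of the last line, i.e. the count (it prints nothing if no line matched).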
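The same substitution test can also be done in one sed process: with -n, the p flag prints a line only if the 4th substitution succeeded. This sketch (hypothetical function name) uses [^ ][^ ]* so that every counted field has at least one non-blank character, which stops trailing blanks from being counted as an extra field:

```shell
# Print only lines where a 4th space-separated field exists, then count them.
count_field4_lines_sed() {
    sed -n 's/ *[^ ][^ ]*/&/4p' "$1" | wc -l
}
```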

Related

How to replace every 2nd tab character with a newline character using sed

Given the input
123\t456\tabc\tdef
create the output
123\t456\nabc\tdef
which would display like
123 456
abc def
Note that it needs to work across multiple lines, not just two.
EDIT
A better example might help clarify.
Input (only one line of input is expected):
1\t2\t3\t4\t5\t6\t7\t8
Expected output:
1 2
3 4
5 6
7 8
...
With GNU sed:
sed 's/\t/\n/2;P;D;' file
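This replaces the second tab on the line with a newline, prints the pattern space up to that newline (P), then deletes that part (D) and reruns the script on the rest of the line without reading new input, so every second tab becomes a newline.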
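Because D restarts the cycle on what is left of the line, changing the number on the s command changes the row width. A sketch for GNU sed, with a hypothetical function that takes N as its first argument and the file as its second:

```shell
# Split each input line into rows of N tab-separated fields (GNU sed).
#   s/\t/\n/N : turn the Nth tab into a newline
#   P         : print up to the first newline
#   D         : delete that part and rerun the script on the rest
tabs_to_rows() {
    sed "s/\t/\n/$1;P;D" "$2"
}
```

For example, tabs_to_rows 3 file would emit three fields per row.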
This little trick should work:
sed 's/\(\t[^\t]*\)\t/\1\n/g' < input_file.txt
EDIT:
Below is an example:
$ cat 1.txt
one two three four five six seven
five six seven
$ sed 's/\(\t[^\t]*\)\t/\1\n/g' < 1.txt
one two
three four
five six
seven
five six
seven
$
EDIT2:
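For the standard sed on macOS, try this:
$ sed $'s/\\(\t[^\t]*\\)\t/\\1\\\n/g' < 1.txt
The $'...' quoting makes bash expand escape sequences such as \t and \n before sed sees them, which matters because the BSD sed shipped with macOS may not recognize \t and \n itself. Inside $'...', \\ becomes a single backslash, so sed receives \( \) around the group and, in the replacement, a backslash followed by a literal newline.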
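As a portable alternative that needs neither GNU sed nor $'...' quoting, awk can rejoin the tab-separated fields two at a time. This is a swapped-in awk technique rather than part of the sed answers, the function name is hypothetical, and it drops completely empty input lines:

```shell
# Re-emit tab-separated fields two per output line, keeping the tab
# between the pair; each input line is handled on its own.
pair_tab_fields() {
    awk 'BEGIN { FS = "\t" }
    {
        # odd-numbered field with a successor: tab; otherwise end the row
        for (i = 1; i <= NF; i++)
            printf "%s%s", $i, ((i % 2 && i < NF) ? "\t" : "\n")
    }' "$1"
}
```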
Let's say the following is Input_file:
cat Input_file
123 456 abc def
Then this puts the words into 2 columns:
xargs -n2 < Input_file
Output will be as follows.
123 456
abc def
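xargs joins each pair with a single space. If the tabs should be kept and the input is a single line, as in the question's edit, one sketch using POSIX tools is to put every field on its own line with tr and let paste - - join them back two per line with its default tab delimiter. With multi-line input, the last field of one line could be paired with the first field of the next. The function name is hypothetical:

```shell
# One field per line, then paste reads two lines at a time from stdin
# and joins them with a tab.
pair_fields() {
    tr '\t' '\n' < "$1" | paste - -
}
```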

Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?
In such cases, the man pages are the right starting point. The sed manual describes no way to refer to the file name minus its path and extension (GNU sed's F command prints the input file name, but exactly as given, path and all), but sed does support inserting text before or after a line.
As a result, you need to extract the file name separately, store it in a variable, and insert that text after or before the line you want.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contents of variable $a before line2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes everything from the beginning of the string up to and including the last slash / (longest match)
a="${a%.*}" : Removes the shortest suffix that starts with a dot, i.e. everything from the last dot to the end (shortest match). Use %% to remove the longest such suffix, i.e. everything from the first dot.
sed "2a $a" : Append the contents of variable $a after line 2
sed "2i $a" : Insert the contents of $a before line 2
Optionally, use sed -i to edit the file in place. The one-line forms 2a text and 2i text are GNU sed extensions; POSIX sed expects a backslash and a newline after a or i.
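Putting the pieces together, a sketch that takes a path and a line number and inserts the bare file name after that line. It uses the POSIX a\ form (backslash, newline, text) so it does not depend on GNU sed; the function name insert_basename is hypothetical, and the name must not contain backslashes:

```shell
# insert_basename PATH LINE: print PATH's contents with the file name
# (directory and last extension removed) appended after line LINE.
insert_basename() (
    name=${1##*/}      # drop everything up to the last slash
    name=${name%.*}    # drop the last .extension, if any
    sed "${2}a\\
$name" "$1"
)
```

Usage: insert_basename /home/gv/Desktop/PythonTests/cpu.sh 2 would add cpu after line 2 of that file's contents.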
Regarding "I've been learning about sed": you may have been wasting your time, as there isn't a lot TO learn about sed beyond s/old/new. Sure, there are a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output, and add an awk tag, then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name, minus its directory, before line 3 of the open file. It uses GNU awk for gensub(); with other awks you'd just use sub() and a variable.
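For awks without gensub(), here is a sketch of the sub()-and-variable approach, also dropping the extension as the question asked; the function name is hypothetical:

```shell
# Print the file with its bare name (no directory, no last extension)
# inserted before line 3; uses only POSIX awk's sub().
insert_name_before_line3() {
    awk 'NR == 3 {
            name = FILENAME
            sub(/.*\//, "", name)      # strip the directory part
            sub(/\.[^.]*$/, "", name)  # strip the last extension, if any
            print name
         }
         1' "$1"
}
```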

How to collapse multiple empty lines into one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I tried sed, gawk and grep, but couldn't get them to work. How can I do this? Thanks.
Myfile: Desired file:
a a
b b
c c
. .
d d
e e
f f
g g
. .
h
i
h j
i k
j .
k
.
You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or as a compressed one-liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses runs of blank lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2
3
4
5
6
7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2
3
4
5
6
7
8
9
10
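Blank lines are easy to lose when a transcript is copied, so here is the same awk program wrapped in a hypothetical function and fed with printf, which makes the empty lines explicit:

```shell
# Collapse runs of empty lines into a single empty line.
squeeze_blank() {
    awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' "$@"
}

# Three empty lines between 1 and 2, one between 2 and 3.
printf '1\n\n\n\n2\n\n3\n' | squeeze_blank
```

This should print 1, an empty line, 2, an empty line, and 3.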
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'
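Spelled out, with the placeholder replaced by the awk program above and the POSIX class [[:space:]] in place of GNU's \s, a sketch might look like this (the function name is hypothetical):

```shell
# Blank out whitespace-only lines, then collapse runs of empty lines.
squeeze_blank_ws() {
    sed 's/^[[:space:]]*$//' "$1" |
        awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
}
```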
To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2
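Here's one way using GNU sed; it reads the whole file into the pattern space, then squeezes every run of two or more newlines down to two: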
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
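To avoid that trailing blank line, one sketch is to stay in paragraph mode (RS="") but print the separating empty line before every paragraph except the first. Paragraph mode also drops blank lines at the very start and end of the file; the function name is hypothetical:

```shell
# RS="" makes each run of non-empty lines one record; runs of empty
# lines between records act as a single separator.
squeeze_paragraphs() {
    awk 'BEGIN { RS = "" } NR > 1 { print "" } { print }' "$1"
}
```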
The Perl solution is even shorter:
perl -00 -pe '' file
You could also do this with GNU awk, assuming the file contains no NUL bytes:
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
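RS="\0": once the NUL character is the record separator, awk reads the whole file as a single record (including its final newline, so print adds one extra empty line at the end of the output).
gsub(/\n\n+/,"\n\n"); this replaces each run of two or more newlines, i.e. one or more blank lines, with two newlines, i.e. a single blank line. Note that the \n\n in the regex matches a blank line together with the previous line's newline character.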
Here is another awk one-liner; it skips a line only when both it and the previous line are empty:
awk -v p=1 '$0=="" && p=="" {next} 1; {p=$0}' file

Should I use cut or awk to extract fields and field substrings?

I have a file with pipe-separated fields. I want to print a subset of field 1 and all of field 2:
cat tmpfile.txt
# 10 chars.|variable length num|text
ABCDEFGHIJ|99|U|HOMEWORK
JIDVESDFXW|8|C|CHORES
DDFEXFEWEW|73|B|AFTER-HOURS
I'd like the output to look like this:
# 6 chars.|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73
I know how to get fields 1 & 2:
awk 'BEGIN{FS="|"} {print $1"|"$2}' tmpfile.txt
And know how to get the first 6 characters of field 1:
cat tmpfile.txt | cut -c 1-6
I know this is fairly simple, but I can't figure out how to combine the awk and cut commands.
Any suggestions would be greatly appreciated.
You could use awk. Use the substr() function to trim the first field:
awk -F'|' '{print substr($1,1,6),$2}' OFS='|' inputfile
For your input, it'd produce:
ABCDEF|99
JIDVES|8
DDFEXF|73
Using sed, you could say:
sed -r 's/^(.{6})[^|]*([|][^|]*).*/\1\2/' inputfile
to produce the same output.
You could use cut and paste, but then you have to read the file twice, which matters if the file is very large. The <(...) process substitution needs bash, ksh or zsh:
paste -d '|' <(cut -c 1-6 tmpfile.txt ) <(cut -d '|' -f2 tmpfile.txt )
Just for another variation: awk -F\| -vOFS=\| '{print $1,$2}' t.in | cut -c 1-6,11-
Also, as tripleee points out, two cuts can do this too: cut -c 1-6,11- t.in | cut -d\| -f 1,2
I like a combination of cut and sed, but that's just a preference:
cut -f1-2 -d"|" tmpfile.txt|sed 's/\([A-Z]\{6\}\)[A-Z]\{4\}/\1/g'
Result:
# 10 chars.|variable length num
ABCDEF|99
JIDVES|8
DDFEXF|73
Edit: (Removed the useless cat) Thanks!

Keep the lines of a text file whose column matches values listed in another file, on the command line

Basically, I am trying to process files on the command line like this:
File1:
,1,this is some content,
,2,another content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,
,6,zzzzzzzzzz,
... ...
File2:
1
3
4
5
Now I want to keep only the lines of File1 whose number (the second column) appears in File2, so the output should be:
,1,this is some content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,
I tried comm -3 file1 file2, but it doesn't work. Then I tried sed, but that didn't work either. Is there any other handy tool?
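The following will work on the example as given, but each number in File2 matches anywhere in the line, so it also picks up lines where that number appears elsewhere (e.g. 1 also matches ,10, or a 1 inside the text):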
grep -F -f File2 File1
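One way to avoid those false matches while staying with grep is to turn each number in File2 into an anchored pattern ^,N, so it can only match the second field. This sketch relies on GNU grep reading the pattern file from standard input with -f -, and assumes File2 holds plain numbers; the function name is hypothetical:

```shell
# keep_listed ID_FILE DATA_FILE
# sed wraps each ID as ^,ID, and grep reads those patterns from stdin.
keep_listed() {
    sed 's/.*/^,&,/' "$1" | grep -f - "$2"
}
```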
An alternative would be
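join -t, -1 2 -2 1 -o 1.1,1.2,1.3 File1 File2
Here is how that works:
-t, uses `,` as the field separator
-1 2 joins on the second field of file 1
-2 1 joins on the first field of file 2
-o 1.1,1.2,1.3 outputs the first, second and third fields of file 1 (the list must be a single argument, without spaces)
join also expects both files to be sorted on the join field, which holds for the example but not in general (for instance, 10 sorts before 2).
This still has the drawback that the output stops at the third field: if the text contains more commas, everything after the first of them is cut off ("field 3" is the last one output).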
One way to fix that is to use xargs to look the matching lines up again with grep:
join -t, -1 2 -2 1 -o 1.1,1.2 File1 File2 | xargs -Ixx grep '^xx,' File1
Explanation:
-Ixx : replace the string xx in the command that follows with each output line of the preceding command, then execute that command once per line. Each join output line looks like ,1 so the pattern ^xx, becomes ^,1, which matches only lines whose second field is exactly that number, regardless of what follows.
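A different tool that avoids join's requirement that both inputs be sorted on the join field, as well as the extra grep pass, is awk's two-file idiom: read File2 into an array, then print the File1 lines whose second comma-separated field is in it. This is a swapped-in technique rather than part of the answer above, shown as a sketch with a hypothetical function name:

```shell
# keep_listed_awk ID_FILE DATA_FILE
# While reading the first file (NR == FNR), remember each ID.
# For the second file, print lines whose 2nd comma-separated field is a
# remembered ID; the whole line is printed, so extra commas are harmless.
keep_listed_awk() {
    awk -F, 'NR == FNR { keep[$1]; next } $2 in keep' "$1" "$2"
}
```

Usage: keep_listed_awk File2 File1 (note the ID file comes first).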