Insert filename into text file with sed

I've been learning about sed and finding it very useful, but cannot find an answer to this in any of the many guides and examples ... I'd like to insert the filename of a text file, minus its path and extension, into a specific line within the text itself. Possible?

In such cases, the correct starting point is the man pages. The sed manual does not provide a feature for sed to understand "filename", but sed does support inserting text before/after a line.
As a result, you need to isolate the filename separately, store the text in a variable and inject this text after/before the line you wish.
Example:
$ a="/home/gv/Desktop/PythonTests/cpu.sh"
$ a="${a##*/}";echo "$a"
cpu.sh
$ a="${a%.*}"; echo "$a"
cpu
$ cat file1
LOCATION 0 X 0
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a $a" file1 # Inject the contents of variable $a after line2
LOCATION 0 X 0
VALUE 1a 2 3
cpu
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2i $a" file1 # Inject the contetns of variable $a before line2
LOCATION 0 X 0
cpu
VALUE 1a 2 3
VALUE 1b 2 3
VALUE 1c 2 3
$ sed "2a George" file1 #Inject a fixed string "George" after line 2
LOCATION 0 X 0
VALUE 1a 2 3
George
VALUE 1b 2 3
VALUE 1c 2 3
Explanation:
a="${a##*/}" : Removes everything from the beginning of the string up to and including the last slash / (longest match)
a="${a%.*}" : Removes everything from the last dot . to the end of the string (shortest match). You can also use %% for the longest match, which removes everything from the first dot onwards.
sed "2a $a" : Insert the contents of variable $a after line 2
sed "2i $a" : Insert the contents of variable $a before line 2
Optionally, you can use sed -i to make the changes in place, i.e. in the file being processed.
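Putting the pieces together, a small wrapper script might look like this (just a sketch: the script name is hypothetical, it takes the file as an argument, and sed -i assumes GNU sed):
#!/bin/sh
# Sketch: insert a file's own base name (no path, no extension) after line 2 of that file.
f="$1"
name="${f##*/}"          # strip the directory part
name="${name%.*}"        # strip the extension
sed -i "2a $name" "$f"   # GNU sed: append the name after line 2, editing the file in place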

Regarding "I've been learning about sed": you may have been wasting your time, as there isn't a lot TO learn about sed beyond s/old/new. Sure, there's a ton of other language constructs and things you could do with sed, but in practice you should avoid them all and simply use awk instead. If you edit your question to include concise, testable sample input and expected output, and add an awk tag, then we can show you how to do whatever you want to do the right way.
Meanwhile, it sounds like all you need is:
$ cat /usr/tmp/file
a
b
c
d
e
$ awk 'NR==3{print gensub(/.*\//,"",1,FILENAME)} 1' /usr/tmp/file
a
b
file
c
d
e
The above inserts the current file name before line 3 of the open file. It uses GNU awk for gensub(), with other awks you'd just use sub() and a variable.
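For reference, a portable variant of the same idea, using sub() and a variable as mentioned above, might look like this (a sketch, not tested on every awk):
$ awk 'NR==3{f=FILENAME; sub(/.*\//,"",f); print f} 1' /usr/tmp/file
It should print the same output as the gensub() version above.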

Related

Join certain lines with sed

I have an input which looks like this:
1
2

3
4
5

6
And I want to transform it with sed to:
12
345
6
I know it can be easily done with other tools but I want to do it specifically with sed as a learning exercise.
I have attempted this:
sed ':x ; /^ *$/{ N; s/\n// ; bx; }'
But it prints:
123456
Can someone help me fix this?
Quoting from the GNU sed manual:
A common technique to process blocks of text such as paragraphs (instead of line-by-line) is using the following construct:
sed '/./{H;$!d} ; x ; s/REGEXP/REPLACEMENT/'
The first expression, /./{H;$!d} operates on all non-empty lines, and adds the current line (in the pattern space) to the hold space. On all lines except the last, the pattern space is deleted and the cycle is restarted.
The other expressions x and s are executed only on empty lines (i.e. paragraph separators). The x command fetches the accumulated lines from the hold space back to the pattern space. The s/// command then operates on all the text in the paragraph (including the embedded newlines).
And indeed,
sed '/./{H;$!d} ; x ; s/\n//g'
does what you want.
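For example, assuming the input has blank lines after 2 and after 5 (as the desired output implies), this should reproduce it:
$ printf '1\n2\n\n3\n4\n5\n\n6\n' | sed '/./{H;$!d} ; x ; s/\n//g'
12
345
6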
FWIW here's how to really do that task in UNIX:
$ awk -v RS= -v OFS= '{$1=$1}1' file
12
345
6
The above will work on any UNIX box.
A GNU awk approach:
$ awk -F"\n" '{gsub("\n","");}1' RS='\n{2,}' file
12
345
6
Note: it will add a trailing newline (\n) after the last line.

removing just the middle of the file with sed

I want to use sed as a part of a pipeline to preserve just the first 10 and last 10 lines of its input. It would not be working on physical files, but just reading from STDIN and writing to STDOUT. The amount of data in the stream is bigger than the machine's RAM (or its disk space), so it needs to be relatively efficient. It also must work in stream mode, without creating temporary files (no writable filesystems).
Extra bonus if it could display one line instead of all of the middle it deleted:
for example, if I had input lines containing numbers from 1 to 100000, I would need it to output (a line with the literal <cut> text would be nice, but is optional):
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
The best I've come up with is to have it output the first 10 lines, and the last 1 line, with:
yes ' ' | head -n 100000 | nl | \
sed -e '$q;11,$d'
which outputs
1
2
3
4
5
6
7
8
9
10
100000
but I need it to output more context (10 lines instead of just 1) at the end of the data too.
Update: the length of the input stream is unknown and will vary; 100000 above is just an example.
Update: as noted in the question and the tag, I need it in sed, not awk, perl or other programming languages in which it would be easier to accomplish (that requirement, along with no tmp files, is due to the fact that this is an embedded system with limited commands and resources available)
Update: if the input is less than 10+10 lines, it should ideally just print the whole input
You can try the following command:
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
sed has two buffers for saving content, the pattern space and the hold space. The first one is used to process the current line, and the second one can be used as a backup. The approach is to keep the last ten processed lines in the hold space.
H appends every line to the hold space; g recovers the hold space, then the oldest line is removed and the result is saved back to the hold space; on the last line ($) it is printed with your magic word added in front of it.
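Spelled out with comments, the script reads roughly like this (a sketch of the same command):
sed -n '
  # append the current line to the hold space
  H
  # first 10 lines: print them and start the next cycle
  1,10 { p; b }
  # copy the hold space (the window plus the new line) into the pattern space
  g
  # drop the oldest line from the window
  s/\n[^\n]*//
  # store the trimmed window back into the hold space
  h
  # on the last line, turn the leading newline into the <cut> marker and print
  $ { s/\n/<cut>\n/; p }
'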
The whole command:
yes ' ' | head -n 100000 |nl|\
sed -n 'H; 1,10 { p; b }; g; s/\n[^\n]*//; h; $ { s/\n/<cut>\n/; p }'
Yields:
1
2
3
4
5
6
7
8
9
10
<cut>
99991
99992
99993
99994
99995
99996
99997
99998
99999
100000
That said, follow the advice from Ed Morton, because awk is simpler and easier to debug or modify some weeks later.
UPDATE:
You can append to the hold space after the first ten lines and check whether there are more than 10 newline characters in it before removing the oldest line, treating it as a FIFO structure:
sed -n '1,10 { p; b }; H; g; /\(\n[^\n]\+\)\{11\}/ s/\n[^\n]*//; h; $ { s/^\n//; p }'
Now it's more challenging to know where to add the <cut> string in the edge case of 20 input lines, but I will leave it as an exercise for you.
sed is for simple substitutions on a single line, that is all. For anything else, including this task, you should be using awk:
$ cat tst.awk
BEGIN { beg=(beg?beg:3); end=(end?end:3) }
NR<=beg
{ rec[(NR-1)%end+1] = $0 }
END {
    print "<cut>"
    for (i=1;i<=end;i++) {
        print rec[(NR+i-1)%end+1]
    }
}
$ seq 10 | awk -f tst.awk
1
2
3
<cut>
8
9
10
$ seq 10 | awk -v beg=2 -v end=4 -f tst.awk
1
2
<cut>
7
8
9
10
I see you've added an "it has to be sed" requirement to your question, but I'll leave this answer here for future readers looking for a sensible way to perform the task.
This might work for you (GNU sed):
sed '1,10b;:a;$!{N;s/\n/&/10;Ta;D};i\<cut>' file
Print the first 10 lines as normal. Collect the next 11 lines and if it is not the end of file, delete the first of them and repeat always maintaining the last 10 lines. At the end of the file, insert a line containing <cut> and print the remaining 10 lines.
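Written out with comments, that script is roughly the following (a sketch, GNU sed):
sed '
  # print the first 10 lines as-is
  1,10b
  :a
  $!{
    # append the next line to the pattern space
    N
    # this substitution only succeeds once 11 lines are buffered
    s/\n/&/10
    # fewer than 11 lines so far: keep reading
    Ta
    # 11 lines buffered: drop the oldest and restart the cycle
    D
  }
  # reached only at the end of input: emit the marker before the last 10 lines
  i\<cut>
' file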

How to remove empty lines to one empty line between sentences in text files?

I have a text file with many empty lines between sentences. I used sed, gawk, and grep, but they don't work. :( What can I do now? Thanks.
Myfile:
a
b
c
.

d
e
f
g
.



h
i
j
k
.

Desired file:
a
b
c
.

d
e
f
g
.

h
i
j
k
.
You can use awk for this:
awk 'BEGIN{prev="x"}
/^$/ {if (prev==""){next}}
{prev=$0;print}' inputFile
or the compressed one liner:
awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}' inFl
This is a simple state machine that collapses multi-blank-lines into a single one.
The basic idea is this. First, set the previous line to be non-empty.
Then, for every line in the file, if it and the previous one are blank, just throw it away.
Otherwise, set the previous line to that value, print the line, and carry on.
Sample transcript, the following command:
$ echo '1
2


3
4



5
6


7
8
9
10' | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'
outputs:
1
2

3
4

5
6

7
8
9
10
Keep in mind that this is for truly blank lines (no content). If you're trying to collapse lines that have an arbitrary number of spaces or tabs, that will be a little trickier.
In that case, you could pipe the file through something like:
sed 's/^\s*$//'
to ensure lines with just whitespace become truly empty.
In other words, something like:
sed 's/^\s*$//' infile | awk 'my previous awk command'
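Concretely, with the one-liner from above, that might look like:
sed 's/^\s*$//' infile | awk 'BEGIN{p="x"}/^$/{if(p==""){next}}{p=$0;print}'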
To suppress repeated empty output lines with GNU cat:
cat -s file1 > file2
Here's one way using sed:
sed ':a; N; $!ba; s/\n\n\+/\n\n/g' file
Otherwise, if you don't mind a trailing blank line, all you need is:
awk '1' RS= ORS="\n\n" file
The Perl solution is even shorter:
perl -00 -pe '' file
You could also do it like this:
awk -v RS="\0" '{gsub(/\n\n+/,"\n\n");}1' file
Explanation:
RS="\0" Once we set the null character as Record Seperator value, awk will read the whole file as single record.
gsub(/\n\n+/,"\n\n"); this replaces one or more blank lines with a single blank line. Note that \n\n regex matches a blank line along with the previous line's new line character.
Here is another awk approach:
awk -v p=1 'p=="" && $0==""{next} 1; {p=$0}' file

How to use 'sed or gawk' to delete a text block until the third line previous the last one

Good day,
I was wondering how to delete a text block like this:
1
2
3
4
5
6
7
8
and delete from after the second line until the third line before the last one, to obtain:
1
2
6
7
8
Thanks in advance!!!
BTW, this text block is just an example; the real text blocks I am working on are huge, and they differ from one another in the number of lines.
Getting the number of lines with wc and using awk to print the requested range:
$ awk 'NR<M || NR>N-M' M=3 N="$(wc -l file)" file
1
2
6
7
8
This allows you to easily change the range by just changing the value of M.
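For example, M=2 keeps just the first one and the last two lines (the first range is NR<M, so it keeps M-1 lines at the top):
$ awk 'NR<M || NR>N-M' M=2 N="$(wc -l file)" file
1
7
8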
This might work for you (GNU sed):
sed '3,${:a;$!{N;s/\n/&/3;Ta;D}}' file
or if you prefer:
sed '1,2b;:a;$!{N;s/\n/&/3;Ta;D}' file
These always print the first two lines, then build a running window of three lines.
Unless the end of file is reached the first line is popped off the window and deleted. At the end of file the remaining 3 lines are printed.
Since you mentioned the files are huge and the line numbers can differ, I would suggest this awk one-liner:
awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}' file
It processes the input file only once, without (pre)calculating the total number of lines.
It stores minimal data in memory: at any point during processing, only 3 lines are kept.
If you want to change the filtering criteria, for example removing from line x to $-y, you simply change the offsets in the one-liner.
Adding a test:
kent$ seq 8|awk 'NR<3{print;next}{delete a[NR-3];a[NR]=$0}END{for(x=NR-2;x<=NR;x++)print a[x]}'
1
2
6
7
8
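For example, to instead remove from line 3 through the second-to-last line (keeping the first 2 and the last 2 lines), the offsets change like this (a sketch):
$ seq 8|awk 'NR<3{print;next}{delete a[NR-2];a[NR]=$0}END{for(x=NR-1;x<=NR;x++)print a[x]}'
1
2
7
8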
Using sed:
sed -n '
## Append second line, print first two lines and delete them.
N;
p;
s/^.*$//;
## Read next three lines removing leading newline character inserted
## by the "N" command.
N;
s/^\n//;
N;
:a;
N;
## I will keep three lines in buffer until last line when I will print
## them and exit.
$ { p; q };
## Not last line yet, so remove one line of buffer based in FIFO algorithm.
s/^[^\n]*\n//;
## Goto label "a".
ba
' infile
It yields:
1
2
6
7
8

Keep the content of a text with specific same columns in command line

Basically, I am trying to operate on files on the command line, like this:
File1:
,1,this is some content,
,2,another content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,
,6,zzzzzzzzzz,
... ...
File2:
1
3
4
5
Now I want to keep the lines of file1 whose numbers (the second field) appear in file2, so the output should be:
,1,this is some content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,
I used comm -3 file1 file2 but it doesn't work. Then I tried sed, but that didn't work either. Is there any other handy tool?
The following will work on the example as given - it won't work if numbers appear in your string after the comma:
grep -F -f File2 File1
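To avoid that, you could anchor each number between commas before matching; with bash, for example (a sketch):
grep -f <(sed 's/.*/^,&,/' File2) File1
This builds patterns like ^,1, so only the second field is compared.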
An alternative would be
join -t, -1 2 -2 1 -o 1.1,1.2,1.3 File1 File2
Here is how that works:
-t, treats the `,` as the field separator
-1 2 join on the second field of file 1
-2 1 join on the first field of file 2
-o 1.1,1.2,1.3 output the first, second, and third fields of file 1
This still has the drawback that if there are multiple commas in the text that follows, it terminates after the first comma ("field 3" is the last one output).
Fixing that issue requires the use of xargs:
join -t, -1 2 -2 1 -o 1.1,1.2 File1 File2 | xargs -Ixx grep xx File1
Explanation:
-Ixx : replace the string xx in the command that follows with each of the output lines from the preceding command, then execute that command for each line. This means we will find the lines that match the leading ,number, pattern, which should make us insensitive to anything else in the line.