remove token repeatedly if line does not start with # - sed

I want to remove all commas from my text file unless a line starts with #
for example:
a, b, c
#a, b, c
should turn to:
a b c
#a, b, c
I don't mind scanning the file twice, but I want to do this with sed.

You could try the below sed command,
$ sed '/^ *#/!s/,//g' file
a b c
#a, b, c
^ asserts that we are at the start of the line, so the address /^ *#/ matches lines that start with zero or more spaces followed by a # symbol. The ! that follows inverts the selection, i.e. it makes sed apply the replacement only to the lines that do not match. s/,//g replaces every comma with an empty string.
Through awk,
$ awk '!/^ *#/{gsub(/,/,"")}1' file
a b c
#a, b, c
! at the start negates the pattern. As with sed, the replacement is done only on the lines that don't have # at the start.
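A minimal sketch of the negated-address idea, wrapped in a shell function. The name strip_commas and the sample lines are assumptions made for illustration, not part of the original answer. The third sample line shows that an indented comment is also left alone, because of the " *" in the address:

```shell
# strip_commas: hypothetical helper; reads stdin, writes stdout.
# Deletes every comma, except on lines whose first non-space character is '#'.
strip_commas() {
  sed '/^ *#/!s/,//g'
}

printf '%s\n' 'a, b, c' '#a, b, c' '  # x, y' | strip_commas
```

Only the first line loses its commas; the two comment lines pass through unchanged.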

Related

Print specific lines that have two or more occurrences of a particular character

I have a file with some text lines. I need to print lines 3-7 and 11 if they contain two "b"s. I tried
sed -n '/b\{2,\}/p' file but it only printed lines where "b" occurs two times in a row.
You can use
sed -n '3,7{/b[^b]*b/p};11{/b[^b]*b/p}' file
## that is equal to
sed -n '3,7{/b[^b]*b/p};11{//p}' file
Note that b[^b]*b matches b, then zero or more chars other than b, and then a b. The //p in the second part reuses the most recently used pattern, i.e. it matches the same b[^b]*b regex.
Note you might also use the b.*b regex if you want, but bracket expressions tend to work faster.
See an online demo, tested with sed (GNU sed) 4.7:
s='11bb1
b222b
b n b
ww
ee
bb
rrr
fff
999
10
11 b nnnn bb
www12'
sed -ne '3,7{/b[^b]*b/p};11{/b[^b]*b/p}' <<< "$s"
Output:
b n b
bb
11 b nnnn bb
Only lines 3, 6 and 11 are returned.
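To make the difference between the asker's pattern and the suggested one concrete, here is a small sketch. The sample lines and the variable names are invented for illustration:

```shell
# Sample input (made up): only the first line has two b's in a row.
sample=$(printf '%s\n' 'bb1' 'b n b' 'b only once')

# Interval applied to a single atom: needs two or more CONSECUTIVE b's.
consecutive=$(printf '%s\n' "$sample" | sed -n '/b\{2,\}/p')

# b, any run of non-b characters, then b: two b's anywhere on the line.
anywhere=$(printf '%s\n' "$sample" | sed -n '/b[^b]*b/p')

printf 'consecutive:\n%s\nanywhere:\n%s\n' "$consecutive" "$anywhere"
```

The first pattern selects only bb1, while the second also selects "b n b".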
Just use awk for simplicity, clarity, portability, maintainability, etc. Using any awk in any shell on every Unix box:
awk '( (3<=NR && NR<=7) || (NR==11) ) && ( gsub(/b/,"&") >= 2 )' file
Notice how if you need to change a range, add a range, add other line numbers, change how many bs there are, add other chars and/or strings to match, add some completely different condition, etc. it's all absolutely clear and trivial.
For example, want to print the line if there are exactly 13 or exactly 27 bs instead of 2 or more?
awk '( (3<=NR && NR<=7) || (NR==11) ) && ( gsub(/b/,"&") ~ /^(13|27)$/ )' file
Want to print the line if the line number is between 23 and 59 but isn't 34?
awk '( 23<=NR && NR<=59 && NR!=34 ) && ( gsub(/b/,"&") >= 2 )' file
Try making similar changes to a sed script. I'm not saying you can't force it to happen, but it's not nearly as trivial, clear, portable, etc. as it is using awk.
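The awk commands above rely on gsub() returning the number of substitutions it made. Replacing each b with & (the matched text itself) leaves the line unchanged, so the call acts as a counter. A small sketch of just that part, with a hypothetical helper count_b and invented sample lines:

```shell
# count_b: hypothetical helper; prefixes each input line with its number of b's.
count_b() {
  # gsub() returns how many replacements were made; "&" puts each match back.
  awk '{ n = gsub(/b/, "&"); print n ": " $0 }'
}

printf '%s\n' 'abb' 'b' 'xbxbxb' 'none' | count_b
```

Comparing that count with >= 2, or matching it against /^(13|27)$/, gives the conditions used in the answer.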

How to parse rows in my txt file properly using perl

I hope to parse a txt file that looks like this:
A a, b, c
B e
C f, g
The format I hope to get is:
A a
A b
A c
B e
C f
C g
I tried this:
perl -ane '@s=split(/\,/, $F[1]); foreach $k (@s){print "$F[0] $k\n";}' txt.txt
but it only works when there's no space after commas. In the original file, there is a space after each comma. What should I do?
$ perl -lane 'print "$F[0] $_" for map { tr/,//rd } @F[1..$#F]' input.txt
A a
A b
A c
B e
C f
C g
Use auto-split mode on whitespace like normal, and for each element of an array slice of @F from the second field to the last one, remove any commas (I used tr//d, the more usual s/// works too, of course) and print it with the first field prepended.
Alternatively, don't use -a because it splits too much.
perl -nle'@F = split(" ", $_, 2); print "$F[0] $_" for split(/,\s*/, $F[1])'

join previous line with next depending on pattern of previous line

I have this Input:
1 a
a
2 b b
3 c
c
4 d d
5 e e
6 f
f
7 g
g
I want this output using sed command
1 a a
2 b b
3 c c
4 d d
5 e e
6 f f
7 g g
I'm trying this without success
sed '/^[^0-9]/ x; N; { s/\n/ / }; n' file
Another in awk:
$ awk 'BEGIN{RS=""}{for(i=1;i<=NF;i+=3)print $i,$(i+1),$(i+2)}' file
1 a a
2 b b
3 c c
4 d d
5 e e
6 f f
7 g g
Explained:
$ awk 'BEGIN {
    RS=""                           # prime awk to read in a paragraph of data
}
{
    for(i=1;i<=NF;i+=3)             # jump forward 3 fields at a time
        print $i,$(i+1),$(i+2)      # print 3 fields
}' file
awk 'NR>1 && /^[0-9]/ {print substr(s,2); s=""} {s=s FS $0} END {print substr(s,2)}' file
NR>1 && /^[0-9]/: If a line is not the first and begins with a digit,
{print substr(s,2); s=""}: print "s" without the leading space, then clear it.
{s=s FS $0}: On every line, append the current line to the value of "s". FS is a space by default.
edit: Added END condition to catch last line, hated it, made a better separate answer.
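The accumulate-and-flush logic described above can be tried on a shortened copy of the input. This is only a sketch: the function name join_continuations is hypothetical, and the if (NR) guard in END is an addition so that empty input prints nothing:

```shell
# join_continuations: hypothetical helper; joins each line that does not
# start with a digit onto the preceding line that does.
join_continuations() {
  awk '
    NR > 1 && /^[0-9]/ { print substr(s, 2); s = "" }  # a new record starts: flush the previous one
    { s = s FS $0 }                                      # append the current line after a space
    END { if (NR) print substr(s, 2) }                   # flush the final record
  '
}

printf '%s\n' '1 a' 'a' '2 b b' '3 c' 'c' | join_continuations
```

Because the flush is triggered by the next line that starts with a digit, the last record is only printed by the END block.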
Made it simpler with awk:
awk 'NF==2 {printf("%s ", $0); next} 1' file
Basically, "Don't print a newline if there are only exactly two fields."
This might work for you (GNU sed):
sed '/^[0-9]/{:a;N;s/\n\([^0-9]\)/ \1/;ta;P;D}' file
If the current line begins with an integer, append the following line. If that line does not begin with an integer, replace the newline by a space and repeat. Otherwise print/delete the first line in the pattern space and repeat.

sed editing multiple lines

Sed editing is always a new challenge to me when it comes to multiple line editing. In this case I have the following pattern:
RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6 \
,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
,82,1000 ,84,1 ,85,1
which I want to convert into:
#RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
# ,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6\
# ,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
# ,82,1000 ,84,1 ,85,1
Besides this, I would like to collect these 4 lines (there may be more or fewer than 4; the number is unpredictable, as they appear in the input) into one long line without the backslashes or line wraps.
Two tasks in one so to say.
sed is mandatory.
It's not terribly clear how you recognize the blocks you want to comment out, so I'll use blocks from a line that starts with RECORD and process as long as there are backslashes at the end (if your requirements differ, the patterns used will need to be amended accordingly).
For that, you could use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; s/^/#/ }' filename
This works as follows:
/^RECORD/ {                            # if you find a line that starts with
                                       # RECORD:
  :a                                   # jump label for looping
  /\\$/ {                              # while there's a backslash at the end
                                       # of the pattern space
    N                                  # fetch the next line
    ba                                 # loop.
  }
                                       # After you got the whole block:
  s/[[:space:]]*\\\n[[:space:]]*/ /g   # remove backslashes, newlines, spaces
                                       # at the end, beginning of lines
  s/^/#/                               # and put a comment sign at the
                                       # beginning.
}
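The same script can be tried on a shortened RECORD block. In this sketch the function name comment_records and the sample lines are assumptions for illustration; the sed commands are the ones explained above, written one per line:

```shell
# comment_records: hypothetical helper; joins each backslash-continued
# RECORD block into one line and puts a '#' in front of it.
comment_records() {
  sed '/^RECORD/ {
    :a
    /\\$/ {
      N
      ba
    }
    s/[[:space:]]*\\\n[[:space:]]*/ /g
    s/^/#/
  }'
}

# printf with %s keeps the trailing backslashes literal.
printf '%s\n' 'RECORD 4,4 ,5,48 \' ',18,0.8 ,21,35 \' ',82,1000 ,84,1' 'unrelated line' |
  comment_records
```

The three RECORD lines come out as a single commented line, and the unrelated line is printed unchanged.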
Addendum: To keep the line structure intact, instead use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/\(^\|\n\)/&#/g }' filename
This works pretty much the same way, except the newline-removal is removed, and the comment signs are inserted after every line break (and once at the beginning).
Addendum 2: To just put RECORD blocks onto a single line:
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g }' filename
This is just the first script with the s/^/#/ bit removed.
Addendum 3: To isolate RECORD blocks while putting them onto a single line at the same time,
sed -n '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; p }' filename
The -n flag suppresses the normal default printing action, and the p command replaces it for those lines that we want printed.
To write those records out to a file while commenting them out in the normal output at the same time,
sed -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w saved_records.txt' -e 'x; s/\(^\|\n\)/&#/g }' foo.txt
There's actually new stuff in this. Briefly annotated:
#!/bin/sed -f
/^RECORD/ {
  :a
  /\\$/ {
    N
    ba
  }
  # after assembling the lines
  h                                    # copy them to the hold buffer
  s/[[:space:]]*\\\n[[:space:]]*/ /g   # put everything on a line
  # write that to saved_records.txt (w reads its file name up to the end of
  # the line, so the comment cannot share the line with it)
  w saved_records.txt
  x                                    # swap the original lines back
  s/\(^\|\n\)/&#/g                     # and insert comment signs
}
When specifying this code directly on the command line, it is necessary to split it into several -e options because the w command is not terminated by ;.
This problem does not arise when putting the code into a file of its own (say foo.sed) and running sed -f foo.sed filename instead. Or, for the advanced, putting a #!/bin/sed -f shebang on top of the file, chmod +xing it and just calling ./foo.sed filename.
Lastly, to edit the input file in-place and print the records to stdout, this could be amended as follows:
sed -i -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w /dev/stdout' -e 'x; s/\(^\|\n\)/&#/g }' filename
The new things here are the -i flag for in-place editing of the file, and /dev/stdout as the target of the w command.
sed '/^RECORD.*\\$/,/[^\\]$/ s/^/#/
s/^RECORD.*/#&/' YourFile
After several remarks from @Wintermute and more information from the OP
Assuming:
a line with RECORD at the start is the trigger to modify the following lines
the structure is always the same (no line ending in \ that is directly followed by a RECORD line or by an empty line)
Explanation:
take each block of lines that starts with a RECORD line ending in \ and ends with the first line not ending in \
add # in front of each line of the block
then take the current line (after any modification by the first command, so a RECORD line that is still unmodified has no \ at the end) and add a # at the start if it starts with RECORD

How do I assign the line number to each string present in a file

F1.txt
tom a b c d e boy
bob a b c sun
harry a c d e girl
result
F2.txt
tom1 a1 b1 c1 d1 e1 boy1
bob2 a2 b2 c2 sun2
harry3 a3 c3 d3 e3 girl3
Hello everyone, I am quite new to Perl. Can you kindly help me out with this problem? I have a file F1.txt, and my job is to append a number to each string in the file according to its line number, as shown in the example above. So far I have only managed to put a number in front of each line with this Perl one-liner
perl -pe '$_= ++$a." $_" if /./'
Maybe as follows:
perl -pe 's/(?<=\w)\b/$./g;'
The special variable $. holds the current line number.
The regex /(?<=\w)\b/g matches each end of a word (or a number or underscore).
Or, more precisely, it matches a word boundary preceded by a "word" character, which is not included in the match. The \b assertion has zero width. Use the regex s/(?<=\S)(?=\s|$)/$./g instead to put the line number after each sequence of non-space characters.
We can use the substitution operator s///g to append the line number in this way:
echo -e "a b\nc d" | perl -ne 's/(?<=\w)\b/$./g; print'
prints
a1 b1
c2 d2
in a one-liner:
perl -pe 's/(?<=\w)\b/$./g' F1.txt >F2.txt