Stop printing when non-comment lines are reached - sed

I am printing the lines between section markers of the following form:
## FAML [ASMB] keyword,keyword
## Some text
## END OF FAML [ASMB]
The problem arises when the closing line "## END OF FAML [ASMB]" is never reached. In that case I want to stop as soon as a line that does not start with the comment characters "##" is encountered.
For instance, in the following input I want to stop upon reaching "Some code", even though "## END OF FAML [ASMB]" was never found, because that line does not start with "##".
## FAML [ASMB] keyword,keyword
## Some text
## End OF FAL
Some code
This is the implementation
spc='[[:space:]]*'
gph="[[:graph:]]+"
cmt='\/\/'
ebl='\['
ebr='\]'
local pn_ere="^[[:space:]]*([#;!]+|#c|${cmt})[[:space:]]+"
local kys="(([^,]+)(,[^,]+)*)?"
nfaml=${faml:-"[[:graph:]]+"}
nasmb=${asmb:-"[[:graph:]]+"}
beg_ere="${pn_ere}${nfaml} ${ebl}${nasmb}${ebr}${spc}${kys}$"
end_ere="${pn_ere}END OF ${nfaml} ${ebl}${nasmb}${ebr}${spc}$"
sed -E -n "/$beg_ere/,/$end_ere/ {
/$end_ere/z; s/$pn_ere// ; p
}" "$filename"

You could also end the range at the first non-comment line and discard that line, e.g.:
# ...
notcom_ere='^[^#[:space:]]+'
sed -E -n "/$beg_ere/,/($end_ere)|($notcom_ere)/ {
/$notcom_ere/d
/$end_ere/z
s/$pn_ere//
p
}" "$filename"
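To see the effect on a small input, here is a minimal sketch. The patterns are simplified stand-ins for the question's beg_ere, end_ere and pn_ere (an assumption made to keep the example short), the sample text and the function name extract_block are made up, and the END line is dropped with d rather than blanked with z.

```shell
#!/bin/sh
# Hypothetical sample: the END marker is missing, so the block has to end
# at the first line that does not start with a comment character.
sample='## FAML [ASMB] keyword,keyword
## Some text
## End OF FAL
Some code
## Later comment'

# Simplified stand-ins for the patterns built in the question.
pn_ere='^##[[:space:]]+'
beg_ere='^## FAML \[ASMB\]'
end_ere='^## END OF FAML \[ASMB\]'
notcom_ere='^[^#[:space:]]'

extract_block() {
    sed -E -n "/$beg_ere/,/($end_ere)|($notcom_ere)/ {
        /$notcom_ere/d
        /$end_ere/d
        s/$pn_ere//
        p
    }"
}

result=$(printf '%s\n' "$sample" | extract_block)
printf '%s\n' "$result"
```

The range closes on "Some code", which is deleted instead of printed, and the later comment line is outside any range, so it is not printed either.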

Related

sed editing multiple lines

Sed editing is always a new challenge to me when it comes to multiple line editing. In this case I have the following pattern:
RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6 \
,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
,82,1000 ,84,1 ,85,1
which I want to convert into:
#RECORD 4,4 ,5,48 ,7,310 ,10,214608 ,12,199.2 ,13,-19.2 ,15,-83 ,17,35 \
# ,18,0.8 ,21,35 ,22,31.7 ,23,150 ,24,0.8 ,25,150 ,26,0.8 ,28,25 ,29,6 \
# ,30,1200 ,31,1 ,32,0.2 ,33,15 ,36,0.4 ,37,1 ,39,1.1 ,41,4 ,80,2 \
# ,82,1000 ,84,1 ,85,1
Besides this, I would like to collapse these 4 lines (there may be more or fewer than 4; the count is unpredictable, as they appear in the input) into one (long) line without the backslashes or line wraps.
Two tasks in one so to say.
sed is mandatory.
It's not terribly clear how you recognize the blocks you want to comment out, so I'll use blocks from a line that starts with RECORD and process as long as there are backslashes at the end (if your requirements differ, the patterns used will need to be amended accordingly).
For that, you could use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; s/^/#/ }' filename
This works as follows:
/^RECORD/ { # if you find a line that starts with
# RECORD:
:a # jump label for looping
/\\$/ { # while there's a backslash at the end
# of the pattern space
N # fetch the next line
ba # loop.
}
# After you got the whole block:
s/[[:space:]]*\\\n[[:space:]]*/ /g # remove backslashes, newlines, spaces
# at the end, beginning of lines
s/^/#/ # and put a comment sign at the
# beginning.
}
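A quick way to check the loop is to feed it a short made-up input. The sketch below spreads the same commands over several lines (so that the label ends at a newline) and wraps them in a function whose name, join_and_comment, is only illustrative.

```shell
#!/bin/sh
# Hypothetical sample; only the RECORD block should change.
sample='keep this line
RECORD 4,4 ,5,48 \
,18,0.8 ,21,35 \
,82,1000
keep this too'

join_and_comment() {
    sed '/^RECORD/ {
        :a
        /\\$/ {
            N
            ba
        }
        s/[[:space:]]*\\\n[[:space:]]*/ /g
        s/^/#/
    }'
}

result=$(printf '%s\n' "$sample" | join_and_comment)
printf '%s\n' "$result"
```

The three RECORD lines come out as the single line "#RECORD 4,4 ,5,48 ,18,0.8 ,21,35 ,82,1000", and the surrounding lines pass through unchanged.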
Addendum: To keep the line structure intact, instead use
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/\(^\|\n\)/&#/g }' filename
This works pretty much the same way, except the newline-removal is removed, and the comment signs are inserted after every line break (and once at the beginning).
Addendum 2: To just put RECORD blocks onto a single line:
sed '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g }' filename
This is just the first script with the s/^/#/ bit removed.
Addendum 3: To isolate RECORD blocks while putting them onto a single line at the same time,
sed -n '/^RECORD/ { :a /\\$/ { N; ba }; s/[[:space:]]*\\\n[[:space:]]*/ /g; p }' filename
The -n flag suppresses the normal default printing action, and the p command replaces it for those lines that we want printed.
To write those records out to a file while commenting them out in the normal output at the same time,
sed -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w saved_records.txt' -e 'x; s/\(^\|\n\)/&#/g }' foo.txt
This one actually contains something new. Briefly annotated:
#!/bin/sed -f
/^RECORD/ {
:a
/\\$/ {
N
ba
}
# after assembling the lines
h # copy them to the hold buffer
s/[[:space:]]*\\\n[[:space:]]*/ /g # put everything on a line
w saved_records.txt # write that to saved_records.txt
x # swap the original lines back
s/\(^\|\n\)/&#/g # and insert comment signs
}
When specifying this code directly on the command line, it is necessary to split it into several -e options because the w command is not terminated by ;.
This problem does not arise when putting the code into a file of its own (say foo.sed) and running sed -f foo.sed filename instead. Or, for the advanced, putting a #!/bin/sed -f shebang on top of the file, chmod +xing it and just calling ./foo.sed filename.
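As a rough illustration of the hold-buffer variant, the sketch below passes one -e per command, so the file name of w simply ends with its own fragment. The sample input and the temporary file are made up, and the comment signs are inserted with two plain substitutions instead of the \(^\|\n\) alternation, which is an assumption made to avoid relying on GNU extensions.

```shell
#!/bin/sh
# Hypothetical sample input and a scratch file for the saved records.
saved=$(mktemp)
sample='before
RECORD a \
,b \
,c
after'

commented=$(printf '%s\n' "$sample" | sed \
    -e '/^RECORD/ {' \
    -e ':a' \
    -e '/\\$/ {' \
    -e 'N' \
    -e 'ba' \
    -e '}' \
    -e 'h' \
    -e 's/[[:space:]]*\\\n[[:space:]]*/ /g' \
    -e "w $saved" \
    -e 'x' \
    -e 's/^/#/' \
    -e 's/\n/&#/g' \
    -e '}')

joined=$(cat "$saved")
rm -f "$saved"
printf '%s\n---\n%s\n' "$commented" "$joined"
```

The normal output keeps the line structure with a # in front of each RECORD line, while the scratch file receives the joined record "RECORD a ,b ,c".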
Lastly, to edit the input file in-place and print the records to stdout, this could be amended as follows:
sed -i -e '/^RECORD/ { :a /\\$/ { N; ba }; h; s/[[:space:]]*\\\n[[:space:]]*/ /g; w /dev/stdout' -e 'x; s/\(^\|\n\)/&#/g }' filename
The new things here are the -i flag for inplace editing of the file, and to have /dev/stdout as target for the w command.
sed '/^RECORD.*\\$/,/[^\\]$/ s/^/#/
s/^RECORD.*/#&/' YourFile
This was revised after several remarks from @Wintermute and more information from the OP.
Assuming:
a line with RECORD at the start is the trigger to modify the following lines
the structure is always the same (no line ending in \ directly followed by a RECORD line or by empty lines)
Explanation:
take the block from a line that starts with RECORD and ends with \ up to the next line that does not end with \
add # in front of each line of that block
take each line (after any modification by the earlier block, which leaves untouched only a RECORD line without \ at the end, or a line without RECORD) and add a # at the start if it starts with RECORD

How to remove YAML frontmatter from markdown files?

I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying widths. Can I use a Posix command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first ----enclosed block if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 { # in the first line
/^---/ { # if it starts with ---
:a # jump label for looping
N # fetch the next line, append to pattern space
/\n---/! ba; # if the result does not contain \n--- (that is, if the last
# fetched line does not begin with ---), go back to :a
d # then delete the whole thing.
}
}
# otherwise drop off the end here and do the default (print
# the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
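As a small check of the stricter variant with $ added to both patterns, the following sketch runs the loop on two made-up inputs, one with front matter and one without. The function name strip_front_matter is hypothetical.

```shell
#!/bin/sh
strip_front_matter() {
    sed '1 {
        /^---$/ {
            :a
            N
            /\n---$/!ba
            d
        }
    }'
}

# Hypothetical inputs: the --- further down must survive in both cases.
with_fm='---
title: Something
---
# Heading
---
body'
without_fm='# Heading
---
body'

stripped=$(printf '%s\n' "$with_fm" | strip_front_matter)
untouched=$(printf '%s\n' "$without_fm" | strip_front_matter)
printf '%s\n' "$stripped"
```

Both calls should yield the same three lines, because only a block that opens on line 1 is removed.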
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
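If the first line doesn't match ---, sed will quit; else it will delete everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter). Note that q prints the current line before quitting, so for a file without front matter only its first line is output.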
If you don't mind the "or something" being perl.
Simply print after two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to edit the file in place.
You can use a bash script: create script.sh, make it executable with chmod +x script.sh, and run it as ./script.sh.
#!/bin/bash
# the folder articles contains a lot of markdown files
for f in ./articles/*.md
do
    # print the file name without its directory
    echo "${f##*/}"
    # wrap the value of the front matter title attribute in double quotes
    sed -i -r 's/^title: (.*)$/title: "\1"/' "$f"
    #...
done
This AWK-based solution works for files with and without front matter, doing nothing in the latter case.
#!/bin/sh
# Strips YAML FrontMatter from a file (usually Markdown).
# Exit immediately on each error;
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -Ee
print_help() {
echo "Strips YAML FrontMatter from a file (usually Markdown)."
echo
echo "Usage:"
echo " `basename $0` -h"
echo " `basename $0` --help"
echo " `basename $0` -i <file-with-front-matter>"
echo " `basename $0` --in-place <file-with-front-matter>"
echo " `basename $0` <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
if [ "$1" = "-h" ] || [ "$1" = "--help" ]
then
print_help
exit 0
elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
then
replace=true
in_file="$2"
out_file="$in_file"
else
in_file="$1"
if [ -n "$2" ]
then
out_file="$2"
fi
fi
fi
tmp_out_file="$out_file"
if $replace
then
tmp_out_file="${in_file}_tmp"
fi
awk '
# Front matter only counts when it opens on the very first line.
NR == 1 && /^---$/ {
    in_fm = 1
    next
}
# Inside front matter, a line of --- or ... closes it (dots escaped).
in_fm && /^(---|\.\.\.)$/ {
    in_fm = 0
    next
}
# Print everything outside the front matter.
! in_fm {
    print $0
}
' "$in_file" >> "$tmp_out_file"
if $replace
then
mv "$tmp_out_file" "$out_file"
fi
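To try the awk logic on its own, here is a compact sketch on a made-up input. It assumes the dots of the ... terminator are meant literally, so a three-character line such as abc inside the front matter does not end it; the function name is hypothetical.

```shell
#!/bin/sh
strip_front_matter_awk() {
    awk '
        NR == 1 && /^---$/ { in_fm = 1; next }
        in_fm && /^(---|\.\.\.)$/ { in_fm = 0; next }
        !in_fm { print }
    '
}

# Hypothetical input: closed by "...", with a later "---" in the body.
sample='---
title: x
abc
...
body
---
tail'

result=$(printf '%s\n' "$sample" | strip_front_matter_awk)
printf '%s\n' "$result"
```

Only body, the later --- and tail remain.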

find the line number where a specific word appears with “sed” on tcl shell

I need to search for a specific word in a file starting from specific line and return the line numbers only for the matched lines.
Let's say I want to search a file called myfile for the word my_word and then store the returned line numbers.
In a shell script, the command
sed -n '10,$ { /$my_word /= }' $myfile
works fine, but how do I write that command in the Tcl shell?
% exec sed -n '10,$ { /$my_word/= }' $file
extra characters after close-brace.
I want to add that the following command works fine on tcl shell but it starts from the beginning of the file
% exec sed -n "/$my_word/=" $file
447431
447445
448434
448696
448711
448759
450979
451006
451119
451209
451245
452936
454408
I have solved the problem as follows
set lineno 10
if { ! [catch {exec sed -n "/$new_token/=" $file} lineFound] && [string length $lineFound] > 0 } {
set lineNumbers [split $lineFound "\n"]
foreach num $lineNumbers {
if {[expr {$num >= $lineno}] } {
lappend col $num
}
}
}
I still can't find a single command that solves the problem.
Any suggestions?
There is one thing I don't understand: is the text you are looking for stored inside the variable called my_word, or is it the literal value my_word?
In your line
% exec sed -n '10,$ { /$my_word/= }' $file
I'd say it's the first case. So you have before it something like
% set my_word wordtosearch
% set file filetosearchin
Your mistake is to use the single quote character ' to enclose the sed expression. That character is an enclosing operator in sh, but has no meaning in Tcl.
You use it in sh to group many words in a single argument that is passed to sed, so you have to do the same, but using Tcl syntax:
% set my_word wordtosearch
% set file filetosearchin
% exec sed -n "10,$ { /$my_word/= }" $file
Here, you use the "..." to group.
You don't escape the $ in $my_word because you want $my_word to be substituted with the string wordtosearch.
I hope this helps.
After some trial and error I came up with:
set output [exec sed -n "10,\$ \{ /$myword/= \}" $myfile]
# Do something with the output
puts $output
The key is to escape characters that are special to Tcl, such as the dollar sign and curly braces.
Update
Per Donal Fellows, we do not need to escape the dollar sign:
set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]
I have tried the new revision and found it works. Thank you, Donal.
Update 2
I finally gained access to a Windows 7 machine, installed Cygwin (which includes sed and tclsh). I tried out the above script and it works just fine. I don't know what your problem is. Interestingly, the same script failed on my Mac OS X system with the following error:
sed: 1: "10,$ { /ipsum/= }": extra characters at the end of = command
while executing
"exec sed -n "10,$ \{ /$myword/= \}" $myfile"
invoked from within
"set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]"
(file "sed.tcl" line 6)
I guess this is a difference between GNU sed (Linux, Cygwin) and the BSD sed that ships with Mac OS X.
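One way to sidestep that difference, assuming the BSD complaint is about the = command not being terminated before the closing brace, is to pass each command as its own -e fragment. The sketch below does this from a plain shell; the function name match_lines_from and the sample file are made up.

```shell
#!/bin/sh
# Print the numbers of the lines at or after line $1 of file $3
# that match the regular expression $2.
match_lines_from() {
    sed -n -e "$1,\$ {" -e "/$2/=" -e '}' "$3"
}

# Hypothetical five-line sample file.
demo=$(mktemp)
printf 'ipsum\nlorem\nipsum dolor\nsit\nipsum\n' > "$demo"
numbers=$(match_lines_from 2 ipsum "$demo")
rm -f "$demo"
printf '%s\n' "$numbers"
```

Starting at line 2, the match on line 1 is skipped and lines 3 and 5 are reported. The same three fragments can be handed to exec from Tcl as separate arguments.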
Update 3
I have tried the same script under Linux/Tcl 8.4 and it works. That might mean Tcl 8.4 has nothing to do with it. Here is something else that might help: Tcl comes with a package called fileutil, which is part of the tcllib. The fileutil package contains a useful tool for this case: fileutil::grep. Here is a sample on how to use it in your case:
package require fileutil
proc grep_demo {myword myfile} {
foreach line [fileutil::grep $myword $myfile] {
# Each line is in the format:
# filename:linenumber:text
set lineNumber [lindex [split $line :] 1]
if {$lineNumber >= 10} { puts $lineNumber}
}
}
grep_demo $myword $myfile
Here is how to do it with awk
awk 'NR>=10 && $0~f {print NR}' f="$my_word" "$myfile"
This prints the number of every line, from line 10 onward, that contains the word stored in the variable my_word, in the file whose name is stored in the variable myfile.
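For a quick check that the start line itself is included, the sketch below wraps a similar awk call in a function and runs it on a made-up file. It passes the values with -v instead of a trailing assignment, which is a design choice here, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the numbers of the lines at or after line $1 of file $3
# that match the regular expression $2.
lines_matching_from() {
    awk -v start="$1" -v word="$2" 'NR >= start && $0 ~ word { print NR }' "$3"
}

# Hypothetical sample: the first requested line (2) is itself a match.
demo=$(mktemp)
printf 'x\nipsum\ny\nipsum\n' > "$demo"
numbers=$(lines_matching_from 2 ipsum "$demo")
rm -f "$demo"
printf '%s\n' "$numbers"
```

With a start of 2 this reports lines 2 and 4.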

sed command to print paragraphs after a pattern

I have a log file whose entries look like the sample below; each entry starts with time:
Now I want to print only the entries after a specific time. For example, after time: 20130309235926, I want to print all records, so in my case it should print the last 2 records.
Is there a sed command for doing this?
time: 20130309235926
dn:
changetype: modify
-
replace: modifiersname
modifiersname:
dc=
-
replace: modifytimestamp
modifytimestamp: 20130310045926Z
-
time: 20130309235959
dn:
changetype: modify
-
replace: modifiersname
modifiersname:
dc=
-
replace: modifytimestamp
modifytimestamp: 20130310045926Z
-
time: 20130308025010
dn:
changetype: modify
-
replace: modifiersname
modifiersname:
dc=
-
replace: modifytimestamp
modifytimestamp: 20130310045926Z
-
I like perl for doing paragraph-y things:
perl -00 -ne '$t = (/time: (\d+)/)[0]; print if $t gt "20130309235926"'
The -00 flag provides the input in paragraphs (separated by empty lines)
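The same paragraph-at-a-time idea can be sketched with awk, swapped in here for perl: an empty RS makes awk read blank-line-separated records. The sketch assumes the entries are separated by empty lines and compares the timestamps as numbers; the shortened log and the function name entries_after are made up.

```shell
#!/bin/sh
# Print every entry whose time value is greater than $1.
entries_after() {
    awk -v since="$1" 'BEGIN { RS = ""; ORS = "\n\n" }
        $1 == "time:" && $2 + 0 > since + 0 { print }'
}

# Hypothetical, shortened log with blank lines between entries.
log='time: 20130309235926
dn:
changetype: modify

time: 20130309235959
dn:
changetype: modify

time: 20130308025010
dn:
changetype: modify'

result=$(printf '%s\n' "$log" | entries_after 20130309235926)
printf '%s\n' "$result"
```

Like the perl one-liner, this selects by timestamp value rather than by position in the file, so only the 20130309235959 entry is printed here.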
It depends on whether the time to find matches exactly or not, because doing arithmetic with sed is difficult. For example, for an exact match, this command uses the -n switch to disable automatic printing and uses a range to print from the line that matches your time until the end of the file ($):
sed -n '/time:[ ]*20130309235926/,$ p' infile
EDIT to fix previous command:
sed -n '
## When a blank line is found between the line with your time and the end of
## the file, jump to label "a".
/time:[ ]*20130309235926/,$ {
/^[ ]*$/ ba
};
## Skip all lines until the previous condition is true.
b;
## Label "a".
:a;
## Save all content from next entry until end of file.
$! {
N;
ba
};
## Remove extra newline and print.
s/^\n//;
p
' infile
EDIT to add the previous command as a one-liner:
sed -n '/time:[ ]*20130309235926/,$ { /^[ ]*$/ ba }; b; :a; $! { N; ba }; s/^\n//; p' infile

tcl script to compare 2 texts and return the line numbers of the lines that differ only. Also, How to avoid "child process exit abnormally"?

Can anybody guide me on how to write a Tcl script that compares 2 texts and returns only the line numbers of the lines that differ?
I know how to do it in bash, but calling bash from Tcl doesn't seem very neat. Here's the bash command:
diff --old-line-format '%L' --new-line-format '' --unchanged-line-format '' <(nl File1) <(nl File2) | awk '{print $1 }' > difflines
To include this in Tcl, I did the following:
exec cat nl File1 > File11
exec cat nl File2 > File22
exec diff --old-line-format {%L} --new-line-format {} --unchanged-line-format {} <
File11 < File22 | awk {{print $1 }} > difflines
Is there a cleaner way?
Also, if there is a difference I get "child process exited abnormally". How can I avoid this?
Thanks
The struct::list package in Tcllib has tools for computing longest-common-subsequences, which is the key part of a diff tool. To use, you load your data into Tcl and split it into a list of lines:
proc getLines {filename} {
set f [open $filename]
set result [split [read $f] "\n"]
close $f
return $result
}
Then you can get the information about the common elements (== common lines):
set sharedLineInfo [struct::list longestCommonSubsequence $file1_lines $file2_lines]
This returns a pair of lists, where each list is the indices (counting from zero) of the common lines; the first list will be for the first file, and the second list for the second file. Any line number not listed will be one that has changed.
There's also a function to invert the information provided to get instructions on how to change one sequence into the other:
set changes [struct::list lcsInvert $sharedLineInfo \
[llength $file1_lines] [llength $file2_lines]]
This returns a list of triples, where the first is the operation performed (added, changed or deleted) and the second and third are the ranges of indices in each of the relevant lists (i.e., zero-based line numbers).
I'm not quite sure how to take this information and produce what you are looking for, but I guess we could put it together like this:
package require struct::list
proc getLines {filename} {
set f [open $filename]
set result [split [read $f] "\n"]
close $f
return $result
}
proc variedLines {filename1 filename2} {
set l1 [getLines $filename1]
set l2 [getLines $filename2]
lassign [struct::list longestCommonSubsequence $l1 $l2] common1
set result {}
for {set i 0} {$i < [llength $l1]} {incr i} {
if {$i ni $common1} {
lappend result [expr {$i + 1}]
}
}
return $result
}
If you want the results written to a file, puts $f [join $someList "\n"] is likely to feature, but I'll leave that as an exercise…
Regarding "child process exited abnormally", from the exec man page (emphasis mine):
If any of the commands in the pipeline exit abnormally or are killed or suspended, then exec will return an error and the error message will include the pipeline's output followed by error messages describing the abnormal terminations; the -errorcode return option will contain additional information about the last abnormal termination encountered. If any of the commands writes to its standard error file and that standard error is not redirected and -ignorestderr is not specified, then exec will return an error; the error message will include the pipeline's standard output, followed by messages about abnormal terminations (if any), followed by the standard error output.
"commands exit abnormally" means that the command exits with a non-zero status. Some common commands like grep and diff return a non-zero exit status to indicate something normal, so you have to wrap that exec call in a catch
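set rc [catch {exec bash -c {
    set -o pipefail   # let the exit status of diff survive the pipe into awk
    diff --old-line-format '%L' --new-line-format '' --unchanged-line-format '' <(nl File1) <(nl File2) | awk '{print $1}' > difflines
}} output]
# For exec, catch yields 0 (ok) or 1 (error); the exit status of the child is in errorCode.
if {$rc == 0} {
    puts "no differences found"
} elseif {[lindex $::errorCode 0] eq "CHILDSTATUS" && [lindex $::errorCode 2] == 1} {
    puts "differences found, line numbers written to difflines"
} else {
    puts "diff returned an error: $output"
}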
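The exit statuses that the branches above distinguish can be observed from a plain shell. This is a minimal sketch with made-up scratch files: diff returns 0 for identical files, 1 when they differ, and a value greater than 1 when something went wrong.

```shell
#!/bin/sh
# Hypothetical scratch files for the three cases.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'a\nc\n' > "$dir/two"

# "cmd && s=0 || s=$?" records the status without tripping "set -e".
diff "$dir/one" "$dir/one" > /dev/null 2>&1 && same_status=0 || same_status=$?
diff "$dir/one" "$dir/two" > /dev/null 2>&1 && differ_status=0 || differ_status=$?
diff "$dir/one" "$dir/missing" > /dev/null 2>&1 && missing_status=0 || missing_status=$?

rm -rf "$dir"
printf 'same=%s differ=%s missing=%s\n' "$same_status" "$differ_status" "$missing_status"
```

A status of 1 is therefore an ordinary result for diff, which is why exec reports it as an abnormal exit even though nothing failed.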