How to delete multiple empty lines with SED? - sed

I'm trying to compress a text document by deleting duplicated empty lines with sed. This is what I'm doing (to no avail):
sed -i -E 's/\n{3,}/\n/g' file.txt
I understand that it's not correct, according to this manual, but I can't figure out how to do it correctly. Thanks.

I think you want to replace spans of multiple blank lines with a single blank line, even though your example replaces multiple runs of \n with a single \n instead of \n\n. With that in mind, here are two solutions:
sed '/^$/{ :l
N; s/^\n$//; t l
p; d; }' input
In many implementations of sed, that can be all on one line, with the embedded newlines replaced by ;.
awk 't || !/^$/; { t = !/^$/ }'
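A quick way to check the awk version is to run it on a small sample. This is only a sketch: the function name squeeze_blank is made up for illustration, and the input is three consecutive empty lines between two words.

```shell
# squeeze_blank is a hypothetical helper name, not part of the answer above.
# t remembers whether the previous line was non-empty; an empty line is
# printed only when t is true, so each run of empty lines shrinks to one.
squeeze_blank() {
    awk 't || !/^$/; { t = !/^$/ }' "$@"
}

printf 'a\n\n\n\nb\n' | squeeze_blank
```

Because t starts out false, empty lines at the very top of the input are dropped entirely.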

As tripleee suggested above, I'm using Perl instead of sed:
perl -0777pi -e 's/\n{3,}/\n\n/g'

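Use tr (translate):
tr -s '\n'
The -s (--squeeze-repeats) option reduces each sequence of a repeated character to a single instance. Note that this collapses every run of newlines into one, so it removes empty lines entirely rather than leaving a single empty line.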
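To make the effect concrete, here is a small sketch that runs tr -s and cat -s on the same made-up input. The sample text is an assumption for illustration only.

```shell
# tr -s squeezes every run of newlines to a single newline,
# so no empty line survives.
printf 'a\n\n\n\nb\n' | tr -s '\n'

# cat -s (available in GNU and BSD cat, not required by POSIX)
# squeezes each run of empty lines down to one empty line instead.
printf 'a\n\n\n\nb\n' | cat -s
```

Which of the two is wanted depends on whether a single separating empty line should remain.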

This is much better handled by tr -s '\n' or cat -s, but if you insist on sed, here's an example from section 4.17 of the GNU sed manual:
#!/usr/bin/sed -f
# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
N
bx
}
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/

I am not sure this is what the OP wanted but using the awk solution by William Pursell here is the approach if you want to delete ALL empty lines in the file:
awk '!/^$/' file.txt
Explanation:
The awk pattern
'!/^$/'
tests whether the current line is not empty. The regex /^$/ matches a line that consists only of the beginning of a line (symbolised by '^') immediately followed by the end of a line (symbolised by '$'), in other words an empty line, and the leading '!' negates that match.
If this pattern is true and no action is given, awk applies its default action and prints the current line.
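If lines that contain only spaces or tabs should also count as empty, a common variant tests NF instead of the regex. A minimal sketch, with drop_blank as a made-up function name:

```shell
# drop_blank is a hypothetical helper name.
# NF is the number of fields; it is 0 for empty lines and for lines that
# contain only spaces or tabs, so the bare pattern NF prints everything else.
drop_blank() {
    awk 'NF' "$@"
}

printf 'a\n\n \t\nb\n' | drop_blank
```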
HTH

I think the OP wants to compress empty lines, e.g. where there are 9 consecutive empty lines, he wants to have just three.
I have written a little bash script that does just that:
#!/bin/bash
TOTALLINES="$(cat file.txt|wc -l)"
CURRENTLINE=1
# start from an empty temp file so stale content cannot leak into the result
: > temp.txt
while [ $CURRENTLINE -le $TOTALLINES ]
do
    L1=$CURRENTLINE
    L2=$(($L1 + 1))
    L3=$(($L1 + 2))
    # a line counts as empty if it is "" or a single space
    if [[ $(cat file.txt|head -n $L1|tail -n +$L1) == "" ]]||[[ $(cat file.txt|head -n $L1|tail -n +$L1) == " " ]]
    then
        L1EMPTY=true
    else
        L1EMPTY=false
    fi
    if [[ $(cat file.txt|head -n $L2|tail -n +$L2) == "" ]]||[[ $(cat file.txt|head -n $L2|tail -n +$L2) == " " ]]
    then
        L2EMPTY=true
    else
        L2EMPTY=false
    fi
    if [[ $(cat file.txt|head -n $L3|tail -n +$L3) == "" ]]||[[ $(cat file.txt|head -n $L3|tail -n +$L3) == " " ]]
    then
        L3EMPTY=true
    else
        L3EMPTY=false
    fi
    if [ $L1EMPTY = true ]&&[ $L2EMPTY = true ]&&[ $L3EMPTY = true ]
    then
        # do not write this line to the temp file
        echo "Skipping line "$CURRENTLINE
    else
        echo "$(cat file.txt|head -n $CURRENTLINE|tail -n +$CURRENTLINE)">>temp.txt
        echo "Writing line " $CURRENTLINE
    fi
    ((CURRENTLINE++))
done
cat temp.txt>file.txt
rm temp.txt
FINALTOTALLINES="$(cat file.txt|wc -l)"
# compare against the original line count; CURRENTLINE is one past it by now
EMPTYLINELINT=$(( $TOTALLINES - $FINALTOTALLINES ))
echo "Deleted " $EMPTYLINELINT " empty lines."
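The same idea (limit how many blank lines may appear in a row) can also be done in a single pass with awk. This is a sketch rather than a drop-in replacement for the script above: it reads standard input and writes standard output, the cap is passed as a parameter, and the name cap_blank_runs is made up.

```shell
# cap_blank_runs MAX: keep at most MAX consecutive blank lines.
# Hypothetical helper; lines containing only whitespace count as blank (NF == 0).
cap_blank_runs() {
    awk -v max="$1" '
        NF == 0 { if (++run <= max) print; next }  # blank: print while under the cap
        { run = 0; print }                         # non-blank: reset the counter
    '
}

printf 'a\n\n\n\n\nb\n' | cap_blank_runs 2
```

With a cap of 2, the four empty lines in the sample shrink to two. Reading the file once avoids re-running head and tail for every line.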

Related

How to print some free text in addition to SED extract

Well-known SED command to extract a first line and print to another file
sed -n '1 p' /p/raw.txt | cat >> /p/001.txt ;
gives an output in /p/001.txt like
John Doe
But how to modify this command above to add some free text and have, for example, the output like
Name: John Doe
Thanks for any hint to try.
You can do that in a single command (and no sub-shells):
sed 's/^/Name: /;q' /p/raw.txt >> /p/001.txt
This prefixes "Name: " in front of the first line, prints it, then quits so you don't process additional lines. Add a line number before the q to print all lines up to (and including) that number. The output is appended to /p/001.txt just like your original code.
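As a small illustration of putting a line number before q, the sketch below wraps the command in a function. The name prefix_first_n and the sample names are assumptions made up for the example.

```shell
# prefix_first_n N: prefix the first N lines with "Name: " and stop reading.
# Hypothetical helper name; N becomes the line number placed before q.
prefix_first_n() {
    sed "s/^/Name: /;${1}q"
}

printf 'John Doe\nJane Roe\nSomeone Else\n' | prefix_first_n 2
```

Lines after the second one are never read, because q ends the run as soon as line 2 has been printed.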
If you want a range of lines:
sed -n '3,9{s/^/Name: /;p;};9q' /p/raw.txt >> /p/001.txt
This reads from lines 3-9, performs the substitution, prints, then quits after line 9.
If you want specific lines, I recommend awk:
awk 'NR==3 || NR==9 { print "Name: " $0 } NR>=9 { exit }' /p/raw.txt >> /p/001.txt
This has two clauses. One says the number of record (line number) is either 3 or 9, in which case we print the prefix and the line. The other tells us to stop reading the file after the 9th record.
Here are two more commands to show how awk can act on just the first line(s) or a given range:
awk '{ print "Name: " $0 } NR >= 1 { exit }' /p/raw.txt >> /p/001.txt
awk '3 <= NR { print "Name: " $0 } NR >= 9 { exit }' /p/raw.txt >> /p/001.txt
It appears you're continuously building one file from the other. Consider:
tail -Fn0 /p/raw.txt |sed 's/^/Name: /' >> /p/001.txt
This will run continuously, adding only new entries (added after the command is run) to /p/001.txt
Perhaps you have lots of duplicates to resolve?
awk 'NR != FNR { $0 = "Name: " $0 } !s[$0]++' \
/p/001.txt /p/raw.txt > /tmp/001.txt && mv /tmp/001.txt /p/001.txt
This folds together the previously saved names with any new names, printing names only once (!s[$0]++ is true when s[$0] is zero (its default state), but after the evaluation, it increments to one, making it false on the second occurrence. When a bare clause has no action, the line is printed.) Because we're reading the output file, we need a temporary output. Upon its successful completion, we then move it atop the target output file.
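The !s[$0]++ idiom is easy to try in isolation. A minimal sketch, with first_seen as a made-up function name and invented sample lines:

```shell
# first_seen is a hypothetical helper name.
# seen[$0]++ evaluates to 0 (false) the first time a line appears and to a
# positive number afterwards, so !seen[$0]++ is true only on first sight.
first_seen() {
    awk '!seen[$0]++' "$@"
}

printf 'Name: John Doe\nName: Jane Roe\nName: John Doe\n' | first_seen
```

Unlike sort -u, this keeps the lines in their original order.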
printf "Name : %s\n" "$(sed -n '1p;q' /p/raw.txt)" >/p/001.txt
should do it. If sed is not a requirement do
echo -e "Name : $(sed -n '1p;q' /p/raw.txt)" >/p/001.txt
Note
The q command makes sed quit without processing any more commands or input.
The -e option tells echo to interpret backslash escape sequences. It is supported by bash's builtin echo but is not portable across shells.

How to remove YAML frontmatter from markdown files?

I have markdown files that contain YAML frontmatter metadata, like this:
---
title: Something Somethingelse
author: Somebody Sometheson
---
But the YAML is of varying widths. Can I use a POSIX command like sed to remove that frontmatter when it's at the beginning of a file? Something that just removes everything between --- and ---, inclusive, but also ignores the rest of the file, in case there are ---s elsewhere.
I understand your question to mean that you want to remove the first ----enclosed block if it starts at the first line. In that case,
sed '1 { /^---/ { :a N; /\n---/! ba; d} }' filename
This is:
1 {              # in the first line
  /^---/ {       # if it starts with ---
    :a           # jump label for looping
    N            # fetch the next line, append to pattern space
    /\n---/! ba; # if the result does not contain \n--- (that is, if the last
                 # fetched line does not begin with ---), go back to :a
    d            # then delete the whole thing.
  }
}
                 # otherwise drop off the end here and do the default (print
                 # the line)
Depending on how you want to handle lines that begin with ---abc or so, you may have to change the patterns a little (perhaps add $ at the end to only match when the whole line is ---). I'm a bit unclear on your precise requirements there.
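For instance, a variant with both patterns anchored by $ might look like the sketch below. It is written one command per line so that it does not depend on how a particular sed parses labels. The function name strip_front_matter and the sample input are assumptions for illustration.

```shell
# strip_front_matter is a hypothetical helper name. The patterns are anchored
# with $ so that only lines consisting of exactly --- delimit the block.
strip_front_matter() {
    sed '1{
/^---$/{
:a
N
/\n---$/!ba
d
}
}' "$@"
}

printf '%s\n' '---' 'title: x' '---' 'body' '---' 'more' | strip_front_matter
```

Only the block that starts on line 1 is removed; the later --- in the body is printed untouched.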
If you want to remove only the front matter, you could simply run:
sed '1{/^---$/!q;};1,/^---$/d' infile
If the first line doesn't match ---, sed prints that line and quits without reading the rest of the file; otherwise it deletes everything from the 1st line up to (and including) the next line matching --- (i.e. the entire front matter).
If you don't mind using Perl instead of sed, simply print once two instances of "---" have been found:
perl -ne 'if ($i > 1) { print } else { /^---/ && $i++ }' yaml
or a bit shorter if you don't mind abusing ?: for flow control:
perl -ne '$i > 1 ? print : /^---/ && $i++' yaml
Be sure to include -i if you want to replace inline.
Use a bash script: create script.sh, make it executable with chmod +x script.sh, and run it with ./script.sh.
#!/bin/bash
# folder articles contains a lot of markdown files
for f in ./articles/*.md
do
    # filename
    echo "${f##*/}"
    # wrap the value of the frontmatter title attribute in double quotes
    sed -i -r 's/^title: (.*)$/title: "\1"/' "$f"
    #...
done
This AWK-based solution works for files with and without FrontMatter, doing nothing in the latter case.
#!/bin/sh
# Strips YAML FrontMatter from a file (usually Markdown).
# Exit immediately on each error;
# see: https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -e
print_help() {
    echo "Strips YAML FrontMatter from a file (usually Markdown)."
    echo
    echo "Usage:"
    echo " $(basename "$0") -h"
    echo " $(basename "$0") --help"
    echo " $(basename "$0") -i <file-with-front-matter>"
    echo " $(basename "$0") --in-place <file-with-front-matter>"
    echo " $(basename "$0") <file-with-front-matter> <file-to-be-without-front-matter>"
}
replace=false
in_file="-"
out_file="/dev/stdout"
if [ -n "$1" ]
then
    if [ "$1" = "-h" ] || [ "$1" = "--help" ]
    then
        print_help
        exit 0
    elif [ "$1" = "-i" ] || [ "$1" = "--in-place" ]
    then
        replace=true
        in_file="$2"
        out_file="$in_file"
    else
        in_file="$1"
        if [ -n "$2" ]
        then
            out_file="$2"
        fi
    fi
fi
tmp_out_file="$out_file"
if $replace
then
    tmp_out_file="${in_file}_tmp"
fi
# Plain "awk" (without the gawk-only -e option) keeps this portable.
awk '
BEGIN {
    is_first_line=1;
    in_fm=0;
}
/^---$/ {
    if (is_first_line) {
        in_fm=1;
    }
}
{
    if (! in_fm) {
        print $0;
    }
}
# The dots are escaped so that only a literal "..." line ends the block.
/^(---|\.\.\.)$/ {
    if (! is_first_line) {
        in_fm=0;
    }
    is_first_line=0;
}
' "$in_file" >> "$tmp_out_file"
if $replace
then
    mv "$tmp_out_file" "$out_file"
fi
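Stripped of option handling, the core of that approach fits in a few awk rules. The following is a sketch under the same assumptions (front matter opens with --- on line 1 and closes with --- or ...); the function name strip_fm_awk is made up.

```shell
# strip_fm_awk is a hypothetical helper name.
strip_fm_awk() {
    awk '
        NR == 1 && /^---$/        { in_fm = 1; next }  # opening delimiter on line 1
        in_fm && /^(---|\.\.\.)$/ { in_fm = 0; next }  # closing --- or ...
        !in_fm                                         # print everything outside the block
    ' "$@"
}

printf '%s\n' '---' 'title: x' '...' 'body' '---' 'more' | strip_fm_awk
```

Files that do not start with --- pass through unchanged, since in_fm is never set.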

Make some replacements on a bunch of files depending the number of columns per line

I'm having a problem dealing with some files. I need to perform a column count for every line in a file and, depending on the number of columns, add several ',' at the end of each line. All lines should have 36 columns separated by ','
This line solves my problem, but how do I run it in a folder with several files in a automated way?
awk ' BEGIN { FS = "," } ;
{if (NF == 32) { print $0",,,," } else if (NF==31) { print $0",,,,," }
}' <SOURCE_FILE> > <DESTINATION_FILE>
Thank you for all your support
R&P
The answer depends on your OS, which you haven't told us. On UNIX and assuming you want to modify each original file, it'd be:
for file in *
do
awk '...' "$file" > tmp$$ && mv tmp$$ "$file"
done
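The awk '...' placeholder can be the asker's own program, or a version that pads to a target width without listing each case. A minimal sketch of the latter; the name pad_fields and the sample rows are made up, and lines that already have the target number of fields (or more) are printed unchanged.

```shell
# pad_fields N: append commas until each line has N comma-separated fields.
# Hypothetical helper; assumes no line is completely empty.
pad_fields() {
    awk -v want="$1" 'BEGIN { FS = "," }
        {
            pad = ""
            for (i = NF; i < want; i++) pad = pad ","  # one comma per missing field
            print $0 pad
        }'
}

printf 'a,b,c\na,b\n' | pad_fields 5
```

For the files in the question this would be called as pad_fields 36, reading each file on standard input inside the loop.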
Also, in general to get all records in a file to have the same number of fields you can do this without needing to specify what that number of fields is (though you can if appropriate):
$ cat tst.awk
BEGIN { FS=OFS=","; ARGV[ARGC++] = ARGV[ARGC-1] }
NR==FNR { nf = (NF > nf ? NF : nf); next }
{
    tail = sprintf("%*s",nf-NF,"")
    gsub(/ /,OFS,tail)
    print $0 tail
}
$
$ cat file
a,b,c
a,b
a,b,c,d,e
$
$ awk -f tst.awk file
a,b,c,,
a,b,,,
a,b,c,d,e
$
$ awk -v nf=10 -f tst.awk file
a,b,c,,,,,,,
a,b,,,,,,,,
a,b,c,d,e,,,,,
It's a short one-liner with Perl:
perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' *
if this is only a single folder without subfolders, use:
for oldfile in /path/to/files/*
do
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
done
if you also want to include subdirectories recursively, it's probably easiest to put the awk+redirection into a small shell-script, like this:
#!/bin/bash
oldfile=$1
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
and then run this script (let's call it runawk.sh) via find:
find /path/to/files/ -type f -not -name "*.new" -exec runawk.sh \{\} \;

Any way to find if two adjacent new lines start with certain words?

Say I have a file like so:
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
I only want to find and count the number of times in this file that a line starting with - is immediately followed by a line starting with +.
Rules:
No external scripts
Must be done from within a bash script
Must be inline
I could figure out how to do this in a Python script, for instance, but I've never had to do something this extensive in Bash.
Could anyone help me out? I figure it'll end up being grep, perl, or maybe a talented sed line -- but these are things I'm still learning.
Thank you all!
grep -A1 "^-" $file | grep "^+" | wc -l
The first grep finds all of the lines starting with -, and the -A1 causes it to also output the line after the match too.
We then grep that output for any lines starting with +. Logically:
We know the output of the first grep is only the -XXX lines and the following lines
We know that a +xxx line cannot also be a -xxx line
Therefore, any +xxx lines must be following lines, and should be counted, which we do with wc -l
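The same count can be made in one process by remembering how the previous line started. This is a sketch of an alternative, not the grep pipeline itself; count_minus_plus is a made-up name.

```shell
# count_minus_plus is a hypothetical helper name.
count_minus_plus() {
    awk '
        prev && /^\+/ { c++ }      # previous line began with -, this one with +
        { prev = ($0 ~ /^-/) }     # remember how the current line starts
        END { print c + 0 }        # c + 0 prints 0 rather than an empty line
    ' "$@"
}

printf '%s\n' '+jaklfjdskalfjkdsaj' 'fkldsjafkljdkaljfsd' '-jslakflkdsalfkdls;' '+sdjafkdjsakfjdskal' | count_minus_plus
```

On the four-line sample from the question there is exactly one such pair.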
Easy in Perl:
perl -lne '$c++ if $p and /^\+/; $p = /^-/ }{ print $c' FILE
awk one-liner:
awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
e.g.
kent$ cat file
+jaklfjdskalfjkdsaj
fkldsjafkljdkaljfsd
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
-
-
-
+
-
+
foo
+
kent$ awk -v FS='' '{x=x sprintf("%s", $1)}END{print gsub(/-\+/,"",x)}' file
3
Another Perl example. Not as terse as choroba's, but more transparent in how it works:
perl -e'while (<>) { $last = $cur; $cur = $_; print $last, $cur if substr($last, 0, 1) eq "-" && substr($cur, 0, 1) eq "+" }' < infile
Output:
-jslakflkdsalfkdls;
+sdjafkdjsakfjdskal
Pure bash:
unset c p
while read line ; do
    [[ $line == +* && $p == 0 ]] && (( c++ ))
    [[ $line == -* ]]
    p=$?
done < FILE
echo $c

Using sed to delete a case insensitive matched line

How do I match a case-insensitive regex and delete the matching line at the same time?
I read that to get case insensitive matches, use the flag "i"
sed -e "/pattern/replace/i" filepath
and to delete use d
sed -e "/pattern/d" filepath
I've also read that I could combine multiple flags like 2iw
I'd like to know if sed could combine both i and d
I've tried the following but it didn't work
sed -e "/pattern/replace/id" filepath > newfilepath
For case-insensitive matching, use /I instead of /i. (The I modifier on an address is a GNU sed extension.)
sed -e "/pattern/Id" filepath
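A short sketch on made-up input shows the effect, together with a grep-based alternative for systems whose sed lacks the I modifier. The function name delete_ci is an assumption for illustration.

```shell
# GNU sed: the I modifier makes the address match case-insensitively.
printf 'Foo\nbar\nFOO bar\nbaz\n' | sed '/foo/Id'

# Alternative without sed (hypothetical helper name):
# grep -v prints the lines that do not match, -i ignores case.
delete_ci() {
    grep -iv -- "$1"
}

printf 'Foo\nbar\nFOO bar\nbaz\n' | delete_ci foo
```

Both commands print only the lines that do not contain foo in any letter case.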
you can use (g)awk as well.
# print case insensitive
awk 'BEGIN{IGNORECASE=1}/pattern/{print}' file
# replace with case insensitive
awk 'BEGIN{IGNORECASE=1}/pattern/{gsub(/pattern/,"replacement")}1' file
OR just with the shell(bash)
#!/bin/bash
shopt -s nocasematch
while read -r line
do
    case "$line" in
        *pattern* ) echo "$line" ;;
    esac
done <"file"
I produced this one-liner because Ansible cannot handle different LVs with the same name. It converts near-CSV output into JSON. You may want to set the -F flag to change the field separator.
lvs | perl -ane '
  local %tmp,$i=0;
  while($i<@f){
    $tmp{$f[$i]}=$F[$i] if $F[$i];
    $i++
  };
  if(@f){push @ans,\%tmp}
  else{ @f=@F };
  END { print to_json(\@ans,{pretty=>1}) }
' -MJSON