Add any number of whitespaces to file - sed

I have a plain text file:
line1_text
line2_text
I need to add a number of whitespaces between the two lines.
Adding 10 whitespaces is easy.
But say I need to add 10000 whitespaces, how would I achieve that using sed?
P.S. This is for experimental purposes

There undoubtedly is a sed method to do this but, since sed does not have any natural understanding of arithmetic, it is not a natural choice for this problem. By contrast, awk understands arithmetic and can readily, for example, print an empty line n times for any integer value of n.
As an example, consider this input file:
$ cat infile
line1_text
line2_text
This code will add as many blank lines as you like before any line that contains the string line2_text:
$ awk -v n=5 '/line2_text/{for (i=1;i<=n;i++)print""} 1' infile
line1_text





line2_text
If you want 10,000 blank lines instead of 5, then replace n=5 with n=10000.
How it works
-v n=5
This defines an awk variable n with value 5.
/line2_text/{for (i=1;i<=n;i++)print""}
Every time a line matches the regex line2_text, a for loop runs which prints an empty line n times.
1
This is awk's shorthand for print-the-line and it causes every line from input to be printed to the output.
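A quick way to confirm the count for a large n is to count the empty lines in the output. This is only a minimal sketch: the scratch file created with mktemp and the variable names tmp and blank_lines are hypothetical and not part of the original answer.

```shell
# Recreate the two-line input in a scratch file (hypothetical, for this check only).
tmp=$(mktemp)
printf 'line1_text\nline2_text\n' > "$tmp"

# Insert 10000 empty lines before line2_text, then count the empty lines.
blank_lines=$(awk -v n=10000 '/line2_text/{for (i=1;i<=n;i++) print ""} 1' "$tmp" | grep -c '^$')
echo "$blank_lines"

rm -f "$tmp"
```

If the loop bound is right, the number echoed should equal the value passed with -v n=.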

This might work for you (GNU sed):
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}' file
This appends the hold space to the first line. The hold space is manipulated to hold the required number of spaces by a looping mechanism based on powers of 2. This may produce more than necessary and the remainder are chopped off using a linefeed as a delimiter.
To change spaces to newlines, use:
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /\n&/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;s/ /\n/g;x;G}' file
In essence the same can be achieved using this (however it is very slow for large numbers):
sed -r '/line1_text/{x;:a;/ {20}/bb;s/^/ /;ta;:b;x;G}' file
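One way to check the first command above is to measure the length of the line it inserts. The sketch below assumes GNU sed (for -r and for \n inside a bracket expression); the scratch file and the variable names script, space_count and total_lines are hypothetical.

```shell
# Scratch copy of the two-line input (hypothetical, for this check only).
tmp=$(mktemp)
printf 'line1_text\nline2_text\n' > "$tmp"

# The doubling script from the answer, kept in a variable so it can be reused.
script='/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}'

# Line 2 of the output should consist of exactly 10000 spaces.
space_count=$(sed -r "$script" "$tmp" | awk 'NR==2{print length($0)}')
# The output should have three lines: line1_text, the spaces, line2_text.
total_lines=$(sed -r "$script" "$tmp" | awk 'END{print NR}')
echo "$space_count $total_lines"

rm -f "$tmp"
```

The doubling loop overshoots to 16384 spaces before the numbered substitution marks the 10000th one, which is why the trailing s/\n.*// is needed.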

Related

how to type the beginning or end in sed multiple-lines mode?

As we all know, \` and \' indicate the beginning and end of the buffer, respectively, in multi-line mode. But on an ASCII (English-input) keyboard only ' seems to exist.
How do I type the beginning anchor?
This might work for you (GNU sed):
seq 3 | sed -n 'p;H;1h;$!d;g;l0
s/^.*$/ALL/mgp
s/\`.*$/START/mp
s/^.*\'\''/END/mp'
1
2
3
1\n2\n3$
ALL
ALL
ALL
START
ALL
ALL
START
ALL
END
The seq command generates three consecutive integers, one per line.
The sed command uses the -n option to turn off implicit printing, prints each integer as it is read, and collects all three in the hold space. On the last line it copies the hold space back into the pattern space (g) and displays it unambiguously (l0).
The first substitution replaces every line with the literal ALL.
The second substitution replaces the first line with START.
The third substitution replaces the last line with END.
N.B. Note the use of the m (multiline), g (global) and p (print) substitution flags. Lastly, if the -z option is in use, these zero-width anchors work with respect to null characters, not newlines.
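If quoting \' inside a single-quoted shell string is the awkward part, one option is to keep the sed script in a file so the shell never sees the quote characters. This is a sketch assuming GNU sed; the scratch script file and the variable names script and result are hypothetical. It uses the M flag, for which GNU sed also accepts the lowercase m shown above.

```shell
# Write the sed script through a quoted heredoc, so the backquote and the
# single quote reach sed untouched.
script=$(mktemp)
cat > "$script" <<'EOF'
H;1h;$!d;g
s/\`.*$/START/M
s/^.*\'/END/M
p
EOF

# Collect the three lines, then rewrite only the first and the last one.
result=$(printf '1\n2\n3\n' | sed -n -f "$script")
printf '%s\n' "$result"

rm -f "$script"
```

Here \` anchors the second substitution to the start of the whole buffer and \' anchors the third to its end, so the middle line is left alone.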

Matching patterns across lines

Suppose I have a file which contains:
something
line=1
file=2
other
lines
ignore
something
line=2
file=3
other
lines
ignore
Eventually, I want a unique list of the line and file combinations in each section. In the first stage I am trying to get sed to output just those lines combined into one line, like
line=1file=2
line=2file=3
Then I can use sort and uniq.
So I am trying
sed -n -r 's/(line=)(.*?)(\r)(file=)(.*?)(\r)/\1\2\4\5/p' sample.txt
(It isn't necessarily just a number after each)
But it won't match across the lines. I have tried \n and \r\n but it doesn't seem to be the style of new line, since:
sed -n -r 's/(line=)(.*?)(\r)/\1\2/p' sample.txt
Will output the "line=" lines, but I just can't get it to span the new line, and collect the second line as well.
By default, sed operates only on chunks separated by the \n character, so a single pattern can never match across lines. Some sed implementations support a -z option, which makes sed operate on chunks separated by the ASCII NUL character instead of the newline character (this can work for small files, assuming NUL characters won't interfere with the pattern you want to match).
There are also some sed commands that can be used for multiline processing:
sed -n '/line=/{N;s/\n//p}'
The N command appends the next line to the chunk currently being processed (which has to match line= in this case).
s/\n//p then deletes the newline character and prints the result, so that you get the output as a single line.
If your input has DOS-style line endings, first convert it to Unix style (see Why does my tool output overwrite itself and how do I fix it?) or take care of the \r as well:
sed -n '/line=/{N;s/\r\n//p}'
Note that these commands were tested on GNU sed; syntax may vary for other implementations.
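Putting the N approach together with the sort and uniq step from the question might look like the sketch below. The scratch file, the extra repeated third section (added here only so that there is a duplicate to remove) and the variable names sample and pairs are assumptions, not part of the original post. The ; before } is there so that non-GNU seds are more likely to accept the script.

```shell
# Sample input with Unix line endings; the third section repeats the first.
sample=$(mktemp)
cat > "$sample" <<'EOF'
something
line=1
file=2
other
lines
ignore
something
line=2
file=3
other
lines
ignore
something
line=1
file=2
EOF

# Join each line= line with the line after it, then keep unique pairs only.
pairs=$(sed -n '/line=/{N;s/\n//p;}' "$sample" | sort -u)
printf '%s\n' "$pairs"

rm -f "$sample"
```

sort -u is used here in place of sort followed by uniq.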

How to use sed to isolate only the first part of a file

I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use Gnu sed to isolate only the part of the file that's before the unique string - i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at unique string. Other lines get printed.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed includes the range border. We need to exclude unique string from printing.
Furthermore I'm using the -n option which makes sed suppress the output of input lines by default.
One thing, if unique string can contain characters which are also syntax characters in the regex like ...
test*
... sed might not be the right tool for the job any more since it can only match regular expressions but not fixed strings.
In that case awk might be the tool of choice:
awk 'index($0, "unique string"){exit}1' file
index($0, "string") returns a non-zero value (the position) if the string is found in the current line. In that case we stop processing further input lines without printing the matching line.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
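As a sketch of the fixed-string case, the marker below contains a regex metacharacter and is passed in as an awk variable. The scratch file, the marker value test* and the variable names marker and fixed_part are hypothetical. Note that awk processes backslash escapes in -v assignments, so a marker containing backslashes would need extra care.

```shell
# Sample file whose marker line contains a regex metacharacter.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
line 1 of fixed portion
line 2 of fixed portion
test*
line 1 of appended portion
EOF

# index() compares literally, so the * in the marker is not a quantifier.
fixed_part=$(awk -v marker='test*' 'index($0, marker){exit} 1' "$tmp")
printf '%s\n' "$fixed_part"

rm -f "$tmp"
```

Only the lines before the marker are printed; the marker line itself and everything after it are dropped.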

SED command to remove words at the end of the string

I want to remove last 2 words in the string which is in a file.
I am using this command first to delete the last word. But I couldn't do it. can someone help me
sed 's/\w*$//' <file name>
my strings are like this
Input:
asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)
I want to remove both numerical value and the one in brackets.
Output:
asbc/jahsf/jhdsflk/jsfh/
Using GNU sed:
$ sed -r 's/([[:space:]]+[-+.()[:alnum:]]+){2}$//' file
asbc/jahsf/jhdsflk/jsfh/
How it works
[[:space:]]+ matches one or more spaces.
[-+.()[:alnum:]]+ matches the 'words' which are allowed to contain any number of plus or minus signs, periods, parens, or any alphanumeric characters.
Note that, when a period is inside square brackets, [.], it is just a period, not a wildcard: it does not need to be escaped.
([[:space:]]+[-+.()[:alnum:]]+) matches one or more spaces followed by a word.
([[:space:]]+[-+.()[:alnum:]]+){2}$ matches two words and the spaces which precede them.
Note the use of character classes like [:space:] and [:alnum:]. Unlike the old-fashioned classes like [a-zA-Z0-9], these classes are unicode safe.
OSX (BSD) sed
The above was tested on GNU sed. For BSD sed, try:
sed -E 's/([[:space:]][[:space:]]*[-+.()[:alnum:]][-+.()[:alnum:]]*){2}$//' file
To remove everything that follows a number with decimal places
This looks for a decimal number with optional sign and removes it, the spaces which precede it, and everything which follows it:
$ sed -r 's/[[:space:]]+[-+]?[[:digit:]]+[.][[:digit:]]+[[:space:]].*//' file
asbc/jahsf/jhdsflk/jsfh/
How it works:
[[:space:]]+ matches one or more spaces
[-+]? matches zero or one signs.
[[:digit:]]+ matches one or more digits.
[.] matches a decimal point (period).
[[:digit:]]+ matches one or more digits following the decimal point.
[[:space:]] matches a space following the number.
.* matches anything which follows.
It looks like there is a tab between what you want to keep and what you want to get rid of. I don't have Linux in front of me, but try this:
sed 's/\t.*//'
This assumes your strings are always formatted similarly, which is what I take from your comment.
This might work for you (GNU sed):
sed -r 's/\s+\S+\s+\S+\s*$//' file
or if you prefer:
sed -r 's/(\s+\S+){2}\s*$//' file
This matches and removes: one or more whitespaces followed by one or more non-whitespaces twice followed by zero or more whitespaces at the end of the line.
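The same idea can be spelled with POSIX character classes instead of \s and \S, which may help on seds that lack those GNU shorthands. This is a sketch assuming a sed that accepts -E; the variable names line and trimmed are hypothetical.

```shell
# The sample string from the question.
line='asbc/jahsf/jhdsflk/jsfh/ -0.001 (exam)'

# Remove the last two whitespace-delimited words plus any trailing whitespace.
trimmed=$(printf '%s\n' "$line" | sed -E 's/([[:space:]]+[^[:space:]]+){2}[[:space:]]*$//')
printf '%s\n' "$trimmed"
```

Because a word here is any run of non-whitespace, the sign, the decimal point and the parentheses need no special handling.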

Alternatives to grep/sed that treat new lines as just another character

Both grep and sed handle input line by line and, as far as I know, getting either of them to handle multiple lines isn't very straightforward. What I'm looking for is an alternative or alternatives to these two programs that treat newlines as just another character. Is there any tool that fits these criteria?
The tool you want is awk. It is record-oriented, not line-oriented, and you can specify your record-separator by setting the builtin variable RS. In particular, GNU awk lets you set RS to any regular expression, not just a single character.
Here is an example where awk uses blank lines to separate records. If you show us what data you have, we can help you with it.
cat file
first line
second line
third line

fourth line
fifth line
sixth line

seventh line
eight line

more data
Running awk on this reconstructs the data, using each blank line as the record separator:
awk -v RS= '{$1=$1}1' file
first line second line third line
fourth line fifth line sixth line
seventh line eight line
more data
PS: RS= here does not set RS to file. It sets RS to the empty string, the same as RS="".
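To see what paragraph mode does, it can help to count the records awk finds and to print one of them rejoined. This is a minimal sketch; the scratch file and the variable names record_count and first_record are hypothetical.

```shell
# Blank-line-separated sample data in a scratch file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
first line
second line
third line

fourth line
fifth line
sixth line

seventh line
eight line

more data
EOF

# With RS set to the empty string, each blank-line-separated block is one record.
record_count=$(awk -v RS= 'END{print NR}' "$tmp")
# $1=$1 rebuilds the record with single spaces between fields.
first_record=$(awk -v RS= 'NR==1{$1=$1; print}' "$tmp")
echo "$record_count"
echo "$first_record"

rm -f "$tmp"
```

In this mode a newline always acts as a field separator, which is why the rebuilt record ends up on one line.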
1) sed can handle a block of lines together, not only line by line.
In sed, I normally use :loop; $!{N; b loop}; to gather all the lines into the pattern space, delimited by newlines.
Sample:
Productivity
Google Search\
Tips
"Web Based Time Tracking,
Web Based Todo list and
Reduce Key Stores etc"
Result (removing the content between the double quotes):
sed -e ':loop; $!{N; b loop}; s/\"[^\"]*\"//g' thegeekstuff.txt
Productivity
Google Search\
Tips
You should read this tutorial (Unix Sed Tutorial: 6 Examples for Sed Branching Operation); it explains in detail how this works.
http://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
2) For grep, check whether your grep supports the -z option, which means the input need not be handled line by line.
-z, --null-data
Treat the input as a set of lines, each terminated by a zero
byte (the ASCII NUL character) instead of a newline. Like the
-Z or --null option, this option can be used with commands like
sort -z to process arbitrary file names.
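For the sed loop in point 1, a small self-contained check might look like the following sketch. The script is split across several -e options so that the label ends at the end of its own fragment, which should make it less dependent on GNU-specific parsing. The sample text and the variable names input and result are hypothetical.

```shell
# Two-line input with a quoted span that crosses the newline.
input='keep "drop this
and this" keep'

# Gather every line into the pattern space, then delete the quoted span.
result=$(printf '%s\n' "$input" | sed -e ':loop' -e '$!{N;b loop' -e '}' -e 's/"[^"]*"//g')
printf '%s\n' "$result"
```

Because the whole input is in the pattern space when the substitution runs, [^"]* can match across the embedded newline.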