I am trying to write a shell script to edit an input file. Data is structured in the input file as follows:
1000␣␣␣␣␣␣␣␣␣␣␣␣: final time
1000            : print time
0.1             : time step
The labels are aligned using spaces, shown as ␣ in the first line.
I am currently using sed to replace the parameters (the first "word" of each line).
I couldn't find a way to do it without messing up the alignment of the labels. I'm open to any suggestions; I don't particularly need to achieve this with sed. It is also possible to change the structure of the input file, for example by using tabs.
Here's an example of what I would like the script to do:
input file
----------
1000␣␣␣␣␣␣␣␣␣␣␣␣: final time
1000            : print time
0.1             : time step
running the script
------------------
$ script --final-time=100
input file after running the script
-----------------------------------
100␣␣␣␣␣␣␣␣␣␣␣␣␣: final time
1000            : print time
0.1             : time step
The length of the replacement string is not known in advance. It's not fixed and can be up to 6 characters.
With GNU awk:
awk -v value="100" 'BEGIN{FS=OFS=" : "} $2=="final time" {$1=sprintf("%-15s",value)}1' file
Output:
100             : final time
1000            : print time
0.1             : time step
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
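Building on the awk one-liner, the question's `script --final-time=100` could be sketched roughly as below. This is a minimal sketch under assumptions that are not in the original: the file name comes from a hypothetical `INPUT_FILE` variable (default `input.txt`), the option names `--print-time` and `--time-step` are made up to match the other labels, and values are padded to the same 15-character column as in the one-liner.

```shell
#!/bin/sh
# Sketch only: INPUT_FILE, --print-time and --time-step are hypothetical names.
# Assumes " : " separates values from labels and values use a 15-char column.

# Replace the value on the line labelled $2 in file $1 with $3, in place.
set_param() {
    tmp=$(mktemp) || return 1
    awk -v label="$2" -v value="$3" '
        BEGIN { FS = OFS = " : " }
        $2 == label { $1 = sprintf("%-15s", value) }   # pad so the colon stays put
        1                                              # print every line
    ' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Map each --option=value argument onto the matching label.
main() {
    file=${INPUT_FILE:-input.txt}
    for arg in "$@"; do
        case $arg in
            --final-time=*) set_param "$file" "final time" "${arg#*=}" ;;
            --print-time=*) set_param "$file" "print time" "${arg#*=}" ;;
            --time-step=*)  set_param "$file" "time step"  "${arg#*=}" ;;
            *) echo "unknown option: $arg" >&2; return 1 ;;
        esac
    done
}

main "$@"
```

For example, `INPUT_FILE=params.txt ./script --final-time=100` would rewrite only the `final time` line; lines whose label does not match are printed untouched, so their spacing is preserved exactly.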
This might work for you (GNU sed):
sed -E '/final time/{s/.*/\n&\n100/;:a;s/\n[^:](.*\n)(.)/\2\n\1/;s/\n[^:](.*\n)$/ \n\1/;ta;s/\n//g}' file
In overview: the first field is replaced by the replacement value one character at a time, and if the replacement string is shorter, the rest of the original value is overwritten with spaces.
If the line contains the required match, prepend a newline to the pattern space, and append a newline followed by the replacement string.
Within a loop: if the character following the first newline is not a colon (the character that separates the first field from the second), replace it with the first character following the second newline, and move the first newline past the replacement character. If no characters remain after the second newline and the character following the first newline is still not a colon, replace it with a space and again move the first newline along. Otherwise, the replacement is complete, so remove all the introduced newlines.
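To reuse the GNU sed approach for any label and value, the command can be wrapped in a function. The name `set_param_sed` is hypothetical, and the sketch assumes the label and value contain no characters that are special in this sed script (`/`, `\`, `&`, `:` or newlines), because both are interpolated straight into it.

```shell
# Hypothetical helper around the GNU sed answer: overwrite the value on the
# line matching label $1 with $2, one character at a time, keeping the colon
# column fixed. Assumes GNU sed and "safe" arguments (no / \ & : or newlines).
set_param_sed() {
    label=$1 value=$2 file=$3
    # \n and \1/\2 survive the double quotes unchanged; \$ becomes a literal $.
    sed -E "/$label/{s/.*/\n&\n$value/;:a;s/\n[^:](.*\n)(.)/\2\n\1/;s/\n[^:](.*\n)\$/ \n\1/;ta;s/\n//g}" "$file"
}
```

The result goes to standard output; with GNU sed, adding `-i` would edit the file in place instead.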
Take the string "hello_world 1 2 3"
I want the output to be "hello_world"
My attempt is "s/\(.*\) .*/\1/g"
But I get "hello_world 1 2"
Instead of stopping at the first space after the sequence, it matches up to the last space on the line.
I want to take any number of characters \(.*\) followed by a space ' ' and remove everything that comes after it .*
How can I do it?
Could you please try the following:
echo "hello_world 1 2 3" | sed 's/\([^ ]*\).*/\1/'
Explanation:
This uses sed's ability to store the text matched by a \( ... \) group in a buffer (a capture group), which can later be referenced as \1, \2 and so on (depending on how many groups the regex contains).
Here we capture everything up to the first space into the first group with \([^ ]*\), and match the rest of the line with .*. In the replacement, \1 substitutes the whole line with the contents of the first group (which is hello_world).
Why the OP's code doesn't work: .* is greedy, so \(.*\) captures as much as it can while still leaving a space for the rest of the regex, i.e. everything up to the last space. That is why \1 yields hello_world 1 2 instead of hello_world.
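A quick way to see the difference is to run both expressions on the same string (a small demonstration, not part of the original answer):

```shell
s="hello_world 1 2 3"
# Greedy: \(.*\) grabs everything up to the LAST space.
greedy=$(printf '%s\n' "$s" | sed 's/\(.*\) .*/\1/')
# Negated class: \([^ ]*\) cannot cross a space, so it stops at the FIRST one.
first=$(printf '%s\n' "$s" | sed 's/\([^ ]*\).*/\1/')
printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
# greedy: hello_world 1 2
# first:  hello_world
```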
This might work for you (GNU sed):
sed 's/\s.*//' file
This matches the first whitespace character and everything after it and removes them, leaving whatever precedes it, i.e. the leading non-whitespace characters.
Same as:
sed -E 's/^(\S+).*/\1/' file
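\s and \S are GNU extensions. If the script also has to run with a sed that may lack them (BSD or macOS sed, for instance), a sketch using the POSIX [[:space:]] class should behave like the first command; `first_word` is just an illustrative name:

```shell
# Delete from the first whitespace character (space, tab, ...) to end of line,
# using only a POSIX bracket expression instead of GNU's \s.
first_word() {
    sed 's/[[:space:]].*//'
}

echo "hello_world 1 2 3" | first_word    # hello_world
```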
I have a plain text file:
line1_text
line2_text
I need to add a number of whitespaces between the two lines.
Adding 10 whitespaces is easy.
But say I need to add 10000 whitespaces, how would I achieve that using sed?
P.S. This is for experimental purposes
There undoubtedly is a sed method to do this but, since sed does not have any natural understanding of arithmetic, it is not a natural choice for this problem. By contrast, awk understands arithmetic and can readily, for example, print an empty line n times for any integer value of n.
As an example, consider this input file:
$ cat infile
line1_text
line2_text
This code will add as many blank lines as you like before any line that contains the string line2_text:
$ awk -v n=5 '/line2_text/{for (i=1;i<=n;i++)print""} 1' infile
line1_text





line2_text
If you want 10,000 blank lines instead of 5, then replace n=5 with n=10000.
How it works
-v n=5
This defines an awk variable n with value 5.
/line2_text/{for (i=1;i<=n;i++)print""}
Every time a line matches the regex line2_text, a for loop prints an empty line n times.
1
This is awk's shorthand for print-the-line and it causes every line from input to be printed to the output.
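The one-liner can be wrapped in a function so that both the count and the pattern are parameters. `add_blank_lines` is a hypothetical name; note that passing the pattern with -v turns it into a dynamic regex, and awk processes backslash escapes in -v values, so the pattern is best kept simple.

```shell
# Insert $1 empty lines before every line of file $3 that matches regex $2.
add_blank_lines() {
    awk -v n="$1" -v re="$2" '
        $0 ~ re { for (i = 1; i <= n; i++) print "" }   # the blank lines
        1                                              # then the line itself
    ' "$3"
}
```

For example, `add_blank_lines 10000 line2_text infile` produces the 10,000-line version described above.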
This might work for you (GNU sed):
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}' file
This appends the hold space to the line1_text line. The hold space is built up to the required number of spaces by a loop that doubles its contents on each pass (powers of 2). This may produce more spaces than necessary; the surplus is chopped off using a newline as a delimiter.
To insert newlines (i.e. blank lines) instead of spaces, use:
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /\n&/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;s/ /\n/g;x;G}' file
In essence, the same can be achieved with the following, shown here for 20 spaces (however, it adds only one space per iteration, so it is very slow for large numbers):
sed -r '/line1_text/{x;:a;/ {20}/bb;s/^/ /;ta;:b;x;G}' file
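The powers-of-2 trick in the first command is what keeps it fast compared with adding one space at a time. Below is a pure-shell sketch of the same idea (a swapped-in illustration, not the answer's sed command): double a string of spaces until it is at least n characters long, then keep only the first n. `make_spaces` is a hypothetical name.

```shell
# Build a string of exactly n spaces by repeated doubling, then trim the surplus.
make_spaces() {
    n=$1
    s=' '
    while [ "${#s}" -lt "$n" ]; do
        s="$s$s"              # 1, 2, 4, 8, ... spaces: about log2(n) passes
    done
    printf "%.${n}s" "$s"     # precision keeps only the first n characters
}

printf '%s\n' line1_text "$(make_spaces 10000)" line2_text
```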
I'm trying to extract the name of a file generated by a Java program. The Java program prints multiple lines, and I know exactly what the format of the file name is going to be. The text the Java program prints is as follows:
ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;
I'm capturing the output in a variable and then applying a sed command in the following format:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p'
The result set is:
YANNANI-0008876_17.xml.
However, my problem is that I want the extraction of the file name to stop at .xml. The final dot should never be extracted.
Is there a way to do this using sed?
Let's look at what your capture group actually captures:
$ grep 'YANNANI.\([[:digit:]]\).\([xml]\)*' infile
Generated file YANNANI-0008876_17.xml.
Only YANNANI-00 actually matches (grep highlights it when colored output is on), which is probably not what you intended:
\([[:digit:]]\) captures just a single digit (and the capture group around it doesn't do anything)
\([xml]\)* is "any of x, m or l, 0 or more times", so it matches the empty string (as above – or the line wouldn't match at all!), x, xx, lll, mxxxxxmmmmlxlxmxlmxlm, xml, ...
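The final period can never be removed, because nothing after the capture groups is matched: the substitution only replaces Generated file YANNANI-00 with YANNANI-00, and the rest of the line, including the period, is left as it was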
What would make sense instead:
Match "digits or underscores, 0 or more": [[:digit:]_]*
Match .xml, literally (escape the period): \.xml
Make sure the rest of the line (just the period, in this case) is matched by adding .* after the capture group
So the regex for the string you'd like to extract becomes
$ grep 'YANNANI.[[:digit:]_]*\.xml' infile
Generated file YANNANI-0008876_17.xml.
and to remove everything else on the line using sed, we surround the regex with .*\( ... \).*:
$ sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p' infile
YANNANI-0008876_17.xml
This assumes you really meant . after YANNANI (any character).
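Since the question captures the Java program's output in a variable, the corrected sed can be applied to that variable directly. In this sketch the variable names `output` and `file_name` are hypothetical, and the sample text is the output quoted in the question:

```shell
# Sample output from the question, as it might be captured in a variable.
output='ABCASJASLEKJASDFALDSF
Generated file YANNANI-0008876_17.xml.
TDSFALSFJLSDJF;'

# Quoting "$output" keeps the newlines, so sed sees three separate lines.
file_name=$(printf '%s\n' "$output" | sed -n 's/.*\(YANNANI.[[:digit:]_]*\.xml\).*/\1/p')
echo "$file_name"    # YANNANI-0008876_17.xml
```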
You can call sed twice: first in printing and then in replacement mode:
sed -n 's/.*\(YANNANI.\([[:digit:]]\).\([xml]\)*\)/\1/p' | sed 's/\.$//g'
The second sed removes the trailing . from every line printed by the first sed.
Or, if you prefer, you can use an awk solution:
awk '/.*YANNANI.[0-9]+.[0-9]+.xml/{print substr($NF,1,length($NF)-1)}'
This prints the last field (minus its final character, removed with substr) of every line that matches your regex.
I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use GNU sed to isolate only the part of the file that comes before the unique string, i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like to see as output is:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at the line containing unique string (because of -n, that line is not printed); all preceding lines are printed by p.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed ranges include their end line, so we need to exclude the unique string line from printing explicitly.
Furthermore, I'm using the -n option, which suppresses sed's automatic printing of each input line.
One more thing: if unique string can contain characters that are also regex metacharacters, like ...
test*
... sed might not be the right tool for the job anymore, since it can only match regular expressions, not fixed strings.
In that case awk might be the tool of choice:
awk 'index($0, "unique string"){exit} 1' file
index($0, "unique string") returns a non-zero value (the position) if the string is found in the current line. In that case we stop processing input, so that line is not printed either.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
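A sketch of the awk approach as a reusable function. `print_before` is a hypothetical name; passing the marker through ENVIRON rather than -v avoids awk's backslash-escape processing of -v values, so the marker is matched exactly as written, even something like test*:

```shell
# Print the lines of file $2 that come before the first line containing
# the fixed string $1 (no regex interpretation of the marker).
print_before() {
    MARKER=$1 awk 'index($0, ENVIRON["MARKER"]) { exit } 1' "$2"
}
```

For example, `print_before 'unique string' file` prints only the two lines of the fixed portion.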
All, I'm trying to run a sed command to strip out card numbers from certain files. I was trying to do this in a one-liner and I thought all was going well - but I realized that if my first substitute didn't match the pattern it continued into the next commands. Is there a way to get it to exit if there is no match?
We have card numbers of 16 to 22 digits on our system, so I wrote this with a variable length in mind. My specifications were to preserve the first 6 and last 4 digits of any 16+ digit number, and axe (asterisk) out anything in the middle.
sed 'h;s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/;s/./*/g;x;s/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/;G;s/\n//;s/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/'
The problem lies in the fact that if this part of the command:
s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/
finds nothing, the pattern space remains the input. Sed then continues with the next command, which replaces everything with asterisks. What I end up with is the input followed by an equal number of asterisks (if the line does not meet the "card number qualifications" of my first substitute). It works perfectly if the line is what is deemed a possible card number.
Any ideas?
but I realized that if my first substitute didn't match the pattern it continued into the next commands. Is there a way to get it to exit if there is no match?
You can use branch commands. I added and commented them in place:
sed '
h;
s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/;
## If last substitution command succeeds, go to label "a".
t a
## Branch to end of script (previous substitution command didn't succeed): the line is printed unchanged and the next cycle begins.
b
## Label "a".
:a
s/./*/g;
x;
s/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/;
G;
s/\n//;
s/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/
'
UPDATE due to comments.
So you want to transform
texttexttext111111222223333texttexttext
into
texttexttext111111*****3333texttexttext
Try:
echo "texttexttext111111222223333texttexttext" |
sed -e '
## Add newlines characters between the characters to substitute with "*".
s/\([0-9]\{6\}\)\([0-9]\{5\}\)\([0-9]*\)\([0-9]\{4\}\)/\1\n\2\3\n\4/;
## Label "a".
:a;
## Substitute first not-asterisk character between newlines with "*".
s/\(\n\**\)[^\n]\(.*\n\)/\1*\2/;
## If character before second newline is not an asterisk, repeat
## the substitution from label "a".
/^.*\*\n/! ta;
## Remove artificial newlines.
s/\n//g
## Implicit print.
'
Output:
texttexttext111111*****3333texttexttext
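The same substitution wrapped in a function for reuse. GNU sed is assumed, because the script relies on \n in the replacement and inside the bracket expression; `mask_cards` is a hypothetical name. Lines without a long enough run of digits pass through unchanged, since the loop's ta never fires when no substitution has succeeded.

```shell
# Mask the middle digits of a 15+ digit run, keeping the first 6 and last 4.
mask_cards() {
    sed -e '
        s/\([0-9]\{6\}\)\([0-9]\{5\}\)\([0-9]*\)\([0-9]\{4\}\)/\1\n\2\3\n\4/
        :a
        s/\(\n\**\)[^\n]\(.*\n\)/\1*\2/
        /^.*\*\n/! ta
        s/\n//g
    '
}

echo "texttexttext111111222223333texttexttext" | mask_cards
```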
From man sed:
t label
If a s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
T label
If no s/// has done a successful substitution since the last
input line was read and since the last t or T command, then
branch to label; if label is omitted, branch to end of script.
This is a GNU extension.
So I think you can just add T; after your first s command.
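Applied to the question's one-liner, that looks like the sketch below (GNU sed, because T is a GNU extension). `mask_digits_only` is a hypothetical name, and, like the original one-liner, it only gives the intended result when the card number is the whole line; surrounding text is handled by the updated answer above.

```shell
# The question's command with T inserted after the first s command:
# if that substitution fails, T jumps to the end and the line prints unchanged.
mask_digits_only() {
    sed 'h;s/[0-9]\{6\}\([0-9]\{5\}\)\([0-9]*\)[0-9]\{4\}/\1\2/;T;s/./*/g;x;s/\([0-9]\{6\}\)[0-9]*\([0-9]\{4\}\)/\1\2/;G;s/\n//;s/\([0-9]\{6\}\)\([0-9]\{4\}\)\(.*\)/\1\3\2/'
}

echo 1234567890123456 | mask_digits_only    # 123456******3456
echo hello | mask_digits_only               # hello
```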