I'd like to reformat a text file so that the line immediately below each matching string is cut and appended to the line containing the matching string. Here is an example text:
Answer:
renice
X.
find / -name filename &
Y.
find / -name filename
Z.
bg find / -name filename
I'm looking for the end result:
Answer: renice
X. find / -name filename &
Y. find / -name filename
Z. bg find / -name filename
I can't get the following right-trim suggestion to produce the result I need in place:
$str =~ s/\s+$//;
The trailing whitespace is gone, but the string I need is still on the line below. The lines to cut and paste only ever occur directly below "Answer:", "X.", "Y." or "Z.".
It would help to see your full solution, but here's a one-liner that does what you'd like:
perl -pe's/\s*$/ / if /^.+?[:.]/'
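This replaces any whitespace at the end of the line, including the newline, with a single space, but only if the line matches the pattern. The pattern matches any line that contains a period or colon after at least one character, which covers "Answer:", "X.", "Y." and "Z." in your sample. Note that it would also join a line such as "see file.txt" with the next one, so tighten it (for example to /^(Answer:|[XYZ]\.)\s*$/) if your data can contain such lines. Add -i.bak to modify files in place. Hope this helps!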
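For comparison, the same join can be sketched in Python. This is a minimal sketch, assuming the label lines are exactly "Answer:", "X.", "Y." and "Z." on lines of their own; the `LABEL` pattern and the `join_labels` helper are illustrative names, not part of the original answer.

```python
import re

# Assumption: the label lines are exactly these four forms, alone on a line.
LABEL = re.compile(r"^(?:Answer:|[XYZ]\.)$")


def join_labels(lines):
    """Append the line directly below each label line to the label line."""
    out = []
    it = iter(lines)
    for line in it:
        line = line.rstrip()  # right-trim, like s/\s+$//
        if LABEL.match(line):
            below = next(it, None)  # consume the following line, if any
            if below is not None:
                line = f"{line} {below.rstrip()}"
        out.append(line)
    return out


text = """Answer:
renice
X.
find / -name filename &
"""
print("\n".join(join_labels(text.splitlines())))
```

Because the label pattern is anchored at both ends, a line such as "see file.txt" is left alone, unlike with the looser /^.+?[:.]/ test.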
I have a command like this; it marks words to appear in the document's index:
sed -i "s/\b$line\b/\\\keywordis\{$line\}\{$wordis\}\{$definitionis\}/g" file.txt
The problem is that it finds matches within existing matches. For example, "hello" is replaced with \keywordis{hello}{a common greeting}, but then "greeting" might be searched for too, giving \keywordis{hello}{a common \keywordis{greeting}{a phrase used when meeting someone}}...
How can I tell sed to perform the replacement, but ignore text that is already inside curly brackets?
Curly brackets in this case will always appear on the same line.
First, tokenize the input. Place something unique, like | or the byte \x01, around every \keywordis{hello}{a common greeting} and store the result in the hold space; something along the lines of s/\\the regex to match{hello}{a common greeting}/\x01&\x01/g.
Then iterate over the elements in the hold space. Use \n to separate the elements already parsed from those not yet parsed, i.e. output from input. If an element matches the format \keywordis{hello}{a common greeting}, just move it to the front, before the newline in the hold space; if it does not, perform the replacement. Here's an example: "Identify and replace selective space inside given text file"; it uses a double newline \n\n as the input/output separator.
Because, as you noted, replacements can contain words that overlap with the patterns you are searching for, I believe the simplest approach is to shuffle the pattern space after each replacement, as for finished output, and start the process over for the current line.
Then, at the end, shuffle the hold space to remove the \x01 markers, the newline and any leftovers, and output the result.
Overall, it's LaTeX. I believe it would be simpler to do it manually.
By "eating" the string from the back and placing each piece in front of the input/output separator inside the pattern space, I simplified the process. The following program:
sed '
# add our input/output separator - just a newline
s/^/\n/
: loop
# l1000
# Ignore any "\keywords" and "{stuff}"
/^\([^\n]*\)\n\(.*\)\(\\[^{}]*\|{[^{}]*}\)$/{
s//\3\1\n\2/
b loop
}
# Replace hello followed by anything not {}
# We match till the end because regex is greedy
# so that .* will eat everything.
/^\([^\n]*\)\n\(.*\)hello\([^{}]*\)$/{
s//\\keywordis{hello}{a common greeting}\3\1\n\2/
b loop
}
# Hello was not matched - ignore anything irrelevant
# note - it has to match at least one character after newline
/^\([^\n]*\)\n\(.*\)\([^{}]\+\)$/{
s//\3\1\n\2/
b loop
}
s/\n//
' <<<'
\keywordis{hello}{hello} hello {some other hello} another hello yet
'
outputs:
\keywordis{hello}{hello} \keywordis{hello}{a common greeting} {some other hello} another \keywordis{hello}{a common greeting} yet
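The same tokenize-then-replace idea can be sketched outside sed. This is a minimal Python sketch, assuming (as the sed loop above does) that brace groups in the plain text are not nested and that a command is a backslash followed by letters; `PROTECTED` and `mark_keyword` are hypothetical names. Unlike the sed loop, it replaces whole words only (`\b`), as the command in the question does.

```python
import re

# A backslash command (e.g. \keywordis) or a brace group without nested
# braces: these tokens are copied through untouched.
PROTECTED = re.compile(r"\\[A-Za-z]+|\{[^{}]*\}")


def mark_keyword(line, word, definition):
    """Wrap whole-word `word` in a keywordis command, but only outside braces."""
    target = re.compile(rf"\b{re.escape(word)}\b")
    replacement = f"\\keywordis{{{word}}}{{{definition}}}"
    pieces, pos = [], 0
    for m in PROTECTED.finditer(line):
        # Plain text between protected tokens is the only place we replace.
        # A function replacement is inserted literally (no backslash escapes).
        pieces.append(target.sub(lambda _m: replacement, line[pos:m.start()]))
        pieces.append(m.group())  # protected token, copied verbatim
        pos = m.end()
    pieces.append(target.sub(lambda _m: replacement, line[pos:]))
    return "".join(pieces)


line = r"\keywordis{hello}{hello} hello {some other hello} another hello yet"
print(mark_keyword(line, "hello", "a common greeting"))
```

Running a second pass for "greeting" on the result leaves it unchanged, because every "greeting" it contains sits inside a brace group.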
I've been messing around with PowerShell and googling various things as I go along. This one is a little hard to put into words that Google would understand. I can get the individual lines of a text file in PowerShell by indexing:
$textFile = Get-Content "myText.txt"
$textFile[0]
This outputs the first line of the text file. But when I put the variable in double quotes, it outputs all lines, even with the index:
"$textFile[0]"
How can I get only the line I want while wrapping the variable in quotes? If I try "$textFile"[0], it just gives me the whole file as before. The reason I'm trying to do this is that I want to make that one line of the text file part of a bigger string that I can execute:
$remote = "Enter-PSSession -ComputerName $textFile[0]"
Invoke-Expression $remote
This is my way of illustrating what I'm trying to do.
You can use any of the following methods:
# Sub-expression operator
"Some Text $($textFile[0])"
# String format operator
"My Text {0}" -f $textFile[0]
# Concatenation
("Text"+$textFile[0])
Surrounding double quotes tell PowerShell to expand the string: any variables inside it are interpolated. A variable reference starts with $, and its name can only contain certain characters unless you use a special escape. [ is not one of them, and since it isn't escaped, PowerShell ends the variable name at the character just before the [. Therefore $textFile is interpolated, the whole file's contents (an array of lines) are converted into a single string with the lines joined by spaces, and [0] is appended literally to the end of that string.
You can see details of the operators at About_Operators.
See About_Variables for how to name a variable, including names with special characters, even though that doesn't directly apply here.
I found this magical command on the unix forum to move the last line of a file to the beginning of the file. I use sed quite a bit but not to this extent. Can someone explain each part to me?
sed '1h;1d;$!H;$!d;G' infile
Yes, it uses exotic commands.
1h: copy the first line into the hold space (sed has two buffers: the hold space, for keeping data, and the pattern space, which holds the line currently being processed)
1d: delete first line
$!H: append every line except the last to the hold space (the first line never reaches this command, because 1d has already ended its cycle)
$!d: delete (do not print) all lines except the last one
G: append a newline, then the contents of the hold space, to the pattern space. Only the last line ever reaches this command, so the pattern space becomes the last line followed by all the earlier lines, and it is printed right away at the end of the cycle. Swap done.
Opinion-based comment: I must admit I would never have thought of doing this with sed, and I would have had to run a test to convince myself of what this command was doing... In awk, it is much, much easier.
But sed has a special place in my heart with its cryptic commands. I wonder if there are some sed candidates for Code Golf :)
reference manual: https://www.gnu.org/software/sed/manual/sed.html
some exotic things you can do with sed (my best 1999 read): http://sed.sourceforge.net/grabbag/tutorials/do_it_with_sed.txt
Here is the same command in a more procedural-looking pseudocode:
for line in infile:
    # Always do this: copy the current line into the pattern space
    pattern = line
    # Process the script
    if first line:
        hold = pattern                  # 1h
        pattern = ""; continue          # 1d
    elif not last line:
        hold = hold + "\n" + pattern    # $!H
        pattern = ""; continue          # $!d
    pattern = pattern + "\n" + hold     # G
    # Always do this after the script is completed.
    # Due to the continue statements above, this
    # isn't always reached, and in this case
    # is only reached for the last line.
    print pattern
d clears the pattern space and continues to the next input line without executing the rest of the script.
h copies the pattern space to the hold space.
H appends a newline to the hold space, then appends the pattern space to the hold space.
G is like H, but in the other direction: it appends a newline to the pattern space, then appends the hold space to the pattern space.
The overall effect on a file with N lines is to build up a copy of lines 1 through N-1 in the hold space. When the pattern space holds line N, the hold space is appended to it and the pattern space is printed to standard output.
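To check the walkthrough, here is a minimal Python simulation of the same five commands. It is a sketch that assumes the whole file is given as a list of lines without their newlines; `sed_last_line_first` is a hypothetical name. One behaviour it makes visible: in a one-line file, line 1 is also the last line, but 1d deletes it before G can run, so sed prints nothing.

```python
def sed_last_line_first(lines):
    """Simulate sed '1h;1d;$!H;$!d;G' on a list of lines (no newlines)."""
    hold = ""
    out = []
    last = len(lines)
    for n, line in enumerate(lines, start=1):
        pattern = line                      # read the line into the pattern space
        if n == 1:
            hold = pattern                  # 1h: copy pattern space to hold space
            continue                        # 1d: delete, start the next cycle
        if n != last:
            hold = hold + "\n" + pattern    # $!H: append newline + pattern to hold
            continue                        # $!d: delete, start the next cycle
        pattern = pattern + "\n" + hold     # G: append newline + hold to pattern
        out.extend(pattern.split("\n"))     # automatic print at the end of the cycle
    return out


print(sed_last_line_first(["first", "second", "last"]))
```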
I want to remove duplicate lines from a file, without sorting the file.
Example of why this is useful to me: removing duplicates from Bash's $HISTFILE without changing the chronological order.
This page has a one-liner to do that:
http://sed.sourceforge.net/sed1line.txt
Here's the one-liner:
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
I asked a sysadmin and he told me "you just copy the script and it works, don't go philosophising about this", which is fair enough, but I am asking here because this is a developer forum and I trust that people here might be like me, suspicious of using things they don't understand:
Could you kindly provide a pseudo-code explanation of what that "black magic" script is doing, please? I tried parsing the incantation in my head, but the central part especially is quite hard.
I'll note that this script does not appear to work with my copy of sed (GNU sed 4.1.5) in my current locale. If I run it with LC_ALL=C it works fine.
Here's an annotated version of the script. sed basically has two registers, one is called "pattern space" and is used for (basically) the current input line, and the other, the "hold space", can be used by scripts for temporary storage etc.
sed -n ' # -n: by default, do not print
G # Append hold space to current input line
s/\n/&&/ # Add empty line after current input line
/^\([ -~]*\n\).*\n\1/d # If the current input line is repeated in the hold space, skip this line
# Otherwise, clean up for storing all input in hold space:
s/\n// # Remove empty line after current input line
h # Copy entire pattern space back to hold space
P # Print current input line'
I guess the adding and removal of an empty line is there so that the central pattern can be kept relatively simple (you can count on there being a newline after the current line and before the beginning of the matching line).
So basically, the entire input file (sans duplicates) is kept (in reverse order) in the hold space, and if the first line of the pattern space (the current input line) is found anywhere in the rest of the pattern space (which was copied from the hold space when the script started processing this line), we skip it and start over.
The regex in the conditional can be decomposed further:
^ # Look at beginning of line (i.e. beginning of pattern space)
\( # This starts group \1
[ -~] # Any printable character (in the C locale)
* # Any number of times
\n # Followed by a newline
\) # End of group \1 -- it contains the current input line
.*\n # Skip any amount of lines as necessary
\1 # Another occurrence of the current input line, with newline and all
If this pattern matches, the script discards the pattern space and starts over with the next input line (d).
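The role of the newline doubling in this regex can be checked with a minimal Python sketch. It assumes the hold space stores the unique lines most-recent-first, each followed by a newline (which is how the sed script leaves it); `DUP` and `is_repeat` are hypothetical names, and `re.DOTALL` stands in for sed's `.` matching newlines inside the pattern space.

```python
import re

# The conditional regex, translated to Python. re.DOTALL lets "." match
# newlines, as "." does in sed's multi-line pattern space.
DUP = re.compile(r"^([ -~]*\n).*\n\1", re.DOTALL)


def is_repeat(current, hold):
    """Rebuild the pattern space as G and the doubling step would, then test DUP."""
    pattern = current + "\n" + hold             # G
    pattern = pattern.replace("\n", "\n\n", 1)  # s/\n/&&/ doubles the first newline only
    return DUP.search(pattern) is not None


print(is_repeat("ls", "cd /tmp\nls\n"))
```

Without the doubling, an adjacent duplicate such as "a\na\n" does not match (the \n before \1 has nothing left to match), which is why s/\n/&&/ is there.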
You can get it to work independently of locale by changing [ -~] to [[:print:]]
The code doesn't work for me, perhaps due to some locale setting, but this version, with [^\n] in place of [ -~], does:
sed -n 'G; s/\n/&&/; /^\([^\n]*\n\).*\n\1/d; s/\n//; h; P'
Let's first translate this by the book (i.e. following the sed info page) into something Perl-ish.
# The standard sed loop
my $hold = "";
while (my $pattern = <>) {
    chomp $pattern;
    $pattern = "$pattern\n$hold";              # G
    $pattern =~ s/(\n)/$1$1/;                  # s/\n/&&/
    # /s: in sed's pattern space, . also matches a newline
    if ($pattern =~ /^([^\n]*\n).*\n\1/s) {    # /…/
        next;                                  # d
    }
    $pattern =~ s/\n//;                        # s/\n//
    $hold = $pattern;                          # h
    $pattern =~ /^([^\n]*\n?)/; print $1;      # P
}
OK, the basic idea is that the hold space contains all the lines seen so far.
G: At the beginning of each cycle, append the hold space to the current line. Now we have a single string consisting of the current line and all unique lines that preceded it.
s/\n/&&/: Turn the newline that separates them into a double newline, so that adjacent and non-adjacent duplicates can be matched the same way; see the next step.
/^\([^\n]*\n\).*\n\1/: Look through the current text for the following: at the beginning of the pattern space (^), a first line including its trailing newline (\([^\n]*\n\)), then anything (.*), then a newline (\n), and then that same first line, including its newline, again (\1). If two adjacent lines are the same, the .* in the regular expression matches the empty string, but both \n still match thanks to the newline doubling in the preceding step. So basically this asks whether the first line appears again among the other lines.
d: If there is a match, this is a duplicate line. We discard this input, keep the hold space as it is as a buffer of all unique lines seen so far, and continue with the next line of input.
s/\n//: Otherwise, we continue and next turn the double newline back into a single newline.
h: We include the current line in our list of all unique lines.
P: And finally print this new unique line, up to the newline character.
For the actual problem, here is a simpler solution (at least it looks simpler) with awk:
awk '!_[$0]++' FILE
In short, _[$0] counts how many times each line has been seen. The post-increment returns the old value, so on a line's first appearance _[$0] is 0 and !_[$0]++ is true, and the line is printed; on any later appearance _[$0] >= 1, so !_[$0]++ is false and the line is not printed.
See https://gist.github.com/ryenus/5866268 (credit goes to a recent forum I visited.)
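The awk idiom maps directly onto a dictionary of counters. A minimal Python sketch, assuming the history fits in memory; `awk_dedupe` is a hypothetical name and the dict plays the role of the awk array `_`.

```python
def awk_dedupe(lines):
    """Keep the first occurrence of each line, like awk '!_[$0]++'."""
    counts = {}  # plays the role of the awk array _
    out = []
    for line in lines:
        previous = counts.get(line, 0)  # post-increment yields the old value
        counts[line] = previous + 1     # _[$0]++
        if not previous:                # !_[$0]: true only on first sight
            out.append(line)
    return out


history = ["ls", "cd /tmp", "ls", "git status", "cd /tmp"]
print(awk_dedupe(history))
```

In Python 3.7+ the same result is also available as `list(dict.fromkeys(history))`, since dictionaries preserve insertion order.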
I have several thousand large text files that I need to clean up. I need any line that ends with a comma to end with a comma followed by a period (,.).
I found the following, which works for every line except the last line. It must be close to what I need but I can't figure out how to make it work on the last line as well.
find . -name "*.txt" -print | xargs sed -i ':a;N;$!ba;s/,\n/,\.\n/g'
My data looks something like this:
0,0,0,193,17,.,.,
0,0,0,174,19,.,.,
0,0,0,124,14,.,.,
I need it to look like this:
0,0,0,193,17,.,.,.
0,0,0,174,19,.,.,.
0,0,0,124,14,.,.,.
sed 's/,$/,./'
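($ means end of line, so this works on every line, including the last. The script in the question fails there because it joins the whole file into one pattern space and matches ,\n, and the last line has no newline after it in the pattern space.) To apply it to all the files in place, use it in the same pipeline: find . -name "*.txt" -print | xargs sed -i 's/,$/,./'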
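The same per-line substitution can also be sketched in Python. This is a minimal sketch, assuming UTF-8 text files with Unix line endings; `add_period` and `fix_tree` are hypothetical names. As with sed, the key is that `$` is anchored per line (here via `re.MULTILINE`), so the last line is handled even when it has no trailing newline.

```python
import re
from pathlib import Path

# re.MULTILINE makes "$" match before every newline and at the end of the
# text, so a trailing comma on the last line is caught too.
TRAILING_COMMA = re.compile(r",$", re.MULTILINE)


def add_period(text):
    """Turn a trailing ',' into ',.' on every line."""
    return TRAILING_COMMA.sub(",.", text)


def fix_tree(root="."):
    """Rewrite every *.txt file under root in place (assumes UTF-8)."""
    for path in Path(root).rglob("*.txt"):
        original = path.read_text(encoding="utf-8")
        path.write_text(add_period(original), encoding="utf-8")


print(add_period("0,0,0,193,17,.,.,\n0,0,0,124,14,.,.,"))
```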