I have a text file with lines of text, and I want to move a pattern to the beginning of each line with sed. The pattern is a sequence like [35 of 44].
CSV files and Jupyter _ Even More Python for Beginners - Data Tools [35 of 44].description
Calling An API _ Python for Beginners [36 of 44].description
With \\[.*?\\] I can match this part of the line (for example [11 of 31]), but I can't figure out how to move it to the start of the line. The desired result is:
[35 of 44] CSV files and Jupyter _ Even More Python for Beginners - Data Tools.description
[36 of 44] Calling An API _ Python for Beginners.description
Hopefully, someone can help me!
You need to capture both what matches and what precedes it to do the replacement. In sed, the \(…\) captures what's in the … part. Hence:
sed -e 's/\(.*\)\(\[[^]]*\]\)/\2\1/'
Using single quotes on the command line avoids needing to use doubled-up backslashes.
As shown, this generates:
[35 of 44]CSV files and Jupyter _ Even More Python for Beginners - Data Tools .description
[36 of 44]Calling An API _ Python for Beginners .description
If you want a space after the [n of m] information, add it:
sed -e 's/\(.*\)\(\[[^]]*\]\)/\2 \1/'
Note that if there are two or more [n of m] sequences on the line, only the last one will be moved. Also, the search does not enforce that the material between the square brackets is of the form [1 of 2] (number of number). It would be possible to do so; it is not clear that it is worth worrying about it.
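If stricter matching is wanted, the bracket contents can be pinned to the number-of-number form. The following is only a sketch in the same BRE style as the command above; the variable names `input` and `strict` and the second sample line are made up for illustration. `[0-9][0-9]*` stands in for "one or more digits", since plain BRE has no portable `+`:

```shell
# One line with a real [n of m] marker, one with some other bracketed text
# (the second line is a hypothetical sample, not from the question).
input='Calling An API _ Python for Beginners [36 of 44].description
Notes [draft].description'

# Only move bracketed text that looks like [number of number].
strict=$(printf '%s\n' "$input" |
  sed -e 's/\(.*\)\(\[[0-9][0-9]* of [0-9][0-9]*\]\)/\2 \1/')

printf '%s\n' "$strict"
```

The first line gets its marker moved to the front; the `[draft]` line is left untouched because it does not match.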
With the samples you have shown, please try the following.
awk '
match($0,/\[[^]]*\]/){
print substr($0,RSTART,RLENGTH),substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
}
' Input_file
Explanation: awk's match function finds the text from [ up to the next ] in each line; the matched substring is printed first, followed by the rest of the line (the text before and after the match).
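One thing to be aware of: because the whole action hangs off the match(...) condition, lines without any bracketed part are not printed at all. If such lines should be kept, a possible variant (a sketch only; the `README.description` line and the variable names are invented for the example) passes them through unchanged:

```shell
# Second line is a hypothetical sample without any [..] part.
input='Calling An API _ Python for Beginners [36 of 44].description
README.description'

result=$(printf '%s\n' "$input" | awk '
match($0, /\[[^]]*\]/) {
  # Matched text first, then the text before and after it.
  print substr($0, RSTART, RLENGTH), substr($0, 1, RSTART-1) substr($0, RSTART+RLENGTH)
  next
}
{ print }   # no bracketed part: pass the line through unchanged
')

printf '%s\n' "$result"
```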
You can use the following POSIX ERE based sed command:
sed -E 's/(.*[^[:space:]])[[:space:]]*(\[[0-9]+ of [0-9]+])/\2 \1/' file
Details:
-E - enables POSIX ERE syntax (less escaping)
(.*[^[:space:]]) - Group 1: any text and then a non-whitespace char
[[:space:]]* - zero or more whitespace
(\[[0-9]+ of [0-9]+]) - Group 2: [, one or more digits, space, of, space, one or more digits, ].
The replacement is \2 \1, that is, Group 2 value, space, Group 1 value.
See the online demo:
s='CSV files and Jupyter _ Even More Python for Beginners - Data Tools [35 of 44].description
Calling An API _ Python for Beginners [36 of 44].description'
sed -E 's/(.*[^[:space:]])[[:space:]]*(\[[0-9]+ of [0-9]+])/\2 \1/' <<< "$s"
Output:
[35 of 44] CSV files and Jupyter _ Even More Python for Beginners - Data Tools.description
[36 of 44] Calling An API _ Python for Beginners.description
LaTeX returns an error when I write # in \mintinline
When I delete the #, the problem disappears.
\section{Example 1 - \mintinline{bash}{${#parameter}}}
Can somebody help?
Code listing:
\documentclass[11pt]{article}
\usepackage[utf8]{inputenc}
\usepackage{minted}
\begin{document}
\section{Example 1 - \mintinline{bash}{${#parameter}}}
\end{document}
With a little help from the cprotect package:
\documentclass{article}
\usepackage{minted}
\usepackage{cprotect}
\begin{document}
\cprotect\section[Example 1]{Example 1 - \mintinline{bash}|${#parameter}| }
\end{document}
Both $ and # are special characters in LaTeX: $ opens and closes ‘maths mode’, and # refers to a numbered parameter of a function.
If you need to refer to them as ordinary characters, you need to escape them with \$ and \# respectively (to be pedantic, \$ isn't ‘escaping’ as such, but instead a command \$ which expands to $ as an ordinary character).
That's presuming that \mintinline doesn't do something clever to make special characters non-special (some macros do so, for convenience). Presuming not, and recalling that { and } are special characters too, I guess that you can get what you want with
\mintinline{bash}{\$\{\#parameter\}}
(which is unfortunately a bit of a mess to type...).
I have the following (GNU) sed script, which is intended to parse another sed script and output each distinct command on a separate line.
In words, this script should put a newline after each semicolon ;, except semicolons that are inside a matching or substitution command.
Sed script:
#!/bin/sed -rf
# IDEA:
# replace ';' by ';\n' except when it's inside a match expression or subst. expression.
# Ignored patterns:
/^#/b # commented lines
/^$/b # empty lines
# anything in a single line, without semicolon except at the end
/^[^\n;]*;?$/b
# Processed patterns (put on separate lines):
# Any match preceding a semicolon, or the end of the line, or a substitution
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
s/^\\(.)[^\1]+\1[^;s]*;?/&\n/;t printtopline
# Any substitution (TODO)
# Any other command, separated by semicolon
s/\;/\;\n/; t printtopline;
:printtopline
P;D; # print top line, delete it, start new cycle
For example, I tested it with the following file (actually adapted from an answer of @ctac_ to one of my previous sed questions):
Input file:
#!/bin/sed -f
#/^>/N;
:A;
/\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label
#h;
#s/\(.*\)\n.*/\1/p;
s/^>//g;P
#x;
#s/.*\n//;
D
bA;
Output
The above script produces the right output, for example, the line /\n>/!{s/\n/ /;N;bA}; # join next line if not a sequence label becomes:
/\n>/!{s/\n/ /;
N;
bA};
# join next line if not a sequence label
Question
However, could you help me understand why this part of the script works:
s/\;/\;\n/; t printtopline;
:printtopline
?
It seems to me that the branching command t printtopline is useless here. I thought that, whether or not the substitution succeeded, the next thing to be executed would be :printtopline.
However, if I comment out the t command, or if I replace it with b, the script produces the following output lines:
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
From info sed, here is the explanation of t:
't LABEL'
Branch to LABEL only if there has been a successful 's'ubstitution
since the last input line was read or conditional branch was taken.
The LABEL may be omitted, in which case the next cycle is started.
Why doesn't a t command immediately followed by its label behave like no command at all, or like the b command?
The crucial part is this:
Branch to label only if there has been a successful substitution since the last input line was read or conditional branch was taken.
I.e. t looks into the past and takes into account the success of all recent substitutions up to the most recent
input, or
conditional branch.
Consider the input line you're asking about. After all the substitutions we have
/\n>/!{s/\n/ /;
N;bA}; # join next line if not a sequence label
in our pattern space when we reach P;D;. The P command outputs the first line, then D deletes the first line and restarts the main loop. Now we just have
N;bA}; # join next line if not a sequence label
Note that this didn't involve reading any additional lines. No input occurred; D just removed parts of the pattern space.
We process the remaining text (which does nothing because none of the other patterns match) until we reach this part of the code:
s_/^[^/]+/[^;s]*;?_&\n_; t printtopline
The substitution fails (the pattern space doesn't contain /^). But the t command doesn't check the status of just this one s command; it looks at the history of all substitutions since the most recent input or conditional branch taken.
The most recent input occurred when /\n>/!{s/\n/ /;N;bA}; was read.
The most recent conditional branch taken was
s/\;/\;\n/; t printtopline;
:printtopline
in the original version of your code. Since then no other substitution succeeded, so the t command does nothing. The rest of the program continues as expected.
But in the modified version of your code there was no conditional branch at this point (b is an unconditional branch):
s/\;/\;\n/; b printtopline;
:printtopline
That means the t from s_/^[^/]+/[^;s]*;?_&\n_; t printtopline "sees" the s/\;/\;\n/; as having succeeded, so it immediately jumps to the P;D; part. This is what outputs
N;bA}; # join next line if not a sequence label
unmodified.
In summary: t makes a difference here not because of its immediate effect of jumping to a label, but because it serves as a dynamic delimiter for the next t that gets executed. Without t here, the previously executed s command is taken into account for the next t.
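The "delimiter" effect can be seen in isolation with a much smaller script. This is only an illustrative sketch, not part of the original script: the labels `reset` and `done` and the variable names are made up, and everything happens within a single cycle. The only difference between the two commands is the extra t immediately followed by its own label:

```shell
# s/x/y/ succeeds. "t reset" jumps to the very next line and clears the flag,
# so the later "t done" sees no successful substitution and does not jump.
with_t=$(printf 'x\n' | sed -e 's/x/y/' -e 't reset' -e ':reset' \
  -e 's/nomatch/z/' -e 't done' -e 's/$/ (fell through)/' -e ':done')

# Same script without the first t: the success of s/x/y/ is still pending,
# so "t done" jumps even though s/nomatch/z/ failed.
without_t=$(printf 'x\n' | sed -e 's/x/y/' \
  -e 's/nomatch/z/' -e 't done' -e 's/$/ (fell through)/' -e ':done')

printf 'with t:    %s\n' "$with_t"
printf 'without t: %s\n' "$without_t"
```

With the extra t the line comes out as `y (fell through)`; without it the output is just `y`.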
Part 1 - how the P;D; sequence works.
Compare the outputs of these two commands: sed 's/;/;\n/' and sed 's/;/;\n/; P;D;'.
First:
$ sed 's/;/;\n/' <<< 'one;two;three;four'
one;
two;three;four
Second:
$ sed 's/;/;\n/; P;D;' <<< 'one;two;three;four'
one;
two;
three;
four
Why the difference? Let me explain.
The first command substitutes only the first occurrence of the ; character. To substitute all occurrences, the g modifier should be added to the s command: sed 's/;/;\n/g'.
The second command works this way:
s/;/;\n/; - the same as the first command, no difference. Before this command the pattern space is one;two;three;four; after it, one;\ntwo;three;four.
P; -
from man: "Print up to the first embedded newline of the current pattern space."
That is, it prints up to the first newline: one;. The pattern space stays unchanged: one;\ntwo;three;four
D; -
from man: "If pattern space contains no newline, start a normal new cycle as if the d command was
issued. Otherwise, delete text in the pattern space up to the first newline, and restart
cycle with the resultant pattern space, without reading a new line of input."
In our case, the pattern space contains a newline: one;\ntwo;three;four. D; removes the one;\n part and repeats the whole command cycle from the beginning. Now the pattern space is two;three;four.
That is, s/;/;\n/; runs again and the pattern space becomes two;\nthree;four; then P; prints two; and leaves the pattern space unchanged; D; removes two;\n and the pattern space becomes three;four. And so on.
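As a quick cross-check of Part 1, the P;D loop should produce the same text as the g modifier mentioned earlier. A small sketch, assuming GNU sed (which accepts \n in the replacement); the variable names are illustrative:

```shell
# One substitution per cycle, then print and delete the first line (P;D loop).
loop_out=$(printf 'one;two;three;four\n' | sed 's/;/;\n/; P; D')

# All substitutions in a single pass with the g modifier.
global_out=$(printf 'one;two;three;four\n' | sed 's/;/;\n/g')

printf '%s\n' "$loop_out"
```

Both variables end up holding the same four lines; the P;D form matters in the original script only because other commands run between the restarts.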
Part 2 - what happens with branching.
I looked at the sed source code and found the following:
When the s command executes and finds a match, the replaced flag is set to true:
/* We found a match, set the 'replaced' flag. */
replaced = true;
The t command branches only if the replaced flag is true, and when it does, it resets the flag to false:
case 't':
if (replaced)
{
replaced = false;
So, in the first case, s/\;/\;\n/; t printtopline;, the substitution succeeds, so the replaced flag is set to true. Then the following t command runs and resets the replaced flag to false.
In the second case, without the t command (s/\;/\;\n/;), the substitution succeeds too, so the replaced flag is set to true.
But now the flag carries over into the next cycle, which is started by the D command. So when the first t command is reached in the new cycle (s_/^[^/]+/[^;s]*;?_&\n_; t printtopline), it checks the replaced flag, sees that it is true, and jumps to the label :printtopline, skipping all the commands before the label.
The pattern space has no newline, so the P;D; sequence just prints the pattern space and starts the next cycle with a new line of input.
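The carry-over across a D restart can be reproduced with a cut-down script. This is a sketch under the assumption of GNU sed (for \n in the replacement and for the flag behaviour described above); the label `skip` and the variable names are invented for the example. The leading `t skip` plays the role of the first t in the original script, and the second `t skip` plays the role of the seemingly useless t printtopline:

```shell
# With the second t: the flag is cleared before D restarts the cycle,
# so the leading t never jumps and every semicolon gets split.
with_t=$(printf 'one;two;three\n' |
  sed -e 't skip' -e 's/;/;\n/' -e 't skip' -e ':skip' -e 'P' -e 'D')

# Without it: the flag set by the s command survives the D restart,
# so the leading t jumps past the s command in the next cycle.
without_t=$(printf 'one;two;three\n' |
  sed -e 't skip' -e 's/;/;\n/' -e ':skip' -e 'P' -e 'D')

printf '%s\n---\n%s\n' "$with_t" "$without_t"
```

The second variant leaves `two;three` on one line, which is the same symptom as the unsplit `N;bA};` line in the question.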
The following Groovy commands illustrate my problem.
First of all, this works (as seen on lotrepls.appspot.com) as expected (note that \u0061 is 'a').
>>> print "a".matches(/\u0061/)
true
Now let's say that we want to match \n, using the Unicode escape \u000A. The following, using "pattern" as a string, behaves as expected:
>>> print "\n".matches("\u000A");
Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting anything but ''\n''; got it anyway
# line 1, column 21. 1 error
This is expected because in Java at least, Unicode escapes are processed early (JLS 3.3), so:
print "\n".matches("\u000A")
really is the same as:
print "\n".matches("
")
The fix is to escape the Unicode escape, and let the regex engine process it, as follows:
>>> print "\n".matches("\\u000A")
true
Now here's the question part: how can we get this to work with the Groovy /pattern/ syntax instead of using string literal?
Here are some failed attempts:
>>> print "\n".matches(/\u000A/)
Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting EOF, found '(' # line 1, column 19.
1 error
>>> print "\n".matches(/\\u000A/)
false
>>> print "\\u000A".matches(/\\u000A/);
true
~"[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]"
Appears to be working as it should. According to the docs I've seen, the double backslashes shouldn't be required with a slashy string, so I don't know why the compiler's not happy with them.
Firstly, it seems Groovy has changed in this regard in the meantime: at least on https://groovyconsole.appspot.com/ and in a local Groovy shell, "\n".matches(/\u000A/) works perfectly fine and evaluates to true.
In case you run into a similar situation again, you can encode the backslash itself with a Unicode escape, as in "\n".matches(/\u005Cu000A/). The Unicode-escape-to-character conversion then turns it back into a backslash, and the sequence is preserved for the regex parser.
Another option would be to separate the backslash from the u, for example by using "\n".matches(/${'\\'}u000A/) or "\n".matches('\\' + /u000A/).
The following Scala code does just what I expect it to - it prints each line of some_file.txt.
import scala.io.Source
val lines = Source.fromPath("some_file.txt").mkString
for (line <- lines) print(line)
If I use println instead of print, I expect to see some_file.txt printed out with double-spacing. Instead, the program prints a newline after every character of some_file.txt. Could someone explain this to me? I'm using Scala 2.8.0 Beta 1.
lines is a single string, not some iterable container of strings. This is because you called the .mkString method on it.
When you iterate over a string, you do so one character at a time. So the line in your for is not actually a line, it's a single character.
What you probably intended to do was call .getLines instead of .mkString.
I suspect that for (line <- lines) print(line) puts a character in line rather than a whole line. With print the output still looks as expected, because the \n characters are printed too. When you replace print with println, every character gets its own line.