TXR: How to combine all lines where the following line begins with a tab?


I am trying to parse the text output of a shell command using txr.
In the output, a long value is continued on the following line or lines, each of which begins with a tab (a real tab character, not the literal \t shown below). Note that the other variable assignment lines (those that are not continuations) have leading spaces in the input.
Variable Group: 1
variable = the value of the variable
long_variable = the value of the long variable
\tspans across multiple lines
really_long_variable = this variable extends
\tacross more than two lines, but it
\tis unclear how many lines it will end up extending
\tacross ahead of time
Variable Group: 2
variable = the value of the variable in group 2
long_variable = this variable might not be that long
really_long_variable = neither might this one!
How might I capture these using the TXR pattern language? I know about the @(freeform) directive and its optional numeric argument to treat the next n lines as one big line. Thus, it seems to me the right approach would be something like:
@(collect)
Variable Group: @i
variable = @value
@(freeform 2)
long_variable = @long_value
@(set long_value @(regsub #/[\t ]+/ "" long_value))
@(freeform (count-next-lines-starting-with-tab))
really_long_variable = @really_long_value
@(set really_long_value @(regsub #/[\t ]+/ "" really_long_value))
@(end)
However, it's not clear to me how I might write the count-next-lines-starting-with-tab procedure in TXR Lisp. On the other hand, maybe there is a better way to approach this problem. Could you provide any suggestions?
Thanks in advance!

Let's apply the KISS principle; we don't need to bring in @(freeform). Instead we can separately capture the main line and the continuation lines for the (potentially) multi-line variables. Then, intelligently combine them with @(merge):
@(collect)
Variable Group: @i
variable = @value
long_variable = @l_head
@ (collect :gap 0 :vars (l_cont))
	@l_cont
@ (end)
really_long_variable = @rl_head
@ (collect :gap 0 :vars (rl_cont))
	@rl_cont
@ (end)
@ (merge long_variable l_head l_cont)
@ (merge really_long_variable rl_head rl_cont)
@(end)
Note that the big indentations in the above are supposed to be literal tabs. Instead of literal tabs, we can encode tabs using @\t.
Test run on the real data with \t replaced by tabs:
$ txr -Bl new.txr data
(i "1" "2")
(value "the value of the variable" "the value of the variable in group 2")
(l_head "the value of the long variable" "this variable might not be that long")
(l_cont ("spans across multiple lines") nil)
(rl_head "this variable extends" "neither might this one!")
(rl_cont ("across more than two lines, but it" "is unclear how many lines it will end up extending"
"across ahead of time") nil)
(long_variable ("the value of the long variable" "spans across multiple lines")
("this variable might not be that long"))
(really_long_variable ("this variable extends" "across more than two lines, but it"
"is unclear how many lines it will end up extending" "across ahead of time")
("neither might this one!"))
We use a strict collect with :vars for the continuation lines, so that the variable is bound (to nil) even if nothing is collected. :gap 0 prevents these inner collects from scanning across lines that don't start with tabs: another strictness measure.
@(merge) has "special" semantics for combining lists of strings that have different nesting levels; it's perfect for assembling data from different levels of collection and is basically tailor-made for this kind of thing. This problem is very similar to extracting HTTP, Usenet or e-mail headers, which can have continuation lines.
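For readers who do not use TXR, the same header-style folding can be sketched in Python. This is only an illustration of the idea, not a translation of the TXR script: the function name fold_continuations and the choice to join pieces with a single space are assumptions made for the example.

```python
def fold_continuations(lines):
    """Join each tab-indented line onto the logical line before it."""
    logical = []
    for line in lines:
        if line.startswith("\t") and logical:
            # Continuation line: append its text to the previous logical line.
            logical[-1] += " " + line.strip()
        else:
            # Ordinary line: start a new logical line (leading spaces dropped).
            logical.append(line.strip())
    return logical


sample = [
    "Variable Group: 1",
    "  variable = the value of the variable",
    "  long_variable = the value of the long variable",
    "\tspans across multiple lines",
]
folded = fold_continuations(sample)
print(folded)
```

Unlike @(merge), which keeps the head and continuation pieces as separate list elements, this sketch flattens them into one string per variable.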
On the topic of how to write a Lisp function to look ahead in the data, the most important aspect is how to get a handle on the data at the current position. TXR pattern matching works by backtracking over a lazy list of strings (lines/records). We can use the @(data) directive to capture the list pointer at the given input position. Then we can just treat that as a list:
@(data here)
@(bind tab-start-lines @(length (take-while (f^ #/\t/) here)))
Now tab-start-lines has a count of how many consecutive lines, starting at the current position, begin with tabs. However, take-while has a termination condition bug, unfortunately; if the following data consists of nothing but one or more tab lines, it misbehaves. Until TXR 166 is released, this requires a little workaround: (take-while [iff stringp (f^ #/\t/)] here).

Related

Why does `echo HTTPS_PROXY=$HTTPS_PROXY` print an empty line when the variable is not set?

Shouldn't it print HTTPS_PROXY= instead (when $HTTPS_PROXY is not set)?
I know I can work around this using
echo HTTPS_PROXY=(echo $HTTPS_PROXY) or echo HTTPS_PROXY="$HTTPS_PROXY", but I want to know why I need a workaround in this case.

In fish, all variables are lists. When you concatenate a string and a variable, what it does is combine every list element with the string.
So
set bar 1 2 3
echo foo$bar
prints "foo1 foo2 foo3".
Now, when you have an undefined variable (or an empty one, set like set bar without values), this combines nothing with the string, which ends up eliminating it.
You can think of it like any variable expansion being a brace expansion - echo foo{1,2,3} is the same as echo foo$bar with bar set like above.
In many cases, that is exactly what you want. Imagine $bar being a list of directories. To go over all files in them you could use
for file in $bar/*
and if $bar was empty (there was no directory), the entire loop would be skipped instead of e.g. showing all files in "/".
The obvious solution is to quote the variable if you want to suppress this. Quoting always turns the variable into exactly one argument, even if it's empty or has multiple elements, so
echo foo"$bar"
prints "foo1 2 3" (as one argument).
This is documented at https://fishshell.com/docs/current/#combining-lists-cartesian-product.
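The behaviour described above can be modelled in a few lines of Python. This is a toy model of the expansion rule only, not of fish itself; the helper names expand and expand_quoted are made up for the illustration.

```python
def expand(prefix, values):
    """Unquoted case: one resulting word per list element."""
    return [prefix + v for v in values]


def expand_quoted(prefix, values):
    """Quoted case: the elements are joined into exactly one word."""
    return [prefix + " ".join(values)]


print(expand("foo", ["1", "2", "3"]))         # ['foo1', 'foo2', 'foo3']
print(expand("HTTPS_PROXY=", []))             # [] : no words at all
print(expand_quoted("foo", ["1", "2", "3"]))  # ['foo1 2 3']
print(expand_quoted("HTTPS_PROXY=", []))      # ['HTTPS_PROXY=']
```

With an empty list the unquoted form yields no words, so echo receives no arguments and prints only a newline; the quoted form always yields one word, which is why quoting restores the HTTPS_PROXY= prefix.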

Why are @#, @!, @, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like @# is probably not a good idea.
When using double quotes (or the qq operator), scalars and arrays are interpolated:
$v = 5;
say "$v"; # prints: 5
$@ = 6;
say "$@"; # prints: 6
@a = (1,2);
say "@a"; # prints: 1 2
Yet, with array names of the form @+special char like @#, @!, @,, @%, @; etc, the array isn't interpolated:
@; = (1,2);
say "@;"; # prints nothing
say @; ; # prints: 1 2
So here is my question: does anyone know why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on Google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or on the 10th page of results.
If you wonder why I could need variable names like those:
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push@a,1;say"@a"}{say@a' shorter by doing instead perl -nE 'push@;,1;say"@;"}{say@', because that last ; converts say@ to say@;. Well, actually I can't do that because @; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)

Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for @- and @+, added in 5.6 (2000)).
The code in S_scan_const only interprets @ as the start of an array if the following character is
a word character (e.g. @x, @_, @1), or
a : (e.g. @::foo), or
a ' (e.g. @'foo (this is the old syntax for ::)), or
a { (e.g. @{foo}), or
a $ (e.g. @$foo), or
a + or - (the arrays @+ and @-), but not in regexes.
As you can see, the only punctuation arrays that are supported are @- and @+, and even then not inside a regex. Initially no punctuation arrays were supported; @- and @+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c@-\c_]/ work; it used to interpolate @- first.)
There is a workaround: Because @{ is treated as the start of an array variable, the syntax "@{;}" works (but that doesn't help your golf code because it makes the code longer).
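The rule listed above can be restated as a small predicate. The Python below is only a paraphrase of that list for illustration; it is not the actual logic in toke.c, and the function name and the in_regex parameter are invented for the sketch. Python's notion of an alphanumeric character is also broader than Perl's word-character test.

```python
def starts_array_interpolation(next_char, in_regex=False):
    """Return True if '@' followed by next_char would begin an array
    interpolation, according to the rule list in the answer above."""
    if len(next_char) != 1:
        return False
    if next_char.isalnum() or next_char == "_":  # word character: @x, @_, @1
        return True
    if next_char in ":'{$":                      # @::foo, @'foo, @{foo}, @$foo
        return True
    if next_char in "+-":                        # @+ and @-, but not in a regex
        return not in_regex
    return False


for ch in "a_1:{$+-;!#":
    print(repr(ch), starts_array_interpolation(ch))
```

Under this model ";" , "!" and "#" all return False, which matches the observation that "@;" is left alone inside double quotes.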

Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.

Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

AutoHotKey Source Code Line Break

Is there a way to break a line in AutoHotKey source code? My lines are getting longer than 80 characters and I would like to split them neatly. I know we can do this in some other languages, such as VBA, as in the example below:
http://www.excelforum.com/excel-programming-vba-macros/564301-how-do-i-break-vba-code-into-two-or-more-lines.html
If Day(Date) > 10 _
And Hour(Time) > 20 Then _
MsgBox "It is after the tenth " & _
"and it is evening"
Is there a source code line break in AutoHotKey? I use an older version of AutoHotKey, ver 1.0.47.06.

There is a Splitting a Long Line into a Series of Shorter Ones section in the documentation:
Long lines can be divided up into a collection of smaller ones to
improve readability and maintainability. This does not reduce the
script's execution speed because such lines are merged in memory the
moment the script launches.
Method #1: A line that starts with "and", "or", ||, &&, a comma, or a
period is automatically merged with the line directly above it (in
v1.0.46+, the same is true for all other expression operators except
++ and --). In the following example, the second line is appended to the first because it begins with a comma:
FileAppend, This is the text to append.`n ; A comment is allowed here.
, %A_ProgramFiles%\SomeApplication\LogFile.txt ; Comment.
Similarly, the following lines would get merged into a single line
because the last two start with "and" or "or":
if (Color = "Red" or Color = "Green" or Color = "Blue" ; Comment.
or Color = "Black" or Color = "Gray" or Color = "White") ; Comment.
and ProductIsAvailableInColor(Product, Color) ; Comment.
The ternary operator is also a good candidate:
ProductIsAvailable := (Color = "Red")
? false ; We don't have any red products, so don't bother calling the function.
: ProductIsAvailableInColor(Product, Color)
Although the indentation used in the examples above is optional, it might improve
clarity by indicating which lines belong to ones above them. Also, it
is not necessary to include extra spaces for lines starting with the
words "AND" and "OR"; the program does this automatically. Finally,
blank lines or comments may be added between or at the end of any of
the lines in the above examples.
Method #2: This method should be used to merge a large number of lines
or when the lines are not suitable for Method #1. Although this method
is especially useful for auto-replace hotstrings, it can also be used
with any command or expression. For example:
; EXAMPLE #1:
Var =
(
Line 1 of the text.
Line 2 of the text. By default, a line feed (`n) is present between lines.
)
; EXAMPLE #2:
FileAppend, ; The comma is required in this case.
(
A line of text.
By default, the hard carriage return (Enter) between the previous line and this one will be written to the file as a linefeed (`n).
By default, the tab to the left of this line will also be written to the file (the same is true for spaces).
By default, variable references such as %Var% are resolved to the variable's contents.
), C:\My File.txt
In the examples above, a series of lines is bounded at
the top and bottom by a pair of parentheses. This is known as a
continuation section. Notice that the bottom line contains
FileAppend's last parameter after the closing parenthesis. This
practice is optional; it is done in cases like this so that the comma
will be seen as a parameter-delimiter rather than a literal comma.
Please read the documentation link for more details.
So your example can be rewritten as the following:
If Day(Date) > 10
And Hour(Time) > 20 Then
MsgBox
(
It is after the tenth
and it is evening
)

I'm not aware of a general way of doing this, but it seems you can break a line and start the remainder of the broken line (e.g. the next real line) with an operator. As long as the second line (and the third, fourth, etc., as applicable) starts with (optional whitespace plus) an operator, AHK will treat the whole thing as one line.
For instance:
hello := "Hello, "
. "world!"
MsgBox %hello%
The presence of the concatenation operator . at the logical beginning of the second line here makes AHK treat both lines as one.
(I also tried leaving the operator at the end of the first line and starting the second off with a double-quoted string; that didn't work.)

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?

This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op, so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
@list_accumulator = ();
for $n (74, …, -34) {
    $. += $n;
    push @list_accumulator, chr($.);
}
print(join("", @list_accumulator));
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
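The running-total decoding can be checked outside Perl as well. The following Python sketch uses itertools.accumulate in place of the $. accumulator; the variable names are chosen for the example only.

```python
from itertools import accumulate

# The relative offsets from the one-liner, copied verbatim.
deltas = [74, 43, -2, 1, -84, 65, 13, 1, 5, -12, -3, 13, -82, 44, 21,
          18, 1, -70, 56, 7, -77, 72, -7, 2, 8, -6, 13, -70, -34]

# Running total: each entry is the character code of the next character.
codes = list(accumulate(deltas))

# Convert each code to its character and join them, as pack 'c*' does.
message = "".join(chr(c) for c in codes)
print(message, end="")
```

The first three totals are 74, 117 and 115 (J, u, s), and the final offset of -34 brings the total down from 44 (the comma) to 10, a newline.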
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!#@%[').',"})')
Don't use any of this in production code.

The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template in pack treats each value in the list as a character code and emits the corresponding character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);

How does this Perl one liner to check if a directory is empty work?

I got this strange line of code today, it tells me 'empty' or 'not empty' depending on whether the CWD has any items (other than . and ..) in it.
I want to know how it works because it makes no sense to me.
perl -le 'print+(q=not =)[2==(()=<.* *>)].empty'
The bit I am interested in is <.* *>. I don't understand how it gets the names of all the files in the directory.

It's a golfed one-liner. The -e flag means to execute the rest of the command line as the program. The -l enables automatic line-end processing.
The <.* *> portion is a glob containing two patterns to expand: .* and *.
This portion
(q=not =)
is a list containing a single value -- the string "not " (with a trailing space). The q=...= is an alternate string delimiter, apparently used because the single-quote is being used to quote the one-liner.
The [...] portion is the subscript into that list. The value of the subscript will be either 0 (the value "not ") or 1 (nothing, which prints as the empty string) depending on the result of this comparison:
2 == (()=<.* *>)
There's a lot happening here. The comparison tests whether or not the glob returned a list of exactly two items (assumed to be . and ..) but how it does that is tricky. The inner parentheses denote an empty list. Assigning to this list puts the glob in list context so that it returns all the files in the directory. (In scalar context it would behave like an iterator and return only one at a time.) The assignment itself is evaluated in scalar context (being on the right hand side of the comparison) and therefore returns the number of elements assigned.
The leading + is to prevent Perl from parsing the list as arguments to print. The trailing .empty concatenates the string "empty" to whatever came out of the list (i.e. either "not " or the empty string).
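The same logic can be spelled out step by step in Python. This is a sketch of the reasoning above rather than an equivalent program: the function name describe is made up, and because os.listdir never returns "." and "..", they are added back by hand so that the comparison with 2 mirrors the Perl version.

```python
import os
import tempfile


def describe(path):
    # Perl's <.* *> also matches "." and ".."; os.listdir omits them,
    # so add them back to reproduce the count the one-liner compares with 2.
    entries = [".", ".."] + os.listdir(path)
    prefixes = ("not ",)                      # the one-element list (q=not =)
    index = 1 if len(entries) == 2 else 0     # result of 2 == (()=<.* *>)
    # Index 1 is past the end of the list; Perl yields undef there, which
    # concatenates as an empty string. Python needs an explicit bounds check.
    prefix = prefixes[index] if index < len(prefixes) else ""
    return prefix + "empty"


with tempfile.TemporaryDirectory() as d:
    before = describe(d)                      # nothing in the directory yet
    open(os.path.join(d, "file.txt"), "w").close()
    after = describe(d)                       # one file present

print(before, "/", after)
```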

<.* *>
is a glob consisting of two patterns: .* are all file names that start with . and * corresponds to all files (this is different than the usual DOS/Windows conventions).
(()=<.* *>)
evaluates the glob in list context, returning all the file names that match.
Then, the comparison with 2 puts it into scalar context so 2 is compared to the number of files returned. If that number is 2, then the only directory entries are . and .., period. ;-)

<.* *> means (glob(".*"), glob("*")). glob expands file patterns the same way the shell does.

I find that the B::Deparse module helps quite a bit in deciphering some stuff that throws off most programmers' eyes, such as the q=...= construct:
$ perl -MO=Deparse,-p,-q,-sC 2>/dev/null << EOF
> print+(q=not =)[2==(()=<.* *>)].empty
> EOF
use File::Glob ();
print((('not ')[(2 == (() = glob('.* *')))] . 'empty'));
Of course, this doesn't instantly produce "readable" code, but it surely converts some of the stumbling blocks.

The documentation for that feature is here. (Scroll near the end of the section)
