I have the following file:
2020-04-17 10:35:08.339 msw_im.c wync_ua[0]DEBUG: .mark1: lorem ipsum
dolor sit amet, consetetur sadipscing elitr, sed diam
nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren
2020-04-17 10:35:08.340
I want to have every char between "mark1:" and "2020-04-17 10:35:08.340" replaced like this
2020-04-17 10:35:08.339 msw_im.c wync_ua[0]DEBUG: .mark1: xxxxxxxxxxx
xxxxxxxxxxxxxxxxxxx
xxxxx
xxxxxxxx
xxxxxxx
2020-04-17 10:35:08.340
How can I do this? I have tried:
$: sed -i '/mark1/,/^$/{s/./x/g}' file
which works, but also replaces the beginning of the 1st line with "x". I tried multiple other things w/o success. Any idea?
Anonymizing lines with GNU sed:
sed -E -i '/mark1:/,/^$/{ /mark1:/{ :a;s/(mark1:x*)[^x]/\1x/;ta;b }; s/./x/g }' file
Output to file:
2020-04-17 10:35:08.339 msw_im.c wync_ua[0]DEBUG: .mark1:xxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2020-04-17 10:35:08.340
See: man sed and The Stack Overflow Regular Expressions FAQ
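For comparison, the same masking logic can be sketched outside sed. This is only an illustration in Python, not part of the sed answer: the function name mask_after_marker is hypothetical, and it assumes that any line starting with a date stamp ends the region, rather than an empty line as in the sed range above.

```python
import re

# Assumption: a line starting with a date such as "2020-04-17 " opens a new
# log record and therefore ends the region to be masked.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def mask_after_marker(lines, marker="mark1:"):
    """Replace every character after `marker` with "x" until the next timestamp line.

    `lines` are expected without trailing newlines (e.g. from str.splitlines()).
    """
    masked = []
    masking = False
    for line in lines:
        if masking and TIMESTAMP.match(line):
            masking = False  # next log record reached: stop masking
        if marker in line and not masking:
            # Keep everything up to and including the marker, mask the rest.
            head, _, tail = line.partition(marker)
            masked.append(head + marker + "x" * len(tail))
            masking = True
        elif masking:
            masked.append("x" * len(line))  # continuation line: mask all of it
        else:
            masked.append(line)
    return masked
```

Like s/./x/g, this also replaces spaces, so the masked lines keep their original lengths.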
Can I declare globals in a subroutine in perl with use strict?
Consider the following code:
#!/usr/bin/perl
# $Id: foo,v 1.5 2019/02/21 10:41:08 bennett Exp bennett $
use strict;
use warnings;
initialize();
print "$lorem\n";
exit 0;
sub initialize {
# How would one declare "$lorem" here such that it is available as
# a global?
$lorem = <<_EOM_
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
_EOM_
;
}
Note that I'm not asking if (ab)using globals in this fashion is a good idea; I'm certain that it isn't.
I've tried a few combinations of our and $main::, but they all fail in just the manner you might expect them to.
At this point, I'm just curious. Can it be done?
I wonder if some sort of shenanigans with the BEGIN block would work.
The following will work, but as @simbabque points out, it's ugly:
#!/usr/bin/perl
# $Id: foo,v 1.7 2019/02/21 19:48:26 bennett Exp bennett $
use strict;
use warnings;
initialize();
printf("$main::lorem\n");
exit 0;
sub initialize {
# How would one declare "$lorem" here such that it is available as
# a global-ish?
our $lorem = <<_EOM_
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
_EOM_
;
}
For your specific example, you don't need a global variable.
Perl has package variables. Those are created with our and can also be accessed with $namespace:: (where main is the default namespace, and $:: works for that as well). Those are global, but we rarely call them that.
You need to keep in mind that our makes a lexical alias, so if you declare it inside the sub, it will not be available outside, because there is no lexical alias in that larger scope.
use strict;
sub foo {
our $bar = 123;
}
foo();
print $bar; # error
You need to declare the variable in a larger scope.
use strict;
our $bar;
sub foo {
$bar = 123;
}
foo();
print $bar;
This will work, because $bar is now available in the file's scope.
All of this only applies when use strict is turned on. Without strict, a variable you don't declare automatically becomes a package variable. With strict turned on, you have to declare all variables, so you need to be explicit.
You can also use my if you declare it outside of the sub.
use strict;
my $bar;
sub foo {
$bar = 123;
}
foo();
print $bar;
Since you're doing this in a script and there is no explicit package declaration, I think it's safe to assume there are no other modules involved. In that case, it does not matter if you use my or our.
If you were using this in a package with different files involved it would make a difference. Variables declared on the file scope with my are kind of private, as there is no way to access them from the outside directly.
package Foo;
use strict;
my $bar = 123;
### other file
use Foo;
# no way to get $bar as there is no $Foo::bar
But if you use our (or the outdated use vars) it will become a package variable.
package Foo;
use strict;
our $bar = 123;
### other file
use Foo;
print $Foo::bar;
Can I declare globals in a subroutine in Perl?
Yes, you can declare package variables in a subroutine with our. But you can't access them as lexical variables outside the scope they've been declared in, so you need to access them with their fully qualified package name, and that is ugly.
I believe the answer is "no". Not with use strict. Among the features of use strict is checking whether or not you're using a variable before it's declared.
The notion of wanting to do just that, as I did above, is at odds with the compile time checks of use strict.
Thanks to @simbabque for helping me think through this more clearly.
Rather than remove the use strict, I'm going to move the ugliness of the huge $lorem type variables (and there's a lot of them) to a separate package.
I'm trying to indent a long string in a write-host.
While it is trivial to .PadLeft() or prepend spaces when writing a short string, the same isn't true for a long one: as soon as the cursor reaches the last column of the row, output continues from column 0 of the next row, e.g.:
"{0}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." -f $(" " * 5)
will show up as
     Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliq
ua.
but what if I want to obtain:
     Lorem ipsum dolor sit amet, consectetur adipiscing elit,
     sed do eiusmod tempor incididunt ut labore et dolore mag
     na aliqua.
?
Is there a way to accomplish this?
many thanks guys!
You'll have to split the string up yourself based on the width of the console, and then pad the string and display:
$consoleWidth = $Host.UI.RawUI.BufferSize.Width
$desiredIndent = 5 # spaces
$chunkSize = $consoleWidth - $desiredIndent
$bigString = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. '*10
[RegEx]::Matches($bigString, ".{$chunkSize}|.+").Groups.Value | ForEach-Object {
' '*$desiredIndent + $_
}
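The chunk-and-pad idea above is not specific to PowerShell. As a rough sketch in Python (the function name indent_chunks is hypothetical, and taking the width from shutil.get_terminal_size() is an assumption standing in for $Host.UI.RawUI.BufferSize.Width):

```python
import shutil


def indent_chunks(text, indent=5, width=None):
    """Split `text` into fixed-size chunks so each indented line fits in `width` columns."""
    if width is None:
        # Assumption: the terminal width plays the role of the console buffer width.
        width = shutil.get_terminal_size().columns
    chunk_size = width - indent
    if chunk_size <= 0:
        raise ValueError("indent must be smaller than the line width")
    pad = " " * indent
    # Step through the string chunk_size characters at a time; the last
    # chunk may be shorter, like the ".+" alternative in the regex above.
    return [pad + text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    for line in indent_chunks("Lorem ipsum dolor sit amet, " * 10):
        print(line)
```

Like the regex version, this cuts in the middle of words. If breaking at word boundaries is preferred, Python's textwrap.fill() with initial_indent and subsequent_indent does that instead.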
This problem does not have to be solved this way, but it would save me a lot of precious time if it can be:
I have been logging "line at a time" to a log file. Like this:
$mylogentry = "A string of text containing information"
$mylogentry | out-file -append $log_file_path
$array_of_log_entries += $mylogentry
The reason for the array is that I join them into a Send-Message body.
When I have long quotes however, I like to break them across lines. I can do this using the backtick '`' character.
What I have never worked out is how to escape tabs if the quote line is nested under something else. Using backtick before each indentation doesn't remove the tab.
If you look at sites like this: http://technet.microsoft.com/en-us/magazine/hh475841.aspx you'll see that even while he is encouraging indentation, his code is not indented for the parameters. (You can actually tab those in because whitespace is ignored outside of quotes. Maybe he was just making a point)
This is an attempted example of what I mean. (Note: I'm having trouble replicating the formatting on SE. A 4-space indent doesn't seem to create a code block anymore.)
if($somecondition){
$mylogentry = "A string of really long text for non-technical people `
` ` continue to write text here. Additional info (technical): $techvar"
That string would have a big gap between 'people' and 'continue' due to my indenting. I can of course have 'continue' begin at column 0 but then my code looks even more stupid.
I'd like to share some options in addition to other answers.
Joining array of substrings (explicit)
Unary -join operator
$singleLineText = -join @(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod "
"tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim "
"veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea "
"commodo consequat."
)
Binary -join operator
$singleLineText = @(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
"tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim"
"veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea"
"commodo consequat."
) -join ' '
Pros:
No plus signs (+) or commas (,) needed.
Easy switch to binary -join "`r`n" operator for multiline string.
Free to use desired indentations.
Cons:
Text manipulations can be tiresome.
Joining array of substrings (implicit) | avoid
Appending to an empty string.
$singleLineText = '' + @(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
"tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,"
"quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo"
"consequat."
)
Piping to script block and using $input - an automatic variable.
$singleLineText = @(
"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod"
"tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,"
"quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo"
"consequat."
) | & { "$input" }
Whenever an array is coerced into a string, the [System.String]::Join(separator, array) method is applied implicitly. The separator is " " (space) by default and can be overridden by setting $OFS, the Output Field Separator special variable.
Pros:
Suited for joining pipe output.
Cons:
Lack of clarity (for several reasons), thus should be avoided whenever possible.
Here-string
$singleLineText = @"
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
"@ -replace "`r" -replace "`n", ' '
Pros:
Good for drop-in big sheets of arbitrary preformatted text (like source code).
Preserves source indentations.
No need to escape quotes.
Cons:
Not friendly to script formatting.
Can be hard to keep track of trailing white-spaces.
The addition assignment operator (+=)
$s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do ";
$s += "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad ";
$s += "minim veniam, quis nostrud exercitation ullamco laboris nisi ut ";
$s += "aliquip ex ea commodo consequat.";
Pros:
Most obvious, universal and well-known syntax (outside of PowerShell).
Mostly programming language agnostic *.
* Variable naming conventions may be violated in some programming languages.
Cons:
Text manipulations can be tiresome.
You can't escape away the tabs or spaces you inserted into the string; they're inside the quotes and part of the string now (and so are the line breaks for that matter).
You have a few options:
Don't break the string at all; let it ride. Obviously not what you want to do, but it works.
Use @Matt's suggestion and concatenate the strings. As long as the quotes begin after the indentation, you'll be fine.
Don't indent the next lines. This assumes you do want the line breaks as part of the string. It can look messy, but it will be a) readable without scrolling and b) functional. It looks something like this:
if ($fakeCondition) {
    if ($fauxCondition) {
        $longString = "Hello this string is too long
to be just one big line so we'll break it up into
some smaller lines for the sake of readability."
        # Code that uses the string
        Write-Host $longString
    }
}
Other stuff: use an array with one element for each line and then join it, or use .NET's StringBuilder, but those are overkill to solve what is essentially a formatting annoyance for you.
Here-strings (using @" to begin a string and "@ to end it) will have the same problem as option 3 (the lines cannot be indented or the tabs/spaces will be embedded in the string).
My View
When I run into this issue of long strings polluting the code, I usually start to rethink embedding the strings, or where I'm embedding them.
I might break this functionality into a function, then accept the string as a parameter (pushing the problem off to a different part of the code, but it can be helpful).
Sometimes I'll put it into a here document or long string at the top of the script and then use the variable later on.
Sometimes it means saving these strings in a separate file and reading the contents at run time.
All depends on the situation.
Would you consider using string concatenation instead?
$test = "I would like to think that his is an example of a long string. Grammar aside I mean long as" +
" in super long. So long even that I have to use lots of letters to get my message" +
" across"
Which would output the following:
I would like to think that his is an example of a long string. Grammar aside I mean long as in super long. So long even that I have to use lots of letters to get my message across
Since you are joining strings with the + it would effectively ignore the white space between the quoted strings.
You could store the string as a newline delimited string and use a little regex to clean it up after the fact. It has limited use cases but if you are doing this a lot and braintists answer is not helping ....
$test = "I would like to think that his is an example of a long string. Grammar aside I mean long as
in super long. So long even that I have to use lots of letters to get my message
across" -replace "(?m)^\s+" -replace "`r`n"," "
So it is typed as a single string. The first -replace removes all leading whitespace. The second changes every newline into a space. Together they turn it into a single line again.
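The two-step cleanup is plain regex work, so it carries over to other languages. A minimal sketch in Python, assuming a hypothetical helper named collapse_indented and accepting both "\r\n" and "\n" line endings:

```python
import re


def collapse_indented(text):
    """Turn an indented multi-line string into one line of text."""
    # Step 1: strip leading whitespace from every line ((?m) makes ^ match per line).
    no_indent = re.sub(r"(?m)^\s+", "", text)
    # Step 2: replace each line break with a single space.
    return re.sub(r"\r?\n", " ", no_indent)


message = collapse_indented(
    """I mean long as
        in super long. So long even that I have to use lots of letters
        to get my message across"""
)
print(message)
```

As with the PowerShell version, whitespace at the end of a line is not trimmed, so a trailing space before a line break would end up as a double space.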
I am reading a text file into a SCALAR variable by using
open FILE, "<", $filename or print("Error opening $filename");
read FILE, my $buffer, -s $filename;
close FILE;
Later on I am counting the number of lines in that SCALAR variable. How can I go to a specific line within that SCALAR variable without iterating through it?
The answer to the question you originally posed is that unless you're dealing with a fixed-width file, you can't skip to a certain line without iterating somehow.
However, based on your comments, it seems that there's no need to read your entire file into a scalar in the first place. You can get both the second-to-last line and a total line count by simply iterating through the file like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my ($count, @previous);
while (<DATA>) {
chomp;
$count++;
push @previous, $_;
shift @previous if @previous > 2;
}
say "Count: $count";
say "Second-to-last line: $previous[0]";
__DATA__
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
Output:
Count: 6
Second-to-last line: fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
For simplicity, I used the special __DATA__ block instead of an external file. To read a file as above, do:
open my $fh, '<', '/path/to/file' or die "Failed to open file: $!";
while (<$fh>) {
...
}
close $fh;
Why is this better than reading the whole file into a scalar? If your file is large, reading the whole thing into memory all at once can be costly. Unless you have a good reason not to, it's generally better to read files one line at a time and process them as you go.
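The "keep only the last two lines while counting" pattern can be sketched in other languages too. Here is a rough Python equivalent; the function name is hypothetical, and collections.deque with maxlen=2 stands in for the push/shift pair above:

```python
from collections import deque


def count_and_second_to_last(lines):
    """Return (line count, second-to-last line) in a single pass over `lines`."""
    count = 0
    last_two = deque(maxlen=2)  # older entries fall off automatically
    for line in lines:
        count += 1
        last_two.append(line.rstrip("\n"))
    # Assumption: with fewer than two lines there is no second-to-last line,
    # so None is returned instead.
    second_to_last = last_two[0] if len(last_two) == 2 else None
    return count, second_to_last
```

Because a file object yields its lines one at a time, passing an open file to this function reads it line by line rather than loading it into memory at once.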
For simple output formatting I tend to use printf where I used write/format in the days of Perl 4. However, write/format sometimes still seems the simplest solution when the number of output lines varies per data record. For example:
#!/usr/bin/perl
use strict;
use warnings;
my ($lorem, $aprille);
format =
@# ^<<<<<<<<<<<<<<<<<<<<<<<< | ^<<<<<<<<<<<<<<<<<<
$.,$aprille                 , $lorem
   ^<<<<<<<<<<<<<<<<<<<<<<<< | ^<<<<<<<<<<<<<<<<<< ~~
   $aprille                 , $lorem
                             |
.
while(<DATA>) {
($aprille, $lorem) = split(/\|/, $_, 2);
write;
}
__DATA__
WHAN that Aprille with his shoures soote |Lorem ipsum dolor sit amet,
The droghte of Marche hath perced to the roote,|consectetur adipisicing elit,
And bathed every veyne in swich licour, |sed do eiusmod tempor
Of which vertu engendred is the flour; |incididunt ut labore et dolore
Whan Zephirus eek with his swete breeth |magna aliqua. Ut enim ad minim
Inspired hath in every holt and heeth |veniam, quis nostrud
The tendre croppes, and the yonge sonne |exercitation exercitation
Hath in the Ram his halfe cours y-ronne, |ullamco laboris nisi ut ali-
And smale fowles maken melodye, |quip ex ea commodo conse-
That slepen al the night with open ye, |quat. Duis aute irure dolor
So priketh hem nature in hir corages: |in reprehenderit in volup-
Than longen folk to goon on pilgrimages, |tate velit esse cillium dol-
And palmers for to seken straunge strondes, |ore eu fugiat nulla pariatur.
To ferne halwes, couthe in sondry londes; |Lorem ipsum dolor sit amet,
And specially, from every shires ende |consectetur adipisicing elit,
Of Engelond, to Caunterbury they wende, |sed do eiusmod tempor
The holy blisful martir for to seke, |incididunt ut labore et dolore
That hem hath holpen, whan that they were seke.|magna aliqua. Ut enim ad minim
And now for something completely different. Nice plumage.|Norwegian blue.
Produces
 1 WHAN that Aprille with    | Lorem ipsum dolor
   his shoures soote         | sit amet,
                             |
 2 The droghte of Marche     | consectetur
   hath perced to the roote, | adipisicing elit,
                             |
 3 And bathed every veyne in | sed do eiusmod
   swich licour,             | tempor
...
19 And now for something     | Norwegian blue.
   completely different.     |
   Nice plumage.             |
Note that record 19 occupies three lines.
What is an equivalent concise perl5ish way to do the above without using write and format?
The main thing that perl5 added above perl4’s format and write is formline. There are a few other niceties, including $^A, numeric formats, and package scoping, but these are mainly nonessential fluff. The current set of formatting directives is only a bit greater than perl4’s:
@    start of regular field
^    start of special field
<    pad character for left justification
|    pad character for centering
>    pad character for right justification
#    pad character for a right justified numeric field
0    instead of first #: pad number with leading zeroes
.    decimal point within a numeric field
...  terminate a text field, show "..." as truncation evidence
@*   variable width field for a multi-line value
^*   variable width field for next line of a multi-line value
~    suppress line with all fields empty
~~   repeat line until all fields are exhausted
Other little-known enhancements include support for the LC_NUMERIC locale, being able to use a {}-delimited block to aid in alignment, and using a \r to force a true line-break.
I still use formats from time to time. Here is a bit from a program I wrote only a couple of weeks ago.
sub init_screen() {
    our %Opt;
    my $cols;
    if ($Opt{width}) {
        $cols = $Opt{width};
    }
    elsif (am_unixy()) {
        ($cols) = `stty size 2>&1` =~ /^\d+ (\d+)$/;
    }
    else {
        # FALLTHROUGH to ||= init on next line
    }
    $cols ||= 80; # non-unix or stty error
    $cols -= 2;
    my $format = "format STDOUT = \n"
        . ' ^' . '<' x ($cols-4) . "\n"
        . '$_' . "\n"
        . " ^" . "<" x ($cols-6) . "~~\n"
        . '$_' . "\n"
        . ".\n"
        . "1;" # for true eval return
        ;
    eval($format) || die;
}
The code for constructing the format dynamically based on the current screen width could be prettier, but it’s still useful.
The main problem with formats is the reliance on global variables. For other problems with formats, see pages 449 - 454 of Perl Best Practices.
The modern solution would be Perl6::Form. This is a backport of what they are planning for Perl 6.
What follows is a rough translation of your format code to Perl6::Form. I do not know Perl6::Form very well, so there may be ways to make it better or truer to your original example:
#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Form;
while(<DATA>) {
my ($aprille, $lorem) = split(/\|/, $_, 2);
print form(
"{>} {[[[[[[[[[[[[[[[[[[[[[[[[} | {[[[[[[[[[[[[[[[[[[}",
$., $aprille, $lorem,
" | ",
);
}
__DATA__
WHAN that Aprille with his shoures soote |Lorem ipsum dolor sit amet,
The droghte of Marche hath perced to the roote,|consectetur adipisicing elit,
And bathed every veyne in swich licour, |sed do eiusmod tempor
Of which vertu engendred is the flour; |incididunt ut labore et dolore
Whan Zephirus eek with his swete breeth |magna aliqua. Ut enim ad minim
Inspired hath in every holt and heeth |veniam, quis nostrud
The tendre croppes, and the yonge sonne |exercitation exercitation
Hath in the Ram his halfe cours y-ronne, |ullamco laboris nisi ut ali-
And smale fowles maken melodye, |quip ex ea commodo conse-
That slepen al the night with open ye, |quat. Duis aute irure dolor
So priketh hem nature in hir corages: |in reprehenderit in volup-
Than longen folk to goon on pilgrimages, |tate velit esse cillium dol-
And palmers for to seken straunge strondes, |ore eu fugiat nulla pariatur.
To ferne halwes, couthe in sondry londes; |Lorem ipsum dolor sit amet,
And specially, from every shires ende |consectetur adipisicing elit,
Of Engelond, to Caunterbury they wende, |sed do eiusmod tempor
The holy blisful martir for to seke, |incididunt ut labore et dolore
That hem hath holpen, whan that they were seke.|magna aliqua. Ut enim ad minim
And now for something completely different. Nice plumage.|Norwegian blue.
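To make the underlying technique explicit (wrap each field to its column width, then emit rows until both fields are exhausted, as ~~ does), here is a rough sketch in Python. It is neither Perl nor Perl6::Form: the function name two_column_record and the default widths 25 and 19 are assumptions chosen to mirror the format in the question, and textwrap.wrap may break lines slightly differently from Perl's ^<<< fields.

```python
import textwrap
from itertools import zip_longest


def two_column_record(number, left, right, left_width=25, right_width=19):
    """Format one record as wrapped, side-by-side columns plus a separator row."""
    left_lines = textwrap.wrap(left, left_width)
    right_lines = textwrap.wrap(right, right_width)
    rows = []
    # zip_longest keeps going until BOTH columns are exhausted.
    for i, (l, r) in enumerate(zip_longest(left_lines, right_lines, fillvalue="")):
        label = f"{number:>2}" if i == 0 else "  "  # record number on the first row only
        rows.append(f"{label} {l:<{left_width}} | {r}".rstrip())
    rows.append(" " * (left_width + 4) + "|")  # row holding only the column bar
    return rows


if __name__ == "__main__":
    record = "And now for something completely different. Nice plumage.|Norwegian blue."
    left, right = record.split("|", 1)
    print("\n".join(two_column_record(19, left, right)))
```

The record number is passed in explicitly here, where the Perl versions rely on $. for it.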