Preg replace symbols - preg-replace

Until now I was using str_replace to replace my letters with English letters, symbols with "-", etc., but I have now discovered that there can be a lot of symbols and I don't know all of them.
I don't know what my pattern should look like, but I need a function that replaces every character I have not explicitly allowed with "-".
It should allow only English letters and numbers, and afterwards it should collapse two consecutive "-" into one and drop any "-" left at the ends. For example, the first line below should become the second:
link-has--to-be-modern--as-i-said--
link-has-to-be-modern-as-i-said

Like this:
$txt = trim(preg_replace('~[^a-zA-Z0-9]+~', '-', $txt), '-');
Read more about character classes.
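For comparison, here is a minimal sketch of the same idea in Python, assuming the same rule (keep only ASCII letters and digits). The function name slugify is made up for this illustration and is not part of the answer above.

```python
import re


def slugify(txt: str) -> str:
    """Replace each run of characters other than a-z, A-Z, 0-9 with one '-'."""
    # The + quantifier collapses consecutive disallowed characters
    # (including existing dashes) into a single dash.
    collapsed = re.sub(r"[^a-zA-Z0-9]+", "-", txt)
    # Drop dashes left over at either end, like trim(..., '-') in the PHP line.
    return collapsed.strip("-")


print(slugify("link-has--to-be-modern--as-i-said--"))
# link-has-to-be-modern-as-i-said
```

Because the character class is negated, there is no need to list the unwanted symbols: anything not named inside the brackets is replaced.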

Related

YAML, Docker Compose, Spaces & Quotes

Under what circumstances must one use quotes in a YAML file, specifically when using docker-compose.
For instance,
service:
  image: "my-registry/repo:tag1"
  environment:
    ENV1: abc
    ENV2: "abc"
    ENV3: "a b c"
If spaces are required, for example, must one use quotes around the environment variable, as depicted in ENV3?
After some googling I've found a blog post that touches on this problem as I understood it.
I'll cite the most important part here:
plain scalars:
- a string
- a string with a \ backslash that doesn't need to be escaped
- can also use " quotes ' and $ a % lot /&?+ of other {} [] stuff
single quoted:
- '& starts with a special character, needs quotes'
- 'this \ backslash also does not need to be escaped'
- 'just like the " double quote'
- 'to express one single quote, use '' two of them'
double quoted:
- "here we can use predefined escape sequences like \t \n \b"
- "or generic escape sequences \x0b \u0041 \U00000041"
- "the double quote \" needs to be escaped"
- "just like the \\ backslash"
- "the single quote ' and other characters must not be escaped"
literal block scalar: |
  a multiline text
  line 2
  line 3
folded block scalar: >
  a long line split into
  several short
  lines for readability
Also I have not seen such docker-compose syntax to set env variables. Documentation suggests using simple values like
environment:
- ENV1=abc
- "ENV2=abc"
Here the quotes (" or ') are optional in this particular example, according to what I said earlier.
To see how to include spaces in env variables, you can check out this SO answer.
Whether or not you need quotes depends on the parser. Docker-compose, AFAIK, still relies on the PyYAML module, which implements most of YAML 1.1 and has a few quirks of its own.
In general you only need to quote what could otherwise be misinterpreted or clash with some YAML construct that is not a scalar string. You also need (double) quotes for things that cannot be represented in plain scalars, single quoted scalars or block style literal or folded scalars.
Misinterpretation
You need to quote strings that look like some of the other data structures:
booleans: "True", "False", but PyYAML also assumes alternative words like "Yes", "No", "On", "Off" represent boolean values (and the all-lowercase and all-uppercase versions should be considered as well). Please note that the YAML 1.2 standard removed references to these alternatives.
integers: this includes strings consisting of digits only, but also hex (0x123) and octal (0123) numbers. Octals in YAML 1.2 are written as 0o123, which PyYAML doesn't support; however, it is best to quote both.
A special kind of integer that PyYAML still supports, but which is again not in the YAML 1.2 specification, is the sexagesimal: a base-60 number separated by colons (:). Time indications, but also MAC addresses, can be interpreted as such if the values between/after the colons are in the range 00-59.
floats: strings like 1E3 (with optional sign and mantissa) should be quoted. Of course 3.14 needs to be quoted as well if it is a string. Sexagesimal floats (with a mantissa after the number following the final colon) should be quoted as well.
timestamps: 2001-12-15T02:59:43.1Z, but also other ISO-8601-like strings, should be quoted to prevent them from being interpreted as timestamps.
The null value is written as the empty string, as ~, or as Null (in all casing types), so any strings matching those need to be quoted.
Quoting in the above can be done with either single or double quotes, or block style literal or folded scalars can be used. Please note that for the block style you should use |- or >-, respectively, in order not to introduce a trailing newline that is not in the original string.
Clashes
YAML assigns special meaning to certain characters or character combinations. Some of these only have special meaning at the beginning of a string, others only within a string.
characters from the set !&*?{[ normally indicate special YAML constructs. Some of these might be disambiguated depending on the following character, but I would not rely on that.
whitespace followed by # indicates an end of line comment
wherever a key is possible (and within block mode that is in many places) the combination of colon + space (: ) indicates a value will be following. If that combination is part of your scalar string, you have to quote.
As with the misinterpretation, you can use single or double quoting or block-style literal or folded scalars. There can be no end-of-line comments beyond the first line of a block-style scalar.
PyYAML can additionally get confused by any colon + space within a plain scalar (even when this is in a value) so always quote those.
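The rules above can be approximated with a small checker. This is only a rough sketch in Python using the standard library: the regular expressions are my simplified reading of the cases listed above, not PyYAML's actual resolver, and the name should_quote is made up for this illustration. It deliberately errs on the side of quoting (for example, it treats any casing of "yes" as a boolean).

```python
import re

# Simplified patterns for the cases described above (assumptions, not
# PyYAML's real implicit resolvers).
_BOOL_OR_NULL = {"true", "false", "yes", "no", "on", "off", "null", "~", ""}
_INT = re.compile(r"^[-+]?(0x[0-9a-fA-F_]+|0o[0-7]+|0[0-7_]*|[1-9][0-9_]*)$")
_SEXAGESIMAL = re.compile(r"^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+(\.[0-9_]*)?$")
_FLOAT = re.compile(r"^[-+]?(\d[\d_]*)?\.?\d[\d_]*([eE][-+]?\d+)?$")
_TIMESTAMP = re.compile(r"^\d{4}-\d{1,2}-\d{1,2}([Tt ]|$)")
_SPECIAL_START = set("!&*?{[")


def should_quote(s: str) -> bool:
    """Return True if s looks like something other than a plain string."""
    # Misinterpretation: booleans, null, numbers, timestamps.
    if s.lower() in _BOOL_OR_NULL:
        return True
    if any(p.match(s) for p in (_INT, _SEXAGESIMAL, _FLOAT, _TIMESTAMP)):
        return True
    # Clashes: indicator characters at the start, comments, key separators.
    if s[0] in _SPECIAL_START:
        return True
    return ": " in s or " #" in s


for value in ["abc", "a b c", "Yes", "0123", "12:30:45", "key: value"]:
    print(repr(value), should_quote(value))
```

A heuristic like this is useful for a quick lint pass, but the parser you actually use remains the final authority.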
Representing special characters
You can insert special characters or Unicode code points in a YAML file, but if you want these to be clearly visible in all cases, you might want to use escape sequences. In that case you have to use double quotes; this is the only mode that allows backslash escapes, such as \u2029. A full list of such escapes can be taken from the standard, but note that PyYAML doesn't implement e.g. \/ (or at least did not when I forked that library).
One trick to find out what to quote or not is to use the library used to dump the strings that you have. My ruamel.yaml and PyYAML used by docker-compose, when potentially dumping a plain scalar, both try to read back (yes, by parsing the result) the plain scalar representation of a string and if that results in something different than a string, it is clear quotes need to be applied. You can do so too: when in doubt write a small program dumping the list of strings that you have using PyYAML's safe_dump() and apply quotes anywhere that PyYAML does.
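The dump trick from the last paragraph could look roughly like this. It is a sketch that assumes the third-party PyYAML package is installed; the helper name pyyaml_would_quote is hypothetical. It dumps one string at a time and checks whether PyYAML chose a quoted style.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)


def pyyaml_would_quote(value: str) -> bool:
    """Dump a single string and report whether PyYAML put quotes around it."""
    # allow_unicode=True keeps non-ASCII text from being escaped, which would
    # otherwise force double quotes for reasons unrelated to ambiguity.
    dumped = yaml.safe_dump(value, default_flow_style=False, allow_unicode=True)
    return dumped.startswith(("'", '"'))


for s in ["abc", "a b c", "yes", "123", "key: value"]:
    print(repr(s), pyyaml_would_quote(s))
```

Any string for which this returns True is one you should quote by hand in your compose file.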

Perl - can't remove trailing characters at the end of string

I have some trailing characters at the end of a string: peregrinevwap^_^_
print "JH 4 - app: $application \n";
app: peregrinevwap^_^_
Do you know why they are there and how I can remove them? I tried chomp, but that didn't work.
Check out the tr//cd operator to get rid of unwanted characters.
It's documented in "perldoc perlop"
$application =~ tr/a-zA-Z//cd;
Will remove everything except letters from the string and
$application =~ tr/^_//d;
Will remove all "^" and "_" characters.
If you only want to remove certain characters when they are at the end of the string, use the s// search/replace operator with a regular expression and the $ anchor to match the end of the string.
Here's an example:
s/[\^_]*$//;
Let's hope legitimate underscores do not occur at the end of your strings; otherwise you can't automatically separate them from these unwanted characters.
Are you sure these characters are actually ^ and _ characters?
^_ could also indicate Ctrl-Underscore, ASCII character 0x1F (Unit Separator). (Not a character I've ever seen used, but you never know.)
If this is in fact the case, you can remove them with something like:
$application =~ s/\x1F//g;
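To tell the two cases apart, it helps to look at the code points first. Here is a small sketch in Python (the function names are made up for this illustration): one helper lists the code points, one removes the 0x1F control character, and one strips literal ^ and _ characters from the end only.

```python
import re


def show_code_points(s: str) -> list:
    """List each character's code point, so control characters become visible."""
    return [f"U+{ord(ch):04X}" for ch in s]


def remove_unit_separators(s: str) -> str:
    """Remove every ASCII 0x1F (Unit Separator), wherever it occurs."""
    return s.replace("\x1f", "")


def strip_trailing_caret_underscore(s: str) -> str:
    """Remove literal ^ and _ characters, but only at the very end."""
    return re.sub(r"[\^_]+\Z", "", s)


print(show_code_points("ap\x1f"))
print(remove_unit_separators("peregrinevwap\x1f\x1f"))
print(strip_trailing_caret_underscore("peregrinevwap^_^_"))
```

If the first helper shows U+001F rather than U+005E and U+005F, the terminal was only displaying a control character as ^_.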

regex matching white spaces and non-characters in perl

I'm looking for pattern matching for the following.
White space at the start, followed by characters, then a decimal number like 3.2, and then a symbol like $ or #.
For ex: " bash-3.2#"
My code:
while (@wait = $t->waitfor('/^[\s]bash\-3\.2[.] $/i'))
How do I do this?
Thanks,
Sharath
White space at start
^\s
followed by characters
\w+
and then a decimal number like 3.2
-?\d+\.\d+
and then followed by symbols like $ and #.
[\$\#]
So, something like this:
/^\s\w+-?\d+\.\d+[\$\#]/
I assumed that the characters are typical word characters and that the number could be negative (which is why the - is optional).
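A quick way to check the assembled pattern is to run it against a few sample lines. The sketch below does this in Python, whose regex syntax is the same for this particular pattern; looks_like_prompt is a name invented for the example.

```python
import re

# Same pieces as above: one whitespace character, word characters,
# an optional '-', a decimal number, then '$' or '#'.
PROMPT = re.compile(r"^\s\w+-?\d+\.\d+[$#]")


def looks_like_prompt(line: str) -> bool:
    """Return True if the line starts with something like ' bash-3.2#'."""
    return PROMPT.match(line) is not None


for sample in [" bash-3.2#", " bash-3.2$", "bash-3.2#", " bash#"]:
    print(repr(sample), looks_like_prompt(sample))
```

Only the first two samples match: the third lacks the leading white space and the fourth has no decimal number.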

Check characters inside string for their Unicode value

I would like to replace characters with certain Unicode values in a variable with a dash. I have two ideas which might work, but I do not know how to check the value of a character:
1/ process the variable as a string, checking every character's value and placing the characters in a new variable (replacing those which are invalid)
2/ use this magic :-)
$variable =~ s/[$char_range]/-/g;
char_range should be similar to [0-9] or [A-Z], but it should be values for utf-8 characters. I need range from 0x00 to 0x7F to be exact.
The following expression should replace anything that is not ASCII with a hyphen, which is (I think) what you want to do:
s/[\N{U+0080}-\N{U+FFFF}]/-/g
There's no such thing as UTF-8 characters. There are only characters that you encode into UTF-8. Even then, you don't want to make ranges outside of the magical ones that Perl knows about. You're likely to get more than you expect.
To get the ordinal value for a character, use ord:
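use utf8;            # the source file itself contains a non-ASCII character
use feature 'say';   # enables say
my $code_number = ord '😸'; # U+1F638
say sprintf "%#x", $code_number; # prints 0x1f638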
However, I don't think that's what you need. It sounds like you want to replace characters in the ASCII range with a -. You can specify ranges of code numbers:
s/[\000-\177]/-/g; # in octal
s/[\x00-\x7f]/-/g; # in hexadecimal
You can specify wide character ordinal values in braces:
s/[\x80-\x{10ffff}]/-/g; # wide characters, replace non-ASCII in this case
When the characters have a common property, you can use that:
s/\p{ASCII}/-/g;
However, if you are replacing things character for character, you might want a transliteration:
$string =~ tr/\000-\177/-/;
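For readers who want to experiment outside Perl, here is a minimal Python sketch of the "replace everything that is not ASCII" reading of the question. The function name replace_non_ascii is made up for this example; the range 0x00 to 0x7F comes from the question.

```python
import re

# Negated class: any character whose code point is outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def replace_non_ascii(s: str) -> str:
    """Replace every non-ASCII character with a hyphen, one for one."""
    return NON_ASCII.sub("-", s)


# ord gives the code point of a single character, as in the Perl example.
print(hex(ord("\U0001F638")))  # 0x1f638
print(replace_non_ascii("na\u00efve caf\u00e9"))  # na-ve caf-
```

Negating the ASCII range avoids having to spell out an upper bound for the non-ASCII characters.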

I want to find and replace an ordered list in Word, changing the . to a )

I have tried [0-9] with the "use wildcards" box checked, but it replaces the individual numbers with the literal string [0-9]. How do I replace with the number it found plus a character?
Backreferences. Your unspecified environment may or may not support them, but if it does, you would:
replace \([0-9]*\)
with \1 <then, whatever the character you want is>
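To show what a backreference does, here is a short sketch in Python rather than in Word's own search syntax, which may differ. The function name dots_to_parens is invented for this example. The parentheses capture the number, and \1 in the replacement puts it back.

```python
import re

# Capture the digits at the start of a line, followed by a literal dot.
NUMBERED = re.compile(r"^(\d+)\.", flags=re.MULTILINE)


def dots_to_parens(text: str) -> str:
    """Turn '1.' at the start of each line into '1)'."""
    # \1 re-inserts whatever the first group matched.
    return NUMBERED.sub(r"\1)", text)


print(dots_to_parens("1. First\n2. Second\n10. Tenth"))
```

Whatever tool you use, the principle is the same: group the part you want to keep in the search pattern and refer to that group in the replacement.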