I'm looking to find email addresses that match only the pattern firstName.LastName#xxx.yyy in Scala / Spark.
My issue is that "." is used in Scala regex for "Matches any single character except newline".
I tried \\. but it doesn't match either.
Here is my code:
val emailTest = "ja.mes#downstairs.com"
if (emailTest.matches("[A-Za-z]+\\.[A-Za-z]+#[A-Za-z0-9.-]"))
  println("ok")
else
  println("nok")
Thanks for your help
Matthieu
The . is fine, but you are only looking for one character after the #. Add a + to fix this:
"[A-Za-z]+\\.[A-Za-z]+#[A-Za-z0-9.-]+"
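To see the effect of the missing quantifier in isolation, here is a minimal sketch in Python rather than Scala. It assumes that Python's re.fullmatch is a fair stand-in for String.matches, since both require the whole string to match. The helper name is_match is made up for this illustration, and "#" is kept as the separator exactly as in the question.

```python
import re

# Pattern from the answer: '+' after the final character class.
FIXED = r"[A-Za-z]+\.[A-Za-z]+#[A-Za-z0-9.-]+"
# Pattern from the question: the final class matches exactly one character.
ORIGINAL = r"[A-Za-z]+\.[A-Za-z]+#[A-Za-z0-9.-]"


def is_match(pattern: str, text: str) -> bool:
    """Whole-string match (hypothetical helper for this sketch)."""
    return re.fullmatch(pattern, text) is not None


email_test = "ja.mes#downstairs.com"
print("ok" if is_match(FIXED, email_test) else "nok")     # prints "ok"
print("ok" if is_match(ORIGINAL, email_test) else "nok")  # prints "nok"
```

The original pattern only succeeds when exactly one character follows the "#", which is why the escaped dot looked like the culprit.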
I am trying to use the Internationalization feature of the Play Framework.
It involves creating a new conf file for each language that we want to support. For example, for French we create a messages.fr file in the conf folder.
Inside it we define key-values like this:
Hello.World = 'Bonjour le monde'
Now the issue is that I have lines that contain characters like "," and "(", and if these are included in the key we get a parsing error from the MessageApi.
Example
Hello.(World) = 'Bonjour (le monde)'
Here the "(" and ")" around World throw an error while parsing.
Does anyone have an idea how we could achieve this?
Try to escape these special characters:
Hello.\(World\) = 'Bonjour (le monde)'
Other examples:
string_one = String for translation 1
string_two = one + one \= two
# String key with spaces
key\ with\ spaces = This is the value that could be looked up with the key "key with spaces".
# Backslash in value should be escaped by another backslash
path=c:\\wiki\\templates
Also, you can try to escape special characters by using Java Unicode escapes:
Hello.\u0028World\u0029 = 'Bonjour (le monde)'
Reference - How to escape the equals sign in properties files
I am trying to replace a regex match (in this case a space followed by a number) with a comma
I have a Spark dataframe that contains a string column. I want to replace a regex (space plus a number) with a comma without losing the number. I have tried both of these with no luck:
df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", ' ,').alias("replaced"))
df.select("A", f.regexp_replace(f.col("A"), "\s+[0-9]", '\s+[0-9] ,').alias("replaced"))
Any help is appreciated.
What you need is another function, regexp_extract.
So, you have to divide the regex and get the part you need. It could be something like this:
df.select("A", f.regexp_extract(f.col("A"), "(\s+)([0-9])", 2).alias("replaced"))
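If the goal is to keep the number while swapping the whitespace for a comma, a capture group with a backreference is another option. The sketch below shows the idea with Python's re module rather than Spark, so it runs without a cluster. The function name space_digit_to_comma is made up for this illustration. In PySpark's regexp_replace the replacement string follows Java regex conventions, so the group would be referenced as $1 instead of \1.

```python
import re


def space_digit_to_comma(text: str) -> str:
    """Replace whitespace before a digit with a comma, keeping the digit."""
    # The digit is captured in group 1 and re-emitted after the comma.
    return re.sub(r"\s+([0-9])", r",\1", text)


print(space_digit_to_comma("abc 1 def 2"))  # prints "abc,1 def,2"
```

Whitespace that is not followed by a digit (the space before "def" here) is left untouched.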
I searched all the web but did not solve my problem.
I need a regular expression to match or replace a word followed by a double quote.
Example :
uniquehereuniqueyouunique"can"uniquegetuniquefoounique"bar"
I need to replace every unique" with something else (e.g. notunique). I tried:
preg_replace("/unique\"/", 'notunique', $contents);
but that doesn't match.
thanks for your help
I found it!
preg_replace('/unique."/', 'notunique', $contents);
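For comparison, here is a small sketch in Python (not PHP) of how the two patterns behave on the sample string. On the sample exactly as posted, the literal pattern unique" does match, while unique." needs one extra character between the word and the quote. One possible explanation, and it is only an assumption, is that the real $contents held escaped quotes such as \" rather than bare quotes; the escaped variable below simulates that case.

```python
import re

sample = 'uniquehereuniqueyouunique"can"uniquegetuniquefoounique"bar"'

# The literal pattern matches the sample as posted.
literal_result = re.sub(r'unique"', "notunique", sample)
print(literal_result)

# The dot pattern requires one character between 'unique' and the quote,
# so it finds nothing in the sample as posted.
dot_hits_plain = re.findall(r'unique."', sample)

# Assumption for illustration: the real input had backslash-escaped quotes.
escaped = sample.replace('"', '\\"')
dot_hits_escaped = re.findall(r'unique."', escaped)

print(len(dot_hits_plain), len(dot_hits_escaped))  # prints "0 2"
```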
Apologizing in advance for yet another email pattern matching query.
Here is what I have so far:
$text = strtolower($intext);
$lines = preg_split("/[\s]*[\n][\s]*/", $text);
$pattern = '/[A-Za-z0-9_-]+#[A-Za-z0-9_-]+\.([A-Za-z0-9_-][A-Za-z0-9_]+)/';
$pattern1= '/^[^#]+#[a-zA-Z0-9._-]+\.[a-zA-Z]+$/';
foreach ($lines as $email) {
    preg_match($pattern,$email,$goodies);
    $goodies[0]=filter_var($goodies[0], FILTER_SANITIZE_EMAIL);
    if(filter_var($goodies[0], FILTER_VALIDATE_EMAIL)){
        array_push($good,$goodies[0]);
    }
}
$pattern works fine, but .rr.com addresses (and, I am sure, other cases) are stripped of their .com.
$pattern1 only grabs emails that are on a line by themselves.
I am pasting in a whole page of miscellaneous text into a textarea that contains some emails from an old data file I am trying to recover.
Everything works great except for the emails with more than one "." either before or after the "#".
I am sure there must be more issues as well.
I have tried several patterns I have found as well as some I tried to write.
Can someone show me the light here before I pull my remaining hair out?
How about this?
/((?:\w+[.]*)*(?:\+[^# \t]*)?#(?:\w+[.])+\w+)/
Explanation: (?:\w+[.]*)* recognizes 0 or more instances of strings of word characters (alphanumeric + _) optionally separated by strings of periods. Next, (?:\+[^# \t]*)? recognizes a plus sign followed by zero or more non-whitespace, non-at-sign characters. Then we have the # sign, and finally (?:\w+[.])+\w+, which matches a sequence of word character strings separated by periods and ending in a word character string (i.e., [subdomain.]domain.topleveldomain).
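As a rough sketch of the same idea applied to a pasted block of text, here is a Python version (not PHP). Note that the pattern is a slightly restructured variant of the one above, not the identical expression: the local part is written as \w+(?:\.\w+)* so that the nested quantifiers do not backtrack heavily on long words. The function name find_addresses and the sample text are made up for this illustration, and "#" is kept as the separator as in the question.

```python
import re

# Variant of the answer's pattern (an assumption, not the exact original):
# dotted local part, optional +tag, '#', one or more dotted labels, final label.
ADDRESS = re.compile(r"\w+(?:\.\w+)*(?:\+[^#\s]*)?#(?:\w+\.)+\w+")


def find_addresses(text: str) -> list[str]:
    """Return every address-like substring found anywhere in the text."""
    return ADDRESS.findall(text)


text = "old data: john.smith#mail.rr.com, then jane+tag#example.org\nand junk"
print(find_addresses(text))
# prints ['john.smith#mail.rr.com', 'jane+tag#example.org']
```

Because the domain part allows several dotted labels, an address under .rr.com keeps its .com, and the search is not anchored to whole lines.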
In Scala, "here docs" begin and end with three double quotes (""")
val str = """Hi,everyone"""
But what if the string itself contains """? How can I output Hi,"""everyone?
Since unicode escaping via \u0022 in multi-line string literals won’t help you, because they would be evaluated as the very same three quotes, your only chance is to concatenate like so:
"""Hi, """+"""""""""+"""everyone"""
The good thing is, that the scala compiler is smart enough to fix this and thus it will make one single string out of it when compiling.
At least, that’s what scala -print says.
object o {
val s = """Hi, """+"""""""""+"""everyone"""
val t = "Hi, \"\"\"everyone"
}
and using scala -print →
Main$$anon$1$o.this.s = "Hi, """everyone";
Main$$anon$1$o.this.t = "Hi, """everyone";
Note however, that you can’t input it that way. The format which scala -print outputs seems to be for internal usage only.
Still, there might be some easier, more straightforward way of doing this.
It's a total hack that I posted on a similar question, but it works here too: use Scala's XML structures as an intermediate format.
val str = <a>Hi,"""everyone</a> text
This will give you a string with three double quotation marks.
You can't.
Scala heredocs are raw strings and don't use any escape codes.
If you need triple quotes in a string, use string concatenation to add them.
You can't when using triple quotes, as far as I know. The spec, section 1.3.5, states:
A multi-line string literal is a sequence of characters enclosed in triple quotes
""" ... """. The sequence of characters is arbitrary, except that it may contain
three or more consecutive quote characters only at the very end. Characters must
not necessarily be printable; newlines or other control characters are also permitted.
Unicode escapes work as everywhere else, but none of the escape sequences in
(§1.3.6) is interpreted.
So if you want to output three quotes in a string, you can still use a regular (single-line) string literal with escaping:
scala> val s = "Hi, \"\"\"everyone"
s: java.lang.String = Hi, """everyone