I'm interested in a pure-Lua (i.e., no external Unicode library) solution to extracting the units of a string between certain Unicode control characters and spaces. The code points I would like to use as delimiters are:
0000-0020
007f-00a0
00ad
1680
2000-200a
2028-2029
202f
205f
3000
I know how to access the code points in a string, for example:
> for i,c in utf8.codes("é$ \tπ😃") do print(c) end
233
36
32
9
960
128515
but I am not sure how to "skip" the spaces and tabs and reconstitute the other codepoints into strings themselves. What I would like to do in the example above, is drop the 32 and 9, then perhaps use utf8.char(233, 36) and utf8.char(960, 128515) to somehow get ["é$", "π😃"].
It seems that putting everything into a table of numbers and painstakingly walking through the table with for-loops and if-statements would work, but is there a better way? I looked into string:gmatch but that seems to require making utf8 sequences out of each of the ranges I want, and it's not clear what that pattern would even look like.
Is there an idiomatic way to extract the strings between the spaces? Or must I manually hack tables of code points? gmatch does not look up to the task. Or is it?
would require painstakingly generating the utf8 encodings for all code points at each end of the range.
Yes. But of course not manually.
local function range(from, to)
   -- both ends must share all but their last UTF-8 byte (same block of 64 code points)
   assert(utf8.codepoint(from) // 64 == utf8.codepoint(to) // 64)
   return from:sub(1,-2).."["..from:sub(-1).."-"..to:sub(-1).."]"
end
local function split_unicode(s)
   for w in s
      :gsub("[\0-\x1F\x7F]", " ")
      -- U+0080..U+00A0 covers the rest of the 007f-00a0 range from the question
      :gsub(range("\u{0080}", "\u{00a0}"), " ")
      :gsub("\u{00ad}", " ")
      :gsub("\u{1680}", " ")
      :gsub(range("\u{2000}", "\u{200a}"), " ")
      :gsub(range("\u{2028}", "\u{2029}"), " ")
      :gsub("\u{202f}", " ")
      :gsub("\u{205f}", " ")
      :gsub("\u{3000}", " ")
      :gmatch"%S+"
   do
      print(w)
   end
end
Test:
split_unicode("#\0#\t#\x1F#\x7F#\u{00a0}#\u{00ad}#\u{1680}#\u{2000}#\u{2005}#\u{200a}#\u{2028}#\u{2029}#\u{202f}#\u{205f}#\u{3000}#")
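The range() helper relies on a property of UTF-8: code points in the same block of 64 share every byte except the last, so a one-byte character class is enough. As a cross-check outside Lua, here is a rough Python sketch of the same byte-level idea. The names byte_range and split_unicode_bytes are made up for this illustration, and it assumes the whole 0080-00a0 span from the question should count as a delimiter.

```python
import re

def byte_range(lo: str, hi: str) -> bytes:
    """Byte-level regex for code points lo..hi (hypothetical helper)."""
    a, b = lo.encode("utf-8"), hi.encode("utf-8")
    # Same precondition as the Lua assert: both ends sit in one block of 64,
    # so their UTF-8 encodings differ only in the final byte.
    assert ord(lo) // 64 == ord(hi) // 64
    return re.escape(a[:-1]) + b"[" + re.escape(a[-1:]) + b"-" + re.escape(b[-1:]) + b"]"

# One or more delimiters in a row, matched on the raw UTF-8 bytes.
DELIMITERS = re.compile(
    b"(?:"
    + b"|".join([
        rb"[\x00-\x20\x7f]",
        byte_range("\u0080", "\u00a0"),
        re.escape("\u00ad".encode("utf-8")),
        re.escape("\u1680".encode("utf-8")),
        byte_range("\u2000", "\u200a"),
        byte_range("\u2028", "\u2029"),
        re.escape("\u202f".encode("utf-8")),
        re.escape("\u205f".encode("utf-8")),
        re.escape("\u3000".encode("utf-8")),
    ])
    + b")+"
)

def split_unicode_bytes(s: str) -> list:
    # Every alternative starts with an ASCII byte or a UTF-8 lead byte,
    # so a match can never begin in the middle of a character.
    parts = DELIMITERS.split(s.encode("utf-8"))
    return [p.decode("utf-8") for p in parts if p]

print(split_unicode_bytes("é$ \tπ😃"))
```

Working on bytes mirrors what Lua patterns do; in Python one would normally just use a character class on the decoded string instead.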
I'm reading text from a text file in Scala. I'm having difficulties with if statements.
for (line <- Source.fromFile(filename).getLines) {
  if (line.length>7) {
    println("b1 >" + line(7)+ "< " + line(0).getType)
    if(line(7)=="#") {
      println("hashtag")
    }
  }
}
Below are 2 lines from my text file. The first line has 4 spaces followed by many hashtags. The second line is 4 spaces followed by 1 hashtag (the 4 spaces keep getting deleted by Stack Overflow).
##################################################################################################################################################
#
Below is the output I receive:
//| b1 >#< 12
//| b1 > < 12
Question 1) why is getType returning 12? This is the strangest data type I've ever heard of.
Question 2) (possibly answered by Q1) why does the if(line(7)=="#") statement never return true?
To answer your questions in reverse order:
Question 2. Because line is a String, line(7) is a Char which is never equal to a String. You want to compare it with '#' instead.
Question 1. For the same reason, line(0) is a Char, so this calls the Char.getType method, which
Returns a value indicating a character's general category.
(not that you can find it from Scala's own documentation). You probably wanted getClass instead.
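The 12 in the output is one of those general-category codes: on the JVM it is the value of the constant Character.SPACE_SEPARATOR, which fits line(0) being a space. The same Unicode categories can be inspected from Python, used here purely as an illustration because it reports them as two-letter names instead of numbers:

```python
import unicodedata

# General category of the space character: "Zs" (separator, space).
print(unicodedata.category(" "))

# General category of '#': "Po" (punctuation, other).
print(unicodedata.category("#"))
```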
When I use triple quotes at an indented position, the indentation ends up in the output JS string too.
Compare these two in a nested let:
let input1 = "T1\nX55.555Y-44.444\nX52.324Y-40.386"
let input2 = """T1
X66.324Y-40.386
X52.324Y-40.386"""
giving
// single quotes with \n
"T1\x0aX55.555Y-44.444\x0aX52.324Y-40.386"
// triple quoted
"T1\x0a X66.324Y-40.386\x0a X52.324Y-40.386"
Is there any agreed upon thing like stripMargin in Scala so I can use those without having to unindent to top level?
Update, just to clarify what I mean, I'm currently doing:
describe "header" do
  it "should parse example header" do
    let input = """M48
;DRILL file {KiCad 4.0.7} date Wednesday, 31 January 2018 'AMt' 11:08:53
;FORMAT={-:-/ absolute / metric / decimal}
FMAT,2
METRIC,TZ
T1C0.300
T2C0.400
T3C0.600
T4C0.800
T5C1.000
T6C1.016
T7C3.400
%
"""
    doesParse input header
describe "hole" do
  it "should parse a simple hole" do
    doesParse "X52.324Y-40.386" hole
Update:
I was asked to clarify stripMargin from Scala. It's used like so:
val speech = """T1
|X66.324Y-40.386
|X52.324Y-40.386""".stripMargin
which then removes the leading whitespace. stripMargin can take any separator, but defaults to |.
More examples:
Rust has https://docs.rs/trim-margin/0.1.0/trim_margin/
Kotlin has in stdlib: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.text/trim-margin.html
I guess it might sound like asking for left-pad ( :) ) but if there's something there already I'd rather not brew it myself…
I'm sorry you didn't get a prompt response to this one, but I have implemented this function here. In case the pull request isn't merged, here's an implementation that just depends on purescript-strings:
import Data.String (joinWith, split) as String
import Data.String.CodeUnits (drop, dropWhile) as String
import Data.String.Pattern (Pattern(..))
stripMargin :: String -> String
stripMargin =
  let
    lines = String.split (Pattern "\n")
    unlines = String.joinWith "\n"
    mapLines f = unlines <<< map f <<< lines
  in
    mapLines (String.drop 1 <<< String.dropWhile (_ /= '|'))
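For readers who want to see the intended behaviour spelled out, here is a small Python sketch of a Scala-like stripMargin. The name strip_margin and its default margin argument are assumptions for this illustration. Unlike the dropWhile-based version above, it follows Scala in leaving a line untouched when it does not start with optional whitespace followed by the margin character, so a first line such as "T1" survives.

```python
def strip_margin(text: str, margin: str = "|") -> str:
    """Remove leading whitespace plus the margin marker from each line."""
    result = []
    for line in text.split("\n"):
        stripped = line.lstrip()
        if stripped.startswith(margin):
            # Drop the indentation and the marker itself.
            result.append(stripped[len(margin):])
        else:
            # No marker on this line: keep it exactly as written.
            result.append(line)
    return "\n".join(result)

example = """T1
    |X66.324Y-40.386
    |X52.324Y-40.386"""
print(strip_margin(example))
```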
I want to implement a Scala-style string interpolation in Scala. Here is an example,
val str = "hello ${var1} world ${var2}"
At runtime I want to replace "${var1}" and "${var2}" with some runtime strings. However, when trying to use Regex.replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String), I ran into the following problem:
import scala.util.matching.Regex
val placeholder = new Regex("""(\$\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
java.lang.IllegalArgumentException: No group with name {var1}
at java.util.regex.Matcher.appendReplacement(Matcher.java:800)
at scala.util.matching.Regex$Replacement$class.replace(Regex.scala:722)
at scala.util.matching.Regex$MatchIterator$$anon$1.replace(Regex.scala:700)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.util.matching.Regex$$anonfun$replaceAllIn$1.apply(Regex.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:743)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1174)
at scala.util.matching.Regex.replaceAllIn(Regex.scala:410)
... 32 elided
However, when I removed '$' from the regular expression, it worked:
val placeholder = new Regex("""(\{\w+\})""")
placeholder.replaceAllIn(str, m => s"A${m.matched}B")
res2: String = hello $A{var1}B world $A{var2}B
So my question is whether this is a bug in Scala's Regex. And if so, are there other elegant ways to achieve the same goal (other than brute-force replaceAllLiterally on all placeholders)?
$ is treated specially in the replacement string. This is described in the documentation of replaceAllIn:
In the replacement String, a dollar sign ($) followed by a number will be interpreted as a reference to a group in the matched pattern, with numbers 1 through 9 corresponding to the first nine groups, and 0 standing for the whole match. Any other character is an error. The backslash (\) character will be interpreted as an escape character and can be used to escape the dollar sign. Use Regex.quoteReplacement to escape these characters.
(Actually, that doesn't mention named group references, so I guess it's only sort of documented.)
Anyway, the takeaway here is that you need to escape the $ characters in the replacement string if you don't want them to be treated as references.
new scala.util.matching.Regex("""(\$\{\w+\})""")
.replaceAllIn("hello ${var1} world ${var2}", m => s"A\\${m.matched}B")
// "hello A${var1}B world A${var2}B"
It's hard to tell what behavior you're expecting. The issue is that s"${m.matched}" turns into "${var1}" (and "${var2}"). The '$' is a special character that says "place the group with name {var1} here instead".
For example:
scala> placeholder.replaceAllIn(str, m => "$1")
res0: String = hello ${var1} world ${var2}
It replaces the match with the first capturing group (which is m itself).
It's hard to tell exactly what you're doing, but you could escape any $ like so:
scala> placeholder.replaceAllIn(str, m => s"${m.matched.replace("$","\\$")}")
res1: String = hello ${var1} world ${var2}
If what you really want to do is evaluate var1/var2 for some variables in the local scope of the method; that's not possible. In fact, the s"Hello, $name" pattern is actually converted into new StringContext("Hello, ", "").s(name) at compile time.
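If the goal is simply to fill the placeholders from values known at runtime, a lookup table is enough. The following is a sketch in Python, not Scala, with render and the values dictionary being hypothetical names. Python's re.sub inserts the return value of a callable as-is, so the escaping problem discussed above does not come up there; in Scala the replacement would still need Regex.quoteReplacement.

```python
import re

# Captures the name inside ${...}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def render(template: str, values: dict) -> str:
    """Replace each ${name} with values[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)),
        template,
    )

print(render("hello ${var1} world ${var2}", {"var1": "foo", "var2": "bar"}))
```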
Please answer with the shortest possible source code for a program that converts an arbitrary plaintext to its corresponding ciphertext, following the sample input and output I have given below. Bonus points* for the least CPU time or the least amount of memory used.
Example 1:
Plaintext: The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!
Ciphertext: eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
Example 2:
Plaintext: 123 1234 12345 123456 1234567 12345678 123456789
Ciphertext: 312 4213 53124 642135 7531246 86421357 975312468
Rules:
Punctuation is defined to be included with the word it is closest to.
The center of a word is defined to be ceiling((strlen(word)+1)/2).
Whitespace is ignored (or collapsed).
Odd words move to the right first. Even words move to the left first.
You can think of it as reading every other character backwards (starting from the end of the word), followed by the remaining characters forwards. Corporation => XoXpXrXtXoX => niaorCoprto.
Thank you to those who pointed out the inconsistency in my description. This has led many of you down the wrong path, for which I apologize. Rule #4 should clear things up.
*Bonus points will only be awarded if Jeff Atwood decides to do so. Since I haven't checked with him, the chances are slim. Sorry.
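As an ungolfed reference for the rules, the "every other character backwards, then the rest forwards" reading can be sketched in Python 3 as below. The function names scramble_word and scramble are chosen for this illustration only.

```python
def scramble_word(word: str) -> str:
    # Every other character, read backwards from the end of the word...
    backwards = word[::-2]
    # ...followed by the characters that were skipped, read forwards.
    # For odd lengths the skipped ones start at index 1, for even at index 0.
    forwards = word[len(word) % 2::2]
    return backwards + forwards

def scramble(text: str) -> str:
    # split() with no arguments collapses runs of whitespace.
    return " ".join(scramble_word(w) for w in text.split())

print(scramble("123 1234 12345 123456 1234567 12345678 123456789"))
```

Both sample inputs from the question, as well as the Corporation example, come out as listed there.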
Python, 50 characters
For input in i:
' '.join(x[::-2]+x[len(x)%2::2]for x in i.split())
Alternate version that handles its own IO:
print ' '.join(x[::-2]+x[len(x)%2::2]for x in raw_input().split())
A total of 66 characters if including whitespace. (Technically, the print could be omitted if running from a command line, since the evaluated value of the code is displayed as output by default.)
Alternate version using reduce:
' '.join(reduce(lambda x,y:y+x[::-1],x) for x in i.split())
59 characters.
Original version (both even and odd go right first) for an input in i:
' '.join(x[::2][::-1]+x[1::2]for x in i.split())
48 characters including whitespace.
Another alternate version which (while slightly longer) is slightly more efficient:
' '.join(x[len(x)%2-2::-2]+x[1::2]for x in i.split())
(53 characters)
J, 58 characters
>,&.>/({~(,~(>:#+:#i.#-#<.,+:#i.#>.)#-:)#<:##)&.><;.2,&' '
Haskell, 64 characters
unwords.map(map snd.sort.zip(zipWith(*)[0..]$cycle[-1,1])).words
Well, okay, 76 if you add in the requisite "import List".
Python - 69 chars
(including whitespace and linebreaks)
This handles all I/O.
for w in raw_input().split():
 o=""
 for c in w:o=c+o[::-1]
 print o,
Perl, 78 characters
For input in $_. If that's not acceptable, add six characters for either $_=<>; or $_=$s; at the beginning. The newline is for readability only.
for(split){$i=length;print substr$_,$i--,1,''while$i-->0;
print"$_ ";}print $/
C, 140 characters
Nicely formatted:
main(c, v)
    char **v;
{
    for( ; *++v; )
    {
        char *e = *v + strlen(*v), *x;
        for(x = e-1; x >= *v; x -= 2)
            putchar(*x);
        for(x = *v + (x < *v-1); x < e; x += 2)
            putchar(*x);
        putchar(' ');
    }
}
Compressed:
main(c,v)char**v;{for(;*++v;){char*e=*v+strlen(*v),*x;for(x=e-1;x>=*v;x-=2)putchar(*x);for(x=*v+(x<*v-1);x<e;x+=2)putchar(*x);putchar(32);}}
Lua
130 char function, 147 char functioning program
Lua doesn't get enough love in code golf -- maybe because it's hard to write a short program when you have long keywords like function/end, if/then/end, etc.
First I write the function in a verbose manner with explanations, then I rewrite it as a compressed, standalone function, then I call that function on the single argument specified at the command line.
I had to format the code with <pre></pre> tags because Markdown does a horrible job of formatting Lua.
Technically you could get a smaller running program by inlining the function, but it's more modular this way :)
t = "The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!"
T = t:gsub("%S+",                 -- for each word in t...
  function(w)                     -- argument: current word in t
    W = ""                        -- initialize new Word
    for i = 1,#w do               -- iterate over each character in word
      c = w:sub(i,i)              -- extract current character
      -- determine whether letter goes on right or left end
      W = (#w % 2 ~= i % 2) and W .. c or c .. W
    end
    return W                      -- swap word in t with inverted Word
  end)
-- code-golf unit test
assert(T == "eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos")
-- need to assign to a variable and return it,
-- because gsub returns a pair and we only want the first element
f=function(s)c=s:gsub("%S+",function(w)W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end return W end)return c end
--       1         2         3         4         5         6         7         8         9        10        11        12        13
--34567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
-- 130 chars, compressed and written as a proper function
print(f(arg[1]))
--34567890123456
-- 16 (+1 whitespace needed) chars to make it a functioning Lua program,
-- operating on command line argument
Output:
$ lua insideout.lua 'The quick brown fox jumps over the lazy dog. Supercalifragilisticexpialidocious!'
eTh kiquc nobrw xfo smjup rvoe eth yalz .odg !uioiapeislgriarpSueclfaiitcxildcos
I'm still pretty new at Lua so I'd like to see a shorter solution if there is one.
For a minimal cipher on all args, written to stdout, we can do 111 chars:
for _,w in ipairs(arg)do W=""for i=1,#w do c=w:sub(i,i)W=(#w%2~=i%2)and W ..c or c ..W end io.write(W ..' ')end
But this approach does output a trailing space like some of the other solutions.
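The loop used here builds each word by alternately appending characters to the right and left ends. For comparison, the same idea can be written in Python 3 with a deque, which makes adding at either end cheap. This is a sketch, and build_word is a name invented for the illustration.

```python
from collections import deque

def build_word(word: str) -> str:
    out = deque()
    n = len(word)
    # i is 1-based, matching the Lua loop.
    for i, c in enumerate(word, start=1):
        if n % 2 != i % 2:
            out.append(c)      # goes on the right end
        else:
            out.appendleft(c)  # goes on the left end
    return "".join(out)

print(" ".join(build_word(w) for w in "The quick brown fox".split()))
```

Joining with " " also avoids the trailing space.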
For an input in s:
f=lambda t,r="":t and f(t[1:],len(t)&1and t[0]+r or r+t[0])or r
" ".join(map(f,s.split()))
Python, 90 characters including whitespace.
TCL
125 characters
set s set f foreach l {}
$f w [gets stdin] {$s r {}
$f c [split $w {}] {$s r $c[string reverse $r]}
$s l "$l $r"}
puts $l
Bash - 133, assuming input is in $w variable
Pretty
for x in $w; do
    z="";
    for l in `echo $x|sed 's/\(.\)/ \1/g'`; do
        if ((${#z}%2)); then
            z=$z$l;
        else
            z=$l$z;
        fi;
    done;
    echo -n "$z ";
done;
echo
Compressed
for x in $w;do z="";for l in `echo $x|sed 's/\(.\)/ \1/g'`;do if ((${#z}%2));then z=$z$l;else z=$l$z;fi;done;echo -n "$z ";done;echo
Ok, so it outputs a trailing space.