kotlin split utf string into single length sub strings using codepoint

I'm just starting Kotlin, so I'm sure there is an easy way to do this, but I don't see it. I want to split a string into substrings of one code point each. In Java 8, this works:
public class UtfSplit {
    static String [] utf8Split (String str) {
        int [] codepoints = str.codePoints().toArray();
        String [] rv = new String[codepoints.length];
        for (int i = 0; i < codepoints.length; i++)
            rv[i] = new String(codepoints, i, 1);
        return rv;
    }
    public static void main(String [] args) {
        String test = "こんにちは皆さん";
        System.out.println("Test string:" + test);
        StringBuilder sb = new StringBuilder("Result:");
        for(String s : utf8Split(test))
            sb.append(s).append(", ");
        System.out.println(sb.toString());
    }
}
Output is:
Test string:こんにちは皆さん
Result:こ, ん, に, ち, は, 皆, さ, ん,
How would I do this in Kotlin? I can get to the code points, although it's clumsy and I'm sure I'm doing it wrong, but I can't get from the code points back to strings. The whole string/character interface seems different to me and I'm just not getting it.
Thanks
Steve S.

You are using the same runtime as Java so the code is basically doing the same thing. However, the Kotlin version is shorter, and also has no need for a class, although you could group utility methods into an object. Here is the version using top-level functions:
fun splitByCodePoint(str: String): Array<String> {
    val codepoints = str.codePoints().toArray()
    return Array(codepoints.size) { index ->
        String(codepoints, index, 1)
    }
}
fun main(args: Array<String>) {
    val input = "こんにちは皆さん"
    val result = splitByCodePoint(input)
    println("Test string: ${input}")
    println("Result: ${result.joinToString(", ")}")
}
Output:
Test string: こんにちは皆さん
Result: こ, ん, に, ち, は, 皆, さ, ん
Note: I renamed the function because the encoding (UTF-8 or otherwise) doesn't really matter here; you are just splitting by code points.
Some might write this without the local variable:
fun splitByCodePoint(str: String): Array<String> {
    return str.codePoints().toArray().let { codepoints ->
        Array(codepoints.size) { index -> String(codepoints, index, 1) }
    }
}
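The test string above contains only characters that fit in a single UTF-16 unit, so it would split the same way character by character. A minimal Java sketch of why splitting by code point matters, using an input with a surrogate pair (the class name CodePointSplit and the sample input are illustrative, not from the question):

```java
class CodePointSplit {
    // Turn each Unicode code point into its own String.
    // Character.toChars yields one or two UTF-16 units per code point.
    static String[] splitByCodePoint(String str) {
        return str.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair), 'b'
        String input = "a\uD83D\uDE00b";
        System.out.println(input.length());                  // 4 UTF-16 units
        System.out.println(splitByCodePoint(input).length);  // 3 code points
    }
}
```

Splitting this input by char would cut the surrogate pair into two invalid halves, while the code-point version keeps it as one substring.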
See also:
Kotlin stdlib Array initializer by index w/lambda
Kotlin stdlib let function

Related

Simple Scala Function Not Removing Double Spaces

For some reason, when I input a string with double spaces such as "  ", the function does not remove them from the string, nor does it remove them when they are generated by two WUBs in a row.
For example:
songDecoder("WUBCATWUBWUBBALLWUB") outputs "CAT_ _BALL" (underscores represent spaces)
I could fix this by other means, but since I have no idea why my current code isn't working I figured I should ask to patch my understanding.
def songDecoder(song:String):String = {
  val l = song.indexOf("WUB")
  if (song.contains("  ")) {
    val e = song.indexOf("  ")
    songDecoder(song.patch(e,Nil,1))
  }
  if (l==0) {
    val c = song.patch(l,Nil,3)
    songDecoder(c)
  }
  if (l== -1)
    song.trim
  else {
    val c = song.patch(l,Nil,2)
    val b = c.patch(l," ",1)
    songDecoder(b)
  }
}

It doesn't work because a recursive call returns its result to the caller, and the code that clears out the double whitespace never saves or returns that result.
if (song.contains("  ")) {
  val e = song.indexOf("  ")
  songDecoder(song.patch(e,Nil,1)) //send patched song to decoder
}                                  //don't save returned string
                                   //continue with unpatched song
The 2nd if block also recurses without saving the result.
if (l==0) {
  val c = song.patch(l,Nil,3)
  songDecoder(c) //send patched song to decoder
}                //don't save returned string
                 //continue with unpatched song
You can remove both of those if blocks and you'll get the same results from your method. The only code that affects the output is the final if/else, because it is the last expression in the method's code block. Whatever that if/else produces is what the method returns.
if (l== -1)
  song.trim                  //return the final result string
else {
  val c = song.patch(l,Nil,2) //remove one WUB
  val b = c.patch(l," ",1)    //replace with space
  songDecoder(b)              //return whatever the next recursion returns
}
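To make the point about discarded results concrete, here is a minimal Java sketch of the same recursion in which every recursive call is returned rather than dropped. The class name SongDecoderSketch and the use of substring in place of Scala's patch are my own choices for illustration, not part of the original code:

```java
class SongDecoderSketch {
    static String decode(String song) {
        // Collapse one double space, then hand the result of the recursion back.
        int doubleSpace = song.indexOf("  ");
        if (doubleSpace >= 0) {
            return decode(song.substring(0, doubleSpace) + song.substring(doubleSpace + 1));
        }
        // Replace one WUB with a single space, again returning the recursive result.
        int wub = song.indexOf("WUB");
        if (wub >= 0) {
            return decode(song.substring(0, wub) + " " + song.substring(wub + 3));
        }
        // Nothing left to rewrite: this is the base case.
        return song.trim();
    }

    public static void main(String[] args) {
        System.out.println(decode("WUBCATWUBWUBBALLWUB")); // prints "CAT BALL"
    }
}
```

Because each branch returns, the cleaned-up string from the deeper call is the one the caller sees, which is exactly what the two early if blocks in the question fail to do.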
Just as an FYI, here's a different approach.
def songDecoder(song:String):String =
  "(WUB)+".r.replaceAllIn(song, " ").trim

How about something like:
song.split("(WUB)+").mkString(" ").trim

How to block the return until a timer expires using RxJava

I'm not seeing anything ever get returned by the scan. I know it's because the mutableList gets returned right away, but how do I block the return until the time expires?
Basically, all I want to do is fill up the mutable list for as long as the take() permits then return that mutableList to the calling function.
This is what I have tried.
private val timeoutScheduler: Scheduler = Schedulers.computation()
fun scanForAllDevicesStartingWith(devicePrefix: String): List<String> {
    Log.d(TAG, "Scanning for devices starting with $devicePrefix")
    val mutableList = mutableListOf<String>()
    val result = scanForDevices()
        .take(3, TimeUnit.SECONDS, timeoutScheduler)
        .subscribe { scanResult ->
            val name = scanResult.bleDevice.name
            Logger.d(TAG, "Potential device named $name found")
            if(name != null) {
                if(name.startsWith(prefix = devicePrefix)) {
                    Logger.d(TAG, "Match found $name")
                    mutableList.plus(name)
                }
            }
        }
    return mutableList
}
private fun scanForDevices(): Observable<ScanResult>
    = rxBleClient.scanBleDevices(
        ScanSettings.Builder()
            .setScanMode(ScanSettings.SCAN_MODE_LOW_LATENCY)
            .setCallbackType(ScanSettings.CALLBACK_TYPE_ALL_MATCHES)
            .build(),
        ScanFilter.Builder()
            .build())
}

OK, here it is boiled down for the next person who wants to do this kind of thing. Rx has Single, a type like Observable that emits exactly one value. In my case I needed a list of String values, so I use a Single<List<String>>: it emits just one element, which happens to be a list of Strings. The code looks like this:
fun returnAllDevicesStartingWith(devicePrefix: String): Single<List<String>> {
    return scanForDevices()
        .take(3, TimeUnit.SECONDS, timeoutScheduler)
        .map { it.bleDevice.name }
        .filter { it.startsWith(devicePrefix) }
        .toList()
}
The function that calls it (written in Java instead of Kotlin) looks like this:
List<String> devices = bleUtility.returnAllDevicesStartingWith(prefix).blockingGet();
I tested it using a mocked function like this:
//Begin test code
var emittedList: List<String> = listOf("dev1-1", "dev1-2", "dev2-1", "dev2-2", "dev3-1", "dev3-2")
private fun scanForRoomDevices(): Observable<FoundDevice> = Observable
    .intervalRange(0, emittedList.size.toLong(), 0, 1, TimeUnit.SECONDS, timeoutScheduler)
    .map { index -> FoundDevice(emittedList[index.toInt()], BleDevice(emittedList[index.toInt()])) }
data class FoundDevice(val controllerId: String, val bleDevice: BleDevice)
data class BleDevice(val name: String)
Hope this helps others.

Parsing text and representing it with tokens using Scala

I'm getting frustrated trying to convert a small part of the Golang templating language to Scala.
Below are the key parts of the lex.go source code: https://github.com/golang/go/blob/master/src/text/template/parse/lex.go
The tests are here: https://github.com/golang/go/blob/master/src/text/template/parse/lex_test.go
Basically this "class" takes a string and returns an array of "item" values, each tagged with an "itemType". In the template string, special tokens start and end with the curly-brace delimiters {{ and }}.
For example:
"{{for}}"
returns an array of 4 items:
item{itemLeftDelim, 0, "{{" } // scala case class would be Item(ItemLeftDelim, 0, "")
item{itemIdentifier, 0, "for"}
item{itemRightDelim, 0, "}}"}
item{itemEOF, 0, ""}
The actual call would look like:
l := lex("for", `{{for}}`, "{{", "}}") // you pass in the start and end delimiters {{ and }}
for {
    item := l.nextItem()
    items = append(items, item)
    if item.typ == itemEOF || item.typ == itemError {
        break
    }
}
return
The key parts of the source code are below:
// itemType identifies the type of lex items.
type itemType int
const (
    itemError itemType = iota // error occurred; value is text of error
    itemEOF
    itemLeftDelim // left action delimiter
    // .............. skipped
)
const (
    leftDelim = "{{"
    rightDelim = "}}"
    leftComment = "/*"
    rightComment = "*/"
)
// item represents a token or text string returned from the scanner.
type item struct {
    typ itemType // The type of this item.
    pos Pos // The starting position, in bytes, of this item in the input string.
    val string // The value of this item.
}
// stateFn represents the state of the scanner as a function that returns the next state.
type stateFn func(*lexer) stateFn
// lexer holds the state of the scanner.
type lexer struct {
    name string // the name of the input; used only for error reports
    input string // the string being scanned
    leftDelim string // start of action
    rightDelim string // end of action
    state stateFn // the next lexing function to enter
    pos Pos // current position in the input
    start Pos // start position of this item
    width Pos // width of last rune read from input
    lastPos Pos // position of most recent item returned by nextItem
    items chan item // channel of scanned items
    parenDepth int // nesting depth of ( ) exprs
}
// lex creates a new scanner for the input string.
func lex(name, input, left, right string) *lexer {
    if left == "" {
        left = leftDelim
    }
    if right == "" {
        right = rightDelim
    }
    l := &lexer{
        name: name,
        input: input,
        leftDelim: left,
        rightDelim: right,
        items: make(chan item),
    }
    go l.run()
    return l
}
// run runs the state machine for the lexer.
func (l *lexer) run() {
    for l.state = lexText; l.state != nil; {
        l.state = l.state(l)
    }
}
// nextItem returns the next item from the input.
func (l *lexer) nextItem() item {
    item := <-l.items
    l.lastPos = item.pos
    return item
}
// emit passes an item back to the client.
func (l *lexer) emit(t itemType) {
    l.items <- item{t, l.start, l.input[l.start:l.pos]}
    l.start = l.pos
}
// lexText scans until an opening action delimiter, "{{".
func lexText(l *lexer) stateFn {
    for {
        if strings.HasPrefix(l.input[l.pos:], l.leftDelim) {
            if l.pos > l.start {
                l.emit(itemText)
            }
            return lexLeftDelim
        }
        if l.next() == eof {
            break
        }
    }
    // Correctly reached EOF.
    if l.pos > l.start {
        l.emit(itemText)
    }
    l.emit(itemEOF)
    return nil
}
// next returns the next rune in the input.
func (l *lexer) next() rune {
    if int(l.pos) >= len(l.input) {
        l.width = 0
        return eof
    }
    r, w := utf8.DecodeRuneInString(l.input[l.pos:])
    l.width = Pos(w)
    l.pos += l.width
    return r
}
// lexLeftDelim scans the left delimiter, which is known to be present.
func lexLeftDelim(l *lexer) stateFn {
    l.pos += Pos(len(l.leftDelim))
    if strings.HasPrefix(l.input[l.pos:], leftComment) {
        return lexComment
    }
    l.emit(itemLeftDelim)
    l.parenDepth = 0
    return lexInsideAction
}
// lexRightDelim scans the right delimiter, which is known to be present.
func lexRightDelim(l *lexer) stateFn {
    l.pos += Pos(len(l.rightDelim))
    l.emit(itemRightDelim)
    return lexText
}
// there are more stateFn
So I was able to write the item and itemType:
case class Item(typ: ItemType, pos: Int, v: String)
sealed trait ItemType
case object ItemError extends ItemType
case object ItemEOF extends ItemType
case object ItemLeftDelim extends ItemType
...
..
.
The stateFn and Lex definitions:
trait StateFn extends (Lexer => StateFn) {
}
I'm basically really stuck on the main parts here. Things seem to be kicked off like this:
A Lex is created, then "go l.run()" is called.
Run is a loop, which keeps looping until EOF or an error is found.
The loop initializes with lexText, which scans until it finds an {{, and then it sends a message to a channel with all the preceding text of type 'itemText', passing it an 'item'. It then returns the function lexLeftDelim. lexLeftDelim does the same sort of thing, it sends a message 'item' of type itemLeftDelim.
It keeps parsing the string until it reaches EOF basically.
I can't think in scala that well, but I know I can use an Actor here to pass it a message 'Item'.
As for the part about returning a function, I asked about it and got some good ideas here: How to model recursive function types?
Even after this, I am really frustrated and I can't seem to glue these concepts together.
I'm not looking for someone to implement the entire thing for me, but if someone could write just enough code to parse a simple string like "{{}}" that would be awesome. And if they could explain why they did a certain design that would be great.
I created a case class for Lex:
case class Lex(
  name: String,
  input: String,
  leftDelim: String,
  rightDelim: String,
  state: StateFn,
  var pos: Int = 0,
  var start: Int = 0,
  var width: Int = 0,
  var lastPos: Int = 0,
  var parenDepth: Int = 0
) {
  def next(): Option[String] = {
    if (this.pos >= this.input.length) {
      this.width = 0
      return None
    }
    this.width = 1
    val nextChar = this.input.drop(this.pos).take(1)
    this.pos += 1
    Some(nextChar)
  }
}
The first stateFn is LexText and so far I have:
object LexText extends StateFn {
  def apply(l: Lexer) = {
    while {
      if (l.input.startsWith(l.leftDelim)) {
        if (l.pos > l.start) {
          // ????????? emit itemText using an actor?
        }
        return LexLeftDelim
      }
      if (l.next() == None) {
        break
      }
    }
    if(l.pos > l.start) {
      // emit itemText
    }
    // emit EOF
    return None // ?? nil? how can I support an Option[StateFn]
  }
}
I need guidance on getting the Actor's setup, along with the main run loop:
func (l *lexer) run() {
    for l.state = lexText; l.state != nil; {
        l.state = l.state(l)
    }
}
This is an interesting problem domain that I tried to tackle using Scala, and so far I am a bit confused. I'm hoping someone else finds it interesting, can work with what little I have so far, and can provide some code and a critique of whether I am doing it correctly.
I know deep down I shouldn't be mutating, but I'm still on the first few pages of the functional book :)

If you translate the Go code literally into Scala, you'll get a very unidiomatic piece of code. You'll probably get a much more maintainable (and shorter!) Scala version by using parser combinators. There are plenty of resources about them on the internet.
import scala.util.parsing.combinator._
sealed trait ItemType
case object LeftDelim extends ItemType
case object RightDelim extends ItemType
case object Identifier extends ItemType
case class Item(ty: ItemType, token: String)
object ItemParser extends RegexParsers {
  def left: Parser[Item] = """\{\{""".r ^^ { _ => Item(LeftDelim, "{{") }
  def right: Parser[Item] = """\}\}""".r ^^ { _ => Item(RightDelim, "}}") }
  def ident: Parser[Item] = """[a-z]+""".r ^^ { x => Item(Identifier, x) }
  def item: Parser[Item] = left | right | ident
  def items: Parser[List[Item]] = rep(item)
}
// ItemParser.parse(ItemParser.items, "{{foo}}")
// res5: ItemParser.ParseResult[List[Item]] =
//   [1.8] parsed: List(Item(LeftDelim,{{), Item(Identifier,foo), Item(RightDelim,}}))
Adding whitespace skipping, or configurable left and right delimiters is trivial.
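For comparison, the state-function loop described in the question can also be sketched directly, without actors. The following is a minimal Java sketch under stated assumptions, not a port of lex.go: it collects items in a list instead of sending them over a channel, it only recognises lowercase-or-uppercase letter identifiers inside an action, and all names (StateFnLexerSketch, ItemType constants, the error messages) are illustrative. A null state plays the role of Go's nil.

```java
import java.util.ArrayList;
import java.util.List;

class StateFnLexerSketch {
    enum ItemType { TEXT, LEFT_DELIM, IDENTIFIER, RIGHT_DELIM, EOF, ERROR }

    static final class Item {
        final ItemType type; final int pos; final String value;
        Item(ItemType type, int pos, String value) { this.type = type; this.pos = pos; this.value = value; }
        @Override public String toString() { return type + "@" + pos + "(\"" + value + "\")"; }
    }

    // A state is a function from the lexer to the next state; null means "stop".
    interface StateFn { StateFn apply(Lexer l); }

    static final class Lexer {
        final String input, leftDelim, rightDelim;
        int start = 0, pos = 0;                        // start of current item, current position
        final List<Item> items = new ArrayList<>();    // stands in for Go's channel

        Lexer(String input, String leftDelim, String rightDelim) {
            this.input = input; this.leftDelim = leftDelim; this.rightDelim = rightDelim;
        }

        // Record input[start, pos) as an item and begin the next item at pos.
        void emit(ItemType t) {
            items.add(new Item(t, start, input.substring(start, pos)));
            start = pos;
        }

        // The run loop: keep applying the current state until it returns null.
        List<Item> run() {
            StateFn state = StateFnLexerSketch::lexText;
            while (state != null) state = state.apply(this);
            return items;
        }
    }

    // Scan plain text up to the next left delimiter or the end of input.
    static StateFn lexText(Lexer l) {
        int next = l.input.indexOf(l.leftDelim, l.pos);
        l.pos = next >= 0 ? next : l.input.length();
        if (l.pos > l.start) l.emit(ItemType.TEXT);
        if (next >= 0) return StateFnLexerSketch::lexLeftDelim;
        l.emit(ItemType.EOF);
        return null;
    }

    static StateFn lexLeftDelim(Lexer l) {
        l.pos += l.leftDelim.length();
        l.emit(ItemType.LEFT_DELIM);
        return StateFnLexerSketch::lexInsideAction;
    }

    // Inside {{ ... }}: either the right delimiter or one identifier made of letters.
    static StateFn lexInsideAction(Lexer l) {
        if (l.input.startsWith(l.rightDelim, l.pos)) return StateFnLexerSketch::lexRightDelim;
        while (l.pos < l.input.length() && Character.isLetter(l.input.charAt(l.pos))) l.pos++;
        if (l.pos == l.start) {   // no progress: end of input or an unsupported character
            String msg = l.pos >= l.input.length() ? "unclosed action" : "unexpected character";
            l.items.add(new Item(ItemType.ERROR, l.start, msg));
            return null;
        }
        l.emit(ItemType.IDENTIFIER);
        return StateFnLexerSketch::lexInsideAction;
    }

    static StateFn lexRightDelim(Lexer l) {
        l.pos += l.rightDelim.length();
        l.emit(ItemType.RIGHT_DELIM);
        return StateFnLexerSketch::lexText;
    }

    public static void main(String[] args) {
        System.out.println(new Lexer("{{for}}", "{{", "}}").run());
        // [LEFT_DELIM@0("{{"), IDENTIFIER@2("for"), RIGHT_DELIM@5("}}"), EOF@7("")]
    }
}
```

The recursive type that was hard to express falls out of the interface declaration: StateFn is simply an interface whose single method returns another StateFn. The same shape should carry over to a Scala trait, with Option[StateFn] in place of null.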

What's a good way to iterate backwards through the Characters of a String?

What's the most Swiftian way to iterate backwards through the Characters in a String? i.e. like for ch in str, only in reverse?
I think I must be missing something obvious, because the best I could come up with just now was:
for var index = str.endIndex;
        index != str.startIndex;
        index = index.predecessor() {
    let ch = str[index.predecessor()]
    ...
}
I realise "what's the best..." may be classed as subjective; I suppose what I'm really looking for is a terse yet readable way of doing this.
Edit: While reverse() works and is terse, it looks like this might be quite inefficient compared to the above, i.e. it seems like it's not actually iterating backwards, but creating a full reverse copy of the characters in the String. This would be much worse than my original if, say, you were looking for something that was usually a few characters from the end of a 10,000-character String. I'm therefore leaving this question open for a bit to attract other approaches.

The reversed function reverses a C: CollectionType and returns a ReversedCollection:
for char in "string".characters.reversed() {
    // ...
}
If you find that reversed pre-reverses the string, try:
for char in "string".characters.lazy.reversed() {
    // ...
}
lazy returns a lazily evaluated sequence (LazyBidirectionalCollection) then reversed() returns another LazyBidirectionalCollection that is visited in reverse.

As of December 2015 with Swift version 2.1, the proper way to do this is
for char in string.characters.reverse() {
    //loop backwards
}
String no longer conforms to SequenceType<T> but its character set does.

Not sure about efficiency, but I will suggest
for ch in reverse(str) {
    println(ch)
}

Here is some code for reversing a string that doesn't use reverse(str):
// Reverse String
func myReverse(str:String) -> String {
    var buffer = ""
    for character in str {
        buffer.insert(character, atIndex: buffer.startIndex)
    }
    return buffer
}
myReverse("Paul") // gives "luaP"
Just a little experiment, for what it's worth.
OK, I've learnt how to read the question...
Would this work, Matt?
func ReverseIteration(str:String) {
    func myReverse(str:String) -> String {
        var buffer = ""
        for character in str {
            buffer.insert(character, atIndex: buffer.startIndex)
        }
        return buffer
    }
    // reverse string then iterate forward.
    var newStr = myReverse(str)
    for char in newStr {
        println(char)
        // do some code here
    }
}

this?
extension String {
    var reverse: String {
        var reverseStr = ""
        for character in self {
            reverseStr = String(character) + reverseStr
        }
        return reverseStr
    }
}

What's the best way to convert String into [Character] in Swift?

I would like to run a filter on a string. My first attempt failed as string is not automagically converted to Character[].
var s: String = "abc"
s.filter { $0 != "b" }
If I clumsily convert the String to Character[] with following code, it works as expected. But surely there has to be a neater way?
var cs:Character[] = []
for c in s {
    cs = cs + [c]
}
cs = cs.filter { $0 != "b" }
println(cs)

String conforms to the CollectionType protocol, so you can pass it directly to the function forms of map and filter without converting it at all:
let cs = filter(s) { $0 != "f" }
cs here is an Array of Characters. You can turn it into a String by using the String(seq:) initializer, which constructs a String from any SequenceType of Characters. (SequenceType is a protocol that all lists conform to; for loops use them, among many other things.)
let filteredString = String(seq: cs)
Of course, you can just as easily put those two things in one statement:
let filteredString = String(seq: filter(s) { $0 != "f" })
Or, if you want to make a convenience filter method like the one on Array, you can use an extension:
extension String {
    func filter(includeElement: Character -> Bool) -> String {
        return String(seq: Swift.filter(self, includeElement))
    }
}
(You write it "Swift.filter" so the compiler doesn't think you're trying to recursively call the filter method you're currently writing.)
As long as we're hiding how the filtering is performed, we might as well use a lazy filter, which should avoid constructing the temporary array at all:
extension String {
    func filter(includeElement: Character -> Bool) -> String {
        return String(seq: lazy(self).filter(includeElement))
    }
}

I don't know of a built-in way to do it, but you could write your own filter method for String:
extension String {
    func filter(f: (Character) -> Bool) -> String {
        var ret = ""
        for character in self {
            if (f(character)) {
                ret += character
            }
        }
        return ret
    }
}
If you don't want to use an extension you could do this:
Array(s).filter({ $0 != "b" }).reduce("", combine: +)

You can use this syntax:
var chars = Character[]("abc")
I'm not 100% sure whether the result is an array of Characters, but it works for my use case.
var str = "abc"
var chars = Character[](str)
var result = chars.map { char in "char is \(char)" }
result

The easiest way to convert a Character to a String is string interpolation, which uses a backslash: "\(char)". For example, I have a loop that reverses a string like so:
var identityNumber:String = id
var reversedString = ""
for char in identityNumber {
    reversedString = "\(char)" + reversedString
}
