Scala String and Char Types

val args = "To now was far back saw the *$# giant planet itself, het a won"
Find and sort distinct anagram pairs from "args":
now won
was saw
the het
First I clean up the args and put them in an array.
val argsArray = args.replaceAll("[^a-zA-Z0-9\\s]", "").toLowerCase.split(" ").distinct.sorted
argsArray: Array[String] = Array("", a, back, far, giant, het, itself, now, planet, saw, the, to, was, won)
My idea is to reduce each word to an array of chars, sort it, and then compare. But I get stuck because the following returns the wrong data type: String = [C@2736f24a
for (i <- 0 until argsArray.length - 1) {
  val j = i + 1
  if (argsArray(i).toCharArray.sorted == argsArray(j).toCharArray.sorted) {
    println(argsArray(i).toCharArray + " " + argsArray(j).toCharArray)
  }
}
I assume there are better ways to solve this, but what I really want to learn is how to deal with this data type problem, so please help me solve that and then I will refactor later. Thank you.

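[C@<hash> is just how an Array[Char] is converted to a String on the JVM: the array's type descriptor followed by its hexadecimal identity hash code. Remove the calls to toCharArray from println and it will print the strings you want. The second error in the question's code is the equality check: == on arrays checks whether they are the same object, and since sorted always creates a new array, the two sides are always different objects even when they contain the same elements. Compare them with sameElements instead, or compare the sorted strings (argsArray(i).sorted == argsArray(j).sorted), because == on Strings compares their contents. Finally, the loop only compares each word with its neighbour in the sorted array, so pairs such as now/won, which are not adjacent there, are never compared at all.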
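To make the answer concrete, here is a minimal sketch (AnagramPairs, words, anagramPairs and sameLetters are names made up for illustration, not from the question). It keeps the question's cleanup but splits on runs of whitespace so no empty string appears, compares sorted Strings instead of sorted Char arrays, and compares every word with every later word rather than only with its neighbour:

```scala
object AnagramPairs {
  // Cleans the sentence like the question does, but splits on runs of
  // whitespace and drops empty strings, so "the *$# giant" yields no "" entry.
  def words(text: String): List[String] =
    text.replaceAll("[^a-zA-Z0-9\\s]", "")
      .toLowerCase
      .split("\\s+")
      .filter(_.nonEmpty)
      .distinct
      .sorted
      .toList

  // Two words are anagrams when their sorted characters match.
  // String.sorted returns a String, and == on Strings compares contents.
  def anagramPairs(ws: List[String]): List[(String, String)] =
    for {
      (a, i) <- ws.zipWithIndex
      b <- ws.drop(i + 1) // every later word, not just the next one
      if a.sorted == b.sorted
    } yield (a, b)

  // If you do want to compare Char arrays, use sameElements:
  // == on arrays compares references, and sorted always returns a new array.
  def sameLetters(a: String, b: String): Boolean =
    a.toCharArray.sorted.sameElements(b.toCharArray.sorted)

  def main(args: Array[String]): Unit = {
    val text = "To now was far back saw the *$# giant planet itself, het a won"
    anagramPairs(words(text)).foreach { case (a, b) => println(s"$a $b") }
  }
}
```

For the sentence in the question this prints het the, now won and saw was, one pair per line.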

Related

How to translate this code into python 3?

This code was originally written in Python 2 and I need to translate it into Python 3.
I'm sorry for not sharing enough information:
Also, here's the part where self.D was first assigned:
def __init__(self,instance,transformed,describe1,describe2):
    self.D=[]
    self.instance=instance
    self.transformed=transformed
    self.describe1,self.describe2=describe1,describe2
    self.describe=self.describe1+', '+self.describe2 if self.describe2 else self.describe1
    self.column_num=self.tuple_num=self.view_num=0
    self.names=[]
    self.types=[]
    self.origins=[]
    self.features=[]
    self.views=[]
    self.classify_id=-1
    self.classify_num = 1
    self.classes=[]

def generateViews(self):
    T=map(list,zip(*self.D))
    if self.transformed==0:
        s= int( self.column_num)
        for column_id in range(s):
            f = Features(self.names[column_id],self.types[column_id],self.origins[column_id])
            #calculate min,max for numerical,temporal
            if f.type==Type.numerical or f.type==Type.temporal:
                f.min,f.max=min(T[column_id]),max(T[column_id])
                if f.min==f.max:
                    self.types[column_id]=f.type=Type.none
                    self.features.append(f)
                    continue
            d={}
            #calculate distinct,ratio for categorical,temporal
            if f.type == Type.categorical or f.type == Type.temporal:
                for i in range(self.tuple_num):
                    print([type(self.D[i]) for i in range(self.tuple_num)])
                    if self.D[i][column_id] in d:
                        d[self.D[i][column_id]]+=1
                    else:
                        d[self.D[i][column_id]]=1
                f.distinct = len(d)
                f.ratio = 1.0 * f.distinct / self.tuple_num
                f.distinct_values=[(k,d[k]) for k in sorted(d)]
                if f.type==Type.temporal:
                    self.getIntervalBins(f)
            self.features.append(f)
TypeError: 'map' object is not subscriptable
The snippet you have given is not enough to solve the problem. The problem lies in self.D, which you are trying to subscript using self.D[i]. Please look at the place in your code where self.D is created and make sure that it is an array-like variable that you can subscript.
Edit
Based on your edit, please confirm whether self.D[i] is also array-like for every i in the range used in the code. You can check that simply with:
print([type(self.D[i]) for i in range(self.tuple_num)])
Share the output of this code so that I can help further.
Edit-2
As per your comments and the edited code snippet, the object being subscripted is the output of a map call. In Python 2, map is a function that returns a list. In Python 3, however, map returns a lazy map object (an iterator), which cannot be indexed. In the code shown, this line creates exactly such an object:
T=map(list,zip(*self.D))
and T[column_id] is then subscripted a few lines later.
The simplest way to resolve this is to find every line where a map result is assigned and later subscripted, and wrap the RHS in a list(...) call. Here that means:
T = list(map(list, zip(*self.D)))
If self.D itself is assigned from a map call somewhere else, wrap that RHS in list(...) as well.
Hope this will resolve the issue.
We don't have quite enough information to answer the question, but in Python 3, generator and map objects are not subscriptable. I think the problem may be in your self.D[i] variable: you say that self.D is a list, but it is possible that self.D[i] is a map object.
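In your case, to access the indexes, you can convert it to a list:
list(self.D)[i]
Keep in mind that a map object is an iterator and can only be consumed once, so convert it a single time and store the result (for example, self.D = list(self.D)) instead of calling list(...) inside a loop.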
Or use unpacking to implicitly convert to a list (this may be more condensed, but remember that explicit is better than implicit):
[*self.D[i]]

scala nested for/yield generator to extract substring

I am new to Scala, so please be gentle. My problem at the moment is a syntax error.
(My ultimate goal is to print each group of 3 characters from every string in the list; for now I am merely printing the first 3 characters of every string.)
def do_stuff(): Unit = {
  val s = List[String]("abc", "fds", "654444654")
  for {
    i <- s.indices
    r <- 0 to s(i).length by 3
    println(s(i).substring(0,3))
  } yield {s(i)}
}
do_stuff()
I am getting this error. It is syntax related, but I don't understand it:
Error:(12, 18) ')' expected but '.' found.
println(s(i).substring(0,3))
That code doesn't compile because inside a for-comprehension you can't just put a print statement: every line must be a generator (<-), a filter (if), or a value definition (=). In this case, a dummy value definition solves your problem:
_ = println(s(i).substring(0,3))
EDIT
If you want all combinations of 3 characters from every String, you can use the combinations method from the collections library:
List("abc", "fds", "654444654").flatMap(_.combinations(3).toList)
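Note that combinations(3) picks every 3-character combination, which is not the same as the consecutive groups of 3 described in the question. A minimal sketch of the consecutive version, assuming "each group of 3" means non-overlapping chunks (GroupsOfThree, chunks and chunksFor are hypothetical names), uses grouped(3), with an equivalent for-comprehension that steps through the start indices:

```scala
object GroupsOfThree {
  // Consecutive, non-overlapping chunks of at most 3 characters per string;
  // the last chunk is shorter when the length is not a multiple of 3.
  def chunks(strings: List[String]): List[String] =
    strings.flatMap(_.grouped(3))

  // The same result as a for-comprehension over start indices 0, 3, 6, ...
  // `until` (not `to`) and `slice` (which clamps at the end of the string)
  // avoid the StringIndexOutOfBoundsException that substring(r, r + 3) can throw.
  def chunksFor(strings: List[String]): List[String] =
    for {
      s <- strings
      r <- 0 until s.length by 3
    } yield s.slice(r, r + 3)

  def main(args: Array[String]): Unit =
    chunks(List("abc", "fds", "654444654")).foreach(println)
}
```

For the list in the question both methods give abc, fds, 654, 444, 654.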

Scala, user input till only newline is given

I am trying to read multiple user inputs and print them in Scala IDE.
I have tried this piece of code:
println(scala.io.StdIn.readLine())
which works, as the IDE takes my input and then prints it on the next line, but this works only for a single input.
I want the code to keep taking inputs until only a newline is entered, for example:
1
2
3
So I decided I needed an iterator for the input, which led me to try the following two lines of code separately:
var in = Iterator.continually{ scala.io.StdIn.readLine() }.takeWhile { x => x != null}
and
var in = io.Source.stdin.getLines().takeWhile { x => x != null}
Unfortunately, neither of them worked, as the IDE is not taking my input at all.
You're really close.
val in = Iterator.continually(io.StdIn.readLine()).takeWhile(_.nonEmpty).toList
This will read input until an empty line is entered and save the input in a List[String]. The toList is needed because an Iterator is lazy: readLine won't be called until the next element is actually required, and your versions never asked for any element. Converting to a List forces all the elements of the Iterator. Also note that readLine returns an empty string for an empty line, not null (it returns null only at the end of input), so takeWhile { x => x != null } would never stop on an empty line.
update
As @vossad01 has pointed out, this can be made safer against unexpected input, such as the end of input, where readLine returns null:
val in = Iterator.continually(io.StdIn.readLine())
  .takeWhile(Option(_).fold(false)(_.nonEmpty))
  .toList
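The two points above (forcing the lazy Iterator with toList, and stopping on either an empty line or null) can be combined into a small helper. This is a sketch: readLines and its next parameter are hypothetical names introduced so the logic can be tried without a console.

```scala
import scala.io.StdIn

object ReadUntilEmpty {
  // Pulls lines from `next` until it yields an empty line or null (end of input).
  // `next` is a hypothetical parameter so the logic can run without a console;
  // in a real program you would pass () => StdIn.readLine().
  def readLines(next: () => String): List[String] =
    Iterator.continually(next())
      .takeWhile(line => line != null && line.nonEmpty)
      .toList // forces the lazy Iterator, so the reads actually happen

  def main(args: Array[String]): Unit =
    readLines(() => StdIn.readLine()).foreach(println)
}
```

The explicit null check plays the same role as the Option(_).fold(false)(_.nonEmpty) version; which one reads better is a matter of taste.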

Scala Performance Issue with mutable List (LinkedList)

I have the following code snippets. The code reads the system (Linux) English dictionary file and keeps it in memory in a List.
Code 1 (with a mutable list):
import scala.io.Source
val word = scala.collection.mutable.LinkedList[String]("init");
for (line <- Source.fromFile("/usr/share/dict/words").getLines()) {
  val s: String = line.trim()
  if ( // some checks
  ) {
    word append scala.collection.mutable.LinkedList[String](s)
  }
}
Code 2 (with an immutable list):
import scala.io.Source
var word = List[String]()
for (line <- Source.fromFile("/usr/share/dict/words").getLines()) {
  val s: String = line.trim()
  if ( // some checks
  ) {
    word ::= s
  }
}
Code 2 returns almost immediately, but Code 1 takes forever.
Can anyone help me understand why it takes so much time with the mutable list? Should we use mutable collections at all, or am I doing something wrong?
Scala version used: 2.10.3
Thanks in advance for your help.
word append scala.collection.mutable.LinkedList[String](s)
traverses the whole word list and then appends the items of the other list at the end.
word ::= s
prepends s to the front of the word list and assigns the new list to the word variable.
Appending to the end of a linked list is always expensive compared to adding an item to the front.
In the first example, you are adding to the end of a list repeatedly (append). This takes time on the order of the length of the list. In the second example, you are adding to the beginning of a list (::). This takes constant time. So the first example has an execution time that increases with the square of the number of lines in the file, and the second has an execution time that increases linearly with the length of the file.
This is due to the nature of linked lists, the data structure underlying both the immutable List and the mutable LinkedList: linked lists are fast to access at the front and slow to access at the back.
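Two linear-time alternatives, sketched under the assumption that the question's "// some checks" can be stood in for by a hypothetical accept predicate: prepend to an immutable List and reverse once at the end (Code 2 as written leaves the words in reverse file order), or append to a ListBuffer, which supports constant-time appends. Lines are passed in as an Iterator so the sketch runs without the dictionary file.

```scala
import scala.collection.mutable.ListBuffer

object CollectWords {
  // Hypothetical stand-in for the question's "// some checks": keep non-empty lines.
  def accept(s: String): Boolean = s.nonEmpty

  // Immutable version: O(1) prepend per line, then one O(n) reverse
  // to restore file order. Total cost is linear in the number of lines.
  def withPrepend(lines: Iterator[String]): List[String] = {
    var words = List.empty[String]
    for (line <- lines) {
      val s = line.trim
      if (accept(s)) words ::= s
    }
    words.reverse
  }

  // Mutable version: ListBuffer keeps a reference to its last cell,
  // so += appends in constant time.
  def withListBuffer(lines: Iterator[String]): List[String] = {
    val buf = ListBuffer.empty[String]
    for (line <- lines) {
      val s = line.trim
      if (accept(s)) buf += s
    }
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    // With the real file this would be:
    // scala.io.Source.fromFile("/usr/share/dict/words").getLines()
    println(withListBuffer(Iterator(" apple", "", "banana ", "cherry")))
  }
}
```

ListBuffer is also the usual replacement for mutable.LinkedList, which was deprecated in Scala 2.11 and removed in 2.13.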

Why does the length function seem to delete the list?

I have this code snippet:
import scala.io.Source
for (f <- file_list) {
  val file_name = path + "\\" + f + ".txt"
  val line_list = Source.fromFile(file_name).getLines()
  println(file_name + ": " + line_list.length)
  println(file_name + ": " + line_list.length)
  total_number_lines += line_list.size
}
I have a list of files. For each of them I open it, load it as a list of its lines, and then count the number of lines in the list.
The first call to line_list.length gives the right number of lines, but the second one always returns zero. In fact, after the length function is executed, line_list seems to be empty.
I really cannot understand why that is.
What am I missing?
Source.getLines() returns an Iterator[String], not a collection, so calling .length on it will completely consume it.
You can use Source.fromFile(file_name).getLines().toList if you want to go through it several times.
getLines() returns an Iterator[String], and you can only traverse an iterator once. Calling length exhausts the iterator, so the subsequent calls to length and size happen after the end has been reached, which is why it appears empty. From the Iterator documentation:
It is of particular importance to note that, unless stated otherwise,
one should never use an iterator after calling a method on it. The two
most important exceptions are also the sole abstract methods: next and
hasNext.
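Following the toList suggestion, here is a minimal sketch that reads each file into a List once (closing the file afterwards) so its length can be taken repeatedly. The readLines helper and the temporary sample file are assumptions for illustration; the last part reproduces the exhausted-Iterator behaviour from the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object CountLines {
  // Reads all lines into a List once and closes the file.
  // Unlike the Iterator from getLines(), the List can be traversed many times.
  def readLines(path: Path): List[String] = {
    val source = Source.fromFile(path.toFile)
    try source.getLines().toList
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // A throwaway sample file stands in for the question's file_list entries.
    val tmp = Files.createTempFile("lines", ".txt")
    Files.write(tmp, "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8))
    try {
      val lines = readLines(tmp)
      println(s"$tmp: ${lines.length}") // 3
      println(s"$tmp: ${lines.length}") // still 3: a List is not consumed

      val source = Source.fromFile(tmp.toFile)
      try {
        val iter = source.getLines()
        println(iter.length) // 3
        println(iter.length) // 0: the Iterator is already exhausted
      } finally source.close()
    } finally Files.delete(tmp)
  }
}
```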