Converting arabic numbers into chinese financial numbers - scala

I am trying to write a function, in a functional style, that receives a normal Int value, translates it to financial Chinese numerals and returns a String, for example: 301 = 三百零一. To begin, I have two maps, one with every digit from 0 to 9, and the other with the powers of ten, from 10 to 1000000.
val digits: Map[Int, String] = Map(0 -> "〇", 1 -> "壹", 2 -> "貳", 3 -> "參", 4 -> "肆", 5 -> "伍", 6 -> "陸", 7 -> "柒", 8 -> "捌", 9 -> "玖");
val exponent: Map[Int, String] = Map(1 -> "", 10 -> "拾", 100 -> "佰", 1000 -> "仟", 10000 -> "萬", 100000 -> "億", 1000000 -> "兆");
For those who don't know, here is a short explanation of how Chinese numbers work. If you already know, feel free to skip this paragraph. To write a large number, for example 5000, you write the symbols for 5 and 1000 (伍仟) to indicate that you are multiplying 5 * 1000. If you have 539, it's 5*100 + 3*10 + 9. This would be 伍佰參拾玖. Lastly, if the number has 0's between multiplications, no matter how many there are, you write only one 0 between the other characters. For example: 501 = 5*100 + 1. This is 伍佰〇壹. One last example for clarification: 50103 = 5*10000 + 1*100 + 3. This is 伍萬〇壹佰〇參.
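To make the decomposition concrete, here is a minimal sketch that splits a number into its (digit, power of ten) terms and drops the zero digits. The names Decompose and terms are made up for illustration and are not part of my actual code.

```scala
object Decompose {
  // Splits n into (digit, power-of-ten) terms, most significant first,
  // skipping zero digits. Hypothetical helper, for illustration only.
  def terms(n: Int): List[(Int, Int)] = {
    require(n >= 0, "only non-negative numbers are handled")
    val ds = n.toString.map(_.asDigit).toList
    // 1, 10, 100, ... one power per digit, then reversed to line up with ds
    val powers = Iterator.iterate(1)(_ * 10).take(ds.length).toList.reverse
    ds.zip(powers).filter { case (digit, _) => digit != 0 }
  }
}
```

For 539 this gives the terms 5*100, 3*10 and 9*1; for 50103 it gives 5*10000, 1*100 and 3*1.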
So what I could do, is the following:
def format(unit: Int): String = {
val l = unit.toString.map(_.asDigit).toList
if(l.isEmpty) ""
else if(l.tail.isEmpty) digits(l.head)
else digits(l.head) + format(l.tail.mkString.toInt)
}
This translates the characters one by one. For example:
format(135) // "壹參伍"
And I don't know how to continue.

If I understood your problem correctly you can do something like this:
// digitsMap and exponentsMap are assumed here to be the question's two maps
// (digits and exponent) under new names, so they do not clash with the local values.
def toChineseFinancial(number: Int): String = {
  val digits = number.toString.iterator.map(_.asDigit).toList
  val length = digits.length
  val exponents = List.tabulate(length)(n => math.pow(10, n).toInt)
  val (sb, _) =
    digits
      .iterator
      .zip(exponents.reverseIterator)
      .foldLeft(new collection.mutable.StringBuilder(length * 2) -> false) {
        // flag remembers that at least one zero was skipped since the last non-zero digit
        case ((sb, flag), (digit, exp)) =>
          if (digit == 0) sb -> true
          else if (flag) sb.append("〇").append(digitsMap(digit)).append(exponentsMap(exp)) -> false
          else sb.append(digitsMap(digit)).append(exponentsMap(exp)) -> false
      }
  sb.result()
}
Note: I used mutable.StringBuilder because building Strings is somewhat expensive, but if you want to avoid any kind of mutability you can easily replace it with a normal String.

I would expand the exponents Map using a simple case class for its values to cover:
numbers of magnitude 1, 10, 10^2, ..., 10^12
10's, 100's and 1000's of "萬" (10^4), "億" (10^8) and "兆" (10^12)
as shown below:
case class CNU(unit: String, factor: Int)
val exponents: Map[Long, CNU] = Map(
1L -> CNU("", 1),
10L -> CNU("拾", 1),
100L -> CNU("佰", 1),
1000L -> CNU("仟", 1),
10000L ->CNU("萬", 1),
100000L -> CNU("萬", 10),
1000000L -> CNU("萬", 100),
10000000L -> CNU("萬", 1000),
100000000L -> CNU("億", 1),
1000000000L -> CNU("億", 10),
10000000000L -> CNU("億", 100),
100000000000L -> CNU("億", 1000),
1000000000000L -> CNU("兆", 1),
10000000000000L -> CNU("兆", 10),
100000000000000L -> CNU("兆", 100),
1000000000000000L -> CNU("兆", 1000)
)
Creating the method:
val digits: Map[Int, String] = Map(
0 -> "〇", 1 -> "壹", 2 -> "貳", 3 -> "參", 4 -> "肆",
5 -> "伍", 6 -> "陸", 7 -> "柒", 8 -> "捌", 9 -> "玖"
)
def toChineseNumber(num: Long): String = {
  val s = num.toString
  val ds = s.map(_.asDigit).zip(s.length-1 to 0 by -1)
  ds.foldRight(List.empty[String], 0){ case ((d, i), (accList, dPrev)) =>
    val cnu = exponents(math.pow(10, i).toLong)
    val digit =
      if (d == 0) {
        if (dPrev != 0 || num == 0) digits(d) else ""
      }
      else
        digits(d)
    val unit =
      if (d == 0)
        ""
      else {
        if (cnu.factor == 1) cnu.unit else exponents(cnu.factor).unit
      }
    ((digit + unit) :: accList, d)
  }._1.mkString
}
Note that method foldRight is used to traverse and process the input number from right to left and dPrev in the tuple-accumulator is for carrying digits across iterations for handling repetitive 0's.
Testing it:
toChineseNumber(50)
// res1: String = 伍拾
toChineseNumber(30001)
// res2: String = 參萬〇壹
toChineseNumber(1023405)
// res3: String = 壹佰〇貳萬參仟肆佰〇伍
toChineseNumber(2233007788L)
// res4: String = 貳拾貳億參仟參佰〇柒仟柒佰捌拾捌
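In the last result above no 萬 appears, because the method only emits a group unit when the digit sitting exactly at 10^4, 10^8 or 10^12 is non-zero. A possible alternative is sketched below, assuming the group units 萬, 億 and 兆 should be written once per non-empty four-digit group. The names GroupedChineseNumber, renderGroup and toChineseGrouped are hypothetical, and the zero-marking rule between groups is my assumption rather than something stated in the question.

```scala
object GroupedChineseNumber {
  private val digitChars = "〇壹貳參肆伍陸柒捌玖"
  private val smallUnits = Vector("", "拾", "佰", "仟") // positions inside a 4-digit group
  private val groupUnits = Vector("", "萬", "億", "兆") // 10^0, 10^4, 10^8, 10^12
  private val pows = Vector(1, 10, 100, 1000)

  // Renders one group in 1..9999; inner runs of zeros collapse to a single 〇.
  private def renderGroup(g: Int): String = {
    val sb = new StringBuilder
    var pendingZero = false
    for (pos <- 3 to 0 by -1) {
      val d = (g / pows(pos)) % 10
      if (d == 0) { if (sb.nonEmpty) pendingZero = true }
      else {
        if (pendingZero) sb.append('〇')
        pendingZero = false
        sb.append(digitChars(d)).append(smallUnits(pos))
      }
    }
    sb.toString
  }

  def toChineseGrouped(num: Long): String = {
    require(num >= 0 && num < 10000000000000000L, "supports 0 up to 10^16 - 1")
    if (num == 0) "〇"
    else {
      // Four-digit groups, least significant first.
      val groups = Iterator.iterate(num)(_ / 10000).takeWhile(_ > 0)
        .map(n => (n % 10000).toInt).toVector
      val sb = new StringBuilder
      var skippedGroup = false
      for (i <- groups.indices.reverse) {
        val g = groups(i)
        if (g == 0) skippedGroup = true
        else {
          // Assumed rule: mark a gap when a whole group was skipped
          // or this group has leading zeros.
          if (sb.nonEmpty && (skippedGroup || g < 1000)) sb.append('〇')
          skippedGroup = false
          sb.append(renderGroup(g)).append(groupUnits(i))
        }
      }
      sb.toString
    }
  }
}
```

With this sketch, toChineseGrouped(2233007788L) yields 貳拾貳億參仟參佰萬柒仟柒佰捌拾捌, while the first three test inputs above give the same strings as before.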

Related

How to create an Iterable of semi-homogenous Integers that sum to a max in Scala?

In Scala, given a maximum sum value and a maximum element value, how can an Iterable of elements be created such that the elements add up to the maximum sum? The Iterable should have the smallest size possible.
As an example, given
val maxSum = 47
val maxElementValue = 10
How can the following Iterable be created:
Iterable(10, 10, 10, 10, 7) //sum to 47
Other examples:
val maxSum = 9
val maxElementValue = 10
Iterable(9)
val maxSum = 11
val maxElementValue = 5
Iterable(5, 5, 1)
Thank you in advance for your consideration and response.
If you are on Scala 2.13 you can use unfold:
def elementGenerator(maxSum: Int, maxElementValue: Int): List[Int] =
Iterator.unfold(maxSum) { remainingSum =>
if (remainingSum == 0)
None
else if (remainingSum <= maxElementValue)
Some(remainingSum -> 0)
else
Some(maxElementValue -> (remainingSum - maxElementValue))
}.toList
You may also consider just returning the Iterator or using a LazyList if you do not need to keep all the elements right now but just know how to generate them.
How about
((maxSum % maxElementValue) :: List.fill(maxSum / maxElementValue)(maxElementValue)).filterNot(_ == 0)
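The same quotient/remainder idea can also be written so that the remainder comes last, as in the question's examples, and so that nothing is materialised until it is consumed. This is only a sketch; the names Chunks and chunks are hypothetical, and it assumes maxSum is non-negative and maxElementValue is positive.

```scala
object Chunks {
  // Full-size elements first, then the remainder (if any) as the last element.
  def chunks(maxSum: Int, maxElementValue: Int): Iterator[Int] = {
    require(maxSum >= 0 && maxElementValue > 0, "assumed preconditions")
    val full = Iterator.fill(maxSum / maxElementValue)(maxElementValue)
    val rest = maxSum % maxElementValue
    if (rest == 0) full else full ++ Iterator.single(rest)
  }
}
```

Calling Chunks.chunks(47, 10).toList gives List(10, 10, 10, 10, 7).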
You can use .unfold for this:
// Use `Iterator.unfold` if you want an Iterator instead of a List
List.unfold(maxSum) { remaining =>
if (remaining <= 0) None
else {
val nextElement = math.min(maxElementValue, remaining)
Some(nextElement -> (remaining - nextElement))
}
}
returns
List(10, 10, 10, 10, 7)
A recursive function can do the trick
@scala.annotation.tailrec
def elementGenerator(maxSum: Int, maxElementValue: Int, currentElements: Seq[Int] = Seq.empty[Int]): Seq[Int] =
  if (maxSum == 0 || maxElementValue <= 0)
    currentElements
  else if (maxSum < maxElementValue)
    currentElements :+ maxSum
  else
    elementGenerator(maxSum - maxElementValue, maxElementValue, currentElements :+ maxElementValue)
Presented mostly as a "well, it works but you really don't want to do it this way" alternative :)
Iterator.fill(maxSum)(1).grouped(maxElementValue).map(_.sum)

How to pair each element of a Seq with the rest?

I'm looking for an elegant way to combine every element of a Seq with the rest for a large collection.
Example: Seq(1,2,3).someMethod should produce something like
Iterator(
(1,Seq(2,3)),
(2,Seq(1,3)),
(3,Seq(1,2))
)
Order of elements doesn't matter. It doesn't have to be a tuple, a Seq(Seq(1),Seq(2,3)) is also acceptable (although kinda ugly).
Note the emphasis on large collection (which is why my example shows an Iterator).
Also note that this is not combinations.
Ideas?
Edit:
In my use case, the numbers are expected to be unique. If a solution can eliminate the dupes, that's fine, but not at additional cost. Otherwise, dupes are acceptable.
Edit 2: In the end, I went with a nested for-loop, and skipped the case when i == j. No new collections were created. I upvoted the solutions that were correct and simple ("simplicity is the ultimate sophistication" - Leonardo da Vinci), but even the best ones are quadratic just by the nature of the problem, and some create intermediate collections by usage of ++ that I wanted to avoid because the collection I'm dealing with has close to 50000 elements, 2.5 billion when quadratic.
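For reference, the nested loop from Edit 2 looks roughly like the sketch below. The names PairWithRest and foreachWithRest are made up for this post, and it assumes an indexed collection so that xs(i) is cheap.

```scala
object PairWithRest {
  // Calls f(x, y) for every ordered pair of elements taken from distinct indices.
  // No intermediate collections are allocated; the work is still quadratic.
  def foreachWithRest[A](xs: IndexedSeq[A])(f: (A, A) => Unit): Unit = {
    var i = 0
    while (i < xs.length) {
      var j = 0
      while (j < xs.length) {
        if (i != j) f(xs(i), xs(j)) // skip pairing an element with itself
        j += 1
      }
      i += 1
    }
  }
}
```

For Vector(1, 2, 3) the callback sees (1,2), (1,3), (2,1), (2,3), (3,1), (3,2), in that order.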
The following code has constant runtime (it does everything lazily), but accessing every element of the resulting collections has constant overhead (when accessing each element, an index shift must be computed every time):
def faceMap(i: Int)(j: Int) = if (j < i) j else j + 1
def facets[A](simplex: Vector[A]): Seq[(A, Seq[A])] = {
val n = simplex.size
(0 until n).view.map { i => (
simplex(i),
(0 until n - 1).view.map(j => simplex(faceMap(i)(j)))
)}
}
Example:
println("Example: facets of a 3-dimensional simplex")
for ((i, v) <- facets((0 to 3).toVector)) {
println(i + " -> " + v.mkString("[", ",", "]"))
}
Output:
Example: facets of a 3-dimensional simplex
0 -> [1,2,3]
1 -> [0,2,3]
2 -> [0,1,3]
3 -> [0,1,2]
This code expresses everything in terms of simplices, because "omitting one index" corresponds exactly to the face maps for a combinatorially described simplex. To further illustrate the idea, here is what the faceMap does:
println("Example: how `faceMap(3)` shifts indices")
for (i <- 0 to 5) {
println(i + " -> " + faceMap(3)(i))
}
gives:
Example: how `faceMap(3)` shifts indices
0 -> 0
1 -> 1
2 -> 2
3 -> 4
4 -> 5
5 -> 6
The facets method uses the faceMaps to create a lazy view of the original collection that omits one element by shifting the indices by one starting from the index of the omitted element.
If I understand what you want correctly, in terms of handling duplicate values (i.e., duplicate values are to be preserved), here's something that should work. Given the following input:
import scala.util.Random
val nums = Vector.fill(20)(Random.nextInt)
This should get you what you need:
for (i <- Iterator.from(0).take(nums.size)) yield {
nums(i) -> (nums.take(i) ++ nums.drop(i + 1))
}
On the other hand, if you want to remove dups, I'd convert to Sets:
val numsSet = nums.toSet
for (num <- nums) yield {
num -> (numsSet - num)
}
seq.iterator.map { case x => x -> seq.filter(_ != x) }
This is quadratic, but I don't think there is much you can do about that: at the end of the day, creating a collection is linear, and you are going to need N of them.
import scala.annotation.tailrec
def prems(s: Seq[Int]): Map[Int, Seq[Int]] = {
  @tailrec
  def p(prev: Seq[Int], s: Seq[Int], res: Map[Int, Seq[Int]]): Map[Int, Seq[Int]] = s match {
    case x :: Nil => res + (x -> prev)
    case x :: xs => p(x +: prev, xs, res + (x -> (prev ++ xs)))
  }
  p(Seq.empty[Int], s, Map.empty[Int, Seq[Int]])
}
prems(Seq(1,2,3,4))
res0: Map[Int,Seq[Int]] = Map(1 -> List(2, 3, 4), 2 -> List(1, 3, 4), 3 -> List(2, 1, 4),4 -> List(3, 2, 1))
I think you are looking for permutations. You can map the resulting lists into the structure you are looking for:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toList
res49: List[(Int, Seq[Int])] = List((1,List(2, 3)), (1,List(3, 2)), (2,List(1, 3)), (2,List(3, 1)), (3,List(1, 2)), (3,List(2, 1)))
Note that the final toList call is only there to trigger the evaluation of the expressions; otherwise, the result is an iterator as you asked for.
In order to get rid of the duplicate heads, toMap seems like the most straight-forward approach:
Seq(1,2,3).permutations.map(p => (p.head, p.tail)).toMap
res50: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(3, 2), 2 -> List(3, 1), 3 -> List(2, 1))

How do you pad a string in Scala with a character for missing elements in a Vector?

If I have a sparse list of numbers:
Vector(1,3,7,8,9)
and I need to generate a string of a fixed size which replaces the 'missing' numbers with a given character that might look like this:
1.3...789
How would I do this in Scala?
I'm not sure of the range of the integers, so I'm assuming they may not fit into a char and used strings instead. Try this:
val v = Vector(1,3,7,8,9)
val fixedStr = ( v.head to v.last )
.map( i => if (v.contains(i)) i.toString else "." )
.mkString
If you are only dealing with single digits then you may change the strings to chars in the above.
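A rough sketch of that single-digit variant follows. It assumes every value is between 0 and 9 and uses a Set for the membership test; the names PadDigits and padDigits are hypothetical.

```scala
object PadDigits {
  // Single-digit variant: builds the result from Chars instead of Strings.
  def padDigits(v: Vector[Int], placeholder: Char = '.'): String = {
    require(v.nonEmpty && v.forall(d => d >= 0 && d <= 9), "assumes single digits")
    val present = v.toSet // constant-time lookup instead of a linear v.contains
    (v.min to v.max)
      .map(i => if (present(i)) Character.forDigit(i, 10) else placeholder)
      .mkString
  }
}
```

PadDigits.padDigits(Vector(1, 3, 7, 8, 9)) returns 1.3...789.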
-- edit --
ok, so I couldn't help myself and addressed the issue of sparse vector and wanted to change it to use the sliding function. Figured it does no good sitting on my PC so sharing here:
v.sliding(2)
.map( (seq) => if (seq.size == 2) seq else seq ++ seq ) //normalize window to size 2
.foldLeft( new StringBuilder )( (sb, seq) => //fold into stringbuilder
seq match { case Seq(a,b) => sb.append(a).append( "." * (b - a - 1) ) } )
.append( v.last )
.toString
One way to do this is using sliding and pattern matching:
def mkNiceString(v: Vector[Int]) = {
v.sliding(2).map{
case Seq(a) => ""
case Seq(a,b) =>
val gap = b-a;
a.toString + (if(gap>1) "." * (gap-1) else "")
}.mkString + v.last
}
In the REPL:
scala> mkNiceString(Vector(1,3,7,8,9,11))
res22: String = 1.3...789.11
If the vector is sparse, this will be more efficient than checking the range between the first and the last number.
def padVector(numbers: Vector[Int], placeHolder: String) = {
def inner(nums: Vector[Int], prevNumber: Int, acc: String) : String =
if (nums.length == 0) acc
else (nums.head - prevNumber) match {
// the difference is 1 -> no gap between this and previous number
case 1 => inner(nums.tail, nums.head, acc + nums.head)
// gap between numbers -> add placeholder x times
case x => inner(nums.tail, nums.head, acc + (placeHolder * (x-1)) + nums.head)
}
if (numbers.length == 0) ""
else inner(numbers.tail, numbers.head, numbers.head.toString)
}
Output:
scala> padVector(Vector(1,3,7,8,9), ".")
res4: String = 1.3...789

Slow IO with large data

I'm trying to find a better way to do this, as it could take years to compute! I need to compute a map which is too large to fit in memory, so I am trying to make use of IO as follows.
I have a file that contains a list of Ints, about 1 million of them. I have another file that contains data about my (500,000) document collection. I need to calculate a function of the count, for every Int in the first file, of how many documents (lines in the second) it appears in. Let me give an example:
File1:
-1
1
2
etc...
file2:
E01JY3-615, CR93E-177 , [-1 -> 2,1 -> 1,2 -> 2,3 -> 2,4 -> 2,8 -> 2,... // truncated for brevity]
E01JY3-615, CR93E-177 , [1 -> 2,2 -> 2,4 -> 2,5 -> 2,8 -> 2,... // truncated for brevity]
etc...
Here is what I have tried so far
import java.io.{BufferedWriter, File, FileWriter}
import scala.io.Source

def printToFile(f: java.io.File)(op: java.io.PrintWriter => Unit): Unit = {
  val p = new java.io.PrintWriter(new BufferedWriter(new FileWriter(f)))
  try {
    op(p)
  } finally {
    p.close()
  }
}
def binarySearch(array: Array[String], word: Int):Boolean = array match {
case Array() => false
case xs => if (array(array.size/2).split("->")(0).trim().toInt == word) {
return true
} else if (array(array.size/2).split("->")(0).trim().toInt > word){
return binarySearch(array.take(array.size/2), word)
} else {
return binarySearch(array.drop(array.size/2 + 1), word)
}
}
var v = Source.fromFile("vocabulary.csv").getLines()
printToFile(new File("idf.csv"))(out => {
v.foreach(word =>{
var docCount: Int = 0
val s = Source.fromFile("documents.csv").getLines()
s.foreach(line => {
val split = line.split("\\[")
val fpStr = split(1).init
docCount = if (binarySearch(fpStr.split(","), word.trim().toInt)) docCount + 1 else docCount
})
val output = word + ", " + math.log10(500448.0 / (docCount + 1)) // 500448.0 forces floating-point division
out.println(output)
println(output)
})
})
There must be a faster way to do this, can anyone think of a way?
From what I understand of your code, you are trying to find every word in the dictionary in the document list.
Hence, you are making N*M comparisons, where N is the number of words (in the dictionary with integers) and M is the number of documents in the document list. Instantiating to your values, you are trying to calculate 10^6 * 5*10^5 comparisons which is 5*10^11. Unfeasible.
Why not create a mutable map with all the integers in the dictionary as keys (1000000 ints in memory is roughly 3.8M from my measurements) and pass through the document list only once, where for each document you extract the integers and increment the respective count values in the map (for which the integer is key)?
Something like this:
import collection.mutable.Map
import scala.util.Random._
val maxValue = 1000000
val documents = collection.mutable.Map[String,List[(Int,Int)]]()
// util function just to insert fake input; disregard
def provideRandom(key:String) ={ (1 to nextInt(4)).foreach(_ => documents.put(key,(nextInt(maxValue),nextInt(maxValue)) :: documents.getOrElse(key,Nil)))}
// inserting fake documents into our fake Document map
(1 to 500000).foreach(_ => {val key = nextString(5); provideRandom(key)})
// word count map
val wCount = collection.mutable.Map[Int,Int]()
// Counting the numbers and incrementing them in the map
documents.foreach(doc => doc._2.foreach(k => wCount.put(k._1, (wCount.getOrElse(k._1,0)+1))))
scala> wCount
res5: scala.collection.mutable.Map[Int,Int] = Map(188858 -> 1, 178569 -> 2, 437576 -> 2, 660074 -> 2, 271888 -> 2, 721076 -> 1, 577416 -> 1, 77760 -> 2, 67471 -> 1, 804106 -> 2, 185283 -> 1, 41623 -> 1, 943946 -> 1, 778258 -> 2...
The result is a map whose keys are the numbers from the dictionary and whose values are the number of times each appears in the document list.
This is oversimplified since:
I don't verify that the number exists in the dictionary, although you only need to initialise the map with the dictionary values and then increment a count only if the map has that key;
I don't do any IO, which speeds up the whole thing.
This way you only pass through the documents once, which makes the task feasible again.
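Applied to the line format shown in the question, the single pass could look something like the sketch below. The parsing is an assumption based on the two sample lines only, and the names IdfSketch, keysOf, documentCounts and idf are hypothetical. With real files, the lines iterator would come from Source.fromFile("documents.csv").getLines().

```scala
object IdfSketch {
  // Extracts the integer keys from one line shaped like
  //   "id1, id2 , [k1 -> c1,k2 -> c2,...]"   (format assumed from the samples)
  def keysOf(line: String): Set[Int] = {
    val start = line.indexOf('[')
    val end = line.lastIndexOf(']')
    if (start < 0 || end <= start) Set.empty
    else line.substring(start + 1, end)
      .split(",")
      .iterator
      .map(_.split("->")(0).trim) // keep the part before "->"
      .filter(_.nonEmpty)
      .map(_.toInt)
      .toSet
  }

  // One pass over the documents: for each vocabulary entry,
  // count how many lines contain it.
  def documentCounts(lines: Iterator[String], vocabulary: Set[Int]): Map[Int, Int] = {
    val counts = scala.collection.mutable.Map.empty[Int, Int].withDefaultValue(0)
    for (line <- lines; k <- keysOf(line) if vocabulary(k)) counts(k) += 1
    counts.toMap
  }

  // Same formula as in the question, using floating-point division.
  def idf(totalDocs: Int, docCount: Int): Double =
    math.log10(totalDocs.toDouble / (docCount + 1))
}
```

The vocabulary file is read once into a Set, the documents file is streamed once, and the idf values are then written out from the resulting map.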

Integer partitioning in Scala

Given n ( say 3 people ) and s ( say 100$ ), we'd like to partition s among n people.
So we need all possible n-tuples that sum to s
My Scala code below:
def weights(n:Int,s:Int):List[List[Int]] = {
List.concat( (0 to s).toList.map(List.fill(n)(_)).flatten, (0 to s).toList).
combinations(n).filter(_.sum==s).map(_.permutations.toList).toList.flatten
}
println(weights(3,100))
This works for small values of n. ( n=1, 2, 3 or 4).
Beyond n=4, it takes a very long time, practically unusable.
I'm looking for ways to rework my code using lazy evaluation/ Stream.
My requirements: must work for n up to 10.
Warning : The problem gets really big really fast. My results from Matlab -
---For s =100, n = 1 thru 5 results are ---
n=1 :1 combinations
n=2 :101 combinations
n=3 :5151 combinations
n=4 :176851 combinations
n=5: 4598126 combinations
---
You need dynamic programming, or memoization. Same concept, anyway.
Let's say you have to divide s among n. Recursively, that's defined like this:
def permutations(s: Int, n: Int): List[List[Int]] = n match {
case 0 => Nil
case 1 => List(List(s))
case _ => (0 to s).toList flatMap (x => permutations(s - x, n - 1) map (x :: _))
}
Now, this will STILL be slow as hell, but there's a catch here... you don't need to recompute permutations(s, n) for numbers you have already computed. So you can do this instead:
val memoP = collection.mutable.Map.empty[(Int, Int), List[List[Int]]]
def permutations(s: Int, n: Int): List[List[Int]] = {
def permutationsWithHead(x: Int) = permutations(s - x, n - 1) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoP getOrElseUpdate ((s, n),
(0 to s).toList flatMap permutationsWithHead)
}
}
And this can be even further improved, because it will compute every permutation. You only need to compute every combination, and then permute that without recomputing.
To compute every combination, we can change the code like this:
val memoC = collection.mutable.Map.empty[(Int, Int, Int), List[List[Int]]]
def combinations(s: Int, n: Int, min: Int = 0): List[List[Int]] = {
def combinationsWithHead(x: Int) = combinations(s - x, n - 1, x) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoC getOrElseUpdate ((s, n, min),
(min to s / 2).toList flatMap combinationsWithHead)
}
}
Running combinations(100, 10) is still slow, given the sheer number of combinations alone. The permutations for each combination can be obtained by simply calling .permutations on the combination.
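To tie the two steps together, here is a small self-contained sketch that generates the non-decreasing combinations lazily and expands each one with .permutations. It is not memoized, and the names LazyWeights, combos and weights are made up for illustration; it also bounds the head by s / n rather than s / 2.

```scala
object LazyWeights {
  // Non-decreasing n-tuples of non-negative integers (each >= min) that sum to s.
  def combos(s: Int, n: Int, min: Int = 0): Iterator[List[Int]] =
    if (n == 1) {
      if (s >= min) Iterator.single(List(s)) else Iterator.empty
    } else {
      // The smallest element of such a tuple can be at most s / n.
      (min to s / n).iterator.flatMap(x => combos(s - x, n - 1, x).map(x :: _))
    }

  // Every ordered n-tuple, produced lazily; .permutations yields each
  // distinct ordering of a combination exactly once.
  def weights(s: Int, n: Int): Iterator[List[Int]] = {
    require(s >= 0 && n >= 1, "assumed preconditions")
    combos(s, n).flatMap(_.permutations)
  }
}
```

For example, LazyWeights.weights(100, 3).size gives 5151, matching the count in the question.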
Here's a quick and dirty Stream solution:
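// Note: in this snippet n is the total to distribute and s is the number of parts,
// i.e. the reverse of the naming used in the question.
def weights(n: Int, s: Int) = (1 until s).foldLeft(Stream(Nil: List[Int])) {
  (a, _) => a.flatMap(c => Stream.range(0, n - c.sum + 1).map(_ :: c))
}.map(c => (n - c.sum) :: c)
It handles six parts (n = 6 in the question's naming) in about 15 seconds on my machine: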
scala> var x = 0
scala> weights(100, 6).foreach(_ => x += 1)
scala> x
res81: Int = 96560646
As a side note: by the time you get to n = 10, there are 4,263,421,511,271 of these things. That's going to take days just to stream through.
My solution to this problem; it can compute the results for n up to 6:
object Partition {
implicit def i2p(n: Int): Partition = new Partition(n)
def main(args : Array[String]) : Unit = {
for(n <- 1 to 6) println(100.partitions(n).size)
}
}
class Partition(n: Int){
def partitions(m: Int):Iterator[List[Int]] = new Iterator[List[Int]] {
val nums = Array.ofDim[Int](m)
nums(0) = n
var hasNext = m > 0 && n > 0
override def next: List[Int] = {
if(hasNext){
val result = nums.toList
var idx = 0
while(idx < m-1 && nums(idx) == 0) idx = idx + 1
if(idx == m-1) hasNext = false
else {
nums(idx+1) = nums(idx+1) + 1
nums(0) = nums(idx) - 1
if(idx != 0) nums(idx) = 0
}
result
}
else Iterator.empty.next
}
}
}
1
101
5151
176851
4598126
96560646
However, we can just show the number of possible n-tuples:
val pt: (Int,Int) => BigInt = {
val buf = collection.mutable.Map[(Int,Int),BigInt]()
(s,n) => buf.getOrElseUpdate((s,n),
if(n == 0 && s > 0) BigInt(0)
else if(s == 0) BigInt(1)
else (0 to s).map{k => pt(s-k,n-1)}.sum
)
}
for(n <- 1 to 20) printf("%2d :%s%n",n,pt(100,n).toString)
1 :1
2 :101
3 :5151
4 :176851
5 :4598126
6 :96560646
7 :1705904746
8 :26075972546
9 :352025629371
10 :4263421511271
11 :46897636623981
12 :473239787751081
13 :4416904685676756
14 :38393094575497956
15 :312629484400483356
16 :2396826047070372396
17 :17376988841260199871
18 :119594570260437846171
19 :784008849485092547121
20 :4910371215196105953021
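These counts also match the closed form C(s + n - 1, n - 1) (the "stars and bars" count of n-tuples of non-negative integers summing to s), so they can be computed without recursion. A minimal sketch, with the names Compositions and count being hypothetical:

```scala
object Compositions {
  // Number of n-tuples of non-negative integers summing to s: C(s + n - 1, n - 1).
  def count(s: Int, n: Int): BigInt = {
    require(s >= 0 && n >= 1, "assumed preconditions")
    val k = n - 1
    // Multiplicative formula: after step i the accumulator equals C(s + i, i),
    // so every division is exact.
    (1 to k).foldLeft(BigInt(1)) { (acc, i) => acc * (s + i) / i }
  }
}
```

For instance, Compositions.count(100, 5) evaluates to 4598126, the same value as in the table above.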