Given a CSV in the format below, what is the best way to load it into Scala as a Map[String, Array[String]], where the keys are the unique values of Col2 and each value is an Array[String] of all co-occurring values of Col1?
a,1,
b,2,m
c,2,
d,1,
e,3,m
f,4,
g,2,
h,3,
I,1,
j,2,n
k,2,n
l,1,
m,5,
n,2,
I have tried to use the function below, but I get an error when trying to add to the Option type:
+= is not a member of Option[Array[String]]
In addition, I get "overloaded method value ++ with alternatives:" on the line case None => mapping ++ (linesplit(2) -> Array(linesplit(1)))
def parseCSV() : Map[String, Array[String]] = {
  var mapping = Map[String, Array[String]]()
  val lines = Source.fromFile("test.csv")
  for (line <- lines.getLines) {
    val linesplit = line.split(",")
    mapping.get(linesplit(2)) match {
      case Some(_) => mapping.get(linesplit(2)) += linesplit(1)
      case None => mapping ++ (linesplit(2) -> Array(linesplit(1)))
    }
  }
  mapping
}
I am hoping for a Map[String, Array[String]] like the following:
(2 -> Array("b", "c", "g", "j", "k", "n"))
(3 -> Array("e", "h"))
(4 -> Array("f"))
(5 -> Array("m"))
You can do the following:
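First, read the file into a List[List[String]], using scala.util.Using (Scala 2.13+) so the file is closed afterwards:
import scala.util.Using
val rows: List[List[String]] = Using.resource(io.Source.fromFile("test.csv")) { source =>
  source.getLines().toList.map { line =>
    line.split(",").map(_.trim).toList
  }
}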
Then filter out rows with fewer than two fields. split(",") drops trailing empty strings, so "a,1," becomes two fields and "b,2,m" three; the filter only guards against blank or malformed lines:
val filteredRows = rows.filter(row => row.size > 1)
The last step is to group by the second field (Col2) and keep the first field (Col1) of every row in each group:
filteredRows.groupBy(row => row(1)).map { case (key, group) => key -> group.map(_.head).toArray }
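A self-contained sketch of this read/filter/group pipeline. As an assumption so that it runs without a file, the lines of test.csv are replaced by an in-memory copy of the question's data; the grouping is on Col2 (index 1) and collects Col1, which is the shape the question asks for.

```scala
// In-memory stand-in for test.csv (copied from the question)
val lines = List(
  "a,1,", "b,2,m", "c,2,", "d,1,", "e,3,m", "f,4,", "g,2,",
  "h,3,", "I,1,", "j,2,n", "k,2,n", "l,1,", "m,5,", "n,2,"
)

// split(",") drops trailing empty strings: "a,1," becomes List("a", "1")
val rows: List[List[String]] = lines.map(_.split(",").map(_.trim).toList)

// Keep only rows that actually have a Col2 value
val filteredRows = rows.filter(_.size > 1)

// Group by Col2 (index 1); keep Col1 (the head) of each row, in file order
val result: Map[String, Array[String]] =
  filteredRows
    .groupBy(row => row(1))
    .map { case (key, group) => key -> group.map(_.head).toArray }
```

groupBy keeps the original row order inside each group, so result("2") lists b, c, g, j, k, n in the order they appear in the file.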
This isn't complete, but it should give you an outline of how it might be done.
io.Source
.fromFile("so.txt") //open file
.getLines() //line by line
.map(_.split(",")) //split on commas
.toArray //load into memory
.groupMap(_(1))(_(0)) //Scala 2.13
//res0: Map[String,Array[String]] = Map(4 -> Array(f), 5 -> Array(m), 1 -> Array(a, d, I, l), 2 -> Array(b, c, g, j, k, n), 3 -> Array(e, h))
You'll notice that the file resource isn't closed, and it doesn't handle malformed input. I leave that for the diligent reader.
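One way to close that gap, sketched under assumptions: the hypothetical loadByCol2 wraps the same groupMap chain in scala.util.Using so the file is always closed, and it adopts the policy (one choice among several) of silently skipping lines that have no Col2 value.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: Col2 -> all Col1 values, closing the file even on failure
def loadByCol2(path: String): Map[String, Array[String]] =
  Using.resource(Source.fromFile(path)) { source =>
    source
      .getLines()
      .map(_.split(",").map(_.trim))
      .filter(f => f.length >= 2 && f(1).nonEmpty) // skip blank or malformed lines
      .toArray                                     // materialise lines before grouping
      .groupMap(_(1))(_(0))                        // key: Col2, value: Col1 (Scala 2.13+)
  }
```

Because groupMap is strict, the whole map is built inside the Using block, before the file is closed.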
For your approach to work, use a mutable Map and ArrayBuffer, since both are updated in place as each line is read. Note that only Col1 is appended; Col3 (m, n) is ignored:
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer
import scala.io.Source

def parseCSV(): Map[String, Array[String]] = {
  val mapping = mutable.Map[String, ArrayBuffer[String]]()
  val source = Source.fromFile("test.csv")
  try {
    for (line <- source.getLines()) {
      val linesplit = line.split(",")
      val key = linesplit(1)   // Col2
      val value = linesplit(0) // Col1
      mapping.get(key) match {
        case Some(buffer) => buffer += value
        case None => mapping(key) = ArrayBuffer(value)
      }
    }
  } finally source.close()
  mapping.map { case (k, v) => (k, v.toArray) }.toMap
}
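To test the mutable-buffer idea without touching the disk, the per-line logic can take any Iterator[String] (for example source.getLines()). This sketch, with the hypothetical name groupLines, also swaps the match on get(key) for getOrElseUpdate, which creates the buffer on first sight of a key and returns the existing one afterwards.

```scala
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: works on any iterator of "col1,col2[,col3]" lines
def groupLines(lines: Iterator[String]): Map[String, Array[String]] = {
  val mapping = mutable.Map[String, ArrayBuffer[String]]()
  for (line <- lines) {
    val fields = line.split(",")
    if (fields.length >= 2) // skip blank or malformed lines
      mapping.getOrElseUpdate(fields(1), ArrayBuffer.empty[String]) += fields(0)
  }
  // Freeze into an immutable Map of Arrays
  mapping.map { case (k, buf) => k -> buf.toArray }.toMap
}
```

For example, groupLines(Iterator("a,1,", "b,2,m", "c,2,")) maps "2" to Array(b, c) and "1" to Array(a).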
We have a sequence of tuples Seq(department, title), depTitleSeq, and we would like to extract a Set(department) and a Set(title). We are looking for the best way to do this; so far, the best we could come up with is:
val depTitleSeq = getDepTitleTupleSeq()
var departmentSeq = ArrayBuffer[String]()
var titleSeq = ArrayBuffer[String]()
for (depTitle <- depTitleSeq) yield {
departmentSeq += depTitle._1
titleSeq += depTitle._2
}
val depSet = departmentSeq.toSet
val titleSet = titleSeq.toSet
We are fairly new to Scala and are sure there are better, more efficient ways to achieve this. If you could point us in the right direction, it would be of great help.
If you have two Seqs of data that you want combined into a Seq of tuples, you can zip them together.
If you have a Seq of tuples and you want the elements separated, then you can unzip them.
val (departmentSeq, titleSeq) = getDepTitleTupleSeq().unzip
val depSet: Set[String] = departmentSeq.toSet
val titleSet: Set[String] = titleSeq.toSet
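A small sketch of the zip/unzip round trip. The sample departments and titles are hypothetical stand-ins for whatever getDepTitleTupleSeq() returns.

```scala
// Hypothetical sample data
val departments = Seq("Sales", "Sales", "IT")
val titles      = Seq("Manager", "Clerk", "Manager")

// zip: two Seqs -> one Seq of pairs
val pairs: Seq[(String, String)] = departments.zip(titles)

// unzip: one Seq of pairs -> two Seqs; toSet then removes duplicates
val (deps, ttls) = pairs.unzip
val depSet: Set[String] = deps.toSet
val titleSet: Set[String] = ttls.toSet
```

Here depSet is Set(Sales, IT) and titleSet is Set(Manager, Clerk).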
val depTitleSeq = Seq(("x","a"),("y","b"))
val depSet = depTitleSeq.map(_._1).toSet
val titleSet = depTitleSeq.map(_._2).toSet
In Scala REPL:
scala> val depTitleSeq = Seq(("x","a"),("y","b"))
depTitleSeq: Seq[(String, String)] = List((x,a), (y,b))
scala> val depSet = depTitleSeq.map(_._1).toSet
depSet: scala.collection.immutable.Set[String] = Set(x, y)
scala> val titleSet = depTitleSeq.map(_._2).toSet
titleSet: scala.collection.immutable.Set[String] = Set(a, b)
You can use foldLeft to collect both sets in a single pass:
val result: (Set[String], Set[String]) =
  depTitleSeq.foldLeft((Set[String](), Set[String]())) { (a, b) =>
    (a._1 + b._1, a._2 + b._2)
  }
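A variant of the same foldLeft, sketched with illustrative sample data: a pattern-matching function literal destructures both the accumulator and each tuple, which avoids the ._1/._2 accessors.

```scala
val depTitleSeq = Seq(("x", "a"), ("y", "b"), ("x", "c"))

// One pass: the accumulator is a pair of Sets; each tuple adds to both sides
val (depSet, titleSet) =
  depTitleSeq.foldLeft((Set.empty[String], Set.empty[String])) {
    case ((deps, titles), (dep, title)) => (deps + dep, titles + title)
  }
```

The duplicate department "x" is collapsed by the Set, giving depSet == Set(x, y) and titleSet == Set(a, b, c).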
First list:
remoteDeviceAndPort===>List(
(1,891w.yourdomain.com,wlan-ap0),
(13,ap,GigabitEthernet0),
(11,Router-3900,GigabitEthernet0/0)
)
Second list:
interfacesList===>List(
(1,UP,,0,0,0,0,UP,4294,other,VoIP-Null0,0,0),
(13,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet6,0,0),
(11,UP,,0,0,0,0,UP,100,vlan,Vlan11,4558687845,1249542878),
(2,UP,,0,0,972,1327,UP,0,Tunnel,Virtual-Access1,0,0),
(4,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet2,0,0),
(6,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet2,0,0)
)
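These are my two lists. I need to combine them by matching on the first field (the interface index): each interfacesList entry should get the remote device and port from remoteDeviceAndPort appended, or empty,empty when there is no match.
Expected output: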
combineList = List(
(1,UP,,0,0,0,0,UP,4294,other,VoIP-Null0,0,0,891w.yourdomain.com,wlan-ap0),
(13,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet6,0,0,ap,GigabitEthernet0),
(11,UP,,0,0,0,0,UP,100,vlan,Vlan11,4558687845,1249542878,Router-3900,GigabitEthernet0/0),
(2,UP,,0,0,972,1327,UP,0,Tunnel,Virtual-Access1,0,0,empty,empty),
(4,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet2,0,0,empty,empty),
(6,DOWN,,0,0,0,0,UP,100,Ethernet,FastEthernet2,0,0,empty,empty)
)
There is a similar question here.
case class NetworkDeviceInterfaces(index: Int, params: String*)
val remoteDeviceAndPort = List(
(1,"891w.yourdomain.com","wlan-ap0"),
(13,"ap","GigabitEthernet0"),
(11,"Router-3900","GigabitEthernet0/0")
)
val rdapMap = remoteDeviceAndPort.map { case (k, v1, v2) => (k, (v1, v2)) }.toMap
val interfacesList = List(NetworkDeviceInterfaces(1, "UP", "", "0", "0", "0", "0", "UP", "4294", "other", "VoIP-Null0", "0", "0"))
val result = interfacesList.map { interface =>
  val (first, second) = rdapMap.getOrElse(interface.index, ("empty", "empty"))
  NetworkDeviceInterfaces(interface.index, (interface.params ++ Seq(first, second)): _*)
}
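With only one interface in interfacesList, the no-match branch never runs. This sketch uses three interfaces, with params trimmed to two illustrative fields (an assumption for brevity), so that index 2 has no entry in remoteDeviceAndPort and receives ("empty", "empty"), as in the expected output.

```scala
case class NetworkDeviceInterfaces(index: Int, params: String*)

val remoteDeviceAndPort = List(
  (1, "891w.yourdomain.com", "wlan-ap0"),
  (13, "ap", "GigabitEthernet0"),
  (11, "Router-3900", "GigabitEthernet0/0")
)
// Params shortened for illustration; the first field is still the join key
val interfacesList = List(
  NetworkDeviceInterfaces(1, "UP", "VoIP-Null0"),
  NetworkDeviceInterfaces(13, "DOWN", "FastEthernet6"),
  NetworkDeviceInterfaces(2, "UP", "Virtual-Access1")
)

// index -> (remote device, port) for constant-time lookups
val rdapMap: Map[Int, (String, String)] =
  remoteDeviceAndPort.map { case (k, dev, port) => (k, (dev, port)) }.toMap

// Left join: every interface is kept; unmatched ones get ("empty", "empty")
val combineList: List[NetworkDeviceInterfaces] = interfacesList.map { iface =>
  val (device, port) = rdapMap.getOrElse(iface.index, ("empty", "empty"))
  NetworkDeviceInterfaces(iface.index, (iface.params ++ Seq(device, port)): _*)
}
```

Building the Map once keeps the join linear in the size of the two lists, instead of searching remoteDeviceAndPort for every interface.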