Parse nginx rewrite rules to extract from and to - scala

If I have, say, a list of nginx-esque config rewrite statements such as:

val rewritesList: List[String] = List(
  "rewrite (?i)^/first$ http://www.firstredirect.com redirect;",
  "rewrite (?i)^/second$ http://www.seconredirect.com redirect;"
)

I would like to extract the from and to from that list. I am not worried about the final structure as long as I extract the info, but for the sake of demonstration:

val rewritesMap: Map[String, String] = Map(
  "first" -> "http://www.firstredirect.com",
  "second" -> "http://www.seconredirect.com"
)

You can use regular expressions with Scala's pattern matching:
val rewritesList: List[String] = List(
  "rewrite (?i)^/first$ http://www.firstredirect.com redirect;",
  "rewrite (?i)^/second$ http://www.seconredirect.com redirect;"
)

val Regex = """^rewrite \(\?i\)\^/(\w+)\$ ([^ ]+) redirect;$""".r

val rewritesMap = (for {
  Regex(from, to) <- rewritesList
} yield from -> to).toMap

println(rewritesMap)
You could also use the more explicit findFirstMatchIn to extract a single match:
val rewritesMap = (for {
  str <- rewritesList
} yield {
  val m = Regex.findFirstMatchIn(str).get
  (m.group(1), m.group(2))
}).toMap
Both versions print (up to indentation):
Map(
  first -> http://www.firstredirect.com,
  second -> http://www.seconredirect.com
)
Note that the latter variant will throw a NoSuchElementException if the input data is not of the format defined by the regex. I don't know what you want to do if the data does not match the regex: you can raise exceptions, but you can also simply skip the cases that aren't parsed correctly.
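If skipping is acceptable, a minimal sketch using collect (with the same Regex as above) makes that behaviour explicit; lines that don't match the pattern are simply dropped, just as in the for-comprehension version:

val rewritesMap = rewritesList.collect {
  case Regex(from, to) => from -> to
}.toMap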

Related

Scala- creating a map from string

I have a string, and the format of the string is shown below:

val str = "{a=10, b=20, c=30}"

All the parameters inside this string are unique and separated by a comma and a space. Also, this string always starts with '{' and ends with '}'. I want to create a Map out of this string, something like below:

val values = Map("a" -> 10, "b" -> 20, "c" -> 30)

What is the most efficient way I can achieve this?
scala> val str = "{a=10, b=20, c=30}"
str: String = {a=10, b=20, c=30}
scala> val P = """.*(\w+)=(\d+).*""".r
P: scala.util.matching.Regex = .*(\w+)=(\d+).*
scala> str.split(',').map{ case P(k, v) => (k, v.toInt) }.toMap
res2: scala.collection.immutable.Map[String,Int] = Map(a -> 10, b -> 20, c -> 30)
Using a regex can achieve this simply:

"(\\w+)=(\\w+)".r.findAllIn("{a=10, b=20, c=30}").matchData.map { i =>
  (i.group(1), i.group(2))
}.toMap
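Note that this yields a Map[String,String]; to get the Map[String,Int] from the question, a small tweak (my variation on the same approach) converts the values:

"(\\w+)=(\\d+)".r.findAllIn("{a=10, b=20, c=30}").matchData.map { i =>
  (i.group(1), i.group(2).toInt)
}.toMap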
The function you want to write is pretty easy:
def convert(str: String): Map[String, String] = {
  str.drop(1).dropRight(1).split(", ").map(_.split("=")).map(arr => arr(0) -> arr(1)).toMap
}

With drop and dropRight, you remove the braces. Then you split the String on ", ", which results in multiple Strings.
Then you split each of these strings on "=", which results in arrays with two elements. Those are used to create the map.
I would do it like this (I think a regex is not needed here):

val str = "{a=10, b=20, c=30}"
val values: Map[String, Int] = str.drop(1).dropRight(1) // drop braces
  .split(",")                                           // split into key-value pairs
  .map { pair =>
    val Array(k, v) = pair.split("=")                   // split key-value pair and parse to Int
    k.trim -> v.toInt
  }.toMap

Scala list not adding elements

I am writing a sample program that builds a list of file names from a list of files, but I am getting an empty list after adding.
My code is this:
val regex = """(.*\.pdf$)|(.*\.doc$)""".r
val leftPath = "/Users/ravi/Documents/aa"
val leftFiles = recursiveListFiles(new File(leftPath), regex)

var leftFileNames = List[String]()
leftFiles.foreach((f: File) => {/*println(f.getName);*/ f.getName :: leftFileNames})
leftFileNames.foreach(println)

def recursiveListFiles(f: File, r: Regex): Array[File] = {
  val these = f.listFiles
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}
The last statement is not showing anything in the console.
f.getName :: leftFileNames prepends f.getName to the beginning of leftFileNames and returns a new List; it does not add anything to leftFileNames itself. So for your example, you need to reassign leftFileNames after every operation, like:
leftFiles.foreach((f:File) => leftFileNames = f.getName :: leftFileNames)
But it's better not to use mutable variables in Scala, as they introduce side effects. You can use map with reverse for this, like:
val leftFileNames = leftFiles.map(_.getName).reverse
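Putting that together, here is a minimal sketch of the fixed program with no mutable state, reusing recursiveListFiles exactly as defined in the question:

import java.io.File
import scala.util.matching.Regex

def recursiveListFiles(f: File, r: Regex): Array[File] = {
  val these = f.listFiles
  val good = these.filter(f => r.findFirstIn(f.getName).isDefined)
  good ++ these.filter(_.isDirectory).flatMap(recursiveListFiles(_, r))
}

val regex = """(.*\.pdf$)|(.*\.doc$)""".r
val leftFileNames: List[String] =
  recursiveListFiles(new File("/Users/ravi/Documents/aa"), regex)
    .map(_.getName)   // keep only the file names
    .toList

leftFileNames.foreach(println)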

Does this specific exercise lend itself well to a 'functional style' design pattern?

Say we have an array of one-dimensional JavaScript objects contained in a file Array.json for which the key schema isn't known; that is, the keys aren't known until the file is read.
Then we wish to output a CSV file whose header, or first entry, is a comma-delimited set of the keys from all of the objects.
Each following line of the file should contain the comma-separated values which correspond to each key from the file.
Array.json
[
  abc:123,
  xy:"yz",
  s12:13,
],
...
[
  abc:1
  s:133,
]
A valid output:
abc,xy,s12,s
123,yz,13,
1,,,133
I'm teaching myself 'functional style' programming but I'm thinking that this problem doesn't lend itself well to a functional solution.
I believe that this problem requires some state to be kept for the output header and that subsequently each line depends on that header.
I'm looking to solve the problem in a single pass. My goals are efficiency for a large data set, minimal traversals, and if possible, parallelizability. If this isn't possible then can you give a proof or reasoning to explain why?
EDIT: Is there a way to solve the problem like this functionally? Say you pass through the array once, in some particular order. From the start, the header set looks like abc,xy,s12 for the first object, with CSV entry 123,yz,13. Then on the next object we add an additional key to the header set, so abc,xy,s12,s would be the header and the CSV entry would be 1,,,133. In the end we wouldn't need to pass through the data set a second time; we could just append extra commas to the result set. This is one way we could approach a single pass...
Are there functional tools (functions) designed to solve problems like this, and what should I be considering? [By functional tools I mean monads, flatMap, filters, etc.] Alternatively, should I be considering things like Futures?
Currently I've been trying to approach this using Java 8, but am open to solutions in Scala, etc. Ideally I would be able to determine whether Java 8's functional approach can solve the problem, since that's the language I'm currently working in.
Since the CSV output will change with every new line of input, you must hold it in memory before writing it out. If you consider creating the output text from an internal representation of a CSV file another "pass" over the data (the internal representation is practically a Map[String, List[String]], which you must traverse to convert it to text), then it's not possible to do this in a single pass.
If, however, this is acceptable, then you can use a Stream to read a single item from your json file, merge that into the csv file, and do this until the stream is empty.
Assuming that the internal representation of the CSV file is

trait CsvFile {
  def merge(line: Map[String, String]): CsvFile
}

and that you can represent a single item as

trait Item {
  def asMap: Map[String, String]
}
You can implement it using foldLeft:
def toCsv(items: Stream[Item]): CsvFile =
  items.foldLeft(CsvFile(Map()))((csv, item) => csv.merge(item.asMap))
or use recursion to get the same result
import scala.annotation.tailrec

@tailrec
def toCsv(items: Stream[Item], prevCsv: CsvFile): CsvFile =
  items match {
    case Stream.Empty => prevCsv
    case item #:: rest =>
      val newCsv = prevCsv.merge(item.asMap)
      toCsv(rest, newCsv)
  }
Note: of course you don't have to create types for CsvFile or Item; you can use Map[String, List[String]] and Map[String, String] respectively.
UPDATE:
As more detail was requested for the CsvFile trait/class, here's an example implementation:
case class CsvFile(lines: Map[String, List[String]], rowCount: Int = 0) {
  def merge(line: Map[String, String]): CsvFile = {
    val orig = lines.withDefaultValue(List.fill(rowCount)(""))
    val current = line.withDefaultValue("")
    val newLines = (lines.keySet ++ line.keySet) map { k =>
      (k, orig(k) :+ current(k))
    }
    CsvFile(newLines.toMap, rowCount + 1)
  }
}
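To actually write the file out at the end, here is a sketch of a rendering step (toText is my addition, not part of the original answer; it assumes every column list has rowCount entries, which merge guarantees):

def toText(csv: CsvFile): String = {
  val keys = csv.lines.keys.toList
  val header = keys.mkString(",")
  val rows = (0 until csv.rowCount).map { i =>
    keys.map(k => csv.lines(k)(i)).mkString(",")  // i-th value of each column
  }
  (header +: rows).mkString("\n")
}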
This could be one approach:
val arr = Array(Map("abc" -> 123, "xy" -> "yz", "s12" -> 13), Map("abc" -> 1, "s" -> 133))
val keys = arr.flatMap(_.keys).distinct // get the distinct keys for header
arr.map(x => keys.map(y => x.getOrElse(y,""))) // get an array of rows
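To turn that into the actual CSV text, one more step with mkString does it (my addition; the output shown assumes the arr defined above):

val csv = (keys.mkString(",") +:
  arr.map(x => keys.map(y => x.getOrElse(y, "")).mkString(","))).mkString("\n")
// abc,xy,s12,s
// 123,yz,13,
// 1,,,133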
It's completely OK to have state in functional programming, but having mutable state, or mutating state in place, is not allowed.
Functional programming advocates creating new, changed state instead of mutating the state in place.
So it's OK to read and access state created in the program, as long as you are not mutating it or producing side effects.
Coming to the point.
val list = List(List("abc" -> "123", "xy" -> "yz"), List("abc" -> "1"))
list.map { inner => inner.map { case (k, v) => k}}.flatten
list.map { inner => inner.map { case (k, v) => v}}.flatten
REPL
scala> val list = List(List("abc" -> "123", "xy" -> "yz"), List("abc" -> "1"))
list: List[List[(String, String)]] = List(List((abc,123), (xy,yz)), List((abc,1)))
scala> list.map { inner => inner.map { case (k, v) => k}}.flatten
res1: List[String] = List(abc, xy, abc)
scala> list.map { inner => inner.map { case (k, v) => v}}.flatten
res2: List[String] = List(123, yz, 1)
or use flatMap instead of map and flatten
val list = List(List("abc" -> "123", "xy" -> "yz"), List("abc" -> "1"))
list.flatMap { inner => inner.map { case (k, v) => k}}
list.flatMap { inner => inner.map { case (k, v) => v}}
In functional programming, mutable state is not allowed. But immutable states/values are fine.
Assuming that you have read your JSON file into a value input: List[Map[String, String]], the code below will solve your problem:
val input = List(Map("abc"->"123", "xy"->"yz" , "s12"->"13"), Map("abc"->"1", "s"->"33"))
val keys = input.map(_.keys).flatten.toSet
val keyvalues = input.map(kvs => keys.map(k => (k->kvs.getOrElse(k,""))).toMap)
val values = keyvalues.map(_.values)
val result = keys.mkString(",") + "\n" + values.map(_.mkString(",")).mkString("\n")
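One caveat (my note, not part of the original answer): this relies on the Set of keys and each per-row Map iterating in the same order, which happens to hold for very small collections but isn't guaranteed in general. A sketch with a deterministic column order:

val keys = input.flatMap(_.keys).distinct  // List, preserves first-seen key order
val header = keys.mkString(",")
val rows = input.map(row => keys.map(k => row.getOrElse(k, "")).mkString(","))
val result = (header :: rows).mkString("\n")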

Filtering maps in iterator

I have following code:
val rows: Iterator[Map[String,String]] = CSVDictReader(file.getInputStream)
val parsedProducts = rows.map(x => Product(name = x.get("NAME"), id = x.get("ID")))
And I would like to filter out map entries whose values are empty strings. With a plain map I could use:

filter(_._2.trim.nonEmpty)

I cannot get my head around how to do this in a nice way without introducing some helper function that returns None in case the value is an empty string.
Edit: In my example I have only name and id, but in the real code there are easily over ten columns of data. Also, I would need to have None instead of an empty string value, so name = Option("") should be replaced with name = None.
You can filter Options as well.
Let's say your x.get("NAME") returns a Some("") or even Some(" ").
Then you may do something like this: x.get("NAME").filter(_.trim.nonEmpty)
Hope I understood your question correctly
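Applied to the snippet from the question, that would look something like the following (assuming Product takes Option[String] fields, which the question's x.get(...) calls suggest):

val parsedProducts = rows.map { x =>
  Product(
    name = x.get("NAME").filter(_.trim.nonEmpty),  // Some("") / Some(" ") become None
    id   = x.get("ID").filter(_.trim.nonEmpty)
  )
}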
something like this?
val rows: Iterator[Map[String, String]] = CSVDictReader(file.getInputStream)

val parsedProducts = for {
  row  <- rows
  name <- row.get("NAME")
  id   <- row.get("ID")
} yield Product(name, id)
Here, if row.get("NAME") or row.get("ID") returns None, the corresponding entry will not be yielded.
I'm not sure if this is what you're looking for, but the following code snippet:
val rows: Iterator[Map[String, String]] =
  Iterator(Map("NAME" -> " ", "ID" -> "foo"), Map("NAME" -> " ", "ID" -> ""))
val fieldNames = List("NAME", "ID", "ANOTHER COLUMN")

val cleanedRows = rows map { row =>
  fieldNames map { fieldName =>
    Map(fieldName -> row.get(fieldName).filter(_.trim.nonEmpty))
  }
}

while (cleanedRows.hasNext) {
  println(cleanedRows.next)
}
Would print out:
List(Map(NAME -> None), Map(ID -> Some(foo)), Map(ANOTHER COLUMN -> None))
List(Map(NAME -> None), Map(ID -> None), Map(ANOTHER COLUMN -> None))
So at this point cleanedRows would have the entries you need to create your Product instances.

find by regular expression with Casbah

How can I use regular expressions in Collection#find(/* HERE */), like:

val coll = MongoConnection()("foo")("bar")
for (x <- coll.find("name" -> ".*son$".r)) {
  // some operations...
}
You are close, you just need to wrap your conditions in a MongoDBObject().
We had to pull out the implicit conversions of <key> -> <value> in a bunch of places because they were hard to catch properly and were breaking other code.
They'll probably be back in 2.1.
Do this instead:
val coll = MongoConnection()("foo")("bar")
for (x <- coll.find(MongoDBObject("name" -> ".*son$".r))) {
  // some operations...
}
For case-insensitive matching, the above answer will not work by appending "/i" to the end of the regex in Scala/Casbah.
For this purpose use:
import java.util.regex.Pattern

val EmailPattern = Pattern.compile(companyName, Pattern.CASE_INSENSITIVE)
val q = MongoDBObject("companyName" -> EmailPattern)
val result = MongoFactory.COLLECTION_NAME.findOne(q)
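Alternatively (my note; this is standard Java regex behaviour rather than anything Casbah-specific), you can embed the case-insensitivity flag directly in the pattern with (?i) and pass a Scala Regex, as in the earlier answer:

val q = MongoDBObject("companyName" -> ("(?i)" + companyName).r)  // (?i) makes the match case-insensitive
val result = MongoFactory.COLLECTION_NAME.findOne(q)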