Is it possible to refer to a different class on each pass of an iteration?
I have a substantial number of Hadoop Hive tables that I will be processing with Spark. Each table has an auto-generated class, and I would like to loop through the tables instead of copying, pasting, and hand-coding the individual table class names, which is the tedious approach I resorted to first.
import myJavaProject.myTable0Class
import myJavaProject.myTable1Class
object rawMaxValueSniffer extends Logging {
/* tedious sequential: it works, and sometimes a programmer's gotta do... */
def tedious(args: Array[String]): Unit = {
val tablePaths = List("path0_string_here","path1_string")
var maxIds = ArrayBuffer[Long]()
FileInputFormat.setInputPaths(conf, tablePaths(0))
AvroReadSupport.setAvroReadSchema(conf.getConfiguration, myTable0Class.getClassSchema)
ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[myTable0Class]])
val records = sc.newAPIHadoopRDD(conf.getConfiguration,
classOf[ParquetInputFormat[myTable0Class]],
classOf[Void],
classOf[myTable0Class]).map(x => x._2)
maxIds += records.map(_.getId).collect().max
FileInputFormat.setInputPaths(conf, tablePaths(1))
AvroReadSupport.setAvroReadSchema(conf.getConfiguration, myTable1Class.getClassSchema)
ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[myTable1Class]])
val records1 = sc.newAPIHadoopRDD(conf.getConfiguration,
classOf[ParquetInputFormat[myTable1Class]],
classOf[Void],
classOf[myTable1Class]).map(x => x._2)
maxIds += records1.map(_.getId).collect().max
}
/* class as variable, used in a loop. I have seen the mountain... */
def hopedFor(args: Array[String]): Unit = {
val tablePaths = List("path0_string_here","path1_string")
var maxIds = ArrayBuffer[Long]()
val tableClasses = List(classOf[myTable0Class],classOf[myTable1Class]) /* error free, but does not get me where I'm trying to go */
var counter=0
tableClasses.foreach { tc =>
FileInputFormat.setInputPaths(conf, tablePaths(counter))
AvroReadSupport.setAvroReadSchema(conf.getConfiguration, tc.getClassSchema)
ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[tc]])
val records = sc.newAPIHadoopRDD(conf.getConfiguration,
classOf[ParquetInputFormat[tc]],
classOf[Void],
classOf[tc]).map(x => x._2)
maxIds += records.map(_.getId).collect().max /* all the myTableXXX classes have getId() */
counter += 1
}
}
}
/* the classes being referenced... */
@org.apache.avro.specific.AvroGenerated
public class myTable0Class extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"rsivr_surveyquestiontypes\",\"namespace\":\"myJavaProject\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"description\",\"type\":\"string\"},{\"name\":\"scale_range\",\"type\":\"int\"}]}");
public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
@Deprecated public int id;
yada.yada.yada0
}
@org.apache.avro.specific.AvroGenerated
public class myTable1Class extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"rsivr_surveyresultdetails\",\"namespace\":\"myJavaProject\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"survey_dts\",\"type\":\"string\"},{\"name\":\"survey_id\",\"type\":\"int\"},{\"name\":\"question\",\"type\":\"int\"},{\"name\":\"caller_id\",\"type\":\"string\"},{\"name\":\"rec_msg\",\"type\":\"string\"},{\"name\":\"note\",\"type\":\"string\"},{\"name\":\"lang\",\"type\":\"string\"},{\"name\":\"result\",\"type\":\"string\"}]}");
public static org.apache.avro.Schema getClassSchema() { return SCHEMA$; }
@Deprecated public int id;
yada.yada.yada1
}
Something like this, perhaps:
def doStuff[T <: SpecificRecordBase : ClassTag](index: Int, schema: => Schema, clazz: Class[T]) = {
FileInputFormat.setInputPaths(conf, tablePaths(index))
AvroReadSupport.setAvroReadSchema(conf.getConfiguration, schema)
ParquetInputFormat.setReadSupportClass(conf, classOf[AvroReadSupport[T]])
val records = sc.newAPIHadoopRDD(conf.getConfiguration,
classOf[ParquetInputFormat[T]],
classOf[Void],
clazz).map(x => x._2)
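// Note: getId is not a member of SpecificRecordBase, so T needs a bound (or an accessor) that exposes it
maxIds += records.map(_.getId).collect().max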
}
Seq(
(classOf[myTable0Class], myTable0Class.getClassSchema _),
(classOf[myTable1Class], myTable1Class.getClassSchema _)
).zipWithIndex
.foreach { case ((clazz, schema), index) => doStuff(index, schema(), clazz) }
You could use reflection to invoke getClassSchema instead (clazz.getMethod("getClassSchema").invoke(null).asInstanceOf[Schema]); then you would not need to pass the schema in as a parameter, and clazz alone would be enough. That feels a bit like cheating, though, so I like the approach shown here a bit better.
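The reflective call mentioned above can be tried in isolation. The sketch below is only an illustration: it uses the JDK classes OptionalInt, OptionalLong, and OptionalDouble as stand-ins for the generated Avro classes (each has a public static no-argument empty() method, much as each generated class has getClassSchema()), and invokeStatic is a hypothetical helper name.

```scala
import java.util.{OptionalDouble, OptionalInt, OptionalLong}

object StaticReflectionSketch {
  // Hypothetical helper: calls a public static no-arg method by name.
  // The receiver passed to invoke is null because the method is static.
  def invokeStatic(clazz: Class[_], methodName: String): AnyRef =
    clazz.getMethod(methodName).invoke(null)

  def main(args: Array[String]): Unit = {
    // Stand-ins for List(classOf[myTable0Class], classOf[myTable1Class])
    val classes: Seq[Class[_]] =
      Seq(classOf[OptionalInt], classOf[OptionalLong], classOf[OptionalDouble])
    classes.foreach { c =>
      println(s"${c.getSimpleName}: ${invokeStatic(c, "empty")}")
    }
  }
}
```

A missing or misspelled method name only fails at run time (getMethod throws NoSuchMethodException), which is the main trade-off compared with passing the schema explicitly.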
I have a trait that overrides toString to print the values of all fields:
/**
* Interface for classes that provide application configuration.
*/
trait Configuration {
/** abstract fields defined here. e.g., **/
def dbUrl: String
/**
* Returns a list of fields to be excluded by [[toString]]
*/
protected def toStringExclude: Seq[String]
/**
* Returns a String representation of this object that prints the values for all configuration fields.
*/
override def toString: String = {
val builder = new StringBuilder
val fields = this.getClass.getDeclaredFields
for (f <- fields) {
if (!toStringExclude.contains(f.getName)) {
f.setAccessible(true)
builder.append(s"${f.getName}: ${f.get(this)}\n")
}
}
builder.toString.stripSuffix("\n")
}
}
A concrete class currently looks like this:
class BasicConfiguration extends Configuration {
private val config = ConfigFactory.load
override val dbUrl: String = config.getString("app.dbUrl")
/**
* @inheritdoc
*/
override protected def toStringExclude: Seq[String] = Seq("config")
}
The problem is, if config were renamed at some point, the IDE would miss "config" in toStringExclude as it's just a string. So I'm trying to find a way to get the name of a field as a string, like getFieldName(config).
Using https://github.com/dwickern/scala-nameof:
import com.github.dwickern.macros.NameOf._
class BasicConfiguration extends Configuration {
private val config = ConfigFactory.load
override val dbUrl: String = config.getString("app.dbUrl")
/**
* @inheritdoc
*/
override protected def toStringExclude: Seq[String] = Seq(nameOf(config))
}
I don't like this approach and wouldn't use it myself, but here is an alternative that looks the field name up by its value:
class BasicConfiguration extends Configuration {
private val config = ConfigFactory.load
override val dbUrl: String = config.getString("app.dbUrl")
private val excludeFields: Set[Any] = Set(config)
override protected val toStringExclude: Seq[String] = {
this.getClass
.getDeclaredFields
.filter(field => Try(field.get(this)).fold(_ => false, a => excludeFields.contains(a)))
.map(_.getName)
.toList
}
}
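To see what the reflection-based variant above does without a Typesafe Config dependency, here is a minimal sketch. Marker, Demo, and fieldNamesOf are hypothetical names introduced for illustration, and Marker merely stands in for the Config object.

```scala
object FieldNameByValueSketch {
  // Hypothetical stand-in for the Config instance held by the real class
  final class Marker

  class Demo {
    private val config = new Marker
    val dbUrl: String = "jdbc:h2:mem:demo"

    // Names of the declared fields whose current value is one of `values`
    def fieldNamesOf(values: Set[Any]): Seq[String] =
      getClass.getDeclaredFields.toSeq
        .filter { f => f.setAccessible(true); values.contains(f.get(this)) }
        .map(_.getName)

    // Refers to the field itself, so a rename refactoring keeps it in sync
    def excluded: Seq[String] = fieldNamesOf(Set(config))
  }

  def main(args: Array[String]): Unit =
    println(new Demo().excluded)
}
```

Because the lookup compares values, two fields that hold equal values would both be reported, which is one reason to prefer the macro-based nameOf.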
I am writing a module for the Play Framework. Part of my module contains the following code:
abstract class SecurityFiltering extends GlobalSettings{
override def onRequestReceived(request: RequestHeader) = {
play.Logger.debug("onRequestReceived: " + request)
super.onRequestReceived(request)
}
override def doFilter(next: RequestHeader => Handler): (RequestHeader => Handler) = {
request => {
play.Logger.debug("doFilter: " + request)
super.doFilter(next)(request)
}
}
override def onRouteRequest(request: RequestHeader): Option[Handler] = {
play.Logger.debug("onRouteRequest: " + request)
super.onRouteRequest(request)
}
}
In the doFilter method I am able to determine the following useful information
ROUTE_PATTERN = /x/$name<[^/]+>/$age<[^/]+>
ROUTE_CONTROLLER = controllers.Application
ROUTE_ACTION_METHOD = tester
ROUTE_VERB = GET
path = /x/hello
What I need in addition is the values of the named parts of the URL path (the part before the query string). Given the following request in my test application, I need to retrieve name=Pete and age=41:
localhost:9000/x/Pete/41
There is surely some code in the Play Framework which already does this but I am unable to find it. Can someone suggest how I achieve this goal, or point me at which Play class extracts these values?
package models.com.encentral.tattara.web.util;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RouteExtractor {
//something like "/foo/$id<[^/]+>/edit/$james<[^/]+>"
private String routePattern;
private String path;
//something like /$id<[^/]+>
private static final String INDEX_PATTERN = "\\$(.+?)\\<\\[\\^\\/\\]\\+\\>";
public RouteExtractor(String routePattern, String path) {
this.routePattern = routePattern;
this.path = path;
}
private Map<Integer, String> extractPositions() {
Pattern pattern = Pattern.compile(INDEX_PATTERN);
Matcher matcher = pattern.matcher(this.routePattern);
Map<Integer, String> results = new HashMap<>();
int index = 0;
while (matcher.find()) {
results.put(index++, matcher.group(1));
}
return results;
}
private String replaceRoutePatternWithGroup() {
Pattern pattern = Pattern.compile(INDEX_PATTERN);
Matcher matcher = pattern.matcher(this.routePattern);
return matcher.replaceAll("([^/]+)");
}
public Map<String, String> extract() {
Pattern pattern = Pattern.compile(this.replaceRoutePatternWithGroup());
Matcher matcher = pattern.matcher(this.path);
final Map<String, String> results = new HashMap<>();
if (matcher.find()) {
this.extractPositions().entrySet().stream().forEach(s -> {
results.put(s.getValue(), matcher.group(s.getKey() + 1));
});
}
return results;
}
}
As per this GitHub issue response from JRoper:
onRequestReceived is the thing that does the routing and tags the request, so of course it's not going to have any of the routing information when it's first invoked, only after it's invoked.
val p = """\$([^<]+)<([^>]+)>""".r
override def onRequestReceived(request: RequestHeader) = {
val (taggedRequest, handler) = super.onRequestReceived(request)
val pattern = taggedRequest.tags("ROUTE_PATTERN")
val paramNames = p.findAllMatchIn(pattern).map(m => m.group(1)).toList
val pathRegex = ("^" + p.replaceAllIn(pattern, m => "(" + m.group(2) + ")") + "$").r
val paramValues = pathRegex.findFirstMatchIn(request.path).get.subgroups
val params: Map[String, String] = paramNames.zip(paramValues).toMap
// ^ your params map, will be Map("name" -> "Pete", "age" -> "41")
(taggedRequest, handler)
}
That said, there are usually better, more typesafe ways to achieve whatever you're trying to achieve. If you depend on there being specific parameters in the URL, then a filter is not the right thing, because filters apply to all requests, whether they have those parameters or not. Rather, you should probably be using action composition or a custom action builder, like so:
case class MyAction(name: String, age: Int) extends ActionBuilder[Request] {
def invokeBlock[A](request: Request[A], block: (Request[A]) => Future[Result]) = {
// Do your filtering here, you have access to both the name and age above
block(request)
}
}
def foo(name: String, age: Int) = MyAction(name, age) { request =>
Ok("Hello world")
}
def bar(name: String, age: Int) = MyAction(name, age).async { request =>
Future.successful(Ok("Hello world"))
}
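To experiment with the regex-based extraction from the onRequestReceived example outside Play, the same logic can be sketched as a plain function. RouteParamsSketch and extract are hypothetical names; the route pattern format is taken from the ROUTE_PATTERN value shown in the question. Unlike the snippet above, this version returns None instead of throwing when the path does not match.

```scala
import scala.util.matching.Regex

object RouteParamsSketch {
  // Matches Play-style dynamic parts such as $name<[^/]+>
  private val Part: Regex = """\$([^<]+)<([^>]+)>""".r

  // Returns the named path parameters, or None when the path does not match
  def extract(routePattern: String, path: String): Option[Map[String, String]] = {
    val names = Part.findAllMatchIn(routePattern).map(_.group(1)).toList
    // Turn every $name<regex> into a capturing group (regex);
    // quoteReplacement keeps backslashes in the embedded regex intact
    val regexSource = Part.replaceAllIn(
      routePattern,
      m => Regex.quoteReplacement("(" + m.group(2) + ")")
    )
    val pathRegex = ("^" + regexSource + "$").r
    pathRegex.findFirstMatchIn(path).map(m => names.zip(m.subgroups).toMap)
  }

  def main(args: Array[String]): Unit = {
    println(extract("/x/$name<[^/]+>/$age<[^/]+>", "/x/Pete/41"))
    println(extract("/x/$name<[^/]+>/$age<[^/]+>", "/y/Pete"))
  }
}
```

As in the original snippet, the static parts of the pattern are used as regex source without escaping, so a literal character such as a dot in the route would match more loosely than intended.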
I would like to instantiate an Ebean model object with Scala and the Play 2.2 framework. I am facing an issue with the auto-generated ID and the class parameters / abstractness:
@Entity
class Task(@Required val label:String) extends Model{
@Id
val id: Long
}
object Task {
var find: Model.Finder[Long, Task] = new Model.Finder[Long, Task](classOf[Long], classOf[Task])
def all(): List[Task] = find.all.asScala.toList
def create(label: String) {
val task = new Task(label)
task.save
}
def delete(id: Long) {
find.ref(id).delete
}
}
The error is: "class Task needs to be abstract, since value id is not defined". Any idea how to avoid this problem?
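For context, the compiler reports this because a val declared in a class body without an initializer is an abstract member, so the class itself would have to be abstract. A minimal sketch without Ebean (TaskSketch is a hypothetical name, and the use of a var is an assumption about how a persistence layer would set the id) shows that giving the field an initial value makes the class concrete:

```scala
// `val id: Long` with no right-hand side would be an abstract declaration.
// Giving the field an initial value makes it a concrete member; a `var` is used
// here on the assumption that a persistence layer would assign the id later.
class TaskSketch(val label: String) {
  var id: Long = 0L
}

object TaskSketchDemo {
  def main(args: Array[String]): Unit = {
    val task = new TaskSketch("write report")
    task.id = 42L // stands in for an id assigned after saving
    println(s"${task.id}: ${task.label}")
  }
}
```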
I found the solution thanks to this link: http://www.avaje.org/topic-137.html
import javax.persistence._
import play.db.ebean._
import play.data.validation.Constraints._
import scala.collection.JavaConverters._
@Entity
@Table( name="Task" )
class Task{
@Id
var id:Int = 0
@Column(name="title")
var label:String = null
}
/**
* Task Data Access Object.
*/
object Task extends Dao(classOf[Task]){
def all(): List[Task] = Task.find.findList().asScala.toList
def create(label: String) {
var task = new Task
task.label = label
Task.save(task)
}
def delete(id: Long) {
Task.delete(Task.ref(id))
}
}
And the DAO :
/**
* Dao for a given Entity bean type.
*/
abstract class Dao[T](cls:Class[T]) {
/**
* Find by Id.
*/
def find(id:Any):T = {
return Ebean.find(cls, id)
}
/**
* Find with expressions and joins etc.
*/
def find():com.avaje.ebean.Query[T] = {
return Ebean.find(cls)
}
/**
* Return a reference.
*/
def ref(id:Any):T = {
return Ebean.getReference(cls, id)
}
/**
* Save (insert or update).
*/
def save(o:Any):Unit = {
Ebean.save(o);
}
/**
* Delete.
*/
def delete(o:Any):Unit = {
Ebean.delete(o);
}
}
I have some working Jackson Scala module code for round-tripping Scala case classes. Jackson worked great for flat case classes, but when I made one that contains a list of other case classes, I seemed to need a lot of code. Consider:
abstract class Message
case class CardDrawn(player: Long, card: Int, mType: String = "CardDrawn") extends Message
case class CardSet(cards: List[CardDrawn], mType: String = "CardSet") extends Message
To get the CardSet to roundtrip to/from json with jackson scala module I used a custom serializer/deserializer written in java:
object ScrumGameMashaller {
val mapper = new ObjectMapper()
val module = new SimpleModule("CustomSerializer")
module.addSerializer(classOf[CardSet], new CardSetSerializer)
module.addDeserializer(classOf[CardSet], new CardSetDeserializer)
val scalaModule = DefaultScalaModule
mapper.registerModule(scalaModule)
mapper.registerModule(module)
def jsonFrom(value: Any): String = {
import java.io.StringWriter
val writer = new StringWriter()
mapper.writeValue(writer, value)
writer.toString
}
private[this] def objectFrom[T: Manifest](value: String): T =
mapper.readValue(value, typeReference[T])
private[this] def typeReference[T: Manifest] = new TypeReference[T] {
override def getType = typeFromManifest(manifest[T])
}
private[this] def typeFromManifest(m: Manifest[_]): Type = {
if (m.typeArguments.isEmpty) { m.runtimeClass }
else new ParameterizedType {
def getRawType = m.runtimeClass
def getActualTypeArguments = m.typeArguments.map(typeFromManifest).toArray
def getOwnerType = null
}
}
}
with serializer:
public class CardSetSerializer extends JsonSerializer<CardSet> {
@Override
public void serialize(CardSet cardSet, JsonGenerator jgen, SerializerProvider provider) throws IOException, JsonProcessingException {
jgen.writeStartObject();
jgen.writeArrayFieldStart("cards");
List<CardDrawn> cardsDrawn = cardSet.cards();
scala.collection.Iterator<CardDrawn> iter = cardsDrawn.iterator();
while(iter.hasNext()){
CardDrawn cd = iter.next();
cdSerialize(jgen,cd);
}
jgen.writeEndArray();
jgen.writeStringField("mType", "CardSet");
jgen.writeEndObject();
}
private void cdSerialize(JsonGenerator jgen, CardDrawn cd) throws IOException, JsonProcessingException {
jgen.writeStartObject();
jgen.writeNumberField("player", cd.player());
jgen.writeNumberField("card", cd.card());
jgen.writeEndObject();
}
}
and matching deserializer:
public class CardSetDeserializer extends JsonDeserializer<CardSet> {
private static class CardDrawnTuple {
Long player;
Integer card;
}
@Override
public CardSet deserialize(JsonParser jsonParser, DeserializationContext cxt) throws IOException, JsonProcessingException {
ObjectCodec oc = jsonParser.getCodec();
JsonNode root = oc.readTree(jsonParser);
JsonNode cards = root.get("cards");
Iterator<JsonNode> i = cards.elements();
List<CardDrawn> cardObjects = new ArrayList<>();
while( i.hasNext() ){
CardDrawnTuple t = new CardDrawnTuple();
ObjectNode c = (ObjectNode) i.next();
Iterator<Entry<String, JsonNode>> fields = c.fields();
while( fields.hasNext() ){
Entry<String,JsonNode> f = fields.next();
if( f.getKey().equals("player")) {
t.player = f.getValue().asLong();
} else if( f.getKey().equals("card")){
t.card = f.getValue().asInt();
} else {
System.err.println(CardSetDeserializer.class.getCanonicalName()+ " : unknown field " + f.getKey());
}
}
CardDrawn cd = new CardDrawn(t.player, t.card, "CardDrawn");
cardObjects.add(cd);
}
return new CardSet(JavaConversions.asScalaBuffer(cardObjects).toList(), "CardSet");
}
}
This seems like a lot of code to deal with something fairly vanilla in Scala. Can this code be improved (what did I miss that Jackson has to make this easy)? Otherwise, is there a library that will handle structured case classes automatically? The Jerkson examples looked easy, but that project seems to have been abandoned.
Argonaut does a great job. Mark Hibbard helped me out with getting the example below working. All that is needed is to create a codec for the types and it will implicitly add an asJson to your objects to turn them into strings. It will also add a decodeOption[YourClass] to strings to extract an object. The following:
package argonaut.example
import argonaut._, Argonaut._
abstract class Message
case class CardDrawn(player: Long, card: Int, mType: String = "CardDrawn") extends Message
case class CardSet(cards: List[CardDrawn], mType: String = "CardSet") extends Message
object CardSetExample {
implicit lazy val CodecCardSet: CodecJson[CardSet] = casecodec2(CardSet.apply, CardSet.unapply)("cards","mType")
implicit lazy val CodecCardDrawn: CodecJson[CardDrawn] = casecodec3(CardDrawn.apply, CardDrawn.unapply)("player", "card", "mType")
def main(args: Array[String]): Unit = {
val value = CardSet(List(CardDrawn(1L,2),CardDrawn(3L,4)))
println(s"Got some good json ${value.asJson}")
val jstring =
"""{
| "cards":[
| {"player":"1","card":2,"mType":"CardDrawn"},
| {"player":"3","card":4,"mType":"CardDrawn"}
| ],
| "mType":"CardSet"
| }""".stripMargin
val parsed: Option[CardSet] =
jstring.decodeOption[CardSet]
println(s"Got a good object ${parsed.get}")
}
}
outputs:
Got some good json {"cards":[{"player":"1","card":2,"mType":"CardDrawn"},{"player":"3","card":4,"mType":"CardDrawn"}],"mType":"CardSet"}
Got a good object CardSet(List(CardDrawn(1,2,CardDrawn), CardDrawn(3,4,CardDrawn)),CardSet)
The question is old but maybe someone could still find it helpful. Apart from Argonaut, Scala has several Json libraries. Here you can find a list of them updated to the beginning of 2016 (and it still gives you a good overall picture).
Most of them (probably all) should allow you to come up with a DRYer version of your custom serializer/deserializer. My preference goes to json4s, which aims to provide a single AST across multiple libraries including Jackson (a bit like slf4j does for logging libraries). In this post you can find a working example of a JSON custom serializer/deserializer using json4s and Akka HTTP.
I have a number of classes that look like this:
class Foo(v: BasicData) extends Bar(v) {
val helper = new Helper(v)
val derived1 = helper.getDerived1Value()
val derived2 = helper.getDerived2Value()
}
...except that I don't want to hold onto an instance of "helper" beyond the end of the constructor. In Java, I'd do something like this:
public class Foo extends Bar {
final Derived derived1, derived2;
public Foo(BasicData val) {
super(val);
Helper helper = new Helper(val);
derived1 = helper.getDerived1Value();
derived2 = helper.getDerived2Value();
}
}
So how do I do something like that in Scala? I'm aware that I could create a companion object with the same name as the class and give it an apply method, but I was hoping for something slightly more succinct.
You could use a block to create a temporary helper val and return a tuple, like this:
class Foo(v: BasicData) extends Bar(v) {
val (derived1, derived2) = {
val helper = new Helper(v)
(helper.getDerived1Value(), helper.getDerived2Value())
}
}
Better look at the javap output (including private members) before you conclude this has side-stepped any fields for the Tuple2 used in the intermediate pattern-matching.
As of Scala 2.8.0.RC2, this Scala code (fleshed out to compile):
class BasicData
{
def basic1: Int = 23
def basic2: String = "boo!"
}
class Helper(v: BasicData)
{
def derived1: Int = v.basic1 + 19
def derived2: String = v.basic2 * 2
}
class Bar(val v: BasicData)
class Foo(v: BasicData)
extends Bar(v)
{
val (derived1, derived2) = {
val helper = new Helper(v)
(helper.derived1, helper.derived2)
}
}
Produces this Foo class:
% javap -private Foo
public class Foo extends Bar implements scala.ScalaObject{
private final scala.Tuple2 x$1;
private final int derived1;
private final java.lang.String derived2;
public int derived1();
public java.lang.String derived2();
public Foo(BasicData);
}
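If the extra Tuple2 field matters, one alternative (a sketch, not something taken from the answers above) is to route the helper through a private primary constructor. A constructor parameter that is only used while initialising vals is normally not kept as a field, though it is worth confirming with javap -private on your own compiler version. BasicData, Helper, and Bar mirror the fleshed-out example above; the private constructor and the auxiliary constructor are the assumed additions.

```scala
class BasicData {
  def basic1: Int = 23
  def basic2: String = "boo!"
}

class Helper(v: BasicData) {
  def derived1: Int = v.basic1 + 19
  def derived2: String = v.basic2 * 2
}

class Bar(val v: BasicData)

// The primary constructor is private and takes the helper as a plain parameter.
// `helper` is referenced only in the val initialisers, not in any method.
class Foo private (v: BasicData, helper: Helper) extends Bar(v) {
  // Public entry point: builds the temporary helper and delegates
  def this(v: BasicData) = this(v, new Helper(v))

  val derived1: Int = helper.derived1
  val derived2: String = helper.derived2
}

object FooDemo {
  def main(args: Array[String]): Unit = {
    val foo = new Foo(new BasicData)
    println(s"${foo.derived1}, ${foo.derived2}")
    // Lists the fields actually declared on Foo
    println(classOf[Foo].getDeclaredFields.map(_.getName).mkString(", "))
  }
}
```

Callers still write new Foo(data), so the extra constructor stays an implementation detail.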