HOCON does not override value when substitution is used - scala

I am using a HOCON configuration file that contains substitution variables. But when a key's value is a substitution, the key is not overridden by a later value in the same file.
For example, consider the following HOCON config:
{
"x":5
"x":6
"y":{"a":1}
"y":{"a":11}
"z":${y.a}
"z":${y.a}
}
Now when I load this with ConfigFactory.parseURL, the resulting config is:
{"x":6,"y":{"a":11},"z":${y.a},"z":${y.a}}
Here y ends up with its final value, but the same does not happen with z.
Questions:
What is the reason for this output?
How can I get "z" resolved as well?

You're just parsing the config file without resolving it. You have to call the resolve() method.
Check the following example:
val options: ConfigRenderOptions = ConfigRenderOptions
.defaults()
.setComments(false)
.setOriginComments(false)
.setFormatted(false)
.setJson(true)
val parsed = ConfigFactory.parseString("""
|{
| "x":5
| "x":6
| "y":{"a":1}
| "y":{"a":11}
| "z":${y.a}
| "z":${y.a}
|}
|""".stripMargin)
println(parsed.root().render(options))
println(parsed.resolve().root().render(options))
Prints
{"x":6,"y":{"a":11},"z":${y.a},"z":${y.a}}
{"x":6,"y":{"a":11},"z":11}
Please note that the parse/resolve methods are intended for advanced/customised configuration loading.
If you are just loading the application.conf and reference.conf files, I suggest sticking to the load* methods only. Or use the ConfigFactory.load(ConfigFactory.parse...) way of resolving the parsed config, as sketched below.
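For illustration, a minimal sketch of that last approach, reusing the parsed value from the example above:
// load() merges the parsed config with defaultReference()/defaultOverrides()
// and resolves all substitutions in one step.
val resolved = ConfigFactory.load(parsed)
println(resolved.getInt("z")) // prints 11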

Related

Scala Option Types not recognized in apache flink table api

I am working on building a Flink application which reads data from Kafka
topics, applies some transformations, and writes to an Iceberg table.
I read the data from a Kafka topic (which is in JSON) and use circe to decode
it into a Scala case class with Scala Option values in it.
All the transformations on the DataStream work fine.
The case class looks like below:
Event(app_name: Option[String], service_name: Option[String], ......)
But when I try to convert the stream to a table in order to write to the Iceberg
table, the Option fields of the case class are converted to the RAW type, as
shown below.
table.printSchema()
service_name RAW('scala.Option', '...'),
conversion_id RAW('scala.Option', '...'),
......
And the table write fails as below.
Query schema: [app_name: RAW('scala.Option', '...'), .........
Sink schema: [app_name: STRING, .......
Does the Flink Table API support Scala case classes with Option values?
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#special-types
I found out from this documentation that it is supported in the DataStream API.
Is there a way to do this in the Table API?
Thanks in advance for the help.
I had the exact same issue of Option being parsed as RAW and found yet another workaround that might interest you:
TL;DR:
Instead of using .get, which returns an Option, and diligently declaring the return type to be Types.OPTION(Types.INSTANT) (which doesn't work), I use .getOrElse("key", null) and declare the type as conventional. The Table API then recognizes the column type, creates a nullable column, and interprets the null correctly. I can then filter those rows with IS NOT NULL.
Detailed example:
For me it starts with a custom map function in which I unpack data where certain fields could be missing:
class CustomExtractor extends RichMapFunction[MyInitialRecordClass, Row] {
  override def map(in: MyInitialRecordClass): Row = {
    Row.ofKind(
      RowKind.INSERT,
      in._id,
      in.name,
      in.times.getOrElse("time", null) // This here did the trick instead of using .get and returning an Option
    )
  }
}
And then I use it like this explicitly declaring a return type.
val stream: DataStream[Row] = data
  .map(new CustomExtractor())
  .returns(
    Types.ROW_NAMED(
      Array("id", "name", "time"),
      Types.STRING,
      Types.STRING,
      Types.INSTANT
    )
  )
val table = tableEnv.fromDataStream(stream)
table.printSchema()
// (
// `id` STRING,
// `name` STRING,
// `time` TIMESTAMP_LTZ(9),
// )
tableEnv.createTemporaryView("MyTable", table)
tableEnv
.executeSql("""
|SELECT
| id, name, time
|FROM MyTable""".stripMargin)
.print()
// +----+------------------+-------------------+---------------------+
// | op | id | name | time |
// +----+------------------+-------------------+---------------------+
// | +I | foo | bar | <Null> |
// | +I | spam | ham | 2022-10-12 11:32:06 |
This was at least for me exactly what I wanted. I'm very much new to Flink and would be curious if the Pros here think this is workable or horribly hacky instead.
Using scala 2.12.15 and flink 1.15.2
The type system of the Table API is more restrictive than that of the DataStream API. Unsupported classes are immediately treated as the black-box type RAW. This allows objects to still pass through the API, but they might not be supported by every connector.
From the exception, it looks like you declared the sink table with app_name: STRING, so I guess you are fine with a string representation. If this is the case, I would recommend implementing a user-defined function that performs the conversion to string, along the lines of the sketch below.
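For illustration, here is a minimal sketch of such a function. The class name, the registration name, and the use of an InputGroup.ANY hint to let the RAW value reach eval are all assumptions, not code from the question:
import org.apache.flink.table.annotation.{DataTypeHint, InputGroup}
import org.apache.flink.table.functions.ScalarFunction

// Hypothetical UDF: accepts any value (including RAW Option columns)
// and renders it as a nullable string; None and null both become SQL NULL.
class OptionToString extends ScalarFunction {
  def eval(@DataTypeHint(inputGroup = InputGroup.ANY) value: AnyRef): String =
    value match {
      case null           => null
      case opt: Option[_] => opt.map(_.toString).orNull
      case other          => other.toString
    }
}

// Registration and use (names are illustrative):
// tableEnv.createTemporarySystemFunction("OPT_TO_STR", classOf[OptionToString])
// tableEnv.executeSql("INSERT INTO sink SELECT OPT_TO_STR(app_name) FROM MyTable")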

Pureconfig - is it possible to include another conf file in a conf file?

Is it possible to include another conf file in a *.conf file?
Current implementation:
// db-writer.conf
writer: {
name="DatabaseWriter",
model="model1",
table-name="model1",
append=false,
create-table-file="sql/create_table_model1.sql",
source-file="abcd.csv"
}
Desired solution:
// model1.conf + others model2.conf, model3.conf..
table: {
name="model1",
table-name="model1",
create-table-file="../sql/create_table_model1.sql"
}
//db-writer.conf
import model1.conf <=== some import?
writer: {
name="DatabaseWriter",
model="model1", <=== some reference like this?
append=false,
source-file="abcd.csv"
}
Reasons why I would like to have it like this:
to reduce duplicated definitions
to pre-define user conf files which are rarely modified
I guess it is not possible - if not, do you have any suggestions on how to separate configs & reuse them?
I'm using Scala 2.12 and pureconfig 0.14 (can be updated to any newer version)
Pureconfig uses HOCON (though some of the interpretation of things like durations differs). The HOCON include directive is supported.
So assuming that you have model1.conf in your resources (e.g. src/main/resources), all you need in db-writer.conf is
include "model1"
HOCON-style overrides and concatenation are also supported:
writer: ${table} {
name = "DatabaseWriter"
model = "model1"
append = false
source-file = "abcd"
}
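To tie this back to pureconfig, a minimal sketch of loading the merged block; the Writer case class and its field names are assumptions mirroring the keys above:
import pureconfig._
import pureconfig.generic.auto._

// Hypothetical case class matching the merged writer block; pureconfig
// maps kebab-case keys (table-name) to camelCase fields (tableName).
final case class Writer(
  name: String,
  model: String,
  tableName: String,
  createTableFile: String,
  append: Boolean,
  sourceFile: String
)

// Reads db-writer.conf from the classpath; the include pulls in model1.conf
// and ${table} is resolved before pureconfig maps the writer block.
val writer = ConfigSource.resources("db-writer.conf").at("writer").load[Writer]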

scala - error loading a properties file packed inside a jar

This is a newbie question; I have read a lot but I am a bit confused.
I load a properties file from inside a jar, the configuration is read, and all is fine.
I wanted to add a try/catch. I tried the code below, but it does not work, because loading does not produce an exception if the properties file is not present. Therefore, three questions:
Is it correct to load files like this?
Does it make sense to put a try/catch since the config is inside the jar?
If so, any suggestions on how to?
var appProps: Config = ConfigFactory.load()
try {
  appProps = ConfigFactory.load("application.properties")
} catch {
  case e: Exception =>
    log.error("application.properties file not found")
    sc.stop()
    System.exit(1)
}
Your code looks OK in general.
ConfigFactory.load("resource.config") will handle a missing resource like an empty resource. As a result you get an empty Config, so a try-catch block does not really make sense.
You would usually specify a fallback configuration like this:
val appProps = ConfigFactory.load(
  "application.properties"
).withFallback(
  ConfigFactory.load()
)
EDIT:
The sentence
As a result you get an empty Config
is somewhat incomplete. ConfigFactory.load(resourceBaseName: String) applies defaultReference() and defaultOverrides(). So your resulting Config object probably contains some generic environment data and is not empty.
As far as I can see, your best option to check if the resource is there and emit an error message if not, is to look up the resource yourself:
val configFile = "application.properties"
val config = Option(getClass.getClassLoader.getResource(configFile)).fold {
  println("Config not found!")
  ConfigFactory.load()
} { _ =>
  ConfigFactory.load(configFile)
}

How to use system properties to substitute placeholders in Typesafe Config file?

I need to refer to java.io.tmpdir in my application.conf file.
I printed the content of my config with
val c = ConfigFactory.load()
System.err.println(c.root().render())
and it renders like this:
# dev/application.conf: 1
"myapp" : {
# dev/application.conf: 47
"db" : {
# dev/application.conf: 49
"driver" : "org.h2.Driver",
# dev/application.conf: 48
"url" : "jdbc:h2:file:${java.io.tmpdir}/db;DB_CLOSE_DELAY=-1"
}
...
}
# system properties
"java" : {
# system properties
"io" : {
# system properties
"tmpdir" : "/tmp"
},
....
So I guess that forward references do not work. Is there any way to get my options loaded after the system properties, so the config parser will correctly substitute the values?
Forward references work fine; I believe the issue is just that you have the ${} syntax inside quotes, so it doesn't have its special meaning. Try it like this:
url = "jdbc:h2:file:"${java.io.tmpdir}"/db;DB_CLOSE_DELAY=-1"
(note that the ${} stuff is not quoted)
In the HOCON format, anything that's valid JSON will be interpreted as it would be in JSON, so quoted strings for example don't have special syntax inside them, other than the escape sequences JSON supports.
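A quick way to convince yourself, as a minimal sketch (resolveWith is used here to resolve against the system properties explicitly; ConfigFactory.load() would do the same as part of its normal pipeline):
import com.typesafe.config.ConfigFactory

// The substitution sits outside the quoted string pieces, so the parser
// treats it as a reference rather than as literal text.
val config = ConfigFactory
  .parseString("url = \"jdbc:h2:file:\"${java.io.tmpdir}\"/db;DB_CLOSE_DELAY=-1\"")
  .resolveWith(ConfigFactory.systemProperties())

println(config.getString("url")) // e.g. jdbc:h2:file:/tmp/db;DB_CLOSE_DELAY=-1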

I don't understand what a YAML tag is

I get it on some level, but I have yet to see an example that didn't bring up more questions than answers.
http://rhnh.net/2011/01/31/yaml-tutorial
# Set.new([1,2]).to_yaml
--- !ruby/object:Set
hash:
  1: true
  2: true
I get that we're declaring a Set tag. I don't get what the subsequent hash mapping has to do with it. Are we declaring a schema? Can someone show me an example with multiple tag declarations?
I've read through the spec: http://yaml.org/spec/1.2/spec.html#id2761292
%TAG ! tag:clarkevans.com,2002:
Is this declaring a schema? Is there something else a parser has to do in order to successfully parse the file? A schema file of some type?
http://www.yaml.org/refcard.html
Tag property: # Usually unspecified.
none : Unspecified tag (automatically resolved by application).
'!' : Non-specific tag (by default, "!!map"/"!!seq"/"!!str").
'!foo' : Primary (by convention, means a local "!foo" tag).
'!!foo' : Secondary (by convention, means "tag:yaml.org,2002:foo").
'!h!foo': Requires "%TAG !h! <prefix>" (and then means "<prefix>foo").
'!<foo>': Verbatim tag (always means "foo").
Why is it relevant to have a primary and secondary tag, and why does a secondary tag refer to a URI? What problem is being solved by having these?
I seem to see a lot of "what they are", and no "why are they there", or "what are they used for".
I don't know a lot about YAML but I'll give it a shot:
Tags are used to denote types. A tag is declared using ! as you have seen from the "refcard" there. The %TAG directive is kind of like declaring a shortcut to a tag.
I'll demonstrate with PyYaml. PyYaml can parse the secondary tag of !!python/object: as an actual python object. The double exclamation mark is a substitution in itself, short for !tag:yaml.org,2002:, which turns the whole expression into !tag:yaml.org,2002:python/object:. This expression is a little unwieldy to be typing out every time we want to create an object, so we give it an alias using the %TAG directive:
%TAG !py! tag:yaml.org,2002:python/object: # declares the tag alias
---
- !py!__main__.MyClass # creates an instance of MyClass
- !!python/object:__main__.MyClass # equivalent with no alias
- !<tag:yaml.org,2002:python/object:__main__.MyClass> # equivalent using primary tag
Nodes are parsed by their default type if you have no tag annotations. The following are equivalent:
- 1: Alex
- !!int "1": !!str "Alex"
Here is a complete Python program using PyYaml demonstrating tag usage:
import yaml

class Entity:
    def __init__(self, idNum, components):
        self.id = idNum
        self.components = components

    def __repr__(self):
        return "%s(id=%r, components=%r)" % (
            self.__class__.__name__, self.id, self.components)

class Component:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "%s(name=%r)" % (
            self.__class__.__name__, self.name)

text = """
%TAG !py! tag:yaml.org,2002:python/object:__main__.
---
- !py!Component &transform
  name: Transform
- !!python/object:__main__.Component &render
  name: Render
- !<tag:yaml.org,2002:python/object:__main__.Entity>
  id: 123
  components: [*transform, *render]
- !<tag:yaml.org,2002:int> "3"
"""
# UnsafeLoader is required for python/object tags in PyYAML >= 5.1
result = yaml.load(text, Loader=yaml.UnsafeLoader)
More information is available in the spec.