I've been messing around with typing a "camel caser" function (one that consumes JSON and camel cases its keys). I've run into a few issues along the way, and I'm curious if y'all have any suggestions.
A camel caser never changes the shape of its argument, so I would like to preserve the type of whatever I pass in; ideally, calling camelize on an array of numbers would return another array of numbers, etc.
I've started with the following:
type JSON = null | string | number | boolean | { [string]: JSON } | JSON[]
function camelize<J: JSON>(json: J): J {
  throw "just typecheck please"
}
This works great for the simple cases (null, string, number, and boolean), but things don't work quite so well for JSON dictionaries or arrays. For example:
const dictionary: { [string]: number } = { key: 123 }
const camelizedDictionary = camelize(dictionary)
will fail with a type error. A similar issue will come up if you pass in a value of, say, type number[]. I think I understand the issue: arrays and dictionaries are mutable, and hence invariant in the type of the values they point to; an array of numbers is not a subtype of JSON[], so Flow complains. If arrays and dictionaries were covariant though, I believe this approach would work.
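To illustrate the problem to myself (a contrived sketch, not code I actually need), if an Array<number> were accepted where JSON[] is expected, another function could legally break it:
// Contrived sketch of why the invariance matters.
function addAString(json: JSON[]) {
  json.push("not a number") // perfectly legal for a JSON[]
}

const numbers: number[] = [1, 2, 3]
// addAString(numbers) // if this were allowed, numbers would now contain a string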
Given that they're not covariant though, do y'all have any suggestions for how I should think about this?
Flow now supports $ReadOnly and $ReadOnlyArray, so a different approach would be to define the JSON type as:
type JSON =
| null
| string
| number
| boolean
| $ReadOnly<{ [string]: JSON }>
| $ReadOnlyArray<JSON>
This solves one of the issues noted above because $ReadOnlyArray<number> is a subtype of $ReadOnlyArray<JSON>.
This might not work depending on the implementation of the camelize function, since a read-only type would prevent it from modifying the keys in place to camel-case them. But it's good to know about the read-only utilities in Flow, since they are powerful and allow for more concise and, possibly, more correct function types.
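To make that concrete, here is a small sketch (not from the original tryflow) of the signature against this read-only JSON type, together with the array case that previously failed:
// Same stub as before, but against the read-only JSON type defined above.
function camelize<J: JSON>(json: J): J {
  throw "just typecheck please"
}

const numbers: number[] = [1, 2, 3]
// Array<number> is a subtype of $ReadOnlyArray<JSON>, so this call should now typecheck:
const camelizedNumbers = camelize(numbers)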
Use property variance to solve your problem with dictionaries:
type JSON = null | string | number | boolean | { +[string]: JSON } | JSON[]
https://flowtype.org/blog/2016/10/04/Property-Variance.html
As for your problem with Arrays, as you've pointed out the issue is with mutability. Unfortunately Array<number> is not a subtype of Array<JSON>. I think the only way to get what you want is to explicitly enumerate all of the allowed Array types:
type JSON = null | string | number | boolean | { +[string]: JSON } | Array<JSON> | Array<number>;
I've added just Array<number> here to make my point. Obviously this is burdensome, especially if you also want to include arbitrary mixes of JSON elements (like Array<string | number | null>). But it will hopefully address the common issues.
(I've also changed it to the array syntax with which I am more familiar but there should be no difference in functionality).
There has been talk of adding a read-only version of Array which would be analogous to covariant object types, and I believe it would solve your problem here. But so far nothing has really come of it.
Here's a complete tryflow based on the one you gave.
I am working on building a Flink application which reads data from Kafka topics, applies some transformations and writes to an Iceberg table.
I read the data from the Kafka topic (which is in JSON) and use circe to decode it to a Scala case class with Scala Option values in it.
All the transformations on the DataStream work fine.
The case class looks like below:
Event(app_name: Option[String], service_name: Option[String], ......)
But when I try to convert the stream to a table in order to write to the Iceberg table, the columns are converted to RAW type because of the case classes, as shown below.
table.printSchema()
service_name RAW('scala.Option', '...'),
conversion_id RAW('scala.Option', '...'),
......
And the table write fails as below.
Query schema: [app_name: RAW('scala.Option', '...'), .........
Sink schema: [app_name: STRING, .......
Does the Flink Table API support Scala case classes with Option values?
I found in this documentation that it is supported in the DataStream API:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/serialization/types_serialization/#special-types
Is there a way to do this in the Table API?
Thanks in advance for the help.
I had the exact same issue of Option being parsed as RAW and found yet another workaround that might interest you:
TL;DR:
Instead of using .get, which returns an Option, and diligently declaring the return type to be Types.OPTION(Types.INSTANT) (which doesn't work), I instead use .getOrElse("key", null) and declare the type conventionally. The Table API then recognizes the column type, creates a nullable column and interprets the null correctly. I can then filter those rows with IS NOT NULL.
Detailed example:
For me it starts with a custom map function in which I unpack data where certain fields could be missing:
class CustomExtractor extends RichMapFunction[MyInitialRecordClass, Row] {
  def map(in: MyInitialRecordClass): Row = {
    Row.ofKind(
      RowKind.INSERT,
      in._id,
      in.name,
      in.times.getOrElse("time", null) // This here did the trick instead of using .get and returning an Option
    )
  }
}
And then I use it like this, explicitly declaring a return type:
val stream: DataStream[Row] = data
  .map(new CustomExtractor())
  .returns(
    Types.ROW_NAMED(
      Array("id", "name", "time"),
      Types.STRING,
      Types.STRING,
      Types.INSTANT
    )
  )

val table = tableEnv.fromDataStream(stream)
table.printSchema()
// (
//   `id` STRING,
//   `name` STRING,
//   `time` TIMESTAMP_LTZ(9)
// )
tableEnv.createTemporaryView("MyTable", table)
tableEnv
.executeSql("""
|SELECT
| id, name, time
|FROM MyTable""".stripMargin)
.print()
// +----+------------------+-------------------+---------------------+
// | op | id | name | time |
// +----+------------------+-------------------+---------------------+
// | +I | foo | bar | <Null> |
// | +I | spam | ham | 2022-10-12 11:32:06 |
This was at least for me exactly what I wanted. I'm very much new to Flink and would be curious if the Pros here think this is workable or horribly hacky instead.
Using Scala 2.12.15 and Flink 1.15.2.
The type system of the Table API is more restrictive than that of the DataStream API. Unsupported classes are immediately treated as the black-box type RAW. This allows objects to still pass through the API, but they might not be supported by every connector.
From the exception, it looks like you declared the sink table with app_name: STRING, so I guess you are fine with a string representation. If this is the case, I would recommend implementing a user-defined function that performs the conversion to string.
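For example, a rough sketch of such a function (the class name, registration name and column names here are made up, and it assumes a plain string rendering of the Option contents is acceptable). A scalar function can accept the RAW('scala.Option') column by declaring its input group as ANY and unwrap it itself:
import org.apache.flink.table.annotation.{DataTypeHint, InputGroup}
import org.apache.flink.table.functions.ScalarFunction

// Accepts any input (including RAW types) and returns a nullable STRING.
class OptionToString extends ScalarFunction {
  def eval(@DataTypeHint(inputGroup = InputGroup.ANY) value: AnyRef): String =
    value match {
      case null    => null
      case Some(v) => v.toString // unwrap the Scala Option
      case None    => null       // map None to SQL NULL
      case other   => other.toString
    }
}

// Illustrative registration and use:
// tableEnv.createTemporarySystemFunction("OptionToString", classOf[OptionToString])
// tableEnv.executeSql("INSERT INTO sink SELECT OptionToString(app_name) AS app_name, ... FROM source")
The point of this design is that the conversion to a supported type happens inside the function, so the RAW column never has to reach the sink.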
I am working with Q# on a generic Grover search implementation, and I wanted to define a custom oracle type
newtype ModelOracle = ((Qubit[], Qubit[], Int[], Qubit) => Unit);
// ...
function GroverMaxMatchingOracle(search_set: (Int, Int)[], vertices: Int[], marked_pts: Bool[]): ModelOracle {
    return ModelOracle(ApplyMaxMatchingOracle(_, _, _, _, search_set, vertices, marked_pts));
}
that will fit into my model. But when I try to use it (kind of in the same way as they use StateOracle in the DatabaseSearch sample), I get an error saying that the new type ModelOracle is not a valid operation
fail: Microsoft.Quantum.IQSharp.Workspace[0]
QS5021: The type of the expression must be a function or operation type. The given expression is of type OracleHelper.ModelOracle.
What am I getting wrong about the types here?
It looks like you have defined things OK, so it might be that you have to unwrap the user-defined type first with the ! operator.
So where you are using it you may have to do something like GroverMaxMatchingOracle!(...)
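For instance (an illustrative sketch; the variable names are made up, not taken from your code), once you have a ModelOracle value in hand:
// Hypothetical usage inside an operation: unwrap the UDT with ! to get at the wrapped operation.
let oracle = GroverMaxMatchingOracle(searchSet, vertices, markedPts);
oracle!(leftRegister, rightRegister, indices, target);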
Another approach could be to name the tuple in your UDT:
newtype ModelOracle = (Apply: (Qubit[], Qubit[], Int[], Qubit) => Unit);
Then wherever you want to use it you can directly use the named item Apply like this: GroverMaxMatchingOracle::Apply(...)
If it's helpful, there is a section on user-defined types (8.2) in the book #cgranade and I are working on, Learn Quantum Computing with Python and Q#.
This code (for checking the changed timestamp of the current directory):
my $date = ".".IO.changed.DateTime.yyyy-mm-dd but Dateish;
say $date;
yields the error:
«Ambiguous call to 'gist(Str+{Dateish}: )'; these signatures all match:
:(Str:D: *%_)
:(Dateish:D: *%_)
in block <unit> at <tmp> line 1»
Without the Dateish mix-in, the string is simply 2018-05-12. Using any other kind of Dateish method, like .week-year, also yields a different error:
«Str+{Dateish}Invocant of method 'Bridge' must be an object instance of type 'Int', not a type object of type 'Int'. Did you forget a '.new'? in block <unit> at <tmp> line 1»
Does it simply mean that you can't mix in a string with Dateish? I've done something similar with hours without a problem.
To answer your question, we should look at that role:
my role Dateish {
    has Int $.year;
    has Int $.month;    # should be int
    has Int $.day;      # should be int
    has Int $.daycount;
    has &.formatter;
    ...
    multi method Str(Dateish:D:) {
        &!formatter ?? &!formatter(self) !! self!formatter
    }
    multi method gist(Dateish:D:) { self.Str }
    ...
}
So, role Dateish has several attributes, and the methods use those attributes to calculate their return values.
When you do $some-string but Dateish, you are not doing anything to initialize the attributes, and thus method calls that use them (potentially indirectly) fail in interesting ways.
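For example (a small illustration, not from the original code), the mixed-in attributes are just uninitialized Int type objects, which is why methods that rely on them blow up:
my $d = "2018-05-12" but Dateish;
say $d.year;      # (Int), never initialized
say $d.daycount;  # (Int)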
How do you get a Dateish object from a DateTime then? Well, DateTime is one, already, or you can coerce to Date if that is what you want:
my $date = ".".IO.changed.DateTime.Date; say $date
You might also try to instantiate a Dateish by supplying all attributes, but I don't see any advantage over just using Date as intended.
Given the input value:
input =
  name: 'Foo'
  id: 'd4cbd9ed-fabc-11e6-83e6-307bd8cc75e3'
  ref: 5
  addtData: 'd4cbd9ed-fabc-11e6-83e6-307bd8cc75e3'
  data: 'bar'
When I try to destructure the input via a function like this:
simplify: (input) ->
  { name, ref, id } = input
...the return value is still the full input or a copy of the input.
Am I missing something simple here? How can I access the destructured values? If you can't access the values via a return, it seems that destructuring has little value outside of locally scoped variables.
While this isn't necessarily an advantage, the only way I was able to transpile and get the correct answer was to assign the destructured values to the local scope using @ (aka this).
input =
  name: 'foo'
  data: 'bar'
  id: 12314
  key: 'children'
  ref: 1

f = (input) ->
  { @name, @id } = input

r = {}
f.call(r, input)
console.log r # Object {name: "foo", id: 12314}
working example - link
If someone has a better way to approach this, please add an answer so I can select it as this doesn't seem like the best way.
I have two methods like so:
Foo[] GetFoos(Type t)
{
    // do some stuff and return an array of things of type T
}

T[] GetFoos<T>() where T : Foo
{
    return GetFoos(typeof(T)) as T[];
}
However, this always seems to return null. Am I doing things wrong or is this just a shortfall of C#?
Nb:
I know I could solve this problem with:
GetFoos(typeof(T)).Cast<T>().ToArray();
However, I would prefer to do this without any allocations (working in an environment very sensitive to garbage collections).
Nb++:
Bonus points if you suggest an alternative non allocating solution
Edit:
This raises an interesting question. The MSDN docs here: http://msdn.microsoft.com/en-us/library/aa664572%28v=vs.71%29.aspx say that the cast will succeed if there is an implicit or explicit cast. In this case there is an explicit cast, and so the cast should succeed. Are the MSDN docs wrong?
No, C# casting isn't useless - you simply can't cast a Foo[] to a T[] where T is a more derived type, as the Foo[] could contain elements other than T. Why don't you adjust your GetFoos method to GetFoos<T>()? A method only taking a Type object can easily be converted into a generic method, where you could create the array directly via new T[].
If this is not possible: do you need the abilities an array offers (i.e. indexing and things like Count)? If not, you can work with an IEnumerable<T> without much of a problem. Otherwise, you won't get around going the Cast<T>().ToArray() way.
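As a sketch of that suggestion (GetFooSourceFor is a made-up stand-in for wherever the Foo instances currently come from inside GetFoos(Type)), creating the array as T[] up front means no array cast is ever needed:
T[] GetFoos<T>() where T : Foo
{
    // Assumed existing lookup keyed by Type; replace with your real source.
    List<Foo> source = GetFooSourceFor(typeof(T));

    // The array is a real T[] from the start, so no Foo[] -> T[] cast is involved.
    var result = new T[source.Count];
    for (int i = 0; i < source.Count; i++)
    {
        result[i] = (T)source[i]; // element-wise downcast
    }
    return result;
}
It still allocates the one result array (unavoidable if you need a T[]), but it avoids the intermediate buffers that Cast<T>().ToArray() grows along the way.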
Edit:
There is no possible cast from Foo[] to T[]; the description in your link is the other way round: you could cast a T[] to a Foo[], as all T are Foo, but not all Foo are T.
If you can arrange for GetFoos to create the return array using new T[], then you win. If you used new Foo[], then the array's type is fixed at that, regardless of the types of the objects it actually holds.
I haven't tried this, but it should work:
T[] array = Array.ConvertAll<Foo, T>(input,
    delegate(Foo obj)
    {
        return (T)obj;
    });
You can find more at http://msdn.microsoft.com/en-us/library/exc45z53(v=VS.85).aspx
Note that ConvertAll allocates a new array of the target type rather than converting in place, but it does so in a single allocation, without the intermediate buffers that Cast<T>().ToArray() grows along the way.
From what I understand of your situation, using System.Array in place of a more specific array type can help you. Remember, Array is the base class for all strongly typed arrays, so an Array reference can essentially store any array. You should make your (generic?) dictionary map Type -> Array, so you can also store any strongly typed array without having to worry about converting one array to another; now it's just type casting.
i.e.,
Dictionary<Type, Array> myDict = ...;

Array GetFoos(Type t)
{
    // do checks, blah blah blah
    return myDict[t];
}

// and a generic helper
T[] GetFoos<T>() where T : Foo
{
    return (T[])GetFoos(typeof(T));
}

// then accesses all need casts to the specific type
Foo[] f = (Foo[])GetFoos(typeof(Foo));
DerivedFoo[] df = (DerivedFoo[])GetFoos(typeof(DerivedFoo));

// or with the generic helper
AnotherDerivedFoo[] adf = GetFoos<AnotherDerivedFoo>();

// etc...
p.s. The MSDN link that you provide shows how arrays are covariant. That is, you may store an array of a more derived type in a reference to an array of a base type. What you're trying to achieve here is contravariance (i.e. using an array of a base type in place of an array of a more derived type), which is the other way around and which arrays can't do without a conversion.
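A tiny illustration of that point (Foo and DerivedFoo stand in for your types):
class Foo { }
class DerivedFoo : Foo { }

// ...
Foo[] a = new DerivedFoo[2];        // OK: covariance, a DerivedFoo[] seen through a Foo[] reference
Foo[] b = new Foo[2];
DerivedFoo[] c = b as DerivedFoo[]; // null at runtime: the instance really is a Foo[]
DerivedFoo[] d = a as DerivedFoo[]; // non-null: the instance really is a DerivedFoo[]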