java.lang.Long cannot be cast to java.lang.Double ERROR when using MAX() - google-cloud-dataprep

Since the Cloud Dataprep update yesterday (19/11/2018), I get an error every time I use the MAX() function, either alone or in a pivot.
Some notes:
I used the MAX function on another dataset and it was working, so MAX() itself works.
I didn't have this issue before yesterday's Dataprep update; the flow was working.
I tried many times to edit the recipe to isolate the issue, and it does seem to be the MAX() function.
The columns I'm using MAX() on are of type INT. I tried converting INT -> FLOAT -> INT to make sure they're INT before using MAX(), but I keep getting the same issue.
Here is the log:
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
at com.trifacta.google.dataflow.functions.MaxCombineFn.binaryOperation(MaxCombineFn.java:18)
at com.trifacta.google.dataflow.functions.BinaryOperationCombineFn.addInput(BinaryOperationCombineFn.java:60)
at org.apache.beam.sdk.transforms.CombineFns$ComposedCombineFn.addInput(CombineFns.java:295)
at org.apache.beam.sdk.transforms.CombineFns$ComposedCombineFn.addInput(CombineFns.java:212)
at org.apache.beam.runners.core.GlobalCombineFnRunners$CombineFnRunner.addInput(GlobalCombineFnRunners.java:109)
at com.google.cloud.dataflow.worker.PartialGroupByKeyParDoFns$ValueCombiner.add(PartialGroupByKeyParDoFns.java:163)
at com.google.cloud.dataflow.worker.PartialGroupByKeyParDoFns$ValueCombiner.add(PartialGroupByKeyParDoFns.java:141)
at com.google.cloud.dataflow.worker.util.common.worker.GroupingTables$CombiningGroupingTable$1.add(GroupingTables.java:385)
at com.google.cloud.dataflow.worker.util.common.worker.GroupingTables$GroupingTableBase.put(GroupingTables.java:230)
at com.google.cloud.dataflow.worker.util.common.worker.GroupingTables$GroupingTableBase.put(GroupingTables.java:210)
at com.google.cloud.dataflow.worker.util.common.worker.SimplePartialGroupByKeyParDoFn.processElement(SimplePartialGroupByKeyParDoFn.java:35)
at com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
at com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:271)
at org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:309)
at org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:77)
at org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:621)
at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.output(DoFnOutputReceivers.java:71)
at org.apache.beam.sdk.transforms.MapElements$1.processElement(MapElements.java:128)

I'm with Google Cloud Platform Support.
This is an internal issue that appeared after the update on the 19th (as you said). We know about it and are working alongside the Trifacta team (as this is a third-party product developed and managed by them).
There is a public issue regarding this; feel free to add info or anything you feel is needed.
EDIT: The issue is fixed now. Could you try again and tell me if it worked?

Related

How to handle jOOQ deprecation warning on procedure that returns a trigger?

I use the following stored procedure to maintain the edit time on a few tables via triggers on those tables:
CREATE OR REPLACE FUNCTION maintain_edit_time()
RETURNS TRIGGER AS $t_edit_time$
BEGIN
    NEW.edit_timestamp = NOW();
    RETURN NEW;
END;
$t_edit_time$ LANGUAGE plpgsql;
When generating jOOQ objects for the database in question, I get the following generated code:
/**
 * @deprecated Unknown data type. Please define an explicit {@link org.jooq.Binding} to specify how this type should be handled. Deprecation can be turned off using <deprecationOnUnknownTypes/> in your code generator configuration.
 */
@java.lang.Deprecated
public static Object maintainEditTime(Configuration configuration) {
    MaintainEditTime f = new MaintainEditTime();
    f.execute(configuration);
    return f.getReturnValue();
}

/**
 * @deprecated Unknown data type. Please define an explicit {@link org.jooq.Binding} to specify how this type should be handled. Deprecation can be turned off using <deprecationOnUnknownTypes/> in your code generator configuration.
 */
@java.lang.Deprecated
public static Field<Object> maintainEditTime() {
    MaintainEditTime f = new MaintainEditTime();
    return f.asField();
}
I assume this is because I do not have a jOOQ binding between TRIGGER and a Java object. However, I do not have a clue what I would define this binding as, nor do I have any need for a binding to exist.
Left alone, though, this generates a compile warning. What's the cleanest, easiest way to resolve this?
Options seem to include turning off deprecation, using ignoreProcedureReturnValues, or creating a binding. Ideally, I'd like to simply not generate a Java object for this procedure, but I could not find a way to do that.
Using ignoreProcedureReturnValues globally affects the project just because of this one procedure, which is fine for now; I don't have other procedures at all, much less others with a return value. But I hate to limit future use. Also, I'm unclear on what the comment "This feature is deprecated as of jOOQ 3.6.0 and will be removed again in jOOQ 4.0." means on the jOOQ site under this flag. Is the flag going away, or is support for stored procedure return types going away? A brief dig through the jOOQ GitHub issues didn't yield an answer.
I'm tempted to simply turn off deprecation. This also seems like a global and not particularly beneficial change, but it would serve the purpose.
If I created a binding, I have no idea what it would do or how to define it, since TRIGGER really isn't a sensible thing to bind a Java object to. I assume I'd specify TRIGGER in the forcedType element, but then the Java binding seems like a waste of time at best and misleading at worst.
You've already found the perfect solution, which you documented in your own answer. I'll answer your various other questions here for completeness' sake.
Using ignoreProcedureReturnValues globally affects the project just because of this one procedure, which is fine for now; I don't have other procedures at all, much less others with a return value. But I hate to limit future use. Also, I'm unclear on what the comment "This feature is deprecated as of jOOQ 3.6.0 and will be removed again in jOOQ 4.0." means on the jOOQ site under this flag. Is the flag going away, or is support for stored procedure return types going away? A brief dig through the jOOQ GitHub issues didn't yield an answer.
That flag has been introduced because of a backwards incompatible change in the code generator that affected only SQL Server: https://github.com/jOOQ/jOOQ/issues/4106
In SQL Server, procedures always return an INT value, just like functions. This change allowed fetching that INT value using jOOQ generated code. In some cases, it was desirable not to have this feature enabled when upgrading from jOOQ 3.5 to 3.6. Going forward, we'll always generate this INT return type on SQL Server stored procedures.
This is why the flag has been deprecated from the beginning: we don't encourage its use except for backwards compatibility. It probably won't help you here.
Stored procedure support is definitely not going to be deprecated.
I'm tempted to simply turn off deprecation. This also seems like a global and not particularly beneficial change, but it would serve the purpose.
Why not? It's a quick workaround, and you don't have to use all the generated code. The deprecation is there to indicate that calling this generated procedure probably won't work out of the box, so its use is discouraged.
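For reference, here's a sketch of where that flag would go, assuming the standard XML code generator configuration (double-check the placement against the docs for your jOOQ version):
<generator>
    <generate>
        <!-- Stop marking routines with unmappable types as @Deprecated -->
        <deprecationOnUnknownTypes>false</deprecationOnUnknownTypes>
    </generate>
</generator>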
If I created a binding, I have no idea what it would do or how to define it, since TRIGGER really isn't a sensible thing to bind a Java object to. I assume I'd specify TRIGGER in the forcedType element, but then the Java binding seems like a waste of time at best and misleading at worst.
Indeed, that wouldn't really add much value to your use cases as you will never directly call the trigger function in PostgreSQL.
Again, your own solution using <exclude> is the ideal solution here. In the future, we might offer a new code generation configuration flag that allows for turning on/off the generation of trigger functions, with the default being off: https://github.com/jOOQ/jOOQ/issues/9270
Well, after noting that the ideal way to do this would be to ignore that procedure, I did find out how to exclude the procedure by name in the generally excellent jOOQ documentation. I don't know how I missed it on the first pass. If I had needed to call this procedure from Java, I'm unclear which of the above options I would have used.
Luckily, there was no need for this procedure to be referenced in code, and I excluded it as shown below in the jOOQ XML configuration.
<excludes>
databasechangelog.*
| maintain_edit_time
</excludes>

Metabase/Clojure error: Unfreezable type: class org.postgresql.jdbc.PgArray

Does anyone know anything about this error in Metabase (or a similar one in any Clojure program)?
Unfreezable type: class org.postgresql.jdbc.PgArray
It happens regularly, but not always, when I use a PostgreSQL array type (e.g. TEXT[]) in a question, so it probably depends somehow on the exact data in the PgArray, but I wasn't able to figure out how.
There is a workaround to get rid of it: retype/cast all PgArrays to TEXT (or VARCHAR). But I would really like to understand why this is happening. Thanks for any insights.
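For example, assuming a hypothetical events table with a tags TEXT[] column, the cast looks like:
-- Cast the array column to TEXT so the driver hands back a plain string
-- instead of a PgArray
SELECT id, tags::TEXT AS tags
FROM events;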
Metabase uses a library called Nippy:
https://github.com/metabase/metabase/blob/master/project.clj#L61
Nippy provides fast serialization of common types. The error "Unfreezable type":
https://github.com/ptaoussanis/nippy/blob/master/src/taoensso/nippy.clj#L720
occurs when Nippy comes across data of a type it doesn't know how to serialize. PgArray, as a bespoke Postgres array type, is evidently one of those.
Providing serialization guidance to Nippy is not hard. Maybe file an issue with the Metabase folks, including your details, asking if they can do this?
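For a sense of what that guidance might look like, here is a rough, untested sketch using Nippy's extend-freeze/extend-thaw hooks (the :postgres/array type ID is arbitrary, and thawing yields a plain vector rather than a reconstructed PgArray):
(require '[taoensso.nippy :as nippy])
(import 'org.postgresql.jdbc.PgArray)

;; Freeze a PgArray by serializing its contents as a plain Clojure vector
(nippy/extend-freeze PgArray :postgres/array
  [^PgArray x data-output]
  (nippy/freeze-to-out! data-output (vec (.getArray x))))

;; Thaw back the vector written above (not an actual PgArray)
(nippy/extend-thaw :postgres/array
  [data-input]
  (nippy/thaw-from-in! data-input))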

Spark cannot find case class on classpath

I have an issue where Spark is failing to generate code for a case class. Here is the Spark error:
Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 52, Column 43: Identifier expected instead of '.'
Here is the referenced line in the generated code:
/* 052 */ private com.avro.message.video.public.MetricObservation MapObjects_loopValue34;
It should be noted that com.avro.message.video.public.MetricObservation is a nested case class that's part of a larger hierarchy. It is also used fine in other places in the code. It should also be noted that this pipeline works fine if I use the RDD API, but I want to use the Dataset API because I want to write the Dataset out as Parquet. Has anyone seen this issue before?
I'm using Scala 2.11 and Spark 2.1.0. I was able to upgrade to Spark 2.2.1 and the issue is still there.
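For reference, here is a simplified sketch of what I'm doing (the real MetricObservation is nested in a larger hierarchy; the fields here are just stand-ins):
import org.apache.spark.sql.SparkSession

// Simplified stand-in for the real nested case class
case class MetricObservation(videoId: String, metric: String, value: Double)

object MetricsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("metrics").master("local[*]").getOrCreate()
    import spark.implicits._

    // Creating the Dataset forces codegen for the case class encoder;
    // the equivalent RDD-based pipeline runs without the compile error
    val ds = spark.createDataset(Seq(MetricObservation("v1", "views", 100.0)))
    ds.write.parquet("/tmp/metric-observations")
    spark.stop()
  }
}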
Do you think that SI-7555 or something like it has any bearing on this? I have noticed in the past that Scala reflection has had issues generating TypeTags for statically nested classes. Do you think something like that is going on, or is this strictly a Catalyst issue in Spark? You might want to file a Spark ticket too.
So it turns out that changing the package name of the affected class "fixes" (i.e. makes go away) the problem. I really have no idea why this is or even how to reproduce it in a small test case. What worked for me was just creating a higher-level package name that worked: specifically, com.avro.message.video.public -> com.avro.message.publicVideo. (One plausible explanation: public is a reserved word in Java, so generated Java code referencing a package named public may fail to parse, which would explain why the rename helped.)

Swift 1.2 -> Swift 2 Conversion time

Has anyone converted an app from 1.2 to Swift 2? My app is small (about 1k LOC), and it's been converting for over 2 hours now, stuck on the same screen.
How long should I expect this to take? Thanks...
The process is long, but it shouldn't take more than several minutes.
The Swift converter is probably hitting an issue (e.g. some kind of infinite loop).
You should abort and try to find out what happened, or maybe migrate manually.
The Swift compiler has an issue with large array literals. I commented out all the elements of an array (around 10 UIColor values), left only one element, and the conversion went smoothly.
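To illustrate, here's a hypothetical sketch of that workaround in Swift 2-era syntax, assuming a large UIColor array literal is the culprit:
import UIKit

// A large untyped literal like this can hang the type checker (and the converter):
//     let palette = [UIColor.redColor(), UIColor(white: 0.1, alpha: 1.0), ...]

// Workaround: annotate the element type and build the array incrementally.
var palette: [UIColor] = []
palette.append(UIColor.redColor())
palette.append(UIColor(white: 0.1, alpha: 1.0))
palette.append(UIColor(red: 0.2, green: 0.4, blue: 0.6, alpha: 1.0))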
Here's how you can debug the issue in your project:
Go to the Report navigator (Cmd + 8).
Build your app, select the latest build, and watch the log (select the All Messages filter).
The problematic file will be the one stuck on the "compile" step.
Navigate to that file and figure out what could hang the compiler (probably array or dictionary literals).
Why build rather than convert? Because the build log is verbose.

Select name of root element with XPath in PostgreSQL

I've got a bunch of XML messages in a PostgreSQL 9.1.3 table, with a column content of type XML. They're not all the same "type", so I'm trying to extract the root element name using a query like this:
SELECT xpath('name(/*)', content) FROM message;
as recommended by this answer to a similar SO question.
A sample message is:
<?xml version="1.0" encoding="UTF-8"?>
<WML version="6" xmlns="http://example.com/schemas/WML">...</WML>
In this case I'd hope to get the result '{WML}'. Unfortunately it just returns an empty array. Adding the namespaces parameter to xpath, or removing the namespace from the message, does not help.
A discussion on the PostgreSQL mailing lists seems to explain it as a bug in XPath handling in PostgreSQL. However, that was in 2008, and a look at the PostgreSQL source shows that piece of code was changed in 2009. I'm not a PostgreSQL developer, so I'm not confident whether that bug is still a factor in my case.
But I'm wondering if there's a workaround, such as an alternative XPath expression that might work? I'd prefer not to have to resort to regular expressions to parse XML, though if you can suggest a short, punchy, robust RE then it would be better than nothing.
Clearly, this has not yet been solved as of June 2011.
I found this thread on pgsql-hackers that describes your problem exactly.
I don't know of a workaround for older versions, but this is fixed in PostgreSQL 9.2, so that's great.
(The likeliest workaround would have been to write a function to parse the XML manually, but I'm glad I don't have to resort to that now!)
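For anyone stuck on pre-9.2 who will settle for the regex route the asker mentioned, something like this might do (a sketch only; regex-on-XML is inherently fragile, and for a prefixed root like <ns:WML> it returns just the prefix unless you add ':' to the character class):
-- Grab the first tag name in the document text; '<?xml ...?>' and
-- '<!DOCTYPE ...>' are skipped since '?' and '!' don't match the class.
SELECT substring(content::text FROM '<([A-Za-z_][A-Za-z0-9_.-]*)') AS root_name
FROM message;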