Apache Calcite: cast integer to datetime - apache-beam

I am using Beam SQL and trying to cast integer to datetime field.
Schema resultSchema =
Schema.builder()
.addInt64Field("detectedCount")
.addStringField("sensor")
.addInt64Field("timestamp")
.build();
PCollection<Row> sensorRawUnboundedTimestampedSubset =
sensorRowUnbounded.apply(
SqlTransform.query(
"select PCOLLECTION.payload.`value`.`count` detectedCount, \n"
+ "PCOLLECTION.payload.`value`.`id` sensor, \n"
+ "PCOLLECTION.`timestamp` `timestamp` \n"
+ "from PCOLLECTION "))
.setRowSchema(resultSchema);
For some computation and windowing, I want to convert or cast the timestamp field to a DateTime field. Please provide some pointers on converting timestamp in resultSchema to the DateTime data type.

There is no out-of-the-box way to do that in Beam (or in Calcite). In short, neither Calcite nor Beam can know how the dates or timestamps are encoded in your integers. However, assuming you have epoch millis, this should work:
@Test
public void testBlah() throws Exception {
// input schema, has timestamps as epoch millis
Schema schema = Schema.builder().addInt64Field("ts").addStringField("st").build();
DateTime ts1 = new DateTime(2019, 8, 9, 10, 11, 12);
DateTime ts2 = new DateTime(2019, 8, 9, 10, 11, 12);
PCollection<Row> input =
pipeline
.apply(
"createRows",
Create.of(
Row.withSchema(schema).addValues(ts1.getMillis(), "two").build(),
Row.withSchema(schema).addValues(ts2.getMillis(), "twelve").build()))
.setRowSchema(schema);
PCollection<Row> result =
input.apply(
SqlTransform.query(
"SELECT \n"
+ "(TIMESTAMP '1970-01-01 00:00:00' + ts * INTERVAL '0.001' SECOND) as ts, \n"
+ "st \n"
+ "FROM \n"
+ "PCOLLECTION"));
// output schema, has timestamps as DateTime
Schema outSchema = Schema.builder().addDateTimeField("ts").addStringField("st").build();
PAssert.that(result)
.containsInAnyOrder(
Row.withSchema(outSchema).addValues(ts1, "two").build(),
Row.withSchema(outSchema).addValues(ts2, "twelve").build());
pipeline.run();
}
Alternatively, you can always do it in Java rather than in SQL: apply a custom ParDo to the output of the SqlTransform. In that ParDo, extract the integer timestamp from the Row object, convert it to a DateTime, and then emit it, e.g. as part of another row with a different schema.
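The conversion step inside such a ParDo is small. Below is a minimal sketch of just that step, assuming the integer holds epoch milliseconds. It uses only java.time so that it runs without Beam on the classpath; in a Beam pipeline the Joda-Time counterpart would be new org.joda.time.DateTime(millis). The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class EpochMillisToDateTime {
    // Interprets the value as milliseconds since 1970-01-01T00:00:00Z
    // and returns the corresponding UTC wall-clock date and time.
    static LocalDateTime toUtcDateTime(long epochMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        long millis = LocalDateTime.of(2019, 8, 9, 10, 11, 12)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
        System.out.println(toUtcDateTime(millis)); // 2019-08-09T10:11:12
    }
}
```

If the integers hold seconds rather than milliseconds, Instant.ofEpochSecond is the call to use instead.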

Related

How to pass datetime field in UTC format as a parameter in Query in DB2

I have a datetime field that arrives from an external system in UTC format, 2022-01-02T08:00:00.000+00:00. I need to query DB2 with this value to determine whether the record exists. The date stored in DB2 is in the format 2022-01-01 08:00:00.000. Is there any way to convert the incoming date to the format 2022-01-01 08:00:00.000?
The final query should be something like
select * from table where changedate = '2022-01-02T08:00:00.000+00:00'
Db2 doesn't store timestamps as strings; it uses a binary format.
So, if I understand you correctly, your question should be "how to convert the YYYY-MM-DDTHH24.MI.SS.FF3XXXXXX string representation of a timestamp to the timestamp data type".
Unfortunately, the TO_DATE / TIMESTAMP_FORMAT function has no built-in pattern for this, but you can use the following expression. The T column has the timestamp data type, and you may use the same expression in the select * from table where changedate = ... statement.
SELECT
S
, TO_DATE (TRANSLATE (SUBSTR (S, 1, 23), ' ', 'T'), 'YYYY-MM-DD HH24:MI:SS.FF3')
+ CAST (TRANSLATE (SUBSTR (S, 24), '', ':', '') || '00' AS DEC (6)) AS T
FROM
(
VALUES
'2022-01-02T08:00:00.000+00:00'
, '2022-01-02T08:00:00.000+03:30'
, '2022-01-02T08:00:00.000-03:30'
) T (S)
S                              T
2022-01-02T08:00:00.000+00:00  2022-01-02-08.00.00.000000
2022-01-02T08:00:00.000+03:30  2022-01-02-11.30.00.000000
2022-01-02T08:00:00.000-03:30  2022-01-02-04.30.00.000000
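If the value passes through application code before it reaches Db2, the parsing can also be done there. The sketch below is one way to do that in Java, with hypothetical class and method names. Note one difference from the SQL expression above: this code normalizes to the UTC instant, so an input with +03:30 becomes 04:30, whereas the SQL adds the offset to the local time. Which of the two is right depends on how the table stores its values.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class IsoToDb2Timestamp {
    // Target layout from the question: 2022-01-01 08:00:00.000
    private static final DateTimeFormatter DB2_STYLE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Parses an ISO-8601 string with an offset and renders the same instant in UTC.
    static String toUtcString(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset)
                .withOffsetSameInstant(ZoneOffset.UTC)
                .format(DB2_STYLE);
    }

    public static void main(String[] args) {
        System.out.println(toUtcString("2022-01-02T08:00:00.000+00:00")); // 2022-01-02 08:00:00.000
        System.out.println(toUtcString("2022-01-02T08:00:00.000+03:30")); // 2022-01-02 04:30:00.000
    }
}
```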

How to run the postgres query with date as input on the column with timestamp in long format

I want to query a Postgres table that has a column storing timestamps as long milliseconds. But the time I have is a date in the format "yyyy-MM-dd HH:mm:ssZ". How can I convert this date format to long milliseconds to run the query?
You can either convert your long value to a proper timestamp:
select *
from the_table
where to_timestamp(the_millisecond_column / 1000) = timestamp '2020-10-05 07:42'
Or extract the epoch seconds from the timestamp value and scale them to milliseconds:
select *
from the_table
where the_millisecond_column = extract(epoch from timestamp '2020-10-05 07:42') * 1000
The better solution, however, is to convert that column to a proper timestamp column, which avoids the constant conversion between milliseconds and timestamp values.
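If the conversion is easier to do on the client side, the date string can be turned into epoch milliseconds before it is bound to the query. This is a minimal Java sketch with hypothetical names. It assumes the trailing Z in the question's pattern stands for a numeric offset such as +0000.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class DateToEpochMillis {
    // Pattern from the question; a single 'Z' matches offsets like +0000 or +0530.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ");

    static long toEpochMillis(String text) {
        return OffsetDateTime.parse(text, INPUT).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2020-10-05 07:42:00+0000")); // 1601883720000
    }
}
```

The resulting long can then be compared directly with the_millisecond_column, which leaves the column itself unwrapped in the WHERE clause.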

Apache PIG - Get only date from TimeStamp

I've the following code:
Data = load '/user/cloudera/' using PigStorage('\t')
as
( ID:chararray,
Time_Interval:chararray,
Code:chararray);
transf = foreach Source_Data generate (int) ID,
ToString( ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') as TimeStamp,
(int) Code;
SPLIT transf INTO Src25 IF (ToString(TimeStamp, 'yyyy-MM-dd')=='2016-07-25'),
Src26 IF (ToString(TimeStamp, 'yyyy-MM-dd')=='2016-07-26');
STORE Src25 INTO '/user/cloudera/2016-07-25' using PigStorage('\t');
STORE Src26 INTO '/user/cloudera/2016-07-26' using PigStorage('\t');
I want to split the files by date, but the conditions I put in the SPLIT statement give me an error.
How can I transform TimeStamp (from the transf statement) into a date to make the comparison?
Many thanks!
After you get the datetime object from ToDate, keep it as a datetime: once ToString has turned it into a chararray, the date functions no longer accept it, which is the likely cause of the error. Then derive a date-only column from the datetime. You can build it with GetYear(), GetMonth(), GetDay() and CONCAT, but those return unpadded integers, so formatting the datetime with ToString and a 'yyyy-MM-dd' pattern is the simpler way to get a value that compares equal to '2016-07-25'.
transf = foreach Source_Data generate
(int) ID,
ToDate((long) Time_Interval) as TimeStamp, -- keep the datetime; assumes Time_Interval holds epoch milliseconds
(int) Code;
transf_new = foreach transf generate
ID,
TimeStamp,
ToString(TimeStamp, 'yyyy-MM-dd') AS Day, -- date part only, in 'yyyy-MM-dd' format
Code;
-- Now use the new Day column to split the data
SPLIT transf_new INTO Src25 IF (Day == '2016-07-25'),
Src26 IF (Day == '2016-07-26');

how to use utc in postgres timestamp with jdbc PrepareStatement parameter?

I'm using the timestamp data type on PostgreSQL 9.4 and have run into a very strange problem with to_json.
I am currently in Shanghai, in the UTC+08:00 time zone.
See below:
conn.createStatement().execute("set time zone 'UTC'");
String sql = "select to_json(?::timestamp) as a, to_json(current_timestamp::timestamp) as b";
PreparedStatement ps = conn.prepareStatement(sql);
Timestamp timestamp = new Timestamp(new Date().getTime());
ps.setTimestamp(1, timestamp);
ResultSet rs = ps.executeQuery();
while(rs.next()){
System.out.println("a " + rs.getString("a") + ", b " + rs.getString("b"));
}
output:
a "2015-09-24T16:52:42.529", b "2015-09-24T08:53:25.468191"
This means that when I pass a TIMESTAMP parameter to PostgreSQL through JDBC, the value is still in Shanghai time, not UTC.
The problem is not caused by the to_json function: I made a table with one timestamp column and the problem still exists. The code above is just the shortest sample.
How can I make all timestamps work in the UTC time zone?
You need to create Calendar tzCal = Calendar.getInstance(TimeZone.getTimeZone("UTC")); and pass it as the third argument to setTimestamp, so that the driver interprets the Timestamp in UTC instead of the JVM's default time zone.
UPDATED CODE SNIPPET
conn.createStatement().execute("set time zone 'UTC'");
String sql = "select to_json(?::timestamp) as a, to_json(current_timestamp::timestamp) as b";
Calendar tzCal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
PreparedStatement ps = conn.prepareStatement(sql);
Timestamp timestamp = new Timestamp(new Date().getTime());
ps.setTimestamp(1, timestamp, tzCal); // the Calendar tells the driver which time zone to use
ResultSet rs = ps.executeQuery();
while(rs.next()){
System.out.println("a " + rs.getString("a") + ", b " + rs.getString("b"));
}
This way the timestamp parameter is sent in UTC in your JDBC call.
If you want to run the whole application/JVM in UTC, set the -Duser.timezone=UTC flag when starting the JVM.
HTH.
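The eight-hour gap between columns a and b in the question comes from rendering the same instant as a wall-clock time in two different zones. The following standalone sketch illustrates that effect without a database. The class and method names are hypothetical, and the instant is chosen to match the first value in the question's output.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class TimestampZoneDemo {
    // Renders one instant as the local date and time seen in the given zone.
    static LocalDateTime wallClock(Instant instant, ZoneId zone) {
        return LocalDateTime.ofInstant(instant, zone);
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2015-09-24T08:52:42.529Z");
        // What a JVM running in Shanghai sends when no Calendar is supplied:
        System.out.println(wallClock(instant, ZoneId.of("Asia/Shanghai"))); // 2015-09-24T16:52:42.529
        // What is sent when the value is interpreted in UTC:
        System.out.println(wallClock(instant, ZoneOffset.UTC)); // 2015-09-24T08:52:42.529
    }
}
```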

HIVE - group by date function

Can anyone tell me why I'm not getting counts for each f0, MONTH, DAY, HOUR, MINUTE group in my result set?
Result set:
Query:
SELECT t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))), HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2))), COUNT(DISTINCT t.f1)
FROM table t
WHERE (t.f0 = 1 OR t.f0 = 2)
AND (t.f3 >= '2013-02-06' AND t.f3 < '2013-02-15')
AND (Hex2Dec(t.f2) >= 1360195200 AND Hex2Dec(t.f2) < 1360800000)
AND *EXTRA CONDITIONS*
GROUP BY t.f0, MONTH(TO_DATE(Hex2Dec(t.f2))), DAY(TO_DATE(Hex2Dec(t.f2))), HOUR(TO_DATE(Hex2Dec(t.f2))), MINUTE(TO_DATE(Hex2Dec(t.f2)))
Schema:
f0 INT (Partition Column)
f1 INT
f2 STRING
f3 STRING (Partition Column)
f4 STRING
f5 STRING
f6 STRING
f7 MAP<STRING,STRING>
*f2 is a unix timestamp in Hexadecimal format
This is probably because to_date returns NULL when it is applied to a Unix timestamp.
According to the Hive manual:
to_date(string timestamp): Returns the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01"
Use from_unixtime instead to get the correct date parts.
Note:
I assume the Hex2Dec UDF is the one from HIVE-1545.
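To check by hand which date parts a given f2 value should produce, the hexadecimal value can be decoded outside Hive. This is a small Java sketch with hypothetical names. It assumes the hex string encodes seconds since the epoch and reads it in UTC, whereas Hive's from_unixtime uses the session time zone, so the hour may differ on a cluster that is not running in UTC.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class HexUnixTime {
    // Decodes a hexadecimal Unix timestamp (in seconds) into a UTC date and time.
    static LocalDateTime fromHexSeconds(String hex) {
        long seconds = Long.parseLong(hex, 16);
        return LocalDateTime.ofEpochSecond(seconds, 0, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // 5112EE80 is 1360195200, the lower bound used in the query's WHERE clause.
        LocalDateTime dt = fromHexSeconds("5112EE80");
        System.out.println(dt); // 2013-02-07T00:00
        System.out.println(dt.getMonthValue() + " " + dt.getDayOfMonth()
                + " " + dt.getHour() + " " + dt.getMinute()); // 2 7 0 0
    }
}
```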