I have created a table in CrateDB with a timestamp column. However, while inserting records into it, no timezone information is passed along, as mentioned in the docs.
insert into t1 values(2,'2017-06-30T02:21:20');
this gets stored as:
2 | 1498789280000 (Fri, 30 Jun 2017 02:21:20 GMT)
Now my queries are all failing, because the timestamp was recorded as GMT while my queries are all in the local timezone (Asia/Kolkata).
If anyone has run into this problem, could you please let me know the best way to modify the column to change the values from GMT to IST without losing them? It has a couple of million important records which cannot be lost or corrupted.
Cheers!
CrateDB always assumes that timestamps are UTC when they are stored without timezone information. This is due to the internal representation as a simple long data type - which means that your timestamp is stored as a simple number: https://crate.io/docs/reference/en/latest/sql/data_types.html#timestamp
CrateDB also accepts the timezone information in your ISO string, so just inserting insert into t1 values(2,'2017-06-30T02:21:20+05:30'); will convert it to the appropriate UTC value.
For records that are already stored, you can make the DB aware of the timezone when querying for the field and convert the output back by passing the corresponding timezone value into a date_trunc or date_format function: https://crate.io/docs/reference/en/latest/sql/scalar.html#date-and-time-functions
This UPDATE test SET ts = date_format('%Y-%m-%dT%H:%i:%s.%fZ', '+05:30', ts); should do it:
cr> create table test(ts timestamp);
CREATE OK, 1 row affected (0.089 sec)
cr> insert into test values('2017-06-30T02:21:20');
INSERT OK, 1 row affected (0.005 sec)
cr> select date_format(ts) from test;
+-----------------------------+
| date_format(ts) |
+-----------------------------+
| 2017-06-30T02:21:20.000000Z |
+-----------------------------+
SELECT 1 row in set (0.004 sec)
cr> UPDATE test set ts = date_format('%Y-%m-%dT%H:%i:%s.%fZ','+05:30', ts);
UPDATE OK, 1 row affected (0.006 sec)
cr> select date_format(ts) from test;
+-----------------------------+
| date_format(ts) |
+-----------------------------+
| 2017-06-30T07:51:20.000000Z |
+-----------------------------+
SELECT 1 row in set (0.004 sec)
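As a sanity check outside the database, the same epoch arithmetic from the transcript can be reproduced in plain Python (this only mirrors what date_format does above; it is not CrateDB code):

```python
from datetime import datetime, timedelta, timezone

# The stored value from the question: epoch milliseconds, read as UTC.
stored_ms = 1498789280000
as_utc = datetime.fromtimestamp(stored_ms / 1000, tz=timezone.utc)
print(as_utc.isoformat())  # 2017-06-30T02:21:20+00:00

# date_format('...Z', '+05:30', ts) renders the instant in the +05:30 zone,
# and the trailing 'Z' makes the rendered wall-clock time be re-read as UTC,
# i.e. the stored instant moves forward by 5h30m:
shifted = as_utc + timedelta(hours=5, minutes=30)
print(shifted.isoformat())  # 2017-06-30T07:51:20+00:00
```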
I am using PostgreSQL 13.4 and have intermediate-level experience with PostgreSQL.
I have a table which stores readings from a device for each customer. For each customer we receive around 3,000 readings per day, and we store them in our usage_data table as below.
We have the proper indexing in place.
Column | Data Type | Index name | Idx Access Type
-------------+-----------------------------+---------------------------+---------------------------
id | bigint | |
customer_id | bigint | idx_customer_id | btree
date | timestamp | idx_date | btree
usage | bigint | idx_usage | btree
amount | bigint | idx_amount | btree
Also I have common index on 2 columns which is below.
CREATE INDEX idx_cust_date
ON public.usage_data USING btree
(customer_id ASC NULLS LAST, date ASC NULLS LAST)
;
Problem
I have observed a few weird incidents.
When I tried to get data for 06/02/2022 for a customer, it took almost 20 seconds. It's a simple query, as below:
SELECT * FROM usage_data WHERE customer_id =1 AND date = '2022-02-06'
The execution plan
When I execute the same query for 15 days, I receive the result in 32 milliseconds:
SELECT * FROM usage_data WHERE customer_id =1 AND date > '2022-05-15' AND date <= '2022-05-30'
The execution plan
Tried Solution
I thought it might be an indexing issue, as I am facing this only for this particular date.
Hence I dropped all the indexes from the table and recreated them.
But the problem wasn't resolved.
Solution Worked (Not Recommended)
To solve this I tried another way: I created a new database and restored the old database into it.
I executed the same query in the new database, and this time it took 10 milliseconds.
The execution plan
I don't think this is a proper solution for a production server.
Any idea why for any specific data we are facing this issue?
Please let me know if any additional information is required.
Please guide. Thanks
Using VSCode in debug mode, when I hover over a date field, it shows as in the image below.
But when I log it out, it gets converted to
"execution_date":"2021-12-02T20:23:48.322Z"
which is minus 7 hours.
The field is stored in a Postgres database on AWS RDS as timestamp; running show timezone; returns UTC, and I am using VSCode in GMT+7 time. How can I fix this? Because the date gets changed and is used to call an API, the returned result is incorrect.
This is not a complete answer as that would depend on more information. Instead it is an explanation of what is going on that may help you troubleshoot:
set TimeZone = UTC;
show timezone;
TimeZone
----------
UTC
--Show that timestamp is taken at UTC
select now();
now
-------------------------------
2021-12-05 18:23:38.604681+00
--Table with timestamp and timestamptz to show different behavior.
create table dt_test(id integer, ts_fld timestamp, tsz_fld timestamptz);
--Insert local time 'ICT'
insert into dt_test values (1, '2021-12-03 03:23:48.322+07', '2021-12-03 03:23:48.322+07');
--The timestamp entry ignores the time zone offset, while the timestamptz uses it to rotate to UTC as '2021-12-03 03:23:48.322+07' is same as '2021-12-02 20:23:48.322+00'
select * from dt_test ;
id | ts_fld | tsz_fld
----+-------------------------+----------------------------
1 | 2021-12-03 03:23:48.322 | 2021-12-02 20:23:48.322+00
--timestamp takes the value as at 'ICT' and then rotates it to the current 'TimeZone' UTC. The timestamptz takes the value at UTC and rotates it to 'ICT'
select ts_fld AT TIME ZONE 'ICT', tsz_fld AT TIME ZONE 'ICT' from dt_test ;
timezone | timezone
----------------------------+-------------------------
2021-12-02 20:23:48.322+00 | 2021-12-03 03:23:48.322
I am guessing at some point in the process to get the value for the API the code is taking the timestamp value and applying AT TIME ZONE 'ICT' either in the database or downstream using some equivalent procedure.
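The two rotation behaviours described above can be mimicked in plain Python with zoneinfo (a sketch of the logic only, not of PostgreSQL internals; "Asia/Bangkok" is used here as the IANA name for ICT, UTC+7):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ict = ZoneInfo("Asia/Bangkok")  # ICT, UTC+7

# timestamptz behaviour: the +07 offset is used to rotate to UTC.
aware = datetime(2021, 12, 3, 3, 23, 48, 322000, tzinfo=ict)
print(aware.astimezone(timezone.utc))  # 2021-12-02 20:23:48.322000+00:00

# timestamp behaviour: the offset is ignored; the wall-clock value is kept.
naive = datetime(2021, 12, 3, 3, 23, 48, 322000)
print(naive)  # 2021-12-03 03:23:48.322000
```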
I have found the problem; it is related to how Sequelize and Postgres deal with timestamp without time zone. If you have the same problem as me, please refer to the following link: https://github.com/sequelize/sequelize/issues/3000
I have a dataframe looking like this
some_data | date | date_from | date_to
1234 |1-2-2020| 1-2-2020 | 2-2-2020
5678 |2-2-2020| 1-2-2020 | 2-3-2020
and I need to perform some operations on some_data based on time ranges that are different for every row, and stored in date_from and date_to. This is basically a rolling operation on some_data vs date, where the width of the window is not constant.
If the time ranges were the same, like always 7 days preceding/following, I would just do a window with rangeBetween. Any idea how I can still use rangeBetween with these variable ranges? I could really use the partitioning capability Window provides...
My current solution is:
a join of the table with itself to obtain a secondary/nested date column. at this point every date has the full list of possible dates
some wheres to select, for each primary date the proper secondary dates according to date_from and date_to
a groupby the primary date with agg performing the actual operation on the selected rows
But I am afraid this would not be very performant on large datasets. Can this be done with Window? Do you have a better/more performant suggestion?
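For reference, the join-then-filter-then-group approach described above can be sketched on toy data in plain Python (column names and the sum aggregation are placeholders; in Spark this would be a self-join on the date column followed by where filters and a groupBy):

```python
from datetime import date

# Toy rows: (some_data, date, date_from, date_to)
rows = [
    (1234, date(2020, 2, 1), date(2020, 2, 1), date(2020, 2, 2)),
    (5678, date(2020, 2, 2), date(2020, 2, 1), date(2020, 3, 2)),
]

# "Self-join": for each primary row, collect the values of all rows whose
# date falls inside the primary row's [date_from, date_to] window, then
# aggregate (here: sum) per primary date.
result = {}
for _, d, d_from, d_to in rows:
    in_range = [v for v, d2, _, _ in rows if d_from <= d2 <= d_to]
    result[d] = sum(in_range)

print(result)
# {datetime.date(2020, 2, 1): 6912, datetime.date(2020, 2, 2): 6912}
```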
Thanks a lot,
Andrea.
I have a Postgres database table in which I have stored timestamps that are actually the "Australia/Melbourne" timezone version of the timestamps, and I want to update those to the "UTC" version. How can I do that in one single Postgres statement?
I have looked for functions, thinking I could iterate through the rows in the table and execute an update in a for loop. I know the query with which you can select (but not update) the converted values:
SELECT timestamp_column AT TIME ZONE 'Australia/Melbourne' AT TIME ZONE 'UTC'
from my_schema.my_table;
Current table:
timestamp
---------------
2018-08-27 16:15:25.348
2018-05-15 13:52:12.052
2018-05-15 14:28:58.239
...
...
Expected table:
timestamp
----------------
2018-08-27 06:15:25.348
2018-05-15 03:52:12.052
2018-05-15 04:28:58.239
...
...
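The AT TIME ZONE chain from the SELECT can be applied to all rows at once, presumably with a single statement like UPDATE my_schema.my_table SET timestamp_column = timestamp_column AT TIME ZONE 'Australia/Melbourne' AT TIME ZONE 'UTC'; (column and table names taken from the question). The conversion it performs is sketched below in plain Python; zoneinfo handles Melbourne's DST, which a fixed offset would not:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

melbourne = ZoneInfo("Australia/Melbourne")

for wall in ["2018-08-27 16:15:25.348", "2018-05-15 13:52:12.052"]:
    naive = datetime.strptime(wall, "%Y-%m-%d %H:%M:%S.%f")
    # Interpret the naive value as Melbourne local time, then rotate to UTC,
    # mirroring: ts AT TIME ZONE 'Australia/Melbourne' AT TIME ZONE 'UTC'
    as_utc = naive.replace(tzinfo=melbourne).astimezone(timezone.utc)
    print(as_utc.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3])
# 2018-08-27 06:15:25.348
# 2018-05-15 03:52:12.052
```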
I have created a table in Hive (0.10.0) using these values:
2012-01-11 17:51 Stockton Children's Clothing 168.68 Cash
2012-01-11 17:51 Tampa Health and Beauty 441.08 Amex
............
Here date and time are tab-separated values and I need to work on the date column. Since Hive doesn't allow a "date" datatype, I have used TIMESTAMP for the first date column (2012-01-11, ...);
however, after creating the table it shows NULL values for the first column.
How do I solve this? Please guide.
I loaded the data into a table with all columns defined as string, then cast the date value and loaded it into another table where the column was defined as DATE. It seems to be working without any issues. The only difference is that I am using a Shark version of Hive and, to be honest, I am not sure whether there are any profound differences between actual Hive and Shark Hive.
Data:
hduser2#ws-25:~$ more test.txt
2010-01-05 17:51 Visakh
2013-02-16 09:31 Nair
Code:
[localhost:12345] shark> create table test_time(dt string, tm string, nm string) row format delimited fields terminated by '\t' stored as textfile;
Time taken (including network latency): 0.089 seconds
[localhost:12345] shark> describe test_time;
dt string
tm string
nm string
Time taken (including network latency): 0.06 seconds
[localhost:12345] shark> load data local inpath '/home/hduser2/test.txt' overwrite into table test_time;
Time taken (including network latency): 0.124 seconds
[localhost:12345] shark> select * from test_time;
2010-01-05 17:51 Visakh
2013-02-16 09:31 Nair
Time taken (including network latency): 0.397 seconds
[localhost:12345] shark> select cast(dt as date) from test_time;
2010-01-05
2013-02-16
Time taken (including network latency): 0.399 seconds
[localhost:12345] shark> create table test_date as select cast(dt as date) from test_time;
Time taken (including network latency): 0.71 seconds
[localhost:12345] shark> select * from test_date;
2010-01-05
2013-02-16
Time taken (including network latency): 0.366 seconds
[localhost:12345] shark>
If you are using TIMESTAMP, then you could try something along the lines of concatenating the date and time strings and then casting them:
create table test_1 as select cast(concat(dt,' ', tm,':00') as string) as ts from test_time;
select cast(ts as timestamp) from test_1;
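The concat-and-cast step just builds a string in the canonical yyyy-MM-dd HH:mm:ss shape that a TIMESTAMP cast expects. The equivalent string manipulation, sketched in Python for illustration:

```python
from datetime import datetime

dt, tm = "2010-01-05", "17:51"
# Mirror concat(dt, ' ', tm, ':00'): pad the time out to HH:mm:ss
ts_string = f"{dt} {tm}:00"
print(ts_string)  # 2010-01-05 17:51:00

# A timestamp cast can now parse the canonical format
parsed = datetime.strptime(ts_string, "%Y-%m-%d %H:%M:%S")
print(parsed)  # 2010-01-05 17:51:00
```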
It works fine for me using the load command from the beeline side.
Data:
[root#hostname workspace]# more timedata
buy,1977-03-12 06:30:23
sell,1989-05-23 07:23:12
creating table statement:
create table mytime(id string ,t timestamp) row format delimited fields terminated by ',';
And loading data statement:
load data local inpath '/root/workspace/timedata' overwrite into table mytime;
Table structure:
describe mytime;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| id | string | |
| t | timestamp | |
+-----------+------------+----------+--+
result of querying:
select * from mytime;
+------------+------------------------+--+
| mytime.id | mytime.t |
+------------+------------------------+--+
| buy | 1977-03-12 06:30:23.0 |
| sell | 1989-05-23 07:23:12.0 |
+------------+------------------------+--+
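This works because the file's timestamp strings already match the yyyy-MM-dd HH:mm:ss format that Hive's TIMESTAMP expects. A quick Python check of the difference from the original question's data, where the first column was a date with no time part (the missing time component is the usual reason such values parse as NULL; treat this diagnosis as an assumption):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # canonical Hive TIMESTAMP shape

print(datetime.strptime("1977-03-12 06:30:23", FMT))  # parses fine

for bad in ("2012-01-11", "2012-01-11 17:51"):  # date only / no seconds
    try:
        datetime.strptime(bad, FMT)
    except ValueError:
        print(f"{bad!r} does not match the expected format")
```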
Apache Hive data types are very important for the query language and for data modeling (the representation of the data structures in a table for a company's database).
It is necessary to know about the data types and their usage when defining table column types.
There are mainly two types of Apache Hive data types:
Primitive data types
Complex data types
This post discusses the complex data types.
Complex data types are further classified into four types, explained below.
2.1 ARRAY
It is an ordered collection of fields.
The fields must all be of the same type
Syntax: ARRAY<data_type>
Example: array(1, 4)
2.2 MAP
It is an unordered collection of key-value pairs.
Keys must be primitives; values may be any type.
Syntax: MAP<primitive_type, data_type>
Example: map('a', 1, 'c', 3)
2.3 STRUCT
It is a collection of elements of different types.
Syntax: STRUCT<col_name : data_type, ...>
Example: struct('a', 1, 1.0)
2.4 UNION
It is a collection of heterogeneous data types.
Syntax: UNIONTYPE<data_type, data_type, ...>
Example: create_union(1, 'a', 63)