What can you do when the source data contains different date formats?
I have a case where we are using a to_date function to get the information from a table, but I am getting an error because some of the records have the date format YYYY-DD-MM instead of YYYY-MM-DD.
How can I apply a uniform solution to this?
To handle this situation (arbitrary text should be converted into a structured date value), I would probably work with regular expressions.
That way you can select the set of records that fit each format you want to support and perform the type conversion on those records.
For example:
create column table date_vals (dateval nvarchar (4000), date_val date);
insert into date_vals values ('2018-01-23', NULL);
insert into date_vals values ('12/23/2016', NULL);
select dateval, to_date(dateval, 'YYYY-MM-DD') as SQL_DATE
from date_vals
where
dateval like_regexpr '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
union all
select dateval, to_date(dateval, 'MM/DD/YYYY') as SQL_DATE
from date_vals
where
dateval like_regexpr '[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}';
This approach also provides a good way to review the non-matching records and possibly come up with additional required patterns.
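The same dispatch idea can be sketched in Python, outside the database, to show why the question's YYYY-DD-MM values need extra care: a regex only checks the shape of a string, and YYYY-MM-DD and YYYY-DD-MM have exactly the same shape. In this sketch (the PATTERNS table and the function names are hypothetical), every format that shares a shape is tried, a value is accepted only if exactly one valid date results, and ambiguous values are set aside for review together with anything no pattern matched.

```python
import re
from datetime import date, datetime

# Hypothetical table: each regex maps to every strptime format with that shape.
# YYYY-MM-DD and YYYY-DD-MM look identical to a regex, so they share one entry.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), ["%Y-%m-%d", "%Y-%d-%m"]),
    (re.compile(r"\d{2}/\d{2}/\d{4}"), ["%m/%d/%Y"]),
]


def candidate_dates(raw: str, fmts: list[str]) -> set[date]:
    """Return the distinct valid dates that raw yields under the given formats."""
    found = set()
    for fmt in fmts:
        try:
            found.add(datetime.strptime(raw, fmt).date())
        except ValueError:
            pass  # not a valid date under this format (e.g. month 23)
    return found


def convert(values):
    """Split raw strings into converted, ambiguous and unmatched buckets."""
    converted, ambiguous, unmatched = {}, [], []
    for raw in values:
        for pattern, fmts in PATTERNS:
            if pattern.fullmatch(raw):
                dates = candidate_dates(raw, fmts)
                if len(dates) == 1:
                    converted[raw] = dates.pop()
                elif dates:
                    ambiguous.append(raw)  # e.g. 2018-05-06: May 6 or June 5?
                else:
                    unmatched.append(raw)  # right shape, but no valid date
                break
        else:
            unmatched.append(raw)  # no pattern matched; review these by hand
    return converted, ambiguous, unmatched
```

A value like 2018-23-01 can be repaired automatically because 23 cannot be a month, and 2018-01-01 is fine because both readings give the same date. A value like 2018-05-06, where both parts are 12 or less, cannot be resolved from the string alone and needs another source of truth.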
Why not use a CASE WHEN in the select list that tests the different regular expressions, and then use to_date to return the date with the proper format?
This would avoid the UNION ALL and the two SELECT statements.
You could add more formats without adding another SELECT to the union.
Unless like_regexpr only works in a WHERE clause (I have to admit I have never tried that function).
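To try the CASE WHEN suggestion without a HANA instance, here is a sketch that runs the single-pass version in SQLite through Python's sqlite3 module. SQLite is only a stand-in: it has neither LIKE_REGEXPR nor to_date, so the sketch registers a Python regexp() function (SQLite's `X REGEXP Y` operator calls a user function `regexp(Y, X)`) and normalizes matching values to ISO strings with substr() and date(). The function and table setup are hypothetical, and whether HANA accepts LIKE_REGEXPR inside a CASE condition is not verified here.

```python
import re
import sqlite3

# One SELECT with CASE WHEN instead of one SELECT per format plus UNION ALL.
QUERY = """
SELECT dateval,
       CASE
         WHEN dateval REGEXP '[0-9]{4}-[0-9]{2}-[0-9]{2}'
           THEN date(dateval)
         WHEN dateval REGEXP '[0-9]{2}/[0-9]{2}/[0-9]{4}'
           -- MM/DD/YYYY: rearrange into YYYY-MM-DD before calling date()
           THEN date(substr(dateval, 7, 4) || '-' || substr(dateval, 1, 2)
                     || '-' || substr(dateval, 4, 2))
       END AS sql_date  -- NULL when no WHEN branch matched
FROM date_vals
"""


def regexp(pattern, value):
    """Called by SQLite for `value REGEXP pattern`; must match the whole value."""
    return value is not None and re.fullmatch(pattern, value) is not None


def convert_with_case(values):
    """Load values into an in-memory table and run the single-pass query."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("regexp", 2, regexp)
    conn.execute("CREATE TABLE date_vals (dateval TEXT)")
    conn.executemany("INSERT INTO date_vals VALUES (?)", [(v,) for v in values])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return dict(rows)
```

Supporting another format then means adding one WHEN branch rather than another SELECT; rows that match no branch come back as NULL and can be reviewed the same way as in the UNION ALL version.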
I'm working my way through the DataCamp SQL track, and I'm currently working with date values. I've encountered two examples which seem contradictory to me.
-- Count requests created on January 31, 2017
SELECT count(*)
FROM evanston311
WHERE date_created::date='2017-01-31';
And:
-- Count requests created on February 29, 2016
SELECT count(*)
FROM evanston311
WHERE date_created>= '2016-02-29'
AND date_created< '2016-03-01';
Why do I need to cast the value as date in the first case but not the other?
As with most typed languages, you can rely on implicit type casting... until you can't.
In something like date_created >= '2016-02-29', Postgres can use the type of date_created to figure out how to implicitly cast '2016-02-29'. There's no ambiguity. But sometimes Postgres can't make a guess at all.
On the other hand, a function like date_part has multiple signatures, such as date_part(text, timestamp) and date_part(text, interval). If you pass it a date string...
test=# select date_part('day', '2019-01-03');
ERROR: function date_part(unknown, unknown) is not unique
LINE 1: select date_part('day', '2019-01-03');
^
HINT: Could not choose a best candidate function. You might need to add explicit type casts.
...Postgres cannot make a guess because the second string could be interpreted as either a timestamp or an interval type. You need to resolve this ambiguity.
# select date_part('day', '2019-01-03'::date);
date_part
-----------
3
Now that Postgres knows you're passing in a date, it can resolve the ambiguity: a date can be implicitly cast to a timestamp type, but not to an interval.
Another reason is that a cast is a cheap way to truncate timestamps. In your example, date_created::date = '2017-01-31' truncates date_created to a date so the comparison works. Of course, date_created should already be a date...
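Both DataCamp filters select the same rows when date_created is a timestamp; a small Python model makes that concrete. The sample timestamps are hypothetical, and the two functions are named after the SQL style they imitate. One practical difference in Postgres is that the half-open range can use a plain index on date_created, while the cast form would need an expression index on date_created::date.

```python
from datetime import date, datetime, timedelta

# Hypothetical date_created values around January 31, 2017.
created = [
    datetime(2017, 1, 30, 23, 59, 59),
    datetime(2017, 1, 31, 0, 0, 0),
    datetime(2017, 1, 31, 18, 30, 0),
    datetime(2017, 2, 1, 0, 0, 0),
]


def count_by_cast(rows, day):
    """Like WHERE date_created::date = day: truncate each timestamp, then compare."""
    return sum(1 for ts in rows if ts.date() == day)


def count_by_range(rows, day):
    """Like WHERE date_created >= day AND date_created < day + 1 (half-open)."""
    start = datetime.combine(day, datetime.min.time())  # midnight of that day
    end = start + timedelta(days=1)  # midnight of the next day, excluded
    return sum(1 for ts in rows if start <= ts < end)
```

The exclusive upper bound is what makes the range version correct: an inclusive `<= '2017-01-31'` would compare against midnight and miss the 18:30 row.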
You can also use it on the value being compared if you're not sure whether that value will be a date or a timestamp.
select * from table
where date_created = $1::date
This will work the same with '2019-01-02' or '2019-01-02 03:04:05'.
Which brings us to our final reason: making up for bad schemas, like when date_created is actually a timestamp or, all too commonly, text. In that case you need to explicitly control how comparisons are made. For example, let's say we had a text_created column of type text that contained timestamps as strings (naughty!). And maybe some poorly formatted data crept in that has extra spaces on the end...
-- Text comparison compares the values exactly.
test=# select * from test where text_created = '2019-01-04';
date_created | time_created | text_created
--------------+--------------+--------------
-- Date comparison compares as dates ignoring the extra whitespace.
test=# select * from test where text_created::date = '2019-01-04';
date_created | time_created | text_created
--------------+--------------+--------------
| | 2019-01-04
See Chapter 10. Type Conversion in the Postgres docs for more.
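Roughly what the text-versus-date comparison above does, sketched in Python. as_date() is a hypothetical stand-in for text_created::date: it trims whitespace and accepts either a date or a timestamp string, mirroring the behaviour described for '2019-01-04' and '2019-01-02 03:04:05'. Python's parser is stricter than Postgres about surrounding whitespace, hence the explicit strip().

```python
from datetime import date, datetime

# Hypothetical text_created values; the first one has trailing spaces.
text_created = ["2019-01-04   ", "2019-01-03", "2019-01-04 03:04:05"]


def as_date(text):
    """Rough stand-in for text::date: trim, parse a date or timestamp, keep the date."""
    return datetime.fromisoformat(text.strip()).date()


# Text comparison: characters must match exactly, so nothing is found.
exact = [t for t in text_created if t == "2019-01-04"]
# Date comparison: both the padded value and the timestamp string match.
by_date = [t for t in text_created if as_date(t) == date(2019, 1, 4)]
```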
insert into employee(eid,dojo) SELECT
14,coalesce(to_char(dojo,'dd-mm-yyyy'),'')
from employee;
I have to insert into a table by selecting from that same table. My column dojo has a NOT NULL constraint, and a timestamp column doesn't allow '' to be inserted. Please suggest an alternative for when the timestamp from the select query is null.
Your current query has several problems, two of which I think my answer can resolve. First, you are trying to insert an empty string '' to handle NULL values in the dojo column. This won't work, because an empty string is not a valid timestamp. As others have pointed out, one solution would be to use current_timestamp as a placeholder.
Another problem you have is that you are incorrectly using to_char to format your timestamp data. The output of to_char is a string, and the way you are using it would cause Postgres to reject it. Instead, you should be using to_timestamp(), which can parse a string and return a timestamp. Something like the following is what I believe you intend to do:
insert into employee (eid, dojo)
select 14, coalesce(to_timestamp(dojo, 'DD/MM/YYYY HH:MI:SS PM'), current_timestamp)
from employee;
This assumes that dojo is stored as text formatted as follows:
DD/MM/YYYY HH:MI:SS PM (e.g. 19/2/1995 12:00:00 PM)
If dojo is already a timestamp column, there is nothing to parse and coalesce(dojo, current_timestamp) is enough.
It is also not clear to me why you are inserting back into the employee table, which contains unusable data, rather than into a new table. If you choose to reuse employee, you might want to scrub away the bad data later.
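The coalesce(to_timestamp(...), current_timestamp) logic can be sketched in Python under the same assumption that dojo holds text such as 19/2/1995 12:00:00 PM. dojo_or_now() is a hypothetical name; %I and %p play the roles of HH (12-hour clock) and PM in the Postgres format string, and a None input stands for SQL NULL, which to_timestamp passes through so that coalesce falls back to the current time.

```python
from datetime import datetime
from typing import Optional

# Python equivalent of the Postgres pattern 'DD/MM/YYYY HH:MI:SS PM'.
DOJO_FORMAT = "%d/%m/%Y %I:%M:%S %p"


def dojo_or_now(raw: Optional[str]) -> datetime:
    """Like coalesce(to_timestamp(dojo, 'DD/MM/YYYY HH:MI:SS PM'), current_timestamp)."""
    if raw is None:
        return datetime.now()  # placeholder for a missing value
    return datetime.strptime(raw, DOJO_FORMAT)
```

Note that 12:00:00 PM parses as noon and 12:00:00 AM as midnight, which is the usual 12-hour clock convention in both Python and Postgres.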
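You can use some default date value like 1 Jan 1900 or now().
Since dojo is already a timestamp, coalesce it directly; wrapping it in to_char() would turn it into text, which coalesce cannot combine with the timestamp returned by now(). Your query should be like:
insert into employee(eid,dojo) SELECT
14,coalesce(dojo,now())
from employee;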
There is no such thing as a non-null yet blank timestamp. NULL = blank.
There is literally nothing you can do but store a valid timestamp or a null. Since you have a non-null constraint your only option is to pick a default timestamp that you consider "blank".
Using a hard-coded date to indicate a blank value is a terrible terrible terrible idea btw. If it is blank, remove the not null constraint, make it null and move on.
I am not trying to be condescending but I do not think you understand nulls. See here
https://en.wikipedia.org/wiki/Null_(SQL)
I have a column saved as a character data type. This column is what I am going to be using as a date. The column goes "YYYY-MM-DD" in that format.
This is a problem because if I ever need to filter by date, I have to go
select col_1, col_2
from table
where date LIKE '2016-04%';
If I want to search for a date range, this turns into a giant complicated mess.
What is the easiest way to convert this to a "date" data type? I want it to continue to be in YYYY-MM-DD order (no timestamp).
My ultimate goal is to be able to search for dates in a format like this:
select col_1, col_2
from table
where date between 2016-01-01 AND 2016-05-31;
What do you guys recommend? I am terrified I am going to corrupt my data if I use an ALTER statement to convert the data type. (I have a copy of the data saved and can upload it again, but it will take forever.)
Edit: This is a VERY Large table.
Edit Part 2: I originally stored the data as a varchar data type because my dates were not uploading correctly and I got an error message when I tried to save them as a date data type. Every date in this column is in the "YYYY-MM-DD" order. My solution was to save it as varchar to avoid the error message (I couldn't figure out what was wrong. I even got rid of leading and trailing spaces.)
Storing a date as a varchar was the wrong choice to begin with. It's very good that you want to change that.
The first step is to convert the columns using an ALTER TABLE statement:
alter table the_table
ALTER COLUMN col_1 TYPE date using col_1::date,
ALTER COLUMN col_2 TYPE date using col_2::date;
Note that this will fail if you have any value in those columns that cannot be converted to a correct date. If that happens, you need to fix those invalid strings first before you can change the data type.
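Before running the ALTER TABLE, it can help to list the values that would make the cast fail. This is a Python sketch, assuming the column values have been exported as strings; invalid_dates() is a hypothetical helper, and date.fromisoformat() only approximates Postgres's date input rules, so treat it as a first filter rather than a guarantee.

```python
from datetime import date


def invalid_dates(values):
    """Return the values that would probably make `col::date` fail."""
    bad = []
    for v in values:
        if v is None:
            continue  # NULL stays NULL after the cast, so it is not a problem
        try:
            date.fromisoformat(v.strip())  # YYYY-MM-DD, validated as a real date
        except ValueError:
            bad.append(v)
    return bad
```

Values such as 2016-02-30 have the right shape but are not real dates, so they are reported alongside obviously malformed strings.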
I want it to continue to be in YYYY-MM-DD order
This is a misconception. A DATE (or timestamp) does not have a "format". Once it's stored as a date you can display it in any format you want.
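To illustrate that a stored date carries no format of its own, here is a small Python sketch: one date object rendered three ways, much as to_char() would do on the SQL side. The sample date is arbitrary.

```python
from datetime import date

d = date(2016, 4, 15)  # one stored value...

# ...rendered several ways for display; none of these changes the value itself.
iso = d.isoformat()          # '2016-04-15'
us = d.strftime("%m/%d/%Y")  # '04/15/2016'
eu = d.strftime("%d.%m.%Y")  # '15.04.2016'
```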
My ultimate goal is to be able to search for dates in a format like this:
2016-01-01 is not a valid date literal; without quotes it is parsed as the arithmetic expression 2016 - 01 - 01, i.e. the integer 2014. A proper (i.e. correctly typed) date constant can be specified e.g. using date '2016-01-01' (note the single quotes!).
So your query becomes:
select col_1, col_2
from table
where col_1 between date '2016-01-01' AND date '2016-05-31';
If you have a lot of queries like that you should consider creating an index on the date columns.
Regarding the date constant format:
Are you telling me that despite having the varchar data types, I can still (as of right now) search between specific dates by just typing the word date and putting single quotes between two dates
No, that's not the case. SQL is a strongly typed language and as such will only compare values of the same type.
Using an ANSI date literal (or e.g. to_date()) results in a typed constant (i.e. a value with a specific data type).
The difference between date '2016-01-01' and '2016-01-01' is the same as between 42 (a number) and '42' (a string).
If you compare a string with a date, you are comparing apples and oranges and the database will do an implicit data type conversion from one type to the other. This is something that should be avoided at all costs.
If you do not want to change the table, you should use the query sagi provided, which explicitly converts the strings to dates and then does the comparison on (real) date values (not strings).
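A Python sketch of the apples-and-oranges problem, using hypothetical strings in MM/DD/YYYY form. Python never converts between types implicitly, so the string comparison simply compares characters, which is exactly what goes wrong when dates are stored as text in a non-ISO format.

```python
from datetime import datetime

FMT = "%m/%d/%Y"
a, b = "12/31/2015", "01/01/2016"

# As strings, '1' > '0', so the earlier date compares as "greater".
string_says_a_first = a < b
# As real dates, 2015-12-31 is earlier than 2016-01-01.
date_says_a_first = datetime.strptime(a, FMT) < datetime.strptime(b, FMT)
# Values of different types are simply not equal: 42 is not '42'.
number_equals_string = 42 == "42"
```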
You can use the Postgres TO_DATE() function:
SELECT col_1,col_2
FROM Your_Table
WHERE to_date(date_col,'yyyy-mm-dd') between to_date('2016-01-01','yyyy-mm-dd') and to_date('2016-05-31','yyyy-mm-dd')
What @a_horse said.
Plus, if you can't change the data type for some odd reason, to_date() is a safe option to convert the column on the fly, but there is no point in using the same expression for the provided constants. So:
SELECT col_1, col_2
FROM tbl
WHERE to_date(date, 'YYYY-MM-DD') BETWEEN date '2016-01-01' AND date '2016-05-31';
Or just use string literals without a type; the type date is inferred from the context in this expression. And you don't even need to_date(), since you are already using ISO format. A plain cast is safe:
WHERE date::date BETWEEN '2016-01-01' AND '2016-05-31';
Be sure to use ISO 8601 format for all date strings, so they are unambiguous and valid with any locale.
You can even have an expression index to support the query. Match the actual expression used in queries:
CREATE INDEX tbl_date_idx ON tbl ((date::date)); -- parentheses required!
But I wouldn't use the basic type name date as an identifier to begin with.
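BETWEEN low AND high is defined as x >= low AND x <= high, so if the later date is written first the condition can never be true and the query silently returns no rows. Here is a tiny Python model of that definition; between() is a hypothetical helper, not a library function.

```python
from datetime import date


def between(x, low, high):
    """SQL's `x BETWEEN low AND high` is defined as `x >= low AND x <= high`."""
    return low <= x <= high


d = date(2016, 3, 15)
in_range = between(d, date(2016, 1, 1), date(2016, 5, 31))
reversed_bounds = between(d, date(2016, 5, 31), date(2016, 1, 1))  # never true
```

Postgres also accepts BETWEEN SYMMETRIC, which sorts the two bounds before comparing, but writing the earlier date first is the clearer habit.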
I have dates stored as [27/Feb/2016:00:24:31 +0530].
I want the date in the format 27/Feb/2016 and also want to order by it.
I've tried this solution, which orders properly but returns the date in the form 2016-02-27.
SELECT
TO_DATE( FROM_UNIXTIME( UNIX_TIMESTAMP( SUBSTR( time, 2, 11), 'dd/MMM/yyyy' ))) AS real_date,
url
FROM cleanned_logs
ORDER BY real_date ASC;
To get the desired format I tried the date_format() function. It is not available in 1.0.1, so I switched to 1.2.1.
SELECT
DATE_FORMAT( FROM_UNIXTIME( UNIX_TIMESTAMP( SUBSTR(time,2,11),'dd/MMM/yyyy')), 'dd/MMM/yyyy') AS real_date,
url
FROM cleanned_logs
ORDER BY real_date ASC;
It gives me the desired format but does not order properly.
UPDATED:
SELECT display_date,COUNT(url) FROM
(
SELECT SUBSTR(time,2,11) as display_date,url,UNIX_TIMESTAMP(SUBSTR(time,2,11),'dd/MMM/yyyy') as real_date FROM cleanned_logs order by real_date ASC
)b group by real_date;
This creates a problem with grouping: here Hive expects real_date in the select clause.
I think you're mixing up the formatting or display of data, with the underlying data itself. If the table stores a date as a string formatted in one manner, [27/Feb/2016:00:24:31 +0530] it's still a string, and strings sort differently than actual dates, timestamps, or numbers.
Ideally, you would store the date as a TIMESTAMP datatype. When you want to display it, use DATE_FORMAT, and when you want to sort it, use ORDER BY on the underlying data field. So if your field is of type TIMESTAMP called some_time, you could query as
SELECT DATE_FORMAT(some_time, 'dd/MMM/yyyy')
FROM some_table
WHERE some_condition
ORDER BY some_time DESC
If you're stuck with a string that's stored as a valid timestamp value, then you'll have to do more work, perhaps
SELECT SUBSTR(some_time, 2, 11)
FROM some_table
WHERE some_condition
ORDER BY unix_timestamp(SUBSTR(some_time,2,11), 'dd/MMM/yyyy')
The second option displays the value as desired, and orders by a number -- a unix timestamp is just a number, but it has the same order as the date, so no need to cast that further to an actual date.
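The difference between sorting the display string and sorting the underlying value can be reproduced in Python. The log values below are hypothetical; display_date() imitates SUBSTR(time, 2, 11) (Hive's SUBSTR is 1-based, Python slices are 0-based), and sort_key() parses the result the way unix_timestamp(..., 'dd/MMM/yyyy') does, assuming English month abbreviations.

```python
from datetime import datetime

# Hypothetical raw values in the question's log format.
raw = [
    "[27/Feb/2016:00:24:31 +0530]",
    "[03/Mar/2016:10:00:00 +0530]",
    "[15/Jan/2016:08:00:00 +0530]",
]


def display_date(value):
    """Like SUBSTR(time, 2, 11): skip the '[' and keep 'dd/Mon/yyyy'."""
    return value[1:12]


def sort_key(value):
    """Parse the display part into a real datetime for ordering."""
    return datetime.strptime(display_date(value), "%d/%b/%Y")


# Sorting the strings compares characters: '03' < '15' < '27', months ignored.
by_string = sorted(display_date(v) for v in raw)
# Sorting by the parsed value gives chronological order, displayed as strings.
by_date = [display_date(v) for v in sorted(raw, key=sort_key)]
```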
I have created a table with a column date_time of type varchar2(40), but when I try to insert the current system date and time it doesn't work; it gives an error (too many values). Please tell me what's wrong with the insert statement.
create table HR (type varchar2 (20), raised_by number (6), complaint varchar2 (500), date_time varchar2(40))
insert into HR values ('request',6785,'good morning',sysdate,'YYYY/MM/DD:HH:MI:SSAM')
The immediate cause of the error is that you have too many values, as the message says; that is, more elements in your values clause than there are columns. It is better to explicitly list the column names to avoid future problems and confusion, so you're really doing this:
insert into HR (type, raised_by, complaint, date_time)
values ('request',6785,'good morning',sysdate,'YYYY/MM/DD:HH:MI:SSAM')
... so you have four columns, but five values. You're trying to insert the current date/time as a string, so you would need to use the to_char() function:
insert into HR (type, raised_by, complaint, date_time)
values ('request',6785,'good morning',
to_char(sysdate,'YYYY/MM/DD:HH:MI:SSAM'))
But it is bad practice to store a date (or any other structured data, such as a number) as a string. As the documentation notes:
Each value manipulated by Oracle Database has a data type. The data
type of a value associates a fixed set of properties with the value.
These properties cause Oracle to treat values of one data type
differently from values of another. For example, you can add values of
NUMBER data type, but not values of RAW data type.
If you use a string then you can put invalid values in. If you use a proper DATE data type then you cannot accidentally put an invalid or confusing value in. Oracle will also be able to optimise the use of the column, and will be able to compare values safely and efficiently. Although the format you're using is better than some, using string comparison you still can't easily compare two values to see which is earlier, so you can't properly order by the date_time column for example.
Say you inserted two rows with values 2013/11/15:09:00:00AM and 2013/11/15:08:00:00PM - which is earlier? You need to look at the AM/PM marker to realise the first one is earlier; with a string comparison you'd get it wrong because 8 would be sorted before 9. Using HH24 instead of HH and AM avoids that, but would still be less efficient than a true date.
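The two example values can be checked in Python. FMT_12H is a hypothetical name for the strptime equivalent of 'YYYY/MM/DD:HH:MI:SSAM'; sorting the raw strings reproduces the wrong order, while sorting parsed datetimes (the equivalent of a real DATE column) gives the right one.

```python
from datetime import datetime

FMT_12H = "%Y/%m/%d:%I:%M:%S%p"  # like 'YYYY/MM/DD:HH:MI:SSAM'
morning = "2013/11/15:09:00:00AM"
evening = "2013/11/15:08:00:00PM"

# String order: '08' < '09', so the evening value sorts first (wrong).
strings_sorted = sorted([morning, evening])
# Parsed order: 09:00 comes before 20:00 (right).
dates_sorted = sorted([morning, evening], key=lambda s: datetime.strptime(s, FMT_12H))
# With a 24-hour clock the evening value is '20:00:00', which sorts correctly as text.
hh24_ok = "2013/11/15:09:00:00" < "2013/11/15:20:00:00"
```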
If you need to store a date with a time component you can use the DATE data type, which has precision down to the second; or if you need fractional seconds too then you can use TIMESTAMP. Then your table and insert would be:
create table HR (type varchar2 (20), raised_by number (6),
complaint varchar2 (500), date_time date);
insert into HR (type, raised_by, complaint, date_time)
values ('request',6785,'good morning',sysdate);
You can still get the value in the format you wanted for display purposes as part of a query:
select type, raised_by, complaint,
to_char(date_time, 'YYYY/MM/DD:HH:MI:SSAM') as date_time
from HR
order by date_time;
TYPE RAISED_BY COMPLAINT DATE_TIME
-------------------- ---------- -------------------- ---------------------
request 6785 good morning 2013/11/15:08:44:35AM
Only treat a date as a string for display.
You can use the TO_DATE(), TO_TIMESTAMP() or TO_CHAR() functions, depending on the column type. Because your date_time column is varchar2, TO_CHAR() is the one you need here:
insert into HR values ('request',6785,'good morning',TO_CHAR(sysdate, 'yyyy/mm/dd hh24:mi:ss'))
insert into HR values ('request',6785,'good morning',TO_CHAR(systimestamp, 'yyyy/mm/dd hh24:mi:ss.ff3'))
sysdate - gives the current date with time (to the second).
systimestamp - gives the current date and time with fractional seconds and time zone.
To_date() - converts a string to a date.
To_char() - converts a date to a string.
There is no need to pass sysdate or systimestamp to TO_DATE() or TO_TIMESTAMP(): they are already datetime values, and converting them again only goes through an implicit, NLS-dependent string conversion.
Use the TIMESTAMP data type for date_time, and insert the current timestamp.
create table HR (type varchar2(20), raised_by number(6), complaint varchar2(500), date_time timestamp);
insert into HR values ('request',6785,'good morning', systimestamp);
For other options: http://psoug.org/reference/timestamp.html