Newbie to HQL - Date conversions in Hive - extract Year from free flow text varchar100 - date

I have a simple HQL statement that works. I want to count the occurrences by year or by month. The data quality is poor: the incorporationdate column is stored as a varchar(100) and contains free-form text and nulls.
So I cannot simply use SUBSTRING or YEAR; I need to extract the year or month only from values in the format mm/dd/yyyy. Ideally I would like to create a view with two new columns, one showing the year and one showing the month.
select
incorporationdate, count(incorporationdate) from default.chjp2
group by companynumber,incorporationdate

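You could use if() to test any condition you want before calling substr(). For example, to return the year only when the value matches mm/dd/yyyy:
select if(incorporationdate is not null and incorporationdate rlike '^[0-9]{2}/[0-9]{2}/[0-9]{4}$', substr(incorporationdate, 7, 4), null) as incorporation_year from default.chjp2
The month can be taken the same way with substr(incorporationdate, 1, 2).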
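The same guard-then-extract idea can be sketched outside Hive. The following Python is only an illustration of the logic, not HQL; the function name extract_year_month and the sample values are made up for the example.

```python
import re
from typing import Optional, Tuple

# Accepts exactly mm/dd/yyyy; free text, other formats and None are rejected.
DATE_PATTERN = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def extract_year_month(value: Optional[str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (year, month) for mm/dd/yyyy strings, (None, None) otherwise."""
    if value is None:
        return (None, None)
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return (None, None)
    month, _day, year = match.groups()
    return (int(year), int(month))


if __name__ == "__main__":
    # Hypothetical sample of the kind of values described in the question.
    for raw in ["03/26/2020", "Incorporated in 1999", None, "2020-03-26"]:
        print(repr(raw), extract_year_month(raw))
```

Rows that fail the pattern come back as nulls rather than raising, which mirrors what the if() expression does and keeps them out of a count grouped by year or month.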

Related

How can I compare a table column after converting it first in PostgreSQL + TypeORM

I am a beginner in backend technology and I am developing a query with the TypeORM QueryBuilder and PostgreSQL.
My query looks like this:
But I can't convert my timestamp into the following format. Can anyone with expertise in this please help me select all records with the timestamp in a specific format?
I have been struggling with this for the last two days and still have not found a solution. I actually want to compare this date format in a HAVING clause, so I first want to select the date in that format; then I can use the same method to compare the column with the current date and previous dates.

Is there a way to pull just the year out of a VARCHAR datetime value?

I am working on a project, in Snowflake, that requires me to combine pest & weather data tables, but the opposing tables do not share a common column. My solution has been to create a view that extracts the year from the Pest Table dates, format ex.
CREATION_DATE: 03/26/2020 09:11:15 PM,
to match the YEAR column in the Weather tables, format ex.
DATEYEAR: 2021.
However, I have come to find that the dates in the pest report are VARCHAR as opposed to traditional date/datetime values. Is there a way to pull just the year out of the VARCHAR date value? Additional information: I cannot change the tables themselves; I will need to create a view that preserves all other columns and adds a new "DATEYEAR" column.
Yes, you can. Below is a working example:
create table test (dt string );
insert into test(dt) values ('01/04/2022');
select dt, date_part(year, dt::date) as dateyear from test;
To make it easy, you can split the string into an array and take the third member of the array (using 2 since arrays are 0 based):
select strtok_to_array('03/26/2020', '/')[2]::int as MY_YEAR;
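Note that the values in the question also carry a time part (03/26/2020 09:11:15 PM), so the third token after splitting on '/' would be "2020 09:11:15 PM" rather than a bare year. As a rough sketch of parsing the full value, here is the same extraction in Python rather than Snowflake SQL; the format string is an assumption based on the single example in the question, and year_from_creation_date is a made-up helper name.

```python
from datetime import datetime
from typing import Optional

# Assumed layout of CREATION_DATE, inferred from "03/26/2020 09:11:15 PM".
PEST_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def year_from_creation_date(text: Optional[str]) -> Optional[int]:
    """Parse the VARCHAR timestamp and return its year, or None if it does not parse."""
    if text is None:
        return None
    try:
        return datetime.strptime(text.strip(), PEST_FORMAT).year
    except ValueError:
        return None


if __name__ == "__main__":
    print(year_from_creation_date("03/26/2020 09:11:15 PM"))
    print(year_from_creation_date("not a date"))
```

Parsing the whole value instead of slicing it means malformed rows surface as nulls instead of producing a wrong year.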

Azure Data Factory - Date Expression in component 'derived column' for last 7 Days

I am very new to Azure Data Factory. I have created a simple Pipeline using the same source and target table. The pipeline is supposed to take the date column from the source table, apply an expression to the column date (datatype date as shown in the schema below) in the source table, and it is supposed to either load 1 if the date is within the last 7 days or 0 otherwise in the column last_7_days (as in schema).
The schema for both source and target tables look like this:
Now, I am struggling to write an expression in the DerivedColumn component. I have managed to find the date 7 days ago with the expression: .
In summary, the idea is to load the last_7_days column in the target table with the value '1' if date >= current date - interval 7 day and date <= current date, like in SQL. I would be very grateful if anyone could help me with tips and suggestions. If you require further information, please let me know.
Just for more information: the date column in the source/target table is static, with 10 years of dates from 2020 to 2030 in yyyy-mm-dd format. The ETL should run every day and only put the value 1 in the last_7_days column for the last 7 days, looking back from the current date. All other entries must receive the value 0.
You currently use the expression below:
case(date == currentDate(), 1, date >= subDays(currentDate(), 7), 1, date < subDays(currentDate(), 7), 0, date > currentDate(), 0)
In your place, we would also choose the case() function to build the expression.
As for your question in the comment: I'm afraid there isn't a more elegant way. Data Flow expressions can get complex and may combine many functions, so case() is the best option for you.
It is clear and easy to understand.
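For reference, the rule stated in the question (1 when the date falls between seven days ago and today inclusive, 0 otherwise) can be written as a single range check. The sketch below is Python, not Data Flow expression language, and last_7_days_flag is a hypothetical name. It also makes one detail visible: a future date must yield 0, so in a chained case() the "date > currentDate()" test has to be evaluated before the "date >= subDays(currentDate(), 7)" test.

```python
from datetime import date, timedelta


def last_7_days_flag(row_date: date, today: date) -> int:
    """Return 1 if row_date lies in [today - 7 days, today], else 0."""
    window_start = today - timedelta(days=7)
    return 1 if window_start <= row_date <= today else 0


if __name__ == "__main__":
    today = date(2024, 5, 15)  # fixed date so the example is reproducible
    for d in [date(2024, 5, 15), date(2024, 5, 8), date(2024, 5, 7), date(2024, 5, 16)]:
        print(d.isoformat(), last_7_days_flag(d, today))
```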

How to merge two data streams in Alteryx

Table 1 is a Google Sheets file. It has x fields with a primary key.
Every day, a column for that weekday is added to the table with the x data.
For example:
Monday
Tuesday (is added on Tuesday) and so on.
My problem is that my workflow has a formula that does calculations with all the Weekdays.
Example:
Balance = All_Income - Monday - Tuesday - Wednesday - Thursday - Friday - Saturday - Sunday
But today for example, in the google sheet data I don't have the other weekdays except Monday and Tuesday, so I get the error "Unknown Variable" for Thursday.
I've inserted a Text Input and added all the weekdays.
I want to (Append maybe) these two data streams together so that I have all the weekdays there.
So if I run the calculations I have all the weekdays there.
Right now that formula works only on Sunday, when all weekdays are inserted as columns.
Any idea how to achieve this?
(p.s Creating the weekdays as columns in the google sheet with empty rows is not an option).
I managed to do it by creating a Text Input with the same column names (headers) as the other Data Source and performing a union.
Apparently I needed to use an IF statement to check whether each weekday exists and replace the null values with.
If anyone encounters the same error, feel free to contact for help :)
Use the Transpose tool to verticalise the days of the week. Then Summarize using the primary key and sum the [value] field. That will give you the balance regardless of which days of the week are present in your worksheet. This technique applies to any problem in which one needs to aggregate multiple fields which may or may not be present or known.
Here is the simplest path to victory:
1. Input worksheet.
2. Connect Transpose tool.
3. In Transpose Key Columns, select only primary key.
4. In Transpose Data Columns, deselect all fields except for days of week and Dynamic or Unknown Columns. This will still work even if the worksheet doesn't have all the days of the week because as they come in, the Dynamic or Unknown Columns option will select them as Data columns.
5. In Transpose Missing Columns, select Ignore.
6. Connect Summarize tool.
7. In Summarize, group by primary key and sum on [Value] field.
From here, you can rename the sum_value field to Balance or something else friendly. You can also use a Join tool, joining on primary keys, to the original worksheet to get back to where you started with the new aggregated value.
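The transpose-then-summarize approach can be illustrated outside Alteryx. The Python below is only a sketch of the data flow, not Alteryx itself; the column names id and All_Income and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

KEY = "id"             # assumed primary-key column name
INCOME = "All_Income"  # assumed income column name

Row = Dict[str, float]


def transpose(rows: List[Row]) -> Iterator[Tuple[float, str, float]]:
    """Turn each weekday column that happens to exist into a (key, name, value) record."""
    for row in rows:
        for name, value in row.items():
            if name not in (KEY, INCOME):
                yield (row[KEY], name, value)


def summarize(records: Iterator[Tuple[float, str, float]]) -> Dict[float, float]:
    """Group by key and sum the value field."""
    totals: Dict[float, float] = defaultdict(float)
    for key, _name, value in records:
        totals[key] += value
    return dict(totals)


def balances(rows: List[Row]) -> Dict[float, float]:
    """Join the summed weekday values back to the rows and subtract them from the income."""
    spent = summarize(transpose(rows))
    return {row[KEY]: row[INCOME] - spent.get(row[KEY], 0.0) for row in rows}


if __name__ == "__main__":
    # Only Monday and Tuesday exist so far; no other weekday column is needed.
    sheet = [
        {"id": 1, "All_Income": 100.0, "Monday": 10.0, "Tuesday": 5.0},
        {"id": 2, "All_Income": 50.0, "Monday": 20.0, "Tuesday": 0.0},
    ]
    print(balances(sheet))
```

Because the weekday names are never referenced directly, the calculation does not fail when a column is missing, which is the point of the Dynamic or Unknown Columns option.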

sqlite3: retrieving data based on month and year from local database in iphone

In my application I have to store data by month and by year, so I need to store each record in the database together with its date.
What I need to know is how to store the date and how to retrieve the data grouped by month and year. In my app I show a table of years and months, and based on the selected month and year I have to show the data in a dashboard.
My problem is in storing and retrieving date data types.
Use the following syntax:
SELECT * FROM DATABASE WHERE REQUIREDDATEFIELD LIKE '%2011-01%';
2011 is the year.
01 is the month.
DATABASE stands for your table name.
REQUIREDDATEFIELD is the field you want to filter by month and year.
LIKE '%2011-01%' matches all records containing 2011-01 anywhere in the given field: at the beginning, at the end, or in the middle of a longer text. That is why the pattern has % at both ends.
You can select for a specific month, a specific year, or both. If you want all of them, use GROUP BY.
I know this answer is quite vague and generic, but that's because your question is vague. You probably need to be more specific. Explain not only what you want to do, but what you have tried, and in which way that didn't work.
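As one possible way to make this concrete, the sketch below stores dates as ISO-8601 text (yyyy-mm-dd), which SQLite's built-in strftime() function understands, and then filters and groups by month and year. It uses Python's sqlite3 module purely for demonstration; the table entries and its columns are hypothetical, and the same SQL should work through the sqlite3 C API on the iPhone.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'entries' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (entry_date TEXT NOT NULL, amount REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO entries (entry_date, amount) VALUES (?, ?)",
        [("2011-01-05", 10.0), ("2011-01-20", 5.0), ("2011-02-01", 7.5)],
    )
    return conn


def rows_for_month(conn: sqlite3.Connection, year_month: str) -> List[Tuple[str, float]]:
    """Return all rows whose date falls in the given 'YYYY-MM' month."""
    return conn.execute(
        "SELECT entry_date, amount FROM entries "
        "WHERE strftime('%Y-%m', entry_date) = ? ORDER BY entry_date",
        (year_month,),
    ).fetchall()


def totals_by_month(conn: sqlite3.Connection) -> List[Tuple[str, str, float]]:
    """Return (year, month, total amount) for every month that has data."""
    return conn.execute(
        "SELECT strftime('%Y', entry_date) AS year, "
        "       strftime('%m', entry_date) AS month, "
        "       SUM(amount) "
        "FROM entries GROUP BY year, month ORDER BY year, month"
    ).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(rows_for_month(db, "2011-01"))
    print(totals_by_month(db))
```

Storing the date in a fixed yyyy-mm-dd layout is what makes both the LIKE filter and the strftime() grouping reliable.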