Hive date based partitions


I have data in the following form on HDFS:
/basedir/yyyymmdd/fixedname/files
Here yyyymmdd is the date folder and files is the list of files added to that directory. I need a Hive table that picks up data from the yyyymmdd/fixedname directory. This should also work when I add a new date. For example, all files added on 5th March 2013 go to the 20130305/fixedname folder, and all files added on 6th March 2013 go to the 20130306/fixedname folder.
How do I alter a Hive table so that it picks up data from the changing date folder but the fixed folder within it?

Do you have a partitioned table? Suppose you already have a table partitioned by the column date and you want to add new data. In this case, you have to put the data in a new directory and tell the Hive table (specifically, the metastore) that it has a new partition, using the ALTER TABLE ... ADD PARTITION command.
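For the layout in the question (/basedir/yyyymmdd/fixedname), which is not in Hive's key=value directory form, one option is to register each day's folder explicitly with a LOCATION clause. The Python sketch below only builds the HiveQL text for one day; the table name `events` and the string partition column `dt` are hypothetical, and actually running the statement (for example through the Hive CLI or Beeline once per day) is left out.

```python
from datetime import date

def add_partition_ddl(table: str, day: date, base_dir: str = "/basedir") -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement for one day's folder.

    Assumes (hypothetically) a table partitioned by a string column `dt`,
    with each day's files stored in <base_dir>/<yyyymmdd>/fixedname.
    """
    stamp = day.strftime("%Y%m%d")  # e.g. 20130305
    location = f"{base_dir}/{stamp}/fixedname"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{stamp}') LOCATION '{location}';"
    )

print(add_partition_ddl("events", date(2013, 3, 5)))
# ALTER TABLE events ADD IF NOT EXISTS PARTITION (dt='20130305') LOCATION '/basedir/20130305/fixedname';
```

IF NOT EXISTS makes the statement safe to re-run if the daily job fires twice for the same date.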
Now suppose you have not created any table yet. In this case, you have to create a partitioned table and then insert the data into it with queries. The magic happens when you set these two flags:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
These flags enable dynamic partitions: Hive takes the partition value from the data being inserted, and nonstrict mode allows every partition column to be dynamic.
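Conceptually, a dynamic-partition insert reads the partition value from each selected row (the partition column must come last in the SELECT list) and writes the row under a directory named after that value. The toy Python model below mimics only that routing, in memory; it is not how Hive is implemented, and the row layout is made up.

```python
from collections import defaultdict

def route_rows(rows, partition_col="date"):
    """Group rows by partition value, keyed by a Hive-style directory name.

    The partition column is dropped from the stored row because Hive encodes
    it in the directory name rather than in the data files.
    """
    partitions = defaultdict(list)
    for row in rows:
        value = row[partition_col]
        data = {k: v for k, v in row.items() if k != partition_col}
        partitions[f"{partition_col}={value}"].append(data)
    return dict(partitions)

rows = [
    {"id": 1, "date": "20130305"},
    {"id": 2, "date": "20130306"},
    {"id": 3, "date": "20130305"},
]
print(route_rows(rows))
# {'date=20130305': [{'id': 1}, {'id': 3}], 'date=20130306': [{'id': 2}]}
```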
Remember that you will have directories like:
/date=YYYYMMDD/fixedname/files
so you have to tell Hive to read the data in the subdirectories recursively, by setting the following flag:
SET mapred.input.dir.recursive=true;
Finally, you will be able to query by date and get all the data in the subdirectories of the date you specified in the query (/date=YYYYMMDD/...).
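To picture what the query sees, the Python sketch below works on a plain list of made-up file paths: the date=... segment decides which partition a filter such as WHERE date='20130305' touches, and the recursive setting decides whether files below fixedname/ count. It is a simplified model of the idea behind mapred.input.dir.recursive, not Hive's actual input logic.

```python
def files_for_partition(paths, column, value, recursive=True):
    """Return the files a query filtered on `column = value` would read.

    A path belongs to the partition if it contains the segment
    "<column>=<value>". With recursive=False, only files sitting directly
    inside that directory are returned (a simplified model).
    """
    segment = f"{column}={value}"
    selected = []
    for path in paths:
        parts = path.strip("/").split("/")
        if segment not in parts:
            continue  # different partition: pruned
        depth_below = len(parts) - parts.index(segment) - 1
        if recursive or depth_below == 1:
            selected.append(path)
    return selected

paths = [
    "/warehouse/t/date=20130305/fixedname/a.txt",
    "/warehouse/t/date=20130305/fixedname/b.txt",
    "/warehouse/t/date=20130306/fixedname/c.txt",
]
print(files_for_partition(paths, "date", "20130305"))
# ['/warehouse/t/date=20130305/fixedname/a.txt', '/warehouse/t/date=20130305/fixedname/b.txt']
print(files_for_partition(paths, "date", "20130305", recursive=False))
# []
```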
Hope this helps you.

Related

Data Factory - Can I use the date field in a CSV to determine the destination folder in Copy Activity

I have some CSV files that I want to copy to a specific folder in ADLS based on the date column within the file.
For example, the CSV file has a column named "date" that reads "2022-02-23" on all rows. I want to copy that file to a folder with the corresponding year and month, such as "/curated/UK/ProjectABC/2022/02".
I've got a Lookup activity that points to the source CSV file and populates a Set Variable activity with the month, using this dynamic content: @substring(string(activity('Lookup1').output.firstRow.date),5,2)
Would this be the right approach, to use a variable?
As far as I know, I can't use variables in the Directory portion of the Sink Dataset.
Have you come across this situation before?

Sounds like you're on the right path. You can absolutely use dataset parameters in the sink dataset's directory path.
Then populate them in your pipeline using a variable (or a parameter, or an expression).
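The expression in the question takes two characters starting at 0-based index 5 of 2022-02-23, i.e. the month; the year is the first four characters. The Python sketch below mirrors that string arithmetic so the indices can be checked before they go into a dataset parameter. The function name is hypothetical and this is not ADF syntax.

```python
def sink_folder(date_value: str, prefix: str = "/curated/UK/ProjectABC") -> str:
    """Build <prefix>/<yyyy>/<MM> from a yyyy-MM-dd string."""
    year = date_value[0:4]   # like substring(date, 0, 4)
    month = date_value[5:7]  # like substring(date, 5, 2)
    return f"{prefix}/{year}/{month}"

print(sink_folder("2022-02-23"))  # /curated/UK/ProjectABC/2022/02
```

In the pipeline, the folder parameter could then be filled with something like `@concat('/curated/UK/ProjectABC/', substring(string(activity('Lookup1').output.firstRow.date),0,4), '/', substring(string(activity('Lookup1').output.firstRow.date),5,2))`; treat that as an untested sketch.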

POWER QUERY APPEND date is missing

I have two tables with similar columns.
I applied Append, and all the details were fine except for the date in the new table.
The old data's dates are present, but the new ones are missing and shown as "null".
I checked their formats, and both are the same.
Does anyone know what the issue is?
Below is a screenshot for reference.

It looks like the new table has a different column name for the date.
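Append (Table.Combine in M) matches columns by name, not by position, so a date column called Date in one table and, say, "Date " with a trailing space in the other becomes two separate columns, each null for the other table's rows. The Python sketch below imitates that name-based union with made-up column names and values; it is only a model of the behaviour, not Power Query itself.

```python
def append_by_name(*tables):
    """Union rows by column name; columns missing from a row become None (null)."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for table in tables for row in table]

old = [{"Item": "A", "Date": "2023-01-05"}]
new = [{"Item": "B", "Date ": "2023-02-07"}]  # note the trailing space in the name
for row in append_by_name(old, new):
    print(row)
# {'Item': 'A', 'Date': '2023-01-05', 'Date ': None}
# {'Item': 'B', 'Date': None, 'Date ': '2023-02-07'}
```

Renaming the column in the new table (for example with Table.RenameColumns) before the append makes the dates land in the same column.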

Date in table is dd.mm.yyyy - Can't import to postgres via csv

I'm trying to import a .csv into a table in my database.
All dates in the .csv are in the format dd.mm.yyyy (e.g. 18.10.2017).
I'm importing via pgAdmin and always get an invalid input error.
I've tried almost all the date formatting options for the column, without any luck.
I would rather not change the CSV manually.
Can anyone help me with this?

I almost always import data into a staging table where all the columns are strings.
Then I use queries to load the final table.
This has several advantages:
It gives me much more control over how the data is transformed.
It makes it easier to debug problems -- the entire staging table can be queried to find all rows with a particular issue (for instance).
Additional validations can be performed before loading into the final table.
This is just a suggestion, but you might find that overall this takes less time.
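The same staging idea can be sketched outside the database: read every field as text, convert the date column in a separate step, and collect failing rows instead of aborting the load. The Python sketch below does this for dd.mm.yyyy values and emits ISO yyyy-mm-dd strings, which PostgreSQL reads the same way regardless of DateStyle; the column name `date` is an assumption.

```python
import csv
import io
from datetime import datetime

def convert_dates(csv_text, column="date"):
    """Parse dd.mm.yyyy values in `column`; return (good_rows, bad_rows).

    Every field stays a string, as in a staging table; only the date
    column is rewritten, as ISO yyyy-mm-dd.
    """
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            parsed = datetime.strptime(row[column], "%d.%m.%Y").date()
        except ValueError:
            bad.append(dict(row))  # keep the raw row for inspection
            continue
        good.append({**row, column: parsed.isoformat()})
    return good, bad

sample = "id,date\n1,18.10.2017\n2,31.02.2017\n"
good, bad = convert_dates(sample)
print(good)  # [{'id': '1', 'date': '2017-10-18'}]
print(bad)   # [{'id': '2', 'date': '31.02.2017'}]
```

Inside PostgreSQL, the equivalent step when loading from a text staging column would typically use to_date(date_text, 'DD.MM.YYYY').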

The DateStyle setting is probably set to MDY. You can check this by running:
show datestyle;
Although dd.mm.yyyy isn't listed as a standard input format, if you expect it to work, you will need DateStyle to match the field order used here (DMY).
The date/time style can be selected by the user using the SET datestyle command, the DateStyle parameter in the postgresql.conf configuration file, or the PGDATESTYLE environment variable on the server or client.
See section "Date Order Conventions":
https://www.postgresql.org/docs/current/static/datatype-datetime.html
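To see why the field order matters, the Python sketch below reads the same dotted string under the DMY and MDY orderings that DateStyle switches between. It only models the ambiguity; on the PostgreSQL side the corresponding change would be something like SET datestyle = 'ISO, DMY'; in the session that runs the import.

```python
from datetime import datetime

def read_dotted(value, order):
    """Interpret a dotted date under a DateStyle-like field order ('DMY' or 'MDY')."""
    formats = {"DMY": "%d.%m.%Y", "MDY": "%m.%d.%Y"}
    return datetime.strptime(value, formats[order]).date()

print(read_dotted("01.02.2017", "DMY"))  # 2017-02-01
print(read_dotted("01.02.2017", "MDY"))  # 2017-01-02
try:
    read_dotted("18.10.2017", "MDY")  # there is no month 18
except ValueError as err:
    print("MDY rejects it:", err)
```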

Excel 2010 - Pivot using external csv file - how to make dates work?

I have a set of pivot tables that use external csv files as their data sources. The csv files originally contained dates in the format dd/mm/yy (e.g. 31/01/13). The pivot tables did not recognise these as dates. I converted the dates in the csv files to dd/mm/yyyy (e.g. 31/01/2013) but these were still not recognised as dates by the pivot tables.
I tried setting up a calculated field =DATEVALUE(date_from_csv) but when used in the pivot table (I'm using the Max option to select the most recent date) I get #VALUE! errors.
I have tried converting the csv file to xlsx and also importing the data into the workbook that contains the pivot table - but I can't change from the external connection to use the internal data. I don't want to rebuild the pivots as there are a lot of variables and formatting that would take ages to redo.
Any ideas?

The problem was caused by the date column being blank in some rows. I found that if I moved a row with all fields filled in to the top (right after the header line), Excel detected the formats correctly, and the pivot tables now work!

Filemaker Pro 11 Script - Add fields dynamically?

We use FMP11 for inventory management. I update our product prices three times a week, and it would be nice to store our past cost values in a separate table for historical pricing. I know how to do most of it, but is it possible to create a new field, labeled with today's date, on the fly? My headers would then be labeled with that day's date, and the old pricing value from my other fields would be inserted.

It is a bad idea to create new fields for the purpose you're describing. Create additional records instead, and do your report going from top to bottom instead of left to right.
That said, if you want to do it, you can, using FileMaker Server Advanced with JDBC and the ALTER TABLE command.
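The "records, not fields" advice is the usual long-versus-wide trade-off: store one row per item per price change, and turn dates into columns only when a report needs them. A minimal Python sketch with made-up item IDs and prices:

```python
from collections import defaultdict

# Long format: one record per (item, change date), as the answer recommends.
history = [
    {"ItemID": 101, "ChangeDate": "2013-03-04", "Price": 9.50},
    {"ItemID": 101, "ChangeDate": "2013-03-06", "Price": 9.75},
    {"ItemID": 102, "ChangeDate": "2013-03-06", "Price": 4.20},
]

def pivot_by_date(rows):
    """Build a wide view (one column per date) only for reporting."""
    table = defaultdict(dict)
    for r in rows:
        table[r["ItemID"]][r["ChangeDate"]] = r["Price"]
    return dict(table)

print(pivot_by_date(history))
# {101: {'2013-03-04': 9.5, '2013-03-06': 9.75}, 102: {'2013-03-06': 4.2}}
```

A new price day only adds records to `history`; the schema never changes, which is the point of the advice.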

Create a new table (e.g. ArchivePricing) to hold the values you want to reference at a later date (e.g. ChangeDate, Price, Item, ItemID, etc.).
Create a new field in the current table called z|newprice, and use it to type in your new pricing (you might do this on a list layout so you can easily change a lot of prices).
Create a button that triggers a script that:
creates a new record in the ArchivePricing table and inserts the ItemID, thus creating a link to the original table (this can be done using script parameters or by setting a variable); the script then
uses the "Set Field" script step to insert the info into this new record in the ArchivePricing table, and
uses the Get ( CurrentDate ) function to insert the date into the ChangeDate field, thus capturing the date the change was made.
Before the script finishes, be sure to use "Set Field" back in the original table to move the value in the z|newprice field into your normal Price field. Do this at the end of the script, and then commit the record.
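The script steps above amount to "copy the current price into a history record, then overwrite it". The Python sketch below models that order of operations on plain dictionaries; the names ArchivePricing, ChangeDate, ItemID and z|newprice follow the answer, but the data structures only stand in for FileMaker tables and this is not FileMaker script syntax. Archiving the outgoing price and clearing z|newprice afterwards are assumptions about the intended workflow.

```python
from datetime import date

def apply_new_price(item, archive, today=None):
    """Archive the current price, then move z|newprice into Price.

    `item` stands in for a record in the current table and `archive` for
    the ArchivePricing table (a list of dicts).
    """
    if today is None:
        today = date.today()  # plays the role of Get ( CurrentDate )
    archive.append({
        "ItemID": item["ItemID"],  # link back to the original record
        "Price": item["Price"],    # the value being replaced (assumption)
        "ChangeDate": today.isoformat(),
    })
    item["Price"] = item["z|newprice"]  # final Set Field step
    item["z|newprice"] = None           # clear the entry field (assumption)
    return item

archive = []
item = {"ItemID": 101, "Price": 9.50, "z|newprice": 9.75}
apply_new_price(item, archive, today=date(2013, 3, 6))
print(item["Price"], archive)
# 9.75 [{'ItemID': 101, 'Price': 9.5, 'ChangeDate': '2013-03-06'}]
```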
