How to enter estimated dates in a PostgreSQL timestamp field? - postgresql

An existing database uses a timestamp field, but sometimes the user only knows the year and month of the event. How should the data be stored when only part of the timestamp (say, the year and month, or the date but not the time) is known? Keep in mind this is for appointment data, e.g. a "time of appointment" field, but sometimes it's historical in nature, like "saw my lawyer, I think it was June 2019".
I'm really looking for a general, language-independent solution, likely focusing on relatively standard DB data structures and field types for working around this limitation of a timestamp field; it seems a common enough situation that I've named it "an estimated timestamp".
It appears a timestamp field doesn't allow a date such as "2019-07-00T00:00", which would be the ideal solution if it did, as it: a) maintains natural sort order and b) provides a clear indicator that it's an estimated date without the need to add a separate estimated T/F field.
What solutions have you come up with for such a situation, with the understanding that this DB data is accessed by many web-based front ends?
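One common workaround (a minimal sketch only; the table and column names below are invented for illustration) is to store the timestamp normalized to the start of the known period and record how precise it actually is in a companion column, which keeps the natural sort order and tells the front ends how much of the value to display:

    -- Store the timestamp rounded down to the start of the known period,
    -- plus a flag describing how precise the user's knowledge is.
    CREATE TABLE appointment (
        id              bigserial PRIMARY KEY,
        occurred_at     timestamp NOT NULL,
        time_precision  text NOT NULL DEFAULT 'exact'
            CHECK (time_precision IN ('exact', 'day', 'month', 'year'))
    );

    -- "Saw my lawyer, I think it was June 2019"
    INSERT INTO appointment (occurred_at, time_precision)
    VALUES ('2019-06-01 00:00', 'month');

    -- Sorting still works naturally; the precision column marks estimates.
    SELECT occurred_at, time_precision
    FROM appointment
    ORDER BY occurred_at;

A range type such as tsrange is another option if you want the estimate to cover an explicit interval rather than a single normalized point.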

Related

What is the "Tableau" way to deal with changing data?

As a background to this question: I've been using Tableau for some time now, but I've been using code (Python, Swift, etc) as a crutch for getting some of the more complicated things done. My employer is now making me move what I can away from custom code and into retail software packages because it will make things easier to maintain if I get hit by a bus or something.
The scenario: With code, I find it very easy to deal with constantly changing/growing data by using recursion. I know that this isn't something I can do with Tableau, but I've found that for many problems so far there is a "Tableau way" of thinking/doing that can solve a lot of problems. And, I'm not allowed to use Rserve/TabPy.
I have a batch of transactional data that grows every month by about 1.6 million records. What I would like to do is build something in Tableau that lets me track a complicated rolling total across the data without having to do it manually. In my code of choice, it would have been something like:
Import the data into a frame
For every unique date value in the 'transaction date' field, create a new column with that name
Total the number of transactions in each account for that day
Write the data to the applicable column
Move on to the next day
Then create new columns that store the sum total of transactions for that account over all of the 30-day periods available (date through date + 29 days)
Select the max value of the accounts for a customer for those 30-day sums
Dump all of that 30-day data into a new table based on the customer identifier
It's a lot of steps, but with a couple of nice recursive functions, it's done in a snap with a bit of code. Plus, it can handle the data as it changes.
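For readers trying to follow what those steps actually compute, here is a rough equivalent as a SQL sketch (the transactions table and its columns are invented, and it assumes a database whose window frames accept date offsets, e.g. PostgreSQL 11 or later); in Tableau itself the usual tools for this sort of thing are LOD expressions and table calculations rather than code:

    -- Count transactions per account per day, sum each forward-looking
    -- 30-day window (date through date + 29 days), then take the max per customer.
    WITH daily AS (
        SELECT account_id,
               customer_id,
               transaction_date,
               COUNT(*) AS txn_count
        FROM transactions
        GROUP BY account_id, customer_id, transaction_date
    ),
    rolling AS (
        SELECT account_id,
               customer_id,
               transaction_date,
               SUM(txn_count) OVER (
                   PARTITION BY account_id
                   ORDER BY transaction_date
                   RANGE BETWEEN CURRENT ROW AND INTERVAL '29 days' FOLLOWING
               ) AS txns_30d
        FROM daily
    )
    SELECT customer_id, MAX(txns_30d) AS max_30d_txns
    FROM rolling
    GROUP BY customer_id;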
The actual question: How in the world do I approach problems like this within Tableau since my brain goes straight to recursive function land? I can do this manually with Tableau Prep, but it takes manual tweaking every time the data changes. Is there a better way, or is this just not within the realm of what Tableau really does?

MongoDB - Storing date without timezone

We have a simple application in which all users are in the same timezone, and therefore we are not interested in storing timezone information in the Mongo date object.
The reason for such an extreme step is that we have multiple microservices, managed by different developers, using a common database. Each of them has to explicitly set timezone-related options in its queries, and forgetting to do so results in an invalid dataset.
Currently, MongoDB (see Mongo Data Types) doesn't support storing dates without a timezone.
I'm just eager to know whether there is any alternative way to represent a date without a timezone in Mongo that would still let us take advantage of MongoDB's date-based query syntax, such as date ranges. At the same time it should remain convenient for DBAs to read and manage records.
Look at this answer: https://stackoverflow.com/a/6776273/6105830
You can use one of two long (integer) representations: milliseconds since the epoch, or the format yyyyMMddHHmmss. These are the only ways to avoid storing a timezone and still be able to make range queries.
Unfortunately you lose some aggregation capabilities. But you can do something like keeping two representations and using them at opportune times.
UPDATE:
Do not store dates the way I suggested above. You will lose many MongoDB features, and it becomes hard to apply the major date operators to those fields.
Newer versions of MongoDB have operators to deal with timezones, and those should be enough to work with ISO time formats. My application was using my own suggestion for storing dates. Now I have to let my users select their time zone (the company has grown and we need to expand to other countries), and we are struggling to change all the models to use timestamps instead of the normalized date format.
To explore further, see: https://docs.mongodb.com/manual/reference/method/Date/
You can also ask questions on MongoDB's official community forums: https://developer.mongodb.com/community/forums/
You could consider storing all the dates in UTC and presenting them to the users in UTC, so you don't get silent conversion (and the resulting confusion) from client-side JavaScript, the server, or MongoDB.
You can do it like this: new Date(Date.UTC(2000, 01, 28))
Here's the MDN link on the subject.

Update timezone columns in a Postgres DB

I started creating a product database using timestamp without time zone. Then, realizing my error, I started using timestamp with time zone. Now I'd like to unify everything to the latter.
Question: Is it possible in an existing Postgres 8.4 DB already containing data to convert all the columns of type timestamp without TZ to ones with TZ?
The best solution would be a script that does this in one execution (of course), though even a script that fixes a single column at a time would be great. The problem is that naïvely ALTERing the column fails on some existing VIEWs that use it in their output (though I fail to see why that's a problem in this case - it just widens the output type a bit).
You want ALTER TABLE ... ALTER COLUMN ... TYPE ... USING (...) which does what you would expect. You will need to decide what timezone these times are in, and supply the suitable AT TIME ZONE expression for your USING clause.
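For example (a sketch with an invented table and column name; the key decision is which zone goes in the USING clause, here assumed to be UTC):

    -- Convert one column; repeat per affected column.  The AT TIME ZONE
    -- expression tells Postgres how to interpret the existing naive values.
    ALTER TABLE my_table
        ALTER COLUMN created_at
        TYPE timestamp with time zone
        USING created_at AT TIME ZONE 'UTC';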
These ALTERs will rewrite each table, so allow time for that. You may want to CLUSTER them afterwards.
However, you seem to think that the two types are interchangeable. They are not. That is why you need to drop and rebuild your views. Also you will want to rewrite any applications appropriately too.
If you can't see why they are different, make a good hot cup of tea or coffee, sit down, and spend an hour or so reading the time & date sections of the manuals thoroughly. Perhaps some of the Q&As here too. This is not necessarily a minor change. I'd be especially wary of any daylight-saving / summer-time shifts in whatever time zone(s) you decide apply.

User behaviors analysis, stackoverflow public data dump

I have a question - what would be the best way to figure out which timezone a particular user is in, based on the data in the location field? It seems a considerable number of users have this field populated with some data; the format, however, is far from normalized.
While I am figuring out ways to normalize users' locations and infer timezones, I wonder if someone has done this before and could share some experience, or maybe (ideally) there is some magic web service I can ask for the timezone of a given location?
So far I am running through a fairly simple process - tokenizing the field, sorting, grouping by frequency, and assigning timezones manually based on my best knowledge.

Calculating price drops and apps going free - App Store

I am working on a website which displays all the apps from the App Store. I get the App Store data from their EPF Data Feeds through the EPF Importer. In that database I get the pricing of each app for every store. There are a huge number of rows in that data set, whose table structure is like this:
application_price - The retail price of an application.

Name            Key   Description
export_date           The date this application was exported, in milliseconds since the UNIX Epoch.
application_id  Y     Foreign key to the application table.
retail_price          Retail price of the application, or null if the application is not available.
currency_code         The ISO3A currency code.
storefront_id   Y     Foreign key to the storefront table.
This is the table I get. My problem is that I cannot see any way to calculate price reductions or newly free apps from this particular dataset. Does anyone have an idea how I can calculate it?
Any idea or answer will be highly appreciated.
I tried storing the previous data and the current data and then matching them. The problem is that the table itself is very large, and the comparison requires a JOIN that pushes the query execution time to more than an hour, which I cannot afford. There are approximately 60,000,000 rows in the table.
With these fields you can't directly determine price drops or new applications. You'll have to insert the data into your own database and determine the differences from there. In a relational database like MySQL this isn't too complex:
To determine which applications are new, you can add your own column "first_seen" and then query your database for all rows where the first_seen value is no more than a day old.
To calculate price drops, you'll have to calculate the difference between the retail_price of the current import and that of the previous import.
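A sketch of both queries (the table names here are invented; MySQL syntax assumed, as above):

    -- New applications: anything first seen within the last day.
    SELECT application_id
    FROM   app_price_current
    WHERE  first_seen >= NOW() - INTERVAL 1 DAY;

    -- Price drops: compare the current import against the previous one.
    SELECT cur.application_id,
           prev.retail_price AS old_price,
           cur.retail_price  AS new_price
    FROM   app_price_current  cur
    JOIN   app_price_previous prev
           ON  prev.application_id = cur.application_id
           AND prev.storefront_id  = cur.storefront_id
    WHERE  cur.retail_price < prev.retail_price;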
Since you've edited your question, here's my updated answer:
It seems like you're having storage/performance issues, and you know what you want to achieve. To solve this you'll have to start measuring and debugging: with datasets this large you'll have to make sure you have the correct indexes, and profiling your queries should help you find out whether you do.
And your environment is probably "write once a day, read many times a minute" (I'm guessing you're building a website), so you could speed up the front end by computing the differences (price drops and new applications) on import rather than when displaying them on the website.
If you're still unable to solve this, I suggest you open a more specific question detailing your DBMS, queries, etc., so the real database administrators will be able to help you. Sixty million rows is a lot, but with the correct indexes it should be no real trouble for a normal database system.
Compare the table with one you've downloaded the previous day, and note the differences.
Added:
For only 60 million items, and on a contemporary PC, you should be able to store a sorted array of the store id numbers and previous prices in memory, and do an array lookup faster than the data is arriving from the network feed. Mark any differences found and double-check them against the DB in post-processing.
Actually, I am also playing with this data, and I think the best approach for you is based on how Apple structures the data.
You have two types of data: full and incremental (updated daily). Within the new data from the incremental feed (nowhere near as big as the full feed) you can look at only the records that were updated, and insert those into another table to determine whose pricing has changed.
So you end up with a daily list of records (app, song, video...) whose price has changed; just read from that new table instead of comparing or joining data across various large tables.
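A sketch of that approach (the staging and change tables are invented names): load the daily incremental feed into a staging table, then record only the rows whose price differs from the current full data.

    -- Record only the price changes found in today's incremental feed.
    INSERT INTO price_change (application_id, storefront_id, old_price, new_price, changed_on)
    SELECT inc.application_id,
           inc.storefront_id,
           cur.retail_price,
           inc.retail_price,
           CURRENT_DATE
    FROM   incremental_application_price inc
    JOIN   application_price cur
           ON  cur.application_id = inc.application_id
           AND cur.storefront_id  = inc.storefront_id
    WHERE  inc.retail_price <> cur.retail_price;

Since the incremental feed is small, the join only touches the updated rows, and the website can read price drops straight from price_change.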
Cheers