I have a MySQL table with over 6 million records, each with a epoch timestamp. I need to plot all the timestamps across time of day. In other words, I need to see how many timestamps are between 7am and 8am, 8am to 9am, etc - for all 24 hour blocks in day. I do not need them plotted by day of the week or month, just time in the day. Each timestamp is in UTC.
can someone help me?
You could use MySQL's FROM_UNIXTIME function to get date strings from the database, and dump the results into a file, which you can subsequently read into MATLAB. Next, one of the ways to extract the time of day of each record is to use MATLAB's datevec function, to get each component of the date string seperately:
datevec('2007-11-30 10:30:19')
ans =
2007 11 30 10 30 19
For instance, if you read in the data as one long vector with date strings, you could apply datevec to this vector, and subsequently grab the 'hour column' of the resulting matrix. Then, you can make a histogram of the counts using the hist or histc functions, depending on whether you want to specify bin centers or bin edges. If you have an hour column H, something like hist(H, 0:23) should work. The histc function might be a bit more natural for the nature of your data, but is slightly more involved; check the documentation.
Related
When I apply Month/Year to Cases or Deaths from my data, the values explode. For Cases it goes from approximately 48 million to over 1 billion, and for Deaths it goes from about 700 thousand to over 22 million. However, when I try the same thing with Initial Claims or the Stringency Index, my values remain correct. I'm trying to find the month over month percentage change by the way. And I'm using the Date column. I only select 2020 and 2021 in the filter for Year.
What I'm asking about is Sheet 21.
Link to workbook: https://public.tableau.com/app/profile/nilajah.rivers/viz/CoronaVirusProject_16323687296770/Sheet21
Your problem is that the data points are daily cumulative deaths. If you change the date aggregation to anything other than days, Tableau will default to summing the numbers for all the days in the month. This will give the wrong result, obviously.
If you want to show the correct total deaths or cases regardless of the time aggregation (months, days, weeks etc.) then you could use the New Case or New Death numbers plus a running sum table calculation. This will always give the correct total for the time period.
Table calculations will also allow automatic calculation of the period to period % change from the same data fields.
This is a common problem when working with datasets that offer pre-calculated aggregations. Tableau doesn't need that as it can dynamically calculate the aggregation of a field over any given time period but it is easy to forget which field has pre-aggregated data and which has raw data. Pre-aggregated fields assume a particular time period and can't be used for different time periods without disentangling that assumption (which is unnecessary if you also have the raw data (in this case daily new deaths/cases).
I have a chart that shows the number of departures for a given 15 minute interval as seen here.
I need to compound these counts backwards for one hour. For example, the 3 departures shown at 11:00 need to also be represented at the 10:00, 10:15, 10:30, and 10:45 columns. When completed, the 10:00 would have a total of 6 departures (10:15 -> 6, 10:30 ->5, 10:45 -> 4, 11:00 -> 4).
I have done this via VBA in excell, but am now needing to replicate the chart in Tableau and have been beating my head in for about two weeks now. I'd love to hear any and all suggestions.
You can use a Cartesian join against a large enough date range of your choosing to in effect resample your data and add the additional time intervals you desire.
For example, if you have a month's worth of data (min date -> max date = 30 days), then you have (30 * 24 * 4) 2880 15 minute intervals.
Create all those intervals in a separate data sheet
Add a bogus column with value of link for all rows
Create the same bogus in your actual data source
Join the two sheets together on the link column
Create a calculated field that is something along the following:
[Interval] <= [Flight Time] AND [Interval] >= DATEADD('hour',-1,[Flight Time])
This calculated field will evaluate to TRUE when the interval time is within one hour before the flight time. You can then drag this field onto your filter shelf and select TRUE value only. Effectively your [Interval] field becomes your new date field.
I would recommend adding that filter to the context and applying across the entire datasource. Before you add this filter you'll have 2880 times the about of data so be sure to do a live view first. Be careful with extracts using Cartesian joins as you could potentially be extracting more than you bargained for.
See the following links for different techniques on how to do this and re-sampling dates in general in tableau.
https://community.tableau.com/thread/151387
Depending on the size of your data (and if a live view is not necessary) it is often times easier and more efficient to do this type of pre-processing outside of tableau in SQL or something like python's pandas library.
Here is another solution provided from the Tableau Cumunity Forum. I have not tried tyvich's solution yet, but I know this one got me where I needed. Please follow the link to see the solution using moving table calculations.
https://community.tableau.com/thread/251154
I'm using Postgresql (on Amazon Redshift), and I need to calculate the difference between two dates and then use that value in a formula to compute a ratio, so the date difference needs to be translated to a numeric value, preferably a float or double precision.
I have two dates: 1/1/2017 and 1/1/2014. I need to find the difference between these two dates in number of days.
When I use the age function I get 1080 days:
select age('2017-01-01','2014-01-01')
However, since age returns an interval and I need to work with a numeric result, I am using EXTRACT to convert the final value. I chose epoch since I wasn't able to find any other value for EXTRACT that would yield the number of time units between the two dates. This formula yields 1095.75 days (the divisor is the number of seconds in a day):
select extract(epoch from age('2017-01-01','2014-01-01'))/86400
Why am I getting a difference of 19.75 days when using age vs using extract?
Did you try
select '2017-01-01'::date - '2014-01-01'::date;
The difference between two dates is number of days in integer
1080 is the figure you would get if every month was 30 days long (36 months by 30 days equals 1080), as it would be if you used justify_days (either explicitly or if the DBMS called it implicitly). You don't say how you're getting this 1080 figure since I believe the duration would normally just print out something like 3 years, but that seems the most likely case
1095.75 seems the more correct figure, being 365.25 days multiplied by three years.
Out of those two, I would go with the latter method.
Although, as pointed out at http://www.postgresql.org/docs/8.1/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT, calculating the difference between two date types should yield the number of days:
select dtend - dtstart from somewhere
Redshift release notes say they recently released a months between function which looks similar to oracles months between function if that's what you're looking for. http://docs.aws.amazon.com/redshift/latest/dg/r_MONTHS_BETWEEN_function.html
I am relatively inexperienced with matlab, as I only use it occasionally. I am trying to plot a large range of values against time and I am running into some problems.
The data, which is from a text file, with about 55000 entries, gives the information in the following format:
year month day hour minute second value
The seconds column has accuracy of 6 decimal places and there are about 24hrs worth of data.
What I want to do is plot the values against time, which works fine. However as a result of my code below, the x-axis has label ticks in serial date number format, which is not very useful when looking at the figure. I want to change the labels to something more useful such intervals of hours. However I am not sure how to go about doing this.
Here is the code:
A = dlmread('data.txt',' ');
time = datenum(A(:,1),A(:,2),A(:,3),A(:,4),A(:,5),A(:,6));
scatter(time,A(:,7),1)
axis([min(time) max(time) min(A(:,7)) max(A(:,7))])
I found a solution here: matlab ticks with certain labels however, the process here is manual and with so much information I don't want to do this manually. How would I automate this process? or is there a better way to do what I am trying to achieve?
EDIT: I also found this method: http://www.mathworks.com/help/matlab/ref/datetick.html#btpnuk4-1, however, I dont want to show the actual date, I rather want to show intervals of time, ie an hour or 30 minutes.
EDIT 2: I have found a somewhat satisfactory solution. It could still be improved upon, so I don't know if I should submit this as an answer to my own question or not, but here it is:
A = dlmread('data.txt',' ');
time = datenum(A(:,1),A(:,2),A(:,3),A(:,4),A(:,5),A(:,6));
temp= time(1);
timediff = time - temp;
scatter(timediff,A(:,7),1)
axis([min(timediff) max(timediff) min(A(:,7)) max(A(:,7))])
datetick('x', 'HH')
This takes the original time vector in serialized time format and subtracts the first time from all the subsequent times to get the difference. The it uses the datetick function to to convert that to hours. It isn't ideal because instead of 24 hours it goes back to 00, but its the best I have tried thus far.
With reference to the other article, you will have to follow the same method but in order to automate the process you'll need to form the vectors of xtick and xticklabels as you read in the data and after you've plotted the data change the xticks and xticklabels.
Its not difficult what you're trying to do, but I will need more details of how you want to organize the ticks to be able to exactly say the steps that you'd have to follow
Matlab serial time is simply days since January 1, 0000, so your timediff variable is really elapsed days (and fractions thereof) since the start of your experiment. If you want your x ticks to be elapsed hours you could multiply timediff by 24.
scatter(timediff * 24, values)
This avoids the weirdness that can arise when using datetick as well.
MatLAB noob here..
I have a 2 column matrix with start/end (in seconds) times - in a 2 column matrix.
I also have a single column matrix of time stamps. How do I find the time stamps that occur in each interval?
Not going to put any code here, as you provided none.
There are some alternatives, this is the most straight forward (though might not be the most efficient for the computer):
Use a for loop, for each line of your start/end matrix, and for each do another for loop for each element in your time stamp matrix, and assess, using if function, if each time stamp is between start and end times.
If you don't know how to use FOR and IF, type help FOR, help IF and find out. Or google it