Negative intercept correction - linear-regression

I have company data with sales, hours and productivity (sales/hours). I'm trying to find the slope and intercept of a linear regression with x = sales and y = productivity. However, our database does not accept negative intercept values and I can't figure out how to fix this. Negative intercepts are acceptable in theory, but from a business point of view they are not. Please help, my career is on the line.
I'm looking for a model or a method to correct negative intercepts, using Excel and Python.
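One way to read "correct" here is to constrain the fit so that the intercept cannot go below zero. The following is a minimal Python sketch under that assumption, not a definitive method: it fits ordinary least squares first and, only if the intercept comes out negative, refits a line through the origin, which is the least-squares solution when the intercept is restricted to be >= 0. The function name fit_nonneg_intercept and the sample numbers are hypothetical.

```python
import numpy as np

def fit_nonneg_intercept(x, y):
    """Least-squares line y = slope * x + intercept with intercept >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ordinary (unconstrained) least squares first.
    slope, intercept = np.polyfit(x, y, 1)
    if intercept >= 0:
        return float(slope), float(intercept)
    # The unconstrained intercept is negative, so the constrained optimum
    # sits on the boundary: intercept = 0, slope = sum(x*y) / sum(x*x).
    slope0 = np.dot(x, y) / np.dot(x, x)
    return float(slope0), 0.0

# Made-up sales (x) and productivity (y) values, for illustration only.
sales = [1.0, 2.0, 3.0, 4.0]
productivity = [0.0, 2.0, 4.0, 6.0]
slope, intercept = fit_nonneg_intercept(sales, productivity)
print(slope, intercept)
```

The constrained line will fit the data somewhat worse than the unconstrained one, so it is worth checking the residuals before storing the result. In Excel, a line forced through the origin can be obtained with LINEST by setting its const argument to FALSE.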

Related

Is it possible to access forecast results in calculated measure?

I am trying to find time series outliers using the Tableau forecast. I need to compare each actual value with the 95% interval in the forecast results to determine whether it is an outlier.
I understand I can view the forecast results on the chart. But I want to use the forecast results in calculated measure. Is there any way to do it? I cannot find any Tableau functions to retrieve the forecast results.
Xuefei, it doesn't look like there is a way currently, at least going by their help page - https://help.tableau.com/v2019.1/pro/desktop/en-us/forecast_options.htm. If you haven't already considered this: integration with R is easy, so you could model the series in R (accounting for additive/multiplicative components, trend/cyclicity/seasonality) and access the forecast values from R. Integration with Python is also supposed to be easy, although I haven't tried it myself.
Example of code in Tableau to incorporate R code for linear regression (this is the formula for the calc field in Tableau)
SCRIPT_REAL("
fv=log(.arg1)
fpri=.arg2
fit=lm(fv~fpri)
exp(fit$fitted)",SUM([Impressions]),SUM([CPM]))
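For the outlier check itself, here is a rough Python sketch of comparing actual values against an approximate 95% band. It is not Tableau's forecasting model: a straight-line trend is swapped in to keep the example short, and the band is simply the fitted value plus or minus 1.96 residual standard deviations. The function name flag_outliers and the sample series are hypothetical.

```python
import numpy as np

def flag_outliers(values, z=1.96):
    """Flag points outside fitted trend +/- z * residual std (approx. 95% for z=1.96)."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    # Straight-line trend as a stand-in for a real forecasting model.
    slope, intercept = np.polyfit(t, y, 1)
    fitted = slope * t + intercept
    resid = y - fitted
    # ddof=2 because two parameters (slope, intercept) were estimated.
    sigma = resid.std(ddof=2)
    return (y < fitted - z * sigma) | (y > fitted + z * sigma)

# Made-up series: a steady trend with one injected spike at index 10.
series = np.arange(20, dtype=float)
series[10] += 50.0
flags = flag_outliers(series)
print(np.flatnonzero(flags))
```

A model with seasonality would need a different fitted series, but the comparison step stays the same.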

Interesting results from LSTM RNN : lagged results for train and validation data

As an introduction to RNN/LSTM (stateless) I'm training a model with sequences of 200 days of previous data (X), including things like daily price change, daily volume change, etc and for the labels/Y I have the % price change from current price to that in 4 months. Basically I want to estimate the market direction, not to be 100% accurate. But I'm getting some odd results...
When I then test my model with the training data, I notice the output from the model is a perfect fit when compared to the actual data, it just lags by exactly 4 months:
When I shift the data by 4 months, you can see it's a perfect fit.
I can obviously understand why the training data would be a very close fit as it has seen it all during training - but why the 4 months lag?
It does the same thing with the validation data (note the area I highlighted with the red box for future reference):
Time-shifted:
It's not as close-fitting as the training data, as you'd expect, but still too close for my liking - I just don't think it can be this accurate (see the little blip in the red rectangle as an example). I think the model is acting as a naive predictor, I just can't work out how/why it's possibly doing it.
To generate this output from the validation data, I input a sequence of 200 timesteps, but there's nothing in the data sequence that says what the %price change will be in 4 months - it's entirely disconnected, so how is it so accurate? The 4-month lag is obviously another indicator that something's not right here, I don't know how to explain that, but I suspect the two are linked.
I tried to explain the observation based on some general underlying concept:
If you don't provide a time-lagged X input dataset (lagged by t-k, where k is the number of time steps), you are basically feeding the LSTM today's closing price to predict that same closing price during training. The model will then overfit and behave exactly as if the answer were already known (data leakage).
If Y is the predicted percentage change (i.e. X * (1 + Y%) = price 4 months in the future), then the present value is really just the future price discounted by Y%,
so the predicted series will show a 4-month shift.
Okay, I realised my error: the way I was using the model to generate the forecast line was naive. For every date in the graph above, I was getting an output from the model and then applying the forecasted % change to the actual price for that date, which gives the predicted price in 4 months' time.
Given that the markets usually only move within a margin of 0-3% (plus or minus) over a 4-month period, my forecasts were always going to closely mirror the current price, just with a 4-month lag.
So at every date the predicted output was being re-based onto the actual price, and the model line could never deviate far from the actual line; it would be the same, within a margin of 0-3% (plus or minus).
Really, the graph isn't important, and it doesn't reflect the way I'll use the output anyway, so I'm going to ditch trying to get a visual representation, and concentrate on trying to find different metrics that lower the validation loss.
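The re-basing effect described above can be reproduced without any model at all. In this small Python sketch the "predictions" are random numbers in the ±3% range that carry no information, yet the resulting forecast line still tracks the actual price shifted by the horizon. The random-walk prices, the 80-step horizon (roughly 4 months of trading days) and the function name rebased_forecast are all assumptions made for illustration.

```python
import numpy as np

def rebased_forecast(prices, pred_pct_change, horizon):
    """Apply each predicted % change to the actual price on the same date
    and place the result `horizon` steps later on the time axis."""
    prices = np.asarray(prices, dtype=float)
    pred = np.asarray(pred_pct_change, dtype=float)
    forecast = np.full(len(prices) + horizon, np.nan)
    forecast[horizon:] = prices * (1.0 + pred)
    return forecast

rng = np.random.default_rng(0)
# Synthetic random-walk prices (not real market data).
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, 500))
# Stand-in for model output: uninformative noise within +/-3%.
pred = rng.uniform(-0.03, 0.03, 500)
horizon = 80

forecast = rebased_forecast(prices, pred, horizon)
# Undo the shift and compare against the actual prices.
max_rel_gap = np.max(np.abs(forecast[horizon:] / prices - 1.0))
print(max_rel_gap)  # never exceeds the 3% bound on the predictions
```

Because every forecast point is anchored to an actual price, the shifted line stays within the prediction range of the real one whatever the model outputs.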

How to delete the same peak value in peak analysis and to find the duration of each event (which contain peak value)?

I am a newbie in MATLAB programming. I have already asked this question on the MathWorks website but did not get an answer, so maybe I can get one here.
I am trying to do a peak analysis to find the peak flow of storm water. Here is my code:
%% Peak flow analysis
% define data which are used for analysis
Date=finalCSVnew{:,1};
Flow=finalCSVnew{:,7};
figure(2);
[pks,locs]=findpeaks(Flow,Date,'MinPeakProminence',1,'MinPeakDistance',1);
findpeaks(Flow,Date,'MinPeakProminence',1,'MinPeakDistance',1);
text(locs+.02,pks,num2str((1:numel(pks))'));
xlabel('Date and Time');
ylabel('Flow [m3/h]');
title('Find All Peak Flows');
datacursormode on
I managed to plot the peak flow and get the details in pks and locs. Each event should contain one peak flow, so in my case (based on the attached picture) I should have 16 events. However, there is a duplicate value in event 1 and event 2, and I want to delete one of them, but I am confused about how to do it. I also looked for a tutorial on calculating the duration of each event but found nothing. I want to know how to calculate the duration (probably in minutes) of each event based on the peak flow data I got, and how to delete the duplicate peak values from the plot and from pks. Is it possible to do that? Could you please help me? Thank you very much for your help.
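For duplicate values, you can use the unique function to find values which are the same and remove them.
[C, ia] = unique(pks, 'stable'); % unique peak values in their original order; ia indexes the first occurrence of each
locsUnique = locs(ia); % peak locations matching the values kept in C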
https://au.mathworks.com/help/matlab/ref/unique.html
Provide more details about the duration you want to measure. Do you want to measure the duration of just the peak flow? Or of the entire curve leading up to the peak?
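Until the definition of "duration" is settled, here is one possible reading, sketched in Python rather than MATLAB: an event is a contiguous run of samples where the flow stays above a baseline threshold, and its duration is the time between the first and last sample of that run. The threshold value, the sample data and the function name event_durations are assumptions, not something taken from the question.

```python
import numpy as np

def event_durations(times_min, flow, threshold):
    """Return (start, end, duration) in minutes for each run of flow > threshold."""
    t = np.asarray(times_min, dtype=float)
    above = np.asarray(flow, dtype=float) > threshold
    # Pad with False so runs touching either end are still closed.
    padded = np.concatenate(([False], above, [False]))
    change = np.diff(padded.astype(int))
    starts = np.flatnonzero(change == 1)       # first index of each run
    ends = np.flatnonzero(change == -1) - 1    # last index of each run
    return [(t[s], t[e], t[e] - t[s]) for s, e in zip(starts, ends)]

# Made-up readings every 5 minutes.
times = [0, 5, 10, 15, 20, 25, 30]
flow = [0.0, 2.0, 5.0, 1.0, 0.0, 3.0, 0.0]
events = event_durations(times, flow, threshold=0.5)
print(events)
```

With this definition a single-sample event has a duration of zero, so you may prefer to add one sampling interval to each duration.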

Remove Spikes from Periodic Data with MATLAB

I have some data which is time-stamped by a NMEA GPS string that I decode in order to obtain the single data point Year, Month, Day, etcetera.
The problem is that on a few occasions the GPS (probably due to some signal loss) goes bonkers and outputs badly wrong values. This generates spikes in the time-stamp data, as you can see from the attached picture, which plots the vector of Days as output by the GPS.
As you can see, the GPS data are generally well behaved, and the days go between 1 and 30/31 each month before falling back to 1 at the next month. In certain moments though, the GPS spits out a random day.
I tried all the standard MATLAB functions for despiking (such as medfilt1 and findpeaks), but either they are not suited to the task or I do not know how to set them up properly.
My other idea was to loop over differences between adjacent elements, but the vector is so big that the computer cannot really handle it.
Is there any vectorized way to go down such a road and detect those spikes?
Thanks so much!
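You can filter your data with a simple low-pass (moving-average) filter to suppress the outliers:
windowSize = 5; % number of samples averaged per output point
b = (1/windowSize)*ones(1,windowSize); % equal weights that sum to 1
a = 1; % no feedback terms, i.e. a plain FIR moving average
FILTERED_DATA = filter(b,a,YOUR_DATA);
Just play a bit with windowSize until you get the smoothness you want.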
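As an alternative in the spirit of the question's own idea of looking at differences between adjacent elements, here is a vectorized sketch in Python (NumPy) rather than MATLAB. It assumes spikes are isolated samples that jump away from both neighbours by more than a tolerance, which leaves the legitimate 31 -> 1 rollover alone because the value after the rollover agrees with its successor. The function names and the tolerance are hypothetical.

```python
import numpy as np

def find_isolated_spikes(x, tol=1.0):
    """Flag samples that differ from BOTH neighbours by more than tol."""
    x = np.asarray(x, dtype=float)
    d = np.abs(np.diff(x))
    spike = np.zeros(len(x), dtype=bool)
    # Interior point i is compared with i-1 (d[i-1]) and i+1 (d[i]).
    spike[1:-1] = (d[:-1] > tol) & (d[1:] > tol)
    return spike

def replace_spikes(x, spike):
    """Replace flagged samples with the last unflagged value before them."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    # Endpoints are never flagged, so index 0 is always a valid fallback.
    last_good = np.maximum.accumulate(np.where(~spike, idx, 0))
    return x[last_good]

# Made-up day-of-month vector with two glitches (17 and 25).
days = [30, 30, 30, 17, 30, 31, 31, 1, 1, 1, 25, 1, 2]
spikes = find_isolated_spikes(days)
cleaned = replace_spikes(days, spikes)
print(np.flatnonzero(spikes), cleaned)
```

Runs of several consecutive bad samples would not be caught by this test and would need a wider comparison window.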

Microsoft ReportViewer (2010) Chart Time as Category with intervalled values

first question here, so please pardon me if I'm doing something wrong.
I'm trying to create a Line Chart in MS ReportViewer 2010, which should show how many people were registered on any day. Basically it should show on the X axis the last 30 days, every day on a tick mark and on the Y axis the number of people registered on that day.
On my dataset, I have a 'People' table which includes the 'RegistrationDate' column.
So far (in the last 3 hours :) ) I've managed to do this:
- RegistrationDate on the X (Category) Axis
- CountRows() on the Y (Values) Axis
and if I leave 'Auto' in the Minimum and Maximum scale value I do get some result, but I have these problems:
1) The chart includes on the X axis only the dates where at least one person registered and leaves out the ones with zero. Basically the axis isn't divided into 30 days but into around 20, skipping the days with no registrations.
There's an 'always include 0' checkbox, but it changes nothing.
2) I've tried to set the X axis minimum / maximum manually, and the data disappears!
Thanks in advance !!
[edit_update] after bashing my head on it for 24h, and realizing
reportviewer documentation and tutorials are scant to say the least (I
guess people use other tools ?), I've implemented a workaround in the
code. In a loop from minDate to maxDate, I filled a list of objects
that have date and registration count as members, thereby filling the
x axis with every value possible, zeros as well. Far from nice and not
very flexible (I still don't understand how the x axis grouping works
very well), but it sort of does its job. Is this a case where I should
reply my own question ? [end_update]
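The workaround described in the update, looping from minDate to maxDate and emitting a count for every day including the empty ones, can be sketched as follows. This is Python for illustration only, not the code actually used with ReportViewer, and the function name daily_counts and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

def daily_counts(registration_dates, min_date, max_date):
    """Return one (day, count) pair per day in [min_date, max_date], zeros included."""
    counts = Counter(registration_dates)
    n_days = (max_date - min_date).days + 1
    result = []
    for i in range(n_days):
        day = min_date + timedelta(days=i)
        # Days with no registrations get an explicit 0 instead of being skipped.
        result.append((day, counts.get(day, 0)))
    return result

# Made-up registration dates.
registrations = [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 3)]
series = daily_counts(registrations, date(2024, 1, 1), date(2024, 1, 4))
print(series)
```

Binding the chart to a pre-filled list like this gives the category axis one entry per day regardless of how the grouping treats missing dates.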