Time-series Stock Data in Matlab - matlab

I'm a MATLAB beginner and have no idea what I'm doing.
I have stock data in CSV format which is something like this:
+--------+--------+------+------+-----+-------+
| Ticker | Date   | Open | High | Low | Close |
+--------+--------+------+------+-----+-------+
| AAPL   | 25-Oct | 10   | 12   | 9   | 12    |
| XYZ    | 25-Oct | 10   | 12   | 9   | 12    |
| AAPL   | 26-Oct | 12   | 15   | 10  | 15    |
+--------+--------+------+------+-----+-------+
There are many stock tickers each day. The file is many rows long, listing daily stock prices for each ticker on a particular stock exchange.
I'm aiming to do some fun time-series analysis on the 'close' price for each ticker.
To start with, making simple charts of a single ticker over time, or of multiple tickers over time, would be awesome.
Questions:
1. Best way to import data.
I have a big long CSV. But am lost as to which import method is best. Column Vectors, Numeric Matrix, Cell Array or Table?
2. I need to create a time-series object for each ticker, right?
How would one go about that? I've been looking at this guide, but I'm unsure how to make an object for each ticker, over the span of time defined in the file.
http://www.mathworks.com/help/matlab/ref/timeseries-class.html
Any advice, pointers and resources that are good for beginners are appreciated massively!
Thanks!

There are a ton of ways to import data into MATLAB. Before you import data, I would make sure the numeric columns hold ONLY numeric data, or MATLAB can complain. Some options, in my personal order of preference:
1. d = readtable('mycsvfile.csv'); % puts the data in a nice table datatype. I find it makes code more readable.
2. d = csvread('myfile.csv',1,0); % the 1 skips the first row, which is probably the header names for the csv file. Puts all the data in a matrix, and you have to keep track of which column is which. csvread only reads numeric values, so text columns such as Ticker would have to be removed or encoded as numbers first.
3. xlsread is good for reading Excel files.
4. Copy and paste the data into a variable in your workspace. Then do save blahblah.mat so you can easily load the data later.
I personally wouldn't bother with financial time series objects. It's just going to complicate your life if you're new to MATLAB. If you loaded the data using readtable (i.e. option 1), you can then execute something like:
aapl_indicator = strcmp(d.Ticker, 'AAPL');
to get a vector indicating whether a row in your table is AAPL or not. Then:
close_price_aapl = d.Close(aapl_indicator);
will give you a vector of Apple's closing prices.
When you get down to doing math, you'll want to be working with plain numeric vectors and matrices like close_price_aapl rather than with the table itself.

Related

Tableau - Return Name of Column with Max Value

I am new to Tableau visualization and need some help.
I have a set of shipping lanes, each with whole-number values broken down by the duration of the shipment.
Ex:
| Lane Name | 0 Day | 1 Day | 2 Day | 3 Day | 4 Day |
| SFO-LAX | 0 | 30 | 60 | 10 | 0 |
| JFK-LAX | 0 | 10 | 20 | 50 | 80 |
For each Lane Name, I want to return the column header based on the max value.
i.e. for SFO-LAX I would return '2 Day', for JFK-LAX I would return '4 Day', etc.
I then want to set this as a filter to only show 2 Day and 3 Day results in my Tableau data set.
Can someone help?
Two steps to this.
The first step is pretty easy, pivot your data. Read the Tableau help to learn how to PIVOT your data for analysis - i.e. make it look to Tableau as a longer 3 column data set with Lane, Duration, Value as the 3 columns. Tableau's PIVOT feature will let you view your data in that format (which makes analysis much easier) without actually changing the format of your data files.
The second step is a bit trickier than you'd expect at first glance, and there are a few ways to accomplish it. The Tableau features that can be used for this are LOD calcs, table calcs, filters and possibly sets. These are some of the more powerful but complicated parts of Tableau, so worth your time to learn about, but expect to take a while to spin up on them.
The easiest solution is probably to use one of the RANK() functions, starting with a quick table calc. Set your partitioning and addressing so that the ranks are computed for the blocks of data that you want, say partitioning on Lane and addressing (computing) by Duration. Then, when you are happy with the ranks you see, move the rank calculation to the filter shelf and only display data where rank = 1.
This is a quick solution once you get the hang of it, but it can get slow for very large data sets, since the rank calculations are done on the client side, which requires fetching all the data that you end up not displaying. If performance becomes an issue, you might want to look at other solutions that do more of the calculation server side, possibly using LOD calcs or analytic (a.k.a. windowing) queries invoked from custom SQL.
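The two steps above (pivot, then keep the rank-1 row per lane) can be sketched outside Tableau to make the logic concrete. This is plain Python, not Tableau syntax; the function names pivot_long and top_duration are made up for illustration, and ties are resolved by keeping the first duration seen, which is an assumption rather than Tableau's documented behaviour.

```python
# Wide rows as in the question: one column per shipment duration.
wide = [
    {"Lane Name": "SFO-LAX", "0 Day": 0, "1 Day": 30, "2 Day": 60, "3 Day": 10, "4 Day": 0},
    {"Lane Name": "JFK-LAX", "0 Day": 0, "1 Day": 10, "2 Day": 20, "3 Day": 50, "4 Day": 80},
]

def pivot_long(rows, key="Lane Name"):
    """Step 1: turn each wide row into (lane, duration, value) triples."""
    return [(row[key], col, val)
            for row in rows
            for col, val in row.items()
            if col != key]

def top_duration(long_rows):
    """Step 2: per lane, keep the duration with the largest value (rank = 1)."""
    best = {}
    for lane, duration, value in long_rows:
        # Strict '>' means the first duration wins on a tie (an assumption).
        if lane not in best or value > best[lane][1]:
            best[lane] = (duration, value)
    return {lane: duration for lane, (duration, _) in best.items()}

long_rows = pivot_long(wide)
top = top_duration(long_rows)

# Final filter from the question: only lanes whose top duration is 2 Day or 3 Day.
wanted = {"2 Day", "3 Day"}
filtered = [lane for lane, duration in top.items() if duration in wanted]
```

With the sample data, top maps SFO-LAX to '2 Day' and JFK-LAX to '4 Day', so only SFO-LAX survives the final filter.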

Issue with displaying of information in a Tables/Matrix visual - Power BI

Hi, I'm new to Power BI Desktop and have come across an issue when displaying information. Hopefully it's just due to my lack of knowledge, but I can't seem to find a way to display values in rows one after the other, similar to pivot table functionality.
For example, if I had the following table:
Location | Salary | Number
A | 100 | 1
A | 200 | 2
B | 100 | 3
B | 400 | 4
C | 400 | 5
D | 800 | 6
What I'd like to produce is something like .....
A | B | C | D
300 | 500 | 400 | 800 <-- Salary Sum
3 | 7 | 5 | 6 <-- Number Sum
I have a direct link with my data source, please suggest a way to display the same with tables/matrix
Thank you in advance
Unfortunately this is currently not supported in Power BI, but maybe there is some light at the end of the tunnel... The Power BI team have started working on this much requested feature. See here
As Tom said, this is available with the August release. You can check which version you have by going to File -> Help -> About. If you have an older version, you can go here to download the right one for you (32-bit vs 64-bit).
Once you have made sure you are running the August version, simply create a matrix with Location in the Columns field and Salary and Number in the Values field. Then go into the formatting pane and under Values, turn Show on rows to on.
Try this: go to the Query Editor, select the first column of the desired table (Location in your case), and from the Transform tab select Unpivot Other Columns.
That's it! Now go and drop your visual.

Is it possible to make multiple fields default to the same date, but also be individually editable?

I am VERY new to Access - I was sort of thrust into designing a database for a research project I'm involved in. So, please bear with me because I know next to nothing :) The problem I am having is thus:
My database is for a medical research project, and is very time and date dependent, by which I mean I need to capture the date and time for each piece of data so that we end up with a sort of timeline of events for each subject.
As is, I have something like the following for each piece of data (each in its own field):
ArrivalDate
ArrivalTime
HeartRateDate
HeartRateTime
HeartRateData
TemperatureDate
TemperatureTime
TemperatureData
BloodPressureDate
BloodPressureTime
BloodPressureData
There are around 200 similar pieces of data that I need to collect for each patient. To avoid having to re-enter the same data over and over, and also to reduce the potential for error, I would like to have all of the date fields in a given patient record default to the first one that is entered, in this case "Arrival Date". However, I also need each date field to be editable without affecting the others. The reason for this is that in the event that a patient's visit occurs over the span of a few days we can accurately record that.
I have tried messing around with the default value setting, as well as setting the control source to reference the "Arrival Date" field, but then of course any changes to one field affect them all. I am not even sure that what I am trying to do is possible but I will appreciate any help and/or suggestions!
Thank you in advance
Having all this data in separate columns of a big table isn't going to work. You don't measure things like temperature or blood pressure only once per patient, do you?
This is a classic one-to-many relation.
You should have a separate Measurements table, looking e.g. like this:
+--------+-----------+---------------+------------------+-----------+
| MeasID | PatientID | MeasType | MeasDateTime | MeasValue |
+--------+-----------+---------------+------------------+-----------+
| 1 | 1 | Temperature | 2017-05-17 14:30 | 38.2 |
| 2 | 1 | BloodPressure | 2017-05-17 14:30 | 130/90 |
| 3 | 1 | Temperature | 2017-05-17 18:00 | 38.5 |
| 4 | 2 | Temperature | etc. | |
+--------+-----------+---------------+------------------+-----------+
As Barmar wrote, there is no reason to have separate columns for date and time.
In the form where measurements are entered, you can use the BeforeInsert event to set MeasDateTime to the current time, with the Now() function.
So the user never has to enter it manually, but they can edit it if the measurement was at a different time than entering the data.
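The table design above can be tried out quickly without Access. The following is only a sketch using Python's built-in sqlite3 module, not Access or VBA: the Measurements columns follow the table shown above, while the Patients table, its ArrivalDateTime column, and the use of a column DEFAULT in place of the form's BeforeInsert event are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (
    PatientID       INTEGER PRIMARY KEY,
    ArrivalDateTime TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE Measurements (
    MeasID       INTEGER PRIMARY KEY,
    PatientID    INTEGER NOT NULL REFERENCES Patients(PatientID),
    MeasType     TEXT NOT NULL,
    -- Filled in automatically when omitted, but still an ordinary editable column.
    MeasDateTime TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    MeasValue    TEXT
);
""")

conn.execute("INSERT INTO Patients VALUES (1, '2017-05-17 14:00')")

# No timestamp supplied: the default (current time) is used.
conn.execute(
    "INSERT INTO Measurements (PatientID, MeasType, MeasValue) "
    "VALUES (1, 'Temperature', '38.2')"
)
# Timestamp supplied explicitly, e.g. when data is entered after the fact.
conn.execute(
    "INSERT INTO Measurements (PatientID, MeasType, MeasDateTime, MeasValue) "
    "VALUES (1, 'Temperature', '2017-05-17 18:00', '38.5')"
)
# Editing one measurement's time does not touch any other row.
conn.execute(
    "UPDATE Measurements SET MeasDateTime = '2017-05-17 14:30' WHERE MeasID = 1"
)

rows = conn.execute(
    "SELECT MeasID, MeasType, MeasDateTime, MeasValue "
    "FROM Measurements WHERE PatientID = 1 ORDER BY MeasDateTime"
).fetchall()
```

Because every measurement is its own row, the "default to one date but stay individually editable" requirement falls out of the design: each row gets a default timestamp and can be changed on its own.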

How to count frequency of columns in row in a typedpipe in scalding?

I'm currently working on a mapreduce job using scalding. I'm trying to threshold based on how many times I see a particular value among the rows in my typedpipe. For example, if I had these rows in my typedpipe:
Column 1 | Column 2
'hi' | 'hey'
'hi' | 'ho'
'hi' | 'ho'
'bye' | 'bye'
I would want to append to each row how many times its column 1 value and its column 2 value appear across all rows. The output would look like:
Column 1 | Column 2 | Column 1 Freq | Column 2 Freq
'hi' | 'hey'| 3 | 1
'hi' | 'ho' | 3 | 2
'hi' | 'ho' | 3 | 2
'bye' | 'bye' | 1 | 1
Currently, I'm doing that by grouping the typed pipe by each column, like so:
// Count how often each key2 value occurs, then rename the key so it can be joined back.
val key2Freqs = input.groupBy('key2) {
  _.size('key2Freq)
}.rename('key2 -> 'key2Right).project('key2Right, 'key2Freq)
Then joining the original input with key2Freqs like so:
.joinWithSmaller('key2 -> 'key2Right, key2Freqs, joiner = new LeftJoin)
However, this is really slow and seems pretty inefficient for what is essentially a simple task. It takes especially long because I have 6 different keys that I want these values for, and I am currently mapping and joining 6 different times in my job. There must be a better way to do this, right?
If the number of the different values in each column is small enough to fit them all into memory, you could .map your columns into Map[String,Int], and then .groupAll.sum to count them all in one go (I am using the "typed api" notation, don't quite remember how exactly this is done in the fields api, but you get the idea). You'll need to use the MapMonoid from algebird, or just write your own if you don't want to add a dependency for this one thing, it is not hard.
You'd then end up with a pipe, containing a single entry for the resulting Map. Now, you can get your original pipe, and do .crossWithTiny to bring the map with counts into it, and then .map to extract individual counts.
Otherwise, if you can't keep all that in memory, then what you are doing now seems like the only way ... unless you are actually looking for an approximation of "top hitters", rather than exact counts of the entire universe ... in which case, check out algebird's SketchMap.
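To make the in-memory approach concrete, here is the shape of the computation in plain Python. This is not Scalding or Algebird code: summing per-column Counter objects stands in for .map plus .groupAll.sum with a map monoid, and the lookup in attach_counts stands in for .crossWithTiny followed by .map. The function names are made up for illustration.

```python
from collections import Counter

# The sample rows from the question.
rows = [("hi", "hey"), ("hi", "ho"), ("hi", "ho"), ("bye", "bye")]

def count_all(rows):
    """One pass over the data: keep one counter per column and add to each."""
    n_cols = len(rows[0]) if rows else 0
    totals = [Counter() for _ in range(n_cols)]
    for row in rows:
        for i, value in enumerate(row):
            totals[i][value] += 1
    return totals

def attach_counts(rows, totals):
    """Second pass: append each column's frequency to every row."""
    return [row + tuple(totals[i][v] for i, v in enumerate(row)) for row in rows]

totals = count_all(rows)
result = attach_counts(rows, totals)
```

All columns are counted in a single pass, so adding a sixth key costs one more counter rather than one more group-and-join, provided the distinct values fit in memory as noted above.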

How to create a form for a 2D/multi-dimensional table?

I need to create a form in CakePHP to allow users to enter data (numbers) into the cells shown in the table below. The input screen needs to look like that table. The users should be able to select any cell whose value they want to enter or update, type in the value, and click submit to save it. Each metric section (e.g. Metric A, Metric B..) will have its own submit button so the users can edit/update each section of the table.
___________________________ ____________________
|___ ___|___2006_______|__2007____|___2008___|
|_METRIC A__|______________|__________|__________|
| item A1 | 1 | 5 | 7 |
| item A2 | 15 | 18 | 21 |
| item A3 | 3 | 6 | 11 |
| item A4 | 1 | 1 | 3 |
|___________|______________|__________|__________|
|_METRIC B__|______________|__________|__________|
| item B1 | 12 | 18 | 31 |
| item B2 | 1 | 4 | 6 |
| item B3 | 0 | 0 | 2 |
--------------------------------------------------
As you can see each metric section is a two dimensional table. So I would like to capture the input data in a 2D array. Currently I have successfully created the display for the data (which was much easier). I simply created an array of metrics, which is an array of 2D arrays. Then I passed that array of 2D arrays to the view file to display the table.
I am kind of lost on how to get the user input for this table. Anyone had any similar experiences? Any suggestion will be greatly helpful to me.
You might want to start by researching some JavaScript grid solutions with editing capabilities. Ext's Grid comes to mind, but there will most definitely be alternatives for your JavaScript framework of choice (e.g. jQuery). This will handle all of the onclick goodness on the client side, leaving you to implement an AJAX action that the form can submit data to. There it is up to you to determine which models you update with each part of the data.
In case this becomes useful to anyone else, I'm posting my solution here..
I managed to tackle this one by using 'Parallel two dimensional arrays'.
It's a simple trick. You create the 2D array for the display of data, so you have an array for each row in the table. For example, if the columns are 'Year 2006', 'Year 2007' & 'Year 2008' as in my question above, you create a row of data with a value for each year:
$data_for_row_array = array('10', '15', '35');
Like this you create a data array for each row in your table and you will get an array of rows:
$rows_array = array($data_for_row_1_array, $data_for_row_2_array /* , .. etc. */);
This will do for the display array. Now, if you want to capture the user's input for each of those cells, all you need to do is create a similar second 2D array holding the id of each data cell, taken from the database. If an id does not exist yet, just leave it blank. When the user enters data and submits it, loop through the two 2D arrays together and use the id from one array and the data from the other, because in both arrays a matching id and its data value sit at the same position. So if you find an id in the "ids array" and a data value at the same position in the "data array", you use them to update the database table. This is the concept of 'parallel two dimensional arrays'.
And if you find an empty id value in the ids array but some data value at the same position in the data array, that means the user has entered a completely new value, so you save it as a new data cell in the database.
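The matching loop described above might look roughly like this. It is a language-neutral sketch written in Python rather than CakePHP code; plan_saves and the sample ids are hypothetical, and skipping cells the user left blank is an assumption that the answer does not spell out.

```python
def plan_saves(ids, values):
    """Walk two same-shaped 2D lists and decide what to do with each cell.

    ids[r][c] is the database id of the cell (None if it has never been saved);
    values[r][c] is what the user submitted for that cell.
    """
    updates, inserts = [], []
    for r, (id_row, value_row) in enumerate(zip(ids, values)):
        for c, (cell_id, value) in enumerate(zip(id_row, value_row)):
            if value in (None, ""):
                continue                       # nothing entered: skip (assumption)
            if cell_id is None:
                inserts.append((r, c, value))  # new cell: save as a new record
            else:
                updates.append((cell_id, value))  # existing cell: update by id
    return updates, inserts

# Two rows, three year columns; None marks cells with no database record yet.
ids = [[101, 102, None],
       [None, 104, 105]]
values = [["10", "15", "35"],
          ["", "18", "21"]]

updates, inserts = plan_saves(ids, values)
```

Here the cell at row 0, column 2 has a value but no id, so it ends up in inserts; the other filled cells become updates keyed by their ids.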
Hope this gives you some idea. If it isn't clear, just let me know and I will explain it in more detail.