Final MATLAB Project - matlab

I am working on a final MATLAB project. The code is written, but it won't plot the imported data.
The first two parts are as follows:
%part 1 opens the file:
Data = readtable('Proyecto_Final.xlsx');
opts = 'skip Ln1';
%part 2 line plots:
hold on;
figure(1);
x=Data(:, 1);
y=Data(:, 2:11);
plot(x,y,'m*:');
xlabel('Time(s)');
ylabel("Day 1", "Day 2", "Day 3", "Day 4", "Day 5", "Day 6", "Day 7", "Day 8", "Day 9", "Day 10");
hold off;
The data in question comes from an Excel file, and the following errors are reported:
Error using tabular/plot (line 217)
Tables and timetables do not have a plot method. To plot a table or a timetable, use the stackedplot function. As an
alternative, extract table or timetable variables using dot or brace subscripting, and then pass the variables as input
arguments to the plot function.
Error in ProyectoFinal (line 17)
plot(x,y,'m*:');
I did add that under the ylabel call, as stackedplot(x,y);

Data(:, 1) returns a one-column table, not the values stored in that column, so plot cannot use it. Extract the values with dot indexing, Data.Var1 (where Var1 is the name of the column), or with brace indexing, Data{:, 1}.
Below is a minimal working example (MWE) that uses the following test.txt file. Adapt it to your table:
test.txt
1 12 12 47
2 24 19 32
4 45 48 31
5 54 12 27
6 68 95 56
7 82 45 56
8 94 36 56
9 102 12 24
MWE:
Data = readtable('test.txt');
figure(1);
hold on;
x = Data.Var1;
y = [Data.Var2, Data.Var3, Data.Var4];
plot(x,y,'m*:');
xlabel('Time(s)');
ylabel("Day data")
legend("Day 1", "Day 2", "Day 3");
hold off;
Output plot:
Note: I think that with the ylabel command you are actually trying to produce a legend. See the corresponding documentation (ylabel and legend).

I solved the issues with the project. I stayed up all night until 6 am going back and forth on a Discord server called matlab; the project ran well and was successfully submitted. Thanks for all the help. Here is how it was solved:
Thanks again for your help and pointers, may these images help you.

Related

How to find all the rows that share the same value on two columns?

Dataset example:
       sex  favourite_meal  favourite_color  age  weight(kg)
Tom    M    pizza           red              18   90
Jess   F    lasagna         blue             20   43
Mark   M    pizza           red              30   68
David  M    hamburger       purple           25   70
Lucy   F    sushi           green            18   47
How can I compare each row with the others and find which ones share, for example, the same (sex, favourite_meal) pair? The idea is to check, on a large dataset, which rows share the same values on two attributes (columns). In this example it would be Tom and Mark, who share (M, pizza). How can I do the same on a large dataset that can't be checked by eye?
One awk option is to process the source data twice. The first pass counts each distinct combination of columns 2 and 3 in an array; the second pass prints only the rows whose combination was counted more than once:
awk 'NR==FNR {p[$2" "$3]++; next} p[$2" "$3]>1' m.dat m.dat
Tom M pizza red 18 90
Mark M pizza red 30 68
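The same two-pass counting can also be sketched in plain Python. This is only an illustration under stated assumptions: the rows are already parsed into tuples, and the helper name rows_sharing and the 0-based column positions are hypothetical, not part of the question.

```python
from collections import Counter

# Example rows from the question: (name, sex, meal, color, age, weight_kg)
rows = [
    ("Tom", "M", "pizza", "red", 18, 90),
    ("Jess", "F", "lasagna", "blue", 20, 43),
    ("Mark", "M", "pizza", "red", 30, 68),
    ("David", "M", "hamburger", "purple", 25, 70),
    ("Lucy", "F", "sushi", "green", 18, 47),
]


def rows_sharing(rows, columns):
    """Return the rows whose values in `columns` occur in more than one row."""
    def key(row):
        return tuple(row[c] for c in columns)

    # First pass: count how often each combination of values occurs.
    counts = Counter(key(row) for row in rows)
    # Second pass: keep the rows whose combination occurs at least twice.
    return [row for row in rows if counts[key(row)] > 1]


# Columns 1 and 2 (0-based) hold sex and favourite meal.
shared = rows_sharing(rows, columns=(1, 2))
for row in shared:
    print(*row)
```

Because the counting uses a hash table, the work grows linearly with the number of rows, which is what makes it practical on a dataset too large to inspect by eye.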
You can use pandas to do this:
import pandas as pd
# Build the example data as a dict of Series.
d = {'Name': pd.Series(['Tom', 'Jess', 'Mark', 'David', 'Lucy']),
     'sex': pd.Series(['M', 'F', 'M', 'M', 'F']),
     'favorite_meal': pd.Series(['pizza', 'lasagna', 'pizza', 'hamburger', 'sushi']),
     'favorite_color': pd.Series(['red', 'blue', 'red', 'purple', 'green']),
     'age': pd.Series([18, 20, 30, 25, 18]),
     'weight(kg)': pd.Series([90, 43, 68, 70, 47])
     }
df = pd.DataFrame(d)
# Each group holds the rows that share one (favorite_meal, sex) pair.
for key, group in df.groupby(['favorite_meal', 'sex']):
    print("....................")
    print(group.to_string(index=False, header=False))
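In each iteration, the for loop operates on one group of rows that share the same values in both columns. Groups containing a single row are printed too, so the rows of interest are those in groups with more than one row.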

Regular expression for MS SQL

I want to arrange these numbers into the following categories according to their first two digits. I am confused because it repeats itself.
Thank you for your help.
210690, 391910, 392490, 880390, 847321, 940290, 300420, 300410, 901890, 901890, 030269, 080530, 630399
1-5
6-14
5
16-24
25-27
28-38
39-40
41-41
44-46
47-49
50-63
64-67
68-70
71
72-83
84-85
86-89
90-92
93
94-96
97
98-99

Analyze weather data stored in csv

I have some weather data stored in a csv file in the form of: "id, date, temperature, rainfall", with id being the weather station and, obviously, date being the date of measurement. The file contains the data of 3 different stations over a period of 10 years.
What I'd like to do is analyze the data of each station and each year. For example: I'd like to calculate day-to-day differences in temperature [abs((n+1)-n)] for each station and each year.
I thought while-loops could be a possibility, with the loop calculating something as long as the id value is equal to the one in the next row.
But I’ve no idea how to do it.
Best regards
If you still need assistance, I would consider importing the .csv file with "readtable". As long as only the first row is text, MATLAB will create a 'table' variable (this shouldn't be an issue for a .csv file). The individual columns can be accessed via "tablename.header" and assigned to ordinary double variables (e.g. variable_1 = tablename.header). You can then concatenate your dataset as you like.
As for sorting by date and station id, I would advocate using "sortrows". For example, if the station id is the first column, sortrows(data, 1) will sort "data" by the station id, and sortrows(data, [1 2]) will sort "data" by the first column, then by the second column. From there, you can write an if statement to compare the station ids and perform the required calculations. I hope my brief answer is somewhat helpful.
A basic code structure would be:
folder = 'copy and paste folder path here'; % tell MATLAB where to look (avoids shadowing the built-in "path")
data = readtable(fullfile(folder, 'filename.csv'), 'ReadVariableNames', true); % read the csv file into a table
variable1 = data.header1; % general example of extracting a table column as a double array
variable2 = data.header2;
variable3 = data.header3;
double_data = [variable1 variable2 variable3]; % concatenates the three columns (all three must be numeric)
sorted_data = sortrows(double_data, [1 2]); % sorts double_data by column 1, then by column 2
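The sort-then-compare idea can be sketched compactly in Python. This is an illustration rather than a MATLAB solution: the records are made up, the function name daily_abs_diffs is hypothetical, and consecutive rows of a station are assumed to be consecutive days (gaps in the dates are not detected).

```python
from datetime import date
from itertools import groupby

# Made-up records in the form (station id, date, temperature).
records = [
    (1, date(2018, 12, 30), 10.0),
    (1, date(2018, 12, 31), 12.5),
    (1, date(2019, 1, 1), 11.0),
    (1, date(2019, 1, 2), 14.0),
    (2, date(2019, 1, 2), 3.0),
    (2, date(2019, 1, 1), 5.0),
]


def daily_abs_diffs(records):
    """Map (station, year) to the list of |T(n+1) - T(n)| values."""
    # Sort by station, then by date, so each group is contiguous and ordered.
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    result = {}
    for key, group in groupby(ordered, key=lambda r: (r[0], r[1].year)):
        temps = [r[2] for r in group]
        # Pair each day with the next one inside the same station and year.
        result[key] = [abs(b - a) for a, b in zip(temps, temps[1:])]
    return result


diffs = daily_abs_diffs(records)
for (station, year), values in diffs.items():
    print(station, year, values)
```

Grouping on the (station, year) pair replaces the explicit "is the id the same as in the next row" check: a difference is never taken across a station or year boundary.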
It always helps to have actual data to work on and specifics as to what kind of output format is expected. Basically, ins and outs :) With the little info provided, I figured I would generate random data for you in the first section, and then calculate some stats in the second. I include the loop as an example since that's what you asked, but I highly recommend using vectorized calculations whenever available, such as the one done in summary stats.
%% example for weather stations
% generation of random data to correspond to what your csv file looks like
rng(1); % keeps the random seed for testing purposes
nbDates = 1000; % number of days of data
nbStations = 3; % number of weather stations
measureDates = repmat((now()-(nbDates-1):now())',nbStations,1); % nbDates days of data ending today
stationIds = kron((1:nbStations)',ones(nbDates,1)); % assuming 3 weather stations with IDs [1,2,3]
temp = rand(nbStations*nbDates,1)*70+30; % temperatures are in F and vary between 30 and 100 degrees
rain = max(rand(nbStations*nbDates,1)*40-20,0); % rain fall is 0 approximately half the time, and between 0mm and 20mm the rest of the time
csv = table(measureDates, stationIds, temp, rain);
clear measureDates stationIds temp rain;
% augment the original dataset as needed
years = year(csv.measureDates);
data = [csv,array2table(years)];
sorted = sortrows( data, {'stationIds', 'measureDates'}, {'ascend', 'ascend'} );
% example looping through your data
for i = 1 : size( sorted, 1 )
    fprintf( 'Id=%d, year=%d, temp=%g, rain=%g', sorted.stationIds( i ), sorted.years( i ), sorted.temp( i ), sorted.rain( i ) );
    if( i > 1 && sorted.stationIds( i )==sorted.stationIds( i-1 ) && sorted.years( i )==sorted.years( i-1 ) )
        fprintf( ' => absolute difference with day before: %g', abs( sorted.temp( i ) - sorted.temp( i-1 ) ) );
    end
    fprintf( '\n' ); % new line
end
% depending on the statistics you wish to do, other more efficient ways of
% accessing summary stats might be accessible, for example:
grpstats( data ...
, {'stationIds','years'} ... % group by categories
, {'mean','min','max','meanci'} ... % statistics we want
, 'dataVars', {'temp','rain'} ... % variables on which to calculate stats
) % doesn't require data to be sorted or any looping
This prints one line for each row of data (and only calculates the difference in temperature when there is no year or station change). It also produces some summary stats at the end. Here's what I get:
          stationIds    years    GroupCount    mean_temp    min_temp    max_temp    meanci_temp         mean_rain    min_rain    max_rain    meanci_rain
          __________    _____    __________    _________    ________    ________    ________________    _________    ________    ________    ________________
1_2016    1             2016     82            63.13        30.008      99.22       58.543    67.717    6.1181       0           19.729      4.6284    7.6078
1_2017    1             2017     365           65.914       30.028      99.813      63.783    68.045    5.0075       0           19.933      4.3441    5.6708
1_2018    1             2018     365           65.322       30.218      99.773      63.275    67.369    4.7039       0           19.884      4.0615    5.3462
1_2019    1             2019     188           63.642       31.16       99.654      60.835    66.449    5.9186       0           19.864      4.9834    6.8538
2_2016    2             2016     82            65.821       31.078      98.144      61.179    70.463    4.7633       0           19.688      3.4369    6.0898
2_2017    2             2017     365           66.002       30.054      99.896      63.902    68.102    4.5902       0           19.902      3.9267    5.2537
2_2018    2             2018     365           66.524       30.072      99.852      64.359    68.69     4.9649       0           19.812      4.2967    5.6331
2_2019    2             2019     188           66.481       30.249      99.889      63.647    69.315    5.2711       0           19.811      4.3234    6.2189
3_2016    3             2016     82            61.996       32.067      98.802      57.831    66.161    4.5445       0           19.898      3.1523    5.9366
3_2017    3             2017     365           63.914       30.176      99.902      61.932    65.896    4.8879       0           19.934      4.246     5.5298
3_2018    3             2018     365           63.653       30.137      99.991      61.595    65.712    5.3728       0           19.909      4.6943    6.0514
3_2019    3             2019     188           64.201       30.078      99.8        61.319    67.082    5.3926       0           19.874      4.4541    6.3312

An issue with argument "sortv" of function seqIplot()

I'm trying to plot individual sequences with the function seqIplot() in TraMineR. These individual sequences represent work trajectories, reported by a school's former graduates via a web questionnaire.
Using argument "sortv", I'd like to sort my sequences according to the order of the levels of one covariate, the year of graduation, named "PROMO".
"PROMO" is a factor variable contained in a data frame named "covariates.seq", gathering covariates together:
str(covariates.seq)
'data.frame': 733 obs. of 6 variables:
$ ID_SQ           : Factor w/ 733 levels "1","2","3","5",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SEXE            : Factor w/ 2 levels "Féminin","Masculin": 1 1 1 1 2 1 1 2 2 1 ...
$ PROMO           : Factor w/ 6 levels "1997","1998",..: 1 2 2 4 4 3 2 2 2 2 ...
$ DEPARTEMENT     : Factor w/ 10 levels "BC","GCU","GE",..: 1 4 7 8 7 9 9 7 7 4 ...
$ NIVEAU_ADMISSION: Factor w/ 2 levels "En Premier Cycle",..: NA 1 1 1 1 1 NA 1 1 1 ...
$ FILIERE_SECTION : Factor w/ 4 levels "Cursus Classique",..: NA 4 2 NA 1 1 NA NA 4 3 ...
I'm also using "SEXE", the graduates' gender, as a grouping variable. To plot the individual sequences this way, I use the following command:
seqIplot(sequences, group = covariates.seq$SEXE,
sortv = covariates.seq$PROMO,
cex.axis = 0.7, cex.legend = 0.7)
I expected that, by using a process time axis (with the year of graduation as sequence-dependent origin), sorting the sequences according to the order of the levels of "PROMO" would give a plot with groups of sequences from the longest (for the older graduates) to the shortest (for the younger graduates).
But I've got an issue: in the output plot, the sequences don't appear to be correctly sorted according to the levels of "PROMO". Indeed, by using "sortv = covariates.seq$PROMO" as in the command above, the plot doesn't show groups of sequences from the longest to the shortest, as expected. It looks like the plot obtained without using the argument "sortv" (see Figures below).
Without using argument "sortv"
Using "sortv = covariates.seq$PROMO"
Note that I have 733 individual sequences in my object "sequences", created as follows:
labs <- c("En poste", "Au chômage (d'au moins 6 mois)",
          "Autre situation (d'au moins 6 mois)",
          "En poursuite d'études (thèse ou hors thèse)",
          "En reprise d'études / formation (d'au moins 6 mois)")
codes <- c("En poste", "Au chômage", "Autre situation",
           "En poursuite d'études", "En reprise d'études / formation")
sequences <- seqdef(situations, alphabet = labs, states = codes,
                    left = NA, right = "DEL", missing = NA,
                    cnames = as.character(seq(0, 7400/365, 1/365)),
                    xtstep = 365)
The values of the covariates are sorted in the same order as the individual sequences. The covariate "PROMO" doesn't contain any missing value.
Something's going wrong, but what?
Thank you in advance for your help,
Best,
Arnaud.
Using a factor as sortv argument in seqIplot works fine as illustrated by the example below:
sdc <- c("aabbccdd","bbbccc","aaaddd","abcabcab")
sd <- seqdecomp(sdc, sep="")
seq <- seqdef(sd)
fac <- factor(c("2000","2001","2001","2000"))
par(mfrow=c(1,3))
seqIplot(seq, with.legend=FALSE)
seqIplot(seq, sortv=fac, with.legend=FALSE)
seqlegend(seq)

Matlab Code for Reading Text file with inconsistent rows

I am new to Matlab and have been working my way through using Google. But now I have hit the wall it seems.
I have a text file which looks like the following:
Information is for illustration reasons only
Aggregated Results
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
04-Oct-2008; -2.74; -917; 335; 317
Total Period; -0.80; -8612; 10748; 10276
Aggregated Results for location State PA
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
02-Oct-2008; -1.79; -595; 333; 324
03-Oct-2008; -2.29; -765; 334; 321
Total Period; -0.80; -8612; 10748; 10276
Results for account A1
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -7.59; -372; 49; 51
Total Period; -0.84; -1262; 1502; 1431
Results for account A2
Date;$/MWh;Total $;Exp. MWh;Act. MWh
01-Oct-2008; -8.00; -392; 49; 51
02-Oct-2008; 0.96; 47; 49; 51
03-Oct-2008; -0.75; -37; 50; 48
04-Oct-2008; 1.28; 53; 41; 40
Total Period; -0.36; -534; 1502; 1431
I want to extract the following information in a cell/matrix format so that I can use it later to selectively do operations like the average of accounts A1 and A2, or the average of PA and A1, etc.
PA -0.80
A1 -0.84
A2 -0.36
I'd go this way:
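fid = fopen(filename,'r');
A = textscan(fid,'%s','delimiter','\r'); % read the file line by line
fclose(fid);
A = A{:};
str_i = 'Total Period';
ix = find(strncmp(A,str_i,length(str_i))); % indices of the lines starting with 'Total Period'
% convert the text after 'Total Period;' on each matching line to numbers
res = arrayfun(@(i) str2num(A{ix(i)}(length(str_i)+2:end)),1:numel(ix),'UniformOutput',false);
res = cat(2,res{:}); % one column of values per matching line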
This way you'll get all the numerical values after a string 'Total Period' in a matrix, so that you may pick the values you need.
Similarly you may operate with strings PA, A1 and A2.
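For comparison, here is a sketch in Python (not MATLAB) that pairs each section label with its 'Total Period' value. It rests on assumptions that go beyond the answer above: a header line is any non-empty line without a semicolon, its last word is used as the label, and the function name total_period_by_section is hypothetical. The sample text is a shortened copy of the file in the question.

```python
sample = """\
Information is for illustration reasons only
Aggregated Results
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
Total Period; -0.80; -8612; 10748; 10276
Aggregated Results for location State PA
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -5.20; -1717; 330; 323
Total Period; -0.80; -8612; 10748; 10276
Results for account A1
Date;$/Val1;Total $;Exp. Val1;Act. Val1
01-Oct-2008; -7.59; -372; 49; 51
Total Period; -0.84; -1262; 1502; 1431
Results for account A2
Date;$/MWh;Total $;Exp. MWh;Act. MWh
01-Oct-2008; -8.00; -392; 49; 51
Total Period; -0.36; -534; 1502; 1431
"""


def total_period_by_section(text):
    """Map each section label to the first number on its 'Total Period' line."""
    results = {}
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Total Period"):
            # Fields are separated by ';'; field 1 is the $/unit value.
            results[label] = float(line.split(";")[1])
        elif line and ";" not in line:
            # Assumed header line: remember its last word as the label.
            label = line.split()[-1]
    return results


totals = total_period_by_section(sample)
print(totals)
print((totals["A1"] + totals["A2"]) / 2)  # average of accounts A1 and A2
```

Keeping the result as a label-to-value mapping makes the selective operations in the question (averaging A1 and A2, or PA and A1) a matter of looking up two keys.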
Matlab is not that nice when it comes to dealing with messy data. You may want to preprocess it a bit first.
However, here is an easy general way to import mixed numeric and non-numeric data in Matlab for a limited number of normal sized files.
Step 1: Copy the contents of the file into Excel and save it as xls or xlsx
Step 2: Use xlsread
[NUM,TXT,RAW]=xlsread('test.xlsx')
From there the parsing should be manageable.
Hopefully they will add non-numeric support to csvread or dlmread in the future.