Problems with fprintf format (MATLAB)

I want to fix the format of the variables in a text file (shown at the end), replacing spaces with tabs, using the following MATLAB code (after importing the data):
id = fopen('datoscorfecha.txt', 'w');
fprintf(id, '%5s %3s %3s %3s %4s %3s %6s\n',...
'fecha', 'dia','mes', 'ano', 'hora', 'min', 'abs370');
datos = cat(2,dia, mes, ano, hora, min1, abs370);
datos = datos';
fecha = Fecha'; % Imported as a string
fprintf(id, '%16s %2i %2i %4i %2i %2i %8.4f\n',...
fecha, datos);
fclose(id);
type datoscorfecha.txt
But I get this error:
Error using fprintf
Unable to convert 'string' value to 'int64'.
The file contents:
Fecha dia mes ano hora min abs370
03/06/2016 00:00 3 6 2016 0 0 29.356218
03/06/2016 00:05 3 6 2016 0 5 30.45703
03/06/2016 00:10 3 6 2016 0 10 27.53877
03/06/2016 00:15 3 6 2016 0 15 23.19832
03/06/2016 00:20 3 6 2016 0 20 22.333924
03/06/2016 00:25 3 6 2016 0 25 22.086426
03/06/2016 00:30 3 6 2016 0 30 20.933898

Maybe something like this will let you replace the spaces with tabs. Here I read the text file with the textscan() function, which separates the columns, and parse each value as a string. Then I use the writematrix() function to write the data to a new text file with the delimiter set to tab.
Text.txt (Input)
Fecha dia mes ano hora min abs370
03/06/2016 00:00 3 6 2016 0 0 29.356218
03/06/2016 00:05 3 6 2016 0 5 30.45703
03/06/2016 00:10 3 6 2016 0 10 27.53877
03/06/2016 00:15 3 6 2016 0 15 23.19832
03/06/2016 00:20 3 6 2016 0 20 22.333924
03/06/2016 00:25 3 6 2016 0 25 22.086426
03/06/2016 00:30 3 6 2016 0 30 20.933898
datoscorfecha.txt (Output)
Fecha dia mes ano hora min abs370
03/06/2016 00:00 3 6 2016 0 0 29.3562
03/06/2016 00:05 3 6 2016 0 5 30.4570
03/06/2016 00:10 3 6 2016 0 10 27.5388
03/06/2016 00:15 3 6 2016 0 15 23.1983
03/06/2016 00:20 3 6 2016 0 20 22.3339
03/06/2016 00:25 3 6 2016 0 25 22.0864
03/06/2016 00:30 3 6 2016 0 30 20.9339
Full Script:
File_ID = fopen("Text.txt");
Data = textscan(File_ID, '%s %s %s %s %s %s %s %s', 'Delimiter',' ');
fclose(File_ID);
% Data = readtable("Text.txt");
Column_1 = string(Data{:,1});
Column_2 = string(Data{:,2});
Column_3 = string(Data{:,3});
Column_4 = string(Data{:,4});
Column_5 = string(Data{:,5});
Column_6 = string(Data{:,6});
Column_7 = string(Data{:,7});
Column_8 = string(Data{:,8});
for Index = 2:length(Column_8)
    % Format abs370 with exactly 4 decimals (e.g. 30.45703 -> "30.4570"),
    % which also works for values that have no decimal point
    Column_8(Index,1) = sprintf('%.4f', str2double(Column_8(Index,1)));
end
Table = [Column_1 Column_2 Column_3 Column_4 Column_5 Column_6 Column_7 Column_8];
writematrix(Table,"datoscorfecha.txt",'Delimiter','tab');
type datoscorfecha.txt
Ran using MATLAB R2019b
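For comparison, here is a minimal Python sketch of the same transformation (not MATLAB, and not the answer's method). It assumes the input layout shown above: a header line, then rows of date, time, and six numeric fields. Unlike the MATLAB answer, it keeps date and time together as one text field, as the question's '%16s' format intended. Every field is turned into text before writing, so nothing ever tries to format a string with a numeric conversion such as %i. The function name to_tab_separated is hypothetical.

```python
def to_tab_separated(text: str) -> str:
    """Rewrite whitespace-separated rows as tab-separated rows.

    Assumes a header line followed by rows of the form:
    date time dia mes ano hora min abs370
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    out = ["\t".join(lines[0].split())]  # header, tab-separated
    for ln in lines[1:]:
        date, time, dia, mes, ano, hora, minute, abs370 = ln.split()
        # Keep date and time together as the single "Fecha" text field.
        fecha = f"{date} {time}"
        # Integer columns are copied as text; abs370 is rounded to 4 decimals.
        fields = [fecha, dia, mes, ano, hora, minute, f"{float(abs370):.4f}"]
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


sample = """Fecha dia mes ano hora min abs370
03/06/2016 00:00 3 6 2016 0 0 29.356218
03/06/2016 00:05 3 6 2016 0 5 30.45703
"""
print(to_tab_separated(sample), end="")
```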

Related

Using a growth formula for grouped observations

I have a dataset which is shown below:
clear
input year price growth id
2008 5 -0.444 1
2009 . . 1
2010 7 -0.222 1
2011 9 0 1
2011 8 -0.111 1
2012 9 0 1
2013 11 0.22 1
2012 10 0 2
2013 12 0.2 2
2013 . . 2
2014 13 0.3 2
2015 17 0.7 2
2015 16 0.6 2
end
I want to generate a variable growth, which is the growth of price relative to a base year. The growth formula is:
growth = (price in a given year - price in the base year) / price in the base year
The base year is always 2012.
How can I generate this growth variable for each group of observations (by id)?
The base price can be picked out directly with egen:
bysort id: egen price_b = total(price * (year == 2012))
generate wanted = (price - price_b) / price_b
Note that using total() assumes that each id has only one observation with year == 2012.
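The egen line works by broadcasting each id's base-year price to every row of that id, after which growth is a plain row-wise formula. Below is a minimal Python sketch of the same idea on plain lists (an illustration, not Stata). Missing prices are modeled as None, and growth_vs_base is a hypothetical name.

```python
from collections import defaultdict


def growth_vs_base(rows, base_year=2012):
    """rows: list of (year, price, id); price may be None (Stata's ".")."""
    # Like egen total(price * (year == base_year)) by id: sum the base-year
    # price within each id (assumes at most one base-year row per id).
    base = defaultdict(float)
    for year, price, gid in rows:
        if year == base_year and price is not None:
            base[gid] += price
    wanted = []
    for year, price, gid in rows:
        b = base[gid]
        if price is None or b == 0:
            wanted.append(None)  # missing price or no base price -> missing
        else:
            wanted.append((price - b) / b)
    return wanted


data = [
    (2008, 5, 1), (2009, None, 1), (2010, 7, 1), (2011, 9, 1), (2011, 8, 1),
    (2012, 9, 1), (2013, 11, 1), (2012, 10, 2), (2013, 12, 2), (2013, None, 2),
    (2014, 13, 2), (2015, 17, 2), (2015, 16, 2),
]
print([None if w is None else round(w, 4) for w in growth_vs_base(data)])
```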
The following works for me:
bysort id: generate obs = _n
generate double wanted = .
levelsof id, local(ids)
foreach x of local ids {
summarize obs if id == `x' & year == 2012, meanonly
bysort id: replace wanted = (price - price[`=obs[r(min)]']) / ///
price[`=obs[r(min)]'] if id == `x'
}
If the id values are consecutive, then the following will be faster:
forvalues i = 1 / 2 {
summarize obs if id == `i' & year == 2012, meanonly
bysort id: replace wanted = (price - price[`=obs[r(min)]']) / ///
price[`=obs[r(min)]'] if id == `i'
}
Results:
list, sepby(id)
+-----------------------------------------------+
| year price growth id obs wanted |
|-----------------------------------------------|
1. | 2008 5 -.444 1 1 -.44444444 |
2. | 2009 . . 1 2 . |
3. | 2010 7 -.222 1 3 -.22222222 |
4. | 2011 9 0 1 4 0 |
5. | 2011 8 -.111 1 5 -.11111111 |
6. | 2012 9 0 1 6 0 |
7. | 2013 11 .22 1 7 .22222222 |
|-----------------------------------------------|
8. | 2012 10 0 2 1 0 |
9. | 2013 12 .2 2 2 .2 |
10. | 2013 . . 2 3 . |
11. | 2014 13 .3 2 4 .3 |
12. | 2015 17 .7 2 5 .7 |
13. | 2015 16 .6 2 6 .6 |
+-----------------------------------------------+

Convert hourly data to daily data in Matlab

We have two matrices, one named "Date" and the other named "Data".
The Date matrix contains the following columns:
year month day julusi hour
1951 1 1 1 0
1951 1 1 1 3
1951 1 1 1 6
1951 1 1 1 9
1951 1 1 1 12
1951 1 1 1 15
1951 1 1 1 18
1951 1 1 1 21
1951 1 2 2 0
1951 1 2 2 3
1951 1 2 2 6
1951 1 2 2 9
1951 1 2 2 12
1951 1 2 2 15
1951 1 2 2 18
1951 1 2 2 21
.... . . . .
.... . . . .
1951 12 30 364 0
1951 12 30 364 3
1951 12 30 364 6
1951 12 30 364 9
1951 12 30 364 12
1951 12 30 364 15
1951 12 30 364 18
1951 12 30 364 21
1951 12 31 365 0
1951 12 31 365 3
1951 12 31 365 6
1951 12 31 365 9
1951 12 31 365 12
1951 12 31 365 15
1951 12 31 365 18
1951 12 31 365 21
.... .. . .. .
2018 12 31 365 0
2018 12 31 365 3
2018 12 31 365 6
2018 12 31 365 9
2018 12 31 365 12
2018 12 31 365 15
2018 12 31 365 18
2018 12 31 365 21
My Data matrix has 410 columns (198696*410), and the Date matrix has the same number of rows (198696). I want to convert the Data matrix to daily values based on the Date matrix.
I use the following code:
N=0;
for year=1951:2018;
for Juliusi=1:365;
cxa=(Date(:,4)==Juliusi);
cxb=(Date(:,1)==year);
a=cxa & cxb;
N=N+1;
dayy(N,:)=nanmean(Data(a,:));
end;end;
The daily values are correct, but the resulting matrix has the wrong size:
198696/8 = 24837 rows would be correct, but my matrix has only 24820.
Where is the problem?
What should I do to account for leap days?
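The size mismatch is exactly the number of leap days. The loop runs Juliusi from 1 to 365 only, so day 366 of each leap year is never visited. A short Python check of the arithmetic, assuming the series covers every day from 1951-01-01 to 2018-12-31 with 8 samples per day:

```python
import calendar

FIRST, LAST = 1951, 2018
years = range(FIRST, LAST + 1)

# Days actually present in the data (366 in leap years).
total_days = sum(366 if calendar.isleap(y) else 365 for y in years)
# Iterations of the question's loop: 365 Julian days for every year.
loop_days = 365 * len(years)
leap_years = sum(calendar.isleap(y) for y in years)
print(total_days, loop_days, leap_years)
```

Letting Juliusi run to 366 only in leap years, or grouping on the unique (year, Julian day) pairs actually present in Date, should recover the 17 missing rows.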
Since I recently learned from Luis Mendo that convolution is the key to success, I came up with the following idea: if your data is complete, i.e. you can guarantee that there are always 8 entries for each day, you can simply use the following approach:
% Some test data.
Date = [
1951 1 1 1 0;
1951 1 1 1 3;
1951 1 1 1 6;
1951 1 1 1 9;
1951 1 1 1 12;
1951 1 1 1 15;
1951 1 1 1 18;
1951 1 1 1 21;
1952 1 2 2 0;
1952 1 2 2 3;
1952 1 2 2 6;
1952 1 2 2 9;
1952 1 2 2 12;
1952 1 2 2 15;
1952 1 2 2 18;
1952 1 2 2 21]
% Temporary result for convolution.
temp = conv2(Date, ones(8, 1)) / 8;
% Extract values of interest.
dayy = temp(8:8:end, :)
Output:
Date =
1951 1 1 1 0
1951 1 1 1 3
1951 1 1 1 6
1951 1 1 1 9
1951 1 1 1 12
1951 1 1 1 15
1951 1 1 1 18
1951 1 1 1 21
1952 1 2 2 0
1952 1 2 2 3
1952 1 2 2 6
1952 1 2 2 9
1952 1 2 2 12
1952 1 2 2 15
1952 1 2 2 18
1952 1 2 2 21
dayy =
1951.0000 1.0000 1.0000 1.0000 10.5000
1952.0000 1.0000 2.0000 2.0000 10.5000
If you need the year and day information, it can be obtained separately, but your original post did not seem to need it.
Just to be clear: I know I used the Date matrix in my example. Since Date has the same row layout as Data, and the results of the mean operation are easy to verify on it, I used it as test input.
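Taking every 8th row of a running 8-row sum and dividing by 8 is the same as averaging consecutive, non-overlapping blocks of 8 rows. A minimal Python sketch of that block mean on plain lists (the name block_means is hypothetical), under the same assumption that every day has exactly 8 samples:

```python
def block_means(rows, block=8):
    """Average each run of `block` consecutive rows, column by column.

    Assumes len(rows) is a multiple of `block`, i.e. every day has exactly
    `block` samples (the same completeness assumption as the conv2 approach).
    """
    if len(rows) % block:
        raise ValueError("row count is not a multiple of the block size")
    means = []
    for start in range(0, len(rows), block):
        chunk = rows[start:start + block]
        means.append([sum(col) / block for col in zip(*chunk)])
    return means


date = ([[1951, 1, 1, 1, h] for h in range(0, 24, 3)]
        + [[1952, 1, 2, 2, h] for h in range(0, 24, 3)])
print(block_means(date))
```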

Perl function localtime giving incorrect values for years between 1964 and 1967

I was getting some wacky values from the localtime function in Perl. Here is some code for which I get incorrect values.
In particular, this code is meant to determine the weekday for the first of each year.
#!/usr/bin/perl
use strict 'vars';
use Time::Local;
use POSIX qw(strftime);
mytable();
sub mytable {
print "Year" . " "x4 . "Jan 1st (localtime)" . " "x4 . "Jan 1st (Gauss)\n";
foreach my $year ( 1964 .. 2017 )
{
my $janlocaltime = evalweekday( 1,1,$year);
my $jangauss = gauss($year);
my $diff = $jangauss - $janlocaltime;
printf "%4s%10s%-12s ",$year,"",$janlocaltime;
printf "%12s",$jangauss;
printf " <----- ERROR: off by %2s", $diff if ( $diff != 0 );
print "\n";
}
}
sub evalweekday {
## Using "localtime"
my ($day,$month,$year) = @_;
my $epoch = timelocal(0,0,0, $day,$month-1,$year-1900);
my $weekday = ( localtime($epoch) ) [6];
return $weekday;
}
sub gauss {
## Alternative approach
my ($year) = @_;
my $weekday =
( 1 + 5 * ( ( $year - 1 ) % 4 )
+ 4 * ( ( $year - 1 ) % 100 )
+ 6 * ( ( $year - 1 ) % 400 )
) % 7;
return $weekday;
}
Here is the output which shows the years with incorrect values:
Year Jan 1st (localtime) Jan 1st (Gauss)
1964 2 3 <----- ERROR: off by 1
1965 4 5 <----- ERROR: off by 1
1966 5 6 <----- ERROR: off by 1
1967 6 0 <----- ERROR: off by -6
1968 1 1
1969 3 3
1970 4 4
1971 5 5
1972 6 6
1973 1 1
1974 2 2
1975 3 3
1976 4 4
1977 6 6
1978 0 0
1979 1 1
1980 2 2
1981 4 4
1982 5 5
1983 6 6
1984 0 0
1985 2 2
1986 3 3
1987 4 4
1988 5 5
1989 0 0
1990 1 1
1991 2 2
1992 3 3
1993 5 5
1994 6 6
1995 0 0
1996 1 1
1997 3 3
1998 4 4
1999 5 5
2000 6 6
2001 1 1
2002 2 2
2003 3 3
2004 4 4
2005 6 6
2006 0 0
2007 1 1
2008 2 2
2009 4 4
2010 5 5
2011 6 6
2012 0 0
2013 2 2
2014 3 3
2015 4 4
2016 5 5
2017 0 0
In fact, the errors seem to extend as far back as 1900, but I haven't verified that the values before 1964 are actually wrong.
perl --version returns the following:
This is perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level
(with 2 registered patches, see perl -V for more detail)
Copyright 1987-2013, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
I'm not sure whether it's relevant, but my operating system is macOS Sierra Version 10.12.3.
I've read through the documentation, but I don't see anything (or I'm being blind) about values returned before 1968. I've also tried a web search but can't find anything beyond the typical misunderstandings about array values and the numbering of months and days of the year.
Could someone help me out and explain what I'm getting wrong? Or, if this is an issue with my version of Perl, let me know what I can do to fix it.
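This is likely related to how Time::Local handles negative epoch values; have a look at the "Negative Epoch Values" section of perldoc Time::Local.
On my Linux box (perl 5.20), your code demonstrates the issue nicely. If you print out the epoch value received, you will see the problem: the epoch returned by timelocal becomes huge instead of more negative. Those huge values are dates in 2064-2067, because Time::Local interprets a year argument in the range 0..99 (here $year-1900 = 64..67) as a two-digit year within 50 years of the current year. Passing the full four-digit year ($year instead of $year-1900) avoids this, since Time::Local treats years of 1000 and above as actual years.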
Year Epoch Jan 1st (localtime) Jan 1st (Gauss)
1964 2966342400 2 3 <----- ERROR: off by 1
1965 2997964800 4 5 <----- ERROR: off by 1
1966 3029500800 5 6 <----- ERROR: off by 1
1967 3061036800 6 0 <----- ERROR: off by -6
1968 -63185400 1 1
1969 -31563000 3 3
1970 -27000 4 4
1971 31509000 5 5
1972 63045000 6 6
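The rolling-century rule behind those epochs can be modeled in a few lines. This is a Python illustration of the documented behaviour, not the module's code. The function name time_local_year is hypothetical, and the exact boundary handling is an assumption modeled on Time::Local. It is evaluated as if the script ran in 2017:

```python
def time_local_year(year_arg: int, current_year: int) -> int:
    """Illustrative model of Time::Local's year interpretation.

    - year_arg >= 1000: taken as the actual year.
    - 0 <= year_arg <= 99: two-digit year in a rolling century
      within 50 years of current_year.
    - otherwise: an offset from 1900.
    """
    if year_arg >= 1000:
        return year_arg
    if 0 <= year_arg <= 99:
        this = current_year - 1900            # years since 1900
        cutoff = (this + 50) % 100            # last two-digit value mapped forward
        next_century = this - this % 100
        if cutoff < 50:
            next_century += 100
        century = next_century - 100
        offset = century if year_arg > cutoff else next_century
        return 1900 + year_arg + offset
    return 1900 + year_arg


# Arguments the question passes for 1964..1968 ($year - 1900), run in 2017:
print([time_local_year(y - 1900, 2017) for y in range(1964, 1969)])
```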
Alternatively, why not use the DateTime module instead:
use DateTime;
my $dt = DateTime->new(
year => 1966, # Real Year
day => 1, # 1-31
month => 1, # 1-12
hour => 0, # 0-23
second => 0, # 0-59
);
print $dt->dow . "\n";
6
6 = Saturday (DateTime's dow numbers Monday as 1 through Sunday as 7), which matches Wikipedia: January 1, 1966 was a Saturday.
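The question's Gauss column can also be cross-checked independently of Perl. This short Python sketch is an independent check, not part of either answer. It compares the question's formula with the standard library's proleptic Gregorian calendar, using the same 0 = Sunday numbering as localtime()[6]:

```python
import datetime


def gauss_jan1_weekday(year: int) -> int:
    """Gauss's formula for the weekday of January 1 (0 = Sunday .. 6 = Saturday)."""
    y = year - 1
    return (1 + 5 * (y % 4) + 4 * (y % 100) + 6 * (y % 400)) % 7


def library_jan1_weekday(year: int) -> int:
    # isoweekday(): Monday = 1 .. Sunday = 7; "% 7" maps Sunday to 0.
    return datetime.date(year, 1, 1).isoweekday() % 7


mismatches = [y for y in range(1900, 2101)
              if gauss_jan1_weekday(y) != library_jan1_weekday(y)]
print(mismatches)  # expected: []
```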

Aggregating second-by-second sampling interval to 30 sec interval, POSIXct

I'm new to R/RStudio and would appreciate some help.
Goal: I'd like to take data collected at 1-second intervals, collapse it into 30-second intervals, and compute the mean of each variable for each interval.
Here is what my data looks like:
line datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
What I'd like to see is this:
line datetime AA BB CC
1 2016-06-27 14:13:16 18 1 50.5
2 2016-06-27 14:13:46 19 1 52.8
(here, variables AA, BB, and CC were averaged).
There have been similar questions, but none close enough to give me a foundation to work from, given my limited coding experience. I've gone back and forth between possible base R solutions and possible package solutions without success, mainly because the syntax is still a bit foreign to me.
I think you want something like this (base R solution):
etw
datetime AA BB CC
1 2016-06-27 14:13:16 6 0 0.0
2 2016-06-27 14:13:17 10 0 48.6
3 2016-06-27 14:13:18 7 0 52.0
4 2016-06-27 14:13:19 13 0 54.4
5 2016-06-27 14:13:20 16 0 60.8
6 2016-06-27 14:13:21 6 0 65.5
7 2016-06-27 14:13:22 6 0 47.5
8 2016-06-27 14:13:23 6 1 46.8
9 2016-06-27 14:13:24 4 1 55.5
10 2016-06-27 14:13:25 4 1 51.1
11 2016-06-27 14:13:26 4 1 53.4
aggregate(x = etw, by = list(cut(etw$datetime,breaks = "10 sec")), FUN=mean )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:16 2016-06-27 14:13:20 7.8 0.3 48.22
2 2016-06-27 14:13:26 2016-06-27 14:13:26 4.0 1.0 53.40
You can change "10 sec" to "30 sec". However, take care: breaks = "10 sec" cuts the range into 10-second slices starting at the minimum time, so with "30 sec" your sample data would fall into a single slice.
You can also define the breaks manually:
breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"), to = as.POSIXct("2016-06-27 14:14:00"), by = "10 sec")
aggregate(x = etw,FUN=mean, by = list(cut(etw$datetime,breaks = seq.POSIXt(from = as.POSIXct("2016-06-27 14:13:00"),to = as.POSIXct("2016-06-27 14:14:00"),by="10 sec"))) )
Group.1 datetime AA BB CC
1 2016-06-27 14:13:10 2016-06-27 14:13:17 9.000000 0.0000000 38.75000
2 2016-06-27 14:13:20 2016-06-27 14:13:23 6.571429 0.5714286 54.37143
This is not exactly what you asked for, but IMHO your sample data does not correspond to the desired output. :)
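What breaks does can be seen independently of R. Each timestamp goes into bin floor((t - origin) / width), and the origin decides where the bins start. Below is a minimal Python sketch (not R; bin_means and its parameters are hypothetical). It uses the earliest timestamp as the default origin, like breaks = "10 sec", or accepts an explicit origin, like the seq.POSIXt breaks. Unlike cut with explicit breaks, it does not drop values outside a given range.

```python
from datetime import datetime, timedelta


def bin_means(rows, width_s=30, origin=None):
    """Average numeric columns over fixed-width time bins.

    rows: list of (datetime, value, value, ...) tuples.
    origin: start of the first bin (default: earliest timestamp).
    Returns (bin_start, [means...]) pairs in time order.
    """
    if origin is None:
        origin = min(r[0] for r in rows)
    groups = {}
    for t, *vals in rows:
        k = int((t - origin).total_seconds() // width_s)  # bin index
        groups.setdefault(k, []).append(vals)
    out = []
    for k in sorted(groups):
        vals = groups[k]
        means = [sum(col) / len(vals) for col in zip(*vals)]
        out.append((origin + timedelta(seconds=k * width_s), means))
    return out


sample = [
    (datetime(2016, 6, 27, 14, 13, 16 + i), aa, bb, cc)
    for i, (aa, bb, cc) in enumerate([
        (6, 0, 0.0), (10, 0, 48.6), (7, 0, 52.0), (13, 0, 54.4),
        (16, 0, 60.8), (6, 0, 65.5), (6, 0, 47.5), (6, 1, 46.8),
        (4, 1, 55.5), (4, 1, 51.1), (4, 1, 53.4),
    ])
]
for start, means in bin_means(sample, width_s=10):
    print(start, [round(m, 2) for m in means])
```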

Matlab: Join datasets by not exact but similar values

I have two example datasets, A and B below, that I want to join in MATLAB to create C. The keys are 'product' and 'year', but the problem is that the product number in dataset B matches only the first 4 digits of the one in A. Is there a way to join on 'almost' matching numbers like this?
A
product tariff year
202341 2 1999
202341 4 2000
202341 20 2008
202355 9 1999
202355 16 2000
438811 0 1999
438891 8 1999
438891 3 2001
671212 15 2005
671260 10 2005
and
B
product avg_tariff year
2023 5,5 1999
2023 10 2000
2023 20 2008
4388 4 1999
4388 3 2001
6712 12,5 2005
should be joined to produce matrix C:
C
product tariff year avg_tariff
202341 2 1999 5,5
202341 4 2000 10
202341 20 2008 20
202355 9 1999 5,5
202355 16 2000 10
438811 0 1999 4
438891 8 1999 4
438891 3 2001 3
671212 15 2005 12,5
671260 10 2005 12,5
Thanks in advance
Oscar
Since this question is related to a previous one of yours that I answered, I will reuse that code and update it for the new data:
a.csv
product tariff year
202341 2 1999
202341 4 2000
202341 20 2008
202355 9 1999
202355 16 2000
438811 0 1999
438891 8 1999
438891 3 2001
671212 15 2005
671260 10 2005
b.csv
product avg_tariff year
2023 5.5 1999
2023 10 2000
2023 20 2008
4388 4 1999
4388 3 2001
6712 12.5 2005
MATLAB code (using the dataset class from the Statistics Toolbox):
%# read A, and build dataset
fid = fopen('a.csv','rt');
C = textscan(fid, '%s%f%f', 'Delimiter',' ', 'MultipleDelimsAsOne',true, 'HeaderLines',1);
fclose(fid);
dA = dataset({C{1} 'product'}, {C{2} 'tariff'}, {C{3} 'year'});
%# read B, and build dataset
fid = fopen('b.csv','rt');
C = textscan(fid, '%s%f%f', 'Delimiter',' ', 'MultipleDelimsAsOne',true, 'HeaderLines',1);
fclose(fid);
dB = dataset({C{1} 'product'}, {C{2} 'avg_tariff'}, {C{3} 'year'});
%# truncate productA
dA.productLong = dA.product;
dA.product = cellfun(@(s)s(:,1:end-2), cellstr(dA.product), 'UniformOutput',false);
%# inner join (keep only rows that exist in both datasets)
ds = join(dA, dB, 'keys',{'product' 'year'}, 'type','inner', 'MergeKeys',true);
%# restore the long product number as first column, and sort by it
ds.product = ds.productLong;
ds.productLong = [];
ds = sortrows(ds, 'product')
The result as expected:
ds =
product tariff year avg_tariff
'202341' 2 1999 5.5
'202341' 4 2000 10
'202341' 20 2008 20
'202355' 9 1999 5.5
'202355' 16 2000 10
'438811' 0 1999 4
'438891' 8 1999 4
'438891' 3 2001 3
'671212' 15 2005 12.5
'671260' 10 2005 12.5
Load the data with textscan, treating every column (including product) as strings:
fidA = fopen('A.txt');
fidB = fopen('B.txt');
A = textscan(fidA,'%s%s%s','delimiter',' ');
B = textscan(fidB,'%s%s%s','delimiter',' ');
fclose(fidA);
fclose(fidB);
Keep only the first 4 characters of product in A, and build a product+year key for each row:
for i = 1:length(A{1})
    rowKeyA{i} = [A{1}{i}(1:4), A{3}{i}]; % product(1:4), year
end
for i = 1:length(B{1})
    rowKeyB{i} = [B{1}{i}, B{3}{i}]; % product, year
end
Now just find matches between rowKeyA and rowKeyB:
for i = 1:length(rowKeyA)
    j = find(strcmp(rowKeyB, rowKeyA{i}), 1);
    if ~isempty(j)
        % Print product, tariff, year (from A) and avg_tariff (from B)
        fprintf('%s %s %s %s\n', A{1}{i}, A{2}{i}, A{3}{i}, B{2}{j});
    end
end
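Both answers reduce the problem to an exact join on a shortened key: the first 4 digits of product, plus year. Below is a minimal Python sketch of that idea using a dictionary lookup (an illustration, not MATLAB; join_on_prefix is a hypothetical name). It assumes product codes are kept as strings and that B has at most one row per (prefix, year).

```python
def join_on_prefix(a_rows, b_rows, prefix_len=4):
    """Inner join A and B on (first prefix_len digits of product, year).

    a_rows: (product, tariff, year) with product as a string, e.g. "202341".
    b_rows: (product_prefix, avg_tariff, year), e.g. ("2023", 5.5, 1999).
    Returns (product, tariff, year, avg_tariff) rows in A's order.
    """
    lookup = {(p, y): avg for p, avg, y in b_rows}
    joined = []
    for product, tariff, year in a_rows:
        key = (product[:prefix_len], year)
        if key in lookup:  # inner join: skip A rows with no partner in B
            joined.append((product, tariff, year, lookup[key]))
    return joined


A = [("202341", 2, 1999), ("202341", 4, 2000), ("202341", 20, 2008),
     ("202355", 9, 1999), ("202355", 16, 2000), ("438811", 0, 1999),
     ("438891", 8, 1999), ("438891", 3, 2001), ("671212", 15, 2005),
     ("671260", 10, 2005)]
B = [("2023", 5.5, 1999), ("2023", 10, 2000), ("2023", 20, 2008),
     ("4388", 4, 1999), ("4388", 3, 2001), ("6712", 12.5, 2005)]
for row in join_on_prefix(A, B):
    print(*row)
```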