How do I model serial correlation in a binomial model? - mixed-models

I'd like to test for trends in proportions of animals sampled over 12 years at six different beaches so that there is a separate trend test per beach. In the data below 'thisbeach' is the number of animals sampled at that particular beach and 'notthisbeach' is the number of animals sampled at all other beaches.
dat <- data.frame(fBeach = as.factor(rep(c("B6", "B5", "B2", "B1", "B4", "B3"), each=12)),
year = rep(seq(1:12),6),
notthisbeach = c(4990, 1294, 4346, 4082, 4628, 5576, 5939, 5664, 6108, 5195, 5564, 4079, 4694, 1224, 4052,
4019, 4457, 5242, 5259, 5198, 5971, 5208, 5168, 3722, 5499, 1288, 4202, 3988, 4773, 6018,
5952, 6100, 7308, 5821, 6030, 4546, 4698, 1300, 3884, 3943, 4717, 5911, 6110, 6076, 7606,
6138, 6514, 4767, 4830, 1307, 4886, 4327, 5285, 6344, 6627, 5824, 7305, 5991, 6073, 4647,
4584, 1162, 4200, 3956, 4710, 5664, 5533, 4828, 6082, 4697, 4721, 3529),
thisbeach = c(869, 221, 768, 781, 1086, 1375, 1145, 1074, 1968, 1415, 1250, 979, 1165, 291, 1062,
844, 1257, 1709, 1825, 1540, 2105, 1402, 1646, 1336, 360, 227, 912, 875, 941, 933,
1132, 638, 768, 789, 784, 512, 1161, 215, 1230, 920, 997, 1040, 974, 662, 470,
472, 300, 291, 1029, 208, 228, 536, 429, 607, 457, 914, 771, 619, 741, 411,
1275, 353, 914, 907, 1004, 1287, 1551, 1910, 1994, 1913, 2093, 1529))
A glmmTMB fit, checked with DHARMa residual diagnostics, indicates serial correlation is present:
require(glmmTMB)
require(DHARMa)
require(multcomp)
dat.TMB <- glmmTMB(cbind(notthisbeach,thisbeach) ~ year*fBeach, family = "betabinomial", data=dat)
simres <- simulateResiduals(dat.TMB,plot=T)
res = recalculateResiduals(simres, group = dat$year)
testTemporalAutocorrelation(res, time=unique(dat$year))
Durbin-Watson test
data: simulationOutput$scaledResiduals ~ 1
DW = 0.40903, p-value = 0.0002994
alternative hypothesis: true autocorrelation is not 0
However, I can't seem to find any examples including an autocorrelation structure in a model of this type.
Does anyone have any advice please?

I'm not sure I follow the setup of the number of animals at this beach vs. the other beaches, and depending on your research question you may need to do something different. However, basic correlation structures are easy enough to implement in glmmTMB. The example below fits an AR1 structure.
dat <- data.frame(fBeach = as.factor(rep(c("B6", "B5", "B2", "B1", "B4", "B3"), each=12)),
year = rep(seq(1:12),6),
notthisbeach = c(4990, 1294, 4346, 4082, 4628, 5576, 5939, 5664, 6108, 5195, 5564, 4079, 4694, 1224, 4052,
4019, 4457, 5242, 5259, 5198, 5971, 5208, 5168, 3722, 5499, 1288, 4202, 3988, 4773, 6018,
5952, 6100, 7308, 5821, 6030, 4546, 4698, 1300, 3884, 3943, 4717, 5911, 6110, 6076, 7606,
6138, 6514, 4767, 4830, 1307, 4886, 4327, 5285, 6344, 6627, 5824, 7305, 5991, 6073, 4647,
4584, 1162, 4200, 3956, 4710, 5664, 5533, 4828, 6082, 4697, 4721, 3529),
thisbeach = c(869, 221, 768, 781, 1086, 1375, 1145, 1074, 1968, 1415, 1250, 979, 1165, 291, 1062,
844, 1257, 1709, 1825, 1540, 2105, 1402, 1646, 1336, 360, 227, 912, 875, 941, 933,
1132, 638, 768, 789, 784, 512, 1161, 215, 1230, 920, 997, 1040, 974, 662, 470,
472, 300, 291, 1029, 208, 228, 536, 429, 607, 457, 914, 771, 619, 741, 411,
1275, 353, 914, 907, 1004, 1287, 1551, 1910, 1994, 1913, 2093, 1529))
head(dat)
dim(dat)
require(glmmTMB)
require(DHARMa)
require(multcomp)
# function to test ar
testar <- function(mod, dat) {
    simres <- simulateResiduals(mod, plot=T)
    res <- recalculateResiduals(simres, group = dat$year)
    print(testTemporalAutocorrelation(res, time=unique(dat$year)))
}
mod.TMB <- glmmTMB(cbind(notthisbeach,thisbeach) ~ year*fBeach, family = "betabinomial", data=dat)
testar(mod.TMB, dat)
# results
# Durbin-Watson test
#
# data: simulationOutput$scaledResiduals ~ 1
# DW = 0.40903, p-value = 0.0002994
# alternative hypothesis: true autocorrelation is not 0
mod.TMB.ar <- glmmTMB(cbind(notthisbeach,thisbeach) ~ as.factor(year) + fBeach + ar1(as.factor(year) + 0 | fBeach), family = "betabinomial", data=dat)
testar(mod.TMB.ar, dat)
#
# Durbin-Watson test
#
# data: simulationOutput$scaledResiduals ~ 1
# DW = 1.179, p-value = 0.1242
# alternative hypothesis: true autocorrelation is not 0
VarCorr(mod.TMB.ar)
# Conditional model:
# Groups Name Std.Dev. Corr
# fBeach as.factor(year)1 0.21692 0.464 (ar1)


Extracting Temperatures from .ravi file in Matlab

My Problem
Much like the post here: How can I get data from 'ravi' file?, I have a .ravi file (a radiometric video file, which is rather similar to an .avi) and I am trying to extract the Temperatures in it, to use them together with additional sensor data.
A sample file can be found in the documentation (http://infrarougekelvin.com/en/optris-logiciel-eng/) when you download the "PIX Connect Software". Unfortunately, according to the documentation, the temperature information is stored in a 16-bit format that MATLAB seems to be rather unhappy with.
How I tried to solve my problem
I tried to follow the instructions from the aforementioned post, but I somehow struggle to get results that are even close to the correct temperatures.
Original picture with temperatures in the Optris software
I tried to read the video with different methods:
At first I hoped to use the VideoReader feature in MATLAB:
video = VideoReader(videoPath);
frame1 = video.read(1);
imagesc(frame1)
But it only resulted in this poor picture, which is exactly what I can see when I try to play the .ravi file in a media player like VLC.
First try with the VideoReader function
Then I looked at the binary representation of my file and noticed that I could separate the frames at a certain marker.
Beginning of a new frame in binary representation
So I tried to read the file with the MATLAB fread function:
fileID = fopen(videoPath);
[headerInfo,~] = fread(fileID,[1,123392],'uint8');
[imageMatrix,count] = fread(fileID,[video.width, video.height],'uint16', 'b');
imagesc(imageMatrix')
Now the image looks better, and you can at least see the brake disc, but it seems as if the higher temperatures have some kind of offset that is still missing for the picture to be right.
Also, the values that I read from the file are nowhere near actual temperatures, as the other post and the documentation suggest.
Getting somewhere!
My Question
Am I somehow missing something important? Could someone point me in the right direction on where to look or how to get the actual temperatures from my video? As it worked with the C++ code in the other post, I am guessing this might be a MATLAB problem.
A relatively simple solution for getting the raw frame data is converting the RAVI video file to a raw video file format.
You can use FFmpeg (a command line tool) to convert the RAVI to RAW format.
Example:
ffmpeg -y -f avi -i "Sequence_LED_Holder.ravi" -vcodec rawvideo "Sequence_LED_Holder.yuv"
The YUV (raw binary data) file can then be read by MATLAB using the fread function.
Note: the .yuv extension is just a convention (used by FFmpeg) for raw video files - the actual pixel format is not YUV, but int16.
You can try parsing the RAVI file manually, but using FFmpeg is much simpler.
The raw file format is composed of raw video frames one after the other, with no headers.
In our case, each frame is width*height*2 bytes.
The pixel type is int16 (values may be negative).
The IR video frames have no color information.
The colors are just "false colors" created using a palette and used for visualization.
The code sample uses a palette from a different IR camera manufacturer.
Getting the temperature:
I could not find a way to convert the pixel values to the equivalent temperatures.
I didn't read the documentation - there is a chance that the conversion is documented somewhere.
The MATLAB code sample applies the following stages:
Convert RAVI file format to RAW video file format using FFmpeg.
Read each video frame as a [cols, rows] int16 matrix.
Remove the first line that probably contains data (not pixels).
Use linear contrast stretch - for visualization.
Apply false colors - for visualization.
Display the processed video frame.
Here is the code sample:
%ravi_file_name = 'Brake disc.ravi';
%ravi_file_name = 'Combustion process.ravi';
%ravi_file_name = 'Electronic board.ravi';
%ravi_file_name = 'Sequence_carwheels.ravi';
%ravi_file_name = 'Sequence_drop.ravi';
ravi_file_name = 'Sequence_LED_Holder.ravi';
%ravi_file_name = 'Steel workpiece with hole.ravi';
yuv_file_name = strrep(ravi_file_name, '.ravi', '.yuv'); % Same file name with .yuv extension.
% Get video resolution.
vidinfo = mmfileinfo(ravi_file_name);
cols = vidinfo.Video.Width;
rows = vidinfo.Video.Height;
% Execute ffmpeg (in the system shell) for converting RAVI to raw data file.
% Remark: download FFmpeg if needed, and make sure ffmpeg executable is in the execution path.
if ~exist(yuv_file_name, 'file')
    % Remark: For some of the video files, cmdout returns a string with lots of meta-data
    [status, cmdout] = system(sprintf('ffmpeg -y -f avi -i "%s" -vcodec rawvideo "%s"', ravi_file_name, yuv_file_name));
    if (status ~= 0)
        fprintf(cmdout);
        error(['Error: ffmpeg status = ', num2str(status)]);
    end
end
% Get the number of frames according to file size.
filesize = getfield(dir(yuv_file_name), 'bytes');
n_frames = filesize / (cols*rows*2);
f = fopen(yuv_file_name, 'r');
% Iterate the frames (skip the last frame).
for i = 1:n_frames-1
    % Read frame as cols x rows and int16 type.
    % The data is signed (int16) and not uint16.
    I = fread(f, [cols, rows], '*int16')';
    % It looks like the first line contains some data (not pixels).
    data_line = I(1, :);
    I = I(2:end, :);
    % Apply linear stretch - in order to "see something"...
    J = imadjust(I, stretchlim(I, [0.02, 0.98]));
    % Apply false colors - just for visualization.
    K = ColorizeIr(J);
    if (i == 1)
        figure;
        h = imshow(K, []); %h = imshow(J, []);
        impixelinfo
    else
        if ~isvalid(h)
            break;
        end
        h.CData = K; %h.CData = J;
    end
    pause(0.05);
end
fclose(f);
imwrite(uint16(J+2^15), 'J.tif'); % Write J as uint16 image.
imwrite(K, 'K.png'); % Write K image (last frame).
% Colorize the IR video frame with "false colors".
function J = ColorizeIr(I)
% The palette is from a different IR camera manufacturer - don't expect the result to resemble the OPTRIS output.
% https://groups.google.com/g/flir-lepton/c/Cm8lGQyspmk
colormapIronBlack = uint8([...
255, 255, 255, 253, 253, 253, 251, 251, 251, 249, 249, 249, 247, 247,...
247, 245, 245, 245, 243, 243, 243, 241, 241, 241, 239, 239, 239, 237,...
237, 237, 235, 235, 235, 233, 233, 233, 231, 231, 231, 229, 229, 229,...
227, 227, 227, 225, 225, 225, 223, 223, 223, 221, 221, 221, 219, 219,...
219, 217, 217, 217, 215, 215, 215, 213, 213, 213, 211, 211, 211, 209,...
209, 209, 207, 207, 207, 205, 205, 205, 203, 203, 203, 201, 201, 201,...
199, 199, 199, 197, 197, 197, 195, 195, 195, 193, 193, 193, 191, 191,...
191, 189, 189, 189, 187, 187, 187, 185, 185, 185, 183, 183, 183, 181,...
181, 181, 179, 179, 179, 177, 177, 177, 175, 175, 175, 173, 173, 173,...
171, 171, 171, 169, 169, 169, 167, 167, 167, 165, 165, 165, 163, 163,...
163, 161, 161, 161, 159, 159, 159, 157, 157, 157, 155, 155, 155, 153,...
153, 153, 151, 151, 151, 149, 149, 149, 147, 147, 147, 145, 145, 145,...
143, 143, 143, 141, 141, 141, 139, 139, 139, 137, 137, 137, 135, 135,...
135, 133, 133, 133, 131, 131, 131, 129, 129, 129, 126, 126, 126, 124,...
124, 124, 122, 122, 122, 120, 120, 120, 118, 118, 118, 116, 116, 116,...
114, 114, 114, 112, 112, 112, 110, 110, 110, 108, 108, 108, 106, 106,...
106, 104, 104, 104, 102, 102, 102, 100, 100, 100, 98, 98, 98, 96, 96,...
96, 94, 94, 94, 92, 92, 92, 90, 90, 90, 88, 88, 88, 86, 86, 86, 84, 84,...
84, 82, 82, 82, 80, 80, 80, 78, 78, 78, 76, 76, 76, 74, 74, 74, 72, 72,...
72, 70, 70, 70, 68, 68, 68, 66, 66, 66, 64, 64, 64, 62, 62, 62, 60, 60,...
60, 58, 58, 58, 56, 56, 56, 54, 54, 54, 52, 52, 52, 50, 50, 50, 48, 48,...
48, 46, 46, 46, 44, 44, 44, 42, 42, 42, 40, 40, 40, 38, 38, 38, 36, 36,...
36, 34, 34, 34, 32, 32, 32, 30, 30, 30, 28, 28, 28, 26, 26, 26, 24, 24,...
24, 22, 22, 22, 20, 20, 20, 18, 18, 18, 16, 16, 16, 14, 14, 14, 12, 12,...
12, 10, 10, 10, 8, 8, 8, 6, 6, 6, 4, 4, 4, 2, 2, 2, 0, 0, 0, 0, 0, 9,...
2, 0, 16, 4, 0, 24, 6, 0, 31, 8, 0, 38, 10, 0, 45, 12, 0, 53, 14, 0,...
60, 17, 0, 67, 19, 0, 74, 21, 0, 82, 23, 0, 89, 25, 0, 96, 27, 0, 103,...
29, 0, 111, 31, 0, 118, 36, 0, 120, 41, 0, 121, 46, 0, 122, 51, 0, 123,...
56, 0, 124, 61, 0, 125, 66, 0, 126, 71, 0, 127, 76, 1, 128, 81, 1, 129,...
86, 1, 130, 91, 1, 131, 96, 1, 132, 101, 1, 133, 106, 1, 134, 111, 1,...
135, 116, 1, 136, 121, 1, 136, 125, 2, 137, 130, 2, 137, 135, 3, 137,...
139, 3, 138, 144, 3, 138, 149, 4, 138, 153, 4, 139, 158, 5, 139, 163,...
5, 139, 167, 5, 140, 172, 6, 140, 177, 6, 140, 181, 7, 141, 186, 7,...
141, 189, 10, 137, 191, 13, 132, 194, 16, 127, 196, 19, 121, 198, 22,...
116, 200, 25, 111, 203, 28, 106, 205, 31, 101, 207, 34, 95, 209, 37,...
90, 212, 40, 85, 214, 43, 80, 216, 46, 75, 218, 49, 69, 221, 52, 64,...
223, 55, 59, 224, 57, 49, 225, 60, 47, 226, 64, 44, 227, 67, 42, 228,...
71, 39, 229, 74, 37, 230, 78, 34, 231, 81, 32, 231, 85, 29, 232, 88,...
27, 233, 92, 24, 234, 95, 22, 235, 99, 19, 236, 102, 17, 237, 106, 14,...
238, 109, 12, 239, 112, 12, 240, 116, 12, 240, 119, 12, 241, 123, 12,...
241, 127, 12, 242, 130, 12, 242, 134, 12, 243, 138, 12, 243, 141, 13,...
244, 145, 13, 244, 149, 13, 245, 152, 13, 245, 156, 13, 246, 160, 13,...
246, 163, 13, 247, 167, 13, 247, 171, 13, 248, 175, 14, 248, 178, 15,...
249, 182, 16, 249, 185, 18, 250, 189, 19, 250, 192, 20, 251, 196, 21,...
251, 199, 22, 252, 203, 23, 252, 206, 24, 253, 210, 25, 253, 213, 27,...
254, 217, 28, 254, 220, 29, 255, 224, 30, 255, 227, 39, 255, 229, 53,...
255, 231, 67, 255, 233, 81, 255, 234, 95, 255, 236, 109, 255, 238, 123,...
255, 240, 137, 255, 242, 151, 255, 244, 165, 255, 246, 179, 255, 248,...
193, 255, 249, 207, 255, 251, 221, 255, 253, 235, 255, 255, 24]);
lutR = colormapIronBlack(1:3:end);
lutG = colormapIronBlack(2:3:end);
lutB = colormapIronBlack(3:3:end);
% Convert I to uint8
I = im2uint8(I);
R = lutR(I+1);
G = lutG(I+1);
B = lutB(I+1);
J = cat(3, R, G, B);
end
Sample output:
Update:
Python code sample using OpenCV (without colorizing):
Using Python and OpenCV, we may skip the FFmpeg conversion part.
Instead of converting the RAVI file to YUV file, we may fetch undecoded RAW video from the RAVI file.
Open the video file and set the CAP_PROP_FORMAT property to fetch undecoded RAW video:
cap = cv2.VideoCapture(ravi_file_name)
cap.set(cv2.CAP_PROP_FORMAT, -1) # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1).
When reading a video frame (using ret, frame = cap.read()), the undecoded frame is read as a "long" row vector of uint8 elements.
Converting the frame to int16 type, and reshaping to cols x rows:
First, we have to "view" the vector elements as int16 elements (as opposed to uint8 elements): frame.view(np.int16)
Second, we have to reshape the vector into a matrix.
Conversion and reshaping code:
frame = frame.view(np.int16).reshape(rows, cols)
Complete Python code sample:
import numpy as np
import cv2
ravi_file_name = 'Sequence_LED_Holder.ravi'
cap = cv2.VideoCapture(ravi_file_name) # Opens a video file for capturing
# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1) # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1). [Using cap.set(cv2.CAP_PROP_CONVERT_RGB, 0) is not working]
cols = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) # Get video frames width
rows = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) # Get video frames height
while True:
    ret, frame = cap.read()  # Read next video frame (undecoded frame is read as long row vector).
    if not ret:
        break  # Stop reading frames when ret = False (after the last frame is read).
    # View frame as int16 elements, and reshape to cols x rows (each pixel is signed 16 bits)
    frame = frame.view(np.int16).reshape(rows, cols)
    # It looks like the first line contains some data (not pixels).
    # data_line = frame[0, :]
    frame_roi = frame[1:, :]  # Ignore the first row.
    # Normalize frame to range [0, 255] and get the result as type uint8 (just for making the data visible).
    normed = cv2.normalize(frame_roi, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    cv2.imshow('normed', normed)  # Show the normalized video frame
    cv2.waitKey(10)
cap.release()
cv2.destroyAllWindows()
Sample output:
Note:
In case colorization is required, you may use the following example: Thermal Image Processing
In most RAVI files processed with FFmpeg, there are non-pixel values on the first line of the raw image.
This first line stores some redundant information such as the image width and height.
We have to skip this line, which is one image width long. Since the data values are 16-bit, we multiply the width by 2 to get the exact offset of the binary data. We also have to calculate the exact size of the image: imageLength = frame size - (image width * 2).
In the other case, the data start at the beginning of the frame and we can use the frame size (w * h * 2) to copy the binary data and update the offset.
To know whether it is necessary to correct the data offset, we just look at the image height: if the value is odd, there is an extra first line and we apply the correction; if it is even, no correction is needed.
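A minimal Python sketch of this offset logic (assuming raw frames produced by the FFmpeg conversion shown earlier; the file name and resolution below are only placeholders):
import os

def frame_layout(width, height):
    # 16-bit pixels; an extra first "data" line is assumed whenever the reported height is odd.
    bytes_per_pixel = 2
    frame_size = width * height * bytes_per_pixel
    if height % 2 == 1:
        data_offset = width * bytes_per_pixel       # skip the first line
        image_length = frame_size - data_offset
    else:
        data_offset = 0                             # pixels start immediately
        image_length = frame_size
    return frame_size, data_offset, image_length

raw_name = 'Sequence_LED_Holder.yuv'                # placeholder file name
width, height = 382, 289                            # placeholder resolution (e.g. from mmfileinfo)
frame_size, data_offset, image_length = frame_layout(width, height)
n_frames = os.path.getsize(raw_name) // frame_size
with open(raw_name, 'rb') as f:
    for i in range(n_frames):
        f.seek(i * frame_size + data_offset)
        pixels = f.read(image_length)               # raw 16-bit pixel data for frame i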
It is the same story when parsing the original RAVI files. First we have to find the offset of the movi tag in the file. If the movi tag is followed by the ix00 tag, it is followed by a series of values giving the offset and the size of each frame relative to the movi tag, and the real data are elsewhere in the file. If the ix00 tag is not present, the data sit inside the movi chunk itself, after the 00db flag, frame by frame. In this last case, we can also look for the idx1 tag (at the end of the file), which gives the exact offset and size of each frame.
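A rough Python sketch of that tag search (a byte search only; a full RIFF/AVI parser would walk the chunk headers, and the file name is a placeholder):
with open('Sequence_LED_Holder.ravi', 'rb') as f:
    buf = f.read()
movi_offset = buf.find(b'movi')                     # offset of the 'movi' tag
if movi_offset == -1:
    raise ValueError('no movi tag found')
has_ix00 = buf.find(b'ix00', movi_offset) != -1     # per-frame offset/size table present?
first_00db = buf.find(b'00db', movi_offset)         # first frame chunk inside the 'movi' chunk
idx1_offset = buf.rfind(b'idx1')                    # legacy index near the end of the file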
Both approaches give a reasonably correct image representation in grayscale or in pseudo-color, but the temperature formula provided by the libirimager tool-kit (float t = (float)data[i] / 10.f - 100.f) is incorrect and I do not understand why, since the formula was correct when I was using raw data produced by the PI-160 camera.
FFmpeg test
I found an alternative way. In recent Optris RAVI files we can get the temperature range from the INFO chunk. Then it is easy to find the minimal and maximal values in the raw data and interpolate them onto that temperature scale.
with correct temperatures
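A minimal Python sketch of that interpolation (assuming the temperature range has already been parsed from the INFO chunk; the numbers below are placeholders):
import numpy as np

def raw_to_temperature(frame, t_min, t_max):
    # Linearly map the raw values onto the temperature scale (t_min, t_max).
    raw = frame.astype(np.float64)
    raw_min, raw_max = raw.min(), raw.max()
    return t_min + (raw - raw_min) * (t_max - t_min) / (raw_max - raw_min)

# 'frame' is one int16 frame with the first data line removed;
# -20.0 and 250.0 stand in for the range read from the INFO chunk.
# temps = raw_to_temperature(frame, -20.0, 250.0)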
Each frame holds a 16-bit value per pixel, with the low byte first and the high byte after. To find the temperature you have to apply this formula: temp = (hi * 256.0 + lo) / 10.0 - 100.0.
From these values you can create a grayscale image. I used this approach successfully with the old Optris PI-160 camera. However, with the new PI-450 it is more difficult, since PI Connect no longer supports binary export.
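A minimal Python sketch applying this formula to one raw frame (assuming width*height 16-bit values with the low byte first; whether the scale is right for a given camera is a separate question, as noted below):
import numpy as np

def decode_temperatures(raw_bytes, width, height):
    # '<u2' = little-endian unsigned 16-bit, i.e. low byte first, high byte after,
    # so each value is already hi * 256 + lo.
    values = np.frombuffer(raw_bytes, dtype='<u2').reshape(height, width)
    return values / 10.0 - 100.0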
I tested the solution with FFmpeg without success. You get a 16-bit data file, but the offset of the real data is incorrect, and thus the temperatures are aberrant.
Did you succeed?
Sample of binary reading:

Addition of elements in multiple lists corresponding to a key column of an RDD

I have Python RDDs, and I have to add the elements of multiple lists for each key: add element 1 of list 1 to element 1 of list 2, then add element 1 of list 3, and so on.
For CANADA, add 47, 59 and 77 as the first element, 97, 98 and 63 as the second element, and so on.
I tried flattening the lists to add them and tried converting to a DataFrame, but I failed to do so. I want to do it starting from all three of the input forms shown below.
countryCounts = [
('CANADA','47;97;33;94;6'),
('CANADA','59;98;24;83;3'),
('CANADA','77;63;93;86;62'),
('CHINA','86;71;72;23;27'),
('CHINA','74;69;72;93;7'),
('CHINA','58;99;90;93;41'),
('ENGLAND','40;13;85;75;90'),
('ENGLAND','39;13;33;29;14'),
('ENGLAND','99;88;57;69;49'),
('GERMANY','67;93;90;57;3'),
('GERMANY','9;15;20;19'),
('GERMANY','77;64;46;95;48'),
('INDIA','90;49;91;14;70'),
('INDIA','70;83;38;27;16'),
('INDIA','86;21;19;59;4')
]
countryCountsRdd = sc.parallelize(countryCounts)
# Way 1: split each value string into a list of integers
countryCountsSplit = countryCountsRdd.map(lambda x: (x[0], [int(v) for v in x[1].split(';')]))
countryCountsSplit.collect()
# Way 2: group the per-row lists by country
countryCountsGroup = countryCountsSplit.groupByKey().mapValues(list)
countryCountsGroup.collect()
# Way 3: keep the values as comma-separated strings
countryCountsStr = countryCountsRdd.map(lambda x: (x[0], " ,".join(x[1].split(';'))))
countryCountsStr.collect()
Inputs :
Way 1
[('CANADA', [47, 97, 33, 94, 6]), ('CANADA', [59, 98, 24, 83, 3]), ('CANADA', [77, 63, 93, 86, 62]), ('CHINA', [86, 71, 72, 23, 27]), ('CHINA', [74, 69, 72, 93, 7]), ('CHINA', [58, 99, 90, 93, 41]), ('ENGLAND', [40, 13, 85, 75, 90]), ('ENGLAND', [39, 13, 33, 29, 14]), ('ENGLAND', [99, 88, 57, 69, 49]), ('GERMANY', [67, 93, 90, 57, 3]), ('GERMANY', [9, 15, 20, 19]), ('GERMANY', [77, 64, 46, 95, 48]), ('INDIA', [90, 49, 91, 14, 70]), ('INDIA', [70, 83, 38, 27, 16]), ('INDIA', [86, 21, 19, 59, 4])]
Way 2:
[('CANADA', [[47, 97, 33, 94, 6], [59, 98, 24, 83, 3], [77, 63, 93, 86, 62]]), ('CHINA', [[86, 71, 72, 23, 27], [74, 69, 72, 93, 7], [58, 99, 90, 93, 41]]), ('INDIA', [[90, 49, 91, 14, 70], [70, 83, 38, 27, 16], [86, 21, 19, 59, 4]]), ('ENGLAND', [[40, 13, 85, 75, 90], [39, 13, 33, 29, 14], [99, 88, 57, 69, 49]]), ('GERMANY', [[67, 93, 90, 57, 3], [9, 15, 20, 19], [77, 64, 46, 95, 48]])]
Way 3:
[('CANADA', '47 ,97 ,33 ,94 ,6'), ('CANADA', '59 ,98 ,24 ,83 ,3'), ('CANADA', '77 ,63 ,93 ,86 ,62'), ('CHINA', '86 ,71 ,72 ,23 ,27'), ('CHINA', '74 ,69 ,72 ,93 ,7'), ('CHINA', '58 ,99 ,90 ,93 ,41'), ('ENGLAND', '40 ,13 ,85 ,75 ,90'), ('ENGLAND', '39 ,13 ,33 ,29 ,14'), ('ENGLAND', '99 ,88 ,57 ,69 ,49'), ('GERMANY', '67 ,93 ,90 ,57 ,3'), ('GERMANY', '9 ,15 ,20 ,19'), ('GERMANY', '77 ,64 ,46 ,95 ,48'), ('INDIA', '90 ,49 ,91 ,14 ,70'), ('INDIA', '70 ,83 ,38 ,27 ,16'), ('INDIA', '86 ,21 ,19 ,59 ,4')]
I require the same output for all three:
[('CANADA','183;258;150;263;71')]
[('CHINA','218,239,234,209,75')]
[('ENGLAND','178,114,175,173,153')]
[('GERMANY','144,166,151,172,70')]
[('INDIA','246,153,148,100,90')]
You want to combine the values for a given key by taking the sum. This is precisely what reduceByKey does. You just need to define an associative and commutative reduce function to combine the values as desired.
def myReducer(a, b):
    a, b = map(int, a.split(";")), map(int, b.split(";"))
    maxLength = max(len(a), len(b))
    if len(a) < len(b):
        a = a + [0]*(len(b)-len(a))
    elif len(b) < len(a):
        b = b + [0]*(len(a)-len(b))
    return ";".join([str(a[i] + b[i]) for i in range(maxLength)])
The only real tricky part here is that your sample input lists are not all the same size. In this case, I defined the function to zero pad the shorter list.
Now call reduceByKey:
countryCountsRdd.reduceByKey(myReducer).collect()
#[('CANADA', '183;258;150;263;71'),
# ('CHINA', '218;239;234;209;75'),
# ('INDIA', '246;153;148;100;90'),
# ('ENGLAND', '178;114;175;173;153'),
# ('GERMANY', '153;172;156;171;51')]
So you can do it using a simple reduceByKey operation on the RDD.
Input RDD: RDD[String, List]
Output RDD: input.reduceByKey(lambda x, y: addFunction(x, y))
addFunction(x, y) iterates over the two input lists, adds the elements index-wise, and returns the summed list, as sketched below.
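A minimal PySpark sketch of this approach (assuming countryCountsRdd from the question, mapped to (country, list-of-ints) pairs as in Way 1; names are illustrative):
def addFunction(x, y):
    # Zero-pad the shorter list, then add the two lists index-wise.
    length = max(len(x), len(y))
    x = x + [0] * (length - len(x))
    y = y + [0] * (length - len(y))
    return [a + b for a, b in zip(x, y)]

listRdd = countryCountsRdd.map(lambda kv: (kv[0], [int(v) for v in kv[1].split(';')]))
listRdd.reduceByKey(addFunction).collect()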

I need to look up a value within a table, find the value within a string, and return the first 3-digit zip of the string of zips

I need to look up a zip-3, e.g. '825', within a table of grouped zips in a single column:
787, 733
790, 791
793, 794, 792
802, 805, 800
806, 807
809, 816, 810, 814, 813, 811
820, 829, 826, 824, 822, 825, 827, 823, 828, 830, 831, 821, 834
837, 836, 833, 838, 834, 832, 835
840, 841, 847, 844, 843, 846, 845, 842
852, 853
857, 851
859, 855
860, 864, 865
I then need to be able to return the first value in that string, i.e. I need to return '820'.
I am working with two separate worksheets, one with a list of zip-3's and another with grouped zip-3's, so I need to be able to match the leading zip in a string to the individual zip-3 on my first worksheet.
Something like this should work:
=LEFT(INDEX('another with grouped zip-3''s'!A:A,MATCH("*"&Z1&"*",'another with grouped zip-3''s'!A:A,0)),3)

How can I read a hex number with dlmread?

I'm trying to read a .csv file with Octave (I suppose it's equivalent on Matlab). One of the columns contains hexadecimal values identifying MAC addresses, but I'd like to have it parsed anyway, I don't mind if it's converted to decimal.
Is it possible to do this automatically with functions such as dlmread? Or do I have to create a custom function?
This is what the file looks like:
Timestamp, MAC, LastBsn, PRR, RSSI, ED, SQI, RxGain, PtxCoord, Channel: 26
759, 0x35c8cc, 127, 99, -307, 29, 237, 200, -32
834, 0x32d710, 183, 100, -300, 55, 248, 200, -32
901, 0x35c8cc, 227, 100, -300, 29, 238, 200, -32
979, 0x32d6a0, 22, 95, -336, 10, 171, 200, -32
987, 0x32d710, 27, 96, -328, 54, 249, 200, -32
1054, 0x35c8cc, 71, 92, -357, 30, 239, 200, -32
1133, 0x32d6a0, 122, 95, -336, 11, 188, 200, -32
I can accept any output value for the (truncated) MAC addresses, from sequence numbers (1-6) to decimal conversion of the value (e.g. 0x35c8cc -> 3524812).
My current workaround is to use a text editor to manually replace the MAC addresses with decimal numbers, but an automated solution would be handy.
The functions dlmread and csvread only handle numeric data. You can use textscan (which is also present in Matlab), but since you're using Octave, you're better off using csv2cell (part of Octave's io package). It basically reads a csv file and returns a cell array of strings and doubles:
octave-3.8.1> type test.csv
1,2,3,"some",1c:6f:65:90:6b:13
4,5,6,"text",0d:5a:89:46:5c:70
octave-3.8.1> pkg load io; # csv2cell is part of the io package
octave-3.8.1> data = csv2cell ("test.csv")
data =
{
[1,1] = 1
[2,1] = 4
[1,2] = 2
[2,2] = 5
[1,3] = 3
[2,3] = 6
[1,4] = some
[2,4] = text
[1,5] = 1c:6f:65:90:6b:13
[2,5] = 0d:5a:89:46:5c:70
}
octave-3.8.1> class (data{1})
ans = double
octave-3.8.1> class (data{9})
ans = char
>> type mycsv.csv
Timestamp, MAC, LastBsn, PRR, RSSI, ED, SQI, RxGain, PtxCoord, Channel: 26
759, 0x35c8cc, 127, 99, -307, 29, 237, 200, -32
834, 0x32d710, 183, 100, -300, 55, 248, 200, -32
901, 0x35c8cc, 227, 100, -300, 29, 238, 200, -32
979, 0x32d6a0, 22, 95, -336, 10, 171, 200, -32
987, 0x32d710, 27, 96, -328, 54, 249, 200, -32
1054, 0x35c8cc, 71, 92, -357, 30, 239, 200, -32
1133, 0x32d6a0, 122, 95, -336, 11, 188, 200, -32
You can read the file with csv2cell. The values starting with "0x" will be automatically converted from hex to decimal values. See:
>> pkg load io % load io package for csv2cell
>> data = csv2cell ("mycsv.csv");
>> data(2,1)
ans =
{
[1,1] = 759
}
To access the cell values use:
>> data{2,1}
ans = 759
>> data{2,2}
ans = 3524812
>> data{2,5}
ans = -307

Matplotlib: Formatting date in day.month.year style

I want to plot some lines with the date on the x axis, but all examples I could find use the American style like 12-31-2012. I want 31.12.2012 instead, but it doesn't seem to work by just changing the date formatter from
dateFormatter = dates.DateFormatter('%Y-%m-%d')
to
dateFormatter = dates.DateFormatter('%d.%m.%y')
My date list works like this: I define a "firstDay" manually and then generate X succeeding days. That works, as I can see when I print the resulting list.
But when I want to plot that list (converted by num2date), I get totally different dates.
E.g. I set my firstDay to 734517.0, which is January 15th, 2012. Then I print my dates on the axis and the first date I get is 01.01.87??
Here is my full code:
import numpy as np
import matplotlib.pyplot as plot
import matplotlib.ticker as mticker
from matplotlib import dates
import datetime
fig = plot.figure(1)
DAU = ( 2, 20, 25, 60, 190, 210, 18, 196, 212, 200, 160, 150, 185, 175, 316, 320, 294, 260, 180, 145, 135, 97, 84, 80, 60, 45, 37, 20, 20, 24, 39, 73, 99)
WAU = ( 50, 160, 412, 403, 308, 379, 345, 299, 258, 367, 319, 381, 461, 412, 470, 470, 468, 380, 290, 268, 300, 312, 360, 350, 316, 307, 289, 321, 360, 378, 344, 340, 346)
MAU = (760, 620, 487, 751, 612, 601, 546, 409, 457, 518, 534, 576, 599, 637, 670, 686, 574, 568, 816, 578, 615, 520, 499, 503, 529, 571, 461, 614, 685, 702, 687, 649, 489)
firstDay = 734517.0 # 15 January 2012
#create an array with len(DAU) entries from given starting day...
dayArray = []
for i in xrange(len(DAU)):
    dayArray.append(firstDay + i)
#...and fill them with the converted dates
dayLabels = [dates.num2date(dayArray[j]) for j in xrange(len(DAU))]
for k in xrange(len(DAU)):
    print dayLabels[k]
spacing = np.arange(len(DAU)) + 1
line1 = plot.plot(spacing, DAU, 'o-', color = '#336699')
line2 = plot.plot(spacing, WAU, 'o-', color = '#993333')
line3 = plot.plot(spacing, MAU, 'o-', color = '#89a54e')
ax = plot.subplot(111)
plot.ylabel('', weight = 'bold')
plot.title('', weight = 'bold')
ticks, labels = plot.xticks(spacing, dayLabels)
plot.setp(labels, rotation = 90, fontsize = 11)
dateFormatter = dates.DateFormatter('%d.%m.%y')
ax.xaxis.set_major_formatter(dateFormatter)
#ax.fmt_xdata = dates.DateFormatter('%Y-%m-%d')
#fig.autofmt_xdate()
yMax = max(np.max(DAU), np.max(WAU), np.max(MAU))
yLimit = 100 - (yMax % 100) + yMax
plot.yticks(np.arange(0, yLimit + 1, 100))
plot.grid(True, axis = 'y')
plot.subplots_adjust(bottom = 0.5)
plot.subplots_adjust(right = 0.82)
legend = plot.legend((line1[0], line2[0], line3[0]),
                     ('DAU',
                      'WAU',
                      'MAU'),
                     'upper left',
                     bbox_to_anchor = [1, 1],
                     shadow = True)
frame = legend.get_frame()
frame.set_facecolor('0.80')
for t in legend.get_texts():
    t.set_fontsize('small')
plot.show()
It would be fine as well with this date formatter:
ax.fmt_xdata = dates.DateFormatter('%Y-%m-%d')
but that gives me the timestamps as well, e.g. 2012-01-15 00:00:00+00:00.
If someone could tell me how to cut the time off, that would be really great!
It seems to me that the easiest way is to use real datetime objects. That way you can use datetime.timedelta(days=i) to make your date range, and matplotlib automatically takes the spacing into account in case your dates are not regular. It also lets you use the default date formatting options from matplotlib.
I left some code out to keep it simpler but you should be able to mix this with your script:
import numpy as np
import matplotlib.pyplot as plot
import matplotlib.ticker as mticker
from matplotlib import dates
import datetime
fig = plot.figure(1)
DAU = ( 2, 20, 25, 60, 190, 210, 18, 196, 212, 200, 160, 150, 185, 175, 316, 320, 294, 260, 180, 145, 135, 97, 84, 80, 60, 45, 37, 20, 20, 24, 39, 73, 99)
WAU = ( 50, 160, 412, 403, 308, 379, 345, 299, 258, 367, 319, 381, 461, 412, 470, 470, 468, 380, 290, 268, 300, 312, 360, 350, 316, 307, 289, 321, 360, 378, 344, 340, 346)
MAU = (760, 620, 487, 751, 612, 601, 546, 409, 457, 518, 534, 576, 599, 637, 670, 686, 574, 568, 816, 578, 615, 520, 499, 503, 529, 571, 461, 614, 685, 702, 687, 649, 489)
firstDay = datetime.datetime(2012,1,15) # 15 January 2012
dayArray = [firstDay + datetime.timedelta(days=i) for i in xrange(len(DAU))]
ax = plot.subplot(111)
line1 = ax.plot(dayArray, DAU, 'o-', color = '#336699')
line2 = ax.plot(dayArray, WAU, 'o-', color = '#993333')
line3 = ax.plot(dayArray, MAU, 'o-', color = '#89a54e')
ax.xaxis.set_major_formatter(dates.DateFormatter('%d.%m.%Y'))
The main difference from your script is the way dayArray is created (and used as the x-values in the plotting commands) and the last line, which sets the format of the x-axis.