Gnuplot: categorised data - match color to category and add min-max range lines

Given a set of values fitting into a category, I'd like to
a) plot the data values as dots (y axis) according to category (x axis)
b) match dot color to category
c) add a line ranging from minimum to maximum of each set
What I did was use this code:
set terminal png
set output 'animals.png'
set ytics nomirror
unset key
set xrange [-0.5:5.5]
plot for [i=2:5] 'cat.dat' using i:xtic(1)
show xrange
That successfully labels the categories on the x-axis, but the colors are set by column (not by row), and I don't know how to add the range bars (note: not errorbars or percentiles, but the full min->max range), especially since the data is read columnwise but would then need to be analysed rowwise. AFAIK, gnuplot only does columns, though.
Any ideas?
Output with above code:
Example data (tab-delimited):
cat 0.26 0.4 0.23 0.16
dog 0.317 0.264 0.25 0.26
bat 0.33 0.42 0.32 0.48
rat 0.59 0.62 0.57 0.56
foo 0.59 0.67 0.71 0.70
bar 0.664 0.75 0.68 0.6

As you noticed, gnuplot doesn't like rows and unfortunately does not (yet?) offer a transpose function. In your solution, you are using Unix system calls/tools and sed, which are not necessarily platform-independent. Furthermore, you are plotting points and separate arrows to connect them. I guess you can simplify this with linespoints if you don't insist on a horizontal bar at the minimum and maximum values.
Let me show some "simplified" platform-independent gnuplot-only code.
General Procedure:
load file to datablock
transpose datablock
plot columns with linespoints
Datafile TAB-separated without header: Animals.dat
cat 0.26 0.4 0.23 0.16
dog 0.317 0.264 0.25 0.26
bat 0.33 0.42 0.32 0.48
rat 0.59 0.62 0.57 0.56
foo 0.59 0.67 0.71 0.70
bar 0.664 0.75 0.68 0.6
The code below requires a FileToDatablock routine and a DatablockTranspose routine.
Procedure to load file into datablock: FileToDatablock.gpp
### Load datafile "as is" into datablock for different platforms
# ARG1 = input filename
# ARG2 = output datablock
if (GPVAL_SYSNAME[:7] eq "Windows") { # "Windows_NT-6.1" is shown on a Win7 system
    load '< echo '.ARG2.' ^<^<EOD & type "'.ARG1.'"'
}
if (GPVAL_SYSNAME eq "Linux") { # that's shown on a Raspberry
    load '< echo "\'.ARG2.' << EOD" ; cat "'.ARG1.'"'  # ';' runs echo before cat
}
if (GPVAL_SYSNAME eq "Darwin") { # this was shown on a MacOS Sierra 10.12.6
    load '< echo "\'.ARG2.' << EOD" ; cat "'.ARG1.'"'  # identical to Linux
}
### end of code
gnuplot procedure for transposing a datablock: DatablockTranspose.gpp
### transpose datablock (requires whitespace as separator)
# ARG1 = Input datablock
# ARG2 = Output datablock
set print @ARG2
do for [DBT_i=1:words(@ARG1[1])] {
    DBT_Line = ""
    do for [DBT_j=1:|@ARG1|] {
        DBT_Line = DBT_Line.word(@ARG1[DBT_j],DBT_i).\
                   (DBT_j < |@ARG1| ? "\t" : "")
    }
    print DBT_Line
}
set print
undefine DBT_*
### end of code
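To see what the transpose procedure does without running gnuplot, here is a minimal Python sketch of the same logic: output line i is built from word i of every input line, joined by tabs. The function name `transpose_rows` is hypothetical; like the gnuplot routine, it assumes whitespace-separated fields and takes the number of output lines from the first input line (in this sketch, a missing word in a shorter line becomes an empty string).

```python
def transpose_rows(lines):
    """Return tab-joined lines where output line i holds word i of every input line."""
    # Split on any whitespace, skipping blank lines
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    # Like DatablockTranspose, take the number of output lines from the first input line
    n_out = len(rows[0])
    return ["\t".join(row[i] if i < len(row) else "" for row in rows) for i in range(n_out)]


animals = """cat 0.26 0.4 0.23 0.16
dog 0.317 0.264 0.25 0.26
bat 0.33 0.42 0.32 0.48
rat 0.59 0.62 0.57 0.56
foo 0.59 0.67 0.71 0.70
bar 0.664 0.75 0.68 0.6""".splitlines()

for out_line in transpose_rows(animals):
    print(out_line)
```

The first output line holds the category names, which is why the plot command can take its xtic labels from `columnhead(i)`.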
The actual code:
### plotting rows
reset session
# load file to datablock
call "FileToDatablock.gpp" "Animals.dat" "$Data"
# transpose datablock by gnuplot procedure
call "DatablockTranspose.gpp" "$Data" "$DataTransposed"
set palette defined ( 0 'purple', 1 'blue', 2 'green', \
3 'yellow', 4 'orange', 5 'red' , 6 'black' )
unset colorbox
set xrange[0.5:|$Data|+0.5]
plot for [i=1:|$Data|] $DataTransposed u (i):i:(i):xtic(columnhead(i)) w lp pt 7 ps 1.5 lc palette not
### end of code
The result:

This takes a few more steps. Above all, each category is given a unique index number, and the data is transposed
(I'm using GNU/Unix shell commands here.)
$ cat -n data_orig.dat | datamash transpose > data_trans.dat
$ cat data_trans.dat   # spaces added for readability
1 2 3 4 5 6
cat dog bat rat foo bar
0.26 0.317 0.33 0.59 0.59 0.664
0.4 0.264 0.42 0.62 0.67 0.75
0.23 0.25 0.32 0.57 0.71 0.68
0.16 0.26 0.48 0.56 0.70 0.6
Now the data can be properly analyzed in columns, and the colors can be defined according to the index number.
The bars are made with arrows, where minimum and maximum are taken from the statistical analysis of each column.
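The bars only need the minimum and maximum per category, which is what each `stats` call in the loop below returns as STATS_min and STATS_max. A minimal Python sketch of the same result, working directly on the original one-row-per-category file (the function name `category_ranges` is hypothetical):

```python
def category_ranges(lines):
    """Map each category name (first field) to (min, max) of the remaining numeric fields."""
    ranges = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and lines without values
        name = fields[0]
        values = [float(v) for v in fields[1:]]
        ranges[name] = (min(values), max(values))
    return ranges


animals = """cat 0.26 0.4 0.23 0.16
dog 0.317 0.264 0.25 0.26
bat 0.33 0.42 0.32 0.48
rat 0.59 0.62 0.57 0.56
foo 0.59 0.67 0.71 0.70
bar 0.664 0.75 0.68 0.6""".splitlines()

for name, (low, high) in category_ranges(animals).items():
    print(f"{name}: {low} .. {high}")
```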
The xticlabels are read with a system call (sed prints the names row) and picked out one by one with gnuplot's internal word() function, which works like a 1D array whose indices match the unique indices of the data columns.
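The `names(n)` definition relies on gnuplot's word() counting from 1, so name n lines up with data column n. A small Python sketch of that lookup; the helper `word` here is a hypothetical stand-in, and returning an empty string for an out-of-range n is a choice made in this sketch:

```python
def word(text, n):
    """Return the n-th whitespace-delimited word of text, counting from 1 (empty string if absent)."""
    words = text.split()
    return words[n - 1] if 1 <= n <= len(words) else ""


# Second line of data_trans.dat, i.e. what `sed -n '2p' data_trans.dat` prints
names_line = "cat dog bat rat foo bar"
for i in range(1, 7):
    print(i, word(names_line, i))
```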
The script with very detailed explanations to better support new gnuplot users:
#output and style settings: make png-file, name it 'animals.png',
# yaxis tics on both sides, no legend
set terminal png
set output 'animals.png'
set ytics mirror
unset key
#data indices are integers from 1 to 6, a bit of space for the looks
set xrange [0.5:6.5]
#define color scheme for each data series
set palette defined ( 0 'purple', 1 'blue', 2 'green', \
3 'yellow', 4 'orange', 5 'red' , 6 'black' )
#hide color gradient bar of palette
unset colorbox
#define array names using word function:
# read in 2nd line of data by system call and run through words
# each space-delimited word is now an array element of names
names(n) = word( system("sed -n '2p' data_trans.dat" ) , n )
#create min->max bars
#loop over all data sets to create bars
do for [i=1:6] {
#among others this gives minimum and maximum values of the data set
#using i -> only handle column i in statistics
#every ::2 start with the 3rd row (point indices start at 0) for statistical analysis
stats 'data_trans.dat' using i every ::2
#use min/max values for arrow y positions, index i for x positions
#heads = arrow head on both sides
#size 0.1,90 = 0.1 line lengths for arrow head
# and 90° arrow head line angles = T bar style
#lc palette cb i = use line color (lc) from palette value matching
# color bar (cb) value of index i
set arrow from i,STATS_min to i,STATS_max heads size 0.1,90 lc palette cb i
}
#plotting:
# for [i=1:6] loop over all 6 columns, use i as loop variable
# every ::2 start with the 3rd row (point indices start at 0) for data plotting
# using (i):i:(i):xtic(names(i))
# syntax of using
# x-value:y-value:z-value:label_x_axis [:label_y_axis:label_z_axis]
# (i) -> literal value of i for x and z, z is used as color definition
# i -> y-values from column i
# xtic(names(i)) get element i of array names for xtic label
# lc palette -> coloring according to defined palette
# pt 7 ps 1.5 -> point style and size definition
plot for [i=1:6] 'data_trans.dat' every ::2 using (i):i:(i):xtic(names(i)) lc palette pt 7 ps 1.5
References:
coloring based on x-values
array from word function
Result:
EDIT:
As shown in @theozh's answer, linespoints are far more practical for showing the range. This allows skipping the whole bar/arrow creation block by just adding w lp to the plot command.


add tabs (spaces) to strings in plots for Octave / Matlab

How can I add tabs (spaces) to strings in Octave plots? See the code below. It doesn't create a tab (there should be a tab between Signal and Max Freq in the plot).
It also produces these warning messages:
warning: text_renderer: skipping missing glyph for character '9'
warning: called from
annotation>update_textbox at line 1080 column 11
annotation at line 248 column 7
clf
plot(0:0)
var=456
t1='Signal ';
t2=[char(9), 'Max Freq'];
t3=[char(10), 'nextline',num2str(var)];
str=strcat(t1,t2,t3);
annotation('textbox',...
[0.15 0.65 0.3 0.15],...
'String',{str},...
'FontSize',14,...
'FontName','Arial',...
'LineStyle','--',...
'EdgeColor',[1 1 0],...
'LineWidth',2,...
'BackgroundColor',[0.9 0.9 0.9],...
'Color',[0.84 0.16 0]);
PS: I'm using Octave 4.2.2 on Ubuntu 18.04 64-bit.
I added t4 for blanks; it doesn't look very nice. Also note that I am using MATLAB, not Octave, so I didn't get your error. I'm not sure about that part.
clf
plot(0:0)
var=456
t1='Signal ';
t4 = blanks(5);
t2=[char(9),t4, 'Max Freq'];
t3=[char(10), 'nextline',num2str(var)];
str=strcat(t1,t2,t3);
annotation('textbox',...
[0.15 0.65 0.3 0.15],...
'String',{str},...
'FontSize',14,...
'FontName','Arial',...
'LineStyle','--',...
'EdgeColor',[1 1 0],...
'LineWidth',2,...
'BackgroundColor',[0.9 0.9 0.9],...
'Color',[0.84 0.16 0]);
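Two things may be worth knowing here. First, the warning suggests the text renderer has no glyph for char(9), so a tab is unlikely to show up as a gap; replacing it with spaces before building the label avoids that. Second, MATLAB's documentation states that strcat removes trailing whitespace from character-array inputs, so the trailing space in 'Signal ' is dropped, while plain concatenation with [t1 t2 t3] keeps it. The sketch below shows the tab-to-spaces idea in Python with the built-in str.expandtabs (a Python method, not a MATLAB/Octave function); the function name `label_without_tabs` and the tab size of 12 are assumptions for illustration.

```python
def label_without_tabs(text, tabsize=12):
    """Replace each tab with enough spaces to reach the next multiple of tabsize."""
    # expandtabs restarts its column count after each newline
    return text.expandtabs(tabsize)


label = "Signal\tMax Freq\nnextline" + str(456)
print(label_without_tabs(label))
```

With a proportional font such as Arial, spaces will not line up columns exactly, so some trial and error with the padding is likely.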

Gnuplot find local maximum of 3D data

I have a problem in gnuplot...
I am making a splot from my data points, which lie on discrete "lines" (see attached pic) at y values of 1, 1/2, 1/3, etc.
On every discrete "line" I would like to get the maximum Z value and its X and Y coordinates, and highlight them, or maybe fit a function to them, etc.
Here is my code:
set title "1/m vs mutation rate"
#set term pdfcairo size 6,4
set term x11
set xlabel "Mutation rate"
set ylabel "1/m"
set xrange[0.0001:0.05]
set yrange[1.0/30:1]
unset log x
set cbrange[0:0.35]
set zrange[0:1]
set palette defined ( 0 "green", 1 "blue", 2 "red")
#set view 78,348,1,1
set view map
set output "muemmeres500map.pdf"
splot 'muemmeres500.txt' u 1:2:3 with points pt 5 ps 1 palette, "muemmeres500.txt" every 30 using ($3==GPVAL_DATA_Z_MAX?$1:NaN):($3==GPVAL_DATA_Z_MAX?$2:NaN):3 title "max1" lc rgb'black' lw 4, "muemmeres500.txt" every 30::2 using ($3==GPVAL_DATA_Z_MAX?$1:NaN):($3==GPVAL_DATA_Z_MAX?$2:NaN):3 title "max2" lc rgb'black' lw 4, "muemmeres500.txt" every 30::3 using ($3==GPVAL_DATA_Z_MAX?$1:NaN):($3==GPVAL_DATA_Z_MAX?$2:NaN):3 title "max3" lc rgb'black' lw 4, "muemmeres500.txt" every 30::4 using ($3==GPVAL_DATA_Z_MAX?$1:NaN):($3==GPVAL_DATA_Z_MAX?$2:NaN):3 title "max4" lc rgb'black' lw 4, "muemmeres500.txt" every 30::5 using ($3==GPVAL_DATA_Z_MAX?$1:NaN):($3==GPVAL_DATA_Z_MAX?$2:NaN):3 title "max5" lc rgb'black' lw 4
unset output
And here is the data file: http://pastebin.com/umqGWtyy
As you can see in the picture, the "lines" of data points correspond to lines in the data file: for instance, the data points starting with the first line and then every 30th line correspond to the "line" with y value 1, the points starting with the second line and then every 30th line correspond to the "line" with y value 1/2, etc.
Therefore I wanted to get the maximum Z value from just those data...
I tried sed as well, but I failed...
So my problem is that it only finds the global maximum and not the other local ones... :( Please help me :)
Here is the picture:
I have no idea... I hope it is understandable, and sorry for my English... :)
GPVAL_DATA_Z_MAX doesn't seem to work for your problem, but you can use stats instead to find all the local maxima and then plot them all in a looped plot.
#Do it before setting the ranges (the column will be handled as an x column and it might get out of xrange)
do for [i=0:28]{
#Give an indexed prefix to each stat (so they *all* become accessible from outside the loop, like "A12_max" or "A25_min")
stats 'muemmeres500.txt' every 30::i u 3 nooutput prefix "A".i
}
#set all the things you need for the plot (including ranges)
...
splot 'muemmeres500.txt' u 1:2:3 with points pt 5 ps 1 palette, \
for [i=0:28] '' every 30::i u 1:2:($3==value("A".i."_max") ? $3 : NaN) notitle #t "Max".(i+1)
Note: the indices used by every start from zero.
This only works for plotting: you have all the maxima, but you don't have their X and Y coordinates yet.
You also have the indices of the maxima, so if you retrieve the X and Y values from the row given by A<n>_index_max (actually it's row 30*index+i, or the nth block's ith row), you have the nth maximum's position. To retrieve that row you can use stats again with every.
do for [i=0:28]{
stats 'muemmeres500.txt' every ::i:value("A".i."_index_max"):i:value("A".i."_index_max") u 1:2 nooutput prefix "P".i
}
If you do this right after getting the A<i>_ stats, you already have all the positions P<i>_max_x, P<i>_max_y and the Z values A<i>_max.
If you want you can print them to a file:
set print "maxima.dat"
do for [i=0:28]{
print value("P".i."_max_x"), (value("P".i."_max_y")), (value("A".i."_max"))
}
unset print
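The bookkeeping behind `every 30::i` and `A<i>_index_max` can be checked with a short Python sketch: rows i, i+30, i+60, ... form one "line", the k-th row of that group sits at file row 30*k + i (0-based, as the formula above assumes), and the maximum of each group gives the X, Y, Z to highlight. The function name `group_maxima` and the tiny stride-3 sample are made up for illustration; ties go to the first occurrence here, which may differ from gnuplot's `stats`.

```python
def group_maxima(rows, stride):
    """For each offset i in range(stride), find the row with the largest z among
    rows i, i + stride, i + 2*stride, ... (0-based, like gnuplot's `every stride::i`).

    rows: list of (x, y, z) tuples in file order.
    Returns a list of (i, file_row, (x, y, z)).
    """
    result = []
    for i in range(stride):
        group = rows[i::stride]  # same selection as `every stride::i`
        if not group:
            continue
        # k is the position within the group (comparable to index_max, assuming 0-based counting)
        k = max(range(len(group)), key=lambda j: group[j][2])
        result.append((i, stride * k + i, group[k]))
    return result


# Hypothetical sample: 2 x-values, 3 y-"lines" (y = 1, 1/2, 1/3), stride 3
sample = [
    (0.1, 1.0, 0.2), (0.1, 0.5, 0.3), (0.1, 1 / 3, 0.1),
    (0.2, 1.0, 0.5), (0.2, 0.5, 0.1), (0.2, 1 / 3, 0.4),
]
for i, file_row, (x, y, z) in group_maxima(sample, 3):
    print(f"line {i}: row {file_row}, x={x}, y={y:.3f}, z={z}")
```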

Collapse/mean data in Matlab with respect to a different set of data

I have two sets of data, but the sets have different sizes.
Each set contains the measurements itself (MeasA and MeasB, both double) and the time point (TimeA and TimeB, datenum or julian date) when the measuring happened.
Now I want to match the smaller data set to the bigger one. To do this, I want to average the data points of the bigger set around the time points of the smaller set, and finally do some correlation analysis.
Edit:
A small example of how the data might look:
MeasA = [2.7694 -1.3499 3.0349 0.7254 -0.0631];
TimeA = [0.2 0.4 0.7 0.8 1.3];
MeasB = [0.7147 -0.2050 -0.1241 1.4897 1.4090 1.4172 0.6715 -1.2075 0.7172 1.6302];
TimeB = [0.1 0.2 0.3 0.6 0.65 0.68 0.73 0.85 1.2 1.4];
And now I want to collapse MeasB and TimeB so that I get the mean of the measurements close to the time points in TimeA; for example, TimeB should look like this:
TimeB = [mean([0.1 0.2]) mean([0.3 0.6]) mean([0.65 0.68 0.73]) mean([0.85]) mean([1.2 1.4])]
TimeB = [0.15 0.45 0.69 0.85 1.3]
And then collapse MeasB like this too:
MeasB = [mean([0.7147 -0.2050]) mean([-0.1241 1.4897]) mean([1.4090 1.4172 0.6715]) mean([-1.2075]) mean([0.7172 1.6302])];
MeasB = [0.2549 0.6828 1.1659 -1.2075 1.1737]
The function interp1 is your friend.
You can get a new set of measurements for your set B, at the same times as set A, by using:
newMeasB = interp1( TimeB , MeasB , TimeA ) ;
The first 2 parameters are the original time and measurements of the set you want to reinterpolate; the last parameter is the new x axis (time in your example) at which you want the interpolated values to be calculated.
This way you do not end up with different time bases for your 2 sets of measurements, and you can compare them point by point.
Check the documentation of interp1 for more explanations and for options about the interpolation or any potential extrapolation.
edit:
The MATLAB documentation used to have a great illustration of the function, but I can't find it online, so here goes:
So with the linear method, if the value is interpolated exactly between 2 points, the function will return the exact mean. If the interpolation is done closer to one point than another, the value returned will be proportionally closer to the value of the closest point.
NaNs can appear at the ends (beginning or end of the returned vector) if TimeA is not completely covered by TimeB: the function cannot "interpolate" there because there is no anchor point on one side. However, the options of interp1 allow you to "extrapolate" outside of the input range, or to assign another default value instead of the NaNs.
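A dependency-free Python sketch of the linear case may help to see why interpolating halfway between two samples returns their mean, and why NaN appears outside the covered range. The function name `interp1_linear` is hypothetical; it only mimics interp1's default linear method without extrapolation. (NumPy users could reach for `numpy.interp`, which, however, clamps to the end values instead of returning NaN unless its `left`/`right` arguments are given.)

```python
import bisect
import math


def interp1_linear(x, y, xq):
    """Linear interpolation in the spirit of MATLAB's default interp1(x, y, xq).

    x must be strictly increasing. Query points outside [x[0], x[-1]] return NaN,
    mirroring interp1's default of not extrapolating.
    """
    out = []
    for q in xq:
        if q < x[0] or q > x[-1]:
            out.append(math.nan)  # no anchor point on one side
            continue
        j = bisect.bisect_right(x, q)  # first index with x[j] > q
        if j == len(x):  # q equals the last sample point
            out.append(y[-1])
            continue
        i = j - 1
        t = (q - x[i]) / (x[j] - x[i])  # 0 at x[i], 1 at x[j]
        out.append(y[i] + t * (y[j] - y[i]))
    return out


TimeA = [0.2, 0.4, 0.7, 0.8, 1.3]
TimeB = [0.1, 0.2, 0.3, 0.6, 0.65, 0.68, 0.73, 0.85, 1.2, 1.4]
MeasB = [0.7147, -0.2050, -0.1241, 1.4897, 1.4090, 1.4172, 0.6715, -1.2075, 0.7172, 1.6302]

newMeasB = interp1_linear(TimeB, MeasB, TimeA)
print([round(v, 4) for v in newMeasB])
```

TimeA = 1.3 lies exactly halfway between 1.2 and 1.4, so the last value is the plain mean of 0.7172 and 1.6302 (about 1.1737), the same as the hand-computed mean in the question. The other points differ from the hand-made bins, because interpolation weights by distance instead of averaging a bin.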

Extract only numerical data from MATLAB from a text file into a matrix

I have code that produces output files containing information about a mesh, which I need to analyse using MATLAB.
The output files look like this.
Vertex 1 1.3 -2.1 0 {z=(1.3e+0 -2.1e+0) mu=(1.4e-3 2.0e-3) uv=(-0.6 0.4)}
Vertex 2 1.4 -2.1 0 {z=(1.4e+0 -2.1e+0) mu=(2.8e-3 1.5e-3) uv=(-0.6 0.4)}
Vertex 3 -1.9 1.9 0 {z=(-1.9e+0 1.9e+0) mu=(-8.9e-2 1.4e-1) uv=( 0.7 -0.2)}
...
I would like my MATLAB code to read in this data file and form a matrix containing all the numbers in the order specified.
So, e.g., I would want the above 3 lines to be processed into the matrix
1 1.3 -2.1 0 1.3e+0 -2.1e+0 1.4e-3 2.0e-3 -0.6 0.4
2 1.4 -2.1 0 1.4e+0 -2.1e+0 2.8e-3 1.5e-3 -0.6 0.4
3 -1.9 1.9 0 -1.9e+0 1.9e+0 -8.9e-2 1.4e-1 0.7 -0.2
Is there some convenient MATLAB facility/command to do this?
I think you could use textscan for this:
Example data.txt:
Vertex 1 1.3 -2.1 0 {z=(1.3e+0 -2.1e+0) mu=(1.4e-3 2.0e-3) uv=(-0.6 0.4)}
Vertex 2 1.4 -2.1 0 {z=(1.4e+0 -2.1e+0) mu=(2.8e-3 1.5e-3) uv=(-0.6 0.4)}
Vertex 3 -1.9 1.9 0 {z=(-1.9e+0 1.9e+0) mu=(-8.9e-2 1.4e-1) uv=( 0.7 -0.2)}
Code:
fileID = fopen('data.txt');
C = textscan(fileID,'Vertex %f %f %f %f {z=(%f %f) mu=(%f %f) uv=(%f %f)}');
fclose(fileID);
mtxC = [C{:}];
Result:
mtxC =
1.0000 1.3000 -2.1000 0 1.3000 -2.1000 0.0014 0.0020 -0.6000 0.4000
2.0000 1.4000 -2.1000 0 1.4000 -2.1000 0.0028 0.0015 -0.6000 0.4000
3.0000 -1.9000 1.9000 0 -1.9000 1.9000 -0.0890 0.1400 0.7000 -0.2000
MATLAB Option (partly tested)
I had to do something similar with a CMM once and it was easy to do in Python (see below). You could use the MATLAB command regexp(text, expression) to match a regular expression that gets what you want. This will return string data though, which you can save to a data file and then load that data file, or convert to numbers using str2double.
To use this, you first have to get your data file into MATLAB as series of strings. You can do this with fgetl.
in_fid = fopen('my_input_file.txt', 'r');
out_fid = fopen('my_output_file.txt', 'w');
data = [];
line = fgetl(in_fid);
while ischar(line)
    match = regexp(line, '[+-]?\d+\.?\d*e?[+-]?\d*', 'match'); % find all matches (cell array of strings)
    % Write to text file: values separated by tabs, no trailing tab
    % (fprintf does not accept cell arrays, so join the strings first)
    fprintf(out_fid, '%s\n', strjoin(match, char(9)));
    % Or save to an array locally
    data = [data; str2double(match)];
    line = fgetl(in_fid); % grab the next line
end
fclose(in_fid);
fclose(out_fid);
% If you wrote to a text file, retrieve the data
data = dlmread('my_output_file.txt', '\t');
Note that this will not match numbers that begin with a decimal point with no preceding digit, e.g. .2. Also note that it will match any numbers that fit the pattern in any file you feed it, so it is general. For how to match floating point numbers, see this site (I changed the pattern a bit, though, to add the e portion for scientific notation).
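If numbers like .2 can occur, one possible fix (an untested assumption for the real mesh files) is to add a leading-dot alternative and to allow an exponent sign only after an e or E. With the pattern above, .2 yields just 2, 1-2 is read as one token, and an uppercase E splits +1.5E-3 into two numbers. A quick Python comparison; the names OLD and NEW are just labels, and the same constructs (character classes, non-capturing `(?:...)` groups) should also work in MATLAB's regexp:

```python
import re

# Pattern from the answer above, and a variant that also accepts ".2"-style numbers
OLD = re.compile(r"[+-]?\d+\.?\d*e?[+-]?\d*")
NEW = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

tricky = "a .2 b 1-2 c 4e5 d +1.5E-3"
print(OLD.findall(tricky))
print(NEW.findall(tricky))
```

On the mesh lines from the question both patterns give the same 10 numbers per line, so the variant only matters for inputs like the ones in `tricky`.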
I was able to test the regexp and str2double operations on a remote machine, and it looks like building your data array directly works. I was unable to test the file I/O portion, so there may be some bugs there still.
Python Option (my favorite)
I suggest using regular expressions in Python for this sort of thing. I had to do something similar with a CMM once and it was easy to do in Python with something like:
import re
# Make pattern to match scientific notation numbers
pattern = re.compile(r"[+-]?\d+\.?\d*e?[+-]?\d*")
with open("your_input_file.txt", "r") as in_file:
    with open("your_output_file.txt", "w") as out_file:
        for line in in_file:
            match = pattern.findall(line)  # find all matches in the line
            out_file.write("\t".join(match) + "\n")  # write the results to a line in your output
For a good introduction to regex in Python, see Dive Into Python 3, which I recommend to just about everybody. I tested this on your example file and it gives me:
1 1.3 -2.1 0 1.3e+0 -2.1e+0 1.4e-3 2.0e-3 -0.6 0.4
2 1.4 -2.1 0 1.4e+0 -2.1e+0 2.8e-3 1.5e-3 -0.6 0.4
3 -1.9 1.9 0 -1.9e+0 1.9e+0 -8.9e-2 1.4e-1 0.7 -0.2
in your_output_file.txt, so I think it works! The last step is then just dlmread('your_output_file.txt', '\t') in MATLAB, and you should be good to go.
If you want to get fancy, you could upgrade your Python script so that it can be called from the command line with your input and output files as arguments (look into sys.argv). This gets a bit more complicated, and it is easy enough to just open the script and change the filenames manually, unless you need to do this all the time on differently-named files, in which case arguments are a good route. There is a good example of this here.
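A minimal sketch of the command-line version, reusing the pattern from the script above. The names `convert_lines` and `main` are hypothetical; sys.argv is a plain list whose element 0 is the script name, followed by the arguments.

```python
import re
import sys

# Same pattern as in the script above
PATTERN = re.compile(r"[+-]?\d+\.?\d*e?[+-]?\d*")


def convert_lines(lines):
    """Turn each input line into a tab-separated line of the numbers found in it."""
    return ["\t".join(PATTERN.findall(line)) + "\n" for line in lines]


def main(argv):
    """Expects argv == [script_name, input_file, output_file]; returns an exit status."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} INPUT_FILE OUTPUT_FILE", file=sys.stderr)
        return 2
    in_path, out_path = argv[1], argv[2]
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        out_file.writelines(convert_lines(in_file))
    return 0
```

To run it as a script (e.g. `python extract_numbers.py mesh.txt numbers.txt`, file names hypothetical), add the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard at the bottom.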

How to read formatted file without delimiters in matlab?

I have a file with rows like this (the first two rows, with 25 columns each):
1921.300 . . < 0.030 . . . . . 550 1.6 1 Mrr1922 Jm 5
1973.220 158. 3. 0.240 0.002 . . 1.5 0.5 620 5.1 1 Lab1974 S 4
and a description (first 4 columns as an example):
term columns format description
date 008-017 f10.5 Observation date, in years.
tflag 019-019 a1 Flag for theta (position angle) measure. Flags
include:
: = uncertain/estimated (old code U)
L = originally published as nf, sp, etc.
(old code L)
Q = quadrant flipped 180deg from published value
(old code Q)
V = measure is vector separation along this angle
vector (previously used only in
interferometric catalog)
theta 020-026 f7.3 position angle, in degrees
terr 028-033 f6.3 published formal theta error, in degrees
etc.
-----------------------------------------------------------------------------
How can I read this file in MATLAB? I have no delimiters (only positions and formats). The basic idea is to load this file into my SQLite database, but (if I'm right) SQLite only works with delimiters.
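Since each field is defined by fixed character positions, a record can be cut into fields by slicing instead of splitting on delimiters. A minimal Python sketch under these assumptions: only the first four fields from the description are listed, positions are 1-based and inclusive as in the description, and a blank or "." field (as in the sample rows) is treated as missing. The names `FIELDS` and `parse_fixed_width` are hypothetical.

```python
# Column layout from the description above: (name, first column, last column, converter).
# Positions are 1-based and inclusive; only the first 4 fields are listed here.
FIELDS = [
    ("date", 8, 17, float),    # f10.5
    ("tflag", 19, 19, str),    # a1
    ("theta", 20, 26, float),  # f7.3
    ("terr", 28, 33, float),   # f6.3
]


def parse_fixed_width(line, fields=FIELDS):
    """Cut one record into fields by character position; blank or '.' fields become None."""
    record = {}
    for name, first, last, convert in fields:
        raw = line[first - 1:last].strip()  # 1-based inclusive -> Python slice
        record[name] = None if raw in ("", ".") else convert(raw)
    return record
```

The same idea works in MATLAB with character indexing, e.g. str2double(line(8:17)) for the date. The sample rows as pasted above do not line up with the listed positions, so check the slices against the raw file. Once parsed, the records can be written out with a delimiter (Python's csv module) for SQLite to import, or inserted directly with the standard sqlite3 module.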