average range of data and plot in gnuplot - range

I have this kind of data:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
I need to plot the label vs the average of each val column, like this:
label-> 1 2 3 4 5
val1 1.67E+07 2.20E+07 3.04E+07 7.89E+07 1.24E+08
val2 1.71E+07 2.35E+07 2.70E+07 7.80E+07 1.31E+08
val3 1.48E+07 2.15E+07 2.74E+07 7.18E+07 1.17E+08
val4 1.57E+07 2.07E+07 2.49E+07 7.46E+07 1.27E+08
val5 1.32E+07 2.23E+07 3.07E+07 7.50E+07 1.16E+08
mean 1.55E+07 2.20E+07 2.81E+07 7.57E+07 1.23E+08
Is there any possibility of perform this operation in gnuplot or should I keep attached to Excel?

You could do it using awk and gnuplot. Assume your example data (without mean row) is in data.txt.
Then you could calculate the mean in each column starting from the second column (from i=2) and the second row (record, or row, #1 -- NR==1 -> do not summate, but fill auxiliary array a with zeroes: a[i]=0.0). For that purpose one could use awk condition: if (NR==1)... else {...calculate the means...}.
Awk reads the data row-by-row. In each row, you iterate over fields and summate the data from column with number i into array element a[i]:
{for(i=2;i<=NF;i++) a[i]+=$i;}
When iterating over the first row (NR==1), we would ;
At the END of awk script (all rows processed), just divide by number of columns in your data NF-1 to calculate the mean values. Note, the code below assumes you have rectangular-formatted data (NF=const).
Also, save row column labels into label array:
if (NR==1) {for(i=2;i<=NF;i++) label[i]=$i; ... }
Then print the labels and mean values into the rows, one row for one label.
for(i=2;i<=NF;i++) {printf label[i]" "; print a[i]/(NF-1)}
The final data table would look that way:
1 15500000
2 22000000
3 28080000
4 75660000
5 123000000
Then you could plot one column against the other.
Note, the final data for gnuplot should be formatted in columns, not rows.
The following code performs the described operations:
gnuplot> unset key
gnuplot> plot "<export LC_NUMERIC=C; awk '{if (NR==1) {for(i=2;i<=NF;i++) label[i]=$i; a[i]=0.0;} else {for(i=2;i<=NF;i++) a[i]+=$i;};} END {for(i=2;i<=NF;i++) {printf label[i]\" \"; print a[i]/(NF-1)}};' data.txt"
Note, that spaces should be escaped with backslash character \ in the gnuplot.

Related

How to strip extra spaces when writing from dataframe to csv

Read in multiple sheets (6) from an xlsx file and created individual dataframes. Want to write each one out to a pipe delimited csv.
ind_dim.to_csv (r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
Currently outputs like this:
1|value1 |value2 |word1 word2 word3 etc.
Want to strip trailing blanks
Suggestion
Include the method .apply(lambda x: x.str.rstrip()) to your output string (prior to the .to_csv() call) to strip the right trailing blank from each field across the DataFrame. It would look like:
Change:
ind_dim.to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
To:
ind_dim.apply(lambda x: x.str.rstrip()).to_csv(r'/mypath/ind_dim_out.csv', index = None, header=True, sep='|')
It can be easily inserted to the output code string using '.' referencing. To handle multiple data types, we can enforce the 'object' dtype on import by including the argument dtype='str':
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
Or on the DataFrame itself by:
df = pd.DataFrame(df, dtype='str')
Proof
I did a mock-up where the .xlsx document has 5 sheets, with each sheet having three columns: The first column with all numbers except an empty cell in row 2; the second column with both a leading blank and a trailing blank on strings, an empty cell in row 3, and a number in row 4; and the third column * with all strings having a leading blank, and an empty value in row 4*. Integer indexes and integer columns have been included. The text in each sheet is:
0 1 2
0 11111 valueB1 valueC1
1 valueB2 valueC2
2 33333 valueC3
3 44444 44444
4 55555 valueB5 valueC5
This code reads in our .xlsx testing_xlsx_dtype.xlsx to the DataFrame dictionary ind_dim.
Next, it loops through each sheet using a for loop to place the sheet name variable as a key to reference the individual sheet DataFrame. It applies the .str.rstrip() method to the entire sheet/DataFrame by passing the lambda x: x.str.rstrip() lambda function to the .apply() method called on the sheet/DataFrame.
Finally, it outputs the sheet/DataFrame as a .csv with the pipe delimiter using .to_csv() as seen in the OP post.
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for sheet in ind_dim:
ind_dim[sheet].apply(lambda x: x.str.rstrip()).to_csv(sheet + '_ind_dim_out.csv', sep='|')
Returns:
|0|1|2
0|11111| valueB1| valueC1
1|| valueB2| valueC2
2|33333|| valueC3
3|44444|44444|
4|55555| valueB5| valueC5
(Note our column 2 strings no longer have the trailing space).
We can also reference each sheet using a loop that cycles through the dictionary items; the syntax would look like for k, v in dict.items() where k and v are the key and value:
# reads xlsx in
ind_dim = pd.read_excel('testing_xlsx_nums.xlsx', header=0, index_col=0, sheet_name=None, dtype='str')
# loops through sheets, applies rstrip(), output as csv '|' delimit
for k, v in ind_dim.items():
v.apply(lambda x: x.str.rstrip()).to_csv(k + '_ind_dim_out.csv', sep='|')
Notes:
We'll still need to apply the correct arguments for selecting/ignoring indexes and columns with the header= and names= parameters as needed. For these examples I just passed =None for simplicity.
The other methods that strip leading and leading & trailing spaces are: .str.lstrip() and .str.strip() respectively. They can also be applied to an entire DataFrame using the .apply(lambda x: x.str.strip()) lambda function passed to the .apply() method called on the DataFrame.
Only 1 Column: If we only wanted to strip from one column, we can call the .str methods directly on the column itself. For example, to strip leading & trailing spaces from a column named column2 in DataFrame df we would write: df.column2.str.strip().
Data types not string: When importing our data, pandas will assume data types for columns with a similar data type. We can override this by passing dtype='str' to the pd.read_excel() call when importing.
pandas 1.0.1 documentation (04/30/2020) on pandas.read_excel:
"dtypeType name or dict of column -> type, default None
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion."
We can pass the argument dtype='str' when importing with pd.read_excel.() (as seen above). If we want to enforce a single data type on a DataFrame we are working with, we can set it equal to itself and pass it to pd.DataFrame() with the argument dtype='str like: df = pd.DataFrame(df, dtype='str')
Hope it helps!
The following trims left and right spaces fairly easily:
if (!require(dplyr)) {
install.packages("dplyr")
}
library(dplyr)
if (!require(stringr)) {
install.packages("stringr")
}
library(stringr)
setwd("~/wherever/you/need/to/get/data")
outputWithSpaces <- read.csv("CSVSpace.csv", header = FALSE)
print(head(outputWithSpaces), quote=TRUE)
#str_trim(string, side = c("both", "left", "right"))
outputWithoutSpaces <- outputWithSpaces %>% mutate_all(str_trim)
print(head(outputWithoutSpaces), quote=TRUE)
Starting Data:
V1 V2 V3 V4
1 "Something is interesting. " "This is also Interesting. " "Not " "Intereting "
2 " Something with leading space" " Leading" " Spaces with many words." " More."
3 " Leading and training Space. " " More " " Leading and trailing. " " Spaces. "
Resulting:
V1 V2 V3 V4
1 "Something is interesting." "This is also Interesting." "Not" "Intereting"
2 "Something with leading space" "Leading" "Spaces with many words." "More."
3 "Leading and training Space." "More" "Leading and trailing." "Spaces."

Overwrite Specific Columns in a CSV file

I have an existing .csv file - data.csv - with 2 columns of data shown below.
COL1 COL2
1/1/1991 00:00 65.4
1/1/1991 01:00 59.2
I need to use either dlmwrite, csvwrite or even fprintf to open this file (data.csv), KEEP the first column without deleting it, and just write a new column of data overwriting the original 2nd column in the same file.
The final result will have 2 columns of data in file data.csv: the first column of data is the original column and the 2nd column is the new overwritten data from a calculation in the main program.
Final result:
COL1 COL2
1/1/1991 00:00 101.4
1/1/1991 01:00 96.3
The row offset for writing/overwriting the 2nd column is row=1, col=1 or the 2nd row and 2nd column.
I have tried xlswrite, csvwrite, dlmwrite, fprintf with varying results. The closest attempt was using csvwrite and it wrote the data in the correct row and file offset but it deleted all the original data including the 1st column of data! Here is the code part that fails to run as expected without overwriting existing data - thank you!
R = 1; C = 1;
rows = 241777;
ii = 1:rows -1
% place datesall and data2 into cell array to write to file
for jj =1:2
data(ii,jj) =csvread(strcat(dirstr,files(jj).name),R,C);
data2(ii,jj) = ((data(ii,jj)/100)*(1/24)*24*newTcaps(jj));
f = strcat(dirstr2,files2(jj).name);
%f=fopen(strcat(dirstr2,files2(jj).name),'a+');
%dlmwrite(f, data2(ii,jj) ,'delimiter', ',', 'roffset',1,'coffset',0)
csvwrite(f, data2(ii,jj),R,C);
%fprintf(f,[ ' ',data2(ii,jj), '\n' ]);
end

Converting a .txt file with 1 million digits of "e" into a vector in matlab

I have a text file with 1 million decimal digits of "e" number with 80 digits on each line excluding the first and the last line which have 76 and 4 digits and the file has 12501 lines. I want to convert it into a vector in matlab with each digit on each row. I tried num2str function, but the problem is that it gets converted like for example '7.1828e79' (13 characters). What can I do?
P.S.1: The first two lines of the text file (76 and 80 digits) are:
7182818284590452353602874713526624977572470936999595749669676277240766303535 47594571382178525166427427466391932003059921817413596629043572900334295260595630
P.S.2: I used "dlmread" and got a 12501x1 vector, with the first and second row of 7.18281828459045e+75 and 4.75945713821785e+79 and the problem is that when I use num2str for example for the first row value, I get: '7.182818284590453e+75' as a string and not the whole 76 digits. My aim was to do something like this:
e1=dlmread('e.txt');
es1=num2str(e1);
for i=1:12501
for j=1:length(es1(1,:))
a1((i-1)*length(es1(1,:))+j)=es1(i,j);
end
end
e_digits=a1.';
but I get a string like this:
a1='7.182818284590453e+754.759457138217852e+797.381323286279435e+799.244761460668082e+796.133138458300076e+791.416928368190255e+79 5...'
with 262521 characters instead of 1 million digits.
P.S.3: I think the problem might be solved if I can manipulate the text file in a way that I have one digit on each line and simply use dlmread.
Well, this is not hard, there are many ways to do it.
So first you want to load in your file as a Char Array using something simple like (you want a Char Array so that you can easily manipulate it to forget about the lines breaks) :
C = fileread('yourfile.txt'); %loads file as Char Array
D = C(~isspace(C)); %Removes SPACES which are line-breaks
Next, you want to actually append a SPACE between each char (this is because you want to use the num2str transform - and matlab needs to see the space), you can do this using a RESHAPE, a STRTRIM or simply a REGEX:
E = strtrim(regexprep(D, '.{1}', '$0 ')); %Add a SPACE after each Numeric Digit
Now you can transform it using str2num into a Vector:
str2num(E)'; %Converts the Char Array back to Vector
which will give you a single digit each row.
Using your example, I get a vector of 156 x 1, with 1 digit each row as you required.
You can get a digit per row like this
fid = fopen('e.txt','r');
c = textscan(fid,'%s');
c=cat(1,c{:});
c = cellfun(#(x) str2num(reshape(x,[],1)),c,'un',0);
c=cat(1,c{:});
And it is not the only possible way.
Could you please tell what is the final task, how do you plan using the array of e digits?

How to separate the elements of a matrix with comma in Matlab

I would like to separate each element in the matrix below with a comma.
1 2 3
4 5 6
7 8 9
Here's my attempt:
s= sprintf('%.17g,',matrix)
Ouput=1,2,3,4,5,6,7,8,9,
Desired output:
1, 2, 3
4, 5, 6
7, 8, 9
Thanks in advance for your suggestions.
You just need to specify the formatting of the entire first line:
s = sprintf('%.17g, %.17g, %.17g\n',matrix.')
MATLAB keeps re-using the formatting string as long as there are elements left in matrix.
To generalize this process, use the following expression:
s = sprintf([strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n'], matrix.')
So there's a lot going on in this one line - let's unpack it from inside out:
repmat({'%.17g'},1,size(matrix,2))
This sub-expression takes a single cell array of size 1x1, containing the string %.17g, and duplicates it into a cell array with dimensions specified by the next two arguments. We want to construct a cell array with a single row (hence the argument 1) representing all the format specifiers (%...) we need. Since we want one instance of %.17g for each column, we use size(matrix,2) as the last argument to repmat, since that returns the number of columns of the matrix.
As an example, if you have 5 columns, you get this:
>> repmat({'%.17g'},1,5)
ans =
'%.17g' '%.17g' '%.17g' '%.17g' '%.17g'
Next, since you want columns delimited by commas and spaces, you can use strjoin():
>> strjoin(repmat({'%.17g'},1,5), ', ')
ans =
%.17g, %.17g, %.17g, %.17g, %.17g
Note the use of a comma and several spaces as the second argument (the delimiting string) to strjoin(). Adjust the number of spaces according to your display needs. We need one more thing to be able to print a multi-line matrix - a carriage return. To do this, we use the fact that two strings in square brackets [] are concatenated by MATLAB:
[strjoin(repmat({'%.17g'},1,size(matrix,2)), ', ') '\n']
This produces the final formatting string that we need. All that is left, is to add the sprintf and pass in the matrix argument. As Rijul Sudhir pointed out, you do have to transpose your matrix because MATLAB will walk down a column to pair the matrix elements with the format specifiers.
EDIT: Stewie Griffin was correct about the transpose operation (.') - code has been corrected.

How do I compute the number of times a character appears in a character array in MATLAB? [duplicate]

This question already has answers here:
how to count unique elements of a cell in matlab?
(2 answers)
Closed 7 years ago.
I want to determine the number of times a character appears in a character array, excluding the time it appears at the last position.
How would I do this?
In Matlab computing environment, all variables are arrays, and strings are of type char (character arrays). So your Character Array is actually a string (Or in reality the other way around). Which means you can apply string methods on it to achieve your results. To find total count of occurrence of a character except on last place in a String/Character Array named yourStringVar you can do this:
YourSubString = yourStringVar(1:end-1)
//Now you have substring of main string in variable named YourSubString without the last character because you wanted to ignore it
numberOfOccurrence = length(find(YourSubString=='Character you want to count'))
It has been pointed out by Ray that length(find()) is not a good approach due to various reasons. Alternatively you could do:
numberOfOccurrence = nnz(YourSubString == 'Character you want to count')
numberOfOccurrence will give you your desired result.
What you can do is map each character into a unique integer ID, then determine the count of each character through histcounts. Use unique to complete the first step. The first output of unique will give you a list of all possible unique characters in your string. If you want to exclude the last time each character occurs in the string, just subtract 1 from the total count. Assuming S is your character array:
%// Get all unique characters and assign them to a unique ID
[unq,~,id] = unique(S);
%// Count up how many times we see each character and subtract by 1
counts = histcounts(id) - 1;
%// Show table of occurrences with characters
T = table(cellstr(unq(:)), counts.', 'VariableNames', {'Character', 'Counts'});
The last piece of code displays everything in a nice table. We ensure that the unique characters are placed as individual cells in a cell array.
Example:
>> S = 'ABCDABABCDEFFGACEG';
Running the above code, we get:
>> T
T =
Character Counts
_________ ______
'A' 3
'B' 2
'C' 2
'D' 1
'E' 1
'F' 1
'G' 1