How do I add category names to my seaborn boxplot when my data is from a python dictionary?

I have some data that is sitting in a python dictionary of lists.
How can I use the keys from the dictionary as category labels for this boxplot?
Here is a sample of the dictionary, plot_data:
plot_data = {
'Group1': [0.02339976, 0.03235323, 0.12835462, 0.10238375, 0.04223188],
'Group2': [0.02339976, 0.03235323, 0.12835462, 0.10238375, 0.04223188]
}
This code is probably a mess, but here it is:
data = plot_data.values()
#Get data in proper format
fixed_data = list(sorted(data))
#Set up the graph parameters
sns.set(context='notebook', style='whitegrid')
sns.axlabel(xlabel="Groups", ylabel="Y-Axis", fontsize=16)
#Plot the graph
sns.boxplot(data=fixed_data, whis=np.inf, width=.18)
sns.swarmplot(data=fixed_data, size=6, edgecolor="black", linewidth=.9)

Here is how to add category labels "manually":
import seaborn as sns, matplotlib.pyplot as plt, operator as op
plot_data = {
'Group1': range(10,16),
'Group2': range(5,15),
'Group3': range(1,5)
}
# sort keys and values together
sorted_keys, sorted_vals = zip(*sorted(plot_data.items(), key=op.itemgetter(1)))
# almost verbatim from question
sns.set(context='notebook', style='whitegrid')
sns.axlabel(xlabel="Groups", ylabel="Y-Axis", fontsize=16)
sns.boxplot(data=sorted_vals, width=.18)
sns.swarmplot(data=sorted_vals, size=6, edgecolor="black", linewidth=.9)
# category labels
plt.xticks(plt.xticks()[0], sorted_keys)
plt.show()
And here is the output: [image of the resulting plot]
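The same idea can be sketched for Python 3, where `range` objects cannot be ordered and sorting by value therefore raises a `TypeError`. This version sorts by key instead, which is a deliberate change from the answer above. The helper names `keys_and_values` and `draw` are made up for illustration, and the sketch assumes a seaborn release where `sns.axlabel` is not available, so the axis labels are set on the Axes object instead.

```python
def keys_and_values(plot_data):
    """Return (labels, values) in matching order, sorted by dictionary key."""
    # Keys are unique, so sorting the (key, value) pairs never compares the values.
    items = sorted(plot_data.items())
    labels = [key for key, _ in items]
    values = [list(vals) for _, vals in items]
    return labels, values


def draw(plot_data):
    """Plot one box per dictionary key (hypothetical helper, needs seaborn)."""
    # Imported here so the data preparation above works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    labels, values = keys_and_values(plot_data)
    sns.set(context="notebook", style="whitegrid")
    ax = sns.boxplot(data=values, width=.18)
    sns.swarmplot(data=values, size=6, edgecolor="black", linewidth=.9, ax=ax)
    # Category labels: one tick per group, in the same order as the data.
    ax.set_xticks(list(range(len(labels))))
    ax.set_xticklabels(labels)
    ax.set_xlabel("Groups", fontsize=16)
    ax.set_ylabel("Y-Axis", fontsize=16)
    plt.show()


plot_data = {
    'Group1': range(10, 16),
    'Group2': range(5, 15),
    'Group3': range(1, 5),
}
labels, values = keys_and_values(plot_data)
print(labels)
```

Calling `draw(plot_data)` renders the figure. Keeping the key/value pairing in one helper means the labels cannot drift out of step with the data, which is the risk when the values are sorted on their own.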

How do you use BeautifulSoup to fetch data in specific format?

I would like to write Python code that fetches the average volume of a stock from a given link using BeautifulSoup.
What I have done so far:
import bs4
import requests
from bs4 import BeautifulSoup
r=requests.get('https://finance.yahoo.com/quote/M/key-statistics?p=M')
soup=BeautifulSoup(r.content,"html.parser")
# p = soup.find_all(class_="Fw(500) Ta(end) Pstart(10px) Miw(60px)")[1].get_text
# p = soup.find_all('td')[2].get_text
# p = soup.find_all('table', class_='W(100%) Bdcl(c)')[70].tr.get_text
Anyway, I was able to get that number directly from the Google Chrome console using this command:
document.querySelectorAll('table tbody tr td')[71].innerText
"21.07M"
Please help with a basic explanation; I only know a little about the DOM.
You can use this logic to make it easier:
Use find_all to find all the spans in the html file
Search the spans for the correct label (Avg Vol...)
Use parent to go up the hierarchy to the full table row
Use find_all again from the parent to get the last cell which contains the value
Here is the updated code:
import bs4
import requests
from bs4 import BeautifulSoup
r=requests.get('https://finance.yahoo.com/quote/M/key-statistics?p=M')
soup=BeautifulSoup(r.content,"html.parser")
p = soup.find_all('span')
for s in p: # each span
    if s.text == 'Avg Vol (10 day)': # starting cell
        pnt = s.parent.parent # up 2 levels, table row
        print(pnt.find_all('td')[-1].text) # last table cell
Output
21.76M
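The traversal above can be tried offline with a small sketch like the one below. The HTML snippet is a hypothetical, simplified stand-in for the real page (its second row and its values are made up), and `value_for_label` is an illustrative helper name. It uses `find_parent('tr')` rather than a fixed number of `.parent` hops, on the assumption that the nesting depth of the live page may differ from the sample.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the statistics table on the real page.
SAMPLE_HTML = """
<table><tbody>
  <tr><td><span>Avg Vol (3 month)</span></td><td>19.50M</td></tr>
  <tr><td><span>Avg Vol (10 day)</span></td><td>21.76M</td></tr>
</tbody></table>
"""


def value_for_label(html, label):
    """Return the text of the last cell in the row whose span matches label."""
    soup = BeautifulSoup(html, "html.parser")
    for span in soup.find_all("span"):            # each span in the document
        if span.get_text(strip=True) == label:    # the label cell
            row = span.find_parent("tr")          # climb to the enclosing table row
            if row is not None:
                return row.find_all("td")[-1].get_text(strip=True)  # last cell
    return None  # label not present


print(value_for_label(SAMPLE_HTML, "Avg Vol (10 day)"))
```

Returning `None` when the label is missing makes it easier to notice that the page layout or the label text has changed, instead of failing on an index error.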

Trouble importing csv-data within MATLAB

I am trying to read in a CSV file that contains daily data on EUR/USD exchange rates, including dates specifying year, month and day. The problem is that readtable(filename) puts single quotes around all table entries, which prevents me from using the data at all.
Detect import options:
opts = detectImportOptions('EUR_USD Historische Data.csv');
Read in the data:
EUR_USD = readtable('EUR_USD Historische Data.csv');
Extract the dates and convert them to a datetime variable:
dt = EUR_USD(:,1);
dates = datetime(dt,'InputFormat','yyyyMMdd');
% Does not work because of single quotes
I was able to extract the closing prices and make them workable, but I am not sure if this is an elegant way of doing so:
closing_prices = str2double(table2array(EUR_USD(:,5)));
Ultimately the goal is to make the data workable. I need to compare two columns with datetime-variables and if dates do not match between the two columns I need to remove that entry such that in the end both columns match.
This is the vector with dates:
[image: dates vector, wrong]
I need it to look like this:
[image: dates vector, correct]
I think all you need to do is remove the ' character in order to read the data into datetime correctly. Look at the following example:
%stringz is the same as dt here: just the string data
T = table;
T.stringz = string(['''string1'''; '''string2'''; '''string3''']);
stringz = T.stringz;
%Run the for loop to remove the ' chars
for i = 1:length(stringz)
    strval = char(stringz(i,1));
    strval = strval(2:end-1);
    strmat(i,1) = string(strval);
end
%Then load data into datetime after this for loop
dates = datetime(strmat,'InputFormat','yyyyMMdd');
strmat is then a 3x1 string array with no ' characters around each string.

dicom header personal information conversion to a .txt file

I have a series of DICOM images which I want to anonymize. I found a few MATLAB scripts and some programs which do the job, but none of them export a .txt file of the removed personal information. I was wondering if there is a function which can also save the removed personal information of a DICOM image in .txt format for future use. Also, I am trying to create a table which maps each new image ID to the subject's real name (subject's real name = personal-information-removed image ID).
Any thoughts?
Thanks for considering my request!
I'm guessing you only want to output to your text file the fields that are changed by anonymization (either modified, removed, or added). First, you may want to modify some dicomanon options to reduce the number of changes, in particular passing the arguments 'WritePrivate', true to ensure private extensions are kept.
Next, you can perform the anonymization, saving structures of pre- and post-anonymization metadata using dicominfo:
preAnonData = dicominfo('input_file.dcm');
dicomanon('input_file.dcm', 'output_file.dcm', 'WritePrivate', true);
postAnonData = dicominfo('output_file.dcm');
Then you can use fieldnames and setdiff to find fields that are removed or added by anonymization, and add them to the post-anonymization or pre-anonymization data, respectively, with a nan value as a place holder:
preFields = fieldnames(preAnonData);
postFields = fieldnames(postAnonData);
removedFields = setdiff(preFields, postFields);
for iField = 1:numel(removedFields)
    postAnonData.(removedFields{iField}) = nan;
end
addedFields = setdiff(postFields, preFields);
for iField = 1:numel(addedFields)
    preAnonData.(addedFields{iField}) = nan;
end
It will also be helpful to use orderfields so that both data structures have the same ordering for their field names:
postAnonData = orderfields(postAnonData, preAnonData);
Finally, now that each structure has the same fields in the same order we can use struct2cell to convert their field data to a cell array and use cellfun and isequal to find any fields that have been modified by the anonymization:
allFields = fieldnames(preAnonData);
preAnonCell = struct2cell(preAnonData);
postAnonCell = struct2cell(postAnonData);
index = ~cellfun(@isequal, preAnonCell, postAnonCell);
modFields = allFields(index);
Now you can create a table of the changes like so:
T = table(modFields, preAnonCell(index), postAnonCell(index), ...
'VariableNames', {'Field', 'PreAnon', 'PostAnon'});
And you could use writetable to easily output the table data to a text file:
writetable(T, 'anonymized_data.txt');
Note, however, that if any of the fields in the table contain vectors or structures of data, the formatting of your output file may look a little funky (i.e. lots of columns, most of them empty, except for those few fields).
One way to do this is to store the tags before and after anonymisation and use these to write your text file. In Matlab, dicominfo() will read the tags into a structure:
% Get tags before anonymization
tags_before = dicominfo(file_in);
% Anonymize
dicomanon(file_in, file_out); % Need to set tags values where required
% Get tags after anonymization
tags_after = dicominfo(file_out);
% Do something with the two structures
disp(['Patient ID:', tags_before.PatientID ' -> ' tags_after.PatientID]);
disp(['Date of Birth:', tags_before.PatientBirthDate ' -> ' tags_after.PatientBirthDate]);
disp(['Family Name:', tags_before.PatientName.FamilyName ' -> ' tags_after.PatientName.FamilyName]);
You can then write out the before/after fields into a text file. You'd need to modify dicomanon() to choose your own values for the removed fields, since by default they are set to empty.

How to display selected entries of an array of structures in MATLAB

Suppose we have an array of structures. Each structure has the fields name, price and cost.
Suppose the array A has size n x 1. If I'd like to display the names of the 1st, 3rd and the 4th structure, I can use the command:
A([1,3,4]).name
The problem is that it prints the following thing on screen:
ans =
name_of_item_1
ans =
name_of_item_3
ans =
name_of_item_4
How can I remove those ans = things? I tried:
disp(A([1,3,4]).name);
only to get an error/warning.
By doing A([1,3,4]).name, you are returning a comma-separated list. This is equivalent to typing in the following in the MATLAB command prompt:
>> A(1).name, A(3).name, A(4).name
That's why you'll see the MATLAB command prompt give you ans = ... three times.
If you want to display all of the strings together, consider using strjoin to join all of the names together and we can separate the names by a comma. To do this, you'll have to place all of these in a cell array. Let's call this cell array names. As such, if we did this:
names = {A([1,3,4]).name};
This is the same as doing:
names = {A(1).name, A(3).name, A(4).name};
This will create a 1 x 3 cell array of names and we can use these names to join them together by separating them with a comma and a space:
names = {A([1,3,4]).name};
out = strjoin(names, ', ');
You can then show what this final string looks like:
disp(out);
You can use:
[A([1,3,4]).name]
which will, however, concatenate all of the names into a single string.
The better way is to make a cell array using:
{ A([1,3,4]).name }

Writing columns of data into a text file

I'm using the following code to write the vectors sortedthresh_strain and probofdetectionanddelamprop1 into a text file. However, the text file output is as follows:
0.0030672 1.6592e-080.0033489 5.1721e-080.0034143
where 0.0033489 5.1721e-08 should be on the next line of the text file. i.e. It should be:
0.0030672 1.6592e-08
0.0033489 5.1721e-08
I am unsure of how to do this.
Edit: Using the proposed answer:
0.0049331 0.0049685 0.0049894 0.0050094 0.005156 0.0051741 0.0052139 0.0053399 0.0054486 0.0056022 7.0711e-21 3.0123e-19
The 2nd column is required to contain:
7.0711e-21
3.0123e-19
And,
dlmwrite('THRESHUNCERTAINTYFINALPLOTLSIGMA5.dat',[sortedthresh_strain,probofdetectionanddelamprop1],'delimiter', '\t');
If you have R2013b or later, see this answer. If you have an earlier version but have the statistics toolbox you can use the dataset object to do this very easily just like tables in R2013b. Using dataset:
data1 = {'a','b','c'}'
data2 = [1, 2, 3]'
ds = dataset(data1, data2)
export(ds, 'file', 'data.txt')
If you don't want the variable names in the result text file you can use 'WriteVarNames', false in your call to export.
Good luck!
I think your data is in row vectors, but should be column vectors for it to work like you want.
Just add a transpose with '.
dlmwrite('THRESHUNCERTAINTYFINALPLOTLSIGMA5.dat',[sortedthresh_strain',probofdetectionanddelamprop1'],'delimiter', '\t');