How to draw box plot of columns of a table using dc.js - boxplot

I have a table as follows:
The number of experiments is arbitrary, but each column name has the prefix "client_" followed by the client number.
I want to draw a box plot of values against the "client_#" using dc.js. The table is a csv file which is loaded using d3.csv().
There are examples using ordinary groups, however I need each column to be displayed as its own boxplot and none of the examples do this. How can I create a boxplot from each column?

This is very similar to this question:
dc.js - how to create a row chart from multiple columns
Many of the same caveats apply - it will not be possible to filter (brush) using this chart, since every row contributes to every box plot.
The difference is that we will need all the individual values, not just the sum total.
I didn't have an example to test with, but hopefully this code will work:
function column_values(dim, cols) {
    var _groupAll = dim.groupAll().reduce(
        function(p, v) { // add
            cols.forEach(function(c) {
                p[c].splice(d3.bisectLeft(p[c], v[c]), 0, v[c]);
            });
            return p;
        },
        function(p, v) { // remove
            cols.forEach(function(c) {
                p[c].splice(d3.bisectLeft(p[c], v[c]), 1);
            });
            return p;
        },
        function() { // init
            var p = {};
            cols.forEach(function(c) {
                p[c] = [];
            });
            return p;
        });
    return {
        all: function() {
            // or _.pairs, anything to turn the object into an array
            return d3.map(_groupAll.value()).entries();
        }
    };
}
As with the row chart question, we'll need to group all the data in one bin using groupAll - ordinary crossfilter bins won't work since every row contributes to every bin.
The init function creates an object which will be keyed by column name. Each entry is an array of the values in that column.
The add function goes through all the columns and inserts each column's value into each array in sorted order.
The remove function finds the value using binary search and removes it.
When .all() is called, the {key,value} pairs will be built from the object.
The column_values function takes either a dimension or a crossfilter object for the first parameter, and an array of column names for the second parameter. It returns a fake group with a bin for each client, where the key is the client name and the value is all of the values for that client in sorted order.
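The reducer logic can also be tried without crossfilter or d3. The following is a minimal sketch, assuming the column values are numeric (d3.csv() yields strings, so the sketch coerces them with Number). The names bisectLeft, makeReducers and the sample rows are hypothetical stand-ins written for illustration, not part of dc.js or crossfilter:

```javascript
// Hypothetical stand-in for d3.bisectLeft: first index at which x can be
// inserted into the sorted array a while keeping it sorted.
function bisectLeft(a, x) {
  var lo = 0, hi = a.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (a[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Builds the add/remove/init functions for the given column names.
function makeReducers(cols) {
  return {
    add: function(p, v) {
      cols.forEach(function(c) {
        var x = Number(v[c]); // assumption: CSV strings hold numbers
        p[c].splice(bisectLeft(p[c], x), 0, x);
      });
      return p;
    },
    remove: function(p, v) {
      cols.forEach(function(c) {
        var x = Number(v[c]);
        var i = bisectLeft(p[c], x);
        if (p[c][i] === x) p[c].splice(i, 1); // only remove an exact match
      });
      return p;
    },
    init: function() {
      var p = {};
      cols.forEach(function(c) { p[c] = []; });
      return p;
    }
  };
}

// Made-up sample rows shaped like d3.csv() output (all values are strings).
var rows = [
  { client_1: '10', client_2: '3' },
  { client_1: '2',  client_2: '7' },
  { client_1: '5',  client_2: '1' }
];
var r = makeReducers(['client_1', 'client_2']);
var state = rows.reduce(r.add, r.init());
var afterAdd = JSON.stringify(state);    // each column sorted ascending
r.remove(state, rows[0]);                // simulates a row being filtered out
var afterRemove = JSON.stringify(state);
```

Coercing to numbers matters because comparing the raw strings would order '10' before '2', which would leave the arrays in the wrong order for computing quartiles.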
You can use column_values like this:
var boxplotColumnsGroup = column_values(cf, ['client_1', 'client_2', 'client_3', 'client_4']);
boxPlot
.dimension({}) // no valid dimension as explained in earlier question
.group(boxplotColumnsGroup);
If this does not work, please attach an example so we can debug this together.


Equivalent functionality of Matlab sortrows() in MathNET.Numerics?

Is there a MathNET.Numerics equivalent of Matlab’s sortrows(A, column), where A is a Matrix<double>?
To recall Matlab's documentation:
B = sortrows(A,column) sorts A based on the columns specified in the
vector column. For example, sortrows(A,4) sorts the rows of A in
ascending order based on the elements in the fourth column.
sortrows(A,[4 6]) first sorts the rows of A based on the elements in
the fourth column, then based on the elements in the sixth column to
break ties.
Similar to my answer to your other question, there's nothing built in, but you could use LINQ's OrderBy() method on an enumerable of the matrix's rows. Given a Matrix<double> x,
x.EnumerateRows() returns an IEnumerable<Vector<double>> of the matrix's rows. You can then sort this enumerable by the first element of each row (if that's what you want).
In C#,
var y = Matrix<double>.Build.DenseOfRows(x.EnumerateRows().OrderBy(row => row[0]));
Example
Writing this as an extension method:
public static Matrix<double> SortRows(this Matrix<double> x, int sortByColumn = 0, bool desc = false) {
    if (desc)
        return Matrix<double>.Build.DenseOfRows(x.EnumerateRows().OrderByDescending(row => row[sortByColumn]));
    else
        return Matrix<double>.Build.DenseOfRows(x.EnumerateRows().OrderBy(row => row[sortByColumn]));
}
which you can then call like so:
var y = x.SortRows(0); // Sort by first column
Here's a big fiddle containing everything

Copy data from one sheet, add current date to each new row, and paste

I've done some reading but my limited knowledge on scripts is making things difficult. I want to:
1. Copy a data range with a variable number of rows and known columns from one sheet titled 'Download'
2. Paste that data in a new sheet titled 'Trade History' from Column B
3. In the new sheet, add today's date formatted (DD/MM/YYYY) in a new column A for each record copied
The data in worksheet 'Download' uses IMPORTHTML
The data copied from Download to store a historical record needs a date in Column A
I've managed to get 1 and 2 working, but can't work out the 3rd. See current script below.
function recordHistory() {
var ss = SpreadsheetApp.getActive(),
sheet = ss.getSheetByName('Trade_History');
var source = sheet.getRange("a2:E2000");
ss.getSheetByName('Download').getRange('A2:E5000').copyTo(sheet.getRange(sheet.getLastRow()+1, 2))
}
You need to use Utilities.formatDate() to format today's date to DD/MM/YYYY.
Because you're copying one set of values, and then next to it (in column A), pasting another, I altered your code a bit as well.
function recordHistory() {
  var ss = SpreadsheetApp.getActive(),
      destinationSheet = ss.getSheetByName('Trade_History');
  var sourceData = ss.getSheetByName('Download').getDataRange().getValues();
  for (var i=0; i<sourceData.length; i++) {
    var row = sourceData[i];
    var today = Utilities.formatDate(new Date(), 'GMT+10', 'dd/MM/yyyy'); // AEST is GMT+10
    row.unshift(today); // Places the date at the beginning of the row array
  }
  destinationSheet.getRange(destinationSheet.getLastRow()+1, // Append to existing data
                            1, // Start at Column A
                            sourceData.length, // Number of new rows to be added (determined from source data)
                            sourceData[0].length // Number of new columns to be added (source columns plus the date column)
                           ).setValues(sourceData); // Write the values
}
Start by getting the values of the source data. This returns an array that can be looped through to add today's date. Once the date has been added to all of the source data, determine the range boundaries for where it will be printed. Rather than simply selecting the start cell as could be done with the copyTo() method, the full dimensions now have to be defined. Finally, print the values to the defined range.
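The array manipulation at the heart of this script can be tried outside Apps Script. A minimal sketch, assuming plain JavaScript with no SpreadsheetApp or Utilities available; formatDDMMYYYY and prependDate are hypothetical helper names, the sample rows are made up, and the date is formatted from the local-time fields of a Date rather than a fixed time zone:

```javascript
// Hypothetical helper: formats a Date as DD/MM/YYYY using its local-time fields.
function formatDDMMYYYY(date) {
  var dd = String(date.getDate()).padStart(2, '0');
  var mm = String(date.getMonth() + 1).padStart(2, '0'); // months are 0-based
  return dd + '/' + mm + '/' + date.getFullYear();
}

// Hypothetical helper: returns new rows with the stamp as the first cell,
// leaving the input rows untouched (unlike unshift, which mutates them).
function prependDate(rows, stamp) {
  return rows.map(function(row) {
    return [stamp].concat(row);
  });
}

// Made-up sample rows standing in for getValues() output.
var sample = [['AAPL', 10, 150.5], ['MSFT', 4, 310]];
var stamped = prependDate(sample, formatDDMMYYYY(new Date(2024, 0, 5)));
```

Each stamped row is one cell wider than its source row, which is why the destination range has to be sized from the modified data rather than from the source sheet.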

JSCalc. Return values from a list

I'm trying to return the values from a 'repeating item' input, but 'inputs.value' doesn't work. I think I need to create a loop and index for every item on the list, but I'm not sure.
I think I need to create a loop and index for every item on the list..
Yes, you are right about that. The 'Repeating Item' is internally stored as an array of objects, so you need to iterate over that array to process it. The individual items are objects, so they are not available on the inputs object directly, but via inputs.lineitems, where lineitems is the property name of the repeating item prototype.
For example:
You are creating a repeating items list of items which you want to order. So, you have two inputs inside the repeating items prototype, say itemName and itemQuantity. You name the repeating items property name as LineItems. You want to display it as an output table and also display total quantity ordered. The output table is named Orders and the total label is named Total.
You could then iterate this to process it further, like this:
var result = [], totalItems = 0;
Where result is an array that you would want to map to your output table, and totalItems is where you would cache the total quantity.
inputs.LineItems.forEach(function(item, idx) {
    totalItems += item.itemQuantity;
    result.push({
        'ItemNumber': idx + 1,
        'Item': item.itemName,
        'Quantity': item.itemQuantity
    });
});
Here you iterate over the repeating items via inputs.LineItems and increment the total accordingly. You also build the result array to map to the Orders table later on.
This is what you return:
return {
    Total: totalItems,
    Orders: result
};
Here, Total is the output label you defined earlier, and Orders is the output table name you defined earlier.
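Put together, the pieces form one script. A minimal sketch, wrapped in a hypothetical function named calculate so that it can be run outside JSCalc (in JSCalc only the body would be used, with inputs supplied by the calculator). The Number() coercion is an addition, on the assumption that a quantity might arrive as a string or be left blank:

```javascript
// Hypothetical wrapper; in JSCalc only the body would be used.
function calculate(inputs) {
  var result = [], totalItems = 0;
  inputs.LineItems.forEach(function(item, idx) {
    var qty = Number(item.itemQuantity) || 0; // treat blank/invalid as 0
    totalItems += qty;
    result.push({
      'ItemNumber': idx + 1,
      'Item': item.itemName,
      'Quantity': qty
    });
  });
  return { Total: totalItems, Orders: result };
}

// Made-up sample input shaped like a repeating item with two fields.
var output = calculate({
  LineItems: [
    { itemName: 'Pens', itemQuantity: 3 },
    { itemName: 'Paper', itemQuantity: '2' }
  ]
});
```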
Here is a demo for you to understand it better:
https://jscalc.io/calc/YicDJYCSlYTGYFMS
To see the source, just click on the ellipsis (three dots shown after the 'Powered by JSCalc.io' text) and click "make a copy".
Hope this helps.

how to get a parentNode's index i using d3.js

Using d3.js, were I after (say) some value x of a parent node, I'd use:
d3.select(this.parentNode).datum().x
What I'd like, though, is the data (ie datum's) index. Suggestions?
Thanks!
The index of an element is only well-defined within a collection. When you're selecting just a single element, there's no collection and the notion of an index is not really defined. You could, for example, create a number of g elements and then apply different operations to different (overlapping) subsets. Any individual g element would have several indices, depending on the subset you consider.
In order to do what you're trying to achieve, you would have to keep a reference to the specific selection that you want to use. Having this and something that identifies the element, you can then do something like this.
var value = d3.select(this.parentNode).datum().x;
var index = -1;
selection.each(function(d, i) { if(d.x == value) index = i; });
This relies on having an attribute that uniquely identifies the element.
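The lookup itself does not depend on d3. A minimal sketch over a plain array, standing in for the data bound to the saved selection (calling selection.data() with no arguments returns such an array); indexOfDatum and the sample data are hypothetical, and x is assumed to be unique:

```javascript
// Hypothetical helper: index of the first datum whose x equals value, or -1.
function indexOfDatum(data, value) {
  var index = -1;
  data.forEach(function(d, i) {
    if (index === -1 && d.x === value) index = i;
  });
  return index;
}

// Made-up sample data with a unique x per element.
var data = [{ x: 'a' }, { x: 'b' }, { x: 'c' }];
var found = indexOfDatum(data, 'b');
var missing = indexOfDatum(data, 'z');
```

If x is not unique, the sketch returns the first match, whereas the selection.each() loop above would end up with the last one.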
If you have only one selection, you could simply save the index as another data attribute and access it later.
var gs = d3.selectAll("g").data(data).enter().append("g")
    .each(function(d, i) { d.index = i; });
var something = gs.append(...);
something.each(function() {
    var index = d3.select(this.parentNode).datum().index;
});

Using cell value as reference to sheet in formulas

I have a spreadsheet with three sheets. Two are called 2012 and 2011 and have a bunch of similar data. The last sheet does comparisons between the data.
To be able to choose the year, I'm using a cell (D1) where I can write either 2011 or 2012. The formulas then use the INDIRECT function to include this cell as part of the reference.
INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!F:F")
This is not a pretty solution and makes the formula quite long and complex.
=IFERROR(SUM(FILTER( INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!M:M") ; (INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A4)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A5)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A6)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A7)+(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B")=$A8); MONTH(INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!D:D"))=$B$1 ; INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!F:F")=D$3));0)
Is there a better way of doing this?
I've tried creating a separate spreadsheet for the calculations sheet and importing (IMPORTRANGE) the data from the two sheets together on one sheet with VMERGE (a custom function from the script gallery), but there is quite a lot of data in these two sheets and the import takes a long time. Any changes (like changing the year) also take a long time to recalculate.
Database functions tend to be cleaner when doing this kind of thing.
https://support.google.com/docs/bin/static.py?hl=en&topic=25273&page=table.cs&tab=1368827
Database functions take a while to learn, but they are powerful.
Or
You could put INDIRECT(CHAR(39)&$D$1&CHAR(39)&"!B:B") in a cell on its own.
I think that you have two years of information where the schema is identical (column C has the same type of information on both sheets). Also, I'm assuming that column B tracks the year.
If so, consider holding all of your information on one sheet and use the spreadsheet function "QUERY" to create views.
For instance, this formula returns all the cells between A1:E from a sheet named "DataSheet" where the values in column B = 2010.
=QUERY(DataSheet!A1:E; "SELECT * WHERE B = 2010";1)
Sometimes there is a really good reason to have the data stored on two sheets. If so, use one of the vMerge functions in the script gallery to assemble a working sheet. Then create views and reports from the working sheet.
function VMerge() {
  var maxw=l=0;
  var minw=Number.MAX_VALUE;
  var al=arguments.length ;
  for( i=0 ; i<al ; i++){
    if( arguments[i].constructor == Array )l =arguments[i][0].length ;
    else if (arguments[i].length!=0) l = 1 ; // literal values count as array with a width of one cell, empty cells are ignored!
    maxw=l>maxw?l:maxw;
    minw=l<minw?l:minw;
  }
  if( maxw==minw) { /* when largest width equals smallest width all are equal */
    var s = new Array();
    for( i=0 ; i<al ; i++){
      if( arguments[i].constructor == Array ) s = s.concat( arguments[i].slice() )
      else if (arguments[i].length!=0) s = s.concat( [[arguments[i]]] )
    }
    if ( s.length == 0 ) return null ; else return s //s
  }
  else return "#N/A: All data ranges must be of equal width!"
}
Hope this helps.