map and sort from alphanumeric data - alphanumeric

I have the file "global power plants" with a column "capacity_in_mw" (with numbers 30, 100, 45, ...) and another column is "primary_fuel" (Coal, Hydro, Oil, Solar, Nuclear, Wind, Coal).
I can generate a map in function of "capacity_in_mw" by setting the condition
plotdata = data.query('capacity_in_mw > 50')
Now, I would like to generate a map in function of "primary_fuel". Because data is alphanumeric, how do I set up the condition?
Furthermore, when making the map, to assign color='black' for Coal, color='green' for Wind, color='yellow' for Solar, ... etc.

Python.
I am a novice, but I think I found the solution. It is more of an issue of syntax, to use the == in the query to identify the alphanumeric value.
plotdata = data.query('primary_fuel == "Hydro"')
Also, lesson learned in the future to dig more before posting a question.

Related

Is there a range data structure in Scala?

I'm looking for a way to handle ranges in Scala.
What I need to do is:
given a set of ranges and a range(A) return the range(B) where range(A) intersect range (B) is not empty
given a set of ranges and a range(A) remove/add range(A) from/to the set of ranges.
given range(A) and range(B) create a range(C) = [min(A,B), max(A,B)]
I saw something similar in java - http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/collect/RangeSet.html
Though subRangeSet returns only the intersect values and not the range in the set (or list of ranges) that it intersects with.
RangeSet rangeSet = TreeRangeSet.create();
rangeSet.add(Range.closed(0, 10));
rangeSet.add(Range.closed(30, 40));
Range range = Range.closed(12, 32);
System.out.println(rangeSet.subRangeSet(range)); //[30,32] (I need [30,40])
System.out.println(range.span(Range.closed(30, 40))); //[12,40]
There is an Interval[A] type in the spire math library. This allows working with ranges of arbitrary types that define an Order. Boundaries can be inclusive, exclusive or omitted. So e.g. (-∞, 0.0] or [0.0, 1.0) would be possible intervals of doubles.
Here is a library intervalset for working with sets of non-overlapping intervals (IntervalSeq or IntervalTrie) as well as maps of intervals to arbitrary values (IntervalMap).
Here is a related question that describes how to use IntervalSeq with DateTime.
Note that if the type you want to use is 64bit or less (basically any primitive), IntervalTrie is extremely fast. See the Benchmarks.
As Tzach Zohar has mentioned in the comment, if all you need is range of Int - go for scala.collection.immutable.Range:
val rangeSet = Set(0 to 10, 30 to 40)
val r = 12 to 32
rangeSet.filter(range => range.contains(r.start) || range.contains(r.end))
If you need it for another underlying type - implement it by yourself, it's easy for your usecase.

How to assign -INFINITY amongst others to multiple variables?

I have this line of pseudocode that I am trying to translate in Matlab:
(maxSum, maxStartIndex, maxEndIndex) := (-INFINITY, 0, 0)
I have translated the second and third variables simply assigning a 0:
maxStartIndex=0;
maxEndIndex=0;
How should I translate this line?
maxSum= -INFINITY
I have not find reference for this.
Probably you're just looking for deal:
[maxSum, maxStartIndex, maxEndIndex] = deal(-inf,0,0)

Look for multiple values in a cell array at the same time in Matlab

I have CellArray1 with 50 unique strings and CellArray2 with 2000 unique strings (50 of which are the same as the ones in CellArray1). Is there a way to find the positions of all 50 unique strings from the first cell array in the second cell array without using loops?
Yes - the following code demonstrates this:
cellArray1 = {'hello', 'world'};
cellArray2 = {'good', 'morning', 'world'};
overlap = find(ismember(cellArray2, cellArray1)};
This will return the value 3 in overlap since cellArray2{3} appears in cellArray1.
UPDATE
The above code returns the indices, but not in the order of the original. If you need the original order, you can do the following
overlap = cellfun(#(x)find(ismember(cellArray2, x)), cellArray1, 'uniformOutput', false);
overlapSorted = cell2mat(overlap);
It could be argued that cellfun actually has an implicit loop in it (but then all vector operations have implicit loops, really); but one of these constructions will do what you asked for. If you don't need it sorted, the first will be significantly faster, I imagine.

Function equivalent to SUM() for multiplication in SQL Reporting

I'm looking for a function or solution to the following:
For the chart in SQL Reporting i need to multiply values from a Column A. For summation i would use =SUM(COLUMN_A) for the chart. But what can i use for multiplication - i was not able to find a solution so far?
Currently i am calculating the value of the stacked column as following:
=ROUND(SUM(Fields!Value_Is.Value)/SUM(Fields!StartValue.Value),3)
Instead of SUM i need something to multiply the values.
Something like that:
=ROUND(MULTIPLY(Fields!Value_Is.Value)/MULTIPLY(Fields!StartValue.Value),3)
EDIT #1
Okay tried to get this thing running.
The expression for the chart looks like this:
=Exp(Sum(Log(IIf(Fields!Menge_Ist.Value = 0, 10^-306, Fields!Menge_Ist.Value)))) / Exp(Sum(Log(IIf(Fields!Startmenge.Value = 0, 10^-306, Fields!Startmenge.Value))))
If i calculate my 'needs' manually i have to get the following result:
In my SQL Report i get the following result:
To make it easier, these are the raw values:
and you have the possibility to group the chart by CW, CQ or CY
(The values from the first pictures are aggregated Sum values from the raw values by FertStufe)
EDIT #2
Tried your expression, which results in this:
Just to make it clear:
The values in the column
=Value_IS / Start_Value
in the first picture are multiplied against each other
0,9947 x 1,0000 x 0,59401 = 0,58573
Diffusion Calenderweek 44 Sums
Startvalue: 1900,00 Value Is: 1890,00 == yield:0,99474
Waffer unbestrahlt Calenderweek 44 Sums
Startvalue: 620,00 Value Is: 620,00 == yield 1,0000
Pellet Calenderweek 44 Sums
Startvalue: 271,00 Value Is: 160,00 == yield 0,59041
yield Diffusion x yield Wafer x yield Pellet = needed Value in chart = 0,58730
EDIT #3
The raw values look like this:
The chart ist grouped - like in the image - on these fields
CY (Calendar year), CM (Calendar month), CW (Calendar week)
You can download the data as xls here:
https://www.dropbox.com/s/g0yrzo3330adgem/2013-01-17_data.xls
The expression i use (copy / past from the edit window)
=Exp(Sum(Log(Fields!Menge_Ist.Value / Fields!Startmenge.Value)))
I've exported the whole report result to excel, you can get it here:
https://www.dropbox.com/s/uogdh9ac2onuqh6/2013-01-17_report.xls
it's actually a workaround. But I am pretty sure is the only solution for this infamous problem :D
This is how I did:
Exp(∑(Log(X))), so what you should do is:
Exp(Sum(Log(Fields!YourField.Value)))
Who said math was worth nothing? =D
EDIT:
Corrected the formula.
By the way, it's tested.
Addressing Ian's concern:
Exp(Sum(Log(IIf(Fields!YourField.Value = 0, 10^-306, Fields!YourField.Value))))
The idea is change 0 with a very small number. Just an idea.
EDIT:
Based on your updated question this is what you should do:
Exp(Sum(Log(Fields!Value_IS.Value / Fields!Start_Value.Value)))
I just tested the above code and got the result you hoped for.

PyBrain how to interpret the results from net.activate?

I've trained a network on PyBrain for purpose of classification and am ready to fire away with specific input. However, when I do
classes = ['apple', 'orange', 'peach', 'banana']
data = ClassificationDataSet(len(input), 1, nb_classes=len(classes), class_labels=classes)
data._convertToOneOfMany( ) # recommended by PyBrain
fnn = buildNetwork( data.indim, 5, data.outdim, outclass=SoftmaxLayer )
trainer = BackpropTrainer( fnn, dataset=data, momentum=m, verbose=True, weightdecay=wd)
trainer.trainUntilConvergence(maxEpochs=80)
# stop training and start using my trained network here
output = fnn.activate(input)
As expected, I get a numeric value for "output", but is there a way to determine the predicted class label directly? Even if there's not one, how can I map the value of "output" to my class label? Thank you for your help.
When you say you get a numeric value for "output" do you mean a scalar (that is, not an array)? From my understanding of it, you should have gotten an array of four values (ie. as many as possible output classes you have). The biggest value in that array corresponds to the index of the class. I don't know if PyBrain provides an utility function to extract that, but you can do it like this:
class_index = max(xrange(len(output)), key=output.__getitem__)
class_name = classes[class_index]
Incidentally, you omitted the step in which you actually fill the data in the dataset.