Perl - determining the intersection of several numeric ranges - perl

I would like to be able to load a long list of positive integer ranges and create a new "summary" range list that is the union of the intersections of each pair of ranges. And I want to do this in Perl. For example:
Sample ranges: (1..30) (45..90) (15..34) (92..100)
Intersection of ranges: (15..30)
The only way I could think of was a bunch of nested if statements to determine the starting point of sample A, sample B, sample C, etc. and work out the overlap that way, but that is not practical with hundreds of samples, each containing numerous ranges.
Any suggestions are appreciated!

The first thing you should do when you need to do something is take a look at CPAN to see what tools are available, or whether someone has already solved your problem for you.
Set::IntSpan and Set::IntRange are on the first page of results for "set" on CPAN.
What you want is the union of the intersection of each pair of ranges, so the algorithm is as follows:
1. Create an empty result set.
2. Create a set for each range.
3. For each set in the list:
   For each later set in the list:
   a. Find the intersection of those two sets.
   b. Find the union of the result set and this intersection. This is the new result set.
4. Enumerate the elements of the result set.
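The steps above can be sketched in a few lines. This is a minimal illustration written in Python rather than Perl, using plain built-in sets instead of a CPAN set module; the function names and the (low, high) tuple representation of a range are assumptions made for the example, not part of any library.

```python
from itertools import combinations


def union_of_pairwise_intersections(ranges):
    """Return the sorted integers that lie in at least two of the ranges.

    `ranges` is a list of inclusive (low, high) tuples (an assumed format).
    """
    # One set per range, expanded element by element.
    sets = [set(range(low, high + 1)) for low, high in ranges]
    result = set()
    # Every set is paired with every later set in the list.
    for first, second in combinations(sets, 2):
        result |= first & second
    return sorted(result)


def to_ranges(values):
    """Collapse a sorted list of integers into inclusive (low, high) runs."""
    runs = []
    for value in values:
        if runs and value == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], value)
        else:
            runs.append((value, value))
    return runs


samples = [(1, 30), (45, 90), (15, 34), (92, 100)]
print(to_ranges(union_of_pairwise_intersections(samples)))  # [(15, 30)]
```

Expanding every range into individual integers is simple but memory-hungry for very wide ranges; a span-based set module avoids that cost.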

I don't have code to share, but I would expand each range into a hash, or use a Set module, and then use intersection operations on the sets.

Look up an account then average associated values excluding zeros

On one sheet I have an account code, and in the cell next to it I need to look up that account code on the next sheet and average the cost, excluding cells that are zero in column B from the average.
The answer for London should be £496.33, but having tried various SUMIFS / COUNTIFS combinations I cannot get it to work.
You probably need COUNTIFS which, like the SUMIFS you are already using, lets you define multiple criteria and ranges.
So, if column R contains the values you want to average, and column H in the same row must equal $B$28 for the row to be included, the formula looks as follows:
=SUMIFS('ESL Info'!$R:$R,'ESL Info'!H:H,$B$28)/COUNTIFS('ESL Info'!$H:$H,$B$28, 'ESL Info'!$R:$R, "<>0")
That is, in addition to requiring the value in column H to equal B28, the COUNTIFS also requires the value in column R (the actual number you are summing) to be different from 0.
You could also add the same criterion 'ESL Info'!$R:$R, "<>0" to your SUMIFS, but that isn't necessary: a 0 adds nothing to your sum, so it doesn't matter whether it is included or not.
And depending on the Excel version you are using, you may even have the AVERAGEIFS function available, which does exactly what you want
=AVERAGEIFS('ESL Info'!$R:$R,'ESL Info'!$H:$H,$B$28,'ESL Info'!$R:$R,"<>0")
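To make the logic of the formulas explicit, here is a small Python sketch of the same calculation: sum the matching costs and divide by the count of matching, non-zero costs. The rows and the function name are hypothetical and are not taken from the asker's workbook.

```python
def average_excluding_zeros(rows, account):
    """Average the costs for `account`, ignoring rows whose cost is 0.

    `rows` is a list of (account_code, cost) pairs (an assumed layout).
    Returns None when no non-zero cost matches, where the spreadsheet
    formulas would show a division-by-zero error instead.
    """
    matching = [cost for code, cost in rows if code == account]
    non_zero_count = sum(1 for cost in matching if cost != 0)
    if non_zero_count == 0:
        return None
    # Zeros stay in the sum because they contribute nothing to it.
    return sum(matching) / non_zero_count


# Hypothetical data, for illustration only.
rows = [("London", 500.0), ("London", 0), ("London", 300.0), ("Paris", 120.0)]
print(average_excluding_zeros(rows, "London"))  # 400.0
```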

How to divide an Aggregate and Sum function

I am working in Tableau and trying to create a formula that will return me the value of each customer that walks into a store by dividing Net Sales / Traffic. When I try to combine the two separate formulas, it gives me the following error: Cannot mix aggregate and non-aggregate arguments with this function. The two functions I created that I'm trying to divide are:
SOT = (SUM([Sales Net])-SUM([Sales Gcard Net]))/SUM([Traffic Perday]) and SOT Goal
When I look at it in Tableau, it's stating that SOT is an aggregate function. How do I work around this to be able to get
SOT / SOT Goal
Aggregate variables are values that are calculated in the view, and depend on the level of aggregation in Tableau. e.g. sum(Sales) will show different values in Tableau if it’s next to a Region dimension, or if it’s next to a Category dimension.
There are several ways to avoid the error. My favorite is LOD expressions. I don't have your sample data, so I can't try the different possibilities myself, but I suggest that this should work:
SOT = ({SUM([Sales Net])}-{SUM([Sales Gcard Net])})/{SUM([Traffic Perday])}
Remember that this solution will override your filters; if you are using filters, you have to add all of them to Context.
EDIT
While trying different possibilities remember these things...
{SUM([Sales])} will sum the sales over the entire data set; the curly braces wrapped around the sum function cause the value to be returned as a non-aggregate. In other words, this works as an LOD expression, and if you add this field to the view, the sum of all sales will be shown against each row.
{FIXED [DIMENSION NAME] : sum([Sales])} will sum sales separately for each value of the dimension. The FIXED statement (LOD) again returns the value as a non-aggregate. If you add this field to the view, the total sales for that dimension value will be shown against each dimension value.

How to merge two lists(or arrays) while keeping the same relative order?

For example,
A=[a,b,c,d]
B=[1,2,3,4]
My question is: how do I generate all possible ways to merge A and B such that in the new list a appears before b, b appears before c, etc., and 1 appears before 2, 2 appears before 3, etc.?
I can think of one implementation:
We choose 4 slots from 8, then for each possible selection there are 2 possible ways: A first or B first.
I wonder, is there a better way to do this?
EDIT:
I've just learned a more intuitive way: use recursion.
For each spot, there are two possible cases, either taken from A or taken from B; keep recursing until A or B is empty, and concatenate the remaining.
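A minimal Python sketch of that recursion follows; the function name `interleavings` is made up for the example. At each step it either takes the head of the first list or the head of the second, and once either list is empty it appends whatever remains of the other.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        # One list is exhausted: the rest of the other is appended as-is.
        yield list(a) + list(b)
        return
    # Case 1: the next element is taken from a.
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    # Case 2: the next element is taken from b.
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


merges = list(interleavings(["a", "b", "c", "d"], [1, 2, 3, 4]))
print(len(merges))  # 70
print(merges[0])    # ['a', 'b', 'c', 'd', 1, 2, 3, 4]
```

For two lists of lengths m and n the recursion produces C(m+n, m) merges, which is 70 for the two four-element lists in the example.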
If the relative order is different from what constitutes a sorted list (I assume it is, because otherwise it would not be a problem), then you need to formalize the initial order. There are multiple ways to do that, the easiest being to remember the index of each element in each list. Example: valid position for a is 1 in the first array [...]
Then you could just join the lists and generate all the permutations of the elements. A valid permutation is one in which the new indexes keep the order relationship you have stored.
Example of one valid permutation array
a12b3cd4
You can check that this is a valid permutation because the index of element 'a' is smaller than the index of 'b', and so on, and you know those indexes must be smaller because that is what you formulated in the first step.
Similarly, an invalid permutation array is
ba314cd2
which is checked the same way.
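The validity check described here can be sketched as follows, in Python. The helper names are hypothetical, and the sketch assumes all elements are distinct and that the candidate contains every element of both lists.

```python
def keeps_relative_order(merged, original):
    """True if the elements of `original` appear in `merged` in the same order."""
    positions = [merged.index(item) for item in original]
    return positions == sorted(positions)


def is_valid_merge(merged, a, b):
    """True if `merged` uses every element of a and b and preserves both orders."""
    return (
        len(merged) == len(a) + len(b)
        and keeps_relative_order(merged, a)
        and keeps_relative_order(merged, b)
    )


print(is_valid_merge("a12b3cd4", "abcd", "1234"))  # True
print(is_valid_merge("ba314cd2", "abcd", "1234"))  # False
```

Filtering all permutations with this check works, but it examines 8! = 40320 candidates to find the valid merges of two four-element lists, so it scales much worse than building the merges directly.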

Minizinc, counting occurrences in array of pairs

I'm new to constraint programming and toying around with some basic operations. I want to count the number of occurrences of an arbitrary element x in an array of pairs.
For instance, the following array has 2 eights, and 1 of every other element.
sampleArray = [{8,13}, {21,34}, {8,55}]
I wonder how I am to extract this information, possibly using built-in functions.
I'm not sure I understand exactly what you want to do here. Do you want to count only the first element in the pair?
Note that the example you show is an array of sets, not a two-dimensional matrix. Extracting and counting the first(?) element in each pair is probably easier if you have a two-dimensional matrix (constructed with array2d).
In general there are at least two global constraints that you can use for this: "count" and perhaps also "global_cardinality". See http://www.minizinc.org/2.0/doc-lib/doc-globals-counting.html

how to find all the possible longest common subsequence from the same position

I am trying to find all the possible longest common subsequences at the same positions of multiple fixed-length strings (there are 700 strings in total, and each string has 25 letters). A longest common subsequence must contain at least 3 letters and belong to at least 3 strings. So if I have:
String test1 = "abcdeug";
String test2 = "abxdopq";
String test3 = "abydnpq";
String test4 = "hzsdwpq";
I need the answer to be:
String[] Answer = ["abd", "dpq"];
My one problem is that this needs to be as fast as possible. I tried to find the answer with a suffix tree, but the suffix tree method gives ["ab","pq"]. A suffix tree can only find contiguous substrings shared by multiple strings. The usual longest common subsequence algorithm cannot solve this problem.
Does anyone have any idea on how to solve this with low time cost?
Thanks
I suggest you cast this into a well known computational problem before you try to use any algorithm that sounds like it might do what you want.
Here is my suggestion: convert this into a graph problem. For each position in the string you create a set of nodes, one for each unique letter at that position among all the strings in your collection (so 700 nodes if all 700 strings differ at the same position). Once you have created all the nodes for each position, you go through your set of strings looking at how often two positions share the same pair of letters in at least 3 strings. In your example we would look first at positions 1 and 2 and see that three strings contain "a" in position 1 and "b" in position 2, so we add a directed edge between the node "a" in the first set of nodes of the graph and "b" in the second set of nodes. You do this for every pair of positions and every combination of letters in those two positions until you have added all necessary links.
Once you have your final graph, you must look for the longest path; I recommend looking at the wikipedia article here: Longest Path. In our case we will have a directed acyclic graph and you can solve it in linear time! The preprocessing should be quadratic in the number of string positions since I imagine your alphabet is of fixed size.
P.S: You sent me an email about the biclustering algorithm I am working on; it is not yet published but will be available sometime this year (fingers crossed). Thanks for your interest though :)
You may try to use hashing.
Each string has at most 25 characters, which means it has at most 2^25 subsequences. You take each string and calculate all 2^25 hashes. Then you join the hashes for all strings and find which of them occur at least 3 times.
In order to get the lengths of those subsequences, you need to store not only hashes, but pairs <hash, subsequence_pointer> where subsequence_pointer determines the subsequence of that hash (the easiest way is to enumerate all hashes of all strings and store the hash number).
Based on the algo, the program in the worst case (700 strings, 25 characters each) will run for a few minutes.
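The counting idea can be sketched in Python for short strings. This is an illustration under stated assumptions, not a tuned implementation: it uses a dictionary keyed by the (position, letter) pairs themselves instead of explicit hash values, it keeps only patterns that are not contained in a larger qualifying pattern, and the function name is hypothetical. It enumerates every subset of positions, so it is practical only for strings far shorter than 25 characters.

```python
from collections import Counter
from itertools import combinations


def common_positional_subsequences(strings, min_len=3, min_support=3):
    """Find maximal same-position subsequences shared by enough strings."""
    counts = Counter()
    for s in strings:
        indexed = list(enumerate(s))  # (position, letter) pairs
        for size in range(min_len, len(s) + 1):
            # Each combination is one subsequence, tagged with its positions.
            for pattern in combinations(indexed, size):
                counts[pattern] += 1
    frequent = [set(p) for p, n in counts.items() if n >= min_support]
    # Drop patterns that are strictly contained in another frequent pattern.
    maximal = [p for p in frequent if not any(p < q for q in frequent)]
    maximal.sort(key=sorted)
    return ["".join(letter for _, letter in sorted(p)) for p in maximal]


tests = ["abcdeug", "abxdopq", "abydnpq", "hzsdwpq"]
print(common_positional_subsequences(tests))  # ['abd', 'dpq']
```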