Power Automate - calculating azure devops data - azure-devops

I’m looking for some help creating a flow to calculate work item progress in Azure Devops.
When a task’s state gets updated, I want to get data from the task Parent (Feature) and find all the children (siblings) tasks. (working).
With the list of child/sibling task ID’s I want to get a total number of siblings, and the state value for each task. (working)
With the total number of child/siblings, I want to get the total possible score (working)
Then I want to split/parse the state value so it’s only the first two characters, convert the value to an integer, and get a sum of the state value. (this is where I need help)
Note: When the state value is 07 it’s considered 100% complete. For Example, if I have 5 items and everything is 07/complete the total score would be 35.
Example: 1 feature found 5 child tasks.
Example
example output from "Create HTML Table"
Example
I'd like help with an expression to take the output from the table, and Then Sum the actual score, EG 03 + 05 + 01 + 07 + 03 = (19)
then divide the actual score (19) by the total possible score (35) = .54 then *100 to get 54%
Example
Feels like there should be a simple way to do this.
Any help is greatly appreciated.
Cheers,
JG

Related

I need help in data sanitization problem in tableau

I trying doing the manual sanitization, however I am getting a type mismatch error in performing the calculations.
I also need help in sanitizing the data and getting the insight as per the below instructions:
The column sellerproductcount gives you the count of products in the
form '1-16 of over 100,000 results' , and you can parse out the product count 100,000.
sellerratings - this columns gives you the % and count of positive ratings (e.g. 88% positive
in the last 12 months (118 ratings) ) if parsed correctly
sellerdetails - you can use this text to parse out phone numbers, and email IDs of
merchants, where available, so our team can reach out to them.
businessaddress - this will give you the business locations of the sellers. You can parse them
to identify if a seller is registered in the US , Germany (DE), or China (CN).
Hero Product 1 #ratings and Hero Product 2 #ratings - these 2 columns give you the number of
ratings of the 2 'hero products' or bestselling products of this seller.
I have attached the dataset for the same.
https://docs.google.com/spreadsheets/d/1PSqRCnmFgq7v7RzZaCXXoV0Edp_vM7QO/edit?usp=sharing&ouid=115547990006782902200&rtpof=true&sd=true
Most of this type of data prep can be done with string & RegEx functions like REGEX_MATCH(). Here are a few examples based on the data you shared:
Seller Product Count
INT(REGEXP_EXTRACT([Sellerproductcount], '(\d*,?\d*) results'))
1-16 of over 6,000 results >> 6000
Seller Rating (Percentage)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*)% positive'))
92% positive in the last 12 months (181 ratings) >> 92
Seller Rating (Count)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*) (?:total )?ratings'))
92% positive in the last 12 months (181 ratings) >> 181
Business Country Code
RIGHT([Businessaddress],2)
AM Treptower Park28-30Berlin12435DE >> DE
These examples all have very straightforward patterns that are present in all rows so they can be done pretty easily with one simple calculation. However, something like sellerdetails which is unstructured, inconsistent, and sometimes incomplete will be a bit more of a challenge. You will need to use a couple of different calculations and techniques combined together to find what you are looking for, as well as some manual data prep. Here's an example of how you can pull out email but it won't work for everything:
Email
REGEXP_EXTRACT([Sellerdetails], '([a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)')
Good luck with your data cleaning, I suggest using sites like https://regex101.com/ and https://regexr.com/ to learn more about and help test regular expressions.

Orange3 Frequent Itemsets performance

I just realized that the performance of Frequent Itemsets is strongly correlated with number of item per basket. I run the following code:
import datetime
from datetime import datetime
from orangecontrib.associate.fpgrowth import *
%%time
T = [[1,3, 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]]
itemsets = frequent_itemsets(T, 1)
a=list(itemsets)
As I increased the number of item in T the running time increased as follow:
Item running time
21 3.39s
22 9.14s
23 15.8s
24 37.4s
25 1.2 min
26 10 min
27 35 min
28 95 min
For 31 item sets it took 10 hours without returning any results. I am wondering if there is anyway to run it for more than 31 items in reasonable time? In this case I just need pairwise item set (A-->B) while my understanding is frequent_itemsets count all possible combination and that is probably why it is running time is highly correlated with number of items. Is there any way to tell the method to limit the number item to count like instead of all combination just pairwise?
You could use other software that allows to specify constraints on the itemsets such as a length constraints. For example, you can consider the SPMF data mining library (disclosure: I am the founder), which offers about 120 algorithms for itemset and pattern mining. It will let you use FPGrowth with a length constraint. So you could for example mine only the patterns with 2 items or 3 items. You could also try other features such as mining association rules. That software works on text file and can be called for the command line, and is quite fast.
A database of a one single transaction of 21 items results in 2097151 itemsets.
>>> T = [list(range(21))]
>>> len(list(frequent_itemsets(T, 1)))
2097151
Perhaps instead of absolute support set as low as a single transaction (1), choose the support to be e.g. 5% of all transactions (.05).
You should also limit the returned itemsets to contain exactly two items (antecedent and consequent for later association rule discovery), but the runtime will still be high due to, as you understand, sheer combinatorics.
len([itemset
for itemset, support in frequent_itemsets(T, 1)
if len(itemset) == 2])
At the moment, there is no such filtering available inside the algorithm, but the source is open to tinkering.

reshape and merge in stata

I have three data sets:
First, called education.dta. It contains individuals(students) over many years with their achieved educations from yr 1990-2000. Originally it is in wide format, but I can easily reshape it to long. It is presented as wide under:
id educ_90 educ_91 ... educ_00 cohort
1 0 1 1 87
2 1 1 2 75
3 0 0 2 90
Second, called graduate.dta. It contains information of when individuals(students) have finished high school. However, this data set do not contain several years only a "snapshot" of the individ when they finish high school and characteristics of the individual students such as backgroung (for ex parents occupation).
id schoolid county cohort ...
1 11 123 87
2 11 123 75
3 22 243 90
The third data set is called teachers.dta. It contains informations about all teachers at high school such as their education, if they work full or part time, gender... This data set is long.
id schoolid county year education
22 11 123 2011 1
21 11 123 2001 1
23 22 243 2015 3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education and graduation.dta. Then I make a loop so that all the variables in graduation.dta takes the same over all years, for eksample:
forv j=1990/2000 {
gen county j´=.
replace countyj´=county
}
However, afterwards when reshaping to long stata reposts that variable id does not uniquely identify the observations.
further, I have tried to first reshape education.dta to long, and thereafter merge either 1:m or m:1 with education as master, using graduation.dta.
However stata again reposts that id is not unique. How do I deal with this?
In next step I want to merge the above with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain that I have exactly the format of your data, it would be helpful if you gave us a toy dataset to look at using dataex (and could even help you figure out the problem yourself!)
But to start, because you are seeing that id is not unique, you need to figure out why there might be multiple ids in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful to explore the data in this way.
Because you want your dataset in long format I suggest reshaping education.dta to long first, then doing something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once) and then, finally something like merge 1:1 schoolid year using "teacher.dta" and you will have your final dataset.

BIRT Mathematics on data set

I am pretty new to Birt reporting, i can make simple charts and tables, bet when it comes to some calculation of values from data sets - i have no clue in which direction should i watch.
For example i have this simple data set:
count1 count2 max type lenght
616 3858 21 STEEL 20
723 4432 14 STEEL 40
854 5869 21 ALL 20
838 5225 14 ALL 40
And i would like to have Birt to calculate approximately this:
SUM(count2)/SUM(count1) WHERE type=ALL
so this
((5869+5225)/(854+838))
My question would be how could i get there. At this point i think i would need just the right direction how these kind of operations could be made.
Thanks in advance.
I presume you want to display this value somewhere on the report, like at the bottom of a table where this data is presented. If this is the case then add an aggregation data element to the footer of the table where the data is displayed. In the aggregation data element you will have the opportunity to specify an expression SUM(count2) and a filter condition [type=ALL]. You can then do the division in another data element.
If you want to simply compute the value you described you you could do it with SQL, i.e. SELECT SUM(count2)/SUM(count1) FROM myTable WHERE type='ALL'. You would have to provide the data to the report as a data source which could be in the form of an excel spreadsheet, *.csv, database connection, etc.
So there are a few ways to do this depending on what your requirements are.

Summarizing each subgroup in Group Footer - Crystal Reports

I'm trying to work on a report for a client. Basically I need something like such
Group 1: Customer ID
Group 2: Truck ID
CustID Vehicle ID Detention Time
------ ---------- --------------
ABX 100 60
35
20
TOTAL: 115
200 80
15
TOTAL: 95
300 10
TOTAL: 10
TOTALS FOR CUSTOMER ABX
100 115
200 95
300 10
Is there anyway to accomplish this without a subreport? I was hoping for a "summary field" that I could summarize more than just a single value.
Thanks!
(FYI using Crystal Reports 2008)
Use a crosstab; place it in the report-footer section.
There might be a better way to do this, but the one that comes to mind is to use two arrays: One to store the truck ID and another to store the corresponding total. In each inner grouping (TruckID), just tack on another array element and store its total time. To display, you could cast the values to strings, attach a newline character after each entry, and set the field to "Can Grow". So altogether, you'd need three formulas: one to initialize the arrays (in GH1), one to update the arrays with sum({truck.time},{truck.ID}) (in GF2), and one to display each entry (in GF1).
With that being said, CR has terrible support for containers... You're limited to 1-dimensional, non-dynamic arrays that are gimped at 1000 items max. It doesn't sound like these would be big problems for what you're trying to do, but you will need to redim preserve the arrays unless you know ahead of time how many trucks you'll have per customer.