Count Aggregates in clingo

Test data
addEmployee(EmplID, Name1, Name2, TypeOfWork, Salary, TxnDate)
addEmployee("tjb1998", "eva", "mcdowell", "ra", 55000, 20).
addEmployee("tjb1987x", "ben", "xena", "cdt", 68000, q50).
addEmployee("tjb2112", "ryoko", "hakubi", "ra", 63000, 60).
addEmployee("tjb1987", "ben", "croshaw", "cdt", 68000, 90).
addEmployee("tjb3300m", "amane", "mauna", "ma", 61000, 105).
I want to group employees by type of work and get the count of employees for each type of work, e.g.:
ra 2
cdt 2
ma 1
Below is the query I am trying to run:
employee(TOW) :- addEmployee(_,_,_,TOW,_,_).
nmbrEmployeesOfSameType (N) :- N = #count { employee(TOW) }.
Please advise; I am at beginner level with clingo.

Try this:
addEmployee("tjb1998", "eva", "mcdowell", "ra", 55000, 20).
addEmployee("tjb1987x", "ben", "xena", "cdt", 68000, q50).
addEmployee("tjb2112", "ryoko", "hakubi", "ra", 63000, 60).
addEmployee("tjb1987", "ben", "croshaw", "cdt", 60000, 90).
addEmployee("tjb3300m", "amane", "mauna", "ma", 61000, 105).
% map each type of work P to the IDs X of the employees doing it
getType(P, X) :- addEmployee(X, _, _, P, _, _).
% project out the distinct types of work
type(P) :- addEmployee(_, _, _, P, _, _).
% count the distinct employee IDs I for each type P
result(P, S) :- S = #count{ I : getType(P,I) }, type(P).
#show result/2.
And the output will look like:
clingo version 4.5.3
Reading from test.lp
Solving...
Answer: 1
result("ra",2) result("cdt",2) result("ma",1)
SATISFIABLE
You can also copy my code and run it in an online clingo environment to see that it works.
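The same grouping pattern extends to other aggregates. A minimal sketch (assuming the same addEmployee facts and the type/1 predicate above) that sums salaries per type of work; the Sal, I tuple keeps two employees with equal salaries from collapsing into one element:
% total salary per type of work, using the same grouping pattern as the count
salaryByType(P, S) :- S = #sum{ Sal, I : addEmployee(I, _, _, P, Sal, _) }, type(P).
#show salaryByType/2.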


tbl_strata count by distinct individual vs rows

How can I use tbl_strata and get the output to show counts by distinct individual rather than by rows?
Also, how can I change the display order of the variable I am putting in the by= argument of tbl_summary?
I have a long table and a wide table with one row per patient. I am not sure how to apply the wide table to this code; I can apply the long table, but I get row counts instead of distinct patient counts.
I have included examples of the long table and the wide table below.
Example code:
#Wide Table Example
df_Wide <- data.frame(patientICN =c(1, 2, 3, 4, 5)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene1 =c("unk", "pos", "neg", "neg", "unk")
,gene2 =c("pos", "neg", "pos", "unk", "neg")
,gene3 =c("neg", "unk", "unk", "pos", "pos"))
#Long Table Example
df_Long <- data.frame(patientICN =c(1, 1, 2, 2, 3)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene =c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2")
,result=c("Positve", "Negative", "Unknown","Positive","Unknown"))
#Table categorized by testtype and result for the long table
df_Long %>%
  select(result, gene, testtype) %>%
  mutate(testcategory = paste("TestType", testtype)) %>%
  tbl_strata(
    strata = testtype,
    .tbl_fun =
      ~ .x %>%
        tbl_summary(by = result, missing = "no") %>%
        add_n(),
    .header = "**{strata}**, N = {n}"
  )
## The above gives counts of rows, not of distinct patients
Is this what you're after? The trick is to pivot the long table to one row per patient and test type first, so every cell counts distinct patients. You can install the bstfun pkg from my R-universe: https://ddsjoberg.r-universe.dev/ui#packages
library(gtsummary)
library(dplyr)
#Long Table Example
df_Long <- data.frame(patientICN =c(1, 1, 2, 2, 3)
,testtype =c("liquid", "tissue", "tissue", "liquid", "liquid")
,gene =c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2")
,result=c("Positive", "Negative", "Unknown","Positive","Unknown"))
tbl <-
  df_Long %>%
  # one row per patient/testtype, so counts are by distinct patient
  tidyr::pivot_wider(
    id_cols = c(patientICN, testtype),
    names_from = gene,
    values_from = result,
    values_fill = "Unknown"
  ) %>%
  # set the display order of the result levels
  mutate(across(starts_with('Gene'), ~ factor(.x, levels = c("Positive", "Negative", "Unknown")))) %>%
  tbl_strata(
    strata = testtype,
    ~ .x %>%
      bstfun::tbl_likert(
        include = starts_with("Gene")
      )
  )
Created on 2022-10-06 with reprex v2.0.2
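If you would rather avoid the bstfun dependency, a minimal sketch of the same idea with plain tbl_summary (assuming df_Long as defined above): because the pivoted data has one row per patient/testtype, the tabulated counts are distinct-patient counts.
library(gtsummary)
library(dplyr)

df_Long %>%
  tidyr::pivot_wider(
    id_cols = c(patientICN, testtype),
    names_from = gene,
    values_from = result,
    values_fill = "Unknown"
  ) %>%
  select(-patientICN) %>%
  tbl_strata(
    strata = testtype,
    .tbl_fun = ~ .x %>% tbl_summary()
  )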

How to handle this use case (running-window data) in Spark

I am using spark-sql 2.4.1 with Java 1.8.
I have source data as below:
val df_data = Seq(
("Indus_1","Indus_1_Name","Country1", "State1",12789979,"2020-03-01"),
("Indus_1","Indus_1_Name","Country1", "State1",12789979,"2019-06-01"),
("Indus_1","Indus_1_Name","Country1", "State1",12789979,"2019-03-01"),
("Indus_2","Indus_2_Name","Country1", "State2",21789933,"2020-03-01"),
("Indus_2","Indus_2_Name","Country1", "State2",300789933,"2018-03-01"),
("Indus_3","Indus_3_Name","Country1", "State3",27989978,"2019-03-01"),
("Indus_3","Indus_3_Name","Country1", "State3",30014633,"2017-06-01"),
("Indus_3","Indus_3_Name","Country1", "State3",30014633,"2017-03-01"),
("Indus_4","Indus_4_Name","Country2", "State1",41789978,"2020-03-01"),
("Indus_4","Indus_4_Name","Country2", "State1",41789978,"2018-03-01"),
("Indus_5","Indus_5_Name","Country3", "State3",67789978,"2019-03-01"),
("Indus_5","Indus_5_Name","Country3", "State3",67789978,"2018-03-01"),
("Indus_5","Indus_5_Name","Country3", "State3",67789978,"2017-03-01"),
("Indus_6","Indus_6_Name","Country1", "State1",37899790,"2020-03-01"),
("Indus_6","Indus_6_Name","Country1", "State1",37899790,"2020-06-01"),
("Indus_6","Indus_6_Name","Country1", "State1",37899790,"2018-03-01"),
("Indus_7","Indus_7_Name","Country3", "State1",26689900,"2020-03-01"),
("Indus_7","Indus_7_Name","Country3", "State1",26689900,"2020-12-01"),
("Indus_7","Indus_7_Name","Country3", "State1",26689900,"2019-03-01"),
("Indus_8","Indus_8_Name","Country1", "State2",212359979,"2018-03-01"),
("Indus_8","Indus_8_Name","Country1", "State2",212359979,"2018-09-01"),
("Indus_8","Indus_8_Name","Country1", "State2",212359979,"2016-03-01"),
("Indus_9","Indus_9_Name","Country4", "State1",97899790,"2020-03-01"),
("Indus_9","Indus_9_Name","Country4", "State1",97899790,"2019-09-01"),
("Indus_9","Indus_9_Name","Country4", "State1",97899790,"2016-03-01")
).toDF("industry_id","industry_name","country","state","revenue","generated_date");
Query:
val distinct_gen_date = df_data.select("generated_date").distinct.orderBy(desc("generated_date"));
For each "generated_date" in list distinct_gen_date , need to get all unique industry_ids for 6 months data
val cols = {col("industry_id")}
val ws = Window.partitionBy(cols).orderBy(desc("generated_date"));
val newDf = df_data
.withColumn("rank",rank().over(ws))
.where(col("rank").equalTo(lit(1)))
//.drop(col("rank"))
.select("*");
How do I get a moving aggregate (unique industry_ids over 6 months of data) for each distinct date? How do I achieve this moving aggregation?
More details:
For example, assume the sample data runs from "2016-03-01" to "2020-03-01". If some industry_x is not present in "2020-03-01", I need to check "2020-02-01", "2020-01-01", "2019-12-01", "2019-11-01", "2019-10-01", and "2019-09-01" sequentially; wherever it is found, its rank-1 row is taken into consideration for the data set used to calculate "2020-03-01". Then we move on to "2020-02-01" and do the same, and so on for each distinct "generated_date": go back 6 months, get the unique industries, and pick the rank-1 data. So the data set keeps changing per date. I can do this with a for loop, but that gives no parallelism. How can I build the data set for each distinct "generated_date" in parallel?
I don't know how to do this with window functions, but a self-join can solve your problem.
First, you need a DataFrame with distinct dates:
val df_dates = df_data
.select("generated_date")
.withColumnRenamed("generated_date", "distinct_date")
.distinct()
Next, for each row in your industries data you need to calculate the date up to which that industry will be included, i.e., add 6 months to generated_date. I think of these as active dates. I've used add_months() to do this, but you can use different logic.
import org.apache.spark.sql.functions.add_months
val df_active = df_data.withColumn("active_date", add_months(col("generated_date"), 6))
If we start with this data (separated by date just for our eyes):
industry_id generated_date
(("Indus_1", ..., "2020-03-01"),
("Indus_1", ..., "2019-12-01"),
("Indus_2", ..., "2019-12-01"),
("Indus_3", ..., "2018-06-01"))
It now has:
industry_id generated_date active_date
(("Indus_1", ..., "2020-03-01", "2020-09-01"),
("Indus_1", ..., "2019-12-01", "2020-06-01"),
("Indus_2", ..., "2019-12-01", "2020-06-01")
("Indus_3", ..., "2018-06-01", "2018-12-01"))
Now proceed with the self-join based on dates, using a join condition that matches your 6-month period:
val condition: Column = (
col("distinct_date") >= col("generated_date")).and(
col("distinct_date") <= col("active_date"))
val df_joined = df_dates.join(df_active, condition, "inner")
df_joined now has:
distinct_date industry_id generated_date active_date
(("2020-03-01", "Indus_1", ..., "2020-03-01", "2020-09-01"),
("2020-03-01", "Indus_1", ..., "2019-12-01", "2020-06-01"),
("2020-03-01", "Indus_2", ..., "2019-12-01", "2020-06-01"),
("2019-12-01", "Indus_1", ..., "2019-12-01", "2020-06-01"),
("2019-12-01", "Indus_2", ..., "2019-12-01", "2020-06-01"),
("2018-06-01", "Indus_3", ..., "2018-06-01", "2018-12-01"))
Drop the auxiliary column active_date, or even better, drop duplicates based on your needs:
val df_result = df_joined.dropDuplicates(Seq("distinct_date", "industry_id"))
This drops the duplicated "Indus_1" for "2020-03-01" (it appeared twice because it is picked up from two different generated_dates):
distinct_date industry_id
(("2020-03-01", "Indus_1"),
("2020-03-01", "Indus_2"),
("2019-12-01", "Indus_1"),
("2019-12-01", "Indus_2"),
("2018-06-01", "Indus_3"))

Multiple OR clauses

I have a list that I'd like to turn into a where clause using ORs:
my_list = [
%{min: 1, max: 10},
%{min: 4, max: 16},
%{min: 5, max: 21}
]
The where clause I intend to construct is going to be part of another, bigger query.
Right now I have to write the following code (note the duplication):
query =
  my_list
  |> Enum.with_index()
  |> Enum.reduce(bigger_query, fn {item, index}, q ->
    if index == 0 do
      where(q, [table], fragment("? BETWEEN ? AND ?", table.age, ^item["min"], ^item["max"]))
    else
      or_where(q, [table], fragment("? BETWEEN ? AND ?", table.age, ^item["min"], ^item["max"]))
    end
  end)
Can this query be rewritten simpler in Ecto terms? If yes, how?
Update. Consider this example:
query = from posts in Post, where: posts.author_id == ^12
query =
  my_list
  |> Enum.with_index()
  |> Enum.reduce(query, fn {item, index}, q ->
    if index == 0 do
      where(q, [posts], fragment("? BETWEEN ? AND ?", posts.author_age, ^item["min"], ^item["max"]))
    else
      or_where(q, [posts], fragment("? BETWEEN ? AND ?", posts.author_age, ^item["min"], ^item["max"]))
    end
  end)
query = where(query, [posts], posts.published == ^true)
The query is generated correctly:
SELECT * FROM posts WHERE author_id = 12 AND ((age BETWEEN 1 AND 10) OR (age BETWEEN 4 AND 16) OR (age BETWEEN 5 AND 21))
What would be the most compact Ecto code to achieve this?
You don't need to put the first condition in where. or_where works just fine even if there is no where in the query yet. The following should work:
query = Enum.reduce(my_list, bigger_query, fn item, q ->
  or_where(q, [table], fragment("? BETWEEN ? AND ?", table.age, ^item["min"], ^item["max"]))
end)
You can simplify this further using pattern matching (assuming the keys are strings and you put atoms as keys in the first code snippet in the question by mistake):
query = Enum.reduce(my_list, bigger_query, fn %{"min" => min, "max" => max}, q ->
  or_where(q, [table], fragment("? BETWEEN ? AND ?", table.age, ^min, ^max))
end)
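A related sketch, not from the original answer: if the keys are in fact atoms (as in the question's first snippet), Ecto's dynamic/2 macro composes the same OR chain without touching the query until the end. dynamic and where are standard Ecto.Query macros; seeding the reduce with dynamic(false) assumes a literal false is acceptable as the neutral element for OR.
import Ecto.Query

conditions =
  Enum.reduce(my_list, dynamic(false), fn %{min: min, max: max}, acc ->
    # OR each BETWEEN fragment onto the accumulated dynamic expression
    dynamic([table], ^acc or fragment("? BETWEEN ? AND ?", table.age, ^min, ^max))
  end)

query = where(bigger_query, ^conditions)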

F# Navigate object graph to return specific Nodes

I'm trying to build a list of DataTables based on the DataRelations in a DataSet, where the tables returned are only those connected to each other by relationships, knowing each end of the chain in advance. My DataSet has 7 tables, with relationships that look like this:
Table1 -> Table2 -> Table3 -> Table4 -> Table5
                           -> Table6 -> Table7
So given Table1 and Table7, I want to return Tables 1, 2, 3, 6, 7.
My code so far traverses all the relations and returns all the tables, so in the example it also returns Table4 and Table5. I've passed in first and last as arguments, and I know I'm not using last yet; I'm still trying to work out how to go about it, and that is where I need help.
type DataItem =
    | T of DataTable
    | R of DataRelation list

let GetRelatedTables (first, last) =
    let rec flat_rec dt acc =
        match dt with
        | T(dt) ->
            let rels = [ for r in dt.ParentRelations do yield r ]
            dt :: flat_rec (R(rels)) acc
        | R(h :: t) ->
            flat_rec (R(t)) acc @ flat_rec (T(h.ParentTable)) acc
        | R([]) -> []
    flat_rec first []
I think something like this would do it (although I haven't tested it). It returns DataTable list option because, in theory, a path between two tables might not exist.
let findPathBetweenTables (table1 : DataTable) (table2 : DataTable) =
    let visited = System.Collections.Generic.HashSet() // guards against circular references
    let rec search path =
        let table = List.head path
        if not (visited.Add(table)) then None            // already visited: dead end
        elif table = table2 then Some(List.rev path)     // reached the target: return the path
        else
            table.ChildRelations
            |> Seq.cast<DataRelation>
            |> Seq.tryPick (fun rel -> search (rel.ChildTable :: path))
    search [table1]
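A minimal usage sketch, assuming ds is a DataSet already wired up with the relations from the question; getMyDataSet and the table names here are hypothetical:
// Hypothetical usage; ds, "Table1" and "Table7" stand in for your actual DataSet and names.
let ds : System.Data.DataSet = getMyDataSet() // assumed to exist elsewhere
match findPathBetweenTables ds.Tables.["Table1"] ds.Tables.["Table7"] with
| Some path -> path |> List.iter (fun t -> printfn "%s" t.TableName)
| None -> printfn "No path found"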

Crystal Reports error: "a string is required here" on select

I run the following code in Crystal Reports and it tells me
"a string is required here"
on the second Case. I've tried casting {?Pm-Student_course_attendance_csv.Level} as a string, but no luck.
Global NumberVar grade;
Global NumberVar baseline;
Select ({?Pm-Student_course_attendance_csv.Level})
Case "R", "TL", "TU", "OTH", "P", "BTECFS":
If (CStr({PM_csv.mv_value}) = "Pass") Then
"On Track"
Else
"Below"
Case "H", "ASD", "AD", "AS", "G": //error from this line onwards inclusive
Select (CStr({PM_csv.mv_value}))
Case "A*":
grade := 11
Case "A*/A":
grade := 10
Case "A":
grade := 9
Case "A/B":
grade := 8
Case "B":
grade := 7
Case "B/C":
grade := 6
Case "C":
grade := 5
Case "C/D":
grade := 4
Case "D":
grade := 3
Case "D/E":
grade := 2
Case "E":
grade := 1
Case "U":
grade := 0
Default :
grade := 0;
Your formula cannot have two different return types. Above the error it returns a string, but below it you are trying to return a number. (A formula's return value is also its last assignment, so if the last thing your formula does is grade := 1, the formula tries to return the value 1, which it obviously won't allow you to do.) You have to add a string return value for each case in the second half of the formula:
...
Case "H", "ASD", "AD", "AS", "G": //error from this line onwards inclusive
Select (CStr({PM_csv.mv_value}))
Case "A*":
grade := 11;
"Return value" //<---- Must be a string
Case "A*/A":
...
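A hedged sketch of what the second half might look like with a string return added to every branch. In Crystal formula syntax, parentheses with semicolons group several statements into one expression whose value is the last one; the "On Track"/"Below" strings here are illustrative, not from the original:
Case "H", "ASD", "AD", "AS", "G":
    Select (CStr({PM_csv.mv_value}))
    Case "A*":
        (grade := 11; "On Track")
    Case "A":
        (grade := 9; "On Track")
    // ... one case per grade band, each ending in a string ...
    Default:
        (grade := 0; "Below")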