I'm loading animal data with a lot of duplicates which I'm trying to merge into one agent representing a single animal. The csv file looks like this:
Animal ID - Group ID
1 - A
2 - A
3 - A
4 - A
1 - B
2 - B
For this example, I'm hoping to produce 4 unique animal agents, each holding a list of the groups it is associated with. Animal 1's list would be [A, B] and Animal 4's list would be just [A].
So far, I'm loading the csv using:
let data csv:from-row file-read-line
create-animals 1 [
  set Animal-ID item 0 data
  set group-ID item 1 data
]
Which produces 6 animals with one group id each.
But how should I cull the duplicate animals?
If you have a csv that looks like:
id, group
1,A
2,A
3,A
4,A
1,B
2,B
You can load the csv as a list, pull unique animals, then filter the original list using the unique animal ids to grab the unique groups to which that animal belongs:
extensions [ csv ]
breed [ animals animal ]
animals-own [ animal-id group-id ]
to setup
  ca
  ; Load animal data as a list of lists, drop the headers
  let animalData but-first csv:from-file "exampleAnimals.csv"
  print animalData
  ; Get the unique animal ids
  let animalIds remove-duplicates map [ i -> first i ] animalData
  print animalIds
  foreach animalIds [ id ->
    create-animals 1 [
      ; Set the id
      set animal-id id
      ; Filter the animal data by the id of the current animal
      let filtered filter [ row -> first row = id ] animalData
      ; Map to pull the group id as a list and assign to the animal
      set group-id map [ i -> last i ] filtered
      fd 1
    ]
  ]
  reset-ticks
end
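For readers who want to check the grouping logic outside NetLogo, here is a minimal Python sketch of the same idea: read the rows, then collect the groups under each unique id. The in-memory `CSV_TEXT` and the function name `groups_by_animal` are assumptions made for illustration and are not part of the NetLogo model.

```python
import csv
import io

# Hypothetical in-memory copy of the example CSV shown above.
CSV_TEXT = """id,group
1,A
2,A
3,A
4,A
1,B
2,B
"""


def groups_by_animal(csv_text):
    """Return {animal_id: [group, ...]}, one entry per unique animal."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # drop the header row
    groups = {}
    for animal_id, group in reader:
        # The first time an id is seen it gets an empty list; later rows append to it.
        groups.setdefault(int(animal_id), []).append(group.strip())
    return groups


animal_groups = groups_by_animal(CSV_TEXT)
print(animal_groups)
```

Each key then corresponds to one animal agent and each value to its group list.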
Hope that helps!
I have a table in Qlik Sense loaded from the database.
Example:
ID | FRUIT   | VEG     | COUNT
1  | Apple   |         | 5
2  | Figs    |         | 10
3  |         | Carrots | 20
4  | Oranges |         | 12
5  |         | Corn    | 10
From this I need to make a filter that will display all the Fruit/Veg records along with records from other joined tables, when selected.
The filter needs to be something like this :
|FRUIT_XXX|
|VEG_XXX |
Any help will be appreciated.
I do not know how to do it in Qlik Sense, but in SQL it's like this:
SELECT
  ID,
  CASE WHEN FRUIT IS NULL THEN VEG ELSE FRUIT END as FruitOrVeg,
  COUNT
FROM tablename
Not sure if it's possible for this to be dynamic. Usually I solve these by creating a new field that combines the values from both fields into one field:
RawData:
Load * Inline [
ID , FRUIT ,VEG , COUNT
1 , Apple , , 5
2 , Figs , , 10
3 , ,Carrots , 20
4 , Oranges , , 12
5 , ,Corn , 10
];
Combined:
Load
    ID,
    'FRUIT_' & FRUIT as Combined
Resident
    RawData
Where
    FRUIT <> ''
;
Concatenate
Load
    ID,
    'VEG_' & VEG as Combined
Resident
    RawData
Where
    VEG <> ''
;
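This will create a new table (Combined), which will be linked to the main table by the ID field.
The new Combined field will have values such as FRUIT_Apple, FRUIT_Figs, FRUIT_Oranges, VEG_Carrots and VEG_Corn.
In the UI, the Combined field can then be used as the filter.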
P.S. If further processing is needed you can join the Combined table to the RawData table. This way the Combined field will become part of the RawData table. To achieve this just extend the script a bit:
join (RawData)
Load * Resident Combined;
Drop Table Combined;
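To make the effect of the two Load statements concrete, here is a rough Python sketch of the same transformation: one output row per non-empty FRUIT value, followed by one per non-empty VEG value, each prefixed with its field name. The `RAW_DATA` list and the function name `combined_rows` are illustrative assumptions; this is not Qlik script.

```python
# Hypothetical copy of the inline RawData table from the script above.
RAW_DATA = [
    {"ID": 1, "FRUIT": "Apple", "VEG": "", "COUNT": 5},
    {"ID": 2, "FRUIT": "Figs", "VEG": "", "COUNT": 10},
    {"ID": 3, "FRUIT": "", "VEG": "Carrots", "COUNT": 20},
    {"ID": 4, "FRUIT": "Oranges", "VEG": "", "COUNT": 12},
    {"ID": 5, "FRUIT": "", "VEG": "Corn", "COUNT": 10},
]


def combined_rows(rows, fields=("FRUIT", "VEG")):
    """Build (ID, Combined) rows, all FRUIT entries first, then all VEG entries."""
    out = []
    for field in fields:
        for row in rows:
            value = row[field].strip()
            if value:  # skip rows where this field is empty
                out.append({"ID": row["ID"], "Combined": f"{field}_{value}"})
    return out


combined = combined_rows(RAW_DATA)
for row in combined:
    print(row["ID"], row["Combined"])
```

The ID column is what ties each Combined value back to its original record.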
I have a table "Agent" with an id and a parent id (sponsorId), stored as an adjacency list.
Each agent can sponsor multiple agents.
ID | sponsorId
1  | null
2  | 1
3  | 2
4  | 1
5  | 3
I want to limit the query to a maximum depth of 4 levels.
I have already built a recursive query that returns the children per level for a specific user. Here is the query:
WITH RECURSIVE subordinates(id, sponsorId, LEVEL) AS (
    SELECT "id", "sponsorId", 0 AS LEVEL
    FROM "Agent" AS t
    WHERE id = 1
    UNION ALL
    SELECT e."id", e."sponsorId", s."level" + 1 AS level
    FROM "Agent" AS e
    INNER JOIN subordinates s ON s."id" = e."sponsorId"
    WHERE LEVEL < 4
)
SELECT s."level", array_agg(id ORDER BY s."level") AS childrens
FROM subordinates AS s
GROUP BY s."level";
This query gives an array of ids per level for a specific user:
level | childrens
0     | [1]
1     | [2, 4]
2     | [3]
3     | [5]
4     | []
But I also want the list of all users, with one column per level and a total column, and I can't get this to work. The result should look like this table:
ID | level1 | level2 | level3 | level4
1  | [2, 4] | [3]    | [5]    | []
2  | [3]    | []     | []     | []
3  | [5]    | []     | []     | []
4  | []     | []     | []     | []
5  | []     | []     | []     | []
Could you help me write this query? I have only a little experience with recursive queries.
I have tried a lot of different things to make it work, but I don't understand how to create a specific column and bundle the result of each recursive iteration into that new column.
I did manage to get the children array for a specific user, but the result I get is "vertical"; I would like to get it like the example table above.
I guess I could use something like PIVOT to achieve that, but I have never used it, so I don't really know how to handle that.
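This is not a SQL answer, but the target shape can be pinned down with a small Python sketch that walks the adjacency list level by level for every agent, up to 4 levels. The `SPONSORS` mapping mirrors the sample table, and the function name `levels_per_agent` is an assumption made for illustration. Note that for agent 2 the sketch reports [5] at level 2 (5 is sponsored by 3, which is sponsored by 2), where the table above shows an empty array.

```python
from collections import defaultdict

# Hypothetical copy of the sample "Agent" table: id -> sponsorId.
SPONSORS = {1: None, 2: 1, 3: 2, 4: 1, 5: 3}


def levels_per_agent(sponsors, max_depth=4):
    """Return {agent_id: [level1_ids, level2_ids, ...]} with max_depth levels each."""
    # Invert the adjacency list: sponsor id -> list of directly sponsored ids.
    children = defaultdict(list)
    for agent_id, sponsor_id in sponsors.items():
        if sponsor_id is not None:
            children[sponsor_id].append(agent_id)

    result = {}
    for agent_id in sponsors:
        frontier = [agent_id]
        levels = []
        for _ in range(max_depth):
            # The next level is made of the direct children of the current level.
            frontier = [c for parent in frontier for c in children.get(parent, [])]
            levels.append(frontier)
        result[agent_id] = levels
    return result


levels = levels_per_agent(SPONSORS)
for agent_id, per_level in levels.items():
    print(agent_id, per_level)
```

Each inner list corresponds to one of the level1 to level4 columns, so a query producing the same rows would match the wished-for layout.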
I want to merge three lists and be able to retrieve each list by a key:
List1 =[0.1,0.2,0.3,.....];
List2 =[0.5,0.10,6.0,......];
List3 =[1,2,3,.....]
I want to load all the lists into a dynamic data table, where each row looks like:
Table Row => [ list1[index] , list2[index] , list3[index] ]
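The question does not name a language, so as a language-neutral illustration here is a short Python sketch of the row layout described above. Only the first three values of each list are used, since the rest are elided in the question; the key names `list1`, `list2` and `list3` are assumptions.

```python
# Only the visible prefix of each list; the remaining values are elided in the question.
lists = {
    "list1": [0.1, 0.2, 0.3],
    "list2": [0.5, 0.10, 6.0],
    "list3": [1, 2, 3],
}

# Each list stays retrievable by its key...
first_list = lists["list1"]

# ...and zip() pairs up the elements that share an index, giving one table row per index.
rows = [list(row) for row in zip(lists["list1"], lists["list2"], lists["list3"])]

for row in rows:
    print(row)
```

`zip` stops at the shortest list, so lists of unequal length would silently lose their trailing values.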
I'm trying to efficiently find the top entries by group in Arango (AQL). I have a fairly standard object collection and an edge collection representing Departments and Employees in that department.
Example purpose: Find the top 2 employees in each department by most years of experience.
Sample Data:
"departments" is an object collection. Here are some entries:
_id           | name
departments/1 | engineering
departments/2 | sales
"dept_emp_edges" is an edge collection connecting departments and employee objects by ids.
_id              | _from         | _to         | years_exp
dept_emp_edges/1 | departments/1 | employees/1 | 3
dept_emp_edges/2 | departments/1 | employees/2 | 4
dept_emp_edges/3 | departments/1 | employees/3 | 5
dept_emp_edges/4 | departments/2 | employees/1 | 6
I would like to end up with the top 2 employees per department by most years experience:
department    | employee    | years_exp
departments/1 | employees/3 | 5
departments/1 | employees/2 | 4
departments/2 | employees/1 | 6
Long Working Query
The following query works! But is a bit slow on larger tables and feels inefficient.
FOR dept IN departments
    LET top2earners = (
        FOR dep_emp_edge IN dept_emp_edges
            FILTER dep_emp_edge._from == dept._id
            SORT dep_emp_edge.years_exp DESC
            LIMIT 2
            RETURN {'department': dep_emp_edge._from,
                    'employee': dep_emp_edge._to,
                    'years_exp': dep_emp_edge.years_exp}
    )
    FOR row in top2earners
        return {'department': dep_emp_edge._from,
                'employee': dep_emp_edge._to,
                'years_exp': dep_emp_edge.years_exp}
I don't like this because there are three loops in here, which feels rather inefficient.
Short Query
However, I tried to write:
FOR dept IN departments
    FOR dep_emp_edge IN dept_emp_edges
        FILTER dep_emp_edge._from == dept._id
        SORT dep_emp_edge.years_exp DESC
        LIMIT 2
        RETURN {'department': dep_emp_edge._from,
                'employee': dep_emp_edge._to,
                'years_exp': dep_emp_edge.years_exp}
But this last query only outputs the top 2 results of the final department, not the top 2 of each department.
My questions are: (1) why doesn't the second shorter query give all results? and (2) I'm quite new to Arango and ArangoQL, what other things can I do to make sure this is efficient?
Your first query is incorrect as written (Query: AQL: collection or view not found: dep_emp_edge (while parsing)) - as I could only guess what you mean, I ignore it for now.
Your smaller query limits the overall results to two, counterintuitively, as you are not grouping by department.
I suggest a slightly different approach: Use the edge collection as central source and group by _from, returning one document per department, containing an array of the two top resulting employees (should they exist), not one document per employee:
FOR edge IN dept_emp_edges
    SORT edge.years_exp DESC
    COLLECT dep = edge._from INTO deps
    LET emps = (
        FOR e in deps
            LIMIT 2
            RETURN ZIP(["employee", "years_exp"], [e.edge._to, e.edge.years_exp])
    )
    RETURN {"department": dep, employees: emps}
For your example database this returns:
[
  {
    "department": "departments/1",
    "employees": [
      {
        "employee": "employees/3",
        "years_exp": 5
      },
      {
        "employee": "employees/2",
        "years_exp": 4
      }
    ]
  },
  {
    "department": "departments/2",
    "employees": [
      {
        "employee": "employees/1",
        "years_exp": 6
      }
    ]
  }
]
If the query is too slow, an index on the years_exp field of the dept_emp_edges collection could help (Explain suggests it would).
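The sort-then-group idea can also be checked outside the database. Below is a minimal Python sketch of it using the sample edges from the question; the `EDGES` list and the function name `top_n_per_department` are illustrative assumptions, and the sketch says nothing about how ArangoDB executes the query.

```python
# Hypothetical copy of the sample dept_emp_edges documents.
EDGES = [
    {"_from": "departments/1", "_to": "employees/1", "years_exp": 3},
    {"_from": "departments/1", "_to": "employees/2", "years_exp": 4},
    {"_from": "departments/1", "_to": "employees/3", "years_exp": 5},
    {"_from": "departments/2", "_to": "employees/1", "years_exp": 6},
]


def top_n_per_department(edges, n=2):
    """Sort once by years_exp (descending), then keep the first n edges per department."""
    top = {}
    for edge in sorted(edges, key=lambda e: e["years_exp"], reverse=True):
        bucket = top.setdefault(edge["_from"], [])
        if len(bucket) < n:  # the per-group LIMIT
            bucket.append({"employee": edge["_to"], "years_exp": edge["years_exp"]})
    return top


top2 = top_n_per_department(EDGES)
for department, employees in top2.items():
    print(department, employees)
```

The limit is applied inside each group rather than to the overall result, which is the difference between this approach and the shorter query in the question.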
How do I store and retrieve an array of data in OrientDB? For example, an array of hobbies:
{
  'hobbies': [
    'cooking',
    'dancing'
  ]
}
Try the following:
INSERT INTO V SET hobbies = [ 'cooking', 'dancing' ]
And
SELECT FROM V WHERE hobbies CONTAINS 'cooking'