How to make an indented tree in Vega - visualization

I'm trying to create an indented tree, e.g. as in https://observablehq.com/@d3/indented-tree
I think the part of this example that I can't replicate in Vega is encapsulated in this code:
root = { let i = 0; return d3.hierarchy(data).eachBefore(d => d.index = i++); }
eachBefore is a pre-order traversal on the output of d3.hierarchy.
Is there any way to get this result from upstream Vega, or is this a feature request for an index output from the tree transform (or something similar, or else a custom transform)?
By the way, I think it may be easy to turn the specific tree layout example into an indented tree, because the id there happens to give the same 'index' ordering (I think). In general, though, I think we need something like eachBefore for data that isn't so conveniently ordered.
Thanks for any suggestions!
Declan
Update
I made a change in vega-hierarchy described here:
https://github.com/declann/vega/commit/a651ff36cd3f0897054aa1b236f82e701db62432
Now I can use pre_traversal_id from tree transform output to do indented trees, e.g.:
indented tree in (custom) vega-editor, with tree output including pre_traversal_id field
Modified spec: https://gist.github.com/declann/91fd150ae04016e5890a30295fa58a07

Not sure if this helps, but I went to vega.github.io/vega/examples/tree-layout and played with the controls. After changing the settings to layout: tidy, links: orthogonal, separation: true, I got a result similar to the one shown on the Observable page:
Code:
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "An example of Cartesian layouts for a node-link diagram of hierarchical data.",
"width": 600,
"height": 1600,
"padding": 5,
"signals": [
{
"name": "labels", "value": true
},
{
"name": "layout", "value": "tidy"
},
{
"name": "links", "value": "orthogonal"
}
],
"data": [
{
"name": "tree",
"url": "data/flare.json",
"transform": [
{
"type": "stratify",
"key": "id",
"parentKey": "parent"
},
{
"type": "tree",
"method": {"signal": "layout", "value": "tidy"},
"size": [{"signal": "height"}, {"signal": "width - 100"}],
"separation": true,
"as": ["y", "x", "depth", "children"]
}
]
},
{
"name": "links",
"source": "tree",
"transform": [
{ "type": "treelinks" },
{
"type": "linkpath",
"orient": "horizontal",
"shape": {"signal": "links"}
}
]
}
],
"scales": [
{
"name": "color",
"type": "linear",
"range": {"scheme": "greys"},
"domain": {"data": "tree", "field": "depth"},
"zero": true
}
],
"marks": [
{
"type": "path",
"from": {"data": "links"},
"encode": {
"update": {
"path": {"field": "path"},
"stroke": {"value": "#828282"}
}
}
},
{
"type": "symbol",
"from": {"data": "tree"},
"encode": {
"enter": {
"size": {"value": 25}
},
"update": {
"x": {"field": "x"},
"y": {"field": "y"},
"fill": {"field": "depth"}
}
}
},
{
"type": "text",
"from": {"data": "tree"},
"encode": {
"enter": {
"text": {"field": "name"},
"baseline": {"value": "middle"}
},
"update": {
"x": {"field": "x"},
"y": {"field": "y"},
"dx": {"signal": "datum.children ? -7 : 7"},
"align": {"signal": "datum.children ? 'right' : 'left'"}
}
}
}
]
}

I believe this is possible by adding formula transforms for x (based on the tree node's depth) and y (based on the tree node's id).
The gist of it is to overwrite x and y after the tree transform:
{"type": "stratify", "key": "id", "parentKey": "parent"},
{
"type": "tree",
"method": {"signal": "layout"},
"size": [{"signal": "height"}, {"signal": "width - 100"}],
"separation": {"signal": "separation"},
"as": ["_", "_", "depth", "children"]
},
{"type": "formula", "expr": "datum.depth * 20", "as": "x"},
{"type": "formula", "expr": "datum.id * 14", "as": "y"}
Here's an example that modifies the Vega Tree layout example
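The `datum.id * 14` formula only gives the right row order when the ids already follow a pre-order (parent before children) traversal, which is the caveat raised in the question. For data that isn't ordered that way, one option is to compute the index before the rows reach Vega. Below is a minimal sketch in plain JavaScript, assuming flat rows with `id` and `parent` fields as consumed by the stratify transform. The helper name `addPreorderIndex`, the output field `index`, and the sample rows are made up for illustration and are not part of Vega or d3.

```javascript
// Assign a pre-order (parent before children) index to flat {id, parent} rows.
// Hypothetical helper, not part of Vega or d3.
function addPreorderIndex(rows) {
  // Group rows by parent id, keeping the input order of siblings.
  const childrenOf = new Map();
  let root = null;
  for (const row of rows) {
    if (row.parent == null) {
      root = row;
      continue;
    }
    if (!childrenOf.has(row.parent)) childrenOf.set(row.parent, []);
    childrenOf.get(row.parent).push(row);
  }
  if (root === null) throw new Error("no root row (a row without a parent) found");

  // Iterative depth-first walk. Children are pushed in reverse so that the
  // first child is visited first, which is the order eachBefore uses.
  const out = [];
  const stack = [root];
  let i = 0;
  while (stack.length > 0) {
    const node = stack.pop();
    out.push({ ...node, index: i++ });
    const kids = childrenOf.get(node.id) || [];
    for (let k = kids.length - 1; k >= 0; k--) stack.push(kids[k]);
  }
  return out;
}

// Made-up sample whose ids are deliberately not in pre-order.
const sample = [
  { id: 1, name: "root", parent: null },
  { id: 2, name: "b", parent: 1 },
  { id: 3, name: "a", parent: 1 },
  { id: 4, name: "b1", parent: 2 },
];
const indexed = addPreorderIndex(sample);
console.log(indexed.map(d => `${d.index}:${d.name}`).join(" "));
```

With such a field in the data, the y formula could use `datum.index * 14` instead of `datum.id * 14`.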

Related

How to group by single field and return more values together

I'm starting to use Apache Druid but am having some difficulty running native queries (and some SQL too).
1- Is it possible to groupBy a single column while also returning more channels?
2- How could I groupBy a single column while returning different grouped items in the same query/row?
Query I'm trying to use:
{
"queryType": "groupBy",
"dataSource": "my-data-source",
"granularity": "all",
"intervals": ["2022-06-27T03:00:00.000Z/2022-06-28T03:00:00.000Z"],
"context": { "timeout": 30000 },
"dimensions": ["userId"],
"filter": {
"type": "and",
"fields": [
{
"type": "or",
"fields": [{...}]
}
]
},
"aggregations": [
{
"type": "count",
"name": "count"
}
]
}
I tried adding a filtered aggregator inside aggregations: [], but nothing changed.
"aggregations": [
{
"type": "count",
"name": "count"
},
{
"type": "filtered",
"filter": {
"type": "selector",
"dimension": "block_id",
"value": "block1"
},
"aggregator": {
"type": "count",
"name": "block1",
"fieldName": "block_id"
}
}
]
The grouping aggregator also didn't work.
"aggregations": [
{
"type": "count",
"name": "count"
},
{
"type": "grouping",
"name": "groupedData",
"groupings": ["block_id"]
}
],
Below is the image illustrating the results I'm trying to achieve.
Not sure yet how to get the results in the format you want, but as a start, something like this might be a step:
{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "dataTest"
},
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"filter": null,
"granularity": {
"type": "all"
},
"dimensions": [
{
"type": "default",
"dimension": "d2_ts2",
"outputType": "STRING"
},
{
"type": "default",
"dimension": "d3_email",
"outputType": "STRING"
}
],
"aggregations": [
{
"type": "count",
"name": "myCount"
}
],
"descending": false
}
I'm curious, what is the use case?
Using a SQL query you can do it this way:
SELECT UserID,
sum(1) FILTER (WHERE BlockId = 'block1') as Block1,
sum(1) FILTER (WHERE BlockId = 'block2') as Block2,
sum(1) FILTER (WHERE BlockId = 'block3') as Block3
FROM inline_data
GROUP BY 1
The Native Query for this (from the explain) is:
{
"queryType": "topN",
"dataSource": {
"type": "table",
"name": "inline_data"
},
"virtualColumns": [
{
"type": "expression",
"name": "v0",
"expression": "1",
"outputType": "LONG"
}
],
"dimension": {
"type": "default",
"dimension": "UserID",
"outputName": "d0",
"outputType": "STRING"
},
"metric": {
"type": "dimension",
"previousStop": null,
"ordering": {
"type": "lexicographic"
}
},
"threshold": 101,
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
},
"filter": null,
"granularity": {
"type": "all"
},
"aggregations": [
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a0",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block1",
"extractionFn": null
},
"name": "a0"
},
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a1",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block2",
"extractionFn": null
},
"name": "a1"
},
{
"type": "filtered",
"aggregator": {
"type": "longSum",
"name": "a2",
"fieldName": "v0",
"expression": null
},
"filter": {
"type": "selector",
"dimension": "BlockId",
"value": "block3",
"extractionFn": null
},
"name": "a2"
}
],
"postAggregations": [],
"context": {
"populateCache": false,
"sqlOuterLimit": 101,
"sqlQueryId": "bb92e899-c127-49b0-be1b-d4b38909d166",
"useApproximateCountDistinct": false,
"useApproximateTopN": false,
"useCache": false,
"useNativeQueryExplain": true
},
"descending": false
}
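The filtered aggregators in the native query above all follow the same pattern, so they can be generated rather than written by hand. The following is a rough sketch, assuming the column names from the question (`userId`, `block_id`) and a plain `count` inside each filter instead of the `longSum` over a virtual column that the SQL planner produced. `buildFilteredCounts` is a hypothetical helper name, and the block ids are the ones used in the SQL example.

```javascript
// Build one filtered count aggregator per value, following the shape of the
// "filtered" aggregators shown in the explain output.
// `buildFilteredCounts` is a hypothetical helper, not a Druid API.
function buildFilteredCounts(dimension, values) {
  return values.map(value => ({
    type: "filtered",
    filter: { type: "selector", dimension, value },
    aggregator: { type: "count", name: value },
    // The explain output sets the name on both levels, so this does too.
    name: value,
  }));
}

// groupBy query using the data source, interval and dimension from the question.
const query = {
  queryType: "groupBy",
  dataSource: "my-data-source",
  granularity: "all",
  intervals: ["2022-06-27T03:00:00.000Z/2022-06-28T03:00:00.000Z"],
  dimensions: ["userId"],
  aggregations: [
    { type: "count", name: "count" },
    ...buildFilteredCounts("block_id", ["block1", "block2", "block3"]),
  ],
};

console.log(JSON.stringify(query, null, 2));
```

Each result row would then carry the grouping column plus one count per block id, which is the same pivoted shape the SQL query returns.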

Filtering data on vegalite

I'm trying to filter my data so that it does not include the missing data (noted as 0) in my chart. I'd assume the code I need to add is
"transform": [
{"filter": "datum.Gross enrolment ratio for tertiary education both sexes (%) > 0"
}
],
but it doesn't seem to be working.
Here is the code for my vegalite chart
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"title": {
"text": "Billionaires and Education Attainment",
"subtitle": "Sources: ",
"subtitleFontStyle": "italic",
"subtitleFontSize": 10,
"anchor": "start",
"color": "black"
},
"height": 300,
"width": 260,
"data": {
"url": "https://raw.githubusercontent.com/jamieprince/jamieprince.github.io/main/correlation.csv"
},
"selection": {
"paintbrush": {
"type": "multi",
"on": "mouseover",
"nearest": true
},
"grid": {
"type": "interval",
"bind": "scales"
}
},
"mark": {
"type": "circle",
"opacity": 0.4,
"color": "skyblue"
},
"encoding": {
"x": {
"field": "Number of Billionaires",
"type": "quantitative",
"axis": {
"title": "Number of Billionaires",
"grid": false,
"tickCount": 14,
"labelOverlap": "greedy"
}
},
"y": {
"field": "Gross enrolment ratio for tertiary education both sexes (%)",
"type": "quantitative",
"axis": {
"title": "Educational Attainment",
"grid": false
}
},
"size": {
"condition": {
"selection": "paintbrush",
"value": 300,
"init": {
"value": 70
}
},
"value": 70
},
"tooltip": [
{
"field": "Year",
"type": "nominal",
"title": "Year"
},
{
"field": "Country",
"type": "ordinal",
"title": "Country"
},
{
"field": "Number of Billionaires",
"type": "nominal",
"title": "No of Billionaires"
},
{
"field": "Gross enrolment ratio for tertiary education both sexes (%)",
"type": "nominal",
"title": "Gross enrolment ratio for tertiary education both sexes (%)"
}
]
}
}
Does anyone know how I could filter the data so it only includes points where the variable is not 0?
Thank you so so much!
You can do this with a greater-than field predicate, which takes the field name as a plain string:
"transform": [{
"filter": {
"field": "Gross enrolment ratio for tertiary education both sexes (%)",
"gt": 0
}
}],
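The expression string in the question fails because the field name contains spaces and parentheses, so it cannot be reached with dot notation; in a Vega expression such a field needs bracket notation, `datum['...']`. The sketch below writes both forms of the transform as JavaScript objects and emulates the filter in plain JavaScript. The two sample rows are made up, and the emulation is only an illustration of the predicate, not Vega-Lite code.

```javascript
const FIELD = "Gross enrolment ratio for tertiary education both sexes (%)";

// Two ways to write the filter transform that are intended to be equivalent.
const predicateForm = { filter: { field: FIELD, gt: 0 } };
const expressionForm = { filter: `datum['${FIELD}'] > 0` };

// Plain-JavaScript emulation of what the filter keeps.
const rows = [
  // made-up sample rows, not taken from correlation.csv
  { Country: "A", [FIELD]: 0 },
  { Country: "B", [FIELD]: 42.5 },
];
const kept = rows.filter(datum => datum[FIELD] > 0);

console.log(JSON.stringify([predicateForm, expressionForm], null, 2));
console.log(kept.map(d => d.Country)); // rows whose ratio is 0 are dropped
```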

How to create Grouped Bar Chart in Vegalite?

Name | Value 1 | Value 2
BTC | 1 | 2
ETH | 1 | 2
to this:
I tried to use this as an example: https://vega.github.io/vega-lite/examples/bar_grouped.html, but I can't make it work.
Can someone please point me in the right direction? Thank you in advance.
Instead of using column as in the example linked in your question, you can use layers: keep the x axis common to all layers, put value1 and value2 on the y axis of separate layers, and give each bar mark an offset so that the result looks like a grouped bar chart. Below is a basic spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A simple bar chart with embedded data.",
"title": "My chart",
"width": 200,
"data": {
"values": [
{"name": "BTH", "value1": 28, "value2": 24, "legendTitle": "value1"},
{"name": "ETH", "value1": 55, "value2": 25, "legendTitle": "value2"}
]
},
"encoding": {
"x": {"field": "name", "type": "nominal", "axis": {"labelAngle": 0}}
},
"layer": [
{
"mark": {"type": "bar", "xOffset": -20, "size": 30, "color": "skyblue"},
"encoding": {
"y": {
"field": "value1",
"type": "quantitative",
"axis": {"title": null, "ticks": false}
}
}
},
{
"mark": {"type": "bar", "size": 30, "xOffset": 18, "color": "orange"},
"encoding": {
"y": {
"field": "value2",
"type": "quantitative",
"axis": {"title": null, "ticks": false}
}
}
},
{
"mark": {"type": "text"},
"encoding": {
"fill": {
"field": "legendTitle",
"scale": {"range": ["skyBlue", "orange"]},
"legend": {"title": null, "symbolType": "square", "orient": "bottom"}
}
}
}
]
}
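If you would rather follow the linked grouped-bar example than use layers, the data first has to be in long format, with one row per bar instead of one row per name. Here is a small sketch of that reshaping in plain JavaScript, using the values from the table in the question. `foldRows` and the output field names `key` and `value` are hypothetical; Vega-Lite also has a `fold` transform that can do the same thing inside the spec.

```javascript
// Reshape wide rows ({name, value1, value2}) into long rows ({name, key, value}).
// `foldRows` is a hypothetical helper that mimics a fold/unpivot step.
function foldRows(rows, fields) {
  return rows.flatMap(row =>
    fields.map(field => ({ name: row.name, key: field, value: row[field] }))
  );
}

const wide = [
  { name: "BTC", value1: 1, value2: 2 },
  { name: "ETH", value1: 1, value2: 2 },
];
const long = foldRows(wide, ["value1", "value2"]);
console.log(long);
```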

Filter for one attribute (array) for one of its value (json)

Having the following record
{
"name": "Festões Plástico, 12mt x 17cm - Festas Populares",
"categories": [
"Festas",
"Casamentos",
"Decorações"
],
"hierarchicalCategories": {
"lvl0": "Festas",
"lvl1": "Festas > Casamentos",
"lvl2": "Festas > Casamentos > Decorações"
},
"description": "",
"brand": "Misterius",
"price": 14.94,
"stock": "Disponível",
"prices": [
{
"value": 12,
"type": "specificValue",
"family": "fatos",
"subfamily": "example"
},
{
"value": 13,
"type": "specificValue13",
"family": "fatos13",
"subfamily": "example13"
},
{
"value": 14,
"type": "specificValue14",
"family": "fatos14",
"subfamily": "example14"
},
{
"value": 15,
"type": "specificValue15",
"family": "fatos15",
"subfamily": "example15"
},
{
"value": 16,
"type": "specificValue16",
"family": "fatos16",
"subfamily": "example16"
}
],
"color": [
{
"name": "Amarelo",
"label": "Amarelo,#FFFF00",
"hexa": "#FFFF00"
},
{
"name": "Azul",
"label": "Azul,#0000FF",
"hexa": "#0000FF"
},
{
"name": "Branco",
"label": "Branco,#FFFFFF",
"hexa": "#FFFFFF"
},
{
"name": "Laranja",
"label": "Laranja,#FFA500",
"hexa": "#FFA500"
},
{
"name": "Verde Escuro",
"label": "Verde Escuro,#006400",
"hexa": "#006400"
},
{
"name": "Vermelho",
"label": "Vermelho,#FF0000",
"hexa": "#FF0000"
}
],
"specialcategorie": "",
"reference": "3546",
"rating": 0,
"free_shipping": false,
"popularity": 0,
"objectID": "30"
}
Searching for "Festas Populares" returns the record and all its attributes. Is it possible to also filter an array attribute such as "prices" so that only one of its JSON objects is returned? For example, "prices.type"="specificValue14" and "family"="fatos14" and "family"="fatos" and "subfamily"="example"
{
"value": 14,
"type": "specificValue14",
"family": "fatos14",
"subfamily": "example14"
}
The returned record would be:
{
"name": "Festões Plástico, 12mt x 17cm - Festas Populares",
"categories": [
"Festas",
"Casamentos",
"Decorações"
],
"hierarchicalCategories": {
"lvl0": "Festas",
"lvl1": "Festas > Casamentos",
"lvl2": "Festas > Casamentos > Decorações"
},
"description": "",
"brand": "Misterius",
"price": 14.94,
"stock": "Disponível",
"prices": [
{
"value": 14,
"type": "specificValue14",
"family": "fatos14",
"subfamily": "example14"
}
],
"color": [
{
"name": "Amarelo",
"label": "Amarelo,#FFFF00",
"hexa": "#FFFF00"
},
{
"name": "Azul",
"label": "Azul,#0000FF",
"hexa": "#0000FF"
},
{
"name": "Branco",
"label": "Branco,#FFFFFF",
"hexa": "#FFFFFF"
},
{
"name": "Laranja",
"label": "Laranja,#FFA500",
"hexa": "#FFA500"
},
{
"name": "Verde Escuro",
"label": "Verde Escuro,#006400",
"hexa": "#006400"
},
{
"name": "Vermelho",
"label": "Vermelho,#FF0000",
"hexa": "#FF0000"
}
],
"specialcategorie": "",
"reference": "3546",
"rating": 0,
"free_shipping": false,
"popularity": 0,
"objectID": "30"
}
For some context: a product can have multiple prices associated with it, e.g. one for a specific user, or one for a day when a campaign gives a discount. In those cases I want to filter the price associated with the product/record.
No, this is not possible with Algolia. Records are always returned with the attributes specified inside attributesToRetrieve. These attributes are returned in full.
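Since the array comes back in full, one workaround is to trim it on the client after the hit has been retrieved. The sketch below only shows that post-processing step in plain JavaScript and makes no Algolia calls. `filterPrices` is a hypothetical helper, and the `hit` object is a shortened copy of the record from the question.

```javascript
// Keep only the entries of `prices` that match every key/value in `criteria`.
// Hypothetical helper: it runs on a hit after it has been retrieved.
function filterPrices(record, criteria) {
  const matches = price =>
    Object.entries(criteria).every(([key, value]) => price[key] === value);
  // Return a copy so the original hit stays untouched.
  return { ...record, prices: (record.prices || []).filter(matches) };
}

// Shortened copy of the record from the question.
const hit = {
  objectID: "30",
  price: 14.94,
  prices: [
    { value: 12, type: "specificValue", family: "fatos", subfamily: "example" },
    { value: 14, type: "specificValue14", family: "fatos14", subfamily: "example14" },
  ],
};

const trimmed = filterPrices(hit, { type: "specificValue14", family: "fatos14" });
console.log(trimmed.prices);
```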

Get the first document using $in with mongodb

How can I get the first element using $in in MongoDB?
If I have a list like ['car', 'house', 'cat', 'dog'] and a collection containing many documents with these elements, I'd like to find the first document that contains cat, the first that contains dog, etc.
I've tried limit(), but it gives me only one document, which can be any of car, dog, cat, etc.
Is there a way to combine a limit with $in?
Thanks
EDIT:
Example of the data I have:
{
"_id": {
"$oid": "51d53ace9e674607e837d62d"
},
"sensors": [{
"name": "os-hostname",
"value": "yahourt"
}, {
"name": "os-domain-name",
"value": ""
}, {
"name": "os-platform",
"value": "Win32NT"
}, {
"name": "os-fullname",
"value": "Microsoft Windows XP Professional"
}, {
"name": "os-version",
"value": "5.1.2600.131072"
}],
"type": "os",
"serial": "2_os_os-hostname_yahourt"
}
{
"_id": {
"$oid": "51d53ace9e674607e837d62e"
},
"sensors": [{
"name": "cpu-id",
"value": "_Total"
}, {
"name": "cpu-usage",
"value": 37.2257042
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_total"
}
{
"_id": {
"$oid": "51d53ace9e674607e837d62f"
},
"sensors": [{
"name": "cpu-id",
"value": "0"
}, {
"name": "cpu-usage",
"value": 48.90282
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_0"
}
{
"_id": {
"$oid": "51d53ace9e674607e837d630"
},
"sensors": [{
"name": "cpu-id",
"value": "1"
}, {
"name": "cpu-usage",
"value": 25.54859
}],
"type": "cpu",
"serial": "2_cpu_cpu-id_1"
}
{
"_id": {
"$oid": "51d53ace9e674607e837d631"
},
"sensors": [{
"name": "volume-name",
"value": "C:"
}, {
"name": "volume-label",
"value": ""
}, {
"name": "volume-total-size",
"value": "52427898880"
}, {
"name": "volume-total-free-space",
"value": "20305170432"
}, {
"name": "volume-percent-free-space",
"value": "38"
}, {
"name": "volume-reads-per-second",
"value": 0.0
}, {
"name": "volume-writes-per-second",
"value": 9.324152
}, {
"name": "volume-read-bytes-per-second",
"value": 0.0
}, {
"name": "volume-write-bytes-per-second",
"value": 194141.6
}, {
"name": "volume-queue-length",
"value": 0.0
}],
"type": "disk",
"serial": "2_disk_volume-name_c"
}
You cannot add a limit to $in, but you could cheat by using the aggregation framework:
db.collection.aggregate([
{$match:{serial:{$in:[list_of_serials]}}},
{$sort:{_id:-1}},
{$group:{_id:'$serial',type:{$first:'$type'},sensors:{$first:'$sensors'},id:{$first:'$_id'}}}
]);
This would get a list of the first document found for each serial.
Edit
The updated query gets the last inserted document for each serial, according to the _id.
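To make the effect of the $sort and $group/$first combination concrete, here is a plain-JavaScript emulation of the pipeline on an in-memory array. It is a sketch for illustration only and does not use the MongoDB driver. `firstPerSerial` and the sample documents are made up, and `_id` is simplified to a short string. A direction of -1 matches the pipeline above (newest document per serial), while 1 would give the oldest.

```javascript
// Emulates {$match: {serial: {$in: serials}}}, {$sort: {_id: direction}} and
// {$group: {_id: '$serial', ...$first}} on an in-memory array.
// Hypothetical helper, not a MongoDB API.
function firstPerSerial(docs, serials, direction) {
  const wanted = new Set(serials);
  const sorted = docs
    .filter(doc => wanted.has(doc.serial)) // $match with $in
    .sort((a, b) => direction * (a._id < b._id ? -1 : a._id > b._id ? 1 : 0)); // $sort
  const firstSeen = new Map();
  for (const doc of sorted) {
    // $group with $first: keep only the first document seen per serial.
    if (!firstSeen.has(doc.serial)) firstSeen.set(doc.serial, doc);
  }
  return [...firstSeen.values()];
}

// Made-up documents; _id values are simplified strings in insertion order.
const docs = [
  { _id: "01", serial: "cat" },
  { _id: "02", serial: "dog" },
  { _id: "03", serial: "cat" },
  { _id: "04", serial: "car" },
];

console.log(firstPerSerial(docs, ["cat", "dog"], -1)); // newest per serial
console.log(firstPerSerial(docs, ["cat", "dog"], 1));  // oldest per serial
```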