Gremlin: find names of projects that have duplicates (by name) - orientdb

I have projects that have "name" as a property and I want to generate a list of the duplicates. I tried to do this by grouping the projects by name, and using the where clause to filter the results where the count of the project name is greater than 1 and showing those names.
The query below generates a map of the project names with the count of each:
g.V().hasLabel('project').groupCount().by('name')
So I added the filter to find only the duplicate values and it does not work:
g.V().hasLabel('project').groupCount().by('name').where(select(values).is(gt(1))).values('name')

You need to unfold() the count Map(), thus:
g.V().hasLabel('project').
groupCount().
by('name').
unfold().
where(select(values).is(gt(1))).
select(keys)
If you don't unfold(), you have a Map in the pipeline and it tries to apply your where() to that object as a whole when you really want to apply it to each individual key/value pair in the Map.
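As a rough analogy in plain Python (not Gremlin, and the sample names are made up), the Counter below plays the role of the groupCount() Map, and iterating over its items corresponds to unfold(). The filter then sees one key/value pair at a time rather than the Map as a whole.

```python
from collections import Counter

# Hypothetical project names standing in for the 'name' property values.
names = ["alpha", "beta", "alpha", "gamma", "beta", "alpha"]

# Analogue of groupCount().by('name'): one map of name -> count.
counts = Counter(names)

# Analogue of unfold() followed by the count filter:
# visit each entry individually and keep the keys whose count exceeds 1.
duplicates = sorted(name for name, n in counts.items() if n > 1)

print(duplicates)
```

Applying the `n > 1` test to `counts` itself instead of to its entries would be comparing a whole map against a number, which is roughly what the query without unfold() attempts.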

This worked for me:
g.V().hasLabel('project').
group().by(values('name').fold()).
unfold().
filter(select(values).count(local).is(gt(1))).
select(keys)

Related

How can I alias labels (using a query) in Grafana?

I'm using Grafana v9.3.2.2 on Azure Grafana
I have a line chart whose labels are IDs. I also have an SQL table in which the IDs are mapped to simple strings. I want to alias the IDs in the labels to the strings from the SQL table.
I am trying to look for a transformation to do the conversion.
There is a transformation called “rename by regex”, but that would require me to hardcode each case. Is there something similar that doesn't require hardcoding each case?
There is something similar for variables - https://grafana.com/blog/2019/07/17/ask-us-anything-how-to-alias-dashboard-variables-in-grafana-in-sql/. But I don't see anything for transformations.
Use two queries in the panel: one for the data with IDs and a second one that maps each ID to its string. Then add the Outer join transformation and use the ID field to join the two query results into one result.
You may also need the Organize fields transformation to rename fields and hide unwanted ones, so that only the right fields end up in the label.
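What the two transformations do to the data can be sketched in plain Python. This is only a model of the join and rename steps, not Grafana configuration; the rows, the field names `id`, `value` and `name`, and the helper `outer_join_by_id` are all hypothetical.

```python
# Hypothetical rows returned by the two panel queries.
data_rows = [
    {"id": 101, "value": 3.5},
    {"id": 102, "value": 7.1},
    {"id": 103, "value": 1.2},
]
mapping_rows = [
    {"id": 101, "name": "pump-a"},
    {"id": 102, "name": "pump-b"},
]


def outer_join_by_id(left, right, key="id"):
    """Merge rows sharing the same key; keep rows present on only one side."""
    merged = {}
    for row in left + right:
        merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


joined = outer_join_by_id(data_rows, mapping_rows)

# Model of the rename/hide step: keep only a label and the value.
# Falling back to the raw ID when no mapping exists is a choice made
# for this sketch, not something the transformation does by itself.
labelled = [
    {"label": row.get("name", str(row["id"])), "value": row.get("value")}
    for row in joined
]

print(labelled)
```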

Filter on CassandraJoinRDD

I have applied a join between a file and an existing Cassandra table via joinWithCassandraTable. Now I want to apply a filter on the resulting joinCassandraRDD. Here is the code I have written to extract the data:
var outrdd = sc.textFile("/usr/local/spark/bin/select_element/src/main/scala/file_small.txt")
.map(_.toString).map(Tuple1(_))
.joinWithCassandraTable(settings.keyspace, settings.table)
.select("id", "listofitems")
Here "/usr/local/spark/bin/select_element/src/main/scala/file_small.txt" is a text file containing a list of ids. Now, I have some elements in another list, say userlistofitems=["jas", "yuk"], and I need to search for 'userlistofitems' as a sublist of the 'listofitems' column of joinCassandraRDD.
We have around 2 million ids and several user lists for which we have to extract the data from Cassandra. We are using spark=2.4.4, scala=2.11.12, and spark-cassandra-connector=spark-cassandra-connector-2.4.2-3-gda70746.jar.
Any help is highly appreciated.
References Used:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc,
https://www.youtube.com/watch?v=UsenTP029tM

Spark agg to collect a single list for multiple columns

Here is my current code:
pipe_exec_df_final_grouped = pipe_exec_df_final.groupBy("application_id").agg(collect_list("table_name").alias("tables"))
However, I would like the collected list to hold multiple column values, so that the aggregated column is an array of arrays. Currently the result looks like this:
1|[a,b,c,d]
2|[e,f,g,h]
I would also like to keep another column attached to the aggregation (let's call it 'status'). So my new output would be:
1|[[a,pass],[b,fail],[c,fail],[d,pass]]
...
I tried collect_list("table_name, status"), but collect_list only takes one column name. How can I accomplish what I am trying to do?
Use array to collect columns into an array column first, then apply collect_list:
df.groupBy(...).agg(collect_list(array("table_name", "status")))
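The shape of the result can be modelled in plain Python. This is not Spark code, only a sketch of what grouping and collecting the pairs produces; the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical rows: (application_id, table_name, status).
rows = [
    (1, "a", "pass"),
    (1, "b", "fail"),
    (2, "e", "pass"),
    (1, "c", "fail"),
]

# Model of groupBy("application_id") with
# collect_list(array("table_name", "status")):
# each group collects one [table_name, status] pair per input row.
grouped = defaultdict(list)
for app_id, table_name, status in rows:
    grouped[app_id].append([table_name, status])

result = dict(grouped)
print(result)
```

This model keeps the pairs in input order. In Spark, the order of elements returned by collect_list can vary after a shuffle, so sort the collected array if the order matters.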

Querying on multiple LINKMAP items with OrientDB SQL

I have a class that contains a LINKMAP field called links. This class is used recursively to create arbitrary hierarchical groupings (something like the time-series example, but not with the fixed year/month/day structure).
A query like this:
select expand(links['2017'].links['07'].links['15'].links['10'].links) from data where key='AAA'
Returns the actual records contained in the last layer of "links". This works exactly as expected.
But a query like this (note the 10,11 in the second to last layer of "links"):
select expand(links['2017'].links['07'].links['15'].links['10','11'].links) from data where key='AAA'
Returns two rows of the last layer of "links" instead:
{"1000":"#23:0","1001":"#24:0","1002":"#23:1"}
{"1003":"#24:1","1004":"#23:2"}
Using unionAll or intersect (with or without UNWIND) results in this single record:
[{"1000":"#23:0","1001":"#24:0","1002":"#23:1"},{"1003":"#24:1","1004":"#23:2"}]
But nothing I've tried (including various attempts at "compound" SELECTs) will get the expand to work as it does with the original example (i.e. return the actual records represented in the last LINKMAP).
Is there a SQL syntax that will achieve this?
Note: Even this (slightly modified) example from the ODB docs does not result in a list of linked records:
select expand(records) from
(select unionAll(years['2017'].links['07'].links['15'].links['10'].links, years['2017'].links['07'].links['15'].links['11'].links) as records from data where key='AAA')
Ref: https://orientdb.com/docs/2.2/Time-series-use-case.html
I'm not sure of what you want to achieve, but I think it's worth trying with values():
select expand(links['2017'].links['07'].links['15'].links['10','11'].links.values()) from data where key='AAA'

groupBy toList element order

I have a RichPipe with several fields, let's say:
'sex
'weight
'age
I need to group by 'sex and then get a list of tuples ('weight and 'age). I then want to do a scanLeft operation on the list for each group and get a pipe with 'sex and 'result. I currently do this by doing
pipe.groupBy('sex) {_.toList('weight -> 'weights).toList('age -> 'ages)}
and then zipping the two lists together. I'm not sure this is the best possible way, and also I'm not sure if the order of the values in the lists is the same, so that when I zip the two lists the tuples don't get mixed up with wrong values. I found nothing about this in the documentation.
Ok, so it looks like I've answered my own question.
You can simply do
pipe.groupBy('sex) {_.toList[(Int, Int)](('weight, 'age) -> 'list)}
which results in a list of tuples. Would've saved me a lot of time if the Fields API Reference mentioned this.
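The idea of collecting the pairs together and then scanning over them can be sketched in plain Python. This is not Scalding; the records are made up, and the running total of weights is just a stand-in for whatever function the scanLeft actually applies.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical records: (sex, weight, age).
records = [
    ("f", 60, 30),
    ("m", 80, 40),
    ("f", 55, 25),
    ("m", 75, 35),
]

# Collect (weight, age) as one tuple per record, so the two values
# cannot be misaligned the way two separately built lists could be.
pairs_by_sex = defaultdict(list)
for sex, weight, age in records:
    pairs_by_sex[sex].append((weight, age))

# scanLeft analogue (Python 3.8+): running total of weights, starting at 0.
running_weight = {
    sex: list(accumulate((w for w, _ in pairs), initial=0))
    for sex, pairs in pairs_by_sex.items()
}

print(running_weight)
```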