How to exclude certain vertices in Gremlin (Titan)

For example, I want to exclude some vertex ids while querying.
Step 1: I fetch the users followed by me (1234):
g.V(1234).outE("following")
Output : 9876,3246,2343,3452,1233,6545
Step 2: I have to exclude or remove certain ids:
users = [3452,1233,6545];
g.V(1234).outE("following").inV().except(users)
Output : 9876,3246,2343
The output should look like this, but the except function didn't work. Is there any way to filter out specific vertex ids?

It's as easy as:
users = [3452, 1233, 6545]
g.V(1234).out("following").hasId(without(users))
Or just:
g.V(1234).out("following").hasId(without(3452, 1233, 6545))

You can use the where step to filter the vertices. This allows you to exclude vertices based on their id. The following query should give you the expected result:
users = [3452,1233,6545];
g.V(1234).out("following").where(__.not(hasId(within(users))))
Note that I used out() as a short form of outE().inV(), which traverses directly to the neighbor vertices.
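
For readers who want to see the semantics outside Gremlin, here is a minimal plain-Python sketch of what both answers compute: keep every followed vertex whose id is not in the exclusion list. The function name exclude_ids and the in-memory list are illustrative assumptions, not part of Titan or TinkerPop.

```python
def exclude_ids(followed_ids, excluded_ids):
    """Return followed_ids minus excluded_ids, preserving the original order."""
    excluded = set(excluded_ids)  # a set gives constant-time membership tests
    return [vid for vid in followed_ids if vid not in excluded]


# Ids taken from the question; the list stands in for g.V(1234).out("following").
followed = [9876, 3246, 2343, 3452, 1233, 6545]
users = [3452, 1233, 6545]
remaining = exclude_ids(followed, users)
print(remaining)
```

In the real traversal the filtering happens inside the graph engine, so the excluded vertices are never returned to the client.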

Related

How to perform graph traversals with multiple collections in ArangoDB

I have a graph consisting of 3 document collections and 2 edge collections. I want to filter on one document collection but need to show results from all three document collections.
Graph: AssetWorkTime
Document collections: assetnew (contains the asset name), timenew (contains the renovation start date), and Work (contains the renovation type)
Edge collections: AssetRenoTimeNew (connecting assetnew and timenew) and WorkTimeNew (connecting timenew and Work)
This is a part of my graph
FOR c IN timenew
  FILTER c.startdate >= "2020-01-01"
  FOR v, e, p IN 1..1 OUTBOUND c GRAPH 'AssetWorkTime'
    RETURN {AssetName: v.AssetName, Date: c.startdate, RenoJob: v.WorkName}
This is the query that I have written. I want to know which assets have gone through renovation after 1/1/2020 and what type of renovation has been done.
This is the result that I got. The results are correct, except that they do not show the names of the assets.
I've gone through the documentation and other questions on Stack Overflow but couldn't find a way to get this. Thanks!

Inconsistent results in filtered queries in Weaviate. HNSW graph traversal + filtering

I have set up a Weaviate DB which holds about 12M vectors, along with some metadata for each one.
I am getting inconsistent/wrong/weird results when I perform a filtered search,
i.e. filter on a metadata field and then perform an ANN search. (I have turned brute-force search off completely by passing in a very small number, 500, to the "flatSearchCutoff" parameter.)
Each vector has a 'user' field attached to it, which is set to a 'string' type; moreover, there is a 'status_int' field which takes the values of [0,4]. There is also a unique identifier, which is another 'string' field.
Firstly, my use case requires querying the DB for a specific entry and retrieving its vector representation by using the unique identifier field, which I do with the following:
where_filter = {
    "path": ["resource_identifier"],
    "operator": "Equal",
    "valueString": identifier_var
}
result = client.query.get("Resource", ["resource_identifier"])\
    .with_where(where_filter)\
    .with_additional(['vector'])\
    .with_limit(1)\
    .do()
feature_vector = np.array(result['data']['Get']['Resource'][0]['_additional']['vector'], dtype=np.float32)
nearVector = {"vector": feature_vector, "certainty": SIMILARITY_THRESHOLD}
This works really well - and is blazing fast.
Secondly I want to search for nearby vectors to the one I just retrieved... while making sure that the results fit my criteria.
I am applying the following filter using the metadata fields.
filter_ = {
    'operator': 'And',
    'operands': [
        {
            'path': 'user',
            'operator': 'Equal',
            'valueString': user_var
        },
        {
            'path': 'status_int',
            'operator': 'GreaterThanEqual',
            'valueInt': status_var
        }
    ]
}
result = client.query.get("Resource", ["resource_uri", "user", "_additional{certainty}", 'status_int'])\
    .with_where(filter_)\
    .with_near_vector(nearVector)\
    .with_limit(RETRIEVE_RECORDS_LIMIT).do()
While the filtering does not throw any errors, the results look weird. I set the similarity threshold to 0.75 so that I get plenty of results. However, the results only include really close matches to the query vector, even at a very small similarity threshold, which I found really odd.
More specifically, I query the DB as follows:
When I query for a specific user and status_int >= 0, I'm stuck with 2 identical results of similarity 1.
However, there should be a lot more results, since the filter covers 3295 objects.
When I query for the same user and status_int >= 1, I get 1 resource as a result, again with similarity = 1, which is one of the 2 results I'm given above (this filter encompasses 2578 objects).
HOWEVER, when I query for the same user and status_int >= 2, I GET A LOT MORE RESULTS! With no exact matches (as it should be) but with 0.85 and below similarity (this filter encompasses 1900 resources).
So my question is: isn't this weird, or is this intended behaviour and I've misunderstood how Weaviate and HNSW work?
In my mind the path through the HNSW graph is the same across these queries; it's just that results are retrieved based on the filters passed.
Shouldn't the first 2 queries present the perfect matches AS WELL AS the less relevant matches of the third query?
Really confused on this one :(

Querying MongoDB: retrieve shops by name and by location with one single query

Hi folks!
I'm building a "search shops" application using the MEAN stack.
I store shop documents in a MongoDB "location" collection like this:
{
_id: .....
name: ...//shop name
location : //...GEOJson
}
The UI provides users with one single input for searching shops. Basically, I would like to perform one single query that returns, in the same results array:
All shops near the user (eventually limited to x)
All shops named "like" the input value
On the logical side, I think this is an "$or like" query.
Based on this answer
Using full text search with geospatial index on Mongodb
assigning two special indexes (2dsphere and full text) to the collection is probably not the right way to achieve this. Anyway, I think this is a different case, because I really don't want to apply sequential filters to the results; I "simply" want to retrieve data with 2 distinct criteria.
If I do set indexes on my collection, the approach is of course to perform two distinct queries with two distinct methods ($near for locations and $text for names), and then merge the results with some server-side logic to remove duplicate documents and sort them in a way that is useful for the user experience. But I'm still wondering whether there is a way to achieve this result with one single query.
So, the question is: is this possible, or is this kind of approach outside MongoDB's purpose?
Hope this is clear and hope that someone can teach something today!
Thanks
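
The two-query fallback described in the question (one $near query, one $text query, then a server-side merge that drops duplicates) could look roughly like the sketch below. It only covers the merge step; the sample documents and the function name merge_shop_results are hypothetical stand-ins for whatever the two queries return, and the ordering rule (nearby shops first, name-only matches after) is one assumption among many possible ones.

```python
def merge_shop_results(near_results, name_results, limit=None):
    """Merge two result lists, dropping duplicates by _id.

    Shops from near_results come first, in their original (distance) order;
    shops found only by name are appended afterwards.
    """
    seen = set()
    merged = []
    for shop in list(near_results) + list(name_results):
        if shop["_id"] in seen:
            continue  # already added from the other query
        seen.add(shop["_id"])
        merged.append(shop)
    return merged if limit is None else merged[:limit]


# Hypothetical documents standing in for the results of the two queries.
near = [{"_id": 1, "name": "Corner Bakery"}, {"_id": 2, "name": "Pizza Place"}]
by_name = [{"_id": 2, "name": "Pizza Place"}, {"_id": 3, "name": "Pizza Express"}]
merged = merge_shop_results(near, by_name)
print([shop["_id"] for shop in merged])
```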

Titan: Are both queries the same?

Let's say I have the following:
V1.setProperty("category","C1");
V1.setProperty("city","City1");
Query for vertices having city city1:
v.query().has("category","c1").has("city","city1").vertices();
The same thing in a different way:
V1.setProperty("category","C1");
V1.setProperty("C1_city","City1");
Query for vertices having city city1:
v.query().has("C1_city","city1").vertices();
Assume category, city and C1_city are all indexed. Are both queries the same performance-wise?

I wouldn't say that they are the same from a performance perspective. In the first case, Titan will only use the index on category and will not use the city index (it will just iterate over all c1 vertices and then filter on city). Therefore, I would expect the second query to be faster, as it finds exactly what you are looking for entirely through the index.
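
To make the difference concrete, here is a small plain-Python model of the two strategies the answer describes: an index lookup followed by an in-memory filter, versus a single lookup on a combined key. This is only an illustration of the reasoning, assuming dictionary-based indexes and made-up vertices; it is not how Titan implements its indexes.

```python
from collections import defaultdict

# Hypothetical vertices; property names follow the question.
vertices = [
    {"id": 1, "category": "c1", "city": "city1"},
    {"id": 2, "category": "c1", "city": "city2"},
    {"id": 3, "category": "c2", "city": "city1"},
]

# Variant 1: an index on "category" only; "city" is checked by scanning the hits.
category_index = defaultdict(list)
for v in vertices:
    category_index[v["category"]].append(v)


def query_index_then_filter(category, city):
    candidates = category_index[category]                # index lookup
    hits = [v for v in candidates if v["city"] == city]  # in-memory filter
    return hits, len(candidates)                         # hits plus number scanned


# Variant 2: one index over a combined key, like the "C1_city" property.
combined_index = defaultdict(list)
for v in vertices:
    combined_index[(v["category"], v["city"])].append(v)


def query_combined(category, city):
    hits = combined_index[(category, city)]              # single lookup, no scan
    return hits, len(hits)


hits_a, scanned_a = query_index_then_filter("c1", "city1")
hits_b, scanned_b = query_combined("c1", "city1")
print(scanned_a, scanned_b)
```

Both variants return the same vertex, but the first one has to touch every vertex in the category before filtering, which is the cost the answer points at.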

custom sorting in sphinx / sort result by match & distance over a particular field

I am using Sphinx 2.0.
I want to achieve the following results:
The user will input tags along with other search terms. Documents associated with the input tags should come on top, sorted by distance.
After that come the documents that do not contain those tags, sorted by distance.
What I am doing:
I am searching on different fields at the same time, using @name, @tag, @streetname etc., so I am using the following:
$cl->SetMatchMode(SPH_MATCH_EXTENDED);
and sorting the results by distance using $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist asc');
The tag field can contain multiple values; I am using the OR operator to get the desired results.
If I search for only @tag, then I am able to achieve the requirement I have mentioned. But if the user input is @tag food|dinner @city london @name taxi
then a result with name: London Taxi, street: London comes on top or at some other position, breaking the sort order by lat-long, because London is there in two parameters. I just want to sort by tag and do not want to include the weight of the other search terms in the sort order.
The ranking mode is: $cl->setRankingMode(SPH_RANK_PROXIMITY_BM25);
Any suggestions to overcome this issue, or any other way to implement it?
Many thanks.

I think the way to solve this would be to arrange for matches on the tag field to rank way, way higher. I would have to test it, but something like this:
$cl->setFieldWeights(array('tags' => 100000));
$cl->setSelect("*,IF(@weight>100000,1,0) AS matchtags");
$cl->SetSortMode(SPH_SORT_EXTENDED, 'matchtags DESC, @geodist ASC');
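
The ordering this answer aims for can be modelled in a few lines of plain Python: derive a 0/1 flag from the match weight, then sort by that flag descending and by distance ascending. The match dictionaries, their numbers, and the function name sort_matches are made up for illustration; in Sphinx the weight and distance are computed per match by the engine.

```python
def sort_matches(matches, tag_weight=100000):
    """Order matches by tag hit first (1 before 0), then by ascending distance."""
    def key(match):
        # Mirrors IF(weight > 100000, 1, 0) AS matchtags from the answer.
        matchtags = 1 if match["weight"] > tag_weight else 0
        return (-matchtags, match["geodist"])
    return sorted(matches, key=key)


# Hypothetical matches: weight and geodist stand in for Sphinx's per-match values.
matches = [
    {"id": "a", "weight": 1500, "geodist": 100.0},
    {"id": "b", "weight": 101200, "geodist": 900.0},
    {"id": "c", "weight": 100800, "geodist": 300.0},
    {"id": "d", "weight": 2400, "geodist": 50.0},
]
ordered = [m["id"] for m in sort_matches(matches)]
print(ordered)
```

The large field weight matters because it pushes any tag match above the threshold regardless of how the other fields scored, so the flag depends on the tag field alone.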