Is it possible to index labels in Titan?

When I run a query like this:
g.V().hasLabel("myDefinedLabel").values("myKey").next()
It prints
20:46:30 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx -
Query requires iterating over all vertices [(~label = myDefinedLabel)].
For better performance, use indexes
So I guess that to solve this issue I need to index the label. Is there any way to do this?
I tried the normal procedure for creating an index described in the Titan docs, but the label is not a regular property key that can be indexed.
Assuming I want to create a composite index for labels, how would I do it?

Titan doesn't allow you to index labels, and according to the devs it isn't something they're interested in enabling.

Related

don't use index in pandas-profiling

When running pandas-profiling on a dataframe I see it analyses the index as a variable. Note: My index is a unique key (named UUID)
Is there a way to exclude bringing in the index to report?
I understand I could remove it in pandas but in my head I would like to do
ProfileReport(df, use_index=False)
I agree that a use_index=False option in ProfileReport would be nice and clean, but it apparently doesn't exist (yet).
So currently the only way I can find to exclude bringing the index into the report is by dropping it before profiling:
df.reset_index(drop=True, inplace=True)
This gets the job done.
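Note that inplace=True also throws the UUID index away from df itself. A minimal sketch of the same workaround without mutating df, assuming a small made-up frame (the column names and values are hypothetical). ProfileReport is left out of the code so the snippet only depends on pandas:

```python
import pandas as pd

# Hypothetical data: a frame indexed by a unique key named "UUID".
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=pd.Index(["a1", "b2", "c3"], name="UUID"),
)

# reset_index without inplace=True returns a new frame with a plain RangeIndex.
# drop=True discards the old index instead of turning it into a column.
profile_input = df.reset_index(drop=True)

print(list(profile_input.columns))  # only the data columns remain
print(df.index.name)                # the original frame still has its UUID index
```

You would then pass profile_input to ProfileReport instead of df.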

Any method in PostgreSQL to create multiple index entries on the same datum

I'm indexing some spatial objects in PG, and at the moment I'm using PG with PostGIS to build an R-tree. However, some objects are too large to be indexed, so I want to first split them into smaller parts and index them by those parts.
However, I read the GiST documentation and found that there is no way to split one datum at the index level; I would first have to split the objects in the original relation.
Is there any way to solve this problem in GiST? Or is there any other framework that solves it? Or why is it impossible in PG?
It depends on what exactly you want to do with the index.
One option would be not to index the objects themselves, but a simplified version:
CREATE INDEX ON atable USING gist (st_simplify(geom, 1, TRUE));
And then use that as a pre-filter:
SELECT ... FROM atable
WHERE st_intersects(st_simplify(geom, 1, TRUE), GEOMETRY '...')
AND st_intersects(geom, GEOMETRY '...');
That can use the index. The problem is that the simplified geometry might not intersect where the original geometry does, so you could get false negatives in corner cases.

Postgres query shows under the 'Most time consuming' section of Heroku

I am using the query below for a search:
I guess the current time is not bad, but I am still looking for further optimizations. In the analyze report, the nested loop and nested loop join nodes are shown in red, and it would be great to get an idea of how to reduce that. I was thinking of adding an index on the search key, and I would welcome more suggestions. I have added the EXPLAIN ANALYZE results from 3 executions, which ran in production.
You could try adding ingredients.name or ingredients.code to an existing index, or creating a new index, so that more rows are filtered out during the index scan on ingredients.
You should also try to avoid wrapping columns in functions, such as LOWER(ingredients.name), so that the right index can be used.
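To illustrate the point about functions on indexed columns, here is a rough sketch that uses SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The planners differ, so treat it as an illustration of the principle, not a prediction of the plan on Heroku; the table layout and index names are made up. It also shows an expression index, which PostgreSQL supports as well (CREATE INDEX ... ON ingredients (LOWER(name))), as an option when the LOWER() call cannot be removed from the query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT, code TEXT)"
)
conn.executemany(
    "INSERT INTO ingredients (name, code) VALUES (?, ?)",
    [(f"item{i}", f"c{i}") for i in range(1000)],
)
# Plain index on the column (hypothetical index name).
conn.execute("CREATE INDEX idx_ingredients_name ON ingredients (name)")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


# Function wrapped around the column: the plain index cannot be used for lookup.
wrapped = plan("SELECT * FROM ingredients WHERE lower(name) = 'item42'")

# Bare column: the plain index can be used.
bare = plan("SELECT * FROM ingredients WHERE name = 'item42'")

# Expression index matching the function call makes the wrapped form indexable.
conn.execute(
    "CREATE INDEX idx_ingredients_lower_name ON ingredients (lower(name))"
)
wrapped_with_expr_index = plan(
    "SELECT * FROM ingredients WHERE lower(name) = 'item42'"
)

print("lower(name), plain index:     ", wrapped)
print("name, plain index:            ", bare)
print("lower(name), expression index:", wrapped_with_expr_index)
```

On PostgreSQL the equivalent check is to run EXPLAIN ANALYZE on the real query before and after changing the predicate or adding the index.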

PostgreSQL Probabilities: EXPLAIN on CREATE INDEX

I am using PostgreSQL to compute empirical probability density functions for pairs of variables across all my data. I am trying to determine if/when it is more effective to index before computing the PDF. I run EXPLAIN CREATE INDEX like,
EXPLAIN CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime");
CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime");
But PSQL complains,
psql:sql/stats.sql:3: ERROR: syntax error at or near "INDEX"
LINE 2: EXPLAIN CREATE INDEX AB ON xrootd ("F.mName", "F.mOpenTime")...
Is there a better way of doing what I am trying to do? At the very least, I would like to know whether constructing the indexes is useful. I have a lot of variables and a lot of data, so speeding this up is crucial.
Checking the cost of CREATE INDEX would tell me whether building the index is too expensive for the gain of using it.
There is no EXPLAIN CREATE INDEX: EXPLAIN only works for SELECT, DML statements (INSERT/DELETE/UPDATE), and a few other statements such as CREATE TABLE AS.
What you are probably looking for is called hypothetical indexes. There is a PostgreSQL extension for that: https://github.com/dalibo/hypopg

Make an index unique after creation in Titan

If I create an index according to the docs (http://s3.thinkaurelius.com/docs/titan/0.5.4/indexes.html) without making it unique, is it possible to make it unique afterwards? I have not added any vertices or edges to the graph, just created the index.
Something like:
index = mgmt.getGraphIndex('name')
index.unique()
I am using the Gremlin console to make these changes.
Is it possible to do this somehow?
This is a documented limitation of Titan.
Ref: http://s3.thinkaurelius.com/docs/titan/0.5.0/limitations.html
(section 14.2.1, Unable to Drop Indices)
Since no vertices or edges have been added to the graph, try the Gremlin commands below.
g.V.remove() or g.V.each{g.removeVertex(it)}
g.commit()
Then try to create the index again with .unique().
If you are still unable to re-create the index, try cleaning the storage backend.
In the case of Cassandra: "DROP Keyspace titan;"
This should work; I tried it in Titan 0.4 and it worked.