I've been using TinkerPop 2.6 with OrientDB to manipulate the graph, and I'm running into some performance problems.
I suspect the issue has to do with the fact that OrientDB lazy-loads records. In my case, I'm fetching adjacent vertices by two different labels, adding a new edge, and setting a property.
I know OrientDB recommends paying attention to fetch plans, but I see no clear way of doing so via OrientGraph or TinkerPop. Note that I use the OrientGraphFactory to get transactional database instances, and I don't see it in there either.
How can I use fetch plans with OrientGraphs and Tinkerpop?
Thanks!
You can use OrientDB SQL and specify the fetch plan directly in the command text. Example:
graph.command(new OCommandSQL("select from V where name = 'Luca' fetchplan *:3"))
.execute();
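Fleshed out with the OrientGraphFactory you mentioned, a minimal sketch (assuming a remote database at remote:localhost/mydb; the fetch plan travels inside the SQL command text):

import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;

OrientGraphFactory factory = new OrientGraphFactory("remote:localhost/mydb");
OrientGraph graph = factory.getTx();
try {
    // The fetch plan (*:3) is part of the command string itself
    Iterable<Vertex> result = graph.command(
            new OCommandSQL("select from V where name = 'Luca' fetchplan *:3"))
            .execute();
    for (Vertex v : result) {
        String name = v.getProperty("name");
        System.out.println(name);
    }
} finally {
    graph.shutdown();
}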
I am running Tableau connected to ClickHouse via the ODBC driver. At first almost every report request was failing. I configured this TDC file https://github.com/yandex/clickhouse-odbc/blob/clickhouse-tbc/clickhouse.tdc and it actually started to work; however, some query requests with JOINs that check for NULL in the ON clause are now failing because Tableau uses IS NULL instead of isNull(id):
JOIN users ON ((users.user_id = t0.user_id) OR ((users.user_id IS NULL) AND (t0.user_id IS NULL)))
This is the correct way that works:
JOIN users ON ((users.user_id = t0.user_id) OR ((isNull(users.user_id) = 1) AND (isNull(t0.user_id) = 1)))
How can I make the Tableau driver send the right request?
Here are a few suggestions:
This post on the Tableau Community looks like it has symptoms similar to the ones you describe. The suggested resolution is to wrap all fields as IfNull([Dimension], ""), thereby, apparently, reducing the need for ClickHouse to do the null checks itself.
The TDC file from GitHub looks pretty complete, but its authors might not have taken joins into consideration. The GitHub commit states that the TDC is "untested." I would message the creator of that TDC file and ask whether they've done any work around joins and have any suggestions.
Here is a list of possible ODBC customizations that can be added to or removed from your TDC file. Finding the right combination may take some experimentation, but they're well worth researching as a possible solution.
Create an extract before performing complex analysis. If you're able to connect initially, then it should be possible to bring all the data from Clickhouse into an extract.
Custom SQL would probably alleviate any join-syntax issue, because the query and any joins are written purely by you. After making the initial connection to ClickHouse, instead of choosing a table, select the custom SQL option and write a query that returns the joined tables of your choosing (see the sketch after this list).
Finally, the Tableau Ideas Forum is a place to ask for and/or vote on upcoming connectors. I can see there is already an idea in place for ClickHouse. Feel free to vote it up.
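For example, the custom SQL statement could look like the following sketch, built from the tables in your generated query (any additional columns in the select list are up to you). Because you write the statement yourself, the join condition reaches ClickHouse exactly as typed, using the isNull() form that you confirmed works:

select *
from t0
join users
  on (users.user_id = t0.user_id)
  or ((isNull(users.user_id) = 1) and (isNull(t0.user_id) = 1))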
If you can make sure not to have any NULL values in the data, you can also use this proxy that I wrote for this exact problem.
https://github.com/kfzteile24/clickhouse-proxy
It kinda worked for most cases, but it's not bullet-proof.
I am not an expert on the Spark SQL API, nor on the underlying RDD one.
But, knowing the Catalyst optimization engine, I would expect Spark to try to minimize in-memory effort.
This is my situation:
I have, let's say, two tables:
TABLE GenericOperation (ID, CommonFields...)
TABLE SpecificOperation (OperationID, SpecificFields...)
They are both quite huge (~500M rows: not big data, but unfeasible to hold as a whole in memory on a standard application server).
That said, suppose I have to retrieve using Spark (part of a larger use case) all the SpecificOperation instances that match some particular condition on fields that belong to GenericOperation.
This is the code that I am using:
val gOps = spark.read.jdbc(db.connection, "GenericOperation", db.properties)
val sOps = spark.read.jdbc(db.connection, "SpecificOperation", db.properties)
val joined = sOps.join(gOps).where("ID = OperationID")
joined.where("CommonField= 'SomeValue'").select("SpecificField").show()
The problem is, when I run the above, I can see from SQL Profiler that Spark does not execute the join on the database; rather, it retrieves all the OperationID values from SpecificOperation, and I assume it then runs the whole merge in memory. Since no filter is applicable on SpecificOperation, that retrieval brings far too much data to the end system.
Is it possible to write the above so that the join is delegated directly to the DBMS?
Or does it depend on some magic Spark configuration I am not aware of?
Of course, I could simply hardcode the join as a subquery when retrieving, but that's not feasible in my case: statements have to be created at runtime from simple building blocks. Hence, I need to implement this starting from two spark.sql.DataFrames that are already built up.
As a side note, I am running this with Spark 2.3.0 for Scala 2.11, against a SQL Server 2016 database instance.
Is it possible to write the above so that the join is delegated directly to the DBMS? Or does it depend on some magic Spark configuration I am not aware of?
Excluding statically generated queries (see In Apache Spark 2.0.0, is it possible to fetch a query from an external database (rather than grab the whole table)?), Spark doesn't support join pushdown. Only predicates and column selection can be delegated to the source.
There is no magic configuration or code that could even support this type of process.
In general, if the server can handle the join, the data is usually not large enough to benefit from Spark.
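For completeness, the statically generated variant excluded above would look roughly like this sketch, which reuses the table and column names from the question: the join is written as a SQL subquery and handed to the JDBC source as the "table", so SQL Server executes it and Spark only sees the already-joined rows.

// db.connection and db.properties are the same values used in the question
val pushedDown = spark.read.jdbc(
  db.connection,
  """(select s.SpecificField
      from SpecificOperation s
      join GenericOperation g on g.ID = s.OperationID
      where g.CommonField = 'SomeValue') as joined""",
  db.properties)

pushedDown.show()

As the question notes, though, this requires rendering the statement as a string up front, which rules it out when the DataFrames must be composed at runtime.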
I am trying to pull data from an Oracle RDBMS and move it to OrientDB using Teleporter. My relational database has many columns and maintains E-R relationships. I have two questions:
My objective is to get only a few columns (those that hold unique identifiers and foreign-key relations), not all the bulky column data. Is there any configuration with which I could do so? Today, include and exclude only work at the full-table level.
Another objective is to keep my graph DB in sync with the selected table columns I pushed in the previous run. Any additional data that arrives in the RDBMS I would want in my graph DB too.
You can enjoy this feature, among others, in OrientDB 3.0 through a JSON configuration, but there is no documentation for it yet. Currently, in 2.2.x, you can just configure relationships and edges as described here:
http://orientdb.com/docs/2.2.x/Teleporter-Import-Configuration.html
In the next two weeks all these features will also be available in 2.2.x, and they will be well documented in order to make the configuration easy to understand.
At the moment you can adopt the following workaround:
import all the columns for each table into the corresponding vertex as usual.
drop the properties you are not interested in after each sync. You could write a script that calls the Teleporter execution and then deletes the properties you don't care about from the schema (a sketch follows below).
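In OrientDB SQL, UPDATE Person REMOVE ssn removes the unwanted field from the already-imported records, and DROP PROPERTY Person.ssn drops it from the schema if Teleporter defined it there (Person and ssn are placeholder names for your own class and column). A clean-up script run after each sync could therefore contain one pair of statements per unwanted column:

UPDATE Person REMOVE ssn
DROP PROPERTY Person.ssn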
I will update here when the alignment with 3.0 and the documentation are complete.
I am trying to build a distributed AI. It works on a graph, and most of its work depends on the database.
I need an SQL database that supports graphs (or one that I can make support them).
I need node locking or atomic increment support.
Most importantly, I need persistent keys for every node and link in the graph. Most graph databases don't provide persistent keys.
I have tried some databases, like Neo4j, but it lacks persistent keys, and I also can't change a link's first or second node.
My writes and reads are nearly equal, and I am adding more nodes and links at runtime, so Pregel-like graph processing doesn't work for me.
Do you have any suggestion?
Is there any library that lets me access MongoDB using SQL-like syntax?
Example
use db
select * from table1
insert into table1 values (a,b,c)
delete from table
select a,b,count(*) from table1 group by a,b
select a.field1,b.field2 from a,b where a.id=b.id
Thanks
Raman
The learning curve is small only if you are only doing extremely simple sql queries. If the extent of your SQL querying is "select * from X", then MongoDB looks like a brilliant idea to cut through all the too-complicated SQL. But if you need to perform left outer joins, test for null, check for ranges, subselects, grouping and summation, then you will soon end up with a round concave dent in your desk after being moved to Mongo. The sick punchline is that half the time, the thing you are trying to do can't be done in the Mongo interface. Mongo represents a bold new world where instead of databases doing things like aggregation and query optimization, it just stores data and all the magic is done by retrieving everything, slowly, storing it in app memory, and doing all that stuff in code instead.
YES!
A company called UnityJDBC makes a JDBC driver for MongoDB. Unlike the Mongo Java driver, this JDBC driver allows you to run SQL queries against MongoDB, and the driver can be used by any Java application that uses JDBC.
To download this driver, go to:
http://www.unityjdbc.com/mongojdbc/mongo_jdbc.php
It's free to download, too!
Hope this helps.
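A minimal sketch of what using it looks like from plain JDBC; the driver class name and URL scheme below are assumptions, so check UnityJDBC's documentation for the exact values:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MongoJdbcExample {
    public static void main(String[] args) throws Exception {
        // Assumed driver class and URL format -- verify against the UnityJDBC docs
        Class.forName("mongodb.jdbc.MongoDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:mongo://localhost:27017/mydb");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select a, b from table1")) {
            while (rs.next()) {
                System.out.println(rs.getString("a") + " | " + rs.getString("b"));
            }
        }
    }
}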
MoSQL might satisfy your needs. It'll require you to run a new PostgreSQL instance, but from there you can query your entire Mongo dataset with SQL.
"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."
Have a look at this recent project: http://www.mongosql.com/. I've been looking at it over the last few weeks and it looks very promising.
For those of you who have questioned the usefulness of SQL against MongoDB, consider the large number of not-very-technical users in many organizations, such as business analysts, who may know SQL but don't want to make the leap to JavaScript and JSON. Tools like mongoSQL can help push the adoption of MongoDB in an organization.
There are a few solutions out there, but nearly all of them fail to truly represent the MongoDB data model in a way that "relationally" minded ODBC/JDBC applications and users desire or require. A recent commercial product was released that addresses these challenges:
ODBC:
http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/mongodb
JDBC:
http://www.progress.com/products/datadirect-connect/jdbc-drivers/data-sources/mongodb
To address the need for ODBC/JDBC (SQL) access: while there are strong arguments for writing new applications using Mongo's clients, there is still a strong need in the marketplace for quality ODBC/JDBC and SQL-based access to MongoDB. This need largely arises from all the reporting, analytics, and BI applications that rely on ODBC/JDBC connectivity and do not offer native integration with MongoDB.
The free NoSQL Viewer supports conversion of SQL queries to MongoDB shell syntax. Furthermore, in its SQL Viewer you can even use SQL SELECT statements to query MongoDB collection data without knowing MongoDB query syntax. Check out NoSQL Viewer here: www.spviewer.com/nosqlviewer.html
MongoDB and its current drivers do not support direct SQL-like syntax.
However, all operations are easily doable with the driver-specific operations.
Here is a brief mapping of MongoDB operations to the corresponding SQL-like queries:
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
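To make that concrete, here are rough shell equivalents of the SQL statements in the question (the last one, the join, maps to $lookup, available from MongoDB 3.2 onward):

// select * from table1
db.table1.find()

// insert into table1 values (a, b, c)
db.table1.insert({a: 1, b: 2, c: 3})

// delete from table1
db.table1.remove({})

// select a, b, count(*) from table1 group by a, b
db.table1.aggregate([{ $group: { _id: { a: "$a", b: "$b" }, count: { $sum: 1 } } }])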
There are a couple of projects underway to emulate a SQL interface for MongoDB. While they provide a familiar interface, in general they should be avoided. They operate on a fundamentally flawed premise in that they parse strings and translate them into method calls.
Once you work with MongoDB, you will find the approach of using classes and methods a much more accessible interface, as it works exactly like all the other parts of your application. Yes, there is a small learning curve as you first start, but for the most part the interface in MongoDB works how you would expect it to.
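For comparison, a short sketch of that classes-and-methods style using the modern MongoDB Java driver (the database, collection, and field names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import static com.mongodb.client.model.Filters.eq;

public class NativeDriverExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> table1 =
                    client.getDatabase("mydb").getCollection("table1");
            // Equivalent of: select * from table1 where a = 1
            for (Document doc : table1.find(eq("a", 1))) {
                System.out.println(doc.toJson());
            }
        }
    }
}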