OrientDB Teleporter - Pull only selected columns for Vertex from RDBMS

I am trying to pull data from an Oracle RDBMS and move it to OrientDB using Teleporter. My relational database has many columns and maintains E-R relationships. I have two questions:
My objective is to import only a few columns (those that hold unique identities and foreign key relations) and not all the bulky column data. Is there any configuration that lets me do this? Today, include and exclude only work at the level of whole DB tables.
My other objective is to keep my graph DB in sync with the selected table and column data that I pushed in a previous run. I also want any additional data that arrives in the RDBMS to reach my graph DB.

This feature, along with several others, is available in OrientDB 3.0 through a JSON configuration, but it is not documented yet. Currently in 2.2.x you can only configure relationships and edges, as described here:
http://orientdb.com/docs/2.2.x/Teleporter-Import-Configuration.html
In the next 2 weeks all these features will also be available in 2.2.x, with documentation that should make the configuration easy to understand.
At the moment you can adopt the following workaround:
Import all the columns of each table into the corresponding vertex as usual.
Drop the properties you are not interested in after each sync. You could write a script that runs Teleporter and then deletes the properties you don't care about from the schema.
I will update this answer when the alignment with 3.0 and the documentation are complete.
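The cleanup step of this workaround could be scripted roughly as follows. This is a minimal sketch, not a tested Teleporter integration: the class and property names and the `keep` mapping are hypothetical, and the generated statements assume OrientDB SQL's `UPDATE ... REMOVE` and `DROP PROPERTY` commands. The script only builds the statements; running Teleporter and sending the statements to OrientDB is left to your environment.

```python
def cleanup_statements(schema, keep):
    """Build OrientDB SQL that removes every imported property not in `keep`.

    schema: {class_name: [property, ...]} as imported (hypothetical input)
    keep:   {class_name: {property, ...}} identity / foreign-key columns to retain
    """
    statements = []
    for class_name, properties in schema.items():
        wanted = keep.get(class_name, set())
        for prop in properties:
            if prop in wanted:
                continue
            # Remove the stored values first, then the schema definition.
            statements.append(f"UPDATE {class_name} REMOVE {prop}")
            statements.append(f"DROP PROPERTY {class_name}.{prop}")
    return statements


# Hypothetical example: keep only the key columns of an Employee vertex class.
schema = {"Employee": ["id", "dept_id", "resume", "photo"]}
keep = {"Employee": {"id", "dept_id"}}
statements = cleanup_statements(schema, keep)
for statement in statements:
    print(statement)
```

Because the properties are dropped after every sync, the bulky columns are still transferred on each run; this only keeps them out of the final graph.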

Related

How to check which tables a table is made from in PySpark

I have a core layer with some tables, and I want to find out which tables in the source layer they are built from. The core-layer tables are created by joining some of the source-layer tables. I want to generate an Excel sheet with code that shows which source tables each core table is made from.
I am using PySpark on Databricks, and the code that creates the tables is written in notebooks.
Any help on how to approach this would be appreciated.
This is possible when you use Databricks Unity Catalog. It has a feature called Data Lineage that tracks which tables and columns were used to create a specific table, and who consumes that table. It also includes a Lineage API that can be used to export the lineage data.
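Once the lineage has been collected, turning it into a spreadsheet is straightforward. The sketch below assumes the lineage is already available as a plain mapping from core table to source tables. That shape and the table names are hypothetical and are not the Lineage API's response format. It writes a CSV file, which Excel can open directly.

```python
import csv
import io


def lineage_rows(lineage):
    """Flatten {core_table: [source_table, ...]} into (core, source) rows."""
    rows = []
    for core_table in sorted(lineage):
        for source_table in sorted(lineage[core_table]):
            rows.append((core_table, source_table))
    return rows


def write_lineage_csv(lineage, handle):
    """Write one (core_table, source_table) pair per line, with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["core_table", "source_table"])
    writer.writerows(lineage_rows(lineage))


# Hypothetical lineage data; in practice it would come from the lineage export.
lineage = {"core.orders": ["source.orders_raw", "source.customers_raw"]}
buffer = io.StringIO()
write_lineage_csv(lineage, buffer)
print(buffer.getvalue())
```

To produce a file instead of an in-memory buffer, pass a handle opened with `open("lineage.csv", "w", newline="")`.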

Transforming relational databases to graph databases

As part of my final thesis, I must transform a relational database into a graph-oriented database, specifically a PostgreSQL database into a Neo4j embedded database. The problem is how to do it. In Rik Van Bruggen's book, Learning Neo4j, he mentions a data import process using ETL activities with the Trascend and MuleSoft tools, but their official sites have no documentation on how to do it: neither help pages nor examples. Apart from these tools, what other ways can I use to transform this data without writing my own code?
Some modeling advice:
A well-normalized relational model that has not yet been denormalized for performance reasons can be translated into the equivalent graph model.
Graph model shapes are mostly driven by use-cases, so there will be opportunity for optimization and model evolution afterwards.
A good, normalized Entity-Relationship diagram often already represents a decent graph model.
So if you still have the original ER diagram available, try to use it as a guide.
Here are some tips that help you with the transformation:
Each entity table is represented by a label on nodes
Each row in a table is a node
Columns on those tables become node properties.
Remove technical primary keys, keep business primary keys
Add unique constraints for business primary keys, add indexes for frequent lookup attributes
Replace foreign keys with relationships to the other table, remove them afterwards
Remove data with default values, no need to store those
Data in tables that is denormalized and duplicated might have to be pulled out into separate nodes to get a cleaner model.
Indexed column names might indicate an array property (like email1, email2, email3)
JOIN tables are transformed into relationships; columns on those tables become relationship properties
It is important to have an understanding of the graph model before you start to import data, then it just becomes the task of hydrating that model.
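A few of the tips above (rows become nodes, columns become properties, foreign keys become relationships) can be sketched in plain Python. This is an illustration of the mapping only, not an importer: the function name, the dictionary shapes, and the `Book`/`Publisher` example are all hypothetical.

```python
def rows_to_graph(label, rows, business_key, foreign_keys):
    """Turn table rows into node and relationship descriptions.

    foreign_keys: {column: (relationship_type, target_label)}
    """
    nodes, relationships = [], []
    for row in rows:
        # Columns become node properties; foreign-key columns are left out
        # because relationships replace them.
        properties = {k: v for k, v in row.items() if k not in foreign_keys}
        nodes.append({"label": label, "properties": properties})
        for column, (rel_type, target_label) in foreign_keys.items():
            if row.get(column) is not None:
                relationships.append({
                    "type": rel_type,
                    "from": (label, row[business_key]),
                    "to": (target_label, row[column]),
                })
    return nodes, relationships


# Hypothetical table: book(isbn, title, publisher_id -> publisher)
rows = [{"isbn": "978-1", "title": "Graphs", "publisher_id": "P1"}]
nodes, relationships = rows_to_graph(
    "Book", rows, "isbn", {"publisher_id": ("PUBLISHED_BY", "Publisher")}
)
print(nodes)
print(relationships)
```

Here each relationship endpoint is identified by a label and a key value, which is why the business primary keys need unique constraints before the relationships are created.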
LOAD CSV might be your best option, but of course it means outputting a CSV first. Here are some great resources:
http://neo4j.com/docs/stable/query-load-csv.html
http://watch.neo4j.org/video/112447027
http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
http://jexp.de/blog/2014/10/load-cvs-with-success/
http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
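The CSV route could look roughly like this: export the rows to CSV, then run a LOAD CSV statement against that file. The sketch below is a minimal example under stated assumptions: the `Person` label, the column names, and the file URL are hypothetical, and the rows would in practice come from a query against the relational database. It only builds the Cypher text and does not connect to Neo4j.

```python
import csv
import io


def export_csv(columns, rows, handle):
    """Write a header line followed by one line per row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


def load_csv_statement(file_url, label, key, columns):
    """Build a Cypher LOAD CSV statement that merges one node per CSV row."""
    assignments = ", ".join(f"n.{c} = row.{c}" for c in columns if c != key)
    statement = (
        f"LOAD CSV WITH HEADERS FROM '{file_url}' AS row\n"
        f"MERGE (n:{label} {{{key}: row.{key}}})"
    )
    if assignments:
        statement += f"\nSET {assignments}"
    return statement


# Hypothetical data standing in for the result of a SELECT on a person table.
columns = ["id", "name"]
rows = [(1, "Ada"), (2, "Linus")]
buffer = io.StringIO()
export_csv(columns, rows, buffer)
cypher = load_csv_statement("file:///people.csv", "Person", "id", columns)
print(cypher)
```

LOAD CSV reads every field as a string, so numeric columns need an explicit conversion in the Cypher statement (for example `toInteger(row.id)`) if they should be stored as numbers.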
I've also written a Ruby gem which lets you write a little Ruby code to import data from various sources. It's called neo4apis. You can look at the neo4apis-twitter gem to get an idea of how it works:
https://github.com/neo4jrb/neo4apis-twitter/
https://github.com/neo4jrb/neo4apis-twitter/blob/master/lib/neo4apis/twitter.rb
I've actually been wanting to implement a neo4apis-activerecord gem to make it easy to import from SQL with ActiveRecord.
You cannot directly export data from a relational database and import it into Neo4j,
because the two databases have different structures.
Relational Database -
A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns.
Graph-oriented database -
A graph database is essentially a collection of nodes and edges. Each node represents an entity (such as a person or business) and each edge represents a connection or relationship between two nodes.
Solution to your problem -
First, you need to design the Neo4j data structure, e.g. which nodes you require and what the relationships between the nodes will be.
After that, write a script in your application language to fetch the data from the relational database and insert it into Neo4j.
LOAD CSV is an option for import/export (backup) functionality with a graph database, but you cannot directly export/import data from a relational DB to a graph DB.

Entity Framework with large number of tables

Our database has about 500 tables we'd like to use in our EF model. Of those I'd be happy to start with 50 or fewer just to get our feet wet after working in plain ADO.NET for years.
The problem is that our SQL Server database contains many thousands of other tables that have been created over the years, many of them dynamically generated. Believe it or not:
select count(*) from INFORMATION_SCHEMA.TABLES
73261
So that's a lot of tables. I have found that pretty much every tool I've tried to design, build or template EF models or entities either hangs or does not return a list of tables. Even SQL Server Object Explorer in VS2012 won't list the tables and instead shows the Tables folder with a little "x" over the icon. So I can't even select a subset of tables.
What options do I have for using EF? Is there a template where I can explicitly define the tables that I want to use entities for? Even with 50 tables, I don't want to hand code each one in an empty EDMX.
Using a Database / Code First approach and avoiding connecting Visual Studio to the database at all (i.e. don't create an edmx, or connect with server explorer) would allow you to do this easily. It does not give you any of the Model First advantages, but I think it sounds like your project would be better served with a Database / Code First approach anyway as:
You have an existing Model, and are not looking to push changes from your EDMX to the DB
You are looking to implement this on a subset of your database
This link has a good summation ( Code-first vs Model/Database-first ) with the caveat that in your case a Database/Code First approach does not have you pushing changes from code to the database, so the last two bullets under code first apply less, and yours is a Database/Code First hybrid.
With 70k tables I think that any GUI is going to be tricky. When I am saying Database / Code First, I am trying to convey that you are not using the code to create / define and update your Database. Someone may be able to answer this more succinctly / accurately?
I know this is an old question, but for those who land here from a Google search: the only tool I have found that actually works with thousands of tables is The Sharp Factory.
It is an ORM. Pretty simple to use. So if you are looking for an ORM that can work with a large number of tables and does not require you to write "POCOS" or "Mappings" or SQL then this is the tool.
You can find it here: The Sharp Factory

Entity Framework Self Tracking Entities - Synchronize between 2 databases

I am using Self Tracking Entities with the Entity Framework 4. I have 2 databases, with the exact same schema. However, tables in one database will be added to/edited etc (and I mean data will be added/edited, not the actual table definitions) and at certain points of the day I will need to synchronize all the changes between this database and the other database.
I can create a separate context for both of them. But if I read a large graph from one database, how can I update the other database with the graph? Is there an easy way?
My database model is large and complex and fully relational. So it would be a big job to go through every single entity and do a read from the other database to see if it exists or not, update/insert it if need be, and then carry this on through the full object graph!
Any ideas?
This is not a use case for EF. In EF you will have to do exactly what you've described. Self tracking entities are able to track changes to these object instances - they know nothing about changes made to their own database over time and they will not know anything about state of your second database as well.
Look at SQL Server's native features (including mirroring, transaction log shipping, or SSIS) and the MS Sync Framework. Depending on your detailed requirements, these tools may suit you better.

Syncing Core Data Databases in iOS applications

I have a doubt about Core Data migration.
Say I have an application which has some predefined values in a table A. I want to sync it with another database, with a table B, in such a way that when new records are added to table B, those records also get added to my table A.
I know using Core Data migration, when I add columns to a table, I will be able to access the values previously stored in the older table before the addition of the column.
I would like to know how my table can be updated with the added records on another table.
Update:
From comment below:
The question I had in mind is this...
I want to release an update for my
app. I'm stuck on how to update the
existing Core Data database which also
stores data entered by the user. All I
need to do is update a couple of
records and preserve current user
data. How do I do this?
Core Data is not SQL. Entities are not tables. Objects are not rows. Columns are not attributes. Core Data is an object graph management system that may or may not persist the object graph and may or may not use SQL far behind the scenes to do so. Trying to think of Core Data in SQL terms will cause you to completely misunderstand Core Data and result in much grief and wasted time.
That way lies madness.
It sounds like you don't actually want to migrate as the term is used in Core Data. Migration in Core Data means moving from an earlier version of a data graph's persistent store to a newer version of the same.
E.g. in the 1.0 version you have an entity Person with the attributes firstName and lastName. After the app has been released, you wish to update to the 2.0 version and add a phoneNumber attribute to the Person entity. You would use migration to update the user's existing object graphs and persistent stores to the new object graph.
If by "table" you actually mean entities, then you can link entities together in a relationship so that they can watch each other. If by "table" you mean a data model or persistent store, then the answer is more complex. It can be done using configurations, fetched attributes, UUIDs, etc., but you must understand what you really need to do before you jump through all those hoops.