Importing AccessDB and Oracle directly into MongoDB

I am receiving .dmp and .mdb files from a customer and need to get that data into MongoDB.
Is there any way to import these file types straight into Mongo?
The goal is to programmatically ingest these into Mongo in any way I can. The only rule is that the customer will not change their method of data delivery, meaning I'm stuck with the .dmp and .mdb files as a source.
Any assistance would be greatly appreciated.

Here are a few options/ideas:
Convert the .mdb file to CSV, then use mongoimport --type csv to import into MongoDB (see the sketch at the end of this answer).
Use an ETL tool, e.g. Pentaho, Informatica, etc. This will give you much more flexibility for doing any necessary transformation/conversion of data.
Write a custom ETL tool, using libraries that know how to read mdb and dmp files.
You don't mention how you plan to use this data, how many tables are in the database, and how normalized the tables are. Depending on the specifics of your use case, it's very possible that loading the data from Access "as is" will not be a good choice since normalized schemas are not a good fit for MongoDB and MongoDB does not natively support joins. This is where an ETL tool can help, by extracting the source data and transforming it into an appropriate JSON structure.
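As a minimal sketch of the first option, assuming the mdbtools utilities are available and using hypothetical database, table, and collection names:
# List the tables in the Access file, then export one table to CSV (mdbtools)
mdb-tables customer.mdb
mdb-export customer.mdb Orders > orders.csv
# Load the CSV into MongoDB, using the first row as field names
mongoimport --db customerdata --collection orders --type csv --headerline --file orders.csv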

MongoDB has released ODBC drivers (see the MongoDB ODBC driver downloads). You can connect MS Access directly to MongoDB through ODBC. Voila!

Related

Difference between copy/migrate/export in SQL developer

I am using Oracle SQL Developer; it has the following tools:
Database Copy, Database Export, and Migrate.
I want to move one schema and all the data in it from one server to another.
What is the difference between these options? Do any of them do what I am looking for?
Database Copy is probably what you want.
Supply two database connections, and we'll take objects and data and copy them from one database to another.
However, if your schema is large, this will be inefficient. The Copy routine does row-by-row inserts across the JDBC connections.
Database Export takes the objects and data and offloads them to flat files. These flat files could then be used later to put in another database.
Migrate is used to take a database from SQL Server, Sybase, Teradata, Redshift, DB2, etc. to Oracle. It has an online (JDBC row-by-row) data copy mode and an offline (flat files for SQL*Loader) data move mode. For SQL Server/Sybase, we can also translate the T-SQL stored procedures to PL/SQL.
Your solution might also lie elsewhere - Data Pump. We have a wizard for that as well, and it works great for very large schemas/databases. You'll just need access to the database OS so you can put the DMP files into a database directory.
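Outside the wizard, a command-line Data Pump run looks roughly like this; a sketch with a placeholder schema name, credentials, and the default DATA_PUMP_DIR directory object:
# On the source server: export the schema to a dump file
expdp system/password@sourcedb schemas=HR directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_exp.log
# Copy hr.dmp into the target server's directory object, then import it
impdp system/password@targetdb schemas=HR directory=DATA_PUMP_DIR dumpfile=hr.dmp logfile=hr_imp.log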

Mongodb to redshift

We have a few collections in mongodb that we wish to transfer to redshift (on an automatic incremental daily basis).
How can we do it? Should we export the mongo to csv?
I wrote some code to export data from Mixpanel into Redshift for a client. Initially the client was exporting to Mongo, but we found Redshift offered very large performance improvements for queries. So first of all we transferred the data out of Mongo into Redshift, and then we came up with a direct solution that transfers the data from Mixpanel to Redshift.
To store JSON data in Redshift, you first need to create SQL DDL for the schema in Redshift, i.e. a CREATE TABLE script.
You can use a tool like Variety to help, as it can give you some insight into your Mongo schema. However, it does struggle with big datasets - you might need to subsample your dataset.
Alternatively, ddlgenerator can generate DDL from various sources including CSV or JSON. It also struggles with large datasets (at least it did with the dataset I was dealing with, which was 120GB).
So in theory you could use mongoexport to generate CSV or JSON from Mongo and then run it through ddlgenerator to get a DDL.
In practice I found using JSON export a little easier because you don't need to specify the fields you want to extract. You need to select the JSON array format. Specifically:
mongoexport --db <your db> --collection <your_collection> --jsonArray > data.json
head data.json > sample.json
ddlgenerator postgresql sample.json
Here - because I am using head - I use a sample of the data to show the process works. However, if your database has schema variation, you will want to compute the schema based on the whole database, which could take several hours.
Next you upload the data into Redshift.
If you have exported JSON, you need to use Redshift's COPY from JSON feature. You need to define a JSONPaths file to do this.
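As a rough sketch, the JSONPaths file lists where each column's value lives in the JSON, and the COPY statement points at both the data and that file; the column paths, table, bucket, cluster, and IAM role below are all placeholders:
# jsonpaths.json - maps JSON fields to table columns, in column order:
# { "jsonpaths": ["$.name", "$.created_at", "$.value"] }
# Run the COPY from a psql-compatible client connected to the Redshift cluster
psql -h my-cluster.example.redshift.amazonaws.com -p 5439 -U admin -d analytics -c "
COPY events
FROM 's3://my-bucket/data.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
JSON 's3://my-bucket/jsonpaths.json';"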
For more information, check out the Snowplow blog - they use JSONPaths to map the JSON onto a relational schema. See their blog post about why people might want to read JSON into Redshift.
Turning the JSON into columns allows much faster queries than other approaches such as using JSON_EXTRACT_PATH_TEXT.
For incremental backups, it depends on whether data is being added or data is changing. For analytics, it's normally the former. The approach I used is to export the analytics data once a day, then copy it into Redshift incrementally.
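A minimal sketch of that daily step, assuming documents carry a created_at timestamp; the cutoff date, database, collection, and bucket names are placeholders:
# Export only the documents added since the last run (hypothetical created_at field)
mongoexport --db analytics --collection events --query '{"created_at": {"$gte": {"$date": "2015-06-01T00:00:00Z"}}}' > events_increment.json
# Push the increment to S3, then COPY it into Redshift as shown above
aws s3 cp events_increment.json s3://my-bucket/incremental/events_increment.json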
Here are some related resources although in the end I did not use them:
Spotify has an open-source project called Luigi - this code claims to upload JSON to Redshift, but I haven't used it so I don't know if it works.
Amiato have a web page that says they offer a commercial solution for loading JSON data into Redshift - but there is not much information beyond that.
This blog post discusses performing ETL on JSON datasources such as Mixpanel into Redshift.
Related Reddit question
Blog post about dealing with JSON arrays in Redshift
Honestly, I'd recommend using a third party here. I've used Panoply (panoply.io) and would recommend it. It'll take your mongo collections and flatten them into their own tables in redshift.
AWS Database Migration Service (DMS) has added support for MongoDB and Amazon DynamoDB, so I think the best option now for migrating from MongoDB to Redshift is DMS.
MongoDB versions 2.6.x and 3.x as a database source
Document Mode and Table Mode supported
Supports change data capture (CDC)
Details - http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
A few questions whose answers would be helpful to know:
Is this an add-only, always-increasing incremental sync, i.e. data is only being added and not updated/removed (or your Redshift instance is interested only in additions)?
Is data inconsistency due to deletes/updates happening at the source and not being fed to the Redshift instance acceptable?
Does it need to be a daily incremental batch, or can it also be real-time as it happens?
Depending on your situation, mongoexport may work for you, but you have to understand its shortcomings, which are described at http://docs.mongodb.org/manual/reference/program/mongoexport/ .
I had to tackle the same issue (not on a daily basis though).
As mentioned in another answer, you can use mongoexport to export the data, but keep in mind that Redshift doesn't support arrays, so if your collections' data contains arrays you'll find it a bit problematic.
My solution to this was to pipe the mongoexport output into a small utility program I wrote that transforms the mongoexport JSON rows into my desired CSV output.
Piping the output also allows you to make the process parallel.
mongoexport allows you to add a MongoDB query to the command, so if your collection data supports it you can spawn N different mongoexport processes, pipe their results into the other program, and decrease the total runtime of the migration.
Later on, I uploaded the files to S3 and performed a COPY into the relevant table.
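A rough equivalent of that transform can be sketched with jq instead of a custom program; the field names, the id-range split, and the bucket below are placeholders:
# One export process per id range (hypothetical split), each piped through jq to flatten
# a few scalar fields (plus the first element of an array field) into CSV
mongoexport --db analytics --collection events --query '{"user_id": {"$lt": 5000000}}' | jq -r '[.user_id, .email, .tags[0]] | @csv' > events_part1.csv &
mongoexport --db analytics --collection events --query '{"user_id": {"$gte": 5000000}}' | jq -r '[.user_id, .email, .tags[0]] | @csv' > events_part2.csv &
wait
# Upload the parts to S3; a COPY with the key prefix will load both in one pass
aws s3 cp events_part1.csv s3://my-bucket/events/part1.csv
aws s3 cp events_part2.csv s3://my-bucket/events/part2.csv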
This should be a pretty easy solution.
Stitch Data is the best tool I've ever seen for replicating incrementally from MongoDB to Redshift within a few clicks and minutes.
It automatically and dynamically detects DML and DDL changes in the tables being replicated.

Transferring data from Lotus Notes to DB2 using Agent

Before you say anything, I searched a lot but didn't find how to do this.
I have a database in .NSF format for use in Lotus Notes. I need to write an agent (I know how to) so that data from that database is automatically transferred to a DB2 database.
Before I create the DB2 tables, how do I know which structure I need to use? How do I check how exactly the data in that .NSF file is stored?
Thanks
Notes documents are unstructured, there's no guarantee that any two documents in a database have the same structure. You will need to decide what data you want to transfer to a relational table, then check each document to see if it contains the corresponding fields (items). You didn't mention what language you're planning to use for your agent; in Java you would use NotesDocument.getItems() to enumerate all items in a document.
As mustaccio also said, since Notes/Domino is a NoSQL database, you don't have a schema.
You should talk to the developer of the application and get an understanding of what data is located where.
You could of course use the Design Synopsis function in Domino Designer to export the actual design, but documents can potentially contain data that does not show up in the design.
If you want to export the documents as XML, I have a tool I wrote available here: http://www.texasswede.com/home.nsf/Page/Notes%20XML%20Exporter
You can export all the documents and then look at the XML to see what data you have.

ETL tool or ad-hoc solutions?

I'm designing a data warehouse system; there are two origin data sources: files (hexadecimal format, record structure known) and a PostgreSQL database.
The ETL phase has to read the content of the two sources (files and DB) and combine/integrate/clean them. After this, it loads the data into the DW.
For this purpose, is a tool (for example Talend) better, or an ad-hoc solution (writing custom routines in a programming language)?
I would suggest you use the bulk loader to get your flat file into the DB. This allows you to customize the loading rules and then process/cleanse the resulting data set using regular SQL (no other custom code to write).
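If the staging database happens to be PostgreSQL (the answer doesn't name the engine), the bulk loader would be COPY, or psql's \copy; a minimal sketch with placeholder table and column names, assuming the fixed-format files have already been decoded to CSV:
# Load the decoded CSV into a staging table, then cleanse/transform with plain SQL
psql -d warehouse -c "\copy staging_records FROM 'records.csv' WITH (FORMAT csv, HEADER true)"
psql -d warehouse -c "INSERT INTO dw_records (id, amount, created_at) SELECT id, amount::numeric, created_at::timestamp FROM staging_records;"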

How to transfer or copy tables of DB2 to oracle database

I want to transfer some tables from DB2 to Oracle daily, so they can be accessed from a web page,
but I don't know the DB2 commands. How do I do this?
I want this action to be performed on the database daily at a particular time, so is there any tool available to do this operation? And which programming language should I use to write the program for it? I am using Windows XP.
I think Change Data Capture is used to replicate DML from one database to other databases continuously.
However, what you need is to transfer some data at a particular time each day, thus CDC could be too heavy for that.
You could do a simple db2 export, and then import the generated file into Oracle.
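A minimal sketch of that route, with placeholder table, file, and connection names; on the Oracle side the delimited file is loaded with SQL*Loader and a small control file:
# On the DB2 side: export a table to a comma-delimited file
db2 connect to SOURCEDB
db2 "EXPORT TO employees.del OF DEL SELECT * FROM MYSCHEMA.EMPLOYEES"
# On the Oracle side: load the delimited file with SQL*Loader
# employees.ctl might contain:
#   LOAD DATA INFILE 'employees.del' INTO TABLE employees
#   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
#   (emp_id, emp_name, hire_date DATE "YYYY-MM-DD")
sqlldr userid=scott/tiger@ORCL control=employees.ctl log=employees.log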
There should be an option to create an adapter in Oracle that lets you query DB2 tables. The opposite is called federation in DB2 (InfoSphere Information Server), which lets you query Oracle tables.
Export http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008303.html
CMD examples http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.dm.doc/doc/r0004567.html
Check this link
http://blogs.oracle.com/warehousebuilder/entry/simple_change_data_capture_from_db2_table_to_oracle_table
In the 11.2 releases, Change Data Capture (CDC) can be done through code template mappings. This allows users to capture data changes from heterogeneous data sources and load them into targets across different platforms.