Power BI and MongoDB - What is the best setup?

Currently I have a database with almost 50 million (50kk) records and a collection with almost 20 million (20kk) that I need to connect to Power BI. I tried using the MongoDB BI Connector, but it is impossibly slow. Is there anything I can do to speed things up, or should I move to something more robust to help my BI team?

You need to identify the performance bottleneck in detail.
If the data import is slow, extract the data from MongoDB into a file (as the data volume you specified should still be manageable as a file export), then import that file into the Power BI model; a minimal export sketch is below.
If slicing and dicing is still slow after the data has been imported into Power BI, then the model needs tuning.
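For the file-based route, something along these lines would work (a sketch with PyMongo; the database, collection, and field names are placeholders - adjust the projection to whatever the report actually needs):

```python
# Dump a MongoDB collection to a CSV file that Power BI can import.
# (Placeholder names; the projection limits the export to the needed fields.)
import csv
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]

with open("export.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=["_id", "field1", "field2"])
    writer.writeheader()
    for doc in collection.find({}, {"_id": 1, "field1": 1, "field2": 1}):
        writer.writerow(doc)
```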
Thanks.

Anything going through ODBC (I am guessing that is what you are using) is going to be slow. One way around this is to use R from Power BI to get the data from MongoDB. The following link gives details on how you can connect to MongoDB from R:
Connect R to remote mongoDB with rmongodb
Once your query is ready, you can choose "R Script" from the "Get Data" window to run the R query. Hope this helps.
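Newer versions of Power BI Desktop also offer a "Python script" option in the same "Get Data" window, so the same pattern works in Python if that is more familiar. A minimal sketch, assuming pymongo and pandas are installed in the Python environment Power BI is configured to use (the database, collection, and field names are placeholders):

```python
# Sketch of a Power BI "Python script" source that pulls data from MongoDB.
# Power BI exposes any pandas DataFrame defined in the script as a table.
import pandas as pd
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["mydb"]["orders"]

# Project only the fields the report needs; pulling every field on millions
# of documents is usually what makes the import slow.
cursor = collection.find({}, {"_id": 0, "customer": 1, "amount": 1, "created_at": 1})
dataset = pd.DataFrame(list(cursor))
```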

Related

Import data from MongoDB to SQL Server

I am developing a POC to import some data from MongoDB to MS SQL Server 2012. For example, there are 5 collections in Mongo which need to be loaded into 5 tables in MS SQL Server. The data needs to be dumped as is.
What is the best approach to achieve this? Will an ETL tool (SSIS) suffice, or shall we have to write code in Node.js? Can anyone point me in the right direction?
Please advise.
Thanks in advance.
I use ZappySys' SSIS PowerPack, which includes MongoDB Source and Destination connections.
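If you would rather script it than use an SSIS component, the "write code" route the question mentions could look roughly like this (shown in Python rather than Node.js purely for illustration; pymongo and pyodbc are assumed, and the collection, table, and column names are made up):

```python
# Rough sketch: copy one MongoDB collection into one SQL Server table, as is.
# (Hypothetical names throughout; driver string assumes ODBC Driver 17.)
import pyodbc
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")["sourcedb"]
sql = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=targetdb;Trusted_Connection=yes;"
)
cursor = sql.cursor()

for doc in mongo["customers"].find({}, {"_id": 1, "name": 1, "email": 1}):
    cursor.execute(
        "INSERT INTO dbo.Customers (MongoId, Name, Email) VALUES (?, ?, ?)",
        str(doc["_id"]), doc.get("name"), doc.get("email"),
    )
sql.commit()
```

In practice you would repeat this over the 5 collection/table pairs and decide up front how any nested fields should be flattened into columns.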

MongoDB in Luigi Python

I would like to know if there is a way to output to MongoDB in Luigi. I see in the documentation that they support files (local FS, HDFS), S3, and PostgreSQL, but not MongoDB. If not, could someone explain to me why not? Maybe it is a bad idea to have it? I would like to store the data in a database because then I can explore it by querying it. However, I am already using MongoDB and I would not like to install another database. I do not need a relational database, as I am using the database only to store and query (NoSQL) data without relationships, so the best option is MongoDB.
Basically, I need a task that reads the data and saves it in the database; then the next task takes this output and processes the data.
Any recommendation, suggestion or clarification is more than welcome. Thanks!
You can try using mortar-luigi.
Check out this link for MongoDB tasks and this example.
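If you prefer to stay with plain Luigi, a custom Target wrapping pymongo is another option. A minimal sketch (the URI, database, collection, and marker-field names here are all made up, and count_documents assumes pymongo 3.7+):

```python
# Sketch of a custom Luigi Target backed by MongoDB (made-up names throughout).
import luigi
from pymongo import MongoClient

class MongoCollectionTarget(luigi.Target):
    """Treats a task as complete once documents tagged with its task_id exist."""

    def __init__(self, uri, db, collection, task_id):
        self.collection = MongoClient(uri)[db][collection]
        self.task_id = task_id

    def exists(self):
        # Luigi calls this to decide whether the task still needs to run.
        return self.collection.count_documents({"_task_id": self.task_id}) > 0

    def write(self, documents):
        for doc in documents:
            self.collection.insert_one({**doc, "_task_id": self.task_id})

class LoadRecords(luigi.Task):
    def output(self):
        return MongoCollectionTarget(
            "mongodb://localhost:27017", "etl", "records", self.task_id
        )

    def run(self):
        # Whatever produces the data goes here; a downstream task can then
        # require() this task and read the same collection back for processing.
        self.output().write({"value": i} for i in range(10))
```

Newer Luigi releases also ship a luigi.contrib.mongodb module with ready-made MongoDB targets, which may be worth checking before rolling your own.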

Data mining with postgres in production environment - is there a better way?

There is a web application which has been running for years, and during its lifetime it has gathered a lot of user data. The data is stored in a relational DB (Postgres). Not all of this data is needed to run the application (to do the business). However, from time to time business people ask me to provide reports on this data, and this causes some problems:
sometimes these SQL queries are long-running
queries are executed against the production DB (not cool)
it is not easy to deliver reports on a weekly or monthly basis
some parts of the data are stored in a way which is not suitable for such querying (queries are inefficient)
My idea (note that I am a developer, not a data mining specialist) for how to improve this whole process of delivering reports is:
create a separate DB which is regularly updated with production data
optimize how the data is stored
create a dashboard to present reports
Question: but is there a better way? Is there another DB which is a better fit for this kind of data analysis? Or should I look into modern data mining tools?
Thanks!
Do you really do data mining (as in classification, clustering, anomaly detection), or does "data mining" for you mean any reporting on the data? In the latter case, all the "modern data mining tools" will disappoint you, because they serve a different purpose.
Have you made good use of the indexing functionality of Postgres? Your scenario sounds as if selection and aggregation are most of the work, and SQL databases are excellent at this - if well designed.
For example, materialized views and triggers can be used to process the data into a schema that is more usable for your reporting.
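A small sketch of that idea driven from Python with psycopg2 (the table and view names are invented; the same statements can of course be run directly in psql or scheduled with cron):

```python
# Sketch: build and refresh a reporting materialized view from Python.
# (psycopg2 assumed; "orders" and "monthly_revenue" are invented names.)
import psycopg2

conn = psycopg2.connect("dbname=appdb user=report_user")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS monthly_revenue AS
        SELECT date_trunc('month', created_at) AS month,
               sum(amount)                     AS revenue
        FROM orders
        GROUP BY 1
    """)
    # Re-run on a schedule (e.g. nightly) to keep the reporting view current.
    cur.execute("REFRESH MATERIALIZED VIEW monthly_revenue")
```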
There are a thousand ways to approach this issue, but I think the path of least resistance for you would be Postgres replication. Check out this Postgres replication tutorial for a quick proof of concept. (There are many hits when you Google for Postgres replication, and that link is just one of them.) Here is a link documenting streaming replication from the PostgreSQL site's wiki.
I am suggesting this because it meets all of your criteria and also stays within the bounds of the technology you're familiar with. The only learning curve would be the replication part.
Replication solves your issue because it would create a second database which would effectively become your "read-only" db which would be updated via the replication process. You would keep the schema the same but your indexing could be altered and reports/dashboards customized. This is the database you would query. Your main database would be your transactional database which serves the users and the replicated database would serve the stakeholders.
This is a wide topic, so please do your due diligence and research it. But it is also something that can work for you and can be turned around quickly.
If you really want to try data mining with PostgreSQL, there are some tools which can be used.
The simplest way is KNIME. It is easy to install and has full-featured data mining tools. You can access your data directly from the database, process it, and save it back to the database.
The hardcore way is MADlib. It installs data mining functions, written in Python and C, directly in Postgres so you can mine with SQL queries.
Both projects are stable enough to try.
For reporting, we use a non-transactional (read-only) database, and we don't care about normalization. If I were you, I would use another database for reporting. I would design the tables following OLAP principles (star schema, snowflake), use an ETL tool to dump the data periodically (maybe weekly) into the read-only database, and build the reports there.
Reports are used for decision support, so they don't have to be real-time and usually don't have to be current. In other words, it is acceptable for a report to cover data only up to last week or last month.

How to extract data from a Mongo collection for data warehouse use

My company is starting to use Mongo, and we are thinking about the best way to extract data from MongoDB and send it to our data warehouse.
My question focuses on the extract part of the process. As I see it, the best way is to expose an API on the service that is built on top of Mongo, which the ETL process (invoked by a job from the data warehouse) will call with some specific query, probably over a time window (i.e. a startdate and enddate for every record); a rough sketch of what I mean is below.
Does that sound right, or am I missing something, or maybe there is a better way?
Initially I was thinking about running mongoexport every X duration, but according to the documentation that does not seem so good performance-wise.
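Something along these lines (a PyMongo sketch; the collection name and the last_modified field are placeholders, assuming every record carries such a timestamp):

```python
# Sketch of an incremental, time-windowed extract with PyMongo.
# ("events" and "last_modified" are placeholders for the real collection/field.)
import csv
from datetime import datetime
from pymongo import MongoClient

def extract_window(start, end, path):
    collection = MongoClient("mongodb://localhost:27017")["appdb"]["events"]
    query = {"last_modified": {"$gte": start, "$lt": end}}
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["_id", "type", "last_modified"])
        for doc in collection.find(query):
            writer.writerow([doc["_id"], doc.get("type"), doc["last_modified"]])

# The warehouse job would invoke this with whatever window it is loading.
extract_window(datetime(2016, 1, 1), datetime(2016, 1, 2), "/tmp/events_2016-01-01.csv")
```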
Thanks in advance!
Give Pentaho Kettle a try.
https://anonymousbi.wordpress.com/2012/07/25/creating-pentaho-reports-from-mongodb/
I am using Alteryx Designer to extract from MongoDB with the dedicated connector and load the data into Tableau, with optional data prep in between.
Works pretty well!
Alteryx can write to most DBs, though...

MongoDB access through SQL-like syntax

Is there any library where I can access MongoDB by using SQL-like syntax?
Example
use db
select * from table1
insert into table1 values (a,b,c)
delete from table
select a,b,count(*) from table1 group by a,b
select a.field1,b.field2 from a,b where a.id=b.id
Thanks
Raman
The learning curve is small only if you are doing extremely simple SQL queries. If the extent of your SQL querying is "select * from X", then MongoDB looks like a brilliant idea to cut through all the too-complicated SQL. But if you need to perform left outer joins, test for null, check for ranges, use subselects, or do grouping and summation, then you will soon end up with a round concave dent in your desk after being moved to Mongo. The sick punchline is that half the time, the thing you are trying to do can't be done in the Mongo interface. Mongo represents a bold new world where, instead of the database doing things like aggregation and query optimization, it just stores data, and all the magic is done by retrieving everything, slowly, storing it in app memory, and doing all that work in code instead.
YES!
A company called UnityJDBC makes a JDBC driver for MongoDB. Unlike the Mongo Java driver, this JDBC driver allows you to run SQL queries against MongoDB, and the driver is supported by any Java application that uses JDBC.
To download this driver, go to...
http://www.unityjdbc.com/mongojdbc/mongo_jdbc.php
It's free to download too!
Hope this helps.
MoSQL might satisfy your needs. It will require you to run a new PostgreSQL instance, but from there you can query your entire Mongo dataset with SQL.
"MoSQL imports the contents of your MongoDB database cluster into a PostgreSQL instance, using an oplog tailer to keep the SQL mirror live up-to-date. This lets you run production services against a MongoDB database, and then run offline analytics or reporting using the full power of SQL."
Have a look at this recent project: http://www.mongosql.com/. I've been looking at it over the last few weeks and it looks very promising.
For those of you who have questioned the usefulness of SQL against MongoDB, consider the large number of not-very-technical users in many organizations, like business analysts, who may know SQL, but don't want to make the leap to JavaScript and JSON. Tools like mongoSQL can help push the adoption of MongoDB in an organization.
There are a few solutions out there, but nearly all of them fail to truly represent the MongoDB data model in a way that "relationally" minded ODBC/JDBC applications and users desire/require. A recent commercial product was released that addresses these challenges:
ODBC:
http://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/mongodb
JDBC:
http://www.progress.com/products/datadirect-connect/jdbc-drivers/data-sources/mongodb
To address the need for ODBC/JDBC (SQL) access... While there are strong arguments for writing new applications using Mongo's own clients, there is still a strong need in the marketplace for quality ODBC/JDBC and SQL-based access to MongoDB. This need largely arises from all the reporting, analytics, and BI applications that rely on ODBC/JDBC connectivity and do not offer native integration with MongoDB.
The free NoSQL Viewer supports conversion of SQL queries to MongoDB shell syntax. Furthermore, in SQL Viewer you can even use SQL SELECT statements to query MongoDB collection data without knowing MongoDB query syntax. Check out NoSQL Viewer here: www.spviewer.com/nosqlviewer.html
MongoDB and its current drivers do not support direct SQL-like syntax. However, all operations are easily doable with driver-specific operations.
Here is a brief mapping of MongoDB operations to the corresponding SQL-like queries:
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
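To illustrate, here is roughly how the SQL examples from the question map onto PyMongo calls (a sketch only; the collection and field names mirror the question's placeholders):

```python
# Rough PyMongo equivalents of the SQL examples from the question
# (placeholder names throughout; values in insert_one are arbitrary).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["db"]              # use db

all_rows = db["table1"].find()                                    # select * from table1
db["table1"].insert_one({"a": 1, "b": 2, "c": 3})                 # insert into table1 values (a,b,c)
db["table"].delete_many({})                                       # delete from table
grouped = db["table1"].aggregate([                                # select a,b,count(*) from table1 group by a,b
    {"$group": {"_id": {"a": "$a", "b": "$b"}, "count": {"$sum": 1}}}
])
joined = db["a"].aggregate([                                      # select a.field1,b.field2 from a,b where a.id=b.id
    {"$lookup": {"from": "b", "localField": "id",                 # joins need an aggregation $lookup (MongoDB 3.2+)
                 "foreignField": "id", "as": "b_docs"}},
    {"$unwind": "$b_docs"},
    {"$project": {"field1": 1, "b_docs.field2": 1}},
])
```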
There are a couple of projects underway to emulate a SQL interface for MongoDB. While they provide a familiar interface, in general they should be avoided: they operate on a fundamentally flawed premise, parsing strings and translating them into method calls.
Once you work with MongoDB, you will find the approach of using classes and methods a much more accessible interface, as it works exactly like all other parts of your application. Yes, there is a small learning curve when you first start, but for the most part the interface in MongoDB works how you would expect it to.