Can someone give me some options about how I can connect a PostgreSQL database to Power BI?
Up to now, I have used Power BI Desktop and the PostgreSQL drivers to connect to my local database. I then published the data to the Power BI service for users to access and set up a daily refresh schedule with a Personal Gateway installed. This worked fine.
My issue is that my users now want refreshes every 30 minutes instead of daily and Power BI only allows 8 refreshes per day. This seems like it would require a live connection. My only Windows machine is quite weak and I live across the world from my end-users, so my only option is to set up a remote server.
I have an Azure Linux VM which I would prefer to use, but Power BI does not work on Linux as far as I can tell.
My ETL pipelines and database are all based on PostgreSQL, and I do not want to switch over to MS SQL or the Azure database product if I can avoid it.
Should I create a Windows-based VM on Azure, install PostgreSQL there, and then replicate the required tables for Power BI to visualize? What is the best setup? I did not see any option on the Power BI website to connect live to Postgres, so I am a bit concerned.
This is an old question, so you've probably figured out a workaround, but just to confirm:
No, Power BI does not offer a live connection to PostgreSQL at the moment. You can see the current list of what Power BI does live connect to here: https://powerbi.microsoft.com/en-us/documentation/powerbi-refresh-data/#live-connections-and-directquery-to-on-premises-data-sources
If a live connection to PostgreSQL is important to you, I would recommend posting an idea at https://ideas.powerbi.com/ (or up-voting someone else's idea - though I don't see one right now). Microsoft does review these ideas. I'd also recommend sharing the link here, so others searching for how to do this can up-vote the same idea.
In the meantime, a couple of different workarounds:
Even though you can't automate refreshes as often as you'd like, you can do additional manual refreshes. You can initiate the refresh yourself, or you can suggest end-users click the refresh button to get the latest data.
If you don't want to manually refresh, you could look into a 3rd party tool such as Power Update (http://poweronbi.com/power-update-features/). I've never used it before, but it can refresh a Power BI Desktop file and publish it up to the service. This would have the same effect as a manual refresh, but automated.
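If you would rather script the first option than click refresh in the service, the Power BI REST API exposes a dataset refresh endpoint you can call from a small script. Below is a minimal Python sketch; the dataset ID and the Azure AD access token are placeholders (acquiring the token, e.g. via MSAL, is out of scope here), and as far as I know API-triggered refreshes still count against the same daily refresh limit on shared capacity.

import requests

# Placeholders: obtain an Azure AD access token for the Power BI API yourself
# (e.g. with MSAL); the dataset ID is visible in the dataset's URL in the service.
ACCESS_TOKEN = "eyJ..."          # placeholder token
DATASET_ID = "your-dataset-id"   # placeholder dataset ID

# Queue a refresh of the dataset via the Power BI REST API.
url = f"https://api.powerbi.com/v1.0/myorg/datasets/{DATASET_ID}/refreshes"
resp = requests.post(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})

# A 202 Accepted response means the refresh was queued.
print(resp.status_code, resp.text)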
Note: This question was also asked (and answered) here: https://community.powerbi.com/t5/Integrations-with-Files-and/DirectQuery-for-PostgreSQL-Gateways-on-Linux/td-p/103418.
Since the August 2019 release of Power BI, there is now a DirectQuery connection for PostgreSQL.
https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-august-2019-feature-summary/
For any future viewers of this question - I'm working on building and maintaining a custom connector for exactly this purpose. So far I've been able to access most features except those which require datetime adds or diffs. We do have this working in our production environment w/ Postgres 11 via an enterprise gateway.
Repo:
https://github.com/sgoley/DirectQuery-for-ODBC-in-PowerBI
Please feel free to reach out to me if you'd like to help resolve any outstanding bugs or just learn more.
A how-to is available at medium here:
https://medium.com/just-readr-the-instructions/directquery-with-postgres-from-powerbi-desktop-f3d8c4dc5e15
Edit: As of the August 2019 release, Power BI supports DirectQuery in the native PostgreSQL connector:
https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-august-2019-feature-summary/#postgresql
Related
Our team's pipeline uses Google Sheets as a very basic database. We use it because it is a standard spreadsheet that can be accessed online and shared. Any further analysis is carried out on the CSV exported from this Google Sheet.
Since shared editing sometimes introduces mistakes, I need to be able to restore the version from before a mistake without losing the other changes. Since Google Sheets' version history isn't as useful as Git's, I want to put this spreadsheet (ideally, the CSV) under GitHub version control automatically.
Would it be possible to do that?
If I had to do this manually, I would open the spreadsheet, export the CSV, and push it to the appropriate repository. I think that would be easy to automate; I'm just not sure how to do it.
I appreciate your help.
You do not need to export manually; your spreadsheet can be reached as CSV through an endpoint of the following form:
https://docs.google.com/spreadsheets/d/##ssID##/gviz/tq?tqx=out:csv&sheet=##sheetName##
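To automate the snapshotting, a small script can fetch that CSV and commit it on a schedule (cron, CI, etc.). Here is a minimal Python sketch; the spreadsheet ID, sheet name, and repository path are placeholders, and it assumes the sheet is shared so the CSV export link works without authentication.

import subprocess
import requests

SHEET_ID = "your-spreadsheet-id"       # placeholder
SHEET_NAME = "Sheet1"                  # placeholder
REPO_DIR = "/path/to/local/git/repo"   # placeholder
CSV_PATH = f"{REPO_DIR}/data/sheet.csv"

url = (
    f"https://docs.google.com/spreadsheets/d/{SHEET_ID}"
    f"/gviz/tq?tqx=out:csv&sheet={SHEET_NAME}"
)

# Download the current state of the sheet as CSV.
resp = requests.get(url, timeout=30)
resp.raise_for_status()
with open(CSV_PATH, "wb") as f:
    f.write(resp.content)

# Commit and push only if the file actually changed.
subprocess.run(["git", "-C", REPO_DIR, "add", CSV_PATH], check=True)
changed = subprocess.run(["git", "-C", REPO_DIR, "diff", "--cached", "--quiet"])
if changed.returncode != 0:
    subprocess.run(["git", "-C", REPO_DIR, "commit", "-m", "Snapshot spreadsheet"], check=True)
    subprocess.run(["git", "-C", REPO_DIR, "push"], check=True)

Run it from cron (or a scheduled CI job) at whatever cadence you want and you get an automatic, diffable history of the sheet in Git.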
DoltHub is an excellent fit for this use case where you have a spreadsheet being built collaboratively and you want the ability to see the full version history, audit where/when/who each cell's value came from, diff any versions of the spreadsheet, and much more. It's free to use DoltHub and you can easily export your data to CSV from the web, or pull it all down as a Dolt database and access everything locally.
Here's a DoltHub blog post that covers this exact use case in more detail:
https://www.dolthub.com/blog/2022-07-15-so-you-want-spreadsheet-version-control/
If you haven't heard of DoltDB or DoltHub yet, here's a little more background...
DoltDB is the first versioned SQL relational database. It has all the power of a SQL database with all the versioning features of Git. That gives you a database that you can branch and fork, push and pull, merge and diff, just like a Git repository. It's open-source and written from the ground up in Go, and targets full MySQL compliance, so you can use it seamlessly with any tools that connect to a MySQL database.
DoltHub is an online site for finding and collaborating on datasets. The Git-style versioning features built into DoltDB enable easy and safe collaboration and give you a Pull Request workflow for accepting changes, just like on GitHub. You can control whether your dataset is public or private on the free tier, and there's a Pro tier if you need to host private databases larger than 1GB. There's even a DoltLab product available for teams that need to keep their data on their own private network.
There's a very active and friendly DoltHub user community on Discord where the DoltHub dev team hangs out, too, if you have any questions/comments/feedback.
I am in need of your suggestions for the scenario below:
One of our clients has 8 PostgreSQL DB servers used for OLTP and now wants to generate MIS reports/dashboards integrating all the data across those servers.
- There are around 100 reports to be generated.
- Around 50k rows would be added to each of these databases.
- The reports are to be generated once every month.
- They are running their entire setup on bare metal.
- They don't want to use Hadoop/Spark, since they think the maintenance burden would be higher.
- They want to use open-source tech to accomplish this task.
With all that said, one approach would be to write scripts to bring aggregated data into one server and then hand-code the reports with a JavaScript front end.
Is there a better approach using ETL tools like Talend, Pentaho, etc.?
Which ETL tool would be best suited for this?
Would the community edition of any of these ETL tools suffice for the above requirement?
I know for a fact that the commercial offering of any of these ETL tools will not be in the budget.
Could you please let me know your views on this?
Thanks in Advance
Deepak
Yes, certainly. I have done similar things successfully a dozen times.
My suggestion is to use Pentaho Data Integration (or Talend) to collect the data in one place and then filter, aggregate, and format it. The data volume is not an issue as long as you have a decent server.
For the reports, I suggest producing them with Pentaho Report Designer so that they can be sent by mail (with Pentaho DI) or distributed with a Pentaho BI Server.
You can also build a JavaScript front end with Pentaho CDE.
All of these tools are mature, robust, easy to use, have a community edition, and are well supported by the community.
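If you do end up going the plain-script route mentioned in the question instead of an ETL tool, a minimal consolidation sketch in Python could look like the following; the host names, credentials, source table, and aggregate query are made-up placeholders, and it assumes psycopg2 plus a central reporting database with a monthly_summary table.

import psycopg2

# Placeholders: real hosts, credentials, tables and columns will differ.
SOURCE_SERVERS = ["oltp1.example.com", "oltp2.example.com"]  # ... up to 8 hosts
REPORTING_DSN = "host=reporting.example.com dbname=mis user=etl"

# Example aggregate pulled from each OLTP server (assumes an 'orders' table).
AGGREGATE_SQL = """
    SELECT date_trunc('month', created_at) AS month,
           count(*)    AS row_count,
           sum(amount) AS total_amount
    FROM orders
    GROUP BY 1
"""

target = psycopg2.connect(REPORTING_DSN)
tcur = target.cursor()

for host in SOURCE_SERVERS:
    source = psycopg2.connect(f"host={host} dbname=oltp user=etl")
    scur = source.cursor()
    scur.execute(AGGREGATE_SQL)
    for month, row_count, total_amount in scur.fetchall():
        tcur.execute(
            "INSERT INTO monthly_summary (source_host, month, row_count, total_amount)"
            " VALUES (%s, %s, %s, %s)",
            (host, month, row_count, total_amount),
        )
    source.close()

target.commit()
target.close()

This is roughly the same flow you would build in Pentaho DI with a table-input step per source server and a table-output step into the reporting database, just without the graphical tooling.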
I've recently decided to embark on a fun / educational personal project to create some data visualizations and power metrics for my fantasy football league. Since ESPN doesn't provide an API, I've decided to use a combination of elbow grease and the nfldb to pull relevant data (and am hoping to get familiar with Plotly for presenting the data). In setting up nfldb, I'm also getting my first exposure to databases, using postgresql in particular (as required by nfldb).
Since the installation guide provided by nfldb is Linux-centric and assumes a fair bit of previous database experience, I've looked to this guide for help and blindly followed its instructions in hopes of sidestepping postgresql (aka the "just make it work" "solution"). Of course, that didn't work, and I have no idea how to diagnose the problem(s), so I've decided to go ahead and use this opportunity to get a little familiar with databases / postgresql.
I've looked to the postgresql documentation for guidance. Having never worked in a server / client environment, the following text (from "18.1. The PostgreSQL User Account") has me particularly confused:
As with any server daemon that is accessible to the outside world, it is advisable
to run PostgreSQL under a separate user account. This user account should only own
the data that is managed by the server, and should not be shared with other
daemons. (For example, using the user nobody is a bad idea.) It is not advisable
to install executables owned by this user because compromised systems could then
modify their own binaries.
To add a Unix user account to your system, look for a command useradd or adduser.
The user name postgres is often used, and is assumed throughout this book, but you
can use another name if you like.
I'd really appreciate a well-annotated version of these paragraphs. How does it apply to someone like me, storing and accessing data on the same machine? Do I need to create a new system user account? How do I make sure it "only owns the data that is managed by the server"? Where is the responsible location to install PostgreSQL? Am I exposed to some sort of security risk by downloading the nfldb database? Why is the user nobody a bad idea?
Relevant: I am using a Mac (v10.11.6) and plan to install (or re-install, if necessary) postgresql using Homebrew.
We've got an ASP.NET Web API application, and we are using Entity Framework v6.1.3 and SQL Server 2012. Everything works fine locally with no performance issues; it also works very well on a server we have that runs Windows Server 2012.
When we tried Azure as a cloud platform, we signed up for the free trial and deployed our application, but the performance was very bad, with some queries taking 4-7 minutes. As a side note, the on-premises server is connected to the internet and can be accessed from outside our company's internal network, and performance is still not an issue there; only on Azure have we faced this problem.
Any help would be appreciated, we want to invest in Azure but we fear we will face same issues we had on our trial subscription.
You are probably using the Basic tier, which has 5 DTUs; roughly speaking, that means the database can process about 5 transactions per second. Azure has tiers from Basic up to P11, which range from 5 DTUs to 1750 DTUs. Additionally, choosing the right region for the data center will help with network speed.
Select "single database" in the menu to see the tiers and their prices. You can also see how many DTUs you are using in the portal; that way you can choose the right price/performance tier for you. For example, if utilization is above 80%, increase the tier; if it's below 60%, you can keep the current tier.
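If you want to check utilization yourself rather than eyeballing the portal graph, Azure SQL Database exposes recent resource usage through the sys.dm_db_resource_stats view. Here is a rough Python sketch; the connection string is a placeholder, and it assumes pyodbc with the Microsoft ODBC driver installed.

import pyodbc

# Placeholder connection string; fill in your Azure SQL server, database and credentials.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=your-server.database.windows.net;"
    "DATABASE=your-db;UID=your-user;PWD=your-password"
)

# sys.dm_db_resource_stats keeps roughly an hour of 15-second utilization snapshots.
QUERY = """
    SELECT TOP 20 end_time,
                  avg_cpu_percent,
                  avg_data_io_percent,
                  avg_log_write_percent
    FROM sys.dm_db_resource_stats
    ORDER BY end_time DESC
"""

with pyodbc.connect(CONN_STR) as conn:
    for end_time, cpu, data_io, log_io in conn.execute(QUERY):
        # The highest of the three components is roughly your DTU utilization.
        print(end_time, f"cpu={cpu}%", f"data_io={data_io}%", f"log={log_io}%")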
I am using IBM UrbanCode Deploy (uDeploy). It's a great tool, but I'm afraid IBM will kill it sooner or later given their slow response on support. What are the other choices?
UCD isn't going anywhere. It's in high demand these days, and is gaining traction in z/OS shops as well.
However, other popular tools are Chef and Puppet. Just be prepared for manual data entry and a less friendly interface. They work, but they are more work to use.