Using MongoDB in AWS Lambda with the mLab API - mongodb

Usually you cant use MongoDB in Lambdas because Lambda functions are stateless and operations on MongoDB require a connection, thus you suffer a large performance hit in setting up a DB connection each time a function is run.
A solution I have thought of is to use mLab's REST API (http://docs.mlab.com/data-api/), that way I dont need to open a new connection each time my Lambda function is called.
On problem I can see if that mLab's REST service could become a bottleneck, plus im relying on it never going down.
Thoughts?

I have a couple of alternative suggestions for you on this. Only because I've never used mLab.
Setup http://restheart.org/ and have that sit between your lambda micro services and your MongoDB instance. I've used this with pretty decent success on another project. It does come with the downside of now having an EC2 instance to maintain. However, setting up restheart is pretty easy and the crew maintaining it and giving support is pretty great.
You can setup a lambda function that pays the cost of connecting and keeping a connection open. All of your other microservices can then call that lambda function for the data they need. If it is hit more frequently, you will not have to pay the cost of the DB connection as frequently. However, that first connection can be pretty brutal so you may need something keeping it warm. You will also have the potential issue of connections never getting properly closed, and eventually running out.
Those two options aside, if mlab is hosting your DB, you already have put a lot of faith in their ability to keep a system alive. If they cant keep an API up that lack of faith should also translate to their ability to keep your DB alive.

Related

RethinkDB - How to stream data to the browser

Context
Greetings,
One day I randomly found RethinkDB and I was really fascinated by the whole real-time changes thing. In order to learn how to use this tool I quickly spinned up a container running RethinkDB and i started making a small project. I wanted to make something very simple therefore i thought about creating a service in which speakers can create room and the audience can ask questions. Other users can upvote questions in order to let the speaker know which one are the best. Obviously this project has a lot of realtime needs that i believe are best satisfied by using RethinkDB.
Design
I wanted to use a vary specific set of tools for this. The backend would be made in Laravel Lumen, the frontend in Vue.JS and the database of course would be RethinkDB.
The problem
RethinkDB as it seems is not designed to be exposed to the end user directly despite the fact that no security concern exists.
Assuming that the user only needs to see the questions and the upvoted in real time, no write permissions are needed and if a user changed the room ID nothing bad will happen since the rooms are all publicly accessible.
Therefore something is needed in order to await data updates and push it through a socket to the client (socket.io for example or pusher).
Given the fact that the backend is written in PHP i cannot tell Lumen to stay awake and wait for data updates. From what i have seen from the online tutorials a secondary system should be used that should listen for changes and then push them. (lets say a node.js service for example)
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
If I have to send the action from the client's computer (user asks a question), save it to the database, have a script that listens for changes, then push the changes to socket.io and finally have the client (vue.js) act when a new event arrives, what is the point of having a real-time database in the first place?
I could avoid all this headache simply by having the Lumen app push the event directly to socket.io and user any other database system instead.
I really cant understand the point of all this. I am not experienced with no-sql databases by any means but i really want to experiment with them.
Thank you.
This is understandable however i strongly believe that this way of transferring the data to the user is inefficient and it defeats the purpose of RethinkDB.
RethinkDB has no built in mechanism to transfer data to end-users. It has no access control (in the conventional sense) as well. The common way, like you said, is to spin up one / multiple node instance(s) running socket.io. On each instance you can listen on your RethinkDB change streams and use socket.io's broadcast functionality. This would be a common way, but as RethinkDB's streams are pretty optimized, you could also open a change stream for every incoming socket.io connection.

Google Cloud SQL Postgres - randomly slow queries from Google Compute / Kubernetes

I've been testing Google Cloud SQL with Postgresql, but I have random queries taking ~3s instead of a few ms.
Troubleshooting I did:
The queries themselves aren't problems, rerunning the same query will work.
Indexes are properly set. The database is also very very small, it shouldn't do this, even if there weren't any index.
The Kubernetes container is connecting to the database through SQL Proxy (I followed this https://cloud.google.com/sql/docs/postgres/connect-kubernetes-engine). It is not the problem though as I tried to connect directly to the database, with the same issue.
I configured net.ipv4.tcp_keepalive_time to 60 to make sure the connection weren't dropping.
I also have a pool of connection that are never disconnected to make sure it wasn't from that.
When I run queries directly through my local Postgresql client, I never have the problem.
I don't have this issue when developing locally either and connecting to my local database.
What I'm getting at is: I feel there's some weird connection/link issue between my Google Compute instances and my Google SQL instance that I can't seem to figure out.
Any idea?
Edit:
I also noticed these logs in my SQL Cloud instance every 30s:
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.
STATEMENT: SELECT pg_is_xlog_replay_paused(), current_timestamp
That's an interesting problem you are facing. So my knowledge on Kubernetes isn't that great, but I do have a general understanding so let's see if I can provide some suggestions.
To start with, the API that you linked to in your question does mention that it is still in beta. So I do believe there would still be issues to patch in maximizing speed performance.
Secondly, from what I understand, Kubernetes is a great tool for handling stateless workloads. Thus, handling data where state is required for queries would be a slow operation. This article (although not entirely related) does explain some of the pitfalls of Kubernetes (not all the questions are relevant)
Thirdly, could you explain your use case a little bit? Do you really need to use Kubernetes or will another tool like a powerful Compute Engine Instance or or a Dataflow job resolve the the issue? Are you making your database queries through a programming language or an application call?
Thanks, and do let me know!

Database access: singleton or close

I am working on a project for a while and I am close to finish.
However now some technical questions are coming.
So I am dealing with mongoDB and SpringData layer.
However this is not the DB which matter but more the question behind.
I am building a rest api with jaxrs.
I decided to put all my end point in scope prototype and my services inside scope request (because some services can be used more than one time in the same request).
However comes the question of the database.
- There are some people telling me that a singleton is the best approach to have only one connection but in the other end if the traffic grows all the request will be stuck at database entry
There is as well the solution of the closing connection. I implemented a filter which performs some processes (like renewing token if needed etc) I could plug the close connection here. But some person say that it is really costly to open a connection and close it.
I found some answers but it is more related to the client part (phone for example) and the constraint are not the same.
from the mongodb java driver documentation http://mongodb.github.io/mongo-java-driver/3.4/driver/tutorials/connect-to-mongodb/
The MongoClient() instance represents a pool of connections to the database; you will only need one instance of class MongoClient even with multiple threads.
I might be wrong but I dont think Mongo takes advantage of multiple connections (they run sequentially anyway), making them pointless

Is there a way to enable an SQL log to see/optimize my queries using CloudSQL

I started my test of using a Google's CloudSQL instance with a desktop based application, so far I am impressed with a performance, even it is laggy, it does the job, so my next step is to see what simple modifications can do to my application most intended to reduce Access to the database and optimize if there is something more to do.
How can I do log the sql commands send to the database in order to check what queries are being sent. My app uses ODBC drivers in Windows.
Regards
What you probably want is to turn on the general log. Unfortunately, that requires SUPER privileges and that was removed some time ago (announcement). We are going to provide a way to tweak parameters like that via the Cloud SQL API. For now, the best solution is to use a setup a local server and use the logging on that one. If you really want it on production ping us on the google-cloud-sql-discuss Google group and we'll enable the SUPER for your instance.

How Do I Optimize Zend Framework

I have a application built on Zend Framework I am trying to optimize.
I did some Xdebug profiling and although i cant say i understand every nitty gritty of the results i got, some things were quite obvious from the result.
For instance, the file Bootstrap.php seems to be the one gulping most of the time taking 4,553MS seconds which accounts for 92.49% of the total time.
And if i dig further, I could see that Zend_Application_Bootstrap_Boostrap->run takes the bulk of the time. Checking this out again, I found out that Zend_Controller_Front->Dispatch might actually be the function inside the Boostrap.php that takes time to execute.
Question is, from these indices that i have, how best can I go about Optimizing the application? If it caching, how do i go about applying Caching to this situation?
Thanks
From the look of the callgrinds, on the login page the app is spending most of it's time in curl_exec, which is to be expected if you're doing a remote login. But it is doing 10 separate curl_execs which seems excessive. I'm not familiar with the LinkedIn login auth, but is it possible your app is running the remote login code multiple times?
On the standard page request the app is spending most of its time connecting to MySQL, and it seems to be doing this twice. Are you using a remote DB server, and do you need two separate DB connections?
Assuming you are using a remote DB server and it is on the same network as your web server, there seems to be some networking issue there. I'd check the latency to that server if you can, and try connecting to the IP address instead of a hostname to see if that makes any difference (if doing this is much faster this would suggest an issue with the DNS setup on your web server).