Elixir / Elixir-mongo collection find breaks on Enum.to_list - mongodb

This is my first go-round with Elixir, and I'm trying to make a simple web scraper that saves into MongoDB.
I've installed the elixir-mongo package and am able to insert into the database correctly. Sadly, I'm not able to retrieve the values that I have put into the DB.
Here is the error that I am getting:
** (Mix) Could not start application jobboard: exited in: JB.start(:normal, [])
    ** (EXIT) an exception was raised:
        ** (ArgumentError) argument error
            (elixir) lib/enum.ex:1266: Enum.reduce/3
            (elixir) lib/enum.ex:1798: Enum.to_list/1
            (jobboard) lib/scraper.ex:8: JB.Scraper.scrape/0
            (jobboard) lib/jobboard.ex:26: JB.start/2
            (kernel) application_master.erl:272: :application_master.start_it_old/4
If I understand the source correctly, then the mongo library should implement reduce here:
https://github.com/checkiz/elixir-mongo/blob/13211a0c0c9bb5fed29dd2faf7a01342b4e97eb4/lib/mongo_find.ex#L78
Here are the relevant sections of my code:
# JB.Scraper
def scrape do
  urls = JB.ScrapedUrls.unscraped_urls
end

# JB.ScrapedUrls
def unscraped_urls do
  MongoService.find(%{scraped: false})
end

# MongoService
def find(statement) do
  collection |> Mongo.Collection.find(statement) |> Enum.to_list
end

defp collection do
  mongo = Mongo.connect!
  db = mongo |> Mongo.db("simply_hired_urls")
  db |> Mongo.Db.collection("urls")
end
As a bonus, if anyone can tell me how I can get around connecting to Mongo every time I make a new call, that would be awesome. :) I'm still figuring out FP.
Thanks!
Jon

I haven't used this library, but I just made a simple attempt with a simplified version of your code.
I've started with
Mongo.connect!
|> Mongo.db("test")
|> Mongo.Db.collection("foo")
|> Mongo.Collection.find(%{scraped: true})
|> Enum.to_list
This worked fine. Then I suspected that the problem occurs when too many connections are open, so I ran this test repeatedly, and then it failed with the same error you got. It failed consistently when trying to open the connection for the 2037th time. Looking at the MongoDB log, I can tell that it can't open another connection:
[initandlisten] can't create new thread, closing connection
To fix this, I simply closed the connection after converting the results to a list, using Mongo.Server.close/1. That fixed the problem.
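In code, the fix looks roughly like this (a sketch based on your find/1 helper; the db and collection names are copied from your snippets):

def find(statement) do
  mongo = Mongo.connect!
  result =
    mongo
    |> Mongo.db("simply_hired_urls")
    |> Mongo.Db.collection("urls")
    |> Mongo.Collection.find(statement)
    |> Enum.to_list
  # Close the connection once the cursor has been fully consumed, so that
  # repeated calls don't accumulate open connections until mongod gives up.
  Mongo.Server.close(mongo)
  result
end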
As you noticed yourself, this is not an optimal way of communicating with the database; you'd be better off if you could reuse the connection for multiple queries.
A standard way of doing this is to hold on to the connection in a process, such as a GenServer or an Agent. The connection becomes part of the process state, and you can run multiple queries in that process over the same connection.
Obviously, if multiple client processes use a single database process, all queries will be serialized, and the database process then becomes a performance bottleneck. To deal with this, you could open a pool of processes, each one managing a distinct database connection. This can be done in a simple way with the poolboy library.
My suggestion is that you try implementing a single GenServer-based process that maintains the connection and runs queries. Then see if your code works correctly, and when it does, try to use poolboy to be able to deal with concurrent requests efficiently.
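A minimal sketch of such a GenServer, reusing the elixir-mongo calls from your snippets (the module layout and the decision to hold one collection handle per process are just illustrative assumptions):

defmodule MongoService do
  use GenServer

  def start_link do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def find(statement) do
    GenServer.call(__MODULE__, {:find, statement})
  end

  # Connect once, when the process starts, and keep the collection handle
  # in the process state.
  def init(nil) do
    collection =
      Mongo.connect!
      |> Mongo.db("simply_hired_urls")
      |> Mongo.Db.collection("urls")
    {:ok, collection}
  end

  # Every query that arrives here reuses the same underlying connection.
  def handle_call({:find, statement}, _from, collection) do
    result =
      collection
      |> Mongo.Collection.find(statement)
      |> Enum.to_list
    {:reply, result, collection}
  end
end

Start it once (ideally under your supervision tree), and JB.ScrapedUrls.unscraped_urls can then simply call MongoService.find(%{scraped: false}). When you later move to poolboy, each pool worker would hold one such connection in exactly the same way.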

Related

Can I reuse a connection in MongoDB? How do these connections actually work?

While trying to do some simple things with MongoDB, I got stuck on something that feels kind of strange to me.
from pymongo import MongoClient

client = MongoClient(connection_string)
db = client.database
print(db)
client.close()
I thought that when you make a connection, only this one is used for the rest of the code until the close() method. But it doesn't seem to work that way... I don't know how I ended up with 9 connections when it was supposed to be a single one, and even if each 'request' is a connection, there are too many of them.
For now it's not a big problem; it just bothers me that I don't know exactly how this works!
When you call MongoClient(), you are not establishing just one connection. In fact, you are creating the client, which will have a connection pool. When you make one or multiple requests, the driver uses an available connection from the pool, and when the operation completes, the connection goes back to the pool.
Calling the MongoClient constructor every time you need to talk to the db is very bad practice and will incur the handshake penalty each time. Use dependency injection or a singleton to hold the MongoClient.
According to the documentation, you should create one client per process.
Your code seems to be the correct way if it is a single-threaded process. If you don't need any more connections to the server, you can limit the pool size by explicitly specifying the number:
client = MongoClient(host, port, maxPoolSize=<num>)
On the other hand, if the code might later use the same connection, it is better to simply create the client once in the beginning, and use it across the code.

How to debug PSQLException: FATAL: sorry, too many clients already in Play

I'm using Play 2.6 with Scala, and something in my program is eating up the connections to my Postgresql database. I keep getting:
PSQLException: FATAL: sorry, too many clients already
in my console. I've checked my max connections with
show max_connections;
And it's at the default of 100, but I shouldn't be eating up this many. Any time I access the database in the application, I use the suggested:
myDB.withConnection { conn => /* ... do work with SQL here ... */ }
block, which according to the documentation should release the connection once it exits the block. I don't think anything is getting stuck in a loop, as otherwise other pieces of my code wouldn't be executing. Unfortunately, the stack trace that's printing only shows the "behind the scenes" stuff and won't show which caller is establishing the DB connection. Any ideas on how I can find the offender?

How to perform an "I can reach my database" healthcheck?

I have a classic spray+slick HTTP server which is my database access layer, and I'd like to have a healthcheck route to ensure my server is still able to reach my DB.
I could do it by running a generic SQL query, but I was wondering if there is a better way to check that the connection is alive and usable without actually adding load on the database (or at least with the minimum possible load).
So, pretty much:
val db = Database.forConfig("app.mydb")
[...]
db.???? // Do the check here
Why do you want to avoid executing a query against the database?
I think the best health check is to actually use the database as your application would (actually connecting and running a query). With that in mind, you can perform a SELECT 1 against your DB, and verify that it responds accordingly.

SQLTimeoutException in play-slick

I'm using play-slick with slick 3.0.0 in this way:
I got a connection with
val conn = db.createSession.conn
then got a statement:
val statement = conn.prepareStatement(querySQL)
and returned a ResultSet:
Future { statement.executeQuery() }
But I ran into a problem: after using this query about 50 times, I got the exception:
SQLTimeoutException: Timeout after 1000ms of waiting for a connection.
I know this may be caused by connections not being closed, and I didn't close the connection or session manually in my code.
I want to know:
Will a connection created this way be closed and returned to the connection pool automatically?
Was my situation caused by a connection that wasn't released?
How to close a connection manually?
Any help would be greatly appreciated!
Remark: it would be most helpful if you posted your full code (including the call that is performed 50 times).
Will a connection created this way be closed and returned to the connection pool automatically?
No. Java 7 (and up) provides the so-called try-with-resources construct (see https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) to auto-close resources, but AFAIK this mechanism is not available in Scala (please correct me, somebody, if this is not true).
Still, Scala provides the loan pattern (see https://wiki.scala-lang.org/display/SYGN/Loan, especially using), which offers an FP way of making sure resources are closed in the end.
Was my situation caused by a connection that wasn't released?
As long as you don't provide your full code, this is only a guess. But yes: connections that are never closed eventually exhaust the connection pool, so that no new connections are available.
How to close a connection manually?
connection.close()

ActiveRecord find_or_initialize_by race conditions

I have a scenario where 2 db connections might both run Model.find_or_initialize_by(params) and raise an error: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint
I'd like to update my code so it could gracefully recover from it. Something like:
record = nil
begin
  record = Model.find_or_initialize_by(params)
rescue ActiveRecord::RecordNotUnique
  record = Model.where(params).first
end
return record
The trouble is that there's not a nice/easy way to reproduce this on my local machine, so I'm not confident that my fix actually works.
So I thought I'd get a bit creative and try calling create twice (locally) in a row, which should raise the PG::UniqueViolation: ERROR; then I could rescue from it and make sure everything is handled gracefully.
But I get this error: PG::InFailedSqlTransaction: ERROR: current transaction is aborted, commands ignored until end of transaction block
I get this error even when I wrap everything in individual transaction blocks:
record = nil
Model.transaction do
  record = Model.create(params)
end
begin
  Model.transaction do
    record = Model.create(params)
  end
rescue ActiveRecord::RecordNotUnique
end
Model.transaction do
  record = Model.where(params).first
end
return record
My questions:
What's the right way to gracefully handle the race condition I mentioned at the very beginning of this post?
How do I test this locally?
I imagine there's probably something simple that I'm missing here, but it's late and perhaps I'm not thinking too clearly.
I'm running postgres 9.3 and rails 4.
EDIT: Turns out that find_or_initialize_by should have been find_or_create_by, and the errors I was getting were from the actual save call that happened later on in execution. #VeryTiredWhenIWroteThis
Has this actually happened?
Model.find_or_initialize_by(params)
should never raise an ActiveRecord::RecordNotUnique error, as it does not save anything to the db. It just instantiates a new ActiveRecord object.
However in the second snippet you are creating records.
create (without bang) does not raise exceptions caused by failed validations, but
ActiveRecord::RecordNotUnique is always raised in case of a duplicate, by both create and create!.
If you're creating records you don't need transactions at all. Postgres, being ACID compliant, guarantees that only one of the two operations succeeds, and once it responds, its changes will be durable (a single-statement query against Postgres is also a transaction). So your code above is almost fine if you switch to find_or_create_by:
begin
  record = Model.find_or_create_by(params)
rescue ActiveRecord::RecordNotUnique
  record = Model.where(params).first
end
You can test whether the code behaves correctly by simply trying to create the same record twice in a row. However, this will not test whether ActiveRecord::RecordNotUnique is actually raised correctly on race conditions.
It's also not the responsibility of your app to test this, and testing it is not easy. You would have to start Rails in multithreaded mode on your machine, or test against a multi-process staging Rails instance. Webrick, for example, handles only one request at a time. You could use the puma application server; however, on MRI there is no true concurrency (GIL). Threads yield the GIL only on blocking IO. Because talking to Postgres is IO, I'd expect some concurrent requests, but to be 100% sure, the best testing scenario would be to deploy on Passenger with multiple workers and then use JMeter to run concurrent requests against the server.