NestJS Mongoose connection dies on load testing - mongodb

When multiple devs use my API, multiple concurrent requests are being sent to Mongoose.
When the concurrency is high, the connection just "dies" and refuses to fulfil any new request, no matter how long I wait (hours!).
To be clear, everything works fine under regular use; it is heavy use that causes the connection to crash.
My MongooseModule initialization:
MongooseModule.forRoot(DatabasesService.MONGO_FULL_URL, {
useNewUrlParser: true,
useUnifiedTopology: true,
useFindAndModify: false,
autoEncryption: {
keyVaultNamespace: DatabasesService.keyVaultNamespace,
kmsProviders: DatabasesService.kmsProviders,
extraOptions: {
mongocryptdSpawnArgs: ['--pidfilepath', '/tmp/mongocryptd.pid']
}
} as any
})
Module that imports the feature:
@Module({
imports: [MongooseModule.forFeature([{ name: 'modelName', schema: ModelNameSchema }])],
providers: [ModelNameService],
controllers: [...],
exports: [...]
})
Service:
@Injectable()
export class ModelNameService {
constructor(
@InjectModel('modelName') private modelName: Model<IModelName>
) {}
async findAll(): Promise<IModelName[]> {
const result: IModelName[] = await this.modelName.find().exec();
if (!result) throw new BadRequestException(`No result was found.`);
return result;
}
}
I've tried load testing using different utilities; the easiest was:
ab -c 200 -n 300 -H "Authorization: Bearer $TOKEN" -m GET -b 0 https://example.com/getModelName
Any new request after the connection hangs gets stuck on the first line of ModelNameService.findAll() (the request to Mongo).
In the MongoDB logs with verbosity "-vvvvv" I can see a few suspicious lines:
User Assertion: Unauthorized: command endSessions requires authentication src/mongo/db/commands.cpp
Cancelling outstanding I/O operations on connection to 127.0.0.1:33134
I've also found that it never exceeds 12 open connections at the same time; it always waits to close one before opening a new one.
Other key points:
Mongoose doesn't return any value or report any error; it just hangs without notifying anything.
The Terminus health check is able to ping the DB and returns a healthy status.
The NestJS API itself still works - I'm able to send new requests and receive responses; only requests that rely on the faulty connection hang.
When I inject the connection and check its readyState, it returns connected.
Restarting the API fixes it immediately.
MongoDB itself keeps working as normal.
Increasing the Mongoose poolSize lets it handle more requests at the same time, but it still crashes under a larger number of requests.
My main question here is: how do I handle this case? Currently, I've added another health check that sends a query to the problematic connection every half a minute, and k8s restarts the pod if it detects a failure. This works, but it is not optimal.
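Since the connection still reports readyState connected, that liveness check has to issue a real query rather than a ping. Below is a minimal sketch of the workaround, assuming @nestjs/terminus and the 'modelName' model from above; the indicator class name and the 5-second timeout are illustrative, not from my actual setup.
import { Injectable } from '@nestjs/common';
import { HealthCheckError, HealthIndicator, HealthIndicatorResult } from '@nestjs/terminus';
import { InjectModel } from '@nestjs/mongoose';
import { Model } from 'mongoose';

@Injectable()
export class MongoQueryHealthIndicator extends HealthIndicator {
  constructor(@InjectModel('modelName') private readonly model: Model<any>) {
    super();
  }

  // Runs a real query (not just a ping) and fails if it hangs, so the
  // k8s liveness probe can restart the pod when the connection wedges.
  async isHealthy(key = 'mongoQuery'): Promise<HealthIndicatorResult> {
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('query timed out')), 5000),
    );
    try {
      await Promise.race([this.model.findOne().lean().exec(), timeout]);
      return this.getStatus(key, true);
    } catch (err) {
      throw new HealthCheckError(
        'Mongo query check failed',
        this.getStatus(key, false, { message: (err as Error).message }),
      );
    }
  }
}
Registering this indicator in the existing Terminus health controller means the liveness probe fails as soon as a query hangs, instead of waiting for the separate 30-second checker.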

Related

Total open connections reached the connection limit

I'm running Python Flask with Waitress. I'm starting the server using the following code:
from flask import Flask, render_template, request
from waitress import serve

app = Flask(__name__)
app.static_folder = 'static'

@app.route("/get")
def first_method():
    ...

@app.route("/second")
def second_method():
    ...

serve(app, host="ip_address", port=8080)
I'm calling the server from a webpage and also from Unity. From the webpage, I'm using the following example GET request in jQuery:
$.get("/get", { variable1: data1, variable2: data2 }).done(function (data) {
    ...
});
In Unity I'm using the following call:
http://ip_address/get?msg=data1?data2
Unfortunately, after some time I'm getting the error on the server: total open connections reached the connection limit, no longer accepting new connections. This happens especially with Unity. I assume that for each GET request a new channel/connection is established.
How can this be fixed, i.e. how can channels/connections be reused?

Flow testnet getting: An error occurred when interacting with the Access API

I'm trying to make a request to the testnet but getting the following error:
HTTP Request Error: An error occurred when interacting with the Access API.
transport=FetchTransport
error=Failed to fetch hostname=access.devnet.nodes.onflow.org:9000
path=/v1/scripts?block_height=sealed
method=POST
requestBody={
"script":"aW1wb3J0IEZ1bmdpYmxlVG9rZW4gZnJvbSAweDlhMDc2NmQ5M2I2NjA4YjcKaW1wb3J0IEZVU0QgZnJvbSAweGUyMjNkOGE2MjllNDljNjgKCnB1YiBmdW4gbWFpbihhZGRyZXNzOiBBZGRyZXNzKTogVUZpeDY0IHsKICAgIGxldCBhY2NvdW50ID0gZ2V0QWNjb3VudChhZGRyZXNzKQoKICAgIGxldCB2YXVsdFJlZiA9IGFjY291bnQuZ2V0Q2FwYWJpbGl0eSgvcHVibGljL2Z1c2RCYWxhbmNlKSEKICAgICAgICAuYm9ycm93PCZGVVNELlZhdWx0e0Z1bmdpYmxlVG9rZW4uQmFsYW5jZX0+KCkKICAgICAgICA/PyBwYW5pYygiQ291bGQgbm90IGJvcnJvdyBCYWxhbmNlIHJlZmVyZW5jZSB0byB0aGUgVmF1bHQiKQoKICAgIHJldHVybiB2YXVsdFJlZi5iYWxhbmNlCn0=",
"arguments":["eyJ0eXBlIjoiQWRkcmVzcyIsInZhbHVlIjoiMHg1ODA2MjJlNzQ1MTgzYjE2In0="]
}
The base64 decoded script is:
import FungibleToken from 0x9a0766d93b6608b7
import FUSD from 0xe223d8a629e49c68
pub fun main(address: Address): UFix64 {
let account = getAccount(address)
let vaultRef = account.getCapability(/public/fusdBalance)!
.borrow<&FUSD.Vault{FungibleToken.Balance}>()
?? panic("Could not borrow Balance reference to the Vault")
return vaultRef.balance
}
and the arguments are:
{"type":"Address","value":"0x580622e745183b16"}
When I run the following command in the CLI, it works: flow scripts execute -n testnet scripts/getFUSDBalance.cdc --arg "Address:580622e745183b16"
Not sure why I'm getting issues with this.
EDIT:
I didn't mention that the error is coming from FCL.
Here's the code I'm using to interact with Flow:
async getFUSDBalance(): Promise<Result<number, string>> {
const scriptText = getFUSDBalance as string;
const user = await this.getCurrentUser();
return await flow.query<number>({
cadence: scriptText,
payer: fcl.authz,
authorizations: [fcl.authz],
args: (arg, t) => [
arg(user.addr, t.Address)
]
})
}
flow.query<T> is just a wrapper for fcl.query
It works in the CLI but not from FCL
EDIT 2:
So I found that the real issue I was facing was this error:
Fetch API cannot load access.devnet.nodes.onflow.org:9000/v1/scripts?block_height=sealed. URL scheme "access.devnet.nodes.onflow.org" is not supported
access.devnet.nodes.onflow.org:9000 is gRPC (I think). I should have been using https://rest-testnet.onflow.org. However, when I change to that I get a CORS violation. I think I read somewhere that you can't access the testnet from localhost (why?), but I deployed to a *.app domain and I'm getting the same error No 'Access-Control-Allow-Origin' header is present on the requested resource. Is the CORS policy not set up for *.app domains?
So it turns out that changing the accessNode.api value to https://rest-testnet.onflow.org did work. Not sure why it didn't when I posted the second edit, but whatever. This works for localhost too.
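For anyone else hitting this, the working setup is roughly the following FCL configuration (a minimal sketch using the standard @onflow/fcl config keys; the discovery.wallet line is only needed if you also authenticate users):
import * as fcl from "@onflow/fcl";

// Point FCL at the Flow testnet REST/HTTP access node instead of the gRPC endpoint.
fcl.config()
  .put("accessNode.api", "https://rest-testnet.onflow.org")
  // Wallet discovery for testnet, only needed for user authentication:
  .put("discovery.wallet", "https://fcl-discovery.onflow.org/testnet/authn");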

Failed to enroll admin, error:%o message=Calling enroll endpoint failed, CONNECTION Timeout

I am running my Fabric network on Kubernetes and I have set up CA servers for all the organisations. I am able to register and enroll a user from the CLI, but when I use the fabric-ca-client library with Node.js to register and enroll users, I get a CONNECTION Timeout error. At the same time, the logs of my CA server show that it is able to process the request.
Edit 1: I am using the same code provided in fabric-samples to register and enroll the users.
All the pods are communicating with each other using services in Kubernetes.
This is how my connection profile looks:
"certificateAuthorities": {
"ca-org2": {
"url": "https://ca-org2:8054",
"caName": "ca-org2",
"tlsCACerts": {
"pem": ["-----BEGIN CERTIFICATE-----\nMIICBjCCAa2gAwIBAgIUHwBYatG6KhezYWHxdGgYGqs77PIwCgYIKoZIzj0EAwIw\nYDELMAkGA1UEBhMCVUsxEjAQBgNVBAgTCUhhbXBzaGlyZTEQMA4GA1UEBxMHSHVy\nc2xleTEZMBcGA1UEChMQb3JnMi5leGFtcGxlLmNvbTEQMA4GA1UEAxMHY2Etb3Jn\nMjAeFw0yMTAzMjAxMDI4MDBaFw0zNjAzMTYxMDI4MDBaMGAxCzAJBgNVBAYTAlVL\nMRIwEAYDVQQIEwlIYW1wc2hpcmUxEDAOBgNVBAcTB0h1cnNsZXkxGTAXBgNVBAoT\nEG9yZzIuZXhhbXBsZS5jb20xEDAOBgNVBAMTB2NhLW9yZzIwWTATBgcqhkjOPQIB\nBggqhkjOPQMBBwNCAAQUIABkRhfPdwoy2QrCY3oh8ZuzP5OprZJawVXO2ojid3j4\nC9W4l46QXR5J7iG5MLczguPZWB9dZWygRQdUQeoAo0UwQzAOBgNVHQ8BAf8EBAMC\nAQYwEgYDVR0TAQH/BAgwBgEB/wIBATAdBgNVHQ4EFgQURx/h3nkH0fq+3TlRPnQW\nWTHbR7YwCgYIKoZIzj0EAwIDRwAwRAIgCF+vcLFERb+VHa6Att0rh5yhpMd0bHEn\nmkNo0YfKuX4CICodtpp6AKtNWXreskaN+kRMH8eDmwvxkhvTK68ejv8U\n-----END CERTIFICATE-----\n"]
},
"httpOptions": {
"verify": false
}
}
}
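For reference, the enroll call follows the fabric-samples pattern, roughly like this (a sketch: ccp is the parsed connection profile above, and the admin ID/secret are the sample defaults, not values from my setup):
const FabricCAServices = require('fabric-ca-client');

// ccp is the parsed connection profile shown above
const caInfo = ccp.certificateAuthorities['ca-org2'];
const ca = new FabricCAServices(
  caInfo.url,
  { trustedRoots: caInfo.tlsCACerts.pem, verify: false },
  caInfo.caName
);

// Enroll the admin identity (sample default credentials)
const enrollment = await ca.enroll({
  enrollmentID: 'admin',
  enrollmentSecret: 'adminpw'
});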
I found the solution to this issue. The problem was the connection timeout: my CA server was receiving the requests and was able to process them, but because of the short timeout the requests were being cancelled. The solution was to increase the connection-timeout and request-timeout. The default value of both timeouts is 3s; I increased it to 30s and it started working. The default configuration can be found here:
{
"request-timeout" : 3000,
"tcert-batch-size" : 10,
"crypto-hash-algo": "SHA2",
"crypto-keysize": 256,
"crypto-hsm": false,
"connection-timeout": 3000
}
We can update the timeout values in the source code of the fabric-ca-client library, or simply use the methods of the fabric-common library to update these configuration values, like this:
const { Utils: utils } = require('fabric-common');
const path=require('path');
let config=utils.getConfig()
config.file(path.resolve(__dirname,'config.json'))
And here is our modified configuration file, config.json:
{
"request-timeout" : 30000,
"tcert-batch-size" : 10,
"crypto-hash-algo": "SHA2",
"crypto-keysize": 256,
"crypto-hsm": false,
"connection-timeout": 30000
}

ECONNREFUSED during 'next build'. Works fine with 'next dev' [duplicate]

This question already has an answer here:
Fetch error when building Next.js static website in production
(1 answer)
Closed last year.
I have a very simple NextJS 9.3.5 project.
For now, it has a single pages/users page and a single pages/api/users API route that retrieves all users from a local MongoDB collection.
It builds fine locally using 'next dev'.
But it fails on 'next build' with an ECONNREFUSED error.
pages/users
import fetch from "node-fetch"
import Link from "next/link"
export async function getStaticProps({ params }) {
const res = await fetch(`http://${process.env.VERCEL_URL}/api/users`)
const users = await res.json()
return { props: { users } }
}
export default function Users({ users }) {
return (
<ul>
{users.map(user => (
<li key={user.id}>
<Link href="/user/[id]" as={`/user/${user._id}`}>
<a>{user.name}</a>
</Link>
</li>
))}
</ul>
);
}
pages/api/users
import mongoMiddleware from "../../lib/api/mongo-middleware";
import apiHandler from "../../lib/api/api-handler";
export default mongoMiddleware(async (req, res, connection, models) => {
const {
method
} = req
apiHandler(res, method, {
GET: (response) => {
models.User.find({}, (error, users) => {
if (error) {
connection.close();
response.status(500).json({ error });
} else {
connection.close();
response.status(200).json(users);
}
})
}
});
})
yarn build
yarn run v1.22.4
$ next build
Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`
> Info: Loaded env from .env
Creating an optimized production build
Compiled successfully.
> Info: Loaded env from .env
Automatically optimizing pages ..
Error occurred prerendering page "/users". Read more: https://err.sh/next.js/prerender-error:
FetchError: request to http://localhost:3000/api/users failed, reason: connect ECONNREFUSED 127.0.0.1:3000
Any ideas what is going wrong? Particularly when it works fine with 'next dev'?
Thank you.
I tried the same thing a few days ago and it didn't work, because when we build the app, we don't have localhost available. Check this part of the docs - https://nextjs.org/docs/basic-features/data-fetching#write-server-side-code-directly - which says: "You should not fetch an API route from getStaticProps..."
(Next.js 9.3.6)
Just to be even more explicit on top of what Ricardo Canelas said:
When you do next build, Next goes over all the pages it detects that it can build statically, i.e. all pages that don't define getServerSideProps, but which possibly define getStaticProps and getStaticPaths.
To build those pages, Next calls getStaticPaths to decide which pages you want to build, and then getStaticProps to get the actual data needed to build the page.
Now, if in either of getStaticPaths or getStaticProps you do an API call, e.g. to a JSON backend REST server, then this will get called by next build.
However, if you've integrated both the front end and back end nicely into a single server, chances are that you have just quit your development server (next dev) and are now trying out a build to see if things still work, as a sanity check before deployment.
So in that case, the build will try to access your server, and it won't be running, so you get an error like that.
The correct approach is, instead of going through the REST API, to do database queries directly from getStaticPaths or getStaticProps. That code never gets run on the client anyway, only on the server, so it will also be slightly more efficient than making a useless trip to the API, which then calls the database indirectly. I have a demo that does that here: https://github.com/cirosantilli/node-express-sequelize-nextjs-realworld-example-app/blob/b34c137a9d150466f3e4136b8d1feaa628a71a65/lib/article.ts#L4
export const getStaticPathsArticle: GetStaticPaths = async () => {
return {
fallback: true,
paths: (await sequelize.models.Article.findAll()).map(
article => {
return {
params: {
pid: article.slug,
}
}
}
),
}
}
Note how in that example, both getStaticPaths and getStaticProps (here generalized into HOCs for reuse; see also: Module not found: Can't resolve 'fs' in Next.js application) do direct database queries via the Sequelize ORM, and don't make any HTTP calls to the external server API.
You should then only make client API calls from the React components in the browser after the initial page load (i.e. from useEffect et al.), not from getStaticPaths or getStaticProps. BTW, note that, as mentioned at What is the difference between fallback false vs true vs blocking of getStaticPaths with and without revalidate in Next.js SSR/ISR?, reducing client calls as much as possible and prerendering on the server greatly reduces application complexity.
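As a rough sketch of that last point (the /api/users endpoint and component shape are assumed for illustration, not taken from the question):
import { useEffect, useState } from "react";

// Fetch from the API route on the client, after the statically built page
// has already been served; the user list fills in once the API responds.
export default function UsersClient() {
  const [users, setUsers] = useState<{ _id: string; name: string }[]>([]);

  useEffect(() => {
    fetch("/api/users")
      .then(res => res.json())
      .then(setUsers)
      .catch(console.error);
  }, []);

  return (
    <ul>
      {users.map(user => (
        <li key={user._id}>{user.name}</li>
      ))}
    </ul>
  );
}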

Riak Java HTTPClientAdapter TCP CLOSE_WAIT

TLDR:
Lots of TCP connections in CLOSE_WAIT status shutting down the server
Setup:
riak_1.2.0-1_amd64.deb installed on Ubuntu12
Spring MVC 3.2.5
riak-client-1.1.0.jar
Tomcat7.0.51 hosted on Windows Server 2008 R2
JRE6_45
Full Description:
How do I ensure that the Java RiakClient is properly cleaning up its connections so that I'm not left with an abundance of CLOSE_WAIT TCP connections?
I have a Spring MVC application which uses the Riak java client to connect to the remote instance/cluster.
We are seeing a lot of TCP Connections on the server hosting the Spring MVC application, which continue to build up until the server can no longer connect to anything because there are no ports available.
Restarting the Riak cluster does not clean the connections up.
Restarting the webapp does clean up the extra connections.
We are using the HTTPClientAdapter and REST api.
When connecting to a relational database, I would normally clean up connections by either explicitly calling close on the connection, or by registering the datasource with a pool and transaction manager and then annotating my services with @Transactional.
But since using the HTTPClientAdapter, I would have expected this to be more like an HttpClient.
With an HttpClient, I would consume the response entity, with EntityUtils.consume(...), to ensure that everything is properly cleaned up.
HTTPClientAdapter does have a shutdown method, and I see it being called in the online examples.
When I traced the method call through to the actual RiakClient, the method is empty.
Also, when I dig through the source code, nowhere in it does it ever close the Stream on the HttpResponse or consume any response entity (as with the standard Apache EntityUtils example).
Here is an example of how the calls are being made.
private RawClient getRiakClientFromUrl(String riakUrl) {
return new HTTPClientAdapter(riakUrl);
}
public IRiakObject fetchRiakObject(String bucket, String key, boolean useCache) {
try {
MethodTimer timer = MethodTimer.start("Fetch Riak Object Operation");
//logger.debug("Fetching Riak Object {}/{}", bucket, key);
RiakResponse riakResponse;
riakResponse = riak.fetch(bucket, key);
if(!riakResponse.hasValue()) {
//logger.debug("Object {}/{} not found in riak data store", bucket, key);
return null;
}
IRiakObject[] riakObjects = riakResponse.getRiakObjects();
if(riakObjects.length > 1) {
String error = "Got multiple riak objects for " + bucket + "/" + key;
logger.error(error);
throw new RuntimeException(error);
}
//logger.debug("{}", timer);
return riakObjects[0];
}
catch(Exception e) {
logger.error("Error fetching " + bucket + "/" + key, e);
throw new RuntimeException(e);
}
}
The only option I can think of is to create the RiakClient separately from the adapter so I can access the HttpClient and then the ConnectionManager.
I am currently working on switching over to the PBClientAdapter to see if that might help, but for the purposes of this question (and because the rest of the team may not like me switching for whatever reason), let's assume that I must continue to connect over HTTP.
It's been almost a year, so I thought I would go ahead and post how I solved this problem.
The solution was to change how we create the HTTPClientAdapter provided by the Java client, passing in a configured RiakClient that sets the timeout and max connections for the underlying pool. Here's a code example of how to do it.
First, we are on an older version of Riak, so here's the Maven dependency:
<dependency>
<groupId>com.basho.riak</groupId>
<artifactId>riak-client</artifactId>
<version>1.1.4</version>
</dependency>
And here's the example:
public RawClient riakClient() {
    RiakConfig config = new RiakConfig(riakUrl);
    // httpConnectionsTimetolive is in seconds, but timeout is in milliseconds
    config.setTimeout(30000);
    config.setUrl("http://myriakurl/");
    config.setMaxConnections(100); // or whatever value you need
    RiakClient client = new RiakClient(config);
    return new HTTPClientAdapter(client);
}
I actually broke that up a bit in my implementation and used Spring to inject values; I just wanted to show a simplified example for it.
By setting the timeout to something less than the standard five minutes, the system will not hold on to the connections for as long (5 minutes plus whatever you set the timeout to), which causes the connections in CLOSE_WAIT status to be cleaned up sooner.
And of course, setting the max connections in the pool prevents the application from opening up tens of thousands of connections.