DigitalOcean Droplet Inbound / Outbound (What is what?) - server

I am still new to DigitalOcean and Ubuntu servers, so I hope one of you can explain the difference between inbound and outbound traffic to me. I tried reading through DigitalOcean's documentation, but it still doesn't explain exactly what counts as inbound and what counts as outbound.
As far as I can see, only outbound traffic is billed. So my question is: what counts as outbound traffic?
Is it outbound traffic when I log into my server and run "npm install" while creating Docker containers, or is that inbound?
Is it outbound or inbound when I run "git clone"?
I hope one of you can explain what is categorised as inbound and what as outbound.

First off, you're highly unlikely to hit your bandwidth limit on DigitalOcean, even on the lowest-end droplet with no other bandwidth-boosting services.
That said, there are four types of droplet traffic on DigitalOcean: outbound local, inbound local, outbound remote, and inbound remote. Any transaction involves some traffic in both directions, but it is usually skewed toward one or the other.
Case in point: I have a droplet that runs a Neo4j graph database server, and another droplet that queries that server with the Neo4j drivers. When my client server makes a request, the driver sends a database query, which is a small remote outbound transfer from the client's point of view (the servers are in separate projects, so this is not "internal traffic" as it would be if they used internal IPs in the same project) and a small remote inbound transfer for the database server. The substantive data transfer is the query response, which for my workloads tends to be a few hundred MB at minimum; it shows up as remote outbound for the database server and remote inbound for the client.
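To sum up: any transfer to or from something outside your DigitalOcean resources' virtual network counts as outbound for the data you send (for example, the small request that curl or wget makes) and as inbound, likely much larger, for the data you receive (the downloaded response). Downloads such as "npm install" or "git clone" are therefore mostly inbound. It works the other way around if you are responding to requests with a lot of data.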
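One way to see this split on your own droplet is to compare the kernel's per-interface byte counters before and after a command. The sketch below is a minimal, Linux-only example that reads /proc/net/dev, where received bytes are inbound and transmitted bytes are outbound. It assumes the public interface is named eth0 (check with `ip addr`). The counters also include anything else the droplet sends or receives in the meantime, and they are not DigitalOcean's billing meter, so treat the result as a rough indication only.

```python
import subprocess
from pathlib import Path


def parse_net_dev(text: str) -> dict[str, tuple[int, int]]:
    """Map each interface to (rx_bytes, tx_bytes) from /proc/net/dev text.

    rx (received) is inbound traffic; tx (transmitted) is outbound traffic.
    """
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def traffic_delta(before, after, iface):
    """Return (inbound_bytes, outbound_bytes) between two snapshots."""
    rx0, tx0 = before[iface]
    rx1, tx1 = after[iface]
    return rx1 - rx0, tx1 - tx0


def read_counters() -> dict[str, tuple[int, int]]:
    """Snapshot the live counters (Linux only)."""
    return parse_net_dev(Path("/proc/net/dev").read_text())


def measure(cmd: list[str], iface: str = "eth0") -> tuple[int, int]:
    """Run cmd and report the bytes iface received and sent meanwhile.

    Assumption: iface is the droplet's public interface (often eth0).
    """
    before = read_counters()
    subprocess.run(cmd, check=True)
    after = read_counters()
    return traffic_delta(before, after, iface)
```

For example, measure(["npm", "install"]) run inside a project directory should report far more inbound than outbound bytes, which is the curl/wget pattern described in the answer.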

Related

How to Confirm PostgreSQL on Ubuntu VM is communicating with External Server for Updates

I have an Ubuntu VM installed on a client's VMware system. Recently, the client's IT informed us that his firewall has been detecting repeated potential port scans to our VM's internal IP address (coming from 87.238.57.227). He asked whether this was part of a known package update process on our VM.
He sent us a firewall output where we can see several instances of the port scan, but there are also instances of our Ubuntu VM trying to communicate back to the external server on port 37258 (this is dropped by the firewall).
Based on a Google lookup, the hostname of the external IP address is "feris.postgresql.org", and its ASN points to a European company called Redpill-Linpro. As far as I can tell, they offer IT consulting services specializing in open source software (such as PostgreSQL, which is installed on our VM). I had never heard of them before, though, and have no idea why our VM would be communicating with them or vice versa. I'm also not sure I'm interpreting the IP lookup information correctly: https://ipinfo.io/87.238.57.227
I'm looking for a way to confirm or disprove that this is just our VM checking for a standard Postgres update. If it is, I'd like to restrict this behaviour: we would prefer to do these types of updates manually and limit communication from the VM to what is strictly necessary for our application to function.
Update
I sent an email to Redpill's abuse account. They responded quickly saying that the server should not be port scanning anyone and if it appears that way, something is wrong.
The server is part of a cluster of machines that serves apt.postgresql.org, among other Postgres download sites. I don't think we have anything like Ansible or Puppet installed that would automatically check for updates, but I will look into that to make sure. I'm wondering whether Ubuntu reaching out to update the MOTD with the number of available packages would explain why our VM is trying to reach the external Postgres server.
The abuse rep said that, in any case, there should only be outgoing connections from the VM, not incoming ones. He asked for some additional info, so I will keep communicating with him and try to update this post accordingly.
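To check whether the VM is configured to talk to apt.postgresql.org at all, you can look for that host in the APT source lists. The sketch below assumes the standard Ubuntu locations (/etc/apt/sources.list plus the *.list and *.sources files in /etc/apt/sources.list.d) and does a plain substring match, skipping commented-out lines. A hit would suggest the PostgreSQL packages came from the PostgreSQL project's APT repository, so any apt update run, manual or scheduled, would contact that cluster.

```python
from pathlib import Path


def find_apt_sources(host: str, root: str = "/etc/apt") -> list[tuple[str, str]]:
    """Return (file, line) pairs from APT source files that mention host."""
    base = Path(root)
    candidates = [base / "sources.list"]
    sources_d = base / "sources.list.d"
    if sources_d.is_dir():
        # One-line format (*.list) and deb822 format (*.sources).
        candidates += sorted(sources_d.glob("*.list")) + sorted(sources_d.glob("*.sources"))

    hits = []
    for path in candidates:
        if not path.is_file():
            continue
        for line in path.read_text(errors="replace").splitlines():
            stripped = line.strip()
            # Commented-out entries are ignored by apt, so skip them too.
            if stripped.startswith("#"):
                continue
            if host in stripped:
                hits.append((str(path), stripped))
    return hits
```

For example, find_apt_sources("apt.postgresql.org") returns the matching entries; an empty list means no active source names that host.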
My communication with the client's IT dropped off so I did not get a definitive answer on this, but I'll provide some new details:
I reached out to the abuse email for Redpill-Linpro. He got back to me and confirmed the server corresponding to the detected IP address is part of a cluster that hosts postgres download sites, including apt.postgresql.org. He was surprised to learn we had detected a port scan from their server and seems eager to figure out why that is happening.
He asked if the client IT could pass along some necessary info for them to set up tracking on that server. But the client IT never got back to me. I think he was satisfied that it wasn't malicious and stopped pursuing it.
Here's one of the messages the abuse rep sent me that may be relevant:
That does look a lot like the tcp to the apt download server yes. It's strange that your firewall reports that many incoming connections, but they could be fallout from some connection tracking that's not operating as intended. The timing appears to be matching up more or less perfectly. And there should definitely not be any ping-back connections from it.
Since you appear to be using the http version of the server (and not https) bringing the data in cleartext, they should be able to just dump the TCP connection contents and verify exactly what it does. But I bet they are going to see a number of http requests initiated by the apt client that is checking for updates.
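Since the stated goal is to do updates manually, it may also help to see whether APT is set to refresh its package lists on a schedule. On Ubuntu, scheduled runs are driven by the apt-daily systemd timers, which consult the APT::Periodic settings in /etc/apt/apt.conf.d. The sketch below is a minimal check under two assumptions: it only understands the flat `APT::Periodic::Name "value";` form (not the nested brace syntax), and it treats later files in sorted order as overriding earlier ones.

```python
import re
from pathlib import Path

# Matches lines such as: APT::Periodic::Update-Package-Lists "1";
PERIODIC_RE = re.compile(r'^\s*(APT::Periodic::[\w-]+)\s+"([^"]*)"\s*;')


def periodic_settings(conf_dir: str = "/etc/apt/apt.conf.d") -> dict[str, str]:
    """Collect flat APT::Periodic settings; later files override earlier ones."""
    settings: dict[str, str] = {}
    base = Path(conf_dir)
    if not base.is_dir():
        return settings
    # apt.conf.d fragments are read in sorted order, so process them that way.
    for path in sorted(p for p in base.iterdir() if p.is_file()):
        for line in path.read_text(errors="replace").splitlines():
            match = PERIODIC_RE.match(line)
            if match:
                settings[match.group(1)] = match.group(2)
    return settings


def auto_update_enabled(settings: dict[str, str]) -> bool:
    """True if package lists are refreshed automatically ("0" disables it)."""
    return settings.get("APT::Periodic::Update-Package-Lists", "0") not in ("0", "")
```

If auto_update_enabled(periodic_settings()) returns True, setting APT::Periodic::Update-Package-Lists to "0" in a file that sorts after the existing ones (the name is up to you) should stop the scheduled list refreshes; it is worth confirming the effect on the VM itself rather than relying on this sketch.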

How to keep track of the number of clients that are connecting to server

I'm building a software agent that runs on a server. The agent acts as a server manager, i.e. it starts/stops Docker containers, does monitoring, etc.
The server will host many services; each service is a program running in its own Docker container (one program/service per container).
There may be many such servers, and they need not be high-performance machines: they range from small VMs to high-performance computers. For now, I assume that every service uses HTTP to serve requests.
The function I want to implement in this agent is tracking the number of clients currently connected to (i.e. making requests to) each server for every service (e.g. server A is processing 500 requests), or per program (e.g. program A is processing 100 requests, program B is processing 200 requests).
I want this number so that I can balance the workload across servers that host the same service.
These are the ideas I have:
Implement a load balancer/reverse proxy inside the agent (I would use this load balancer: https://github.com/nwoodthorpe/Load-Balancer-Golang). This may be my last choice, because I think the load balancing itself would use quite a lot of resources.
Let the service programs running on the server tell the agent whenever they start and finish processing a request. I would simply implement a UDP socket server in the agent that listens for datagrams carrying a unique request ID (really anything that distinguishes the specific request being processed) and a status saying whether the request is being processed or has finished.
So I would like suggestions on these approaches: which one is better, and how should I implement it? Is there a better approach?
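A minimal sketch of the second idea, written in Python for illustration. The wire format `<service> <request_id> <start|finish>` and the default port 9999 are hypothetical choices, not anything the question specifies. The tracker keeps a set of in-flight request IDs per service, so a repeated "start" for the same ID does not inflate the count.

```python
import socket
from collections import defaultdict


class InFlightTracker:
    """Count in-flight requests per service from start/finish events."""

    def __init__(self) -> None:
        # service name -> set of request IDs currently being processed
        self._active: dict[str, set[str]] = defaultdict(set)

    def handle(self, datagram: bytes) -> None:
        # Hypothetical wire format: b"<service> <request_id> <start|finish>"
        try:
            service, request_id, status = datagram.decode().split()
        except ValueError:  # wrong field count or invalid UTF-8
            return
        if status == "start":
            self._active[service].add(request_id)
        elif status == "finish":
            self._active[service].discard(request_id)

    def counts(self) -> dict[str, int]:
        """Current number of in-flight requests for each busy service."""
        return {svc: len(ids) for svc, ids in self._active.items() if ids}


def serve(tracker: InFlightTracker, host: str = "127.0.0.1", port: int = 9999) -> None:
    """Blocking UDP loop that feeds every received datagram to the tracker."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, _addr = sock.recvfrom(512)
            tracker.handle(data)
```

One design caveat: UDP can silently drop datagrams, and a crashing service never sends its "finish", so a real agent would probably want to expire entries after a timeout rather than trust the count indefinitely.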

Alt-N Mdaemon mail server on google compute engine

I have a VM instance on Google Compute Engine running Windows Server 2012 R2. My Apache web server, PHP, MySQL, FTP, and various other things are running great and are easily accessible from the world. I installed MDaemon Messaging Server (Alt-N's email server), which I had run on my old physical box for years. I can use port 110 just fine, but I simply cannot get SMTP to work. Yes, I'm well aware of Compute Engine's blocked SMTP ports. Knowing about these blocked ports, I chose port 2525 as MDaemon's SMTP port. I added firewall rules on the server to allow it, added the Compute Engine network port exceptions as well, and of course changed MDaemon's server settings to 2525. I still cannot send email. I even tried port 2626, and nada.
I understand I could sign up for Google's recommended SendGrid, which would force me to use mail.sendgrid.com on port 2525 with a limit of 25,000 emails per month (on a free SendGrid account), but I personally think it's ridiculous to need to sign up and accept limits on email when I paid $2,400 for the MDaemon email server. I should be able to use my own domain's mail.mydomain.com and authenticate through MY email server, not SendGrid's.
Is there something simple I'm missing that would let me use my own email server software on my Google Compute Engine VM instance? Or is this simply a right I give up by choosing Google's cloud services?
An SMTP server, in your case MDaemon, delivers all outbound email directly to the recipients' mail servers on their inbound SMTP port, which is 25. That is the port MDaemon's outbound SMTP port setting has to use, so changing it to 2525 or 2626 won't help: recipients' SMTP servers usually do not listen on those ports.
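To confirm from the VM that the blocked port, not MDaemon, is the problem, you can attempt a plain TCP connection (just the handshake, no SMTP) to a recipient's mail server on port 25 and to a relay on another port. The sketch below does only that. A False result can also mean the remote host is down or filtering, so treat it as a hint rather than proof.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP handshake with host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

For example, with placeholder host names, port_reachable("mx.example.org", 25) would be expected to return False on Compute Engine because of the port 25 block, while port_reachable("smtp.relay.example", 2525) should return True if that relay accepts connections on 2525.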
As you mentioned, all outgoing traffic to port 25 (SMTP) is blocked on Compute Engine. You will therefore need to configure a smart host in MDaemon's message routing that listens on a non-blocked port. In practice this means using a third-party relay service (e.g. SendGrid).
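The smart host itself is configured inside MDaemon, but the hop it performs can be illustrated with Python's standard smtplib: instead of connecting to each recipient's mail server on port 25, the sender connects to the relay on a non-blocked port, upgrades the session with STARTTLS, and authenticates. RELAY_HOST, RELAY_PORT and the credentials are hypothetical placeholders; use whatever values your relay provider documents.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical relay settings: replace with your provider's values.
RELAY_HOST = "smtp.relay.example"
RELAY_PORT = 2525  # a non-blocked port; many relays also accept 587


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smarthost(msg: EmailMessage, username: str, password: str,
                       host: str = RELAY_HOST, port: int = RELAY_PORT) -> None:
    """Hand the message to the relay instead of the recipient's server on port 25."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()  # encrypt before sending credentials
        smtp.login(username, password)
        smtp.send_message(msg)
```

The relay then performs the port 25 delivery to the recipient's server from its own network, which is why this route works when direct delivery from the VM does not.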

What does it mean to connect to a certain port?

For example, when you make an ssh connection, you are connected to port 22. What happens then? On a very high level brief overview, I know that if port 22 is open on the other end and if you can authenticate to it as a certain user, then you get a shell on that machine.
But I don't understand how ports tie into this model of services and connections to different services from remote machines? Why is there a need for so many specific ports running specific services? And what exactly happens when you try to connect to a port?
I hope this question isn't too confusing due to my naive understanding. Thanks.
Imagine your server as a house with 65536 doors. If you want to visit family "HTTP", you go to door 80. If you were to visit family "SMTP", you would visit door no. 25.
Technically, a port is just one of many possible endpoints for outgoing/incoming connections. By convention, many port numbers are assigned to particular services.
Opening/establishing a connection means performing a TCP handshake when the transport protocol is TCP, which is what most of the “classical” services like HTTP, SMTP, etc. use. With UDP (used for things like streaming and VoIP), there's no handshake.
Unless you want to understand the deeper voodoo of IP networks, that's about it. Nothing overly special.
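The difference can be seen on the loopback interface with Python's socket module. In this sketch, create_connection() only returns once the kernel has completed the TCP handshake with the listening socket, while the UDP sender simply addresses a datagram to a port with no connection setup at all. Binding to port 0 asks the OS for any free port, so the example does not depend on a particular port being available.

```python
import socket


def tcp_vs_udp_demo() -> tuple[bytes, bytes]:
    # TCP: connect() completes the three-way handshake (SYN, SYN-ACK, ACK)
    # with the listening socket before any application data is sent.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen()
    tcp_port = server.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", tcp_port))
    conn, _ = server.accept()  # the handshake is already done; accept() dequeues it
    client.sendall(b"hello over TCP")
    tcp_data = conn.recv(1024)  # small loopback message, arrives in one read here
    for s in (client, conn, server):
        s.close()

    # UDP: no handshake; sendto() just fires a datagram at address:port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    udp_port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"hello over UDP", ("127.0.0.1", udp_port))
    udp_data, _ = receiver.recvfrom(1024)
    sender.close()
    receiver.close()
    return tcp_data, udp_data
```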
TCP/IP ports on your machine are essentially a mechanism for getting messages to the right endpoints.
Each of the 65536 possible ports (a 16-bit number) falls into one of the categories designated by the Internet Assigned Numbers Authority (IANA).
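RFC 6335 splits the 16-bit port space into three IANA ranges: system (well-known) ports 0-1023, user (registered) ports 1024-49151, and dynamic (private) ports 49152-65535. A small classifier makes the boundaries concrete:

```python
def iana_port_range(port: int) -> str:
    """Classify a port number into the IANA ranges defined in RFC 6335."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit: 0-65535")
    if port <= 1023:
        return "system (well-known)"
    if port <= 49151:
        return "user (registered)"
    return "dynamic (private)"
```

Operating systems pick client-side ephemeral ports from their own configured range (on Linux, net.ipv4.ip_local_port_range, 32768-60999 by default), which does not have to match IANA's dynamic range.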
But I don't understand how ports tie into this model of services and connections to different services from remote machines? Why is there a need for so many specific ports running specific services?
...
And what exactly happens when you try to connect to a port?
Think of it this way: How many applications on your computer communicate with other machines? Web browser, e-mail client, SSH client, online games, etc. Not to mention all of the stuff running under the hood.
Now think: how many physical ports do you have on your machine? Most desktop machines have one. Occasionally two or three. If a single application had to take complete control over your network interface nothing else would be able to use it! So TCP ports are a way of turning 1 connection into 65536 connections.
For example, when you make an ssh connection, you are connected to port 22. What happens then?
Think of it like sending a package. Your SSH client in front of you needs to send information to a process running on the other machine. So you supply the destination address in the form of "user@[ip or hostname]" (so that it knows which machine on the network to send it to), and "port 22" (so it gets to the right application running on the machine). Your application then packs up a TCP parcel and stamps a destination and a return address and sends it to the network.
The network finds the destination computer and delivers the package. So now it's at the right machine, but it still needs to get to the right application. What do you think would happen if your SSH packet got delivered to an e-mail client? That's what the port number is for. It effectively tells your computer's local TCP mailman where to make the final delivery. Then the application does whatever it needs to with the data (such as verify authentication) and sends a response packet using your machine's return address. The back and forth continues as long as the connection is active.
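The "local TCP mailman" can be demonstrated by running two toy services on the same address but different ports: each connection reaches only the process listening on the port it was addressed to. The service names fake-ssh and fake-smtp are made up for the example, and each toy server handles a single connection before closing.

```python
import socket
import threading


def start_service(name: str) -> int:
    """Listen on a free loopback port and answer one connection; return the port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen()
    port = srv.getsockname()[1]

    def run() -> None:
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)
            conn.sendall(f"{name} got {request.decode()}".encode())
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return port


def ask(port: int, text: str) -> str:
    """Same IP, chosen port: the kernel delivers to whoever listens there."""
    with socket.create_connection(("127.0.0.1", port)) as c:
        c.sendall(text.encode())
        return c.recv(1024).decode()
```

Both services share the address 127.0.0.1; only the port number decides which one receives each message.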
Hope that helps. :)
A port allows applications using TCP/IP to exchange data. Each machine on the internet is identified by its IP address, and the port lets different applications on that one machine send and receive data with multiple servers on the network/internet. Common applications such as FTP and HTTP servers communicate on default ports like 21 and 80, unless network administrators change those defaults for security reasons.

Is a server farm abstracted on both sides?

I am trying to understand how a solution will behave if deployed in a server farm. We have a Java web application which will talk to an FTP server for file uploads and downloads.
It is also desirable to protect the FTP server with a firewall, such that it will allow incoming traffic only from the web server.
At the moment, since we do not have a server farm, all requests to the FTP server come from the same IP (the web server's IP), so a simple firewall rule is enough. However, if the application is moved to a server farm, I will not know which machine in the farm makes a given request to the FTP server.
Just as the farm is hidden behind a facade for its clients, is it also hidden behind a facade for the services it invokes, so that the FTP server always sees the same IP regardless of which machine in the farm makes the request?
Are all server farms implemented the same way, or does this behavior depend on the type of server farm? I am thinking of using Amazon Elastic Compute Cloud (EC2).
It depends very much on how your web cluster is configured. If your cluster is behind a NAT firewall, then yes, all outgoing connections will appear to come from the same address. Otherwise, the IP addresses will differ, but they will almost certainly all be in a fairly small range, and you should be able to add that range to the firewall's allow list, or even list the IP address of each machine individually.
You can usually enter CNAMEs or subnets when setting up firewall rules, which simplifies maintaining them. You can also send all traffic through a load balancer or proxy; that's essentially how any cloud/cluster/farm service works.
many client IPs <-> load balancer <-> many servers
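Whether the FTP server sees one NAT address or a range of farm addresses, the firewall rule boils down to testing the source IP against a list of networks in CIDR notation (EC2 security groups also accept sources in this form). The sketch below uses Python's ipaddress module; the subnet 10.0.1.0/24 and the address 203.0.113.10 are hypothetical placeholders, not values from the question.

```python
import ipaddress

# Hypothetical allow list: one web-farm subnet plus a single NAT address.
ALLOWED = [
    ipaddress.ip_network("10.0.1.0/24"),
    ipaddress.ip_network("203.0.113.10/32"),
]


def is_allowed(source_ip: str, allowed=ALLOWED) -> bool:
    """True if source_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    # Membership is simply False when IPv4 and IPv6 are mixed.
    return any(addr in net for net in allowed)
```

A /24 entry covers every farm machine in that subnet at once, so adding or replacing servers inside it needs no firewall change.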