I'm looking at reducing DNS lookups on my website and trying to work out which DNS records I should keep or remove.
I have no idea how to decide which records are needed and which aren't.
My settings redirect all instances of my domain to the non-www version, with HTTPS active throughout the entire site.
If the aim is to reduce DNS lookups, the simplest lever is the TTL:
simply increase the TTL (expiry) on your records and lookups will drop drastically, because resolvers cache the answers for longer. The downside is that whenever you change a record, the update will also take that much longer to propagate.
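For example, here is a hedged sketch of what that looks like in a BIND-style zone file (the name, address, and values are purely illustrative):

example.com.    86400    IN    A    203.0.113.10    ; TTL raised from, say, 300s to 24h

Resolvers will then cache that answer for up to a day instead of a few minutes, so repeat visitors trigger far fewer lookups.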
Lookups are roughly proportional to the traffic your pages get, so if you don't see much traffic but lookups are high, something else is wrong.
As for identifying unused DNS records: as a matter of practice you should remove a record as soon as you stop the service it points to. If you have been doing that, then everything still in your DNS is probably in use, so you may want to rethink how much there is to remove.
I'm looking at increasing the number of days of historical events that are stored in the Tableau Server database from the default 183 to 365+ days, and I'm trying to understand what the performance impact to Tableau Server itself would be, since the database and backup sizes also start increasing. Would it cause Tableau Server (running 2019.1.1) to slow to a crawl over time, or begin to have a noticeable performance impact?
I think the answer here depends on some unknowns, which makes it pretty subjective:
How much empty space is on your Postgres node.
How many events typically occur on your server in a 6-12 month period.
Maybe more important than a yes or a no (which should be taken with a grain of salt) are the things to consider prior to making the change.
Have you found value in the default 183 days? Is it worth the risk of adding 365? It sounds like you might be doing some high-level auditing and a longer period is becoming a requirement. If that's the case, you have no choice but to go ahead with the change. See the steps below.
Make sure the change is made in a non-prod environment first, ideally one with high traffic. Even though you will not get an exact replica, it is certainly worth the practice of switching it over. You also want the non-prod and prod environments to match as closely as possible.
Make sure the change is very well documented. For example, if you were to change departments without anyone having knowledge of a non-standard config setting, it could make for a difficult situation if Support is ever needed or if there is a question as to what might be causing slow behavior.
Things to consider after a change:
Monitor the size of backups.
Monitor the size of the historical table(s). (See the Data Dictionary for table names if not already known; a rough sketch of such a size check follows this list.)
Be ready to roll back the config change if the above starts to inflate.
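For what it's worth, here is a rough sketch (Go, using database/sql with the lib/pq driver) of what such a size check could look like. The port, user, database and table names here ("workgroup", "readonly", "historical_events") are assumptions about a typical repository setup, so verify them against the Data Dictionary for your version first.

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq"
)

func main() {
    // Assumed connection details for the Tableau repository; adjust to your environment.
    dsn := "host=localhost port=8060 user=readonly password=secret dbname=workgroup sslmode=disable"
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // pg_total_relation_size includes the table's indexes and TOAST data.
    var size string
    err = db.QueryRow("SELECT pg_size_pretty(pg_total_relation_size('historical_events'))").Scan(&size)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("historical_events total size:", size)
}

Logging the result somewhere over time gives you the trend line you need to decide whether to roll the setting back.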
Overall:
I have not personally seen much troubleshooting value in keeping these tables beyond a certain number of days (i.e., if there is trouble on the server, it is usually investigated immediately rather than by looking back 365+ days). Perhaps the value for you lies in measuring usage and growth on Tableau Server.
I have not seen this table get so large that it brings down a server or slows it down, especially if the server is sized appropriately.
If you're regularly or heavily querying and examining the Postgres data, it may be wise to extract it at a low-traffic time of day. This prevents extra load from outside sources during peak times. Remember that ad hoc querying of Postgres is technically unsupported, which leads to awkward situations if things go awry.
I have created pagination in Go using a page number and a limit, where both the limit and the page number are ints,
and I built the pagination like this:
MONGO_SESSION.Find(nil).Skip(pageNumber*limit).Limit(limit).Sort("_id").All(&RETURN_STRUCT)
It's working fine, but when I send a page number or limit of zero, MongoDB by default returns all records because there is nothing to skip or limit.
So my question is: what is good practice in the case of a zero limit and zero page number?
Practice 1: Send all the data; don't send an error response.
Practice 2: Send an error response saying "Page number and limit can't be zero".
Note: I can't hardcode limit or page number.
Any suggestion would be appreciated.
This question is somewhat opinion-based (and therefore somewhat off-topic for StackOverflow), but I think some advice or general practice may be helpful and useful for others, so the following answer is my opinion.
You, as the developer of the server application, are responsible for the safety and security of the server and for the utilization of its resources. You should trust clients only to the least extent necessary.
That said, sending all documents when the client fails to specify a limit (either accidentally or purposely) is the worst way to handle the situation. It's like screaming out: "Hey, clients and hackers, here's an endpoint, and if you want to launch a DoS attack on my server, just call this endpoint a couple of times."
To protect your server, you should have a safety limit even for the "limit" parameter, because allowing any value for it may be just as bad: forcing clients to specify a limit doesn't by itself protect your server, since "bad" clients may just as well send a limit like 1e9, which will most likely include all your documents.
My advice is to always have meaningful defaults and safety limits. Default values should always be documented, safety limits are not so important (but could be documented as well).
So how you should handle it is:
If limit is missing, apply the defaults. If you can't have defaults, just skip this step (although there must be a good reason not to have / allow a default).
Limit should be checked against a safety limit. If it exceeds the safety value, use the safety limit (max allowed limit).
If the server has the "right" to alter the requested limit (e.g. using a default when limit is missing, or capping the limit based on a safety limit), the server should communicate back to the client the limit that was actually used to serve the request.
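As a rough illustration of those steps in Go (the default and maximum values here are assumptions; pick whatever fits your API):

// Assumed defaults; tune these to your data and clients.
const (
    defaultLimit = 20  // used when the client sends no limit (or zero)
    maxLimit     = 100 // safety cap so a client can't ask for "everything"
)

// normalizePaging returns the page and limit that will actually be used,
// so the handler can report them back to the client along with the results.
func normalizePaging(page, limit int) (int, int) {
    if limit <= 0 {
        limit = defaultLimit // missing or zero limit: fall back to the default
    }
    if limit > maxLimit {
        limit = maxLimit // cap excessive limits (e.g. 1e9) at the safety limit
    }
    if page < 0 {
        page = 0 // treat a negative page number as the first page
    }
    return page, limit
}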
And regarding efficient MongoDB paging: I only recommend using Query.Skip() and Query.Limit() for "small" document counts. For paging that "scales" with the number of documents, check out this question and answer: Efficient paging in MongoDB using mgo
I believe that paging should be used only when the collection is big (otherwise just show all the data at once and don't fiddle with paging at all).
However, if the collection is reasonably big, then sending all the data is a bad idea.
There is an additional issue with the "skip" approach (it's not unique to Mongo, though):
in order to skip N records, the database still has to iterate over those N records and discard them before returning anything, so fetching results for page N + 1 takes longer than for page N.
Now, in order to deal with it, there is a "trick":
Don't work with skip at all; instead, "memorize" the last document id (it's already indexed and you sort by _id anyway).
Then the queries will be (pseudocode since I don't speak 'Go'):
For the first query:
Find().sort(_id).limit(limitSize)
For subsequent queries:
Find ().where(_id > lastMemorizedId).sort(_id).limit(limitSize)
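Since the question uses mgo, here is a hedged Go sketch of the same trick; Result stands in for whatever struct you decode into, and the function name is just illustrative:

// assumes: import ( "gopkg.in/mgo.v2" ; "gopkg.in/mgo.v2/bson" )

type Result struct {
    ID bson.ObjectId `bson:"_id"`
    // ...your other fields
}

// nextPage fetches one page. Pass a zero ("") lastID for the first page;
// afterwards pass the _id of the last document of the previous page.
func nextPage(coll *mgo.Collection, lastID bson.ObjectId, limit int) ([]Result, error) {
    query := bson.M{}
    if lastID.Valid() {
        query["_id"] = bson.M{"$gt": lastID} // continue right after the memorized _id
    }
    var page []Result
    err := coll.Find(query).Sort("_id").Limit(limit).All(&page)
    return page, err
}

Because the query seeks directly past lastID via the _id index, a late page costs roughly the same as the first one.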
I also faced the same problem. I preferred to send an error response instead of returning all the data.
It's a heavy operation for the DB if it has to send all the data; on small collections it works fine, but on large collections it's going to hang the DB.
So send an error response.
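For example, a minimal sketch of that check in a net/http handler (the status code and message are just illustrative):

// assumes: import ( "net/http" ; "strconv" )
func listHandler(w http.ResponseWriter, r *http.Request) {
    pageNumber, _ := strconv.Atoi(r.URL.Query().Get("page"))
    limit, _ := strconv.Atoi(r.URL.Query().Get("limit"))
    if pageNumber <= 0 || limit <= 0 {
        // Reject zero (or negative) values instead of letting the query return everything.
        http.Error(w, "page number and limit must be greater than zero", http.StatusBadRequest)
        return
    }
    // ...run the paged query and write the response
}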
I am working on a front end system for a radius server.
The RADIUS server will pass updates to the system every 180 seconds, which means with about 15,000 clients that would be around 7,200,000 entries per day... which is a lot.
I am trying to understand what the best possible way to store and retrieve this data will be. Obviously, as time goes on, this will become substantial. Will MongoDB handle this? A typical document is not much, something like this:
{
    id: 1,
    radiusId: uniqueId,
    start: 2017-01-01 14:23:23,
    upload: 102323,
    download: 1231556
}
However, there will be MANY of these records. I guess this is similar to the way SNMP NMS servers handle data, which as far as I know they do with RRD.
Currently in my testing I just push every document into a single collection. So I am asking,
A) Is Mongo the right tool for the job and
B) Is there a better/more preferred/more optimal way to store the data
EDIT:
OK, so just in case someone comes across this and needs some help:
I ran it for a while in Mongo and was really not satisfied with the performance. We can chalk this up to the hardware I was running on, perhaps my level of knowledge, or the framework I was using. However, I found a solution that works very well for me: InfluxDB (https://github.com/influxdata/influxdb) pretty much handles all of this right out of the box; it's a time-series database, which is effectively what I am trying to store. Performance for me has been like night and day. Again, it could all be my fault; just updating this.
EDIT 2:
So after a while I think I figured out why I never got the performance I was after with Mongo. I am using Sails.js as a framework and it was searching by id using a regex, which obviously has a huge performance hit. I will eventually try to migrate back to Mongo instead of Influx and see if it's better.
15,000 clients updating every 180 seconds = ~83 insertions / sec. That's not a huge load even for a moderately sized DB server, especially given the very small size of the records you're inserting.
I think MongoDB will do fine with that load (also, to be honest, almost any modern SQL DB would probably be able to keep up as well). IMHO, the key points to consider are these:
Hardware: make sure you have enough RAM. This will primarily depend on how many indexes you define and how many queries you're running. If this is primarily a log that will rarely be read, then you won't need much RAM for your working set (although you'll need enough for your indexes). But if you're also running queries, then you'll need far more resources. (A rough mgo sketch covering this point and the next one follows at the end of these points.)
If you are running extensive queries, consider setting up a replica set. That way, your master server can be reserved for writing data, ensuring reliability, while your slaves can be configured to serve your queries without affecting the write reliability.
Regarding the data structure, I think that's fine, but it'll really depend on what type of queries you wish to run against it. For example, if most queries use the radiusId to reference another table and pull in a bunch of data for each record, then you might want to consider denormalizing some of that data. But again, that really depends on the queries you run.
If you're really concerned about managing the write load reliably, consider using the Mongo front-end only to manage the writes, and then dumping the data to a data warehouse backend to run queries on. You can partially do this by running a replica set like I mentioned above, but the disadvantage of a replica set is that you can't restructure the data. The data in each member of the replica set is exactly the same (hence the name, replica set :-) Oftentimes, the best structure for writing data (normalized, small records) isn't the best structure for reading data (denormalized, large records with all the info and joins you need already done). If you're running a bunch of complex queries referencing a bunch of other tables, using a true data warehouse for the querying part might be better.
As your write load increases, you may consider sharding. I'm assuming the RadiusId points to each specific server among a pool of Radius servers. You could potentially shard on that key, which would split the writes based on which server is sending the data. Thus, as you increase your radius servers, you can increase your mongo servers proportionally to maintain write reliability. However, I don't think you need to do this right away as I bet one reasonably provisioned server should be able to manage the load you've specified.
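Picking up the indexing and replica-set points above, here is a rough mgo sketch. The host, database, collection and field names are assumptions based on the example document, so treat it as an illustration rather than a drop-in config.

// assumes: import ( "log" ; "gopkg.in/mgo.v2" )

// Dial the replica set by listing a few seed hosts; mgo discovers the rest.
session, err := mgo.Dial("host1,host2,host3")
if err != nil {
    log.Fatal(err)
}
defer session.Close()

coll := session.DB("radius").C("usage") // assumed database/collection names

// Index the fields you query on; the indexes are what really need to fit in RAM.
err = coll.EnsureIndex(mgo.Index{
    Key:        []string{"radiusId", "start"}, // per-client, time-ordered lookups
    Background: true,                          // build without blocking writes
})
if err != nil {
    log.Fatal(err)
}

// For reporting queries, use a copy of the session that prefers secondaries,
// leaving the primary free to absorb the steady write load.
reporting := session.Copy()
defer reporting.Close()
reporting.SetMode(mgo.SecondaryPreferred, true)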
Anyway, those are my preliminary suggestions.
I'm doing some user analytics tracking with Mongo. I'm averaging about 200 updates a second to documents (around 400k of them) based on a user's email address. There are 3 shards split along the email address alphabetically. It works pretty well except for the daily user-state-change scripts, which burst the requests to about 6k per second.
This causes a tailspin effect where it overloads the Mongo queue and it never seems to catch up again. Scripts fail, bosses get angry, etc. They also won't allow the scripts to be throttled. Since they are update operations and not inserts, they can't be submitted in bulk. The options I see are:
1:) Finding a way to allocate a large queue to mongo so it can wait for low points and get the data updated
2:) Writing a custom throttling solution
3:) Finding a more efficient indexing strategy (currently just indexing the email address)
Pretty much anything is on the table.
Any help is greatly appreciated
We're making a tool for bulk-checking the whois records of domains. Is there any way to query as many as 1 million domains per day without being concerned about the quota? All whois servers seem to have a limit of approximately 100 queries per day.
I know of two ways: use hundreds of IP addresses or use a paid API.
Is there any other way that you know of?
The short answer is no. That's easy to believe; otherwise this would not be such a hard business to deal with.
And even with hundreds of IPs, you are not safe. Another route is to get ICANN accredited and then ask for accreditation for every TLD you want to use. That will give you broad access to the whois details for those TLD zones.
An alternative way to get "unrestricted" WHOIS data access is to be prepared to pay a lot of money to individual registrars. As the ICANN rules dictate:
3.3.6.2 Registrar may charge an annual fee, not to exceed US$10,000, for such bulk access to the data.
3.3.6.1 Registrar shall make a complete electronic copy of the data available at least one (1) time per week for download by third parties who have entered into a bulk access agreement with Registrar.
So to query GoDaddy and the like without restriction, you may well need a hefty sum of cash.
Note that the ICANN rules place (the usual kind of) restrictions on the use of this data, so a careful reading is recommended.