We're making a tool for bulk-checking the WHOIS records of domains. Is there any way to query as many as 1 million domains per day without being concerned about quotas? All WHOIS servers seem to have a limit of approximately 100 queries per day.
I know of two ways: use hundreds of IP addresses or use a paid API.
Is there any other way that you know of?
The short answer is no. That's easy to believe; otherwise this would not be such a hard business to be in.
And even with hundreds of IPs, you are not safe. Another route is to get ICANN accredited and then ask for accreditation for every TLD you want to cover. That will give you broad access to the WHOIS details for those TLD zones.
An alternative way to get "unrestricted" WHOIS data access is to be prepared to pay a lot of money to individual registrars. As the ICANN rules dictate:
3.3.6.2 Registrar may charge an annual fee, not to exceed US$10,000, for such bulk access to the data.
3.3.6.1 Registrar shall make a complete electronic copy of the data available at least one (1) time per week for download by third parties who have entered into a bulk access agreement with Registrar.
So to query GoDaddy and the like without restriction, you may well need a hefty sum of cash.
Note that the ICANN rules place (the usual kind of) restrictions on the use of this data, so a careful reading is recommended.
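For context, a single WHOIS lookup is just the raw protocol from RFC 3912: open a TCP connection to the registry's server on port 43, send the domain followed by CRLF, and read until the server closes the connection. A minimal Python sketch is below; the server name is the .com registry's WHOIS host and is only an example, and every registry enforces its own (mostly undocumented) per-IP limits, which is exactly what you run into at a million queries a day.

    import socket

    def whois_query(domain: str, server: str = "whois.verisign-grs.com",
                    timeout: float = 10.0) -> str:
        """Send one WHOIS query (RFC 3912) and return the raw text response."""
        with socket.create_connection((server, 43), timeout=timeout) as sock:
            sock.sendall(f"{domain}\r\n".encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:          # server closes the connection when done
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8", errors="replace")

    if __name__ == "__main__":
        print(whois_query("example.com"))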
I have a MongoDB Atlas cluster that serves many customers. Each customer has its own database on the cluster.
I would like to reduce my application's impact on MongoDB data transfer costs, which have been increasing for the last few days, but the billing info provided by Atlas does not break down prices per database. Therefore, I have no way of knowing which customers are costly and which queries are the most costly in terms of data transfer.
Moreover, even looking at the daily costs alongside a few queries, I cannot correlate the insertion of resources in my application with the prices. For example, say my resources are Cats: one day data transfer costs $5 with 5,000 Cats inserted in total across the databases, but the next day it costs $13 with only 1,500 Cats inserted.
Do you know of any tools, or something in the Atlas dashboard I might have missed, that could help me better track costs per customer, or even a cost per Cat (in my example), so that I can build a pricing model for my customers?
Thank you
You are most likely going to need separate projects and deployments.
A MongoDB client instance is generally capable of using any database on the server (subject to authorization rules and the API provided by the driver in question), so a breakdown of data transfer by database would require the server to track bytes transferred per operation and then aggregate those counts. As far as I know, this isn't a feature that currently exists.
The most practical way of tracking this today is probably to write a layer on top of the driver on the client side that measures the data actually sent and received.
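As a rough illustration of that client-side layer (not an Atlas feature), here is a sketch using PyMongo's command monitoring hooks: it re-encodes each command and each reply to BSON to approximate bytes sent and received per database. The sizes are estimates only (they ignore wire-protocol framing and compression), and the connection string is a placeholder.

    import bson
    from collections import defaultdict
    from pymongo import MongoClient, monitoring

    class TransferEstimator(monitoring.CommandListener):
        """Approximate bytes sent/received per database via command monitoring."""

        def __init__(self):
            self.bytes_out = defaultdict(int)   # commands sent, per database
            self.bytes_in = defaultdict(int)    # replies received, per database
            self._db_by_request = {}

        def started(self, event):
            self._db_by_request[event.request_id] = event.database_name
            self.bytes_out[event.database_name] += len(bson.encode(event.command))

        def succeeded(self, event):
            db = self._db_by_request.pop(event.request_id, "unknown")
            self.bytes_in[db] += len(bson.encode(event.reply))

        def failed(self, event):
            self._db_by_request.pop(event.request_id, None)

    estimator = TransferEstimator()
    client = MongoClient("mongodb://localhost:27017", event_listeners=[estimator])

    # ... run the normal workload, then inspect the per-database totals:
    # dict(estimator.bytes_in), dict(estimator.bytes_out)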
I'm trying to reduce DNS lookups on my website, and I want to work out which DNS records I should keep and which I can remove.
I have no idea how to decide which records aren't needed versus the records that are needed.
My settings redirect all instances of my domain to the non-www version, with HTTPS active throughout the entire site.
If the aim is to reduce DNS lookups, then you can achieve that as follows:
Simply increase the TTL (expiry) time; it will reduce lookups drastically. The downside is that whenever a record changes, the update will also take that much longer to propagate.
Lookups are proportional to the traffic your web pages get, so if you don't see much traffic but lookups are high, something is wrong.
As for identifying unused DNS records: as a matter of practice, you should remove a record as soon as you stop the service it points at. Otherwise, assume every record in your zone is being used, so you may want to re-think this.
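If it helps with the audit, here is a small sketch using the third-party dnspython package: it prints each record set for the domain with its TTL, so you can spot very short TTLs worth raising and records pointing at services you no longer run. The domain and the list of record types are just examples.

    import dns.resolver

    DOMAIN = "example.com"
    RECORD_TYPES = ["A", "AAAA", "CNAME", "MX", "TXT", "NS"]

    for rtype in RECORD_TYPES:
        try:
            answer = dns.resolver.resolve(DOMAIN, rtype)
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN,
                dns.resolver.NoNameservers):
            continue  # no records of this type for the domain
        for record in answer:
            print(f"{rtype:5} TTL={answer.rrset.ttl:>6}  {record.to_text()}")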
Nearly all PowerShell code leans heavily on a list of computer names. Maintaining this list is crucial for an up-to-date data center.
I used to gather computer names from my WSUS boxes. We have now switched to an inferior product which does not give up its secrets easily! How are others gathering accurate lists of computer names with which to fuel their code?
Grabbing names from Active Directory works well; however, not ALL servers are in the domain. All servers need updates, though, so let's get a list from the update server. Ah, there's the rub: inferior updating software! How is everyone gathering computer names for further processing?
For our datacenters, I always query each of the domain controllers to get a list of systems instead of relying on CSV files (we no longer keep anything unjoined).
We used to have stand-alone boxes; back when we did, I would simply get their IPs or NetBIOS names by using VMware PowerCLI to query that information from vCenter.
Alternatively, if you only have VMs, you can just go that route directly.
If you have stand-alone physical machines, then you would want to scan your entire IP range (after converting the names you got from the previous two steps into IPs, so you can exclude addresses that are already accounted for), checking the NetBIOS replies and the reported OS to make sure they are Windows systems.
Alternatively, if all of the servers are in DNS, just query the DNS entries from your domain controller, run a parallel computer-exists check on the results, select the unique names where the check succeeds, and that is your list (see the sketch below).
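To make the "query the directory, then check what still answers" idea concrete, here is a hedged sketch in Python using the third-party ldap3 package in place of the ActiveDirectory PowerShell module. The domain controller, bind account, search base, and the port-445 reachability test are all placeholders/assumptions for your environment.

    import socket
    from concurrent.futures import ThreadPoolExecutor
    from ldap3 import Server, Connection, SUBTREE

    # Pull every computer object's DNS host name from a domain controller.
    server = Server("dc01.example.local")
    conn = Connection(server, user="EXAMPLE\\svc_inventory", password="***",
                      auto_bind=True)
    conn.search("DC=example,DC=local", "(objectClass=computer)",
                search_scope=SUBTREE, attributes=["dNSHostName"])
    names = [e.dNSHostName.value for e in conn.entries if e.dNSHostName.value]

    def is_alive(name: str) -> bool:
        # Crude "computer exists" check: does the name resolve and answer on
        # a common Windows port (SMB, TCP 445)?
        try:
            with socket.create_connection((name, 445), timeout=2):
                return True
        except OSError:
            return False

    # Run the existence checks in parallel and keep the names that respond.
    with ThreadPoolExecutor(max_workers=32) as pool:
        alive = [n for n, ok in zip(names, pool.map(is_alive, names)) if ok]

    print(f"{len(alive)} of {len(names)} computer objects currently reachable")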
I'm not sure where you're getting the idea that people are composing CSV lists of machines for data centers. Most data centers I'm aware of have virtualization stacks and complex monitoring software such as the various flavors of SolarWinds products. Usually these will let you export a customized CSV of the machines they monitor. Of course, like anything else, there are quite a few of these monitoring services out there.
For small and mid-sized companies, admins may or may not maintain an inventory of company assets. I've been contracted to come out to companies that have let their asset management slip and have no idea where everything is, and I usually use a combination of command-line tools such as nmap, fping, and the PSADUC module to do some network discovery over several days on site.
PowerShell isn't a great tool for data center use anyway, since most infrastructure uses some sort of Linux to host its virtual machines. Increasingly, the industry has moved towards containerization and microservices as well, which makes maintaining individual boxes less and less relevant.
I really want to understand the NoSQL approach, but some aspects baffle me. And the most readily prominent docs don't seem to address them (that I've found, so far).
For example, I'm looking at the CouchDB website...
Self-Contained Data
An invoice contains all the pertinent information about a single transaction: the seller, the buyer, the date, and a list of the items or services sold. [...] Self-contained documents: there's no abstract reference on this piece of paper that points to some other piece of paper with the seller's name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too.
By "one abstract reference" I think they mean an FK, right? And in an analogous SQL DB the "some other piece of paper" would be a row in a sellers table?
Ok, but what happens when it turns out someone messed up and the seller's address is actually on Maple Avenue, not Maple Lane And you have 96,487 invoices with that say Maple Lane.
What is the orthodox NoSQL way of dealing with that inevitability?
Do you scan your 4.8 million invoice "documents" for the 96k with "Lane", dredge them up, and execute 96k writes?
And if so, in the CouchDB-based app described, WHO goes in and performs that? I'm guessing here, but I imagine your front end probably doesn't have a view with a Seller form, because your sellers are all embedded inside invoices, right? So in NoSQL, does this sort of data correction and maintenance become the DBA's job?
(Also, do you actually repeat all of the seller's info on every single invoice involving that seller? Doesn't that get expensive?)
And in a huge, busy system, how do you ensure that all that repeated seller data is correct and consistent?
I'm considering which storage technology to look at for a series of upcoming projects. NoSQL is obviously extremely popular and widely adopted. In some domains it's kind of the "Golden Path"/default choice. If I want to use PostgreSQL with Node.js I'll have to scrounge for info about less popular libraries and support.
So there's significant real-world pressure towards MongoDB, CouchDB, etc.
Yet in the systems I'm designing, the questions I mention above are going to really matter. Is there a proven, established, and practical way of addressing these concerns?
What is the orthodox NoSQL way of dealing with that inevitability?
Two possible approaches:
Essentially the same as the pre-SQL (i.e. paper filing cabinets) way:
Update the master file for the customer.
Use the new address on all new invoices.
Historical invoices will continue to have wrong data. But that's okay, and arguably even better than the RDBMS way, because it accurately reflects history.
Go to the extra work of updating all the affected documents. With properly built indexes or views, this isn't that hard (you won't have to scan all 4.8 million invoices; your view will point you straight at the ~96k actually affected by the change).
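As a rough illustration of that second approach (not from the original answer), here is a sketch against CouchDB's HTTP API using the Python requests package: find the affected invoices with a Mango _find query, patch the embedded seller address, and write them back with _bulk_docs. The database name, field names, credentials, and the single-pass limit (real code would page using the bookmark the server returns) are all assumptions.

    import requests

    COUCH = "http://admin:password@localhost:5984"
    DB = "invoices"

    # Find every invoice whose embedded seller address still says "Maple Lane".
    resp = requests.post(f"{COUCH}/{DB}/_find", json={
        "selector": {"seller.street": "Maple Lane"},
        "limit": 100000,
    })
    docs = resp.json()["docs"]

    # Correct the embedded copy on each document.
    for doc in docs:
        doc["seller"]["street"] = "Maple Avenue"

    # Write the corrected documents back in one bulk request. CouchDB checks
    # each document's _rev, so anything edited concurrently comes back as a
    # conflict in the per-document results.
    result = requests.post(f"{COUCH}/{DB}/_bulk_docs", json={"docs": docs})
    print(result.json()[:5])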
I imagine your front end probably doesn't have a view with a Seller form.
Why not? If you do seller-based queries, I sure hope you have a seller-based view (or several).
Because your sellers are all embedded inside invoices, right?
That's irrelevant. Views can index any part of the data.
do you actually repeat all of the seller's info on every single invoice involving that seller?
Of course. You would repeat it every time you print an invoice on paper, right? Your database document is a "document", same as a printed invoice is.
Doesn't that get expensive?
If you're storing your entire database on a mobile phone, maybe. Otherwise, hard drives are cheap these days.
Yet in the systems I'm designing, the questions I mention above are going to really matter.
NoSQL isn't right for every job. If transactional integrity is important (and it likely is for a financial app like the one you seem to be describing), then it is likely not the right tool.
Think of CouchDB as a sync protocol with a database tacked on for good luck.
If your core feature is the ability to sync, then CouchDB is probably a good fit. If that's not a feature core to your application, then it's probably the wrong tool for the job.
I'm thinking of creating a multi-tenant app using MongoDB. I don't have any guesses in terms of how many tenants I'd have yet, but I would like to be able to scale into the thousands.
I can think of three strategies:
All tenants in the same collection, using tenant-specific fields for security
1 Collection per tenant in a single shared DB
1 Database per tenant
The voice in my head is suggesting that I go with option 2.
Thoughts and implications, anyone?
I have the same problem to solve and am also weighing the options.
As I have years of experience creating SaaS multi-tenant applications, I was also going to pick the second option, based on my previous experience with relational databases.
While doing my research I found this article on the MongoDB support site (Wayback Machine link, since the original is gone):
https://web.archive.org/web/20140812091703/http://support.mongohq.com/use-cases/multi-tenant.html
The authors say to avoid the 2nd option at all costs, which as I understand it is not particularly specific to MongoDB. My impression is that this applies to most of the NoSQL DBs I researched (CouchDB, Cassandra, Couchbase Server, etc.) due to the specifics of their database design.
Collections (or buckets, or whatever they are called in different DBs) are not the same thing as security schemas in an RDBMS: although they act as containers for documents, they are useless for enforcing good tenant separation. I couldn't find a NoSQL database that can apply security restrictions based on collections.
Of course you can use MongoDB's role-based security to restrict access at the database/server level. (http://docs.mongodb.org/manual/core/authorization/)
I would recommend the 1st option when:
You have enough time and resources to deal with the complexity of the design, implementation and testing of this scenario.
You are not going to have many differences in structure and functionality in the database between tenants.
Your application design will allow tenants to make only minimal customizations at runtime.
You want to optimize space and minimize usage of hardware resources.
You are going to have thousands of tenants.
You want to scale out fast and at a good cost.
You are NOT going to back up data per tenant (keep separate backups for each tenant). It is possible to do that even in this scenario, but the effort would be huge.
(A minimal code sketch of this option follows, after the next list.)
I would go for variant 3 if:
You are going to have a small list of tenants (several hundred).
The specifics of the business require you to support big differences in database structure between tenants (e.g. integration with 3rd-party systems, import/export of data).
Your application design will allow customers (tenants) to make significant changes at runtime (adding modules, customizing fields, etc.).
You have enough resources to scale out with new hardware nodes quickly.
You are required to keep versions/backups of data per tenant; restores will also be easy.
There are legal/regulatory restrictions that force you to keep different tenants in different databases (or even data centers).
You want to fully utilize the out-of-the-box security features of MongoDB, such as roles.
There are big differences in size between tenants (you have many small tenants and a few very large ones).
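To make option 1 concrete, here is a minimal sketch (under assumed names: a tenant_id field, an orders collection, and a thin wrapper class) showing the two habits that make the shared-collection model workable: compound indexes that lead with the tenant, and application code that never queries without the tenant filter.

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")
    db = client["saas_app"]

    # Compound indexes lead with tenant_id so each tenant's queries stay selective.
    db.orders.create_index([("tenant_id", ASCENDING), ("created_at", ASCENDING)])

    class TenantCollection:
        """Thin wrapper that forces the tenant filter onto every operation."""

        def __init__(self, collection, tenant_id):
            self._coll = collection
            self._tenant_id = tenant_id

        def find(self, query=None, **kwargs):
            return self._coll.find({**(query or {}), "tenant_id": self._tenant_id},
                                   **kwargs)

        def insert_one(self, doc):
            return self._coll.insert_one({**doc, "tenant_id": self._tenant_id})

    orders_for_acme = TenantCollection(db.orders, tenant_id="acme")
    orders_for_acme.insert_one({"total": 42})
    print(list(orders_for_acme.find({})))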
If you post additional details about your application, perhaps I can give you more detailed advice.
I found a good answer in the comments on this link:
http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/
Basically option #2 seems to be the best way to go.
Quote from David Mytton's comment:
We decided not to have a database per customer because of the way MongoDB allocates its data files. Each database uses its own set of files: the first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64MB, dbname.1 128MB, etc., up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB. Thus if the last datafile present is, say, 1GB, that file might be 90% empty if it was recently reached. (from the manual)
As users sign up for the trial and give things a go, we'd get more and more databases that were at least 2GB in size, even if the whole of the data file wasn't used. We found this used a massive amount of disk space compared to having several databases for all customers, where the disk space can be used to maximum efficiency.
Sharding will be on a per-collection basis as standard, which presents a problem where a collection never reaches the minimum size to start sharding, as is the case for quite a few of ours (e.g. collections just storing user login details). However, we have requested that this also be possible at the per-database level. See http://jira.mongodb.org/browse/SHARDING-41
There are no performance tradeoffs when using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
There is a reasonable article on MSDN about multi-tenant data architecture which you might wish to refer to. Some key topics touched on by this article:
Economic considerations
Security
Tenant considerations
Regulatory (legal)
Skill set concerns
Also touched upon are some patterns for Software as a Service (SaaS) configuration.
Additionally, worth a gander is an interesting write-up from the SQL Anywhere guys.
My own personal take: unless you are certain of enforced security/trust, I would go with option 3, or, if scalability concerns prohibit that, fall back to option 2 at a minimum. That said... I'm no pro with MongoDB. I get pretty nervous using a shared "schema", but I will happily defer to more experienced practitioners.
I would go for option 2.
However, you could set the mongod.exe command-line option --smallfiles. This means that the biggest extent file size will be 0.5 GB instead of 2 GB. I tested this with MongoDB 1.4.2, so option 3 is not impossible.
According to my research in "MongoDB. Trucos y consejos. Aplicaciones multitenant" (MongoDB tips and tricks: multi-tenant applications), that option is not recommended if you do not know how many tenants you will have; there could be thousands, and it would get complicated when it comes to sharding. Also, imagine having thousands of collections in a single database... So in your case it is recommended to use option one. Now, if you are going to have a limited number of users, that's different, and yes, you could use option two as you had thought.
While the discussion here is about NoSQL, and primarily MongoDB, we at Citus are using PostgreSQL and building a distributed/sharded multi-tenant database.
Our use-case guide walks through an example app, covering the schema and various multi-tenant-specific features.
For more unstructured data, we use PostgreSQL's JSONB column type to store such tenant-specific data.
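As a hedged sketch of what that looks like from Python (the table and column names are made up, psycopg2 is used for the connection, and create_distributed_table is Citus's function for sharding a table by its tenant column, so this assumes the Citus extension is installed):

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=saas_app user=app")
    cur = conn.cursor()

    # A tenant-scoped table with a JSONB column for tenant-specific fields.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS cats (
            tenant_id bigint NOT NULL,
            id bigserial,
            attrs jsonb,
            PRIMARY KEY (tenant_id, id)
        )
    """)
    # Shard the table across worker nodes by tenant_id (Citus).
    cur.execute("SELECT create_distributed_table('cats', 'tenant_id')")

    cur.execute(
        "INSERT INTO cats (tenant_id, attrs) VALUES (%s, %s)",
        (42, Json({"name": "Misha", "color": "grey"})),
    )
    cur.execute("SELECT attrs->>'name' FROM cats WHERE tenant_id = %s", (42,))
    print(cur.fetchall())
    conn.commit()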