OpenStreetMap usage policy limitations

I am using OpenStreetMap's Nominatim service to add a map to my website, on which users can select their location. But their usage policy imposes some limitations, most notably no heavy use (an absolute maximum of 1 request per second).
Is there any way I can prioritize the requests that would be sent within the same second, or add them to some kind of queue, so that no request is lost?
Thanks in advance

You'd have to build that yourself, using some sort of FIFO queue that receives requests to be sent to OSM and then fires them off one every 1.5 seconds or so (a bit slower than the limit, just to be safe).
Of course, that means whatever calls your method that talks to OSM has to be able to work with potentially long delays in getting its results.
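
A minimal sketch of such a queue in Node/TypeScript, assuming Node 18+ (global fetch) and the public Nominatim search endpoint; the 1.5-second interval and the User-Agent value are illustrative, not prescribed:

// Minimal FIFO rate limiter: one Nominatim request every 1.5 seconds.
type Job = { url: string; resolve: (v: unknown) => void; reject: (e: unknown) => void };

const queue: Job[] = [];
let timer: NodeJS.Timeout | null = null;

function enqueueGeocode(query: string): Promise<unknown> {
  const url = `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(query)}`;
  return new Promise((resolve, reject) => {
    queue.push({ url, resolve, reject });
    if (!timer) {
      timer = setInterval(drain, 1500); // a bit slower than 1 request per second
    }
  });
}

async function drain(): Promise<void> {
  const job = queue.shift();
  if (!job) {
    // Nothing left to send; stop the timer until a new request arrives.
    if (timer) { clearInterval(timer); timer = null; }
    return;
  }
  try {
    const res = await fetch(job.url, { headers: { "User-Agent": "my-app/1.0" } });
    job.resolve(await res.json());
  } catch (err) {
    job.reject(err);
  }
}

// Callers must tolerate delays: with N queued requests, the last one
// resolves only after roughly N * 1.5 seconds.
// enqueueGeocode("Berlin").then(console.log);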

Nominatim is open source. Apart from OSM's Nominatim instance, there are other third-party instances available with different usage limits. And of course you can install your own Nominatim instance, which won't have any usage limitations.

Related

How to overcome API/websocket limitations with OHLC data for a trading platform with lots of users?

I'm using CCXT for some REST API calls and websockets. It's OK for one user, but if I wanted to have many users on the platform, how would I go about an in-house solution?
Currently each chart uses either websockets or REST calls, so if I have 20 charts, that's 20 calls, and if I increase users, that's 20x however many users. If I fetch a complete coin list with real-time prices from one exchange, that just slows everything down.
Some ideas I have thought about so far are:
Use proxies with REST/Websockets
Use timescale DB to store the data and serve that OR
Use caching on the server, and serve that to the users
Would this be a solution? There's got to be a way to overcome rate limiting and reduce the number of calls to the exchanges.
It's probably a good idea to think about having separate layers to:
receive market data (a single connection that broadcasts data to the OHLC processors)
process OHLC histograms (subscribe to internal market data)
serve histogram data (subscribe to processed data)
The market data stream is huge, and if you think about these layers independently, it will be easier to scale and even to decouple the components later if necessary.
With TimescaleDB, you can build materialized views that make the information easy to access and retrieve. Each materialized view can have a continuous aggregate policy based on the interval of the histograms.
Fetching all data all the time for all the users is not a good idea.
Pagination can help by bringing the visible histograms first and limiting the query results, which avoids heavy I/O and big chunks of memory on the server.
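
As a rough sketch of that first layer (the port, symbols, and the fetchTicker placeholder are all illustrative assumptions, not CCXT API), a single upstream poll loop can be fanned out to every connected user over one websocket server, instead of one exchange connection per chart:

import WebSocket, { WebSocketServer } from "ws";

// One upstream loop pulling from the exchange, broadcast to all connected users.
const wss = new WebSocketServer({ port: 8080 });

// Latest known ticker per symbol, so new subscribers get data immediately.
const latest = new Map<string, unknown>();

async function fetchTicker(symbol: string): Promise<unknown> {
  // Placeholder for the single exchange call (e.g. via CCXT); assumed, not shown.
  return { symbol, price: Math.random() * 100, ts: Date.now() };
}

const SYMBOLS = ["BTC/USDT", "ETH/USDT"];

setInterval(async () => {
  for (const symbol of SYMBOLS) {
    const ticker = await fetchTicker(symbol);   // one call per symbol per tick...
    latest.set(symbol, ticker);
    const msg = JSON.stringify(ticker);
    for (const client of wss.clients) {         // ...broadcast to N users
      if (client.readyState === WebSocket.OPEN) client.send(msg);
    }
  }
}, 1000);

wss.on("connection", (ws) => {
  // Serve the cached snapshot right away so charts render without waiting.
  for (const ticker of latest.values()) ws.send(JSON.stringify(ticker));
});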

Is it possible to run rrdtool server-side?

I have searched but not found an answer.
Presently I'm running RRDtool on the same machine that collects the information, creating the rrd files and the related graph output on that machine.
Is it also possible to run RRDtool on a server for graph output, using rrd files that are uploaded to it?
Yes; at least to some extent. You need to run rrdcached on your backend server; then, your collector and graphing servers can make remote calls to obtain or store the data.
How you tune rrdcached depends on the amount of data, the frequency of writes, and how much you can afford to lose in the event of a server crash; generally, though, a 30-minute cache works. This also greatly decreases the amount of disk I/O required.
Note that some rrdtool functions do not work exactly the same via rrdcached; check the documentation for more details.
Read about rrdcached here: https://oss.oetiker.ch/rrdtool/doc/rrdcached.en.html
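
As a minimal sketch of what a collector-side call through the daemon could look like from Node (the socket address, rrd file path, and value are assumptions; adjust to your setup):

import { execFile } from "node:child_process";

// Send an update through rrdcached instead of writing the .rrd file directly.
const DAEMON = "unix:/var/run/rrdcached.sock"; // assumed rrdcached listen address

function rrdUpdate(rrdFile: string, value: number): Promise<void> {
  return new Promise((resolve, reject) => {
    execFile(
      "rrdtool",
      ["update", "--daemon", DAEMON, rrdFile, `N:${value}`],
      (err) => (err ? reject(err) : resolve())
    );
  });
}

// rrdUpdate("/var/lib/rrd/temperature.rrd", 21.5).catch(console.error);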

What is the optimal way to do server-side paging in Express with Mongoose?

I'm currently doing a project with my own MEAN stack.
Now, in a new project I'm creating, I've got a collection that I'm paging with Express on the server side, returning the page size every time (e.g. 10 results out of a total of 2000) and the total rows found for the query the user performed (e.g. 193 for UserID 3).
Although this works fine, I'm afraid that this will create an enormous load on the server since a user can easily pull 50-60 pages a session with 10, 20, 50 or even 100 results each.
My question to you guys is: if I have say 1000 concurrent users paging every few seconds like this, will MongoDB be able to cope with this? If not, what might be my alternatives here?
Also, is there any way I can simulate such concurrent read tests on my app/MongoDB?
Please take into account that I must do server-side paging, because the app will be quite dynamic and the information can change very often.
If you're planning on only using a single webserver, you could cache the result set belonging to a certain page in memory. If you're planning on using multiple webservers, caching in-memory would lead to different result sets across servers, so in that case I'd recommend storing your cache either in MongoDB or in Redis.
A certain result set would be stored under a certain key in your cache. Your key would probably be composed of something like entityName + filterOptions + offset + resultsLimit. So, for example, you're loading movies with title=titanic, skipping the first 100 (so offset=100) and loading only 50 per page (so limit=50), all of which would be concatenated into a single key.
When a request comes in, you would first try to load the result set from the cache. If the result set is inside the cache, you'll return that to the client. If it's not in the cache, you'd query the database for the latest result set, put that in the cache and return it to the client.
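
Here's a hedged sketch of that cache-aside flow using Express, Mongoose, and ioredis; the Movie model, connection strings, and the 60-second TTL are illustrative assumptions:

import express from "express";
import mongoose from "mongoose";
import Redis from "ioredis";

const app = express();
const redis = new Redis();                           // assumes Redis on localhost:6379
mongoose.connect("mongodb://127.0.0.1:27017/demo");  // assumed connection string

// Illustrative model; replace with your own schema.
const Movie = mongoose.model("Movie", new mongoose.Schema({ title: String }));

app.get("/movies", async (req, res) => {
  const title = String(req.query.title ?? "");
  const offset = Number(req.query.offset ?? 0);
  const limit = Math.min(Number(req.query.limit ?? 10), 100);

  // Key composed of entity + filter + offset + limit, as described above.
  const key = `movies:title=${title}:offset=${offset}:limit=${limit}`;

  const cached = await redis.get(key);
  if (cached) return res.json(JSON.parse(cached));   // cache hit

  // Cache miss: query MongoDB, store the page with a short TTL, return it.
  const page = await Movie.find(title ? { title } : {})
    .skip(offset)
    .limit(limit)
    .lean();
  await redis.set(key, JSON.stringify(page), "EX", 60);
  res.json(page);
});

app.listen(3000);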
Whether or not you could pull it off with 1000 concurrent users depends a lot on your hardware, the data you are loading, how you're loading it and the efficiency of your implementation. There's one way to find out, and that's testing.
Of course, by using the asynchronous capabilities of Node.js you can achieve the best scalability, so every call that can be executed asynchronously, such as a database call, definitely should be.
You could load test your application for free from your local computer using Apache JMeter or let it be tested using for example Azure.

Increase maximum memory limit in own install of OpenStreetMap's Overpass API

For my specific purpose, I need to alter Overpass API's definition of an area to include all buildings, whether or not they have names (so is_in() will return these buildings when requested).
To achieve this, I've installed a local copy of the API with 3 specific countries and have modified the osm3s script that generates areas to suit my definition -- so far, so good.
Obviously this script will require more memory than the default one as it is handling a lot more ways. The machine I'm running on has 16GB of RAM. If I specify 2GB or less for the script (i.e. element-limit="2073741824") then it will run out of memory, but specifying any more (even by 100MB) will always result in the error:
Dispatcher_Client::request_read_and_idx::timeout
after just a few seconds.
The question is: how can I tell the Overpass API/dispatcher that using more than 2GB is perfectly fine and, in fact, allow it to allocate up to ~15GB for this query?
You could try to increase both values for total_available_space in settings.cc (currently at 4GB) and recompile Overpass API from source.
AFAIK nobody has tried to process a huge number of buildings via areas before, so be prepared for further issues. The Overpass API developer list may be a good place to discuss this, and also to get Roland's attention (the Overpass API developer/maintainer), as he's not around on Stack Overflow.

Incrementing hundreds of counters at once, redis or mongodb?

Background/Intent:
So I'm going to create an event tracker from scratch and have a couple of ideas on how to do this, but I'm unsure of the best way to proceed with the database side of things. One thing I am interested in is allowing these events to be completely dynamic, while at the same time allowing reporting on relational event counters.
For example, all countries broken down by operating system. The desired effect would be:
US - # of events
  iOS - # of events that occurred in the US
  Android - # of events that occurred in the US
CA - # of events
  iOS - # of events that occurred in CA
  Android - # of events that occurred in CA
etc.
My intent is to be able to accept these event names like so:
/?country=US&os=iOS&device=iPhone&color=blue&carrier=Sprint&city=orlando&state=FL&randomParam=123&randomParam2=456&randomParam3=789
Which means in order to do the relational counters for something like the above I would potentially be incrementing 100+ counters per request.
Assume there will be 10+ million of the above requests per day.
I want to keep things completely dynamic in terms of the event names being tracked and I also want to do it in such a manner that the lookups on the data remains super quick. As such I have been looking into using redis or mongodb for this.
Questions:
Is there a better way to do this than counters, while keeping the fields dynamic?
Provided this was all in one document (structured like a tree), would using the $inc operator in mongodb to increment 100+ counters at the same time in one operation be viable and not slow? The upside here being I can retrieve all of the statistics for one 'campaign' quickly in a single query.
Or would this be better suited to Redis, doing a zincrby for all of the applicable counters for the event?
Thanks
Depending on how your key structure is laid out, I would recommend pipelining the zincrby commands. You have an easy "commit" trigger - the request. If you were to iterate over your parameters and zincrby each key, then pass the execute command at the end of the request, it will be very fast. I've implemented a system like the one you describe as both a CGI and a Django app. I set up a key structure along the lines of this:
YYYY-MM-DD:HH:MM -> sorted set
And I was able to process something like 150,000-200,000 increments per second on the Redis side with a single process, which should be plenty for your described scenario. This key structure allows me to grab data based on windows of time. I also added an expire to the keys to avoid writing a DB cleanup process. I then had a cronjob that would do set operations to "roll up" stats into hourly, daily, and weekly buckets using variants of the aforementioned key pattern. I bring these ideas up as they are ways you can take advantage of the built-in capabilities of Redis to make the reporting side simpler. There are other ways of doing it, but this pattern seems to work well.
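
For illustration, a minimal version of that per-request pipeline using ioredis (the key prefix, the 7-day expiry, and the parameter names are assumptions based on the example URL above):

import Redis from "ioredis";

const redis = new Redis();

// Increment one sorted-set member per parameter, all in a single round trip.
async function trackEvent(params: Record<string, string>): Promise<void> {
  // One key per minute window, e.g. "events:2024-01-31:17:45".
  const window = new Date().toISOString().slice(0, 16).replace("T", ":"); // YYYY-MM-DD:HH:MM
  const key = `events:${window}`;

  const pipeline = redis.pipeline();
  for (const [name, value] of Object.entries(params)) {
    pipeline.zincrby(key, 1, `${name}=${value}`);
  }
  pipeline.expire(key, 60 * 60 * 24 * 7); // let Redis clean up old windows
  await pipeline.exec();                  // the "commit" at the end of the request
}

// trackEvent({ country: "US", os: "iOS", device: "iPhone", carrier: "Sprint" });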
As noted by eyossi, the global lock can be a real problem with systems that do concurrent writes and reads. If you are writing this as a real-time system, the concurrency may well be an issue. If it is an "end of day" log parsing system, it would not likely trigger the contention unless you run multiple instances of the parser or reports at the time of input. With regard to keeping reads fast in Redis, I would consider setting up a read-only Redis instance replicated off the main one. If you put it on the server running the report and point the reporting process at it, it should be very quick to generate the reports.
Depending on your available memory, data set size, and whether you store any other type of data in the Redis instance, you might consider running a 32-bit Redis server to keep memory usage down. A 32-bit instance should be able to keep a lot of this type of data in a small chunk of memory, but if running the normal 64-bit Redis isn't taking too much memory, feel free to use it. As always, test your own usage patterns to validate.
In Redis you could use MULTI to increment multiple keys at the same time.
I had some bad experiences with MongoDB; I have found that it can be really tricky when you have a lot of writes to it...
You can look at this link for more info, and don't forget to read the part that says "MongoDB uses 1 BFGL (big f***ing global lock)" (which may already have been improved in version 2.x - I didn't check it).
On the other hand, I had a good experience with Redis; I am using it for a lot of reads/writes and it works great.
You can find more information about how I am using Redis (to get a feeling for the amount of concurrent reads/writes) here: http://engineering.picscout.com/2011/11/redis-as-messaging-framework.html
I would rather use pipeline than multi if you don't need the atomic feature.
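
To make the difference concrete, a small ioredis sketch (the key names are illustrative): multi queues commands inside MULTI/EXEC so they apply atomically, while pipeline only batches them into one round trip.

import Redis from "ioredis";

const redis = new Redis();

async function example(): Promise<void> {
  // Atomic: the two increments are queued with MULTI and applied together on EXEC.
  await redis.multi().incr("counter:US").incr("counter:US:iOS").exec();

  // Not atomic, but still a single round trip and less overhead per command.
  await redis.pipeline().incr("counter:US").incr("counter:US:iOS").exec();
}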