How to store locations in ArangoDB? - nosql

I am building a web application where I need to store a large number of unique addresses as nodes in ArangoDB.
One approach would be using a hierarchical graph model: a country node connected to county nodes, county nodes connected to cities and cities connected to exact addresses with GeoJSON attributes.
The other option would be having only address nodes which contain city, county and country as attributes.
Which method would be more beneficial? I would be running queries to find locations in a given range or locations in a given city.

Well, let's see what you will need in terms of collections to build your app:
1. A collection storing the Places you want to use in your app. This would be your main collection, and each document would contain, among other things, a map location object such as {longitude: XXX, latitude: YYY}.
2. Because you probably want people to be able to search by city, country, etc., you need either one collection per location type (city, country, etc.) or a single collection holding all the locations, with a "type" flag that indicates whether a location is a city, a country, and so on.
3. A collection that lets you start at a country and drill down to a particular set of cities (for example). So you need a collection with a from key and a to key.
By this point you probably have noticed that we have basically built a hierarchy, which in Arango I would build as at least one Places vertex collection, a Locations vertex collection and a locationContains edge collection. This would allow for really fast lookups and is one of the reasons why graph databases were originally created.
Note that since Arango is a multi-model DB, you can use the graph syntax (I like the anonymous graph syntax myself), but you can also use traditional joins whenever needed, which behave very much like joins in a traditional relational DB.
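To make the shape of that hierarchy concrete, here is a minimal sketch in plain JavaScript, with in-memory arrays standing in for the Places, Locations and locationContains collections. The keys and sample documents are made up, the edge endpoints are simplified to bare keys, and in ArangoDB itself the walk would be expressed as a graph traversal query rather than a hand-written loop.

```javascript
// Hypothetical in-memory stand-ins for the three collections named above.
// Keys and sample documents are made up for illustration.
const locationVertices = [
  { _key: "us", type: "country", name: "United States" },
  { _key: "ny-county", type: "county", name: "New York County" },
  { _key: "nyc", type: "city", name: "New York" },
];
const placeVertices = [
  { _key: "p1", address: "350 5th Ave", geo: { longitude: -73.9857, latitude: 40.7484 } },
];
// Edge collection: each edge has a from key and a to key (simplified to bare keys).
const locationContains = [
  { _from: "us", _to: "ny-county" },
  { _from: "ny-county", _to: "nyc" },
  { _from: "nyc", _to: "p1" },
];

// Walk the containment edges from a start key and collect every reachable place.
function placesWithin(startKey) {
  const placeByKey = new Map(placeVertices.map((p) => [p._key, p]));
  const found = [];
  const queue = [startKey];
  const seen = new Set(queue);
  while (queue.length > 0) {
    const current = queue.shift();
    for (const edge of locationContains) {
      if (edge._from !== current || seen.has(edge._to)) continue;
      seen.add(edge._to);
      if (placeByKey.has(edge._to)) found.push(placeByKey.get(edge._to));
      else queue.push(edge._to);
    }
  }
  return found;
}
```

Calling placesWithin("us") or placesWithin("nyc") drills down from any level of the hierarchy to the addresses beneath it, which is the lookup the edge collection exists to serve.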

Related

Making unique reproducible ids based on strings (for mongoose model)

I pull data (a list of properties) from a city's website and it contains info like address, property owner, and longitude and latitude.
I need to make a unique id out of this and the properties will be stored in a database. The id needs to be reproducible because I will be checking occasionally to see if there are new properties so the only way to cross compare is by being able to make the id on the spot.
Is this a common problem to run into when pulling data and checking whether there is anything new? I am using MongoDB through mongoose, and I am not sure a non-relational database is the best solution for my problem, but I know I can make it work.
The current plan is to just combine longitude and latitude and call that the unique id. The city's data unfortunately doesn't have any unchanging unique id so I have to hack something together for comparing their data to what I have saved in the database.
Any ideas for a better, less hacky method? Does mongoose have a different way to deal with this?
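For illustration only, the plan described above could be sketched as follows: normalize the fields that are expected to stay stable, round the coordinates, and hash the result so the same record always yields the same id. The field names (address, latitude, longitude), the five-decimal rounding and the choice of SHA-256 are all assumptions, not something the city's data or mongoose prescribes.

```javascript
const crypto = require("crypto");

// Build a deterministic id from fields that are assumed to stay stable
// between pulls. The field names are assumptions about the city's data.
function propertyId(record) {
  // Normalize case and whitespace so cosmetic changes do not alter the id.
  const address = record.address.trim().toLowerCase().replace(/\s+/g, " ");
  // Round so that tiny floating-point differences do not change the id.
  const lat = Number(record.latitude).toFixed(5);
  const lng = Number(record.longitude).toFixed(5);
  const key = [address, lat, lng].join("|");
  return crypto.createHash("sha256").update(key).digest("hex");
}
```

The id can then be recomputed on the spot for every freshly pulled record and compared against the ids already saved.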

Feed JSON to Nominatim with import.io

Greetings fellow SO users,
I am extracting data with import.io from a site that contains city names. What I want to accomplish is to get the coordinates for each city from Nominatim and finally create a JSON response that contains each city name and its coordinates.
So I basically need to use the result from one API as the input for another (Nominatim).
Or in other words: feed a JSON list of city names to OSM's Nominatim and get back the coordinates for each city.
I wonder whether this is even possible, or what other options I have. In the end this would be used with Leaflet to put markers on a map.
There are tutorials for Nominatim, how to query etc. but only one query at a time. Is it even possible to query a whole list of places?
What you want to achieve here is what we call chained APIs: you need two APIs, where the output of the first one is the input of the second.
In this case you will need some custom processing between the two APIs. From the first API you get the city names, and from those you need to generate a list of URLs in the format http://nominatim.openstreetmap.org/search?q=CITY&format=xml, one per city.
After that you can use the Bulk Extract feature in import.io and pass the whole list of URLs to be queried against the API.
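The custom processing step between the two APIs can be as small as this sketch, which turns a list of city names into one URL per city in the format given above. The sample city names and the function name are made up, and the names are URL-encoded so that spaces and accented characters survive.

```javascript
// City names as they might come back from the first API (sample values).
const cities = ["Berlin", "New York", "São Paulo"];

// Build one Nominatim search URL per city, using the URL format given above.
function nominatimUrls(names) {
  return names.map(
    (name) =>
      `http://nominatim.openstreetmap.org/search?q=${encodeURIComponent(name)}&format=xml`
  );
}

const urls = nominatimUrls(cities);
```

The resulting urls array is the list that would be handed to the bulk extraction step.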

Modeling RealEstate hierarchical location using MongoDB

I am designing a real estate classifieds website. Each ad will have a location, which will be defined using the following three attributes (state, city, town).
What's the best way to represent this in a MongoDB document? I am worried about query performance because:
Location will be the first search criteria
I want to allow users to search by multiple locations.
Example: A user looking to rent a flat in New York (state) or in Harrisburg (city in Pennsylvania).
I think I will use regular expressions like the following:
db.ads.find({$or: [{location: /^,new york,/}, {location: /^,pennsylvania,harrisburg,/}]})
This solution was inspired by the following article
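As a rough sketch of how that approach behaves, the following plain JavaScript stores each location as a single comma-delimited path of the form ",state,city,town," and matches it with anchored prefix patterns. The sample ads and helper names are made up, and the trailing comma on every pattern is an assumption that keeps a prefix such as "new york" from matching a longer name that merely starts with it.

```javascript
// Store each ad's location as one comma-delimited path: ",state,city,town,".
// The sample ads and the helper names are made up for illustration.
function locationPath(state, city, town) {
  return "," + [state, city, town].map((part) => part.toLowerCase()).join(",") + ",";
}

// Escape regex metacharacters so a place name is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored prefix pattern for a state, or a state plus a city.
function prefixPattern(...parts) {
  const prefix = "," + parts.map((p) => escapeRegex(p.toLowerCase())).join(",") + ",";
  return new RegExp("^" + prefix);
}

const ads = [
  { title: "Flat A", location: locationPath("New York", "New York", "Harlem") },
  { title: "Flat B", location: locationPath("Pennsylvania", "Harrisburg", "Midtown") },
  { title: "Flat C", location: locationPath("Pennsylvania", "Pittsburgh", "Oakland") },
];

// Same shape as the $or query: an ad matches if any pattern matches.
const patterns = [prefixPattern("New York"), prefixPattern("Pennsylvania", "Harrisburg")];
const matches = ads.filter((ad) => patterns.some((re) => re.test(ad.location)));
```

Here matches holds Flat A and Flat B but not Flat C. Anchored, case-sensitive prefix patterns of this kind are also the ones MongoDB can generally serve from an index on the location field, which matters when location is the first search criterion.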

MongoDB database design

I am developing an app and I've chosen MongoDB as the database, mainly because of its flexibility and its ability to query geospatial data. But I tend to be a bit old school concerning the design (read 'relational database'), and I'd like a few hints on how to design my database so it best fits my needs.
I have a User model and, let's say, an Object item. Each user has a location (which can change rapidly over time). The Object items also have a location and belong to a User.
For now I have kinda designed my database like I would in MySQL:
* A User table with an array of Object IDs
* An Object table with a reference to the owner's (user) ID.
Since I will need to make frequent queries on the location of each model and some range queries (which objects are closer than 100 m to this user, etc.), is this a good design? My main concern is the location query. I know I can put an index on the location, but I did not want to put two location indexes, one on the User and one on the user's Object array, on the same table.
Another point is that I will surely be sharding my database, and from what I have read about MongoDB, I think I'll shard on the location index (mostly the user).
Does that make sense, or should I actually just go with the one-size-fits-all approach? Or do you have another design in mind that would be better?
Thanks.
In my opinion:
Step 1: Use three different collections: one for Users, one for Locations, and one for Items (each item stored along with its user ID).
Step 2: In the Locations collection, save the coordinates of both users and items, distinguished by a type field. For example, set the type to "USER_LOC" for a user location and to "ITEM_LOC" for an Object (item) location, and store the user ID and item ID alongside.
Step 3: Since both item and user locations are moving, you can save both the user and the item coordinates in the Locations collection.
Step 4: Also save the last known coordinates of each item and user in their respective collections.
Step 5: To fetch the items within X meters of a user, as in your query, filter the Locations collection for the ITEM_LOC type only and compare against the user's last coordinates, which you already have in the Users collection. That gives you the list of items within a specific distance of the user's current location.
This is just my opinion. Thanks.
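The lookup in Step 5 can be sketched like this in plain JavaScript: keep only the ITEM_LOC documents and compare each one against the user's last coordinates with the haversine formula. The sample documents and field names (itemId, userId, lat, lng) are made up, and in MongoDB itself this would normally be a geospatial query against an indexed field rather than a scan in application code.

```javascript
const EARTH_RADIUS_M = 6371000;

// Great-circle distance in meters between two {lat, lng} points (haversine).
function distanceMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Hypothetical documents from the shared Locations collection.
const locationDocs = [
  { type: "ITEM_LOC", itemId: "i1", userId: "u2", lat: 48.8585, lng: 2.2945 },
  { type: "ITEM_LOC", itemId: "i2", userId: "u3", lat: 48.8738, lng: 2.295 },
  { type: "USER_LOC", userId: "u2", lat: 48.8586, lng: 2.2946 },
];

// Ids of the items whose stored location is within maxMeters of userPoint.
function itemsNear(userPoint, maxMeters) {
  return locationDocs
    .filter((doc) => doc.type === "ITEM_LOC")
    .filter((doc) => distanceMeters(userPoint, doc) <= maxMeters)
    .map((doc) => doc.itemId);
}

// The user's last coordinates, as they would be read from the Users collection.
const nearby = itemsNear({ lat: 48.8584, lng: 2.2945 }, 100);
```

With this sample data only item i1 falls inside the 100 m radius; the USER_LOC document is ignored even though it is close by.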

Which of CouchDB or MongoDB suits my needs?

Where I work, we use Ruby on Rails to create both backend and frontend applications. Usually, these applications interact with the same MySQL database. It works great for a majority of our data, but we have one situation which I would like to move to a NoSQL environment.
We have clients, and our clients have what we call "inventories"--one or more of them. An inventory can have many thousands of items. This is currently done through two relational database tables, inventories and inventory_items.
The problems start when two different inventories have different parameters:
# Inventory item from inventory 1, televisions
{
inventory_id: 1
sku: 12345
name: Samsung LCD 40 inches
model: 582903-4
brand: Samsung
screen_size: 40
type: LCD
price: 999.95
}
# Inventory item from inventory 2, accomodation
{
inventory_id: 2
sku: 48cab23fa
name: New York Hilton
accomodation_type: hotel
star_rating: 5
price_per_night: 395
}
Since we obviously can't use brand or star_rating as the column name in inventory_items, our solution so far has been to use generic column names such as text_a, text_b, float_a, int_a, etc, and introduce a third table, inventory_schemas. The tables now look like this:
# Inventory schema for inventory 1, televisions
{
inventory_id: 1
int_a: sku
text_a: name
text_b: model
text_c: brand
int_b: screen_size
text_d: type
float_a: price
}
# Inventory item from inventory 1, televisions
{
inventory_id: 1
int_a: 12345
text_a: Samsung LCD 40 inches
text_b: 582903-4
text_c: Samsung
int_b: 40
text_d: LCD
float_a: 999.95
}
This has worked well... up to a point. It's clunky, it's unintuitive and it lacks scalability. We have to devote resources to set up inventory schemas. Using separate tables is not an option.
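To show what that indirection costs on every read, here is a small sketch of resolving a generic row back into named parameters through its schema row. It uses the television example from above, assuming the screen size is stored under int_b as the schema row specifies; the function name is made up.

```javascript
// Schema row for inventory 1: generic column name -> real parameter name.
const schema = {
  inventory_id: 1,
  int_a: "sku",
  text_a: "name",
  text_b: "model",
  text_c: "brand",
  int_b: "screen_size",
  text_d: "type",
  float_a: "price",
};

// Item row stored with generic column names.
const row = {
  inventory_id: 1,
  int_a: 12345,
  text_a: "Samsung LCD 40 inches",
  text_b: "582903-4",
  text_c: "Samsung",
  int_b: 40,
  text_d: "LCD",
  float_a: 999.95,
};

// Translate a generic row back into named parameters using its schema.
function decodeItem(itemRow, itemSchema) {
  const item = { inventory_id: itemRow.inventory_id };
  for (const [column, name] of Object.entries(itemSchema)) {
    if (column === "inventory_id") continue;
    if (column in itemRow) item[name] = itemRow[column];
  }
  return item;
}

const decoded = decodeItem(row, schema);
```

The decoded object has the same shape as the first television example, which is what a document store would let us keep directly.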
Enter NoSQL. With it, we could let each and every item have its own parameters and still store them together. From the research I've done, it certainly seems like a great alternative for this situation.
Specifically, I've looked at CouchDB and MongoDB. Both look great. However, there are a few other bits and pieces we need to be able to do with our inventory:
* We need to be able to select items from only one (or several) inventories.
* We need to be able to filter items based on their parameters (e.g. get all items from inventory 2 where type is 'hotel').
* We need to be able to group items based on parameters (e.g. get the lowest price from items in inventory 1 where brand is 'Samsung').
* We need to (potentially) be able to retrieve thousands of items at a time.
* We need to be able to access the data from multiple applications, both backend (to process data) and frontend (to display data).
* Rapid bulk insertion is desired, though not required.
Based on the structure, and the requirements, are either CouchDB or MongoDB suitable for us? If so, which one will be the best fit?
Thanks for reading, and thanks in advance for answers.
EDIT: One of the reasons I like CouchDB is that it would be possible for us in the frontend application to request data via JavaScript directly from the server after page load, and display the results without having to use any backend code whatsoever. This would lead to better page load and less server strain, as the fetching/processing of the data would be done client-side.
I work on MongoDB, so you should take this with a grain of salt, but this looks like a great fit for Mongo.
We need to be able to select items from only one (or several) inventories.
It's easy to run ad hoc queries on any field.
We need to be able to filter items based on its parameters (eg. get all items from inventory 2 where type is 'hotel').
The query for this would be: {"inventory_id" : 2, "type" : "hotel"}.
We need to be able to group items based on parameters (eg. get the lowest price from items in inventory 1 where brand is 'Samsung').
Again, super easy: db.items.find({"inventory_id" : 1, "brand" : "Samsung"}).sort({"price" : 1}).limit(1)
We need to (potentially) be able to retrieve thousands of items at a time.
No problem.
Rapid bulk insertion is desired, though not required.
MongoDB has much faster bulk inserts than CouchDB.
Also, there's a REST interface for MongoDB: http://github.com/kchodorow/sleepy.mongoose
You might also want to read http://chemeo.com/doc/technology, which describes how they dealt with the arbitrary property search problem using MongoDB.
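For readers who have not used query documents before, the filter and lowest-price lookups from the answer above can be mimicked in plain JavaScript over an in-memory array. This is only a sketch of the semantics, not of how MongoDB executes them; the helper names are made up, and the sample items (including the second television and the type field on the hotel) are assumptions added for illustration.

```javascript
// Sample items from two inventories with different parameters (made-up data).
const items = [
  { inventory_id: 1, sku: 12345, name: "Samsung LCD 40 inches", brand: "Samsung", type: "LCD", price: 999.95 },
  { inventory_id: 1, sku: 12346, name: "Samsung LCD 32 inches", brand: "Samsung", type: "LCD", price: 649.5 },
  { inventory_id: 2, sku: "48cab23fa", name: "New York Hilton", type: "hotel", price_per_night: 395 },
];

// Equivalent of a query document such as {"inventory_id": 2, "type": "hotel"}:
// keep the items whose fields equal every key/value pair in the filter.
function find(collection, filter) {
  return collection.filter((doc) =>
    Object.entries(filter).every(([key, value]) => doc[key] === value)
  );
}

// Lowest-priced match: sort ascending by price and take the first item.
function cheapest(collection, filter) {
  const sorted = find(collection, filter).sort((a, b) => a.price - b.price);
  return sorted.length > 0 ? sorted[0] : null;
}

const hotels = find(items, { inventory_id: 2, type: "hotel" });
const cheapestSamsung = cheapest(items, { inventory_id: 1, brand: "Samsung" });
```

Items that lack a field simply never match a filter on it, which is what lets televisions and hotels share one collection.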