I have a MySQL database containing 1,000 records of people (Name, Age, Address, Country). I want to make an application that:
Randomly displays ONE of the records in the database.
Randomly displays ONE of the records matching a specific 'Country'.
My usual approach to this is: send ALL the data to the iPhone as JSON, then manipulate it there. Because of the volume of data the user needs to download (1,000 records), and because I only need ONE of the records to be displayed, it seems wasteful to download all 1,000 records and then sort through the data. What other approach can I use? (Not sure, but is it possible to query the database directly?)
It would be very unsafe to query the DB directly, and the usual approach you mentioned would be a very bad idea. The general approach for this scenario, safe and optimized, is to write a small web service (in PHP, ASP.NET, etc.) with endpoints that can optionally take parameters, such as the specific country, and return the data in JSON format for you to display. This is how I'd approach such a problem. Here's a tutorial on writing basic web services with PHP:
http://davidwalsh.name/web-service-php-mysql-xml-json
Hope that helps.
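As a rough illustration of what that endpoint's query could look like (the people table and column names here are just assumptions based on your description):

-- Random record from the whole table:
SELECT Name, Age, Address, Country
FROM people
ORDER BY RAND()
LIMIT 1;

-- Random record for a specific country; the value is bound from the request parameter:
SELECT Name, Age, Address, Country
FROM people
WHERE Country = ?
ORDER BY RAND()
LIMIT 1;

ORDER BY RAND() is perfectly fine at 1,000 rows; for much larger tables you'd want a random-offset technique instead, but either way only one row ever travels to the phone.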
I have some long-running queries (each takes around 90 seconds). The good news is that my queries don't change much, so most of them are duplicates. I am looking for a way to cache the query results in PostgreSQL. I have searched for an answer but could not find one (some answers are outdated and others are unclear).
I use an application that is connected to Postgres directly.
The query is a simple SQL statement that returns thousands of rows.
SELECT * FROM Foo WHERE field_a<100
Is there any way to cache a query result for at least a couple of hours?
It is possible to cache expensive queries in Postgres using a technique called a "materialized view"; however, given how simple your query is, I'm not sure this will gain you much.
You may be better off caching this information directly in your application, in memory, or, if possible, caching a further-processed set of data rather than the raw rows.
ref:
https://www.postgresql.org/docs/current/rules-materializedviews.html
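For completeness, a minimal sketch of the materialized-view approach, reusing the query from the question (the view name is made up):

-- Run the expensive query once and store its result:
CREATE MATERIALIZED VIEW foo_cache AS
SELECT * FROM Foo WHERE field_a < 100;

-- Reads then hit the stored result instead of re-running the 90-second query:
SELECT * FROM foo_cache;

-- Re-run the underlying query on whatever schedule suits you (e.g. every couple of hours via cron):
REFRESH MATERIALIZED VIEW foo_cache;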
Depending on what your application looks like, a TEMPORARY TABLE might work for you. It is only visible to the connection that created it and it is automatically dropped when the database session is closed.
CREATE TEMPORARY TABLE tempfoo AS
SELECT * FROM Foo WHERE field_a<100;
The downside to this approach is that you get a snapshot of Foo when you create tempfoo. You will not see any new data that gets added to Foo when you look at tempfoo.
Another approach: if you have access to the database, you may be able to significantly speed up your queries by adding an index on field_a.
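A sketch of that index, using the column from the query above (the index name is arbitrary):

-- Lets the planner use an index scan for field_a < 100 instead of reading the whole table:
CREATE INDEX idx_foo_field_a ON Foo (field_a);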
I have to make a website. I have the choice between Postgres and MongoDB.
Case 1 : Postgres
Each page has its own table, and each table has only one row, for that one page (no two pages are structured alike).
I have a timeline page with media (albums and videos).
So I have multiple media items (pictures, videos), and I also display them on an album-of-pictures page and a videos page.
Therefore I have a media table linked to an albums table (many-to-many), with a type column to tell whether an item is a picture or a video.
Case 2 : MongoDB
I'm completely new to NoSQL and I don't know how to store the data.
Problems that I see
Only one row per table; that bothers me.
In the media table, an album could end up containing videos, which I'd like to avoid. But if I split it into a pictures table and a videos table, how can I make a single call to get all the media for the timeline page?
That's why I think it would be better for me to build the website with MongoDB.
What is the best solution, Postgres or MongoDB? What do I need to know if it's MongoDB? Or maybe I'm missing something about Postgres.
It will depend on time: if you don't have time to learn another technology, the answer is to go ahead with the one you know and solve the issues with it.
If scalability is more important, then you'll have to take a deeper look at your architecture and know very well how to scale PostgreSQL.
PostgreSQL can handle JSON columns for unstructured data; I use it and it's great. I would have a single table with the unstructured data in a column named page_structure, so you get one big indexed table instead of a lot of one-row tables.
It's relatively easy to query just what you want, so there's no need for separate tables for images and videos; to be more specific you'd need to provide some schema, but a rough sketch of the single-table layout is below.
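Something along these lines, where the pages table name is my own invention (only page_structure comes from what I described above, and I'm using jsonb for the column type):

-- One table for every page; the per-page structure lives in a JSONB column.
CREATE TABLE pages (
    id             serial PRIMARY KEY,
    name           text NOT NULL,
    page_structure jsonb NOT NULL
);

-- A GIN index keeps containment queries on the JSONB column fast,
-- e.g. WHERE page_structure @> '{"kind": "timeline"}'.
CREATE INDEX idx_pages_structure ON pages USING gin (page_structure);

-- Example: pull only the videos out of the timeline page's media array.
SELECT media.item
FROM pages,
     jsonb_array_elements(page_structure -> 'media') AS media(item)
WHERE pages.name = 'timeline'
  AND media.item ->> 'type' = 'video';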
I think you are coming to the right conclusion in considering a NoSQL database, because you are not sure about the columns a page's table should have, and that's the reason you are creating different tables for different pages. I would still advise keeping the columns somewhat consistent across records. Anyway, with MongoDB you can have different records (called documents in MongoDB) with different fields, based on the attributes of your page, within a single collection (the equivalent of a table in SQL). You can have separate pictures and videos collections if you want and wire them to your pages collection with a foreign key such as page_id. Or you can query the pages collection to get all of a page's attributes, including arrays holding the IDs of its videos and pictures, and then retrieve the corresponding videos and pictures for that page, as illustrated below:
Collections
Pages [{id, name, ..., videos: [video1_id, video2_id, ...], pictures: [pic1_id, pic2_id, pic78_id, ...]}, ...]
Videos [{video1_id, content,... }, {video2_id, content,...}]
Pictures [{pic1_id, content,... }, {pic2_id, content,...}]
I suggest you use the Clean Architecture approach. Personally, I believe that you MUST keep your application logic and your data-access functions separate so they can each work on their own. Your code must not depend on your database. I'd rather code in a way that lets me migrate my data to any database I like and still have everything work.
Think about when your project gets big and you want to try caching to solve a problem: if your data-access functions are not separated from your business-logic code, you cannot easily achieve that.
I agree with @espino316 about using the technology you are already familiar with,
and also with @Actung that you should consider learning a database like MongoDB, but on some training projects first, because there are many projects where NoSQL is the best way to go.
Just consider that you might only find that out two years AFTER you deployed your website, or the opposite: you go for MongoDB and then realize the best way to go was Postgres or, I don't know, MySQL, etc.
I think the best way to go is to make the migration easy for yourself.
all the best <3
I want to develop an application, and I'm worried about its performance once the number of users and the amount of stored data grow.
I honestly don't know the best way to implement a program that works with really large data and does things like searching it, finding and retrieving user information, full-text search and so on, in real time without any noticeable delay.
Let me explain the problem in more detail.
For example, I have chosen MongoDB as the database. Suppose we have at least five million users, and a user wants to log in to the system, sending a username and password.
The first thing we should do is find the user with that username and then check the password. In MongoDB we would use something like the find method to get the user's information, like below:
Users.find({ username: entered_username })
Then we get the user's information and check the password.
But the find method has to look for that username among five million users, which is a large number, and this query has to run for every person who requests authentication, so it puts heavy processing on the system.
And unfortunately this problem isn't limited to finding a user: if we decide to search text when there are a lot of texts and posts in the database, the problem gets even bigger.
I don't know how big companies like Facebook and LinkedIn search through millions of records in such a short span of time. I don't actually want to build something like Facebook, but I do have a large amount of data and I'm looking for a good way to handle it.
Is there any framework or anything else that helps with handling large data in databases, or a way of structuring the data so that we can search it and find things quickly? Should I use a particular data structure?
I found an open-source project, Elasticsearch, that helps with faster search, but I don't know: if I find something with Elastic, how do I find it in MongoDB too, for things like updating the data? And if I use Elasticsearch, do I still need MongoDB or not? Can I use Elastic as a database and as a search engine at the same time?
If I use Elasticsearch and MongoDB together, won't I have two copies of my data, one in MongoDB and one in Elasticsearch, kept separately? :( I wish Elasticsearch could search inside MongoDB so I wouldn't have to keep two copies of the data.
Thank you for helping me find a good way forward and understand what I should do.
When you talk about performance, it usually boils down to three things:
Your design
Your definition of "quick", and
How much you're willing to pay
Your design
MongoDB is great if you want to iterate on your data model; it can scale horizontally and is very quick when used properly. Elasticsearch, on the other hand, is not a database, but it is very quick for searching. A traditional relational database is useful if you know exactly what your data looks like and don't expect it to change much, or if the data is relational by nature.
You can, for example, use a relational database for user login, use MongoDB for everything else, and use Elastic for textual, searchable data. There is no rule that tells you to keep everything within a single database.
Make sure you understand indexing, and know how to utilize it to its fullest potential. The fastest hardware will not help you if you don't design your database properly.
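For instance, if the user login lives in a relational database as suggested above, a unique index on the username column turns the lookup into an index seek instead of a scan over five million rows (the table and column names here are only illustrative; MongoDB has an equivalent mechanism with an index on the username field of the collection):

-- One index seek per login attempt, regardless of how many users exist:
CREATE UNIQUE INDEX idx_users_username ON users (username);

-- The authentication query then reads a single indexed row:
SELECT id, password_hash
FROM users
WHERE username = 'entered_username';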
Conclusion: use any tool you need, combine if necessary, but understand their strengths and weaknesses.
Your definition of "quick"
How "quick" is quick enough for your application? Is 100ms quick enough? Is 10ms quick enough? Remember that more performance you ask of the machine, more expensive it will be. You can get more performance with a better design, but design can only go so far.
Usually this boils down to what is acceptable for you and your client. Not every application needs a sub-10ms response time. There's plenty of applications that can tolerate queries that return in seconds.
Conclusion: determine what is acceptable, and design accordingly.
How much you're willing to pay
Of course, it all depends on how much you're willing to pay for the hardware that hosts all of this. MongoDB might be open source, but you still need somewhere to host it. Also, you cannot expect magic: you can't throw thousands of queries and updates per second at it and expect it to be blazing fast when you only give it 1 GB of RAM.
Conclusion: never under-provision to save money if you want your application to be successful.
I was given the task of deciding whether our technology stack is adequate for the project at hand or whether we should change it (and if so, to which technologies exactly).
The problem is that I'm just a SQL Server DBA and I have a few days to come up with a solution...
This is what our client wants:
They want a web application to centralize pharmaceutical research studies, separated into topics, or projects in their jargon. These studies are sent as CSV files and are somewhat structured as follows:
Project (just a name for the project)
Segment (could be behavioral, toxicology, etc. There is a finite set of about 10 segments. Each csv file holds a segment)
Mandatory fixed fields (a small set of fields that are always present, like date, subject IDs, etc. These will be the PKs).
Dynamic fields (could be anything here, but always as a key/pair value and shouldn't be more than 200 fields)
Whatever files (images, PDFs, etc.) that are associated with the project.
At the moment, they just want to store these files and retrieve them through a simple search mechanism.
They don't want to crunch the numbers at this point.
98% of the files have a couple of thousand lines, but 2% have a couple of million rows (and around 200 fields).
This is what we are developing so far:
The back end is SQL Server 2008 R2. I've designed EAVs for each segment (before anything else, please keep in mind that this is not our first EAV design; it has worked well before with less data), and the mid-tier/front end is PHP 5.3 with the Laravel 4 framework and Bootstrap.
The issue we are experiencing is that PHP chokes on the big files. It can't insert into SQL Server in a timely fashion when there are more than 100k rows, because there's a lot of pivoting involved and, on top of that, PHP first needs to fetch all the field IDs before it can start inserting. Let me explain: this is necessary because the client wants some control over field names. We created a repository of all possible fields to minimize ambiguity problems; fields named, for instance, "Blood Pressure", "BP", "BloodPressure" or "Blood-Pressure" should all be stored under the same name in the database.
So, to reduce the issue, the user first has to register his CSV fields in another table, which we call the properties table. This doesn't completely solve the problem, but as he enters the fields he sees possible matches that were already inserted: when the user types "blood", a panel shows all the fields already used that contain the word "blood". If the user decides it's the same thing, he has to change the CSV header to that field. Anyway, all this is to explain that it's not a simple EAV structure and there's a lot of back and forth with IDs.
This issue is giving us second thoughts about our technology-stack choice, but we have limits on our options: I have only worked with relational DBs so far (only SQL Server, actually), and the other guys know only PHP. I guess a full MS stack is out of the question.
It seems to me that a non-SQL approach might be best. I've read a lot about MongoDB, but honestly I think it would be a super steep learning curve for us, and if they ever want to start crunching the numbers or even have some reporting capabilities, I guess Mongo wouldn't be up to it. I'm now reading about PostgreSQL, which is relational, and its famous hstore type. So here is where my questions start:
Do you think Postgres would be a better fit than SQL Server for this project?
Would we be able to convert the CSV files into JSON objects, or something similar, to store in hstore fields while keeping them somewhat queryable? (A rough sketch of what I'm imagining follows these questions.)
Are there any issues with Postgres sitting on a Windows box? I don't think our client has Linux admins. Nor do we, for that matter...
Is its licensing free for commercial applications?
Or should we stick with what we have and try to sort the problem out with staging tables, bulk insert, or some other technique that relies on the back end to do the heavy lifting?
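To make question 2 concrete, here is roughly what I'm imagining (all table, column and field names below are made up): the small set of mandatory fixed fields become real columns, and the up-to-200 dynamic key/value fields from each CSV row go into an hstore column.

CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE research_rows (
    project     text NOT NULL,
    segment     text NOT NULL,
    record_date date NOT NULL,
    subject_id  text NOT NULL,
    dynamic     hstore,
    PRIMARY KEY (project, segment, record_date, subject_id)
);

-- One CSV row, with its dynamic fields stored as key/value pairs:
INSERT INTO research_rows (project, segment, record_date, subject_id, dynamic)
VALUES ('ProjectX', 'toxicology', '2014-01-15', 'SUBJ-001',
        '"Blood Pressure" => "120/80", "Weight" => "70"'::hstore);

-- Querying one of the dynamic fields later:
SELECT subject_id, dynamic -> 'Blood Pressure' AS blood_pressure
FROM research_rows
WHERE dynamic ? 'Blood Pressure';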
Sorry for the long post and thanks for your input guys, I appreciate all answers as I'm pulling my hair out here :)
I have a table that loads data from a web service returning a bunch of JSON. I load more data as the user scrolls down, since the DB I'm querying holds quite a bit of data.
The question I have is: would it be feasible to implement the right-side alphabetical index on such a table, and how could it be done? It is definitely possible if I load ALL the data, sort it locally, populate the index, and cache the data for every subsequent launch. But what if there are 10K rows of data or more? Loading the data on the application's first launch is one option.
So in terms of performance and usability, does anyone have any recommendations of what is possible to do?
I don't think you should download all the data just to build those indexes; it would slow down refreshes and might cause memory problems.
But if you think the index would make a real difference, then you can add some features to your server API. I would either add a separate API call such as get_indexes, or add a POST parameter get_indexes that attaches an array of index entries to any call that has this parameter set (a sketch of the query that could back it is below).
And you should be ready to handle the cases where the user taps an index entry for which no data has been downloaded yet, or simply stresses your app by scrubbing quickly up and down the index.
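One way the server could back that get_indexes call is with a simple grouping query (assuming an items table with a name column; adjust to the real schema):

-- One row per first letter, with how many records sit behind it:
SELECT UPPER(LEFT(name, 1)) AS letter, COUNT(*) AS row_count
FROM items
GROUP BY UPPER(LEFT(name, 1))
ORDER BY letter;

The app can then draw the full A-Z index, and it knows how many rows sit behind each letter before downloading any of them.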
First see how big the data download is. If the server can gzip the data, it may be surprisingly small - JSON zips very well because of the duplicated keys.
If it's too big, I would recommend modifying the server if possible to let you specify a starting letter. That way, if the user hits the "W" in the index you should be able to request all items that begin with "W".
It would also be helpful to get a total record count from the server so you can know how many rows are in the table ahead of time. I would also return a "loading..." string for each unknown row until the actual data comes down.
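Assuming the backing table has a name column (names here are illustrative), both suggestions map to very small server-side queries:

-- Only the rows for the tapped index letter:
SELECT id, name
FROM items
WHERE name LIKE 'W%'
ORDER BY name;

-- Total record count, so the table can allocate its rows up front:
SELECT COUNT(*) FROM items;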