How do you store and display if a user has voted or not on something? - zend-framework

I'm working on a voting site and I'm wondering how I should handle votes.
For example, on SO, when you vote for a question (or answer) your vote is stored, and each time you go back to the page you can see that you already voted because the up/down buttons are colored.
How do you do that? I have several ideas, but I'm wondering whether they would put a heavy load on the database.
Here are my ideas:
1. Write a helper which checks, for every question, whether a vote has been cast.
That means the number of queries will depend on the number of items displayed on the page (usually ~20).
2. Loop over my items to collect their ids and, for each page, run one query which returns whether a vote has been cast (or NULL) for each item.
Looks OK because there is only one query no matter how many items are on the page, but it may break some MVC/Domain Model design, dunno.
3. When a user logs in (or a guest for whom an anonymous user is created), retrieve all their votes and store them in the session; if a new vote is cast, just add it to the session.
Looks nice because no queries are needed at all except the first one. However, depending on the number of votes cast (maybe a bunch for each user), this can increase the size of each user's session and potentially make authentication slow.
How do you do it? Any other ideas?

For example, let's assume you store votes in a user_votes table, one row per vote cast, with a structure something like this:
id: int, auto-increment
user_id: int, foreign key referencing the users table
question_id: int, foreign key referencing the questions table
Since the user is logged in, when you fetch the questions, do a LEFT JOIN on the user_votes table restricted to that user_id.
Something like:
SELECT q.id, q.question, uv.id AS vote_id
FROM questions AS q
LEFT JOIN user_votes AS uv
    ON uv.question_id = q.id
    AND uv.user_id = <logged_in_user_id>
WHERE <your criteria>
In the view, check whether vote_id is present (non-NULL). If so, mark the question as voted; otherwise not.
You may need to adapt the field names to your schema. I am assuming you store questions in a questions table, users in a users table, and so on, each with a primary key id.
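Here is a minimal runnable sketch of the LEFT JOIN idea, using Python's built-in sqlite3 module and an in-memory database so it can be tried without any setup. The sample rows, the UNIQUE constraint and the helper name questions_with_vote_flag are my own assumptions for illustration; only the table and column names come from the description above.

```python
import sqlite3

def questions_with_vote_flag(conn, user_id):
    """Return (question_id, text, voted) for every question, using one query."""
    rows = conn.execute(
        """
        SELECT q.id, q.question, uv.id AS vote_id
        FROM questions AS q
        LEFT JOIN user_votes AS uv
            ON uv.question_id = q.id AND uv.user_id = ?
        ORDER BY q.id
        """,
        (user_id,),
    ).fetchall()
    # vote_id is NULL (None) when the user has no vote row for that question.
    return [(qid, text, vote_id is not None) for qid, text, vote_id in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (id INTEGER PRIMARY KEY, question TEXT);
    CREATE TABLE user_votes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        question_id INTEGER NOT NULL REFERENCES questions(id),
        UNIQUE (user_id, question_id)  -- assumption: one vote per user per question
    );
    INSERT INTO questions (id, question)
        VALUES (1, 'First?'), (2, 'Second?'), (3, 'Third?');
    INSERT INTO user_votes (user_id, question_id) VALUES (7, 2), (8, 1), (8, 2);
    """
)

result = questions_with_vote_flag(conn, 7)
```

For user 7, only question 2 comes back flagged as voted. The same query shape should carry over to MySQL or PostgreSQL with that driver's placeholder syntax.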
Thanks

You could use a combination of your suggested strategies.
Retrieve all the votes made by the logged in user for recent/active questions only and store them in the session.
You then have the ones that are more likely to be needed while still reducing the amount you need to store in the session.
In the less likely event that you need other results, query for just those as and when you need to.
This strategy will reduce the amount you need to store in the session and also reduce the number of calls you make to your database.

Just based on the information that you've given so far, I would take the second approach: get the IDs of all the items on the page, and then do a single query to get all the user's votes for that list of item IDs. Then pass the collection of the user's item votes to your view, so it can render items differently when the user has voted for that item.
The other two approaches seem like they would tend to be less efficient, if I understood you correctly. Using a view helper to initiate an individual query for each item to check if the user has voted on it could lead to a lot of unnecessary queries. And preloading all the user's voting history at login seems to add unnecessary overhead, getting data that isn't always needed and adding the burden of keeping it up to date for the duration of the session.
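To make the single-query approach concrete, here is a rough sketch in Python with the built-in sqlite3 module. It assumes a user_votes table with user_id and question_id columns (names borrowed from the other answer, not from your schema), and the function name voted_item_ids is made up for the example.

```python
import sqlite3

def voted_item_ids(conn, user_id, item_ids):
    """Return the subset of item_ids the user has voted on, using a single query."""
    if not item_ids:
        return set()  # avoid generating an invalid "IN ()" clause
    placeholders = ",".join("?" for _ in item_ids)
    rows = conn.execute(
        "SELECT question_id FROM user_votes "
        f"WHERE user_id = ? AND question_id IN ({placeholders})",
        (user_id, *item_ids),
    )
    return {row[0] for row in rows}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_votes (user_id INTEGER, question_id INTEGER)")
conn.executemany("INSERT INTO user_votes VALUES (?, ?)", [(7, 2), (7, 5), (8, 3)])

page_ids = [1, 2, 3, 4, 5]               # ids of the items shown on the page
voted = voted_item_ids(conn, 7, page_ids)
# What the view would receive: one flag per item, no further queries needed.
flags = [(item_id, item_id in voted) for item_id in page_ids]
```

The controller (or a service it calls) runs the query once per page and hands the resulting set to the view, so the view itself never touches the database.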

Related

Aggregate data while inserting into raw table

I'm currently building a forum-like application. Users will be able to see recent posts with the total like count. If a post is interesting to the user, they can like it as well and contribute to the total like count.
The normalized approach would be to have two tables: user_post (contains id, metadata ...) and liked_post (which contains the user id + post id). When posts are queried, the like count would be determined with COUNT() on the liked_post table grouped by the post id.
I'm thinking of another approach, which requires no GROUP BY on a potentially huge table: add a like_count column to the user_post table and break the normalization. This column would be updated every time a liked_post entry is inserted or deleted. That means: every time a user likes or unlikes a post, there will be an update on the user_post table (increment or decrement the like_count column) plus an insert/delete in the liked_post table (with a trigger or code in the app layer).
Would this aggregation-on-the-fly approach have any disadvantages, except for consistency concerns? It would enable very simple and fast SELECT queries, but I'm not sure if the additional update would be an issue.
What are your thoughts?
I'm really interested in the performance impact, not in whether you should do this from the beginning of the project or not.
Your idea is correct and widely used. The problem you will face:
how do you make sure that like_count is valid? Can this number be delayed or approximated somehow?
In general you can do this in the following ways:
update like_count within application code
update like_count by triggers
If you want exact values, you could accumulate those sums with triggers, or do it programmatically, ensuring that the like_count update is always within the same transaction as the insert into liked_post.
Using triggers, it could be something like this:
CREATE FUNCTION public.update_like_count() RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    -- Increment the denormalized counter of the post that was just liked.
    -- PostgreSQL does not allow the SET target column to be table-qualified.
    UPDATE user_post SET like_count = like_count + 1
    WHERE id = NEW.post_id;
    RETURN NEW;
END;
$$;
CREATE TRIGGER update_like_counts
    AFTER INSERT ON public.liked_post
    FOR EACH ROW EXECUTE PROCEDURE public.update_like_count();
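You should also handle AFTER DELETE with a separate trigger that decrements the count.
Be aware that, depending on the transaction isolation level, you can run into concurrency problems here: if the count is read and then written back as a separate step, two inserts done at the same time may both see the same like_count and you end up with an invalid total. Incrementing in place within a single UPDATE (like_count = like_count + 1) avoids that read-then-write gap; under stricter isolation levels, be prepared to retry on serialization failures.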
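If you go the application-code route instead of triggers, the same idea can be sketched as below. This is only an illustration using Python's built-in sqlite3 module, not PostgreSQL; the composite primary key on liked_post and the function name like_post are my assumptions. The point it shows is that the insert and the in-place increment live in one transaction, so a failed insert never leaves the counter bumped.

```python
import sqlite3

def like_post(conn, user_id, post_id):
    """Insert the like and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO liked_post (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id),
        )
        # In-place increment: the database computes the new value itself,
        # the application never reads the old count and writes it back.
        conn.execute(
            "UPDATE user_post SET like_count = like_count + 1 WHERE id = ?",
            (post_id,),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_post (
        id INTEGER PRIMARY KEY,
        like_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE liked_post (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES user_post(id),
        PRIMARY KEY (user_id, post_id)  -- assumption: one like per user per post
    );
    INSERT INTO user_post (id) VALUES (1);
    """
)

like_post(conn, 10, 1)
like_post(conn, 11, 1)
try:
    like_post(conn, 10, 1)  # duplicate like: rejected, transaction rolled back
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT like_count FROM user_post WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM liked_post WHERE post_id = 1").fetchone()[0]
```

After the rejected duplicate, the stored like_count still matches the real number of rows in liked_post.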
I've had a problem similar to this in the past, and the solution I went with is similar to what you've described: keeping an aggregated stored value, like_count. As you mentioned, the only downside is consistency, but that problem exists even with the normalized approach.
The solution to something like this lies more in the application layer, for example using WebSockets to keep posts up to date without too much fluff.
When a user's browser/client loads a post, it joins a room keyed by the post id, and when a user interacts with a post (like, dislike, etc.) that interaction is broadcast to all users in that room.
Finally, when it comes to finding out which users liked a post, you can query for that at the point when the user clicks to find out. ~ cheers

Ordering Firebase posts Chronologically Swift

I have added posts to firebase and I am wondering how I can pull the posts chronologically based on when the user has posted them.
My Database is set up like below
The first node after comments is the user ID, and the posts are underneath that. Obviously, these posts are in order; however, if a new user posts something in between "posting" and "another 1", for example, how would I pull it so that it shows up in between?
Is there a way to remove the autoID and just use the userID as a key? The problem I am running into is that the previous post is then overwritten.
I am accepting the answer as it is the most thorough. What I did to solve my problem was to create the unique key as the first node and then store the UID and the comment as children. Then I pull the unique keys, as they are in order, and find the comment associated with the UID.
The other answers all have merit but a more complete solution includes timestamping the post and denormalizing your data so it can be queried (assuming it would be queried at some point). In Firebase, flatter is better.
posts
  post_0
    title: "Posts And Posting"
    msg: "I think there should be more posts about posting"
    by_uid: "uid_0"
    timestamp: "20171030105500"
    inv_timestamp: "-20171030105500"
    uid_time: "uid_0_20171030105500"
    uid_inv_time: "uid_0_-20171030105500"
comments
  comment_0
    for_post: "post_0"
    text: "Yeah, posts about posting are informative"
    by_uid: "uid_1"
    timestamp: "20171030105700"
    inv_timestamp: "-20171030105700"
    uid_time: "uid_1_20171030105700"
    uid_inv_time: "uid_1_-20171030105700"
  comment_1
    for_post: "post_0"
    text: "Noooo mooooore posts, please"
    by_uid: "uid_2"
    timestamp: "20171030110300"
    inv_timestamp: "-20171030110300"
    uid_time: "uid_2_20171030110300"
    uid_inv_time: "uid_2_-20171030110300"
With this structure we can:
- get posts and their comments and order them ascending or descending
- query for all posts within the last week
- get all comments or posts made by a user
- get all comments or posts made by a user within a date range (tricky, huh)
I threw a couple of other key: value pairs in there to round it out a bit: compound values, querying ascending and descending, timestamp.
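To show why the compound uid_time value makes the "by a user within a date range" query possible, here is a small plain-Python sketch that imitates an ordered range query over that child. It does not use the Firebase SDK at all; the dictionary stands in for the comments node, comment_2 is a hypothetical extra entry, and the function names are made up. It relies on the timestamps being fixed-width strings, so string order equals chronological order.

```python
# Stand-in for the "comments" node; comment_2 is a made-up extra entry.
comments = {
    "comment_0": {"by_uid": "uid_1", "timestamp": "20171030105700",
                  "uid_time": "uid_1_20171030105700"},
    "comment_1": {"by_uid": "uid_2", "timestamp": "20171030110300",
                  "uid_time": "uid_2_20171030110300"},
    "comment_2": {"by_uid": "uid_1", "timestamp": "20171102090000",
                  "uid_time": "uid_1_20171102090000"},
}

def by_user_in_range(nodes, uid, start, end):
    """Keys of nodes written by uid with start <= timestamp <= end, oldest first."""
    lo, hi = f"{uid}_{start}", f"{uid}_{end}"
    # One range test on the compound value covers both the user and the dates.
    hits = [(node["uid_time"], key) for key, node in nodes.items()
            if lo <= node["uid_time"] <= hi]
    return [key for _, key in sorted(hits)]

def newest_first(nodes):
    """All keys ordered by timestamp, most recent first."""
    return sorted(nodes, key=lambda k: nodes[k]["timestamp"], reverse=True)

oct_30 = by_user_in_range(comments, "uid_1", "20171030000000", "20171030235959")
all_uid_1 = by_user_in_range(comments, "uid_1", "20171001000000", "20171231235959")
latest = newest_first(comments)
```

In Firebase itself the equivalent would be an ordered query on the uid_time child with a start and an end bound, which is why the value is stored precomputed on each node.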
You cannot use the userID as the key instead of the autoID, because keys must be unique; that's why Firebase just updates the existing value rather than adding another one with the same key. Nodes created with auto-generated IDs are normally ordered chronologically by default, so if you pull the values they should be in the right order. However, if you want to be sure, you can add a timestamp value and set it to a server timestamp, then order the data by that timestamp after pulling it (I think there is actually a timestamp saved automatically by Firebase that you can access somehow, but you need to look that up in the documentation). If I got it right, to accomplish what you want you need to change the structure of your database. For example, you could keep the autoID as the key but save the userID as a value if you need it. Hope I got your idea right; if not, just be more precise and I will try to help.
Firebase keys are chronological by default - it's built into their key generation algorithm. I think you need to restructure/rethink your data.
Your POSTS database should (possibly) have the comments listed with each post, and then you can duplicate on the user record if needed for faster retrieval if they need to be accessed by user. So something like:
POSTS
- post (unique key)
  - title (text)
  - date (timestamp)
  - comments
    - comment (unique key)
      - text (text)
      - user_id (user key)
      - date (timestamp)
When you pull the comments, you shouldn't be pulling them from a bunch of different users. That could result in a lot of queries and a ton of load time. Instead, the comments could be added (chronologically, of course) to the post object itself, and also to the user if you want to keep a reference there. Unlike in MySQL, NoSQL databases can have quite a bit of this data duplication.

MS Access Form and Tables

I have a specific question regarding the utilization of three tables in a database. Table 1 is called Personnel, and lists the names of the staff.
Tables 2 and 3 are identical, just listing two different types of overtime (long and short), along with the hours of the OT, Date of the OT, and Assigned to/Picked fields that are empty.
Here is the idea; I just don't know how to implement it. I would like to create a form for people to enter their OT picks, then automatically move to the next person on the list. So Rich Riphon, as an example, would be up first. He would click on the link I send, and a form would open up showing his name, populated from the first table, and two drop-down menus, populated from the Long OT and Short OT tables. He would select one from each (or None, which would be an option) and submit it.
The form action would be to place his name in the Assigned field for the OT he picked, and place a Yes in the Picked field.
When the next person in the list opens the form, it has moved down to number 2 on the Personnel list, Cheryl Peterson, and shows her the remaining OT selections (excluding those that have a Yes in the Picked column).
Any suggestions or comments or better ways to do this would be appreciated.
First, I don't think MS Access would be able to (easily) kick off the process based on a hyperlink. You may be able to do something by passing a macro name on the command line, but it would take some mastery to get it working properly. Could you instead create a login form to get the current user? If you do that, you don't really need to display the personnel list; just keep track of who has not yet responded to the OT request. Essentially, at that point all you would need on your form is a listing of the available OT and a button that creates the assignment. Also, it may be easier (and a better design) to have only one table for the OT listings and add a column for the type of overtime (long/short).
What if Cheryl isn't the 2nd person to get the form? Your concept goes out the window.
Instead, I would keep a table of all user names and their security levels. Managers can see everything; individual users can only see their own record. This would be done by using a query behind the OT Picks form, either filtering by the current user or not filtering at all. I have done many of these types of "user control" databases and they have all worked well.
As for the actual OT tracking, I agree with Steve's post that it should be done in one table. This is the preferred method under a concept referred to as "normalizing data". You really want to store as little data as possible to keep the size of your database down. As an example, your Login table would have the following fields:
UserID
FirstName
LastName
SecurityLevel
Address1
Address2
City
State
Phone
Etc... (whatever relevant info pertains to that person)
Your OT table would look like this:
UserID
OTDate
OTHours
OTType
Etc... (whatever else is relevant to OT)
You would then join those 2 tables on the UserID fields in both tables any time you needed to write a query to report OT hours or whatever.
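As a rough illustration of the one-table design, here is a sketch using Python's built-in sqlite3 module rather than Access itself, so the SQL is generic and Access syntax may differ in places. The OTID key, the sample dates and hours, and the function names are all assumptions of mine; a NULL UserID stands for an OT slot nobody has picked yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Login (UserID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE OT (
        OTID INTEGER PRIMARY KEY,                 -- hypothetical surrogate key
        UserID INTEGER REFERENCES Login(UserID),  -- NULL until someone picks it
        OTDate TEXT, OTHours REAL, OTType TEXT
    );
    INSERT INTO Login VALUES (1, 'Rich', 'Riphon'), (2, 'Cheryl', 'Peterson');
    INSERT INTO OT (OTID, OTDate, OTHours, OTType) VALUES
        (1, '2024-01-06', 8, 'Long'),
        (2, '2024-01-07', 8, 'Long'),
        (3, '2024-01-06', 4, 'Short');
    """
)

def available_ot(conn, ot_type):
    """IDs of OT slots of the given type that nobody has picked yet."""
    rows = conn.execute(
        "SELECT OTID FROM OT WHERE OTType = ? AND UserID IS NULL ORDER BY OTID",
        (ot_type,),
    )
    return [row[0] for row in rows]

def pick_ot(conn, user_id, ot_id):
    """Assign the slot to the user; returns False if it was already taken."""
    with conn:
        cur = conn.execute(
            "UPDATE OT SET UserID = ? WHERE OTID = ? AND UserID IS NULL",
            (user_id, ot_id),
        )
    return cur.rowcount == 1

def hours_by_user(conn):
    """Total OT hours per person, joining the two tables on UserID."""
    return conn.execute(
        """
        SELECT l.FirstName, l.LastName, SUM(o.OTHours)
        FROM Login AS l JOIN OT AS o ON o.UserID = l.UserID
        GROUP BY l.UserID ORDER BY l.UserID
        """
    ).fetchall()

first = pick_ot(conn, 1, 1)    # Rich takes long OT slot 1
second = pick_ot(conn, 2, 1)   # Cheryl tries the same slot and is refused
remaining_long = available_ot(conn, "Long")
third = pick_ot(conn, 2, 3)    # Cheryl takes the short slot instead
report = hours_by_user(conn)
```

Because availability is derived from the data, it no longer matters in which order people open the form.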

perl dbi submit checkbox values

I have a form with checkboxes and I need to know what the best way to submit them to the database is. I have the following table setup:
roles    users    user_roles
-----    -----    ----------
id       id       user_id
                  role_id
I have a page where you can edit a user and assign them different roles via checkbox, then those checkboxes are saved in the user_roles table. Since editing a user's roles can involve either deleting rows or adding rows, this is how I currently handle it:
my %form_vals = (1 => 1, 2 => 2);  # submitted by user
my %db_vals   = (3 => 3);          # fetched from the db
So I have these two hashes, and I compare the keys in %form_vals with the keys in %db_vals; I see that I have two extra values that are not present in the database, so I add them. Conversely, I find which values are no longer selected on the form by comparing the keys in %db_vals with the keys in %form_vals, and then I delete those rows from the database. My question is, does anyone know of a better/easier way to do this? It's never really seemed obvious to me how to handle checkboxes and I'd like to know what best practice is. Thanks!
I wouldn't say that this has much to do with check boxes per se.
Basically what you have is two arrays of pairs, [ (uid, rid), (uid, rid) ], and you want to make array1 (the one in your database) a copy of array2 (the user input from the checkboxes). You could have a multi-select or a comma-separated string, and the case would be the same. You have a user id, and you want that user to have only the roles supplied.
Two ways to achieve that would be to either
1. Put each array in a hash, then for each key in the submitted hash, insert it if it is not present in the database hash. Then do the same for the database hash and delete the keys not present in the submitted hash.
2. Delete all of that user's rows from the user_roles table and insert what was submitted.
You really have to know everything in the database and everything submitted, and check twice, if you don't want to delete everything and do a fresh insert. You can of course write a function that does this for you, hiding the ugliness a bit. Think about how you'd do it if it were just two arrays and no database was around.
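The "two arrays and no database" view boils down to two set differences. Here is a tiny sketch of it, written in Python rather than Perl purely for illustration; the function name role_changes is made up. The same thing can be done in Perl with hash lookups and grep.

```python
def role_changes(submitted, stored):
    """Return (role ids to insert, role ids to delete) for one user."""
    submitted, stored = set(submitted), set(stored)
    to_insert = sorted(submitted - stored)   # checked on the form, missing in db
    to_delete = sorted(stored - submitted)   # in db, no longer checked
    return to_insert, to_delete

# The example from the question: roles 1 and 2 submitted, role 3 stored.
to_insert, to_delete = role_changes(submitted=[1, 2], stored=[3])

# Nothing changed on the form: no rows need touching.
unchanged = role_changes(submitted=[1, 2], stored=[2, 1])
```

Each id in to_insert becomes one INSERT into user_roles and each id in to_delete one DELETE, ideally inside a single transaction.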

Searches (and general querying) with HBase and/or Cassandra (best practices?)

I have a User model object with quite a few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets a "unique id".
I want to be able to search by them. How do I do that properly? How do I do that at all?
My understanding (this will work for pretty much any key-value storage -- first goes the key, then the value):
u:123456789 = serialized_json_object
("u" as a simple prefix for user's keys, 123456789 is "unique id").
Now, thinking that I want to be able to search by firstname and lastname, I can save in:
f:Steve = u:384734807,u:2398248764,u:23276263
f:Alex = u:12324355,u:121324334
so the key is "f", which is the prefix for firstnames, followed by "Steve", the actual firstname.
For "f:Steve" we save as the value the ids of all users who are Steves.
That makes every search very, very easy. Querying by a few fields (properties), say by firstname (i.e. "f:Steve") and lastname (i.e. "l:Anything"), is still easy: first get the list of user ids from "f:Steve", then the list from "l:Anything", find the user ids present in both, and here you go.
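The scheme just described can be sketched in a few lines of plain Python, with a dictionary standing in for the key-value store. The sample users, the PREFIXES mapping and the function names are invented for illustration, and nothing here is specific to HBase or Cassandra.

```python
from collections import defaultdict

PREFIXES = {"firstname": "f", "lastname": "l"}  # searchable fields and key prefixes

def build_index(users):
    """Map index keys such as 'f:Steve' to the set of matching user keys."""
    index = defaultdict(set)
    for user_key, user in users.items():
        for field, prefix in PREFIXES.items():
            index[f"{prefix}:{user[field]}"].add(user_key)
    return index

def search(index, **criteria):
    """User keys matching every given field=value pair (set intersection)."""
    id_sets = [index.get(f"{PREFIXES[field]}:{value}", set())
               for field, value in criteria.items()]
    return set.intersection(*id_sets) if id_sets else set()

users = {
    "u:1": {"firstname": "Steve", "lastname": "Jones"},
    "u:2": {"firstname": "Steve", "lastname": "Smith"},
    "u:3": {"firstname": "Alex", "lastname": "Jones"},
}
index = build_index(users)

steves = search(index, firstname="Steve")
steve_jones = search(index, firstname="Steve", lastname="Jones")
nobody = search(index, firstname="Bob")
```

Each extra search field costs one more lookup plus an intersection, which is cheap as long as the individual id lists stay reasonably small.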
Problems (and there are quite a few):
1. Saving, updating, or deleting a user is a pain. It has to be an atomic and consistent operation. Also, if the size of a value is limited, then we are in (potential) trouble, and I don't really have an answer for that. Only compressing the list of user ids? Not too cool, though.
2. What if we eventually want to add a new field to search by? Say "city". We can certainly do it the same way, "c:Los Angeles" = ..., "c:Chicago" = ..., but if we didn't foresee all those "search choices" from the very beginning, then we will have to create some nightly job or something to go through all existing User records and update those "c:CITY" keys for them... Quite a big job!
3. Problems with locking. User "u:123" updates his name to "Alex", and user "u:456" updates his name to "Alex". They both have to update "f:Alex" with their ids. That means either we get into an overwriting problem, or one update will wait for another (and imagine if there are many of them?!).
What's the best way of doing that, keeping in mind that I want to search by many fields?
P.S. Please note that the question is about HBase/Cassandra/NoSQL/key-value stores. Please, please: no advice to use MySQL and "read about" SELECTs, and worry about scaling problems "later". There is a reason why I asked MY question exactly the way I did. :-)
Being able to query properties directly is one of the features you lose when moving away from SQL, so you need a way to maintain your own index to let you find records.
If your datastore does not have built in indexing or atomic list operations, you will need to deal with the locking issues you mention. However, indexing doesn't necessarily need to be synchronous - maintain a queue of updated records to be reindexed and you have a solution for 3 that can be reused to solve 2 also.
If the index list for a particular value becomes too large for the system to handle in a single list, you can replace the list of users with a list of lists. However, if you have that many records with the same value, it probably isn't a particularly useful search criterion anyway.
Another option that is useful in some cases is to use a separate system for the indexing; for example, you could set up Lucene to index the records in your main datastore.
I guess I would have implemented this as a MapReduce job, which would run on a schedule.
Each search word would be a row key with a lookup to the UID.
Rowkey: uid1
    profile:firstName: Joe
    profile:lastName: Doe
    profile:nick: DoeMaster
Rowkey: uid2
    profile:firstName: Jane
    profile:lastName: Doe
    profile:nick: SuperBabe
MapReduce indexes all searchable properties and adds them with the search word as the row key:
Rowkey: Jane
    lookup:uid: uid2
Rowkey: Doe
    lookup:uid: uid2, uid1
Rowkey: DoeMaster
    lookup:uid: uid1
..etc
Now, if you need to update the index on the fly when a user changes, you would write the change directly to the index table, by removing the uid value from one row key and adding it to another. In case this happens concurrently, temporary locking could be implemented.
For users being removed, an additional attribute holding the state of the user could be used to filter them out of search results.
Adding an additional search word isn't very hard, since it's just a matter of which name:value pairs you want to index. You could also filter searches further by adding a type attribute to your row key/keyword, e.g. boston - lookup:type: city.
The idea is to maintain your own row-key-based search index inside HBase.
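A rough plain-Python sketch of the index maintenance described above, combined with the queue idea from the other answer: changes are queued and a single worker applies them, moving a uid from its old search word to the new one. A dictionary stands in for the HBase index table, and the names queue_change and reindex are hypothetical.

```python
from collections import defaultdict, deque

index = defaultdict(set)  # search word -> set of uids (stands in for lookup rows)
pending = deque()         # queued (uid, old_word, new_word) changes

def queue_change(uid, old_word, new_word):
    """Record that uid moved from old_word to new_word (None means no word)."""
    pending.append((uid, old_word, new_word))

def reindex():
    """Apply all queued changes in order; returns how many were processed."""
    processed = 0
    while pending:
        uid, old_word, new_word = pending.popleft()
        if old_word is not None:
            index[old_word].discard(uid)
            if not index[old_word]:
                del index[old_word]      # drop empty index rows
        if new_word is not None:
            index[new_word].add(uid)
        processed += 1
    return processed

# Two users are created, then uid1 changes first name from Joe to Jane.
queue_change("uid1", None, "Joe")
queue_change("uid2", None, "Jane")
first_batch = reindex()
queue_change("uid1", "Joe", "Jane")
second_batch = reindex()
```

Because only the worker writes to the index, two users changing to the same name at the same time are applied one after the other instead of overwriting each other, at the cost of the index lagging slightly behind the data.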