How to query SQLite for certain rows, i.e. dividing results into pages (Perl DBI) - perl

Sorry for my noob question,
I'm currently writing a Perl web application with an SQLite database behind it.
I would like to be able to show query results in my app which might run to thousands of rows - these should be split into pages - routing should be like /webapp/N - where N is the page number.
What is the correct way to query the SQLite db using DBI in order to fetch only the relevant rows?
For instance, if I show 25 rows per page, I want to query the db for rows 1-25 on the first page, rows 26-50 on the second page, and so on.

Using the LIMIT/OFFSET construct will give you pages, but OFFSET makes the query inefficient (the database still has to step over all the skipped rows), and it makes the page contents shift when the underlying data changes.
It is more efficient and consistent if the next page starts the query at the position where the last one ended, like this:
SELECT *
FROM mytable
WHERE mycolumn > :lastvalue
ORDER BY mycolumn
LIMIT 25
This implies that your links are not /webapp?Page=N but /webapp?StartAfter=LastKey.
This is explained in detail on the Scrolling Cursor page.
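For this to stay fast, the sort column needs an index, and the first page simply omits the lower bound. A rough sketch, keeping the placeholder names mytable / mycolumn from the query above:
-- keyset ("scrolling cursor") pagination leans on an index over the sort column
CREATE INDEX IF NOT EXISTS idx_mytable_mycolumn ON mytable(mycolumn);
-- first page: no lower bound yet
SELECT * FROM mytable ORDER BY mycolumn LIMIT 25;
From DBI you can bind :lastvalue (or a plain ? placeholder) from the StartAfter parameter in the URL.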

You should do something like this:
SELECT column FROM table ORDER BY somethingelse LIMIT 0, 25
and when the user clicks on page 2, you should do:
SELECT column FROM table ORDER BY somethingelse LIMIT 25, 25
and so on. (Note that in the LIMIT x, y form, x is the offset and y is the row count, so the second number stays at the page size.)

You'd most likely be using the LIMIT and OFFSET keywords, something like this:
my $sth = $dbh->prepare("SELECT foo FROM bar WHERE something LIMIT ? OFFSET ?");
$sth->execute($limit, $offset);
while ( my @row = $sth->fetchrow_array ) {   # at most $limit rows per page
    ...
}
The $limit and $offset variables would be controlled by the parameters passed to your script via HTML/CGI/whatever; for page $page, $offset = ($page - 1) * $limit.

Pagination is one of those problems a lot of CPAN modules have already solved. If you're using straight SQL, you could look at something like DBIx::Pager. You might also want to check out something like Data::Pageset to help you manage creating the links to your various pages. If you're using DBIx::Class (which is an excellent tool) for your SQL queries, then DBIx::Class::ResultSet::Data::Pageset will make this very easy for you.
Essentially handling the SQL is one end of it, but you'll also need to solve various problems in the templating aspect of it. I'd encourage you to have a look at these modules and maybe even poke around CPAN a little bit more to see where somebody else has already done the heavy lifting for you with respect to pagination.

Related

Aggregate data while inserting into raw table

I'm currently building a forum-like application. Users will be able to see recent posts with the total like count. If a post is interesting to the user, they can like it as well and contribute to the total like count.
The normalized approach would be to have two tables: user_post (contains id, metadata, ...) and liked_post (which contains the user id + post id). When posts are queried, the like count would be determined with a COUNT() on the liked_post table, grouped by post id.
I'm thinking of another approach, which requires no GROUP BY on a potentially huge table: add a like_count column to the user_post table and break normalization. This column would be updated whenever a new liked_post entry is inserted or deleted. That means: every time a user likes a post, there is an update on the user_post table (increment the like_count column) plus an insert/delete in the liked_post table (with a trigger or code in the app layer).
Would this aggregate-on-write approach have any disadvantages besides consistency concerns? It would enable very simple and fast select queries, but I'm not sure whether the additional update would be an issue.
What are your thoughts?
I'm really interested in the performance impact, not in whether you should do this from the start of the project or not.
Your idea is correct and widely used. The problem you will face:
how do you make sure that like_count is valid? Can this number be delayed or approximated somehow?
In general you can do this in one of the following ways:
update like_count in application code
update like_count with triggers
If you want the values to be exact, you can maintain those counts with triggers or do it programmatically, making sure that the like_count update is always in the same transaction as the insert into liked_posts.
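A minimal sketch of the application-side variant; the column names (user_id, post_id) and the bind-style placeholders are assumptions:
BEGIN;
-- the insert and the counter update succeed or fail together
INSERT INTO liked_posts (user_id, post_id) VALUES (:user_id, :post_id);
UPDATE user_post SET like_count = like_count + 1 WHERE id = :post_id;
COMMIT;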
Using triggers it could be something like this:
CREATE FUNCTION public.update_like_count() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE user_post SET like_count = like_count + 1
WHERE user_post.id = NEW.post_id;
RETURN NEW;
END;
$$;
CREATE TRIGGER update_like_counts
AFTER INSERT ON public.liked_posts
FOR EACH ROW EXECUTE PROCEDURE public.update_like_count();
You should also handle deletes with a separate AFTER DELETE trigger.
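A sketch of that delete-side trigger, mirroring the names assumed in the insert trigger above:
CREATE FUNCTION public.decrement_like_count() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
-- the removed row is available as OLD in a DELETE trigger
UPDATE user_post SET like_count = like_count - 1
WHERE user_post.id = OLD.post_id;
RETURN OLD;
END;
$$;
CREATE TRIGGER update_like_counts_on_delete
AFTER DELETE ON public.liked_posts
FOR EACH ROW EXECUTE PROCEDURE public.decrement_like_count();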
Be aware that, depending on how the update is written and on the transaction isolation level, you can run into concurrency problems here: if the application reads the current count and writes back a computed value, two simultaneous likes can end up writing the same number and the total becomes invalid. An atomic increment like the one above (like_count = like_count + 1) in the same transaction avoids that lost update, at the cost of concurrent transactions briefly waiting on the row lock.
I've had a problem similar to this in the past, and the solution I went with is similar to what you've described: keeping an aggregated stored value such as like_count. As you mentioned, the main downside is consistency concerns; however, that problem exists even with the former approach.
The rest of the solution lies more on the application side, e.g. using something like WebSockets to keep posts up to date without too much fluff.
When a user's browser/client loads a post, it joins a room keyed by the post id, and when a user interacts with a post (like, dislike, etc.) that interaction is broadcast to all users in that room (post id).
Finally, when it comes to finding out which users liked a post, you can query/load that at the point when the user clicks to find out. ~ cheers

Selecting records with a huge "where data set"

Background Info
C#
MS MVC 4
Sql Azure
Linq - Identities
Problem at hand:
Selecting records in an Items table where zip code is within a certain range of miles.
Items Table
id (PK)
Title
Body
ZipCode (Int)
Summary of Progress:
I have a class which uses the 2013 US Gazetteer ZIP code tabulation areas file to gather zip codes and work out the distances between them. It is basically a .csv/.txt file that I open as a stream and convert to POCOs in order to process distances. That much of the equation is working fine; however, selecting a list of Items from the Items table based on this list of zip codes is where I'm not sure what to do.
Scenario
User A wants to search for items within a 25-mile radius of zip code 46324.
User A hits search, and in the background my class returns a list of 124 zip codes within that 25-mile radius.
Question: What is the best way (performance wise) to retrieve items in my Item table using this list of zipcodes?
Possible Solutions
I thought about creating a dynamic query using the T-SQL IN keyword in my WHERE clause and simply supplying this list as the parameters. This does not seem like a very performance-oriented way of doing it; however, given my current architecture I do not see any other way.
I also thought about incorporating a sort of paging functionality that takes only the first 5 zip codes to return results, followed by the next 5, and so on. This would involve more work, but it seems like a better choice performance-wise.
Any ideas?
I stumbled across your question purely by chance while searching for something else, and I see it's quite old, but I thought I'd give you a comment nonetheless:
What I would do in this case is let the database do the search and the C# do the calculations. You have a class in C# which calculates the distances? Then why not save the distance from each zip code to each zip code in a "lookup table" in SQL?
Doing it this way ensures the data is calculated once, and you let SQL find the right rows for you.
i.e.:
Create a table with zip_from, zip_to, distance fields
Calculate and populate table once at the beginning
Query by saying "select * from zip_lookup where zip_from = bla and distance between 0 and 100" or something like that
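Very roughly, with made-up names for the lookup table and assuming the Items.ZipCode column from the question:
-- one row per (from, to) pair, populated once from the Gazetteer data
CREATE TABLE zip_lookup (
    zip_from INT NOT NULL,
    zip_to   INT NOT NULL,
    distance FLOAT NOT NULL,
    PRIMARY KEY (zip_from, zip_to)
);
-- items within 25 miles of 46324, resolved entirely in the database
SELECT i.*
FROM Items AS i
JOIN zip_lookup AS z ON z.zip_to = i.ZipCode
WHERE z.zip_from = 46324
  AND z.distance <= 25;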

What's the best way to temporarily persist results of a long running SP?

I have a T-SQL stored procedure that can run for a few minutes and return a few million records, and I need to display that data in an ASP.NET grid (Infragistics WebDataGrid, to be precise). Obviously I don't want to return all the data at once and need to set up some kind of paging: every time the user selects another page, another portion of the data is loaded from the DB. But I can't run the SP every time a new page is requested - it would take too much time.
What would be the best way to persist the data from the SP, so that when the user selects a new page, the new data portion can be loaded with a simple SELECT ... WHERE from that temporary data store?
A few options
One:
If the user only pages forward then you could just hold the connection open and use a DataReader. Just .Read() as needed.
Two:
Create a #temp table, using the user ID as part of the name, to store the results. I don't like this much: if the user aborts, tables sometimes get left over. Expect about a half-second hit to create and drop the #temp. Store the entire result set, or just the PKs and build the page detail on demand (see the sketch after these options).
Three:
Use a DataReader to read just the PKs into a List<>. It is faster than you would guess, and that List only lives in IIS (it is not sent to the browser). A List can be referenced by ordinal [] and preserves the sort order; get the detail for a page as required. The catch is that WHERE PK IN (3, 9, 2, 6) will not return the rows in that order, so I pass (order, PK) pairs in a TVP so the page comes back sorted. I do exactly this and get page loads for objects with 20 properties, 40 rows at a time, in under half a second. Do one query per table (NOT one per row), then assemble and assign properties in .NET. Use a DataReader (not a DataTable). You can even run the reader on a BackgroundWorker and pass back the first page of PKs via ProgressChanged.
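A rough sketch of the #temp-table option mentioned above; the procedure name, column list, and page size are placeholders:
-- run the expensive procedure once and keep its rows, numbered for paging
CREATE TABLE #results_user42 (
    rn   INT IDENTITY(1,1) PRIMARY KEY,
    Col1 INT,
    Col2 NVARCHAR(100)
    -- ...remaining columns returned by the procedure
);
INSERT INTO #results_user42 (Col1, Col2)
EXEC dbo.LongRunningProcedure;
-- each page is then a cheap range seek on rn
SELECT Col1, Col2
FROM #results_user42
WHERE rn BETWEEN 1 AND 50;   -- page 1 at 50 rows per page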
Have you looked at Server Side Paging? (The article is from 2005, but the approach will work with 2008 and CTEs.) Also, just wondering: is there any reason you are returning that many rows? I can't see a human usefully paging through a million records even if the page size were 1000.
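The server-side paging idea, very roughly; the key column, source table, and parameter values are placeholders, since the real procedure isn't shown (ROW_NUMBER needs 2005+, the inline DECLARE 2008+):
-- page inside the query itself instead of materializing millions of rows
DECLARE @PageNumber INT = 1, @PageSize INT = 50;
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY SomeKeyColumn) AS rn, t.*
    FROM dbo.SomeSourceTable AS t
)
SELECT *
FROM numbered
WHERE rn BETWEEN (@PageNumber - 1) * @PageSize + 1
             AND  @PageNumber * @PageSize;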

coldfusion - bind a form to the database

I have a large table which inserts data into the database. The problem is when the user edits the table I have to:
run the query
use lots of lines like value="<cfoutput>#getData.firstname#</cfoutput>" in the input boxes.
Is there a way to bind the form input boxes to the database via a cfc or cfm file?
Many Thanks,
R
Query objects include the columnList, which is a comma-delimited list of returned columns.
If security and readability aren't an issue, you can always loop over this. However, it basically removes your opportunity to do things like locking certain columns, reduces your ability to do any validation, and means you either just label the form boxes with the column names or you find a way to store labels for each column.
You can then do an insert/update/whatever with them.
I don't recommend this, as it would be nearly impossible to secure, but it might get you where you are going.
If you are using CF 9 you can use the ORM (Object-Relational Mapping) functionality (via CFCs)
as described in this online chapter
https://www.packtpub.com/sites/default/files/0249-chapter-4-ORM-Database-Interaction.pdf
(starting on page 6 of the pdf)
Take a look at <cfgrid>; it will be the easiest option if you're editing a table, and it can fire one update per row.
For security against XSS, you should use <input value="#xmlFormat(getData.firstname)#"> and minimize the number of <cfoutput> tags. XmlFormat() is not needed if you use <cfinput>.
If you are looking for an easy way to avoid specifying all the column names in the insert query, <cfinsert> will try to map all the form field names you submit to the database column names.
http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7c78.html
This is indeed a very good question. I have no doubt that the answers given so far are helpful. I was faced with the same problem, only my table does not have that many fields though.
Per the EntityNew() docs, the syntax shows that you can include the data when instantiating the object:
artistObj = entityNew("Artists",{FirstName="Tom",LastName="Ron"});
instead of having to instantiate and then add the data field by field. In my case, all I had to do was:
artistObj = entityNew( "Artists", FORM );
EntitySave( artistObj );
ORMFlush();
NOTE
It does appear from your question that you may be running insert or update queries. When using ORM you do not need to do that. But I may be mistaken.

Filemaker: Best way to set a certain field in every related record

I have a FileMaker script which calculates a value. I have 1 record from table A from which a relation points to n records of table B. What is the best way to set B::Field to this value for each of these n related records?
Doing Set Field [B::Field; $Value] will only set the value of the first of the n related records. What works however is the following:
Go to Related Record [Show only related records; From table: "B"; Using layout: "B_layout" (B)]
Loop
Set Field [B::Field; $Value]
Go To Record/Request/Page [Next; Exit after last]
End Loop
Go to Layout [original layout]
Is there a better way to accomplish this? I dislike the fact that in order to set some value (model) programmatically (controller), I have to create a layout (view) and switch to it, even though the user is not supposed to notice anything like a changing view.
FileMaker has always been primarily an end-user tool, so all its scripts are more like macros that repeat user actions. It is nowhere near as flexible as programmer-oriented environments. Going to another layout is, actually, a standard way to manipulate related values. You would have to do this anyway if you, say, wanted to duplicate a related record or print a report.
So:
Your script is quite good, except that you can replace the loop with the Replace Field Contents script step. Also add a Freeze Window script step at the beginning; it will prevent the screen from updating.
If you have a portal to the related table, you may loop over portal rows.
The FileMaker plug-in API can execute SQL, and there are plug-ins that expose this functionality, so if you really want, this is also an option.
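For the SQL route, the statement a plug-in would execute is roughly this; the key field name is an assumption, since the actual relationship isn't shown:
-- A_ID stands for whichever key field the A-to-B relationship matches on
UPDATE B
SET    Field = 'calculated value'
WHERE  A_ID  = 'key of the current table A record';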
I myself would prefer the first variant.
Loop through a Portal of Related Records
Looping through a portal that shows the related records and setting the field there has a couple of advantages over either Replace Field Contents or a Go to Related Record / Set Field loop.
You don't have to leave the layout. The portal can be hidden or placed off screen if it isn't already on the layout.
You can do it transactionally, i.e. you can make sure that either all the records get edited or none of them do. This is important since, in a multi-user networked solution, records may not always be editable. Neither Replace Field Contents nor looping through the records without a portal is transaction-safe.
Here is some info on FileMaker transactions.
You can loop through a portal using Go To Portal Row. Like so:
Go To Portal Row [First]
Loop
Set Field [B::Field; $Value]
Go To Portal Row [Next; Exit after last]
End Loop
It depends on what you're using the value for. If you need to hard wire a certain field, then it doesn't sound like you've got a very normalised data structure. The simplest way would be a calculation in TableB instead of a stored field, or if this is something that is stored, could it be a lookup field instead that is set on record creation?
What is the field in TableB being used for and how?