iPhone Dev - Trying to access every row of a sqlite3 table sequentially

this is my first time using SQL at all, so this might sound basic. I'm making an iPhone app that creates and uses a sqlite3 database (I'm linking against libsqlite3.dylib and importing "sqlite3.h"). I've been able to correctly create the database and a table in it, but now I need to know the best way to get stuff back out of it.
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order it appears in the table. What I want to do (if this helps) is get all the info from the various fields in a single row, put all of that into one object, store the object in an array, and then do the same for the next row, and the next, and so on. At the end, I should have an array with the same number of elements as there are rows in my SQL table. Thank you.

My SQL is rusty, but I think you can use SELECT * FROM myTable and then iterate through the results. You can also use a LIMIT/OFFSET(1) structure if you do not want to retrieve all elements at once from your table (for example due to memory concerns).
(1) Note that this can perform unexpectedly badly, depending on your use case. Look here for more info...
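For example (a minimal sketch; myTable and the page size of 50 are just placeholders):
-- Everything at once (the order is not guaranteed without an ORDER BY; see the next answer).
SELECT * FROM myTable;
-- Or in batches of 50 to limit memory use; add 50 to the offset for each subsequent batch.
SELECT * FROM myTable LIMIT 50 OFFSET 0;
SELECT * FROM myTable LIMIT 50 OFFSET 50;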

How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table.
That is not how SQL works. Rows are not kept in the table in a specific order as far as SQL is concerned. The order of rows returned by a query is determined by the ORDER BY clause in the query, e.g. ORDER BY DateCreated, or ORDER BY Price.
But SQLite has a rowid virtual column that can be used for this purpose. It reflects the sequence in which the rows were inserted, except that rowid values may change after a VACUUM. If you declare an INTEGER PRIMARY KEY column (which becomes an alias for rowid), the values stay constant.
order by rowid
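A minimal sketch of that, assuming a table named myTable (the other column names are made up):
-- The id column aliases SQLite's rowid, so its values survive a VACUUM.
CREATE TABLE myTable (
    id    INTEGER PRIMARY KEY,
    name  TEXT,
    value REAL
);

-- Returns the rows in insertion order.
SELECT * FROM myTable ORDER BY rowid;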


Postgres count(*) optimization idea

I'm currently working on a project that involves keeping track of users and their actions in my database (PostgreSQL as the RDBMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to efficiently count the number of times each user appears across all records, and also to be able to look at those counts for a particular date range.
So, the problem is: how do we count the total number of times a user appears in the table's contents, and how do we count that total over a date range?
What I've tried
As you might know, Postgres doesn't handle COUNT(*) very well using indexes, so we have to consider other ways to reduce the number of records it looks at in order to speed up the query. My first approach is to create a table that keeps track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, except that I don't want to continually refresh a materialized view with my count query). Here is what I've come up with:
CREATE TABLE users_counts("user" varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE "user" = NEW."user" AND day = DATE(NEW.date_);
What this does is: every time a new record is inserted into my 'main_table', we update the users_counts table, incrementing the counter of the record whose day equals the new record's date and whose user name is the same.
NOTE: the date_ column in 'main_table' is a timestamp, so I must cast the new record's date_ to a DATE.
The problem is: if the user column value doesn't already exist in my new table 'users_counts' for the current day, then nothing is updated.
Here is my question:
How do I write the rule so that it checks whether a row exists for the user on the current day; if so, increment its counter, otherwise insert a new row with the user, the day, and a counter of 1?
I would also like to know whether my approach makes sense, or whether there are ideas I'm missing that I just haven't thought about. As my database grows, it becomes increasingly inefficient to perform counting, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts("user", counter, day)
SELECT NEW."user", 1, DATE(NEW.date_)
WHERE NOT EXISTS (SELECT 1 FROM users_counts WHERE "user" = NEW."user" AND day = DATE(NEW.date_));
Basically, an insert happens if the user doesn't already exist in my cached table users_counts, and the first rule above updates the count.
What I'm unsure of is how I know which rule is called first, the update rule or the insert rule. And there must be a better way: how do I combine the two rules? Can this be done with a function?
It is true that PostgreSQL is notoriously slow when it comes to count(*) queries. However, if you have a WHERE clause that limits the number of entries, the query will be much faster. If you are using PostgreSQL 9.2 or newer, such a query can be just as fast as in MySQL because of index-only scans, which were added in 9.2, but it's best to EXPLAIN ANALYZE your query to make sure.
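As a rough sketch of how to check that (the index name and the date range are invented; the table and columns follow the question's main_table):
-- An index that can support per-user counting over a date range.
CREATE INDEX main_table_user_date_idx ON main_table ("user", date_);

-- Look for "Index Only Scan" in the plan output (PostgreSQL 9.2+).
EXPLAIN ANALYZE
SELECT "user", count(*)
FROM main_table
WHERE date_ >= '2013-01-01' AND date_ < '2013-02-01'
GROUP BY "user";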
Does my solution make sense?
Very much so, provided that your EXPLAIN ANALYZE shows that index-only scans are not being used. Trigger-based solutions like the one you have adapted are widely used. But, as you have realized, the problem of the initial state arises (whether to do an update or an insert).
which rule is called first
Multiple rules on the same table and same event type are applied in alphabetical name order.
from http://www.postgresql.org/docs/9.1/static/sql-createrule.html
The same applies to triggers. If you want a particular rule to be executed first, change its name so that it comes earlier in alphabetical order.
how do I combine the two rules?
One solution is to modify your rule to perform an upsert (look right at the bottom of that page for a sample upsert). The other is to populate the counter table with initial values; the trick is to create the trigger at the same time to avoid errors. This blog post explains it really well.
While the initial setup will be slow, each individual insert will probably be faster. The two opposing factors are the slowness of a WHERE NOT EXISTS query versus the overhead of catching an exception.
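As an illustration, here is a minimal sketch of the trigger-based upsert, following the merge example from the PostgreSQL documentation (the function and trigger names are invented, and it assumes a unique constraint on users_counts("user", day)):
CREATE OR REPLACE FUNCTION bump_user_count() RETURNS trigger AS $$
BEGIN
    LOOP
        -- First try to increment an existing row for this user and day.
        UPDATE users_counts SET counter = counter + 1
        WHERE "user" = NEW."user" AND day = DATE(NEW.date_);
        IF FOUND THEN
            RETURN NEW;
        END IF;
        -- No row yet: insert one; if another session inserted it first,
        -- catch the unique_violation and loop to retry the UPDATE.
        BEGIN
            INSERT INTO users_counts("user", counter, day)
            VALUES (NEW."user", 1, DATE(NEW.date_));
            RETURN NEW;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER main_table_count_trigger
AFTER INSERT ON main_table
FOR EACH ROW EXECUTE PROCEDURE bump_user_count();
Note the EXCEPTION block, which is exactly what the following tip is about: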
Tip: A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one. Therefore, don't use EXCEPTION without need.
Source: the PostgreSQL documentation page linked above.

Is it relevant to put "version" on a separate sql server table?

I have a table with several fields. This table almost never changes, except for one field, "version", which changes very often.
Would it be relevant to put that single field into a separate table in order to reduce how often locks are put on the main table?
For instance, I have a table tType and a table tEntry.
Whenever I add/delete/update any row of tEntry, I need to update the "version" field of tType. There might be thousands of rows in tEntry for a single referenced tType row, meaning the version number could change very often, even though the other data of tType (such as name, id, etc.) doesn't change.
Your reference to tType and tEntry sounds like you are implementing a key-value store in an RDBMS. There are several discussions you can google about this topic, and the consensus on the web seems to be that the cons outweigh the pros. An option would be to look at key-value stores, NoSQL, wide-column databases, etc. (see Wikipedia)...
The next "anti-pattern" I recognized is that you try to mix transactional data with 'master data' in the table tType. Try to avoid this, even if your selects get more uncomfortable and need to be tuned better. Keep off the version info from the tType, if this changes extremely often. Look here to get the concept: MySQL JOIN the most recent row only?

Silverlight WCF RIA Service select from SQL View vs SQL Table

I have arrived at this dilemma via a tortuous and frustrating route, but I'll start with where I am right now. For information I'm using VS2010, Silverlight 5 and the latest versions of the Silverlight and RIA Toolkits, SDKs etc.
I have a view in my database (it's actually now an indexed view, but that has made no difference to the behaviour). For testing purposes (and that includes testing my sanity) I have duplicated the view as a Table (ie identical column names and definitions), and inserted all the view rows into the table. So if I SELECT * from the view or the table in Query Analyzer, I get identical results. So far so good.
I create an EDM (Entity Framework model) in my Silverlight Business Application web project, including all objects.
I create a Domain Service based on the model, and it creates ContextTypes and metadata for both the View and the Table, and associated Query objects.
If I populate a Silverlight ListBox in my Silverlight project via the Table Query, it returns all the data in the table.
If I populate the same ListBox via the View Query, it returns one row only, always the first row in the collection, however it is ordered. In fact, if I delve into the inner workings via the debugger, when it executes the ObjectContext Query in the service, it returns a result set of the correct number of rows, but all the rows are identical! If I order ascending I get n copies of the first row, descending I get n copies of the last row.
Can anyone put me out of my misery here, and tell me why the View doesn't work?
Ade
OK, well that was predictable - nearly every time I ask a question on a forum I stumble across the answer while I'm waiting for responses to flood in!
Despite having been through the metadata and model.designer files and made sure that all "view" and "table" class/method definitions etc were identical, it was still showing the exasperating difference in behaviour between view and table queries. So the problem just had to be caused by the database, right?
Sure enough, I hadn't noticed that I had created NOT NULL columns when I created the "identical" table version of my view! Even though I was using SELECT NEWID() to create a unique key column on the view, the database insisted that the ID column in the view was NULLABLE, and it was apparently this that was causing the problem.
To save some storage space I switched from using NEWID() to using ROW_NUMBER() to create my key column, but I still had the "NULLABLE" property problem. So I then changed it to
SELECT ISNULL(ROW_NUMBER() OVER (...), -1)
for the ID column, and at last the column in the view was created NOT NULL! Even though neither NEWID() nor ROW_NUMBER() can ever generate NULL output, it seems you have to hold SQL Server's hand and reassure it by using the ISNULL operator before it will believe itself.
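To illustrate, the key column of such a view might be defined something like this (the view, table, and column names here are only placeholders, not my actual ones):
CREATE VIEW dbo.MyView
AS
SELECT
    -- Wrapping ROW_NUMBER() in ISNULL() makes SQL Server mark the ID
    -- column as NOT NULL in the view's metadata, so the model can use
    -- it as a key.
    ISNULL(ROW_NUMBER() OVER (ORDER BY t.SomeColumn), -1) AS ID,
    t.SomeColumn,
    t.OtherColumn
FROM dbo.MyTable AS t;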
Having done this and deleted/recreated my model and service files, everything burst into glorious technicolour life without any manual additions of [Key()] properties or anything else. The problem had been with the database all along, and NOT with the Model/Service/Metadata definitions.
Hope this saves someone some time. Now all I need to do is work out why the original stored procedure method I started with two days ago doesn't work - but at least I now have a hint!
Ade

Postgres default sort by id - worldship

I need to set up WorldShip to pull from one of our Postgres databases. I need the packages to be sorted by id. I have no way (that I am aware of) of having WorldShip send an ORDER BY clause, so I need the default order of the returned records to be by id.
On a second note, I have no idea how Postgres sorts by default. It looks like it is by the last time the record was changed: if I write two records with ids 1 and 2, then change record 2, running the query returns them with record 2 first.
Rows are returned in an unspecified order, per the SQL spec, unless you add an ORDER BY clause. In Postgres, that means you'll get rows in, basically, the order in which live rows are read from the disk.
If you want a consistent order without needing to add an order by clause, create a view as suggested in Jack's comment.
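A minimal sketch of such a view (the packages table and id column are assumptions, since the question doesn't show the schema):
-- Point WorldShip at this view instead of the table;
-- the ORDER BY is baked into the view definition.
CREATE VIEW packages_by_id AS
SELECT *
FROM packages
ORDER BY id;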
There is no such thing as a "default sort". Rows in a table are not sorted.
You could fake this with a view (as suggested by Jack Maney), but otherwise there is no way to influence the order of the rows that are returned.
But if you do that, be aware that adding an additional ORDER BY to a SELECT based on that view will sort the data twice.
Another option might be to run the CLUSTER command on that table to physically order the rows on disk according to the column you want. But this still does not guarantee that the rows are returned in that order, not even with a plain SELECT * FROM your_table (though chances are reasonably high that they will be).
You will need to re-run this statement on a regular basis because the order created by the CLUSTER command is not automatically maintained.
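For reference, and purely as a sketch (the table and index names here are made up):
-- Physically rewrite the table in the order of the index on id.
-- Must be re-run periodically; the ordering is not maintained afterwards.
CREATE INDEX packages_id_idx ON packages (id);
CLUSTER packages USING packages_id_idx;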
For what it's worth, which probably isn't much, from my testing, it appears that PostgreSQL's "default" ordering is based on the time the records were last updated. The most recently updated records will appear last. Note that I couldn't find any documentation to support this. It's just what I've found from my own testing.
You could possibly use a sorted index, which should guarantee the order of retrieved rows when the query plan actually uses the index (or you force it to), but this approach would be rather circuitous :). An ORDER BY clause is the way to go, as already mentioned.

Postgres full text search across multiple related tables

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I'm going to need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, and set up triggers to keep it up to date; or whether it's possible to create an index without a tsvector column that keeps itself up to date whenever any of the relevant fields in any of the relevant tables change. Any tips on the syntax of the commands to create all this would be much appreciated as well.
Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data up into a shared tsvector, that will generate a lot of updates on the shared column whenever any of the individual ones change.
You will need one index per table. When you query, you then need multiple match conditions in the WHERE clause, one for each field. PostgreSQL will automatically figure out which combination of indexes to use to give you the quickest results - likely using bitmap scans. This makes your queries a little more complex to write (since you need multiple column-matching clauses), but it keeps the flexibility to query only some of the fields when you want to.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.
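As a rough sketch of the per-table setup (every table and column name below is an assumption based on the User/UserProfile/UserInfo example, not an actual schema):
-- One tsvector column and one GIN index per searched table.
ALTER TABLE user_profiles ADD COLUMN search_tsv tsvector;
UPDATE user_profiles SET search_tsv = to_tsvector('english', coalesce(bio, ''));
CREATE INDEX user_profiles_tsv_idx ON user_profiles USING gin(search_tsv);

-- Keep the column current with the built-in trigger function.
CREATE TRIGGER user_profiles_tsv_update
BEFORE INSERT OR UPDATE ON user_profiles
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(search_tsv, 'pg_catalog.english', bio);

-- Repeat the same three steps for user_infos, then query with one match
-- condition per table (AND requires a hit in both; use OR for either).
SELECT u.id
FROM users u
JOIN user_profiles p ON p.user_id = u.id
JOIN user_infos    i ON i.user_id = u.id
WHERE p.search_tsv @@ to_tsquery('english', 'some & words')
  AND i.search_tsv @@ to_tsquery('english', 'some & words');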