Postgres full text search across multiple related tables - postgresql

This may be a very simplistic question, so apologies in advance, but I am very new to database usage.
I'd like to have Postgres run its full text search across multiple joined tables. Imagine something like a model User, with related models UserProfile and UserInfo. The search would only be for Users, but would include information from UserProfile and UserInfo.
I'm planning on using a GIN index for the search. I'm unclear, however, on whether I need a separate tsvector column in the User table to hold the aggregated tsvectors from across the tables, with triggers set up to keep it up to date, or whether it's possible to create an index without a tsvector column that keeps itself up to date whenever any of the relevant fields in any of the relevant tables change. Any tips on the syntax of the commands to create all this would be much appreciated as well.

Your best answer is probably to have a separate tsvector column in each table (with an index on it, of course). If you aggregate the data up to a shared tsvector, that will create a lot of updates on the shared one whenever the individual ones update.
You will need one index per table. Then, when you query it, you need multiple conditions in the WHERE clause, one for each tsvector column. PostgreSQL will then automatically figure out which combination of indexes to use to give you the quickest results, likely using bitmap scans. It makes your queries a little more complex to write (since you need multiple column-matching conditions), but it keeps the flexibility to query only some of the fields in the cases where you want that.
You cannot create one index that tracks multiple tables. To do that you need the separate tsvector column and triggers on each table to update it.

Related

Ignoring space characters when linking tables

I’m experiencing a problem when trying to link two tables in the Database Expert. The two fields that link the tables hold exactly the same information, except that one table always has an additional space. For example:
Table 1 = Multivitamin/Tablets
Table 2 = Multivitamin//Tablets
Here each ‘/’ represents a space.
Formulas won’t help (e.g. ExtractString etc.), as it’s the tables themselves I need to link together.
This is preventing me from retrieving the information I need. Any advice on how I can get around this?
There are some ways to work around this:
Consider using a command as datasource instead of tables. When writing the query of the command you can define the join condition yourself.
If you have access to the data source, you could add a calculated field to the tables to contain the normalized field values and then use these for linking in CR.
Alternatively, one could create views in the database, either adding normalized "linking fields" or providing the joined tables results.
If it's only a few rows in CR, you could consider using SQL fields or subreports to retrieve data from Table 2.
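The idea behind the first suggestion (writing the join condition yourself) can be sketched as follows. This is only an illustration under stated assumptions: the table and column names (table1, table2, product, price, stock) are hypothetical, and Python's built-in sqlite3 module stands in for the real data source, so the exact function for stripping spaces may differ in your database.

```python
import sqlite3

# In-memory SQLite database standing in for the real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (product TEXT, price REAL);
CREATE TABLE table2 (product TEXT, stock INTEGER);
INSERT INTO table1 VALUES ('Multivitamin Tablets', 4.99);   -- one space
INSERT INTO table2 VALUES ('Multivitamin  Tablets', 120);   -- two spaces
""")

# Join on the values with every space removed, so the extra space is ignored.
rows = conn.execute("""
SELECT t1.product, t1.price, t2.stock
FROM table1 AS t1
JOIN table2 AS t2
  ON REPLACE(t1.product, ' ', '') = REPLACE(t2.product, ' ', '')
""").fetchall()

print(rows)
```

Note that removing every space also makes 'Multi vitamin' match 'Multivitamin'; if that is too loose for your data, collapse repeated spaces into one instead of removing them all.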

Is it relevant to put "version" on a separate sql server table?

I have a table with several fields. This table almost never changes, except for one field, "version", which changes very often.
Would it be relevant to put that single field into a separate table in order to reduce how often locks are put on the main table?
For instance I have a table tType and a table tEntry.
Whenever I add, delete, or update any row of tEntry, I need to update the "version" field of tType. There might be thousands of rows in tEntry for a single referenced tType row, meaning the version number could change very often, though the other data in tType (such as name, id, etc.) doesn't change.
Your reference to tType and tEntry sounds like you are implementing a key-value store in an RDBMS. There are several discussions of this topic you can find with a search. The consensus on the web seems to be that the cons outweigh the pros. An option would be to look at key-value stores, NoSQL, multi-column DBs, etc. (Wikipedia)...
The next "anti-pattern" I recognized is that you are mixing transactional data with 'master data' in the table tType. Try to avoid this, even if your selects become less convenient and need more tuning. Keep the version info out of tType if it changes extremely often. Look here to get the concept: MySQL JOIN the most recent row only?
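One possible layout that keeps the frequently changing version out of tType is a small side table maintained by triggers on tEntry. The sketch below is an assumption-laden illustration, not the answerer's exact design: tTypeVersion, its columns, and the trigger names are hypothetical, and SQLite (via Python's sqlite3 module) stands in for SQL Server, so it shows the table structure only and says nothing about SQL Server's locking behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Master data: rarely changes.
CREATE TABLE tType (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Hypothetical side table holding only the hot "version" counter.
CREATE TABLE tTypeVersion (
    type_id INTEGER PRIMARY KEY REFERENCES tType(id),
    version INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE tEntry (
    id      INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES tType(id),
    payload TEXT
);

-- Bump the version on every insert, update, or delete of an entry.
CREATE TRIGGER entry_ins AFTER INSERT ON tEntry BEGIN
    UPDATE tTypeVersion SET version = version + 1 WHERE type_id = NEW.type_id;
END;
CREATE TRIGGER entry_upd AFTER UPDATE ON tEntry BEGIN
    UPDATE tTypeVersion SET version = version + 1 WHERE type_id = NEW.type_id;
END;
CREATE TRIGGER entry_del AFTER DELETE ON tEntry BEGIN
    UPDATE tTypeVersion SET version = version + 1 WHERE type_id = OLD.type_id;
END;
""")

conn.execute("INSERT INTO tType (id, name) VALUES (1, 'example')")
conn.execute("INSERT INTO tTypeVersion (type_id) VALUES (1)")
conn.execute("INSERT INTO tEntry (id, type_id, payload) VALUES (1, 1, 'a')")
conn.execute("INSERT INTO tEntry (id, type_id, payload) VALUES (2, 1, 'b')")
conn.execute("UPDATE tEntry SET payload = 'c' WHERE id = 1")
conn.execute("DELETE FROM tEntry WHERE id = 2")

version = conn.execute(
    "SELECT version FROM tTypeVersion WHERE type_id = 1"
).fetchone()[0]
print(version)
```

With this split, writes caused by entry changes touch only tTypeVersion rows, while tType itself is written only when its real attributes change.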

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for the FIRST_NAME in the WHOLE_NAME column of table TABLE_TO_SEARCH, which has a full-text index on it. It doesn't seem to be a valid query, though... Is there any workaround that still uses the full-text index?
LATER EDIT:
Here is the business case: each night I download information about "blacklisted" individuals from several websites and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic: WholeName does not necessarily contain the last, first, or middle name; or WholeName is null while the other three fields have values; or all four fields are null; and so on. The data may also repeat itself, as one blacklisted individual may appear on two or more of these websites. What I need to do is compare this data, as chaotic as it is, against our customer data based on our customers' first and last names, and give each customer a matching score (rank) against the files we download from these websites.
First I tried the CHARINDEX function and the LIKE operator, but I couldn't build a scoring algorithm on them, and it took 6 hours to compare just our customers' first and last names with only the WholeName column of the TABLE_TO_SEARCH table. I thought that implementing the full-text index would make this easier and faster, but apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?
After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you only care about checking one name at a time, then all you have to do is assign those values to variables and pass the variables to CONTAINS. This method would allow you to use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
Good luck!

Structure a dynamoDB table to enable ASC or DESC ordered pagination on * items in a table

I want to ORDER_BY by time/date, and paginate through all items in a table. Scan seems designed to paginate through everything, but does not seem to have a "ASC/DESC" equiv. Query has ScanIndexForward but requires specific primary keys. (no way to SELECT * ?)
Based on the first comment of this question I'm thinking the only way to achieve this is to use a common primary key (!?) and then Query based on that, focusing on the Range key. Is this really how it's supposed to work? I'd have to make a whole separate table with mirrored attributes if I wanted to Query an individual item based on a unique primary key.
Please excuse my NoSQL noobness. I'm a front-end dev who's only dabbled in MySQL and SimpleDB.
Yes, this is what Query is for. The hash key identifies the list of things to page over, and the range key indicates the position within the list. If you can tolerate the latency hit, all you need to store in this table is the primary keys of the items where the data being paged over actually lives; you can then issue a BatchGetItem to read a pageful of data in parallel.
Duplicate data isn't the sin in NoSQL that it is in the relational model. You're essentially crafting a MySQL-style index by hand.

iPhone Dev - Trying to access every row of a sqlite3 table sequentially

This is my first time using SQL at all, so this might sound basic. I'm making an iPhone app that creates and uses a sqlite3 database (I'm linking against the libsqlite3.dylib library and importing "sqlite3.h"). I've been able to correctly create the database and a table in it, but now I need to know the best way to get stuff back from it.
How would I go about retrieving all the information in the table? It's very important that I be able to access each row in the order that it is in the table. What I want to do (if this helps) is get all the info from the various fields in a single row, put all that into one object, and then store the object in an array, and then do the same for the next row, and the next, etc. At the end, I should have an array with the same number of elements as I have rows in my sql table. Thank you.
My SQL is rusty, but I think you can use SELECT * FROM myTable and then iterate through the results. You can also use a LIMIT/OFFSET(1) structure if you do not want to retrieve all elements at once from your table (for example due to memory concerns).
(1) Note that this can perform unexpectedly badly, depending on your use case. Look here for more info...
How would I go about retrieving all the information in the table? It's
very important that I be able to access each row in the order that it
is in the table.
That is not how SQL works. Rows are not kept in the table in a specific order as far as SQL is concerned. The order of rows returned by a query is determined by the ORDER BY clause in the query, e.g. ORDER BY DateCreated, or ORDER BY Price.
But SQLite has a rowid virtual column that can be used for this purpose. It normally reflects the sequence in which the rows were inserted, although the values might change with a VACUUM. If you declare a column as INTEGER PRIMARY KEY, it becomes an alias for the rowid and should stay constant.
order by rowid
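The "one object per row, collected into an array" loop from the question can be sketched as follows. This is a minimal sketch in Python's built-in sqlite3 module rather than the C API used on the iPhone; the table name myTable comes from the answer above, while the columns (name, price), the sample rows, and the Item class are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Item:
    """One object per table row (hypothetical fields)."""
    rowid: int
    name: str
    price: float


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO myTable (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("book", 12.0), ("lamp", 30.0)],
)

# ORDER BY rowid returns the rows in insertion order for this table;
# each result row is unpacked into an Item and appended to the list.
items = [
    Item(*row)
    for row in conn.execute(
        "SELECT rowid, name, price FROM myTable ORDER BY rowid"
    )
]

for item in items:
    print(item)
```

In the C API, the same loop would prepare the SELECT statement, step through it row by row, and read each column value into your own object before adding it to the array.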