I couldn't find enough information in the Postgres documentation, but I'm curious to know what a btree index looks like for a Postgres varchar column.
Any links / explanations might be helpful.
PS: Sorry for the vague question
One good place to start is the code itself (and the documentation contained in it).
Download here:
http://www.postgresql.org/ftp/source/v9.4.1/
Unpack the archive and look for the README under src/backend/access/nbtree/. That should give you a good introduction to what they're up to.
If you're ambitious and can read C code, you can have a look at the implementation in there too. Hope this helps.
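The answer above doesn't mention it, but if you'd rather poke at a live index before reading the source, the contrib extension pageinspect can show you the actual btree pages. This is only a sketch; the table "people", its varchar column "name", and the use of node-postgres are all assumptions, and pageinspect needs superuser rights:

const { Client } = require('pg');

async function inspectBtree() {
  const client = new Client();   // connection settings come from the usual PG* environment variables
  await client.connect();
  await client.query("CREATE EXTENSION IF NOT EXISTS pageinspect");
  await client.query("CREATE INDEX people_name_idx ON people (name)");
  // Metapage: root block number, tree depth, and so on.
  const meta = await client.query("SELECT * FROM bt_metap('people_name_idx')");
  console.log(meta.rows);
  // Items on block 1 (for a small index this is the single root/leaf page);
  // the varchar keys appear, hex-encoded, in the "data" column.
  const items = await client.query("SELECT * FROM bt_page_items('people_name_idx', 1)");
  console.log(items.rows.slice(0, 5));
  await client.end();
}

inspectBtree().catch(console.error);

Between that output and the nbtree README you should get a fairly concrete picture of how varchar keys are laid out in the tree.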
I'm having trouble finding information about how to look up records by an index using sequelize/postgres for node.js.
The only documentation of indexes appears to be here: http://sequelizejs.com/documentation#migrations-functions
To illustrate what I'm asking, let's take a simple model where there are Persons, Projects, and Tasks. Each person references a number of assigned tasks, and each project has a number of assigned tasks. Each task has a back-reference to its project and person. We'll assume that each person only has one task per project.
Let's say I have a person and a project, and I need to find whether there is a task associated with them. I've tried implementing this through an index on Task over person/project.
I've found through searches that you can also create indexes through the slightly unintuitive syntax:
global.db.sequelize.getQueryInterface().addIndex('Tasks',
  ['ProjectId', 'PersonId'],
  {
    indexName: 'IndexName',
    indicesType: 'UNIQUE'
  }
);
This seems to work, and the index is created. However, I can't find a reference anywhere in the docs or even on the internet about how to use this index to find the task.
Any suggestions?
You have a fundamental misunderstanding of how an RDBMS is supposed to work.
It is supposed to pick the best indexes for each query based upon the pattern of database access required. This is performed by the "planner" in the RDBMS.
Some terms you will find useful to search against as you use PostgreSQL:
- Primary Key
- Foreign Key
- Constraint (both the above are these)
- EXPLAIN ANALYSE (or ANALYZE depending on your dialect of English)
- http://explain.depesz.com/ - a useful site for visualising the output of the above EXPLAINs
- pg_dump / pg_restore - make sure you can use these tools to backup your database
Finally, make yourself a good hot cup of tea or coffee, sit down, and at least skim through the PostgreSQL manuals. That will give you an idea of where to find further information.
Good Luck!
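To see the planner doing its job on the question above, you can run EXPLAIN ANALYZE over the exact query your application issues. A small sketch using Sequelize's raw-query interface (assuming a reasonably recent Sequelize; the table, column, and index names are the ones from the question):

// Ask PostgreSQL how it plans the lookup. An "Index Scan using IndexName" or
// "Bitmap Index Scan on IndexName" line in the output means the composite index is being used.
global.db.sequelize
  .query('EXPLAIN ANALYZE SELECT * FROM "Tasks" WHERE "ProjectId" = 1 AND "PersonId" = 2')
  .then(function (plan) {
    console.log(plan);   // the plan comes back as rows of text
  });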
True, I'm coming from Cache's database structure, which very few people actually use.
I think the best answer to the question is that you just do the lookup as normal, and PostgreSQL takes care of the rest. Good to know!
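For anyone landing here with the same question, a minimal sketch of what "do the lookup as normal" means in Sequelize (assuming a reasonably recent Sequelize and that the Task model is reachable as global.db.Task; adjust to however your models are registered):

// An ordinary query; PostgreSQL decides on its own to use the ProjectId/PersonId index.
// "project" and "person" are instances you have already looked up.
global.db.Task
  .findOne({ where: { ProjectId: project.id, PersonId: person.id } })
  .then(function (task) {
    if (task) {
      console.log('Existing task found:', task.id);
    } else {
      console.log('No task yet for this person/project pair');
    }
  });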
I would like to create an application which searches for similar documents in its database; e.g. the user uploads a document (text, image, etc.), and I would like to query my application for similar ones.
I have already created the necessary algorithms for the process (fingerprinting, feature extraction, hashing, hash comparison, etc.); I'm looking for a framework which couples all of these together.
For example, if I would implement it in Lucene, I would do the following:
Create a custom "tokenizer" and "stemmer" (~ feature extraction and fingerprinting)
Then add the extracted elements to the Lucene index
Finally, use the MoreLikeThis class to find the similar documents
So, basically Lucene might be a good choice, but as far as I know, Lucene is not meant to be a document-similarity search engine, but rather a term-based search engine.
My question is: are there any applications/frameworks which might fit the above-mentioned problem?
Thanks,
krisy
UPDATE: It seems the process I described above is called Content-Based Media Retrieval (for sound, image, and video).
There are many projects that use Lucene for this, see: http://wiki.apache.org/lucene-java/PoweredBy (Lire, Alike, etc.), but I still haven't found a dedicated framework ...
Since you're using Lucene, you might take a look at SOLR. I do realize it's not a dedicated framework for your purpose either, but it does add stuff on top of Lucene that comes in quite handy. Given the pluggability of Lucene, its track record and the fact that there are a lot of useful resources out there, SOLR might help you get your job done.
Also, the answer that @mindas pointed to links to a blog post describing the technical details of how to accomplish your goal with SOLR (but you have probably already read that in the meantime).
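If you do go the SOLR route, its MoreLikeThis component is the closest out-of-the-box equivalent of the Lucene workflow you describe, and you can call it over plain HTTP from Node. This is only a rough sketch; the core name "docs", the field name "content", and having the MoreLikeThis component enabled are all assumptions (Node 18+ for the built-in fetch):

// Ask Solr for documents similar to the document whose id is docId.
// mlt.fl tells the MoreLikeThis component which field(s) to compare on.
async function findSimilar(docId) {
  const params = new URLSearchParams({
    q: 'id:' + docId,
    mlt: 'true',
    'mlt.fl': 'content',
    'mlt.mindf': '1',
    'mlt.mintf': '1',
    wt: 'json'
  });
  const res = await fetch('http://localhost:8983/solr/docs/select?' + params);
  const json = await res.json();
  return json.moreLikeThis;   // a map from the source doc id to its list of similar documents
}

findSimilar('123').then(console.log).catch(console.error);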
If I understand correctly, you have your own database, and you want to check whether an uploaded document is a duplicate of, or similar to, something already in that database while/after the user uploads it.
If that is the case, the problem domain is very broad.
1) For images you will need pattern matching; there are papers available on image duplicate detection, and searching for them will give you many options.
2) For documents there is again a division by format:
DOC(x)
PDF
TXT
RTF, etc..
Each format has different properties. Lucene may help you here, but it is a search engine, and when searching for language patterns there are many things to check, since you are looking for similar (not exactly identical) documents.
So a fuzzy-matching approach will come in handy.
This requirement is too large for a forum page to explain everything anyway; I hope this much will do.
I am trying to build autocomplete with spelling correction in Sphinx and found this article: http://www.ivinco.com/blog/sphinx-in-action-did-you-mean/. It is very helpful but not so easy to understand.
The questions I have are:
How do I build these bigrams and trigrams from the keywords I already have in my Sphinx index?
How do I prepare the query for the Sphinx daemon, rather than for SphinxSE as in the example article?
Has anyone built such a "Did you mean ..." spelling-suggestion project with Sphinx?
You should look in the misc/suggest/ folder in the Sphinx download.
Also viewable here http://sphinxsearch.googlecode.com/svn/trunk/misc/suggest/
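To give a feel for the first question: building the n-grams is just sliding a short window over every keyword you dump from the index (for example with indexer --buildstops). This is only an illustrative sketch of the idea, not the actual script from misc/suggest:

// Build character trigrams for one keyword; a suggest table stores these per keyword
// so that a misspelled query can be matched against them.
function trigrams(word) {
  const padded = '__' + word.toLowerCase() + '__';   // pad so prefixes and suffixes get their own trigrams
  const grams = [];
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.push(padded.slice(i, i + 3));
  }
  return grams;
}

console.log(trigrams('sphinx'));
// [ '__s', '_sp', 'sph', 'phi', 'hin', 'inx', 'nx_', 'x__' ]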
I am new to PostgreSQL, so when reading the source code I get very confused... Is there some useful material on the PostgreSQL source code? Thank you.
There are some nice presentations about basic concepts like Datum, V1 function calls, and the source code:
http://www.postgresql.org/developer/coding
http://www.postgresql.org/files/developer/internalpics.pdf
This master's thesis is a very good document: http://www.ic.unicamp.br/~celio/livrobd/postgres/ansi_sql_implementation_postgresql.pdf
http://www.postgresql.org/developer/ext.backend_dirs.html
It seems you are also unversed in using the internet!? ;-) Your first look should be the project homepage, http://www.postgresql.org/. There you will find a "Developers" link which directs you to the available resources. One of them is the Developer FAQ, which should be more than sufficient for the beginning.
The Internals of PostgreSQL seems to be a very useful book.
A useful link for Postgres 14 can be found here: Postgres Professional. It discusses isolation, MVCC, the buffer cache, WAL, locks, and query execution.
Also see the free two-day course here: PostgresPro Course
I have a large table with a text field, and I want to query this table to find records that contain a given substring, using ILIKE. It works perfectly on small tables, but in my case it is a rather time-consuming operation, and I need it to work fast, because I use it in a live-search field on my website. Any ideas would be appreciated...
Check the "Waiting for 9.1 – Faster LIKE/ILIKE" blog post from depesz for a solution using trigrams.
You'd need to use the as-yet-unreleased PostgreSQL 9.1 for this. And your writes would be much slower then, as trigram indexes are huge.
Full text search suggested by user12861 would help only if you're searching for words, not substrings.
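To make the trigram suggestion concrete: once you are on 9.1 or later, you install the pg_trgm extension, build a GIN trigram index on the text column, and your existing ILIKE query can use it. This is a sketch only; the table "posts", its column "body", and the use of node-postgres are assumptions:

const { Client } = require('pg');

async function trigramSearch(term) {
  const client = new Client();   // connection settings come from the PG* environment variables
  await client.connect();
  // One-time setup: the extension and the trigram index.
  await client.query("CREATE EXTENSION IF NOT EXISTS pg_trgm");
  await client.query("CREATE INDEX posts_body_trgm_idx ON posts USING gin (body gin_trgm_ops)");
  // The same ILIKE substring search as before; the planner can now use the trigram index.
  const { rows } = await client.query(
    "SELECT id, body FROM posts WHERE body ILIKE '%' || $1 || '%' LIMIT 20",
    [term]
  );
  await client.end();
  return rows;
}

trigramSearch('live search').then(console.log).catch(console.error);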
You probably want to look into full text indexing. It's a bit complicated; maybe someone else can give a better description, or you might try some links, like this one for example:
http://wiki.postgresql.org/wiki/Full_Text_Indexing_with_PostgreSQL
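If whole-word matching is enough (as noted in the other answer, full text search matches words, not arbitrary substrings), here is a sketch of that approach against the same assumed "posts" table, again via node-postgres:

const { Client } = require('pg');

async function fullTextSearch(term) {
  const client = new Client();   // connection settings come from the PG* environment variables
  await client.connect();
  // One-time setup: an expression index over the parsed document.
  await client.query(
    "CREATE INDEX posts_body_fts_idx ON posts USING gin (to_tsvector('english', body))"
  );
  // The query must use the same expression as the index for the planner to be able to use it.
  const { rows } = await client.query(
    "SELECT id, body FROM posts " +
    "WHERE to_tsvector('english', body) @@ plainto_tsquery('english', $1) LIMIT 20",
    [term]
  );
  await client.end();
  return rows;
}

fullTextSearch('live search').then(console.log).catch(console.error);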