I am looking for a document DB that supports 32-bit Windows XP and satisfies the following requirements:
Support must not be discontinued, i.e. I want to be able to install the most recent version of the DB. MongoDB does not fit, since it dropped support for XP, and CouchDB does not fit, since it dropped support for 32-bit Windows entirely.
It should be relatively simple. Obviously, the application is not an enterprise one, so a complex DB like Cassandra is out. In fact, I would like to avoid column databases, since I think they exist to solve enterprise-level problems, which is not the case here. On the other hand, I do not want a relational DB, because I want to avoid schema upgrades each time new fields are added (and they will be added).
It should support indexing on parts of the document, like MongoDB. I could use a relational DB like HSQLDB and store the data as a JSON string. This makes adding new fields easy - no schema needs to be changed. But those fields would not be indexable by the database, again unlike MongoDB.
Finally, the DB will run on the same machine as the application itself - one more strike against MongoDB, which would steal all the RAM from the application for itself.
So, in a sense, I am looking for something like MongoDB, but with support for 32-bit Windows XP.
Any advice?
P.S.
I know that Windows XP has one year to live before MS drops support for it. However, I have to support XP anyway.
With HSQLDB and some other relational databases, you can store the document as a CLOB. This CLOB can be accessed via a single table which contains the index entries for all the indexed fields. For example:
CREATE TABLE DATAINDEX(DOCID BIGINT GENERATED BY DEFAULT AS IDENTITY, FIELDNAME VARCHAR(128), FIELD VARCHAR(10000),
DOCUMENT CLOB, PRIMARY KEY (DOCID, FIELDNAME))
CREATE INDEX IDS ON DATAINDEX(FIELDNAME, FIELD);
The whole document is stored in the CLOB. A copy of each selected field that needs an index for searching is stored in the (FIELDNAME, FIELD) columns. Rows with the same DOCID have the same CLOB in the DOCUMENT column. One row is inserted with the first field and the CLOB, then it is duplicated by selecting and inserting the existing DOCID and CLOB with the second field, and so on.
-- use this to insert the CLOB with the first field
INSERT INTO DATAINDEX VALUES (DEFAULT, 'f1', 'fieldvalue 1', ?)
-- use this to insert the second, third and other fields
INSERT INTO DATAINDEX VALUES
(IDENTITY(), 'f2', 'fieldvalue 2',
(SELECT DOCUMENT FROM DATAINDEX WHERE DOCID = IDENTITY() LIMIT 1))
The above is just one example; you can create your own DOCID. The principle is to use the same DOCID for every row of a document and to insert the first row together with the CLOB. The second and third rows select the DOCID and the CLOB from the previously inserted row to create new rows with the other fields. You will probably use JDBC parameters to insert into the FIELDNAME and FIELD columns, as sketched below.
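A parameterized version of the two statements above might look like this (a sketch; the placeholder positions are an assumption, not tested):
-- first field plus the document CLOB, all bound as JDBC parameters
INSERT INTO DATAINDEX VALUES (DEFAULT, ?, ?, ?)
-- each further field reuses the generated DOCID and the stored CLOB
INSERT INTO DATAINDEX VALUES
(IDENTITY(), ?, ?,
(SELECT DOCUMENT FROM DATAINDEX WHERE DOCID = IDENTITY() LIMIT 1))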
This allows you to perform searches such as:
SELECT DOCID, DOCUMENT FROM DATAINDEX
WHERE FIELDNAME = 'COMPANY NAME' AND FIELD LIKE 'Corp%'
This may not satisfy all your requirements, but the answer is intended to cover what is possible with HSQLDB.
Which programming framework are you using? If .NET is a possibility you can try RavenDB. It can be used as both an embedded and standalone database.
For Java you can try out OrientDB. It is also embeddable: https://github.com/nuvolabase/orientdb/wiki/Embedded-Server
I have a PostgreSQL table "tasks" with the fields "start": timestamptz, "finish": timestamptz, "type": int (and a lot of others). It contains about 200M records. The start, finish and type fields each have a separate B-tree index.
I'd like to build a "Tasks for a period" report and need to get all tasks which fall (fully or partially) inside the reporting period. The report could be built for all task types or for a specific one.
So I wrote the SQL:
SELECT * FROM tasks
WHERE start<={report_to}
AND finish>={report_from}
AND ({report_tasktype} IS NULL OR type={report_tasktype})
and it runs for ages even on short reporting periods.
Please advise whether there is a way to improve performance by altering the query or by creating new indexes on the table. For certain reasons I can't change the structure of the "tasks" table.
You would want a GiST index on the range. Since you already have it stored as two endpoints rather than as a range, you could use a functional index to convert them on the fly:
CREATE INDEX ON tasks USING GIST (tstzrange(start, finish));
Then compare the ranges for overlap with the && operator (see the sketch below).
It may also improve things to add "type" as a second column to the index, which would require the btree_gist extension.
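A minimal sketch combining both suggestions (the index name and query shape are assumptions, not tested against the original table; the {…} placeholders follow the question's own notation):
-- btree_gist lets a GiST index include the plain int column "type"
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX tasks_period_type_idx
ON tasks USING GIST (tstzrange(start, finish), type);

-- the report query rewritten as a range overlap; the tstzrange expression
-- must match the indexed expression exactly for the index to be usable
SELECT * FROM tasks
WHERE tstzrange(start, finish) && tstzrange({report_from}, {report_to})
AND ({report_tasktype} IS NULL OR type = {report_tasktype});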
I was asked this question in a job interview at some small company. Now I understand that the question itself is wrong.
My suggestion was that PostgreSQL copies the row and then deletes the previous one, i.e. uses a transaction, and then sorts the rows by a hidden system index.
They said that this was not the right answer, but they never said what the right answer was, because the interview was like high-speed ping-pong in one direction.
I've asked people from the PostgreSQL core team on IRC. They said that the result order can be unpredictable, and they pointed me at very low-level C documentation of PostgreSQL, so I didn't understand anything.
Now I found this statement at https://www.postgresql.org/docs/current/sql-select.html :
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce. (See ORDER BY Clause below.)
Okay, but what if we use a very simple table without any relations in the schema, like USER (id, name)? What is the point there? Why would the updated row end up in the last place? What should I answer in the interview?
First of all, not only in PostgreSQL but in other databases as well, the order of tuples isn't guaranteed. On insertion, each tuple is written to a page, and pages are written to disk by the storage layer. A newly updated row is not necessarily in the last place unless ORDER BY is specified. The reason the updated row often appears last is that PostgreSQL does not update a tuple in place: under MVCC, an UPDATE writes a new row version, which typically lands later in the heap, so a sequential scan happens to return it last.
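A quick way to see the effect for yourself (a throwaway sketch; the table name is made up):
-- set up a tiny table and update a middle row
CREATE TABLE demo_users (id int, name text);
INSERT INTO demo_users VALUES (1, 'a'), (2, 'b'), (3, 'c');
UPDATE demo_users SET name = 'b2' WHERE id = 2;

SELECT * FROM demo_users;             -- the updated row typically comes last now
SELECT * FROM demo_users ORDER BY id; -- only ORDER BY guarantees an order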
I am trying to index already-existing columns in a table with over 5 million rows. My question is: if I add the index in a migration, will the already-created data be indexed as well? Or do I need to re-index the existing data, and if so, how?
This is my migration
add_index :data_prods, :date_field
add_index :data_prods, :entity_id
Thank you.
Edit
I am using the PostgreSQL DBMS.
Adding an index builds it over the table's entire existing contents, so yes, the existing rows will be indexed. A table with 5 million rows may take some time; I suggest testing in a staging environment (with a similar amount of data) to see how long this migration will take, as well as its impact on the application.
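If blocking writes during the build is a concern, PostgreSQL can also build the index without locking the table against writes; a sketch of the raw SQL (the index name is made up; Rails 4+ exposes this as the algorithm: :concurrently option of add_index):
-- builds the index without taking a write lock; cannot run inside a transaction
CREATE INDEX CONCURRENTLY index_data_prods_on_date_field
ON data_prods (date_field);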
Re: your comment about improving query times
Indexes will make queries faster where the indexed columns are commonly referenced in "where" clauses. In your case, any query that filters by date_field OR entity_id will be faster, but other queries will not be improved. Note that PostgreSQL will typically use only one index per table scan, so if the majority of your queries filter by date_field AND entity_id at the same time, you might be better off with a composite index. I'd check out this post for further reading on composite indexes:
Index on multiple columns in Ruby on Rails
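For instance, a hypothetical composite index in raw SQL (in a Rails migration this would be add_index :data_prods, [:date_field, :entity_id]):
-- serves queries filtering on date_field alone or on date_field AND entity_id
CREATE INDEX index_data_prods_on_date_field_and_entity_id
ON data_prods (date_field, entity_id);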
I am using a PostgreSQL database for our project and doing some performance testing. We need to insert millions of records with indexed columns. We have 5 columns in the table. With an index on the integer column only, performance was good, but when I created an index on the text column as well, performance dropped to about 1/8th. My question is: how can I improve insert performance when there is an index on the text column?
Short answer is you can't.
It is well known that adding indexes on DB columns is a double-edged sword:
on the one (positive) side, it speeds up your read queries
on the other, it adds a performance penalty to insert/update/delete operations, and your data will occupy a little more disk space
A possible solution would be to use a full-text search engine like Sphinx, which would index the text content from your DB outside the database itself.
I'm developing an app in Rails on OS X using PostgreSQL 8.4. I need to set up the database for the app so that standard text queries are case-insensitive. For example:
SELECT * FROM documents WHERE title = 'incredible document'
should return the same result as:
SELECT * FROM documents WHERE title = 'Incredible Document'
Just to be clear, I don't want to use:
(1) LIKE in the where clause or any other type of special comparison operators
(2) citext for the column datatype or any other special column index
(3) any type of full-text software like Sphinx
What I do want is to set the database locale to support case-insensitive text comparison. I'm on Mac OS X (10.5 Leopard) and have already tried setting the Encoding to "LATIN1", with the Collation and Ctype both set to "en_US.ISO8859-1". No success so far.
Any help or suggestions are greatly appreciated.
Thanks!
Update
I have marked one of the answers given as the correct answer out of respect for the folks who responded. However, I've chosen to solve this issue differently than suggested. After further review of the application, there are only a few instances where I need case-insensitive comparison against a database field, so I'll be creating shadow database fields for the ones I need to compare case-insensitively, e.g. name and name_lower. I believe I came across this solution somewhere on the web. Hopefully PostgreSQL will in the future allow collation options similar to what SQL Server provides (i.e. DOCI).
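A minimal sketch of that shadow-field approach (column and index names are made up):
-- keep a lowercased copy of the field and index it
ALTER TABLE documents ADD COLUMN title_lower text;
UPDATE documents SET title_lower = lower(title);
CREATE INDEX documents_title_lower_idx ON documents (title_lower);

-- queries then compare against the shadow column
SELECT * FROM documents WHERE title_lower = lower('Incredible Document');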
Special thanks to all who responded.
You will likely need to use a column function to convert your text, e.g. to uppercase - an example:
SELECT * FROM documents WHERE upper(title) = upper('incredible document')
Note that this may prevent queries from using an existing plain index on the column, but if that becomes a problem you can define an index on the column function itself, e.g.
CREATE INDEX I1 on documents (upper(title))
With all the limitations you have set, possibly the only way to make it work is to define your own = operator for text. It is very likely that it will create other problems, such as creating broken indexes. Other than that, your best bet seems to be to use the citext datatype; that would still let the ORM stuff you're using generate the SQL.
(I am not mentioning the possibility of creating your own locale definition because I haven't ever heard of anyone doing it.)
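For reference, a minimal citext sketch (on PostgreSQL 9.1+; 8.4 shipped citext as a contrib module installed by script rather than CREATE EXTENSION):
-- citext makes plain = comparisons case-insensitive
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE documents ALTER COLUMN title TYPE citext;
SELECT * FROM documents WHERE title = 'incredible document';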
Your problem and your exclusions are like saying "I want to swim, but I don't want to have to move my arms."
You will drown trying.
I don't think that is what locale or encoding is used for. Encoding is more about picking a character set, not determining how characters are compared. If there were such a setting it would be in the config, but I haven't seen one.
If you do not want to use ILIKE for fear of not being able to port to another database, then I would suggest you look into what ORM options might be available with ActiveRecord, if you are using that.
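For completeness, this is the ILIKE form the above refers to (which the question explicitly ruled out):
-- case-insensitive pattern match, PostgreSQL-specific
SELECT * FROM documents WHERE title ILIKE 'incredible document'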
Here is something from one of the top Postgres guys: http://archives.postgresql.org/pgsql-php/2003-05/msg00045.php
edit: fixed specific references to locale.
SELECT * FROM documents WHERE title ~* 'incredible document'
-- ~* is PostgreSQL's case-insensitive regular-expression match; anchor the
-- pattern (e.g. '^incredible document$') if you don't want substring matches