I'm using Elasticsearch to store a large amount of data to make it searchable, but for configuration items I'm still using HSQLDB.
Is it possible to eliminate HSQLDB completely and use my existing Elasticsearch in combination with CrateDB?
Things I have tried:
I tried connecting to my existing Elasticsearch cluster using the Crate driver and Crate client, but I got an exception: No handler found for action "crate_sql". Does that mean I cannot use my existing ES and have to use the Elasticsearch built into CrateDB?
After connecting to CrateDB's Elasticsearch (and not my existing ES), I was able to get a connection using CrateDriver and run SQL queries. But in one of the modules I'm creating a table using the command below:
create table some_table_name
(
id VARCHAR(256),
userName VARCHAR(256),
fieldName VARCHAR(256),
primary key (id),
unique (userName, fieldName)
);
...but then I got an exception:
io.crate.action.sql.SQLActionException: line 1:28: no viable alternative at input 'VARCHAR'
Does that mean I cannot write create table query using SQL syntax and SQL data types?
I know it will work if I use the string data type instead of varchar, but I don't want to change all those queries now.
1)
No, you cannot use existing ES nodes together with Crate. The whole SQL analyzer/planner/execution layer is done server side, not client side. In fact, the Crate clients are rather dumb.
2)
You'll have to change the types and also remove or change anything that isn't supported by Crate. For example, defaults or unique constraints aren't supported (up to 0.39; support might be added in the future).
In your case the varchar type isn't valid in Crate; instead you'll have to use "string".
See Data Types Documentation for a list of supported data types.
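For illustration, a version of the statement above that Crate (as of 0.39) should accept might look like this, with varchar replaced by string and the unsupported unique constraint simply dropped:
create table some_table_name
(
id string,
userName string,
fieldName string,
primary key (id)
);
If the uniqueness of (userName, fieldName) still matters, it has to be enforced in the application, since Crate won't do it for you.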
Related
I'm seeking some advice.
I've migrated a database from SQL Server to Aurora PostgreSQL using AWS DMS. In most of the tables in SQL Server, the primary keys are a uniqueidentifier (GUID). When migrated to Postgres these columns are converted to VARCHAR(36). This seems to be as expected, per the AWS DMS documentation.
In our .NET application we use Entity Framework 6, to which I have added a new DbContext that uses the Npgsql provider. Note that we are keeping the existing SQL Server EF6 providers; essentially, the application will use both SQL Server and PostgreSQL. This is all hooked up fine.
Where I run into issues is when my Postgres context fetches from the PostgreSQL database; it encounters a lot of errors like:
Npgsql.PostgresException: 42883: operator does not exist: character varying = uuid
I understand the issue: the application, using EF, fetches by Id (a GUID), while the Postgres table has an Id of VARCHAR type...
My feeling is that the problem is not on the application or EF side; rather, the column on the table should be something like a UUID. I can do that post-migration by simply altering the column to become a UUID type, but is this the way, and will it resolve my issues? I also feel like this can't be a unique case I'm dealing with; it seems like a common issue for anyone migrating a .NET app from SQL Server to PostgreSQL...
I look forward to hearing some of your ideas, comments, thoughts on this. Thanks in advance.
It seems that this migration procedure is not quite up to the task, as a GUID (which is Microsoft's confusing term for UUID) should be migrated to uuid. Not only would you save 21 bytes of storage space per row, but you also wouldn't have this problem.
It seems that your application is comparing a uuid value with one of the migrated varchars:
WHERE uniqueidentifier = UUID '87595807-3157-4a81-ac89-3e09e83c0c0a'
You have to add an explicit cast, like the error message says:
WHERE uniqueidentifier = CAST (UUID '87595807-3157-4a81-ac89-3e09e83c0c0a' AS text)
You would cast to text, not to varchar, because there is no equality operator for varchar. varchar is coerced to text when you compare it, because the storage for these types is identical.
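If you would rather fix the columns than sprinkle casts through the queries, converting a migrated column back to uuid could look roughly like this (table and column names are placeholders, not from the question):
ALTER TABLE some_table
    ALTER COLUMN id TYPE uuid USING id::uuid;
This rewrites the table, and any other varchar columns that hold the same IDs (foreign keys, for instance) would need the same treatment.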
I'm trying to add a GIN index that includes a UUID in a Postgres 9.6 database. Technically it is a composite index, with composite GIN support coming from the btree_gin plugin.
I try to create the index with this statement:
CREATE EXTENSION btree_gin;
CREATE INDEX ix_tsv ON text_information USING GIN (client_id, text_search_vector);
but I get this error:
ERROR: data type uuid has no default operator class for access method "gin"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
client_id is data type uuid and text_search_vector is a tsvector. I don't think the composite/btree_gin factor is actually relevant, as I get the same error trying to create the index on just client_id alone, but hopefully if there is a solution to this, it is one that will work with a composite index also.
I found PostgreSQL GIN index on array of uuid, which seems to suggest that it should be possible (if an array of UUIDs can be done, then surely an individual UUID can be done). However, the solution there was pretty opaque to me; it's not immediately obvious how to modify it to support a single UUID.
I would prefer a solution that doesn't involve casting the UUID to another type in the index or in another column, as I would rather not have to write specialized queries with casts in them (we are using the Django ORM to generate queries at the moment).
It is possible for GIN indexes. But not before Postgres 11, where it was added. The release notes:
Allow btree_gin to index bool, bpchar, name and uuid data types (Matheus Oliveira)
So the simple solution is to upgrade to Postgres 11. This should be good news for you:
April 9, 2019: Cloud SQL now supports PostgreSQL version 11.1 Beta
Or, in many cases you can alternatively use a GiST index, for which the same was introduced with Postgres 10, already. The release notes:
Add indexing support to btree_gist for the UUID data type (Paul Jungwirth)
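On Postgres 10 or later, that could look roughly like this (the index name is arbitrary):
CREATE EXTENSION btree_gist;
CREATE INDEX ix_tsv_gist ON text_information USING GIST (client_id, text_search_vector);
GiST handles tsvector out of the box; btree_gist only has to supply the operator class for the uuid column.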
Related:
How to use uuid with postgresql gist index type?
If neither is an option, you are back to what you wanted to avoid:
casting the uuid to another type in the index
You can create an expression index on a (consistent!) text representation or, theoretically, on two bigint columns derived from the uuid. But the first makes the index considerably bigger and slower and the second creates much more complication ...
The syntax of the cast is simple enough though: uuid::text. In an index expression that requires an extra set of parentheses. With the additional module btree_gin installed:
CREATE INDEX ix_uuid_tsv ON text_information USING GIN ((client_id::text), text_search_vector);
Related:
Postgres using an index for one table but not another
What is the optimal data type for an MD5 field?
Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
Or you could backport the feature from Postgres 11 - which is not an option with a hosted service like Google Cloud SQL for PostgreSQL as you mentioned in a comment. And I hardly see the use case where one would be skilled enough to implement the backport, but not to upgrade to Postgres 11.
We are using Slick in a Scala project. There's a module where we need to do an upsert operation (insert/update). One way we know is to simply use SQL statements, but for now we would like to stick to using Slick for it.
Since the Slick version we use supports the insertOrUpdate() operation, we want to use that. Now here's the issue:
Our table has one primary key: an index column set to auto-increment. The upsert we want to do is keyed on a transaction id, which is expected to be unique but is not marked as unique or a primary key in the database.
How do we handle this situation? I tried manually implementing the upsert by first checking the table and then inserting/updating, but that fails in the production environment, where many requests come in every second and we end up with two inserts for about 1% of the results.
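To make the failure mode concrete, the manual approach boils down to something like the following (all table and column names here are invented for illustration):
-- step 1: check whether a row for this transaction id already exists
SELECT count(*) FROM transactions WHERE transaction_id = 'tx-42';
-- step 2a: if nothing was found, insert ...
INSERT INTO transactions (transaction_id, status) VALUES ('tx-42', 'new');
-- step 2b: ... otherwise update
UPDATE transactions SET status = 'processed' WHERE transaction_id = 'tx-42';
Two concurrent requests can both run step 1, both see zero rows, and both insert, which is consistent with the duplicates showing up for roughly 1% of results under load. Without a unique constraint on transaction_id, nothing in the database prevents that second insert.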
I am accessing data from a different DB via postgres_fdw. It works well:
CREATE FOREIGN TABLE fdw_table
(
name TEXT,
area double precision,
use TEXT,
geom GEOMETRY
)
SERVER foreign_db
OPTIONS (schema_name 'schema_A', table_name 'table_B');
However, when I query for the data_type of the fdw_table I get the following result:
name text
area double precision
use text
geom USER-DEFINED
Can postgres_fdw not handle the GEOMETRY data type of PostGIS? What does USER-DEFINED mean in this context?
From the documentation on the data_type column:
Data type of the column, if it is a built-in type, or ARRAY if it is some array (in that case, see the view element_types), else USER-DEFINED (in that case, the type is identified in udt_name and associated columns).
So this is not specific to FDWs; you'd see the same definition for a physical table.
postgres_fdw can handle custom datatypes just fine, but there is currently one caveat: if you query the foreign table with a WHERE condition involving a user-defined type, it will not push this condition to the foreign server.
In other words, if your WHERE clause only references built-in types, e.g.:
SELECT *
FROM fdw_table
WHERE name = $1
... then the WHERE clause will be sent to the foreign server, and only the matching rows will be retrieved. But when a user-defined type is involved, e.g.:
SELECT *
FROM fdw_table
WHERE geom = $1
... then the entire table is retrieved from the foreign server, and the filtering is performed locally.
Postgres 9.6 will resolve this, by allowing you to attach a list of extensions to your foreign server object.
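On 9.6, declaring PostGIS as safe to ship to that particular foreign server would look something like this (assuming the postgis extension is installed, in a compatible version, on both ends):
ALTER SERVER foreign_db OPTIONS (ADD extensions 'postgis');
After that, conditions on geom can be pushed down just like the built-in-type case above.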
Well, obviously you are going to need any non-standard types defined at both ends. Don't forget the FDW functionality is supposed to support a variety of different database platforms, so there isn't any magic way to import remote operations on a datatype. Actually, given that one end could be running on MS-Windows and the other on ARM-based Linux there's not even a sensible way of doing it just with PostgreSQL.
I'm just getting started with PostgreSQL, and I'm new to database design.
I'm writing software in which I have various plugins that update a database. Each plugin periodically updates its own designated table in the database. So a plugin named 'KeyboardPlugin' will update the 'KeyboardTable', and 'MousePlugin' will update the 'MouseTable'. I'd like for my database to store these 'plugin-table' relationships while enforcing referential integrity. So ideally, I'd like a configuration table with the following columns:
Plugin-Name (type 'text')
Table-Name (type ?)
My software will read from this configuration table to help the plugins determine which table to update. Originally, my idea was to have the second column (Table-Name) be of type 'text'. But then, if someone mistypes the table name, or an existing relationship becomes invalid because of someone deleting a table, we have problems. I'd like for the 'Table-Name' column to act as a reference to another table, while enforcing referential integrity.
What is the best way to do this in PostgreSQL? Feel free to suggest an entirely new way to set up my database, different from what I'm currently exploring. Also, if it helps you answer my question, I'm using the pgAdmin tool to set up my database.
I appreciate your help.
I would go with your original plan to store the name as text. Possibly enhanced by additionally storing the schema name:
addin text
,sch text
,tbl text
Tables have an OID in the system catalog (pg_catalog.pg_class). You can get those with a nifty special cast:
SELECT 'myschema.mytable'::regclass
But the OID can change over a dump/restore. So just store the names as text and verify at application time that the table is there, by casting it as demonstrated.
Of course, if you use each table for multiple addins it might pay to make a separate table
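A check at application time could then be as simple as this (the config table and plugin names are placeholders; the cast raises an error if the table no longer exists):
SELECT (quote_ident(sch) || '.' || quote_ident(tbl))::regclass
FROM   my_config
WHERE  addin = 'KeyboardPlugin';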
CREATE TABLE tbl (
 tbl_id serial PRIMARY KEY
,sch text
,name text
);
and reference it in ...
CREATE TABLE addin (
 addin_id serial PRIMARY KEY
,addin text
,tbl_id integer REFERENCES tbl(tbl_id) ON UPDATE CASCADE ON DELETE CASCADE
);
Or even make it an n:m relationship if addins have multiple tables. But be aware, as @OMG_Ponies commented, that a setup like this will require you to execute a lot of dynamic SQL because you don't know the identifiers beforehand.
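A rough sketch of what that dynamic SQL could look like, using the two tables above (the function name and the count(*) query are only examples):
CREATE OR REPLACE FUNCTION addin_row_count(_addin text)
  RETURNS bigint
  LANGUAGE plpgsql AS
$$
DECLARE
   _ct bigint;
BEGIN
   -- look up the registered schema / table for this addin, then build and
   -- run the statement at runtime; format('%I') quotes the identifiers safely
   EXECUTE (
      SELECT format('SELECT count(*) FROM %I.%I', t.sch, t.name)
      FROM   addin a
      JOIN   tbl   t USING (tbl_id)
      WHERE  a.addin = _addin
   )
   INTO _ct;
   RETURN _ct;
END;
$$;
Call it like SELECT addin_row_count('KeyboardPlugin');. If no row is registered for the addin, the subquery yields NULL and EXECUTE raises an error, which is probably what you want.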
I guess all plugins have a set of basic attributes and then each plugin will have a set of plugin-specific attributes. If this is the case you can use a single table together with the hstore datatype (a standard extension that just needs to be installed).
Something like this:
CREATE TABLE plugins
(
plugin_name text not null primary key,
common_int_attribute integer not null,
common_text_attribute text not null,
plugin_attributes hstore
);
Then you can do something like this:
INSERT INTO plugins
(plugin_name, common_int_attribute, common_text_attribute, plugin_attributes)
VALUES
('plugin_1', 42, 'foobar', 'some_key => "the fish", other_key => 24'),
('plugin_2', 100, 'foobar', 'weird_key => 12345, more_info => "10.2.4"');
This creates two plugins named plugin_1 and plugin_2.
Plugin_1 has the additional attributes "some_key" and "other_key", while plugin_2 stores the keys "weird_key" and "more_info".
You can index those hstore columns and query them very efficiently.
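For example, a GIN index (the index name here is arbitrary) covers the ? and @> operators used in the queries below:
CREATE INDEX idx_plugins_attributes ON plugins USING GIN (plugin_attributes);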
The following will select all plugins that have a key "weird_key" defined.
SELECT *
FROM plugins
WHERE plugin_attributes ? 'weird_key'
The following statement will select all plugins that have a key some_key with the value the fish:
SELECT *
FROM plugins
WHERE plugin_attributes @> 'some_key => "the fish"'
Much more convenient than using an EAV model in my opinion (and most probably a lot faster as well).
The only drawback is that you lose type-safety with this approach (but usually you'd lose that with the EAV concept as well).
You don't need an application catalog. Just add the application name to the keys of the table. This of course assumes that all the tables have the same structure. If not: use the application name for a table name or, as others have suggested, as a schema name (which would also allow for multiple tables per application).
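A minimal sketch of that idea, with made-up column names: one shared table whose key starts with the plugin name:
CREATE TABLE plugin_data (
   plugin_name text NOT NULL,
   item_key    text NOT NULL,
   payload     text,
   PRIMARY KEY (plugin_name, item_key)
);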
EDIT:
But the real issue is of course that you should first model your data, and then build the applications to manipulate it. The data should not serve the code; the code should serve the data.