Kafka/KsqlDb: Why is PRIMARY KEY prepending chars?

I intend to create a TABLE called WEB_TICKETS where the PRIMARY KEY is equal to the key->ID value. For some reason, when I run the CREATE TABLE statement the PRIMARY KEY value is prefixed with the characters 'JO'. Why is this happening?
KsqlDb Statements
These work as expected
CREATE STREAM STREAM_WEB_TICKETS (
ID_TICKET STRUCT<ID STRING> KEY
)
WITH (KAFKA_TOPIC='web.mongodb.tickets', FORMAT='AVRO');
CREATE STREAM WEB_TICKETS_REKEYED
WITH (KAFKA_TOPIC='web_tickets_by_id') AS
SELECT *
FROM STREAM_WEB_TICKETS
PARTITION BY ID_TICKET->ID;
PRINT 'web_tickets_by_id' FROM BEGINNING LIMIT 1;
key: 5d0c2416b326fe00515408b8
The following successfully creates the table but the PRIMARY KEY value isn't what I expect:
CREATE TABLE web_tickets (
id_pk STRING PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', VALUE_FORMAT = 'AVRO');
select id_pk from web_tickets EMIT CHANGES LIMIT 1;
|ID_PK|
|J05d0c2416b326fe00515408b8
As you can see, the ID_PK value has the characters JO prepended to it. Why is this?

It appears as though I wasn't properly setting the KEY FORMAT. With only VALUE_FORMAT specified, ksqlDB reads the key with its default KAFKA format, so the Avro-serialized key bytes (the wire-format header preceding the actual ID) end up rendered as part of the string, which is where the stray leading characters come from. The following command, which applies AVRO to the key as well, produces the expected result.
CREATE TABLE web_tickets_test_2 (
id_pk VARCHAR PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', FORMAT = 'AVRO');
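As a quick check, the same query as before, run against the new table, should now return the bare key (expected output shown, based on the PRINT result above):
select id_pk from web_tickets_test_2 EMIT CHANGES LIMIT 1;
|ID_PK|
|5d0c2416b326fe00515408b8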

Related

duplicate key value violates unique constraint "pk_user_governments"

I am trying to insert a record with a many-to-many relationship in EF Core into a Postgres table.
Adding a simple record to Users works, but when I introduced the 1:N relationship with User_Governments,
it started giving me: duplicate key value violates unique constraint "pk_user_governments"
I have tried a few things:
SELECT MAX(user_governments_id) FROM user_governments;
SELECT nextval('users_gov_user_id_seq');
This keeps incrementing every time I run it in Postgres, but the issue does not go away.
I am inserting it as follows:
User user = new();
user.Organisation = organisation;
user.Name = userName;
user.Email = email;
user.IsSafetyDashboardUser = isSafetyFlag;
if (isSafetyFlag)
{
    // One child row per selected local government area
    List<UserGovernment> userGovernments = new List<UserGovernment>();
    foreach (var govId in lgas)
    {
        userGovernments.Add(new UserGovernment()
        {
            LocalGovId = govId,
            StateId = 7
        });
    }
    user.UserGovernments = userGovernments;
}
_context.Users.Add(user);
int rows_affected = _context.SaveChanges();
The table and columns in the db are defined as follows:
CREATE TABLE IF NOT EXISTS user_governments
(
user_government_id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
user_id integer NOT NULL,
state_id integer NOT NULL,
local_gov_id integer NOT NULL,
CONSTRAINT pk_user_governments PRIMARY KEY (user_government_id),
CONSTRAINT fk_user_governments_local_govs_local_gov_id FOREIGN KEY (local_gov_id)
REFERENCES local_govs (local_gov_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE CASCADE,
CONSTRAINT fk_user_governments_states_state_id FOREIGN KEY (state_id)
REFERENCES states (state_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE CASCADE,
CONSTRAINT fk_user_governments_users_user_id FOREIGN KEY (user_id)
REFERENCES users (user_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE CASCADE
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
I have also tried running the following command as per this post:
SELECT SETVAL((SELECT PG_GET_SERIAL_SEQUENCE('user_governments', 'user_government_id')), (SELECT (MAX("user_government_id") + 1) FROM "user_governments"), FALSE);
but I get error:
ERROR: relation "user_governments" does not exist
IDENTITY is an automatic increment integrated into the table. There is no need to use PG_GET_SERIAL_SEQUENCE, which is dedicated to SEQUENCEs, a separate way to generate increments outside the table. So you cannot use a query like:
SELECT SETVAL((SELECT PG_GET_SERIAL_SEQUENCE('user_governments', 'user_government_id')),
(SELECT (MAX("user_government_id") + 1) FROM "user_governments"), FALSE)
If your purpose is to set the seed for an IDENTITY column, you must use a syntax like this one:
ALTER TABLE user_governments
ALTER COLUMN user_government_id RESTART WITH 42; -- a literal value (42 here stands in for MAX(user_government_id) + 1); PostgreSQL does not accept a subquery in RESTART WITH
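Because RESTART WITH only accepts a constant, a value derived from the current maximum has to go through dynamic SQL. A minimal sketch against the question's table (COALESCE covers the empty-table case):
DO $$
BEGIN
    EXECUTE format(
        'ALTER TABLE user_governments ALTER COLUMN user_government_id RESTART WITH %s',
        (SELECT COALESCE(MAX(user_government_id), 0) + 1 FROM user_governments)
    );
END $$;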
It turned out that I had not built the model correctly.
The user_governments table has an auto-incremented key, but I had defined the model with a composite key as follows:
modelBuilder.Entity<UserGovernment>()
    .HasKey(bc => new { bc.UserId, bc.LocalGovId });
I replaced it with:
modelBuilder.Entity<UserGovernment>()
    .HasKey(bc => bc.UserGovernmentId);
The Journey :)
Initially I found that once I commented out the following line
_context.UserGovernments.AddRange(userGovernments);
it inserted the data with user_government_id as 0.
Then I tried manually giving user_government_id a value and that also succeeded, which led me to check my model builder code!

How to reset the auto generated primary key in PostgreSQL

My class for the table topics is below. The primary key is an auto-generated serial key. While testing, I deleted rows from the table and was trying to re-insert them. The UUID is not getting reset.
class Topics(db.Model):
    """User model for different topics"""
    __tablename__ = 'topics'
    uuid = db.Column(db.Integer, primary_key=True)
    topics_name = db.Column(db.String(256), index=True)

    def __repr__(self):
        return '<Post %r>' % self.topics_name
I tried the below command to reset the key
ALTER SEQUENCE topics_uuid_seq RESTART WITH 1;
It did not work.
I would appreciate any form of suggestion!
If it's indeed a serial ID, you can reset the owned SEQUENCE with:
SELECT setval(pg_get_serial_sequence('topics', 'uuid'), max(uuid)) FROM topics;
See:
How to reset postgres' primary key sequence when it falls out of sync?
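Note that if every row has been deleted (as in the question), max(uuid) is NULL and the call above raises an error. A guarded variant (a sketch against the same table; the third argument tells the sequence whether the value has already been consumed):
SELECT setval(pg_get_serial_sequence('topics', 'uuid'),
              COALESCE(max(uuid), 1),
              max(uuid) IS NOT NULL)  -- false on an empty table: the next nextval() returns 1
FROM topics;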
But why would the column be named uuid? UUIDs are not integer numbers and are not serial. Also, it's not entirely clear what's going wrong when you write:
The UUID is not getting reset.
About ALTER SEQUENCE ... RESTART:
Postgres manually alter sequence
In order to avoid duplicate id errors that may arise when resetting the sequence try:
UPDATE table SET id = DEFAULT;
ALTER SEQUENCE seq RESTART;
UPDATE table SET id = DEFAULT;
For added context:
'table' = your table name
'id' = your id column name
'seq' = find the name of your sequence with:
SELECT pg_get_serial_sequence('table', 'id');
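Substituted with the names from this question (topics, uuid, topics_uuid_seq), purely as an illustration:
UPDATE topics SET uuid = DEFAULT;
ALTER SEQUENCE topics_uuid_seq RESTART;
UPDATE topics SET uuid = DEFAULT;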

key size exceeds implementation restriction for index on expression with UDF call

Firebird has allowed indexing on expressions since version 2.0, including calls to user-defined functions (UDFs).
Currently, I am trying to add an expression index to this table:
CREATE TABLE M_ADSN_STRING_DATA (
ID DMN_AUTOINC NOT NULL /* DMN_AUTOINC = INTEGER NOT NULL */,
CLTREF DMN_REFID /* DMN_REFID = INTEGER NOT NULL */,
ATTRIBUTEDATA DMN_AFT_STRING /* DMN_AFT_STRING = VARCHAR(320) NOT NULL */
);
/******************************************************************************/
/**** Unique constraints ****/
/******************************************************************************/
ALTER TABLE M_ADSN_STRING_DATA ADD CONSTRAINT UNQ_M_ADSN_STRING_DATA UNIQUE (CLTREF, ATTRIBUTEDATA);
/******************************************************************************/
/**** Primary keys ****/
/******************************************************************************/
ALTER TABLE M_ADSN_STRING_DATA ADD CONSTRAINT PK_M_ADSN_STRING_DATA PRIMARY KEY (ID);
/******************************************************************************/
/**** Foreign keys ****/
/******************************************************************************/
ALTER TABLE M_ADSN_STRING_DATA ADD CONSTRAINT FK_M_ADSN_STRING_DATA_CLT FOREIGN KEY (CLTREF) REFERENCES M_CLIENT (ID) ON DELETE CASCADE ON UPDATE CASCADE;
/******************************************************************************/
/**** Indices ****/
/******************************************************************************/
CREATE INDEX M_ADSN_STRING_DATA_AD_UC ON M_ADSN_STRING_DATA COMPUTED BY (UPPER(ATTRIBUTEDATA));
Note, that it already has an expression index called M_ADSN_STRING_DATA_AD_UC.
The index I want to use should look like this:
CREATE INDEX M_ADSN_STRING_DATA_AD_DIG
ON M_ADSN_STRING_DATA
COMPUTED BY (F_DIGITS(ATTRIBUTEDATA));
Unfortunately, this gives me an error message.
Unsuccessful metadata update
key size exceeds implementation restriction for index "M_ADSN_STRING_DATA_AD_DIG"
I read Firebird FAQ entries #213 and #211, and this SO question as well.
F_DIGITS is a UDF from the FreeAdhocUDF library. Initially, it was declared as
DECLARE EXTERNAL FUNCTION F_DIGITS
CSTRING(32760)
RETURNS CSTRING(32760) FREE_IT
ENTRY_POINT 'digits' MODULE_NAME 'FreeAdhocUDF';
As my maximum input and output length is only 320 chars, I changed it to
DECLARE EXTERNAL FUNCTION F_DIGITS
CSTRING(320)
RETURNS CSTRING(320) FREE_IT
ENTRY_POINT 'digits' MODULE_NAME 'FreeAdhocUDF';
to fit the index size requirements. My database's page size is 16384, so I'd think my key can be up to 4096 bytes.
The domain DMN_AFT_STRING of column ATTRIBUTEDATA is declared as
CREATE DOMAIN DMN_AFT_STRING AS
VARCHAR(320) CHARACTER SET ISO8859_1
NOT NULL
COLLATE DE_DE_CS_SF;
Why does the key size exceed the restriction?
Long story short: have you tried turning it off and on again?
It looks like one has to disconnect and reconnect after changing the UDF declaration and before adding the expression index.
Now it works properly; the key size no longer exceeds the restriction.
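Reconstructed as an isql session, the working order of operations might look like this (connection string and credentials are placeholders; the DROP assumes nothing else depends on the old declaration):
DROP EXTERNAL FUNCTION F_DIGITS;
DECLARE EXTERNAL FUNCTION F_DIGITS
CSTRING(320)
RETURNS CSTRING(320) FREE_IT
ENTRY_POINT 'digits' MODULE_NAME 'FreeAdhocUDF';
COMMIT;
-- disconnect and reconnect so the engine picks up the new declaration
CONNECT 'localhost:mydb.fdb' USER 'SYSDBA' PASSWORD 'masterkey';
CREATE INDEX M_ADSN_STRING_DATA_AD_DIG
ON M_ADSN_STRING_DATA
COMPUTED BY (F_DIGITS(ATTRIBUTEDATA));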

CSV file data into a PostgreSQL table

I am trying to create a database for MovieLens (http://grouplens.org/datasets/movielens/). We've got movies and ratings. Movies have multiple genres, so I split those out into a separate table since it's a 1:many relationship. There's a many:many relationship as well: users to movies. I need to be able to query this table multiple ways.
So I created:
CREATE TABLE genre (
genre_id serial NOT NULL,
genre_name char(20) DEFAULT NULL,
PRIMARY KEY (genre_id)
);
INSERT INTO genre VALUES
(1,'Action'),(2,'Adventure'),(3,'Animation'),(4,'Children''s'),(5,'Comedy'),(6,'Crime'),
(7,'Documentary'),(8,'Drama'),(9,'Fantasy'),(10,'Film-Noir'),(11,'Horror'),(12,'Musical'),
(13,'Mystery'),(14,'Romance'),(15,'Sci-Fi'),(16,'Thriller'),(17,'War'),(18,'Western');
CREATE TABLE movie (
movie_id int NOT NULL DEFAULT '0',
movie_name char(75) DEFAULT NULL,
movie_year smallint DEFAULT NULL,
PRIMARY KEY (movie_id)
);
CREATE TABLE moviegenre (
movie_id int NOT NULL DEFAULT '0',
genre_id smallint NOT NULL DEFAULT '0',
PRIMARY KEY (movie_id, genre_id)
);
I don't know how to import my movies.csv, which has the columns movie_id, movie_name and movie_genre. For example, the first row is (1;Toy Story (1995);Animation|Children's|Comedy).
If I were to INSERT manually, it would look like:
INSERT INTO moviegenre VALUES (1,3),(1,4),(1,5)
Because 3 is Animation, 4 is Children's and 5 is Comedy.
How can I import the whole data set this way?
You should first create a table that can ingest the data from the CSV file:
CREATE TABLE movies_csv (
movie_id integer,
movie_name varchar,
movie_genre varchar
);
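The rows in the question are semicolon-delimited, so loading the file might look like this (the path is a placeholder; a server-side COPY reads a path on the database server, otherwise use psql's \copy):
COPY movies_csv FROM '/path/to/movies.csv' WITH (FORMAT csv, DELIMITER ';');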
Note that in SQL string literals any single quotes (Children's) must be doubled (Children''s), as in the genre INSERT above; data loaded with COPY in CSV format needs no such escaping. Once the data is in this staging table you can copy it over to the movie table, which should have the following structure:
CREATE TABLE movie (
movie_id integer, -- A primary key has implicit NOT NULL and should not have default
movie_name varchar NOT NULL, -- Movie should have a name, varchar more flexible
movie_year integer, -- Regular integer is more efficient
PRIMARY KEY (movie_id)
);
Sanitize your other tables likewise.
Now copy the data over, extracting the unadorned name and the year from the CSV name:
INSERT INTO movie (movie_id, movie_name, movie_year)
SELECT movie_id, parts[1], parts[2]::integer
FROM movies_csv, regexp_matches(movie_name, '([[:ascii:]]*)\s\(([\d]*)\)$') p(parts);
Here the regular expression says:
([[:ascii:]]*) - Capture all characters until the matches below
\s - Read past a space
\( - Read past an opening parenthesis
([\d]*) - Capture any digits
\) - Read past a closing parenthesis
$ - Match from the end of the string
So on input "Die Hard 17 (John lives forever) (2074)" it creates a string array with {'Die Hard 17 (John lives forever)', '2074'}. The scanning has to be from the end $, assuming all movie titles end with the year of publication in parentheses, in order to preserve parentheses and numbers in movie titles.
Now you can work on the movie genres. You have to split the string on the bar | using the regexp_split_to_table() function and then join to the genre table on the genre name:
INSERT INTO moviegenre
SELECT movie_id, genre_id
FROM movies_csv, regexp_split_to_table(movie_genre, '\|') p(genre) -- escape the |
JOIN genre ON genre.genre_name = p.genre;
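A quick sanity check once both loads have run (illustrative):
SELECT m.movie_name, g.genre_name
FROM movie m
JOIN moviegenre mg ON mg.movie_id = m.movie_id
JOIN genre g ON g.genre_id = mg.genre_id
ORDER BY m.movie_id
LIMIT 5;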
After all is done and dusted you can drop the movies_csv staging table.

an empty row with null-like values in not-null field

I'm using PostgreSQL 9.0 beta 4.
After inserting a lot of data into a partitioned table, I found a weird thing: when I query the table, I can see an empty row with null-like values in NOT NULL fields.
In the query result (screenshot omitted), the 689th row is empty. The first three fields (stid, d, ticker) compose the primary key, so they should not be null. The query I used is this:
select * from st_daily2 where stid=267408 order by d
I can even do the group by on this data.
select stid, date_trunc('month', d) ym, count(*) from st_daily2
where stid=267408 group by stid, date_trunc('month', d)
The GROUP BY result still has the empty row; here the first row is empty.
But if I query for rows where stid or d is null, it returns nothing.
Is this a bug in PostgreSQL 9.0 beta 4, or some data corruption?
EDIT:
I've added my table definitions.
CREATE TABLE st_daily
(
stid integer NOT NULL,
d date NOT NULL,
ticker character varying(15) NOT NULL,
mp integer NOT NULL,
settlep double precision NOT NULL,
prft integer NOT NULL,
atr20 double precision NOT NULL,
upd timestamp with time zone,
ntrds double precision
)
WITH (
OIDS=FALSE
);
CREATE TABLE st_daily2
(
CONSTRAINT st_daily2_pk PRIMARY KEY (stid, d, ticker),
CONSTRAINT st_daily2_strgs_fk FOREIGN KEY (stid)
REFERENCES strgs (stid) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT st_daily2_ck CHECK (stid >= 200000 AND stid < 300000)
)
INHERITS (st_daily)
WITH (
OIDS=FALSE
);
The data in this table is simulation results. Multiple multithreaded simulation engines written in C# insert data into the database using Npgsql.
psql also shows the empty row.
You'd better file a report at http://www.postgresql.org/support/submitbug
Some questions:
Could you show us the table definitions and constraints for the partitions?
How did you load your data?
Do you get the same result when using another tool, like psql?
The answer to your problem may very well lie in your first sentence:
I'm using PostgreSQL 9.0 beta 4.
Why would you do that? Upgrade to a stable release, preferably the latest point release of the current version.
That is 9.1.4 as of today.
I got to the same point: "what in the heck is that blank value?"
No, it's not a NULL, it's -infinity.
To filter for such a row use:
WHERE
    CASE WHEN mytestcolumn = '-infinity'::timestamp
           OR mytestcolumn = 'infinity'::timestamp
         THEN NULL ELSE mytestcolumn END IS NULL
instead of:
WHERE mytestcolumn IS NULL
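A minimal reproduction of the effect (hypothetical table, for illustration only):
CREATE TEMP TABLE demo (d timestamp NOT NULL);
INSERT INTO demo VALUES ('-infinity'), (now());
SELECT count(*) FROM demo WHERE d IS NULL;                   -- 0: the column is NOT NULL, no NULLs here
SELECT count(*) FROM demo WHERE d = '-infinity'::timestamp;  -- 1: the 'blank' row is -infinity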