Scala Regex ReplaceAll: How to Replace All Groups?

The following regex replace works beautifully:
var line = "PRIMARY INDEX XPKDLRSRC_PMT_CLMPMT ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );"
println(line.replaceAll("""[UNIQUE\s]{0,1}PRIMARY INDEX [^\s]* \(""", "PRIMARY KEY ("))
It returns: PRIMARY KEY ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );
The point of the first group, [UNIQUE\s], was to take care of the following as well:
line = "UNIQUE PRIMARY INDEX XPKDLRSRC_PMT_CLMPMT ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );"
println(line.replaceAll("""[UNIQUE\s]{0,1}PRIMARY INDEX [^\s]* \(""", "PRIMARY KEY ("))
But the word UNIQUE does not get replaced, and I end up with:
UNIQUEPRIMARY KEY ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );
When I expected:
PRIMARY KEY ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );
How do I get all groups in the regex replaced by a string?

[UNIQUE\s] is a character class: it matches a single character that is any one of the enclosed characters (U, N, I, Q, E or whitespace), not the word UNIQUE. For what you need, replacing it with (?:UNIQUE\s+)? should do.
val line = "UNIQUE PRIMARY INDEX XPKDLRSRC_PMT_CLMPMT ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );"
line.replaceAll("""(?:UNIQUE\s+)?PRIMARY INDEX [^\s]* \(""", "PRIMARY KEY (")
// res1: String = PRIMARY KEY ( DLRSRC_PMT_ID ,DLRSRC_PMT_CLMPMT_ID );
(?:regex) denotes a non-capturing group, and appending a ? to the group makes it an optional match.

Related

Updating key constraints on multiple records simultaneously

We have a table with a unique key which gets updated by ‘aging’ older records, as mentioned by @Tony O’Hagan here.
The table looks as follows:
-- auto-generated definition
create table abc
(
    key uuid not null,
    hash text not null,
    age integer not null,
    value varchar(50),
    constraint abc_pkey
        primary key (key, age)
);
We can simulate an ‘aged’ record with the following dummy data:
INSERT INTO public.abc (key, hash, age, value) VALUES ('bec619bb-451c-49d8-b555-4d16e1f724fb', 'asdf', 0, '1');
INSERT INTO public.abc (key, hash, age, value) VALUES ('bec619bb-451c-49d8-b555-4d16e1f724fb', 'asdf', 1, '2');
INSERT INTO public.abc (key, hash, age, value) VALUES ('bec619bb-451c-49d8-b555-4d16e1f724fb', 'asdf', 2, '3');
When I want to add a new record, I must first ‘age’ the older records before inserting a new record with age=0. However, I get the following error message when I run the query below:
[23505] ERROR: duplicate key value violates unique constraint "abc_pkey" Detail: Key (key, age)=(bec619bb-451c-49d8-b555-4d16e1f724fb, 2) already exists.
UPDATE abc
SET age = age + 1
WHERE key IN (
    'bec619bb-451c-49d8-b555-4d16e1f724fb'
);
How can I update/age these records?
We can defer the constraint checks with the command
SET CONSTRAINTS ALL DEFERRED
which lets us run our update
UPDATE public.abc SET age = age + 1;
3 rows affected
We can then restore immediate checking with
SET CONSTRAINTS ALL IMMEDIATE
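Note that SET CONSTRAINTS only affects constraints that were declared DEFERRABLE; a PRIMARY KEY constraint is NOT DEFERRABLE by default. If the deferral has no effect, the constraint may first need to be recreated as deferrable, along these lines (a sketch):
ALTER TABLE public.abc
    DROP CONSTRAINT abc_pkey,
    ADD CONSTRAINT abc_pkey PRIMARY KEY (key, age) DEFERRABLE INITIALLY IMMEDIATE;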

Kafka/KsqlDb: Why is PRIMARY KEY appending chars?

I intend to create a TABLE called WEB_TICKETS where the PRIMARY KEY is equal to the key->ID value. For some reason, when I run the CREATE TABLE instruction the PRIMARY KEY value is prefixed with the chars 'J0' - why is this happening?
KsqlDb Statements
These work as expected
CREATE STREAM STREAM_WEB_TICKETS (
ID_TICKET STRUCT<ID STRING> KEY
)
WITH (KAFKA_TOPIC='web.mongodb.tickets', FORMAT='AVRO');
CREATE STREAM WEB_TICKETS_REKEYED
WITH (KAFKA_TOPIC='web_tickets_by_id') AS
SELECT *
FROM STREAM_WEB_TICKETS
PARTITION BY ID_TICKET->ID;
PRINT 'web_tickets_by_id' FROM BEGINNING LIMIT 1;
key: 5d0c2416b326fe00515408b8
The following successfully creates the table but the PRIMARY KEY value isn't what I expect:
CREATE TABLE web_tickets (
id_pk STRING PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', VALUE_FORMAT = 'AVRO');
select id_pk from web_tickets EMIT CHANGES LIMIT 1;
|ID_PK|
|J05d0c2416b326fe00515408b8
As you can see, the ID_PK value has the characters J0 prepended to it. Why is this?
It appears as though I wasn't properly setting the KEY FORMAT. The rekeyed stream was created with FORMAT='AVRO', so its key is Avro-serialized; declaring only VALUE_FORMAT='AVRO' on the table presumably left the key being read with the default KAFKA format, which surfaces the Avro header bytes as stray characters. The following command, which sets FORMAT for both key and value, produces the expected result.
CREATE TABLE web_tickets_test_2 (
id_pk VARCHAR PRIMARY KEY
)
WITH (KAFKA_TOPIC = 'web_tickets_by_id', FORMAT = 'AVRO');

Why does PostgreSQL report a duplicate key when the key does not exist?

When I am inserting data into PostgreSQL (9.6), it throws this error:
ERROR: duplicate key value violates unique constraint "book_intial_name_isbn_isbn10_key"
DETAIL: Key (name, isbn, isbn10)=(三銃士, , ) already exists.
SQL state: 23505
I added a unique constraint on the columns name, isbn, and isbn10. But when I check the destination table, it does not contain the record:
select * from public.book where name like '%三銃%';
How can I fix this? This is my insert SQL:
insert into public.book
select *
from public.book_backup20190405 legacy
where legacy."name" not in
(
select name
from public.book
)
limit 1000
An educated guess: there may be more than one row in the source table book_backup20190405 which has the unique key tuple ('三銃士', '', '').
Since the bulk INSERT INTO ... SELECT ... is transactional, you'll be none the wiser about which row caused the error: all inserted data will have been rolled back when the constraint fails.
You can verify this by running a dupe check on the source table to see if there are duplicates:
SELECT name, isbn, isbn10, COUNT(*)
FROM public.book_backup20190405
WHERE name = '三銃士'
GROUP BY name, isbn, isbn10
HAVING COUNT(*) > 1;
Here's an example of how the source table can be the sole source of duplicates:
http://sqlfiddle.com/#!17/29ba3
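If the dupe check does turn up duplicates, one way around them is to copy only one row per key tuple, for example with DISTINCT ON. A sketch based on the original INSERT (which row survives from each duplicate group is arbitrary here):
INSERT INTO public.book
SELECT DISTINCT ON (name, isbn, isbn10) legacy.*
FROM public.book_backup20190405 legacy
WHERE legacy."name" NOT IN
(
    SELECT name
    FROM public.book
);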

CSV file data into a PostgreSQL table

I am trying to create a database for MovieLens (http://grouplens.org/datasets/movielens/). We've got movies and ratings. Movies have multiple genres, so I split those out into a separate table since it's a 1:many relationship. There's a many:many relationship as well, users to movies. I need to be able to query this table multiple ways.
So I created:
CREATE TABLE genre (
    genre_id serial NOT NULL,
    genre_name char(20) DEFAULT NULL,
    PRIMARY KEY (genre_id)
);
INSERT INTO genre VALUES
(1,'Action'),(2,'Adventure'),(3,'Animation'),(4,'Children\'s'),(5,'Comedy'),(6,'Crime'),
(7,'Documentary'),(8,'Drama'),(9,'Fantasy'),(10,'Film-Noir'),(11,'Horror'),(12,'Musical'),
(13,'Mystery'),(14,'Romance'),(15,'Sci-Fi'),(16,'Thriller'),(17,'War'),(18,'Western');
CREATE TABLE movie (
    movie_id int NOT NULL DEFAULT '0',
    movie_name char(75) DEFAULT NULL,
    movie_year smallint DEFAULT NULL,
    PRIMARY KEY (movie_id)
);
CREATE TABLE moviegenre (
    movie_id int NOT NULL DEFAULT '0',
    genre_id tinyint NOT NULL DEFAULT '0',
    PRIMARY KEY (movie_id, genre_id)
);
I don't know how to import my movies.csv with columns movie_id, movie_name and movie_genre. For example, the first row is (1;Toy Story (1995);Animation|Children's|Comedy).
If I INSERT manually, it should look like:
INSERT INTO moviegenre VALUES (1,3),(1,4),(1,5)
because 3 is Animation, 4 is Children's and 5 is Comedy.
How can I import the whole data set this way?
You should first create a table that can ingest the data from the CSV file:
CREATE TABLE movies_csv (
    movie_id integer,
    movie_name varchar,
    movie_genre varchar
);
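The CSV itself can then be pulled into this staging table with COPY; a sketch, where the file path is a placeholder and the semicolon delimiter is taken from the sample row:
COPY movies_csv FROM '/path/to/movies.csv'
WITH (FORMAT csv, DELIMITER ';');
(From psql you can use \copy instead, which reads the file on the client side.)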
Note that any single quotes (Children's) should be doubled (Children''s). Once the data is in this staging table you can copy the data over to the movie table, which should have the following structure:
CREATE TABLE movie (
    movie_id integer,            -- A primary key is implicitly NOT NULL and should not have a default
    movie_name varchar NOT NULL, -- A movie should have a name; varchar is more flexible
    movie_year integer,          -- A regular integer is more efficient
    PRIMARY KEY (movie_id)
);
Sanitize your other tables likewise.
Now copy the data over, extracting the unadorned name and the year from the CSV name:
INSERT INTO movie (movie_id, movie_name, movie_year)
SELECT movie_id, parts[1], parts[2]::integer
FROM movies_csv, regexp_matches(movie_name, '([[:ascii:]]*)\s\(([\d]*)\)$') p(parts);
Here the regular expression says:
([[:ascii:]]*) - Capture all characters until the matches below
\s - Read past a space
\( - Read past an opening parenthesis
([\d]*) - Capture any digits
\) - Read past a closing parenthesis
$ - Match from the end of the string
So on input "Die Hard 17 (John lives forever) (2074)" it creates a string array with {'Die Hard 17 (John lives forever)', '2074'}. The scanning has to be from the end $, assuming all movie titles end with the year of publication in parentheses, in order to preserve parentheses and numbers in movie titles.
Now you can work on the movie genres. You have to split the string on the bar | using the regexp_split_to_table() function and then join to the genre table on the genre name:
INSERT INTO moviegenre
SELECT movie_id, genre_id
FROM movies_csv, regexp_split_to_table(movie_genre, '\|') p(genre) -- escape the |
JOIN genre ON genre.genre_name = p.genre;
After all is done and dusted you can delete the movies_csv table.
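That final cleanup is a single statement:
DROP TABLE movies_csv;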

Why is SQLite not incrementing my primary key?

Here is my table:
sqlStmt = [[NSString stringWithFormat:@"Create Table %@ (recordNo INTEGER PRIMARY KEY AUTOINCREMENT, noOfPlayer INTEGER, smallBlindAmt INTEGER, bigBlindAmt INTEGER, startingChips INTEGER, roundTimer INTEGER)", SETTINGS_TABLE] cStringUsingEncoding:NSUTF8StringEncoding];
Insert query:
sqlStmt = [[NSString stringWithFormat:@"insert into %@ values (NULL, %d, %d, %d, %d, %d)", strTableName, noOfPlayers, SmallAmt, BigAmt, StartingChips, RoundTime] UTF8String];
Get the last record's id:
lastRecordNo = sqlite3_last_insert_rowid(dbObj);
lastRecordNo is always 0, and I am unable to insert more rows because it gives the error "primary key must be unique".
What is the problem here?
How do I fetch the id of the last record, which is the autoincrement primary key?
Is there any problem in my insert query?
Can anyone show me example create, insert and select queries where the primary key is an autoincrement?
Don't explicitly set the primary key in your insert statement; let the system assign it using auto-increment.
Change the insert statement to:
insert into %@ (noOfPlayer, smallBlindAmt, bigBlindAmt, startingChips, roundTimer) values (%d, %d, %d, %d, %d)
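For reference, here is a minimal sketch of the whole cycle in plain SQLite (the table name Settings stands in for the SETTINGS_TABLE macro, and the inserted values are placeholders):
CREATE TABLE Settings (
    recordNo INTEGER PRIMARY KEY AUTOINCREMENT,
    noOfPlayer INTEGER,
    smallBlindAmt INTEGER,
    bigBlindAmt INTEGER,
    startingChips INTEGER,
    roundTimer INTEGER
);
-- Omit recordNo entirely so SQLite assigns the next value itself
INSERT INTO Settings (noOfPlayer, smallBlindAmt, bigBlindAmt, startingChips, roundTimer)
VALUES (6, 10, 20, 1000, 30);
-- SQL equivalent of sqlite3_last_insert_rowid() in the C API
SELECT last_insert_rowid();
SELECT * FROM Settings WHERE recordNo = last_insert_rowid();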
Check your table for records. You can make use of sqlite3_mprintf; see here for more details.
I don't really know why it doesn't work; you might be reconnecting or something.
I would also suggest not relying on the last row id, as another process may insert rows in parallel and change it.