Slick 3.0.0 Delete all Entries From a Table - sql-delete

The following query:
val resultValue = Await.result(db.run(MyTable.myTableItems.delete), 2.seconds)
Strangely, when I run it, the above query gives me the maximum value of the Id column in that particular table. Why is that? And what is the Slick equivalent of deleting all entries from a table?

The result from delete is the number of rows affected by the delete. If your id starts from 1, it could be a coincidence that the number of rows affected happens to be the same as the largest id in the table.
Calling delete on a table query is the way to delete all entries from a table. For example, if you have
val coffees = TableQuery[Coffees]
...then coffees.delete is the action to remove all rows.
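For reference, the SQL that Slick emits for an unfiltered delete is just a plain DELETE with no WHERE clause (the table and column names below follow the Coffees example and may differ in your schema):
-- coffees.delete compiles to roughly:
delete from "COFFEES"
-- while a filtered delete such as coffees.filter(_.price > 10.0).delete adds a WHERE clause:
delete from "COFFEES" where "PRICE" > 10.0
In both cases the Int you get back from db.run is the number of rows that the DELETE removed, which is why it can coincide with the largest id.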
Useful link: Deleting, in the Slick Manual.

Related

PostgreSQL: Return auto-generated ids from COPY FROM insertion

I have a non-empty PostgreSQL table with a GENERATED ALWAYS AS IDENTITY column id. I do a bulk insert with the C++ binding pqxx::stream_to, which I'm assuming uses COPY FROM. My problem is that I want to know the ids of the newly created rows, but COPY FROM has no RETURNING clause. I see several possible solutions, but I'm not sure if any of them is good, or which one is the least bad:
Provide the ids manually through COPY FROM, taking care to give the values which the identity sequence would have provided, then afterwards synchronize the sequence with setval(...).
First stream the data to a temp table with a custom index column for ordering, then do something like
INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx RETURNING foo.id
and depend on the fact that the identity sequence produces ascending numbers to correlate them with ttFoo.idx (I cannot also do RETURNING ttFoo.idx, because RETURNING only sees the inserted row, which doesn't contain idx).
Query the current value of the identity sequence prior to insertion, then check afterwards which rows are new.
I would assume that this is a common situation, yet I don't see an obviously correct solution. What do you recommend?
You can find out which rows have been affected by your current transaction using the system columns. The xmin column contains the ID of the inserting transaction, so to return the id values you just copied, you could:
BEGIN;
COPY foo(col1,col2) FROM STDIN;
SELECT id FROM foo
WHERE xmin::text = (txid_current() % (2^32)::bigint)::text
ORDER BY id;
COMMIT;
The WHERE clause comes from this answer, which explains the reasoning behind it.
I don't think there's any way to optimise this with an index, so it might be too slow on a large table. If so, I think your second option would be the way to go, i.e. stream into a temp table and INSERT ... RETURNING.
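A fleshed-out sketch of that temp-table route (the column types are assumptions, and, as the question notes, correlating the RETURNING output with idx relies on the sequence handing out ascending values):
CREATE TEMP TABLE ttFoo (idx serial, col1 text, col2 text);

COPY ttFoo (col1, col2) FROM STDIN;           -- bulk-load into the staging table

INSERT INTO foo (col1, col2)
SELECT ttFoo.col1, ttFoo.col2 FROM ttFoo
ORDER BY ttFoo.idx
RETURNING foo.id;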
I think you can create the id column with type uuid.
First generate random ids on the client, then bulk insert the rows with those ids; that way you will not need to return the ids from the database.
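A sketch of that idea (the column names are assumptions; gen_random_uuid() is built in since PostgreSQL 13, or available via the pgcrypto extension on older versions):
CREATE TABLE foo (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    col1 text,
    col2 text
);

-- Generate the uuids on the client, include them in the COPY column list,
-- and every id is known up front without any RETURNING:
COPY foo (id, col1, col2) FROM STDIN;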

Spark Scala Dataframe join and modification

I have a table which has the employee details, and another table, Project, which has the project details and the assigned employee id.
Employee
EmployeeName|Id|Address|Assigned
Joan|101|xxxx|y
Project
ProjectCode|Number of days|Employee
XX1223|24|101
I have a csv file which will load the employee details into the Employee table. While loading the employee details,
I need to identify whether the employee id is present in the Project table:
if the employee id is available in the Project table, insert y into Assigned in the Employee table;
if not, insert n into Assigned in the Employee table.
I have a dataframe for Employee as,
var employeeDF = Employee_TABLE
And,
var employeeAssignedDF = Employee_Join_Project
At the moment, I insert into Employee first, then do the join, and then update Employee again. But I can do
employeeDF.except(employeeAssignedDF)
which will have a minimal number of rows.
Is it possible to change just a few of the dataframe columns?
I want to insert into the table only once, so after I join and do the except I should have all the records that can be inserted into the DB. Is that feasible?
Thanks
You could try this, but I'm not sure whether it will solve your problem:
val newDf = df.withColumn("Assigned", when(CONDITION, "y").otherwise("n"))
You could also use any other expression in place of when(CONDITION, "y").
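For what it's worth, the condition is essentially a left join against the project data. With both DataFrames registered as temp views via createOrReplaceTempView (the view names employee and project are assumptions), the same logic can be expressed in Spark SQL:
SELECT e.EmployeeName, e.Id, e.Address,
       CASE WHEN p.Employee IS NOT NULL THEN 'y' ELSE 'n' END AS Assigned
FROM employee e
LEFT JOIN (SELECT DISTINCT Employee FROM project) p
       ON e.Id = p.Employee
The DISTINCT avoids duplicating an employee row when the same employee appears on several projects.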

Remove Duplicate rows from a large table - PostgreSQL

I want to remove duplicates from a large table having about 1 million rows and increasing every hour. It has no unique id and has ~575 columns, sparsely filled.
The table is 'like' a log table where new entries are appended every hour without a unique timestamp.
The duplicates are only about 1-3%, but I want to remove them anyway ;) Any ideas?
I tried the ctid column (as here), but it's very slow.
The basic idea that works generally well with PostgreSQL is to create an index on the hash of the set of columns as a whole.
Example:
CREATE INDEX index_name ON tablename (md5((tablename.*)::text));
This will work unless there are columns that don't play well with the requirement of immutability (mostly timestamp with time zone because their cast-to-text value is session-dependent).
Once this index is created, duplicates can be found quickly by self-joining with the hash, with a query looking like this:
SELECT t1.ctid, t2.ctid
FROM tablename t1 JOIN tablename t2
ON (md5((t1.*)::text) = md5((t2.*)::text))
WHERE t1.ctid > t2.ctid;
You may also use this index to avoid duplicate rows in the future, rather than periodically de-duplicating them, by making it UNIQUE (duplicate rows would be rejected at INSERT or UPDATE time).
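For completeness, a sketch of both follow-up steps (the index name is an assumption):
-- Remove the duplicates, keeping the copy with the lowest ctid:
DELETE FROM tablename t1
USING tablename t2
WHERE md5((t1.*)::text) = md5((t2.*)::text)
  AND t1.ctid > t2.ctid;

-- Reject future duplicates at INSERT/UPDATE time instead of cleaning up periodically:
CREATE UNIQUE INDEX tablename_md5_uniq
    ON tablename (md5((tablename.*)::text));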

Sybase complains of duplicate insertion where none exists

I have moved some records from my SOURCE table in DB_1 into an ARCHIVE table in another DB_2 (ie. INSERTED the records from SOURCE into ARCHIVE and then DELETED the records from SOURCE.)
My SOURCE table has the following index created as SOURCE_1:
CREATE UNIQUE NONCLUSTERED INDEX SOURCE_1
ON dbo.SOURCE(TRADE_SET_ID, ORDER_ID)
The problem is - when I try to insert the rows back into SOURCE from ARCHIVE, Sybase throws the following error:
Attempt to insert duplicate key row in object 'SOURCE' with unique index 'SOURCE_1'
And, of course, subsequently fails the insertions.
I confirmed that my SOURCE table does not have these duplicates because the following query returned empty:
select * from DB_1.dbo.SOURCE
join DB_2.dbo.ARCHIVE
on DB_1.dbo.SOURCE.TRADE_SET_ID = DB_2.dbo.ARCHIVE.TRADE_SET_ID
AND DB_1.dbo.SOURCE.ORDER_ID = DB_2.dbo.ARCHIVE.ORDER_ID
If the above query returned nothing, then that means I have not violated my unique index constraint on the 2 columns, yet Sybase complains that I have.
Does anyone have any ideas on why this is happening?
If Sybase is anything like SQL Server in this regard (which I'm more familiar with), I would suspect that the index is blocking the insert. Try disabling the index (along with any other indexes or autoincrement columns) on your archive version before copying over to it, then re-enabling. It's probable that Sybase would try to automatically create IDs for the insertions, which would interfere with the existing records.
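A rough sketch of that approach; as far as I know Sybase ASE has no switch to disable an index, so the usual equivalent is to drop it and recreate it afterwards (this assumes ARCHIVE has the same column layout as SOURCE):
-- Drop the unique index, copy the rows back, then rebuild the index:
DROP INDEX SOURCE.SOURCE_1

INSERT INTO DB_1.dbo.SOURCE
SELECT * FROM DB_2.dbo.ARCHIVE

CREATE UNIQUE NONCLUSTERED INDEX SOURCE_1
ON dbo.SOURCE(TRADE_SET_ID, ORDER_ID)
If the CREATE INDEX then fails, the duplicate key pairs really are present in the combined data.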

Insert into table with Identity and foreign key columns

I was trying to insert values from one table to another across two different databases.
My issue is that I have two tables with a relation, and the first table also has an identity column.
e.g. table first(id, Name) and table second(id, address)
Both tables exist with values in one DB, and I am trying to copy the values from this DB to another DB.
When I insert the values into the other DB, the first table will generate values for the Id column by itself, so I then have to link that id to the second table.
How can I do that?
UPDATE: I am using MS SQL Server 2000.
You can use SCOPE_IDENTITY() immediately after your insert in SQL Server 2000, which will give you the last id generated within the current scope, but I'm not sure how that would work with bulk inserts of data.
http://msdn.microsoft.com/en-us/library/ms190315.aspx
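A minimal illustration of that pattern, using the table and column names from the question:
-- SCOPE_IDENTITY() returns the identity value generated by the last insert in this scope:
INSERT INTO first (Name) VALUES ('Joan')
SELECT SCOPE_IDENTITY() AS new_id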
If this were SQL Server 2005 or later I would suggest using the output clause in your insert statement to retrieve the ids just inserted, but that was not available in SQL Server 2000.
If your data contains some column or series of columns which is unique other than the identity column, then you can query your first table based on that series of columns to get the ids and use that to populate your second table.
If the target tables were empty you could use SET IDENTITY_INSERT ON; this would allow you to insert the original values into the identity columns, and you would not have to update the referenced IDs. Of course, if any existing ids could overlap the inserted ids, that is not the solution.
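A sketch of that approach, assuming the source database is reachable as source_db and the target first table is empty:
SET IDENTITY_INSERT dbo.first ON

-- An explicit column list is required when supplying values for the identity column:
INSERT INTO dbo.first (id, Name)
SELECT id, Name FROM source_db.dbo.first

SET IDENTITY_INSERT dbo.first OFF

-- Because the ids are unchanged, second can then be copied as-is
-- (assuming its id column is a plain column, not another identity):
INSERT INTO dbo.second (id, address)
SELECT id, address FROM source_db.dbo.second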
If names in the first table are unique, you could build a mapping between new and old ids and perform an update, something like this:
UPDATE S
SET S.id = F.id
FROM second S
INNER JOIN first_original FO ON FO.id = S.id
INNER JOIN first F ON F.name = FO.name
If names are not unique, then the original ids should be saved in "first" in order to provide a mapping between old and new ids. This can be a temporary new column that is deleted after the ids in "second" have been updated.
Or, as Rich Andrews said, you could use SCOPE_IDENTITY(), but in this case you will have to perform the inserts one by one: declare a cursor on the source table, insert each record, get its new id, and insert it into the "second" table.
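A rough sketch of that cursor-based approach in SQL Server 2000 syntax (source_db and the varchar length are assumptions):
DECLARE @old_id int, @name varchar(100), @new_id int

DECLARE src CURSOR FOR
    SELECT id, Name FROM source_db.dbo.first

OPEN src
FETCH NEXT FROM src INTO @old_id, @name

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert the parent row and capture the id generated in this scope:
    INSERT INTO dbo.first (Name) VALUES (@name)
    SET @new_id = SCOPE_IDENTITY()

    -- Re-point the child rows at the newly generated id:
    INSERT INTO dbo.second (id, address)
    SELECT @new_id, address
    FROM source_db.dbo.second
    WHERE id = @old_id

    FETCH NEXT FROM src INTO @old_id, @name
END

CLOSE src
DEALLOCATE src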