ClickHouse lightweight deletes are not filtered from subsequent queries as expected, only filtered once the asynchronous mutation is complete

I am trying out the new ClickHouse lightweight deletes, which delete rows on the next merge but 'mark' them immediately so that they are not shown in subsequent queries.
The guide I am following is here: https://clickhouse.com/docs/en/sql-reference/statements/delete/
But this doesn't seem to be happening as expected. After the DELETE it takes about 2-3 minutes before the deleted rows stop appearing in my query results. Am I missing something here?
E.g.
I have a CollapsingMergeTree table called 'test' with the following three rows:
Id                       name
syGGJVGETbzKMkayoYYaAg   kieren
wFhZdsdjf1xmcHGqK1CQQf   mike
abfZrhYkiafg7qr9jAwseG   peter
I attempt to run
DELETE FROM test WHERE Id = 'syGGJVGETbzKMkayoYYaAg' SETTINGS allow_experimental_lightweight_delete=1;
-- OR
SET allow_experimental_lightweight_delete=1;
DELETE FROM test WHERE Id = 'syGGJVGETbzKMkayoYYaAg'
However, when I run a SELECT afterwards I still get the deleted record in the results, until about 5 minutes later.
I am using the clickhouse docker image (clickhouse/clickhouse-server) for my tests locally.
Thanks in advance for your help.
--- what did you try ---
I tried the code from the documentation and expected that the deleted rows would be hidden immediately and removed in the background, as the page indicates.
Please let me know if I am doing anything wrong or missing something

DELETE FROM test is asynchronous by default. Not only the background deletion, but the DELETE statement itself as well.
You can use mutations_sync=2:
DELETE FROM test
WHERE Id = 'syGGJVGETbzKMkayoYYaAg'
SETTINGS allow_experimental_lightweight_delete=1, mutations_sync=2;
This makes your delete synchronous.
(This behaviour was changed today, by the way: https://github.com/ClickHouse/ClickHouse/pull/44718)
So currently, when you run DELETE FROM test ... it returns control immediately and does nothing (!) except create a mutation. This mutation asynchronously, in the background, changes the invisible column _deleted for rows matching the WHERE condition; this process can take seconds, minutes, hours, or days. Only after that will your SELECTs stop seeing those rows. After that, eventually OR NEVER, the marked rows will be physically deleted during merges, and this can take months.
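For reference, a minimal sketch of the synchronous variant against the same test table; the system.mutations check at the end is just one way to confirm nothing is still pending, not something the answer above requires:

SET allow_experimental_lightweight_delete = 1;

DELETE FROM test
WHERE Id = 'syGGJVGETbzKMkayoYYaAg'
SETTINGS mutations_sync = 2;  -- block until the mutation is applied (2 = wait for all replicas)

-- optional: confirm no mutation is still pending for this table
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'test' AND NOT is_done;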

Related

Is there any way of simulating an autocommit in functions in Postgres?

I know you can't control transactions in functions or procedures, but I'm wondering if there's something like that or some alternative.
The problem: I have a very expensive function that turns things like a customer id into a nice html report. Trouble is, it takes seconds, so I've put something into the function that checks a cache to see if a pre-rendered report exists, returning it if it does; if it doesn't, it renders the report and adds it to the cache afterwards, so it will only ever render things once.
Now, given most things will never change, I sort of want to do this across everything. Given the time involved, it will probably take about a year to run, which is actually OK since this system has to run for ten. Trouble is, I don't want it to lock anything on the database, so I sort of want it to trickle along, doing one at a time and committing immediately.
I investigated pg_cron, because that seemed an option, but the version of Aurora I am using doesn't support it. Any ideas how I'd do this inside the database?
By all means, don't code that as a function running inside the database. It is fine to do the calculations in the database, but generating a report and iterating over customers belong into client code. That way, committing each report is not a problem.
Add a text column to the customer table to hold the html report.
Put trigger(s) on the table(s) whose content influences the html report, refreshing the report column whenever that content changes (a sketch follows below).
This gives you instant retrieval and only (re)calculates the report when needed.
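A minimal sketch of that trigger approach, assuming the report column lives on the customer table itself and reusing the my_report_proc(id) function from the snippet below; the trigger name and the PostgreSQL 11+ EXECUTE FUNCTION syntax are my own choices:

ALTER TABLE customer ADD COLUMN report text;

CREATE OR REPLACE FUNCTION refresh_customer_report() RETURNS trigger AS $$
BEGIN
    -- re-render the report whenever the row is inserted or changed
    -- (simplified: assumes my_report_proc can render from data visible at this point)
    NEW.report := my_report_proc(NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customer_report_refresh
    BEFORE INSERT OR UPDATE ON customer
    FOR EACH ROW
    EXECUTE FUNCTION refresh_customer_report();

If the data that feeds the report lives in other tables, the same idea applies with triggers on those tables updating customer.report instead.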
Or if data is stable:
create table customer_report (
    customer_id int not null primary key,
    report text not null
);

insert into customer_report
select id, my_report_proc(id)
from customer;
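Either way, retrieval is then a plain primary-key lookup (the id here is made up):

select report
from customer_report
where customer_id = 42;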

User is getting "You tried to lock table '' while opening it, but the table cannot be locked because it is currently in use."

Getting this error, with no table listed, when the user logs in to the order system.
The query is, roughly
SELECT FormParts.*, qrySomeOtherColumns.*, CBool(0) AS ObjMissing
INTO Temp_SessionFormParts
FROM FormParts
INNER JOIN qrySomeOtherColumns ON FormParts.ID = qrySomeOtherColumns.FormPartID
WHERE EmpId = EmpID();
The data is coming from linked tables with a Postgres back end.
The data is going into the local table "Temp_SessionFormParts"
This is a local table in the front end, so no one else should have access to it.
Everyone else has a copy of the same database, but no one else gets the error.
Where it gets weird is that she only gets the error when she starts the app with our standard shortcut that copies the latest version.
copy "\\NASdiskstation\Install\Deploy\pgFrontEnd.accde" "C:\Merge Documents\"
start "" MSAccess.exe "C:\Merge Documents\pgFrontEnd.accde"
But if she starts Access from the Start Menu, and then opens the same database from the selections, it works fine.
Seems to me that those two routes should give the same result since she's already copied the latest to C:.
I went ahead and posted this question, since I had it written up, because I've seen this error a few times. So this may be helpful.
Normally, you can't SELECT INTO an existing table. But Access detects that situation and helpfully deletes the table before executing the query.
Somehow this was failing.
Running
DROP TABLE Temp_SessionFormParts
before the SELECT ... INTO fixed the issue.
No idea why it sometimes didn't work for some people or why running Access directly changed that.
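Put together, the fix amounts to running something like the following before the make-table query, as two separate statements since Access won't batch them (error handling for the first run, when the temp table doesn't exist yet, is omitted):

DROP TABLE Temp_SessionFormParts;

SELECT FormParts.*, qrySomeOtherColumns.*, CBool(0) AS ObjMissing
INTO Temp_SessionFormParts
FROM FormParts
INNER JOIN qrySomeOtherColumns ON FormParts.ID = qrySomeOtherColumns.FormPartID
WHERE EmpId = EmpID();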

Simple query in controller not getting updated results

I have a very strange issue where I am querying the database for the last result, however it is showing the last result from some time in the past. This database is updated each hour via rake tasks, however it doesn't seem that the query is being rerun; instead it is somehow cached? It's only happening on Heroku and not when I run it from my local, which suggests that it may have something to do with my environment config. To make things stranger, if I reset the deployed database and push the content back into it, the query pulls the right data, however it then becomes stale again... Any ideas? Is it caching on the database?
Update: The query is input = IonitTankInput.last
It works just fine on my local, however not on production, where it instead grabs the last record the database was aware of days ago and doesn't recheck for the latest new "last" record added via rake tasks... If I run the query directly on the Heroku rails console it returns the correct newly added last record. No idea what's causing this.
To make things stranger, this query is working on a different page as intended.
UPDATE: OK, I fixed it, however the solution makes no sense unless someone can explain it. I changed the action from:
def show
  @tank = IonitTank.find(params[:id])
  if something
    return false
  end
  @input = @tank.ionit_tank_inputs.last
end
To:
def show
  @tank = IonitTank.find(params[:id])
  @input = @tank.ionit_tank_inputs.last
  if something
    blah blah
  end
end
What is odd is that the if statement wasn't returning false, so I know @input was being defined, but for some reason it needed to be defined before the if statement is even checked...

atk4.2 form submit: how to get the new record id before insert to pass in arguments

I am referencing the 2-step newsletter example at http://agiletoolkit.org/codepad/newsletter. I modified the example into a 4-step process. The following page class is step 1, and it works to insert a new record and get the new record id. The problem is I don't want to insert this record into the database until the final step. I am not sure how to retrieve this id without using the save() function. Any ideas would be helpful.
class page_Ssp_Step1 extends Page {
    function init(){
        parent::init();
        $p = $this;
        $m = $p->add('Model_Publishers');
        $form = $p->add('Form');
        $form->setModel($m);
        $form->addSubmit();
        if($form->isSubmitted()){
            $m->save();                               // inserts new record into db
            $new_id = $m->get('id');                  // gets id of new record
            $this->api->memorize('new_id', $new_id);  // carries id across pages
            $this->js()->atk4_load($this->api->url('./Step2'))->execute();
        }
    }
}
There are several ways you could do this: using atk4 functionality, using MySQL transactions, or as part of the design of your application.
1) Manage the id column yourself
I assume you are using an auto-increment column in MySQL, so one option would be to not make this column auto-increment but use a sequence instead: select the next value, save it in your memorize statement, and add it to the model as a defaultValue using ->defaultValue($this->api->recall('new_id')).
2) Turn off autocommit and create a transaction around the inserts
I'm from an Oracle background rather than MySQL, but MySQL also allows you to wrap several statements in a transaction which either saves everything or rolls back. So this would also be an option: if you can create a transaction, you might still be able to save, but only a complete transaction populating several tables would be committed once all steps complete.
In atk 4.1, the DBlite/mysql.php class contains some functions for transaction support, but the documentation on agiletoolkit.org is incomplete and it's unclear how you change the dbConnect being used, as currently you connect to a database in lib/Frontend.php using $this->dbConnect() and there is no option to pass a parameter.
It looks like you may be able to do the needed transaction commands using this at the start of the first page:
$this->api->db->query('SET AUTOCOMMIT=0');
$this->api->db->query('START TRANSACTION');
then do inserts in various pages as needed. Note that everything done will be contained in a transaction, so if the user doesn't complete the process, nothing will be saved.
On the last insert,
$this->api->db->query('COMMIT');
Then, if you want to, turn autocommit back on so each SQL statement is committed:
$this->api->db->query('SET AUTOCOMMIT=1');
I haven't tried this, but hopefully that helps.
3) use beforeInsert or afterInsert
You can also look at overriding the beforeInsert function on your model, which has an array of the data, but I think if your id is an auto-increment column, it won't have a value until the afterInsert function, which receives the inserted id as a parameter.
4) use a status to indicate complete record
Finally, you could use a status column on your record to indicate it is only at the first stage, and only update it to a complete status when the final stage is completed (see the sketch after option 5). Then you can have a housekeeping job that runs at intervals to remove records that didn't complete all stages. Any grid or CRUD where you display these records would be limited with AddCondition('status','C') in the model, or added in the page, so that incomplete ones never get shown.
5) Manage the transaction as non-SQL
As suggested by Romans, you could store the result of the form processing in session variables instead of writing directly to the database, and then insert everything with SQL once the last step is completed.
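For option 4, a rough sketch of the status column in plain MySQL; the publishers table name follows Model_Publishers above, while the status values, the created_at column, and the :new_id placeholder are made up for illustration:

ALTER TABLE publishers ADD COLUMN status CHAR(1) NOT NULL DEFAULT 'P';  -- P = pending, C = complete

-- the final step marks the record complete
UPDATE publishers SET status = 'C' WHERE id = :new_id;

-- housekeeping job: remove records that never finished all steps
DELETE FROM publishers
WHERE status = 'P'
  AND created_at < NOW() - INTERVAL 1 DAY;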

Sybase select variable logic

Ok, I have a question relating to an issue I've previously had. I know how to fix it, but we are having problems trying to reproduce the error.
We have a series of procedures that create records based on other records. The records are linked to the primary record by way of a link_id. In a procedure that grabs this link_id, the query is
select @p_link_id = id -- of the parent
from table
where thingy_id = (blah)
Now, there are multiple rows in the table for the activity, and some can be cancelled. The code I have doesn't exclude cancelled rows in the select statement, so if there are previously cancelled rows, those ids will appear in the select. There is always exactly one 'open' record, which is what gets selected if I exclude cancelled rows (append where status != 'C').
That solves the issue. However, I need to be able to reproduce the issue in our development environment.
I've gone through a process where I've entered a whole heap of data, opening, cancelling, etc. to try and get this select statement to return an invalid id. However, whenever I run the select, the ids come back in order (sequence generated), yet in the case where this error occurred, the select statement returned what seems to be the first value into the variable.
For example:
ID   Status
1    Cancelled
2    Cancelled
3    Cancelled
4    Open
Given the above, if I do a select for the ID I want, I want to get '4'. In the error, the result was 1. However, even if I enter 10 cancelled records, I still get the last one in the select.
In Oracle, I know that if you select into a variable and more than one record is returned, you get an error (I think). Sybase apparently can assign multiple values into a variable without erroring.
I'm thinking that either it's something to do with how the data is selected from the table, where the ids don't come back in ascending order without a sort order, or there's a db option where a select into a variable will keep the first or last value queried.
Edit: it looks like we can reproduce this error by rolling back stored procedure changes. However, the procs don't go anywhere near this link_id column. Is it possible that changes to the database architecture could break an index or something?
If more than one row is returned, the value that is stored will be the last value in the list, according to this.
If you haven't specified an order for retrieval via ORDER BY, then the order returned will be at the convenience of the database engine. It may very well vary by the database instance. It may be in the order created, or even appear "random" because of where the data is placed within the database block structure.
The moral of the story:
1. Always make singleton SELECTs return a single row.
2. When #1 can't be done, use an ORDER BY to make sure the row you care about comes last.
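Applied to the query from the question, that means something like this sketch, where (blah) still stands for whatever the procedure actually filters on:

select @p_link_id = id
from table
where thingy_id = (blah)
  and status != 'C'  -- skip cancelled rows
order by id asc      -- if several rows still match, the last (highest) id is the one kept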