Applying updates to a KDB table in thread safe manner in C - kdb

I need to update a KDB table with new/updated/deleted rows while it is being read by other threads. Since writing to K structures while other threads access will not be thread safe, the only way I can think of is to clone the whole table and apply new changes to that. Even to do that, I need to first clone the table, then find a way to insert/update/delete rows from it.
I'd like to know if there are functions in C to:
1. Clone the whole table
2. Delete existing rows
3. Insert new rows easily
4. Update existing rows
Appreciate suggestions on new approaches to the same problem as well.

Based on the comments...
You need to do a set of operations on the KDB database "atomically"
You don't have "control" of the database, so you can't set functions (though you don't actually need to be an admin to do this, but that's a different story)
You have a separate C process that is connecting to the database to do the operations you require. (Given you said you don't have "access" to the database as admin, you can't get KDB to load your C binary to use within-process anyway).
Firstly I'm going to assume you know how to connect to KDB+ and issue via the C API (found here).
All you need to do then is to concatenate your "atomic" operation into a set of statements that you are going to issue in one call from C. For example say you want to update a table and then delete some entry. This is what your call might look like:
{update name:`me from table where name=`you; delete from `table where name=`other;}[]
(Caution: this is just a dummy example, I've assumed your table is in-memory so that the delete operation here would work just fine, and not saved to disk, etc. If you need specific help with the actual statements you require in your use case then that's a different question for this forum).
Notice that this is an anonymous function that will get called immediately on issue ([]). There is the assumption that your operations within the function will succeed. Again, if you need actual q query help it's a different question for this forum.
Even if your KDB database is multithreaded (started with -s or negative port number), it will not let you update global variables inside a peach thread. Therefore your operation should work just fine. But just in case there's something else that could interfere with your new anonymous function, you can wrap the function with protected evaluation.

Related

TYPO3 backend workflow when avoiding the storage of data in intermediate table

I have a situation as described in the ExtbaseFluid book:
I would like to store information in the intermediate table which is not recommended at all.
Here is a cite from the warning box of the above linked book chapter:
Do not store data in the Intermediate Table that concern the Domain. Though TYPO3 supports this (especially in combination with Inline Relational Record Editing (IRRE) but this is always a sign that further improvements can be made to your Domain Model. Intermediate Tables are and should always be tools for storing relationships and nothing else.
Let’s say you want to store a CD with its containing music tracks: CD -- m:n (Intermediate Table) -- Song. The track number may be stored in a field of the Intermediate Table. However, the track should be stored as a separate domain object, and the connection be realized as CD -- 1:n -- Track -- n:1 -- Song.
So I want not to do what is not recommended. But thinking about the workflow for the editor that results of the recommended solution rises a few question for me.
To stay with this example I would need the following tables:
tx_extname_domain_model_cd
tx_extname_domain_model_cd_track_mm
tx_extname_domain_model_track (which holds the track number)
tx_extname_domain_model_track_song_mm
tx_extname_domain_model_song
From what I know this would end in the situation that the editor would need to create following records:
one record for the cd
one record for the song
now the editor can create one record for the track.
There the track number is added.
Furthermore the cd record needs to be assigned as well as the song.
So here are my questions:
I guess this workflow cannot be improved with some (to me unknown) TCA setup?
An editor cannot directly reach the song when the cd record is opened?
Instead first she / he has to open the track record and can from there navigate to the song?
Is it really that bad to store data in the intermediate table? The TYPO3 table sys_file_reference does the same!? But I wonder how those data could be shown (because IRRE is not possible because it shall only be used for 1:n relations (source).
The question you have to ask yourself is: Do I want to do coding by the book, or do I want to create a pragmatic approach to solve a customer's problem?
In this specific case the additional problem is, that the people who originally invented Extbase had a quite sophisticated and academic approach, but when it comes to a pragmatic use and performance, they were blocked by their own rules and stuck with coding by the book.
Especially this example and the warning message shows a way of thinking that was one of the reasons, why I never actually used Extbase but went for Core-API methods to create performant and pragmatic queries to get the desired result sets. Now that we've got Doctrine under the hood, this works like a charm even with quite exotic DB flavors.
Of course intermediate tables are a good idea and of course those intermediate tables can and should be enriched with additional data fields, that do not require a 3rd, 4th or nth table to store i.e. a simple set of dropdown options, since this can easily be handled with attributes configured in TCA, as it is shown here: https://docs.typo3.org/m/typo3/reference-tca/master/en-us/ColumnsConfig/Type/Inline/Examples.html
sys_file_reference is the most prominent example since it provides exactly that kind of additional information that should not be pumped into additional tables - and guess what, the TYPO3 core does not make use of a single line of Extbase code to deal with that data or almost any other data of the core tables.
To answer your last question: Take a look at the good old IRRE Tutorial to get a clue how to do m:n connections with intermediate inline tables.
https://docs.typo3.org/typo3cms/extensions/irre_tutorial/0.4.0/Manual/Index.html#intermediate-tables-for-m-n-relations
Depends on the issue, sometimes the intermediate table is an entity, sometimes not. In this example the intermediate table is the track, which would contain: [uid, cd, song, track_no, ... (whatever else needed to define the track)]
Be carefull when you define your data, that you do not make it too advanced.

How to get column name and data type returned by a custom query in postgres?

How to get column name and data type returned by a custom query in postgres? We have inbuilt functions for table/views but not for custom queries. For more clarification I would say that I need a postgres function which will take sql string as parameter and will return colnames and their datatype.
I don't think there's any built-in SQL function which does this for you.
If you want to do this purely at the SQL level, the simplest and cheapest way is probably to CREATE TEMP VIEW AS (<your_query>), dig the column definitions out of the catalog tables, and drop the view when you're done. However, this can have a non-trivial overhead depending on how often you do it (as it needs to write view definitions to the catalogs), can't be run in a read-only transaction, and can't be done on a standby server.
The ideal solution, if it fits your use case, is to build a prepared query on the client side, and make use of the metadata returned by the server (in the form of a RowDescription message passed as part of the query protocol). Unfortunately, this depends very much on which client library you're using, and how much of this information it chooses to expose. For example, libpq will give you access to everything, whereas the JDBC driver limits you to the public methods on its ResultSetMetadata object (though you could probably pull more information from its private fields via reflection, if you're determined enough).
If you want a read-only, low-overhead, client-independent solution, then you could also write a server-side C function to prepare and describe the query via SPI. Writing and building C functions comes with a bit of a learning curve, but you can find numerous examples on PGXN, or within Postgres' own contrib modules.

Detecting concurrent data modification of document between read and write

I'm interested in a scenario where a document is fetched from the database, some computations are run based on some external conditions, one of the fields of the document gets updated and then the document gets saved, all in a system that might have concurrent threads accessing the DB.
To make it easier to understand, here's a very simplistic example. Suppose I have the following document:
{
...
items_average: 1234,
last_10_items: [10,2187,2133, ...]
...
}
Suppose a new item (X) comes in, five things will need to be done:
read the document from the DB
remove the first (oldest) item in the last_10_items
add X to the end of the array
re-compute the average* and save it in items_average.
write the document to the DB
* NOTE: the average computation was chosen as a very simple example, but the question should take into account more complex operations based on data existing in the document and on new data (i.e. not something solvable with the $inc operator)
This certainly is something easy to implement in a single-threaded system, but in a concurrent system, if 2 threads would like to follow the above steps, inconsistencies might occur since both will update the last_10_items and items_average values without considering and/or overwriting the concurrent changes.
So, my question is how can such a scenario be handled? Is there a way to check or react-upon the fact that the underlying document was changed between steps 1 and 5? Is there such a thing as WATCH from redis or 'Concurrent Modification Error' from relational DBs?
Thanks
In database system,it uses a memory inspection and roll back scheme which is similar to transactional memory.
Briefly speaking, it simply monitors the share memory parts you specified and do something like compare and swap or load and link or test and set.
Therefore,if any memory content is changed during transaction,it will abort and try again until there is no conflict operation for that shared memory.
For example,GCC implements the following:
https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
type __sync_lock_test_and_set (type *ptr, type value, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval type newval, ...)
For more info about transactional memory,
http://en.wikipedia.org/wiki/Software_transactional_memory

variable table or column names in a function

I'm trying to search all tables and columns in a database, a la here. The suggested technique is to construct SQL query strings and then EXEC them. This works well, as a stored procedure. (Another example of variable table/column names is here. Again, EXEC is used to execute "dynamic SQL".)
However, my app requires that I do this in a function, not an SP. (Our development framework has trouble obtaining results from an SP.) But in a function, at least on SQL Server 2008 R2, you can't use EXEC; I get this error:
Invalid use of a side-effecting operator 'INSERT EXEC' within a function.
According to the answer to this post, apparently by a Microsoft developer, this is by design; it has nothing to do with the INSERT, only the fact that when you execute dynamically-constructed SQL code, the parser cannot guarantee a lack of side effects. Therefore it won't allow you to create such a function.
So... is there any way to iterate over many tables/columns within a function?
I see from BOL that
The following statements are valid in a function: ...
EXECUTE
statements calling extended stored procedures.
Huh - How could extended SP's be guaranteed side-effect free?
But that doesn't help me anyway:
The extended stored procedure, when it is called from inside a
function, cannot return result sets to the client. Any ODS APIs that
return result sets to the client will return FAIL. The extended stored
procedure could connect back to an instance of SQL Server; however, it
should not try to join the same transaction as the function that
invoked the extended stored procedure.
Since we need the function to return the results of the search, an ESP won't help.
I don't really want to get into extended SP's anyway: incrementing the number of programming languages in the environment would complicate our development environment more than it's worth.
I can think of a few solutions right now, none of which is very satisfactory:
First call an SP that produces the needed data and puts it in a table, then select from the function which merely reads the result from the table; this could be trouble if the search takes a while and two users' searches overlap. Or,
Have the application (not the function) generate a long query naming every table and column name from the db. I wonder if the JDBC driver can handle a query that long. Or,
Have the application (not the function) generate a long series of short queries naming every table and column name from the db. This will make the overall search a lot slower.
Thanks for any suggestions.
P.S. Upon further searching, I stumbled across this question which is closely related. It has no answers.
Update: No longer needed
I think this question is still valid, and we may again have a situation where we need it. However, I don't need an answer anymore for the present problem. After much trial-and-error I managed to get our application framework to retrieve row results from the RDBMS via the JDBC driver from the stored procedure. Therefore getting the thing to work as a function is unnecessary.
But if anyone posts an answer here that helps with the stated problem, I will be happy to upvote and/or accept it as appropriate.
An sp is basically a predefined sql statment with some add ons.
So if you had
PSEUDOCODE
Create SP_DoSomething As
Select * From MyTable
END
And you can't use the SP
Then you just execute the SQL as in "Select * From MyTable"
As for that naff sql code.
For start you could join table to column with a where clause, which would get rid of that line by line if stuff.
Ask another question. Like How could this be improved, there's lots of scope for more attempts than mine.

Qualified naming

I'm currently doing some maintenance on an application and I've come across a big issue regarding the qualified name in tsql. I wondering if someone could clear my confusion up for me.
From my understanding you want to use USE [DatabaseName] to declare which database you are using. I notice if u "rename" the databse it automatically updates these references in your code.
However, the developer who originally wrote this code used the USE [DatabaseName]. Then in later statements he wrote: SELECT * FROM [DatabaseName].[dbo].[Table]. Well this obviously breaks if I change the Database's name. From what i've read you want to qualify names only to the owner such as: [dbo].[TableName] so it knows where to look which increases performance.
Is there a reason he included the Database name in each statement?
From what i've read you want to qualify names only to the owner such as: [dbo].[TableName] so it knows where to look which increases performance.
Not that I'm aware of, rather it looks like someone is lazy.
I always use the three name format (unless accessing a linked server instance, then it's four).
The benefit is that the correct table from the correct database & schema will be used without concern for an errant USE [appropriate database] statement. As long as the object exists, and the permissions are valid based on the need, you can recreate a stored procedure, function, view, etc in other databases without needing to address the USE [appropriate database] statement each time.
But I'm working with data spread over numerous databases on the same instance. I wouldn't have necessarily designed it that way, but it wouldn't change that I use three (or four) part qualified name format.