Should KDB views contain side effects or not? - kdb

In this article on views (https://code.kx.com/q/learn/views/) it is explicitly stated that views should not contain side effects.
However, in this article (https://code.kx.com/q/style/sam/) it is stated "SAM is an abstract model of q applications. Think of SAM as having an inner core and an outer layer. The inner core of SAM consists of variables and constants interconnected by views. All functions, and all views expressed in terms of them are completely free of side effects. All side effects in the core are explicitly located in views."
These two statements seem to conflict. Which is correct?

Trust the article on views.
Apologies for the confusion. The SAM article was adapted from a 1995 paper “Remarks on Style” written by Stevan Apter, and should probably have been omitted, as were sections on windowing. The style ‘remarks’ would be better hosted at GitHub.com/qbists and maintained by the community; I’ll see about moving them there.
Update 2023.02.10: Remarks on Style has now moved to GitHub qbists/style, with a new section on trailing semicolons.
While the source for this material has long been on GitHub and open for contribution, I’m hoping hosting it in Qbists will attract more content. For example, what is good style with tables? With IPC?
Comments to librarian#code.kx.com; PRs to


Attention Text Generation in Character-by-Character fashion

I have been searching the web for a couple of days for a text generation model that uses only attention mechanisms.
The Transformer architecture that made waves in the context of seq-to-seq models is actually based solely on attention mechanisms, but it is mainly designed and used for translation or chatbot tasks, so it doesn't fit the purpose directly; the principle, however, does.
My question is:
Does anyone know of, or has anyone heard of, a text generation model based solely on attention, without any recurrence?
Thanks a lot!
P.S. I'm familiar with PyTorch.
Building a character-level self-attentive model is a challenging task. Character-level models are usually based on RNNs. Whereas in a word/subword model it is clear from the beginning which units carry meaning (and therefore which units the attention mechanism can attend to), a character-level model needs to learn word meaning in its later layers. This makes it quite difficult for the model to learn.
Text generation models are nothing more than conditional language models. Google AI recently published a paper on a Transformer character-level language model, but it is the only work I know of.
Anyway, you should consider either using subword units (such as BPE or SentencePiece) or, if you really need to go character level, using RNNs instead.
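That said, if you do want to experiment with an attention-only character model in PyTorch, a minimal decoder-style sketch could look like the one below. The hyperparameters, the learned positional embeddings and the use of nn.TransformerEncoder with a causal mask are illustrative assumptions, not a tested recipe, and the training loop is omitted.

    import torch
    import torch.nn as nn

    class CharTransformerLM(nn.Module):
        def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=4, max_len=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)   # learned positional embeddings
            layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=4 * d_model)
            self.encoder = nn.TransformerEncoder(layer, num_layers)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, x):                           # x: (seq_len, batch) of character ids
            seq_len = x.size(0)
            positions = torch.arange(seq_len, device=x.device).unsqueeze(1)
            h = self.embed(x) + self.pos(positions)
            # causal mask: each position may attend only to itself and earlier characters
            mask = torch.triu(torch.full((seq_len, seq_len), float('-inf'),
                                         device=x.device), diagonal=1)
            h = self.encoder(h, mask=mask)
            return self.out(h)                          # logits over the next character

To generate text character by character, you would repeatedly feed in the prefix produced so far, take the logits at the last position, sample the next character and append it.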

Why was support for multiple shadow roots removed and replaced with slots

During the Web Applications WG (WebApps) Web Components meeting in Mountain View, CA, US on Friday 24 April 2015, it was concluded that support for multiple shadow roots should be removed. As I understand it, slots are supposed to be used as an alternative to multiple shadow roots. However, the link provided in the meeting notes explaining why using slots is better has been removed, and I could not find any other documentation on how and why this decision was made. I suspect it has to do with the confusing nature of handling multiple shadow roots, but I'm not sure. I would appreciate any explanation of the reasons why support for multiple shadow roots was removed.
TL;DR: What reasons were given for removing support for multiple shadow roots and requiring the use of slots instead?
It's because it was complex to implement.
From the W3C Web Components wiki:
Pros: enables consistent story for adding shadow trees to builtins / provides reasoning about subclassing DOM trees
Cons: complexity / performance: may result in "submerged" trees that aren't rendered but still participate in style/layout
Cost/benefit of change: Disables the use case for general inheritance-based component composition (and Firefox UI in XBL) / Makes implementing Shadow DOM easier

Migrating to Bounded Context

I currently have a Web API project which has all the system processing in the same solution. I'm breaking this out into separate solutions so that they can be run independently (e.g. as an Azure WebJob), because I don't want to have to redeploy the Web project if something in the back end has changed.
My issue with this is that even though I have separated the logic, the pieces are tied together by a single context, so if I make a change in one I will have to redeploy them all, as the migrations won't match up.
That's why I've been looking at Bounded Contexts and DDD. I'm looking at how to break this up but having trouble understanding how relationships work.
A lot of the site is administrative (i.e. creating entities, no actual processing), so I was going to split contexts around this, e.g.:
A user adds and maintains currency conversion rates (this is two entities in total).
A user adds and maintains details on how to process payments (note that this is not processing payments, it only holds information about PayPal account details etc.).
So I was splitting the contexts up by this; does this sound reasonable to start with (there are a lot more like this, such as tax bands, charge structures etc.)?
If this is the way to go, how do I handle relationships between those two contexts? As an example:
A payment method requires a link to an 'active' currency conversion. I understand I can just have this as an Id, but I need to check its state, so I need access to the model.
A currency conversion can only be set to 'Inactive' if there are no payment methods currently using it. Again, this needs access to the other model.
So logically the models need access to each other; how would this be handled across the contexts? Can I add navigation properties to a model in a different context? Or should I add it as a separate DbSet and possibly map it using a view?
Thanks
So I was splitting the contexts up by this; does this sound reasonable to start with (there are a lot more like this, such as tax bands, charge structures etc.)?
"So that they can be ran and deployed independently" may not be a sufficient heuristic to tell when you should split Bounded Contexts. This addresses one aspect of the solution space, but if you haven't looked well enough at the problem space, you'll suffer from a misalignment between BC's and subdomains that can cause a lot of friction. You might end up always deploying a cluster of seemingly unrelated "independently deployable units" together because you didn't realize they talk about the same thing.
Identifying subdomains is the product of distilling your business - separating the big functional areas and defining which parts are your core domain and which are ancillary activities. Each subdomain has its own specific semantics (Ubiquitous Language). In your case, as has been pointed out in the comments, Currency Conversion and Payment Methods might well be part of the same subdomain (Payment?). That does not automatically mean that they should also be in the same BC, but it might be a good idea to keep subdomains aligned 1-to-1 with BC's, as additional BC's come at a cost.
Back to deployability, even if it can be one beneficial effect of Bounded Contexts, they are not always so easily translatable in terms of independent units of deployment. Context mapping patterns (Shared Kernel, Customer Supplier, etc.) and BC communication in general can lead to a model, and therefore a part of a codebase, being shared by multiple BC's. Code and API synchronization issues arise that can question a simplistic "deployable free electron" view.
Just because you're using the Bounded Context approach doesn't mean you have to use DDD's tactical patterns (Aggregate Root, invariants, etc.) inside each BC. Using them should be an educated decision to trade solution space complexity off for problem space manageability. If "Currency Conversion can only be set to inactive..." is the only rule pertaining to payment method and currency management in your business, it might not be worth the bother to give that Bounded Context a full-fledged rich domain model. CRUD could be better suited there.
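To make the cross-context relationship a bit more concrete, here is a rough sketch of one common option, which the answer above does not prescribe: hold only the identity of the other context's entity and ask that context questions through a narrow, read-only interface instead of a navigation property. It is written in Python rather than the question's Entity Framework stack, and all names (PaymentMethod, CurrencyConversionLookup, etc.) are invented for illustration.

    from dataclasses import dataclass
    from typing import Protocol

    class CurrencyConversionLookup(Protocol):
        """Read-only facade the Currency context exposes to other contexts."""
        def is_active(self, conversion_id: int) -> bool: ...

    @dataclass
    class PaymentMethod:
        id: int
        paypal_account: str
        currency_conversion_id: int   # identity reference only, no navigation property

    def register_payment_method(method: PaymentMethod,
                                 conversions: CurrencyConversionLookup) -> None:
        # The Payments context checks the one fact it cares about through the
        # lookup instead of loading or mapping the Currency context's model.
        if not conversions.is_active(method.currency_conversion_id):
            raise ValueError("currency conversion must be active")
        # ... persist in the Payments context's own store

The reverse rule (a conversion can only be deactivated if no payment method uses it) would be the mirror image, with the Currency context calling a lookup owned by the Payments context.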

Which CRDTs can be used to implement a full-featured collaborative rich text editor?

I have been studying CRDTs and understand that they have been used to build collaborative editors, including Ritzy, TreeDoc, WOOT and Logoot.
I'm interested in building such an editor, and need to know if CRDTs are known to be able to handle this problem in its generality.
To elaborate: A rich text document (think html) has a tree structure, but the nodes are heterogeneous. There are block elements, inline elements, tables, lists and so on. Further, there may be styles and stylesheets (e.g. css) embedded in a document. Finally, undo is essential.
The editors listed above do not handle the more advanced features, such as tables, embedded stylesheets and undo/redo.
The Ritzy documentation links to a paper describing CRDT-based causal trees (pdf) but I don't really understand this paper.
What is the basic principle behind a causal tree CRDT? Is it powerful enough to handle the heterogeneous trees described above? Alternatively, are there other CRDTs that could handle this scenario?
Implementing a CRDT for rich text is not very straightforward. Some CRDTs can be used to build trees, so the naive approach for rich text would be to model the document as a tree. A node would then represent a block of text with formats such as 'italic'. In order to format text, you usually have to delete it and insert a new node with that format. But this does not always work as expected: for example, if two users concurrently format the same text, the formatted text is inserted twice after convergence (user 1 deletes the text and inserts a new node; user 2 deletes the same text and inserts a new node). To my knowledge there are no CRDTs that solve this problem.
Actually, a CRDT for a linear structure completely suffices. You can realize formats as markers (i.e. format start and format end). This also has the advantage that you get the expected result when two users concurrently format/insert text.
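To illustrate the marker idea, here is a toy sketch in Python of the representation only; it is not a real CRDT, and the names are invented:

    doc = list("hello world")
    BOLD_START, BOLD_END = ("bold", "start"), ("bold", "end")

    def make_bold(seq, start, end):
        """Bold the characters in [start, end) by inserting markers around them."""
        seq.insert(end, BOLD_END)      # insert the end marker first so `start` stays valid
        seq.insert(start, BOLD_START)

    make_bold(doc, 6, 11)              # bold "world"
    # doc is now ['h','e','l','l','o',' ', ('bold','start'),
    #             'w','o','r','l','d', ('bold','end')]
    # If a second user bolds "world" concurrently, merging only adds another
    # pair of markers around the same characters; the characters themselves are
    # never deleted and re-inserted, so convergence cannot duplicate the text.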
For a working implementation of this approach you can check out Yjs. The examples section contains a working example of a rich text editor.
(Full disclosure: I am the author of Yjs)

What type of UML diagram do I need to use in such a case?

Just for my personal wiki, I want to draw a diagram that shows how a message is processed via a couple of Message Queues.
(For example, an XML message comes from source1 to Queue1, then it is passed to a system where the message is converted into another format and...)
What kind of UML diagram do I need here?
And additionally, how do I show a Queue in UML?
Order and time are best seen in a Sequence diagram. The communication between the different parts (source, queues) and the parameters will also be visible.
A queue is just an object (the squares at the top of the diagram).
Sequence diagrams are a good choice, but they have limitations when used for interactions with a large number of steps. They excel at describing the steps of a single operation, such that the actors are related to the behavior required. I try not to let any single sequence diagram take up more than one page. If I need more, I break it up into two separate diagrams, because I'm usually wasting whitespace due to the calling depth, and the interaction quickly becomes harder to understand instead of easier.
You might use two types of diagrams. On a system-level diagram, show the interaction between the queues (or their hosts), and on a Sequence Diagram show the steps taken within a single host.
I think that the best choice is an activity diagram. In my view it is the best way to show process flow. Sequence diagrams are harder to understand and also have a lot of clutter (the lifelines) which just bothers the reader. And having two diagrams just makes things complicated.