The issue of defining the size of an alphabet in a number system

I would like help with the following question.
I am having difficulty studying this topic.
First of all, I can't follow the line of reasoning used to define the alphabet of a number system (the set of digits, for example 0, 1, 2, ..., 9):
N_k = P_(k+1) / P_k if P_(k+1) is divisible by P_k without remainder; otherwise N_k = (P_(k+1) / P_k) + 1
This is the way to define the size of the alphabet in any positional number system. Could you please walk through this formula with commentary? In the general case, how would one proceed when faced with the task of defining such a set?

Trouble Understanding Xcodemath error messages

I have been working to understand why I am getting the error messages shown in the attachment.
The bottom-most message that indicates a comma is needed makes no sense to me at all.
The other two messages may well be related to a problem with data types, but I cannot determine what data type rules I have violated.
Many thanks for your time and attention.
It's a few different errors cropping up, and the error about the separator is not really indicative of the problem.
SecondPartFraction is being declared twice. If those are meant to be two different variables, they should have two different names. If you simply wish to assign a new value to SecondPartFraction, just drop the var the second time you use it (as it has already been declared, you simply need to refer to it again).
A Double and an Int can't be mixed in a division, so that error is correct. If you want a Double result, just change the 16 to 16.0. Then the compiler won't complain.
The numbers you're getting also originate from a text field, which might cause some problems. If the user enters text instead of numbers into your text fields, the app will crash, since StepFirstPart and StepSecondPart are force unwrapped. You will probably want some kind of optional handling (for example, optional binding with if let) to cover the case where the entry is not numeric.
In the last line, the label text is being set to an Int - in order to do this, you'll have to use string interpolation instead, since the text for a label must be a string rather than a number:
TotalNumRisers.text = "\(TotalRisers)"
Just one last quick note: in Swift, lower camel case is the standard for naming, so the first letter of a variable name should be lowercase and the first letter of each following word uppercase. So StepFirstPart would become stepFirstPart instead.
You declare the same variable twice here, e.g.:
var x = 0
var x = value + x
Instead, it should be:
var x = 0
x = value + x // remove var from here

Question about OCL invariants on a diagram

I'm new to OCL and currently trying to figure out how to write invariants.
I attached a picture of the diagram I'm working on.
https://imgur.com/1ucZq5w
The invariants that I'm trying to write are:
a) A player has 0 or 2 cards in hand.
Context Player
inv i1: self.card->size()=0 or self.card->size()=2
b) A player, who has not played any rounds, can't have more Game Capital than the maximal Buy-In of the table.
Context Player
inv i2: self.numberOfRounds=0 implies (self.gameCapital < self.Table.maxBuyIn)
c) At every table, there can only be players that belong to different users.
Context Player
inv i3: Player.UserAccount.allInstances().userID->isUnique()
I'm not sure if 'allInstances()' is supposed to go after Player or after PlayerAccount.
And I don't know what I'm supposed to do with the 'At every table' part of the text.
There are two more points that I really don't know how to do.
d) The deck contains 52 cards, which differ from each other in color or value.
e) The inputs of all players that still have cards in hand are equal when bidDone is True.
Can you please tell me whether what I've done so far is correct, and maybe give some advice or a solution for d) and e)?
Any help is appreciated!
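Seems plausible, but I would recommend sensible names, since a validation tool will tend to report that e.g. Constraint Player::i2 is not satisfied for ...
b) looks to have a < / <= bug: "can't have more than the maximal Buy-In" allows equality, so the comparison should be <= rather than <.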
c) allInstances takes a type source so "Player." is wrong. allInstances is generally very inefficient to execute so should only be used as a last resort. In your case it is clearly wrong since your scope is "at every Table". You should be using context Table and then reasoning about the players at the table.
d) if you rephrase "differ from" as "is unique with respect to", you can perhaps see how you could use a Tuple of color+value as the basis for uniqueness.
e) no idea what an input is, but it just seems like a cascade of implies clauses.

Why do vertices need to have properties for edges?

I found the discussion at https://groups.google.com/forum/#!topic/orient-database/Y0QJiXk7d1I to be very useful to help me set up a strict schema with edges in it. This is my code
val fileLink = schema.createClass(DefinedInS.label, g.getEdgeBaseType())
fileLink.setStrictMode(true)
fileLink.createProperty("out", OType.LINK, fqnSymbol).setNotNull(true)
fileLink.createProperty("in", OType.LINK, fileCheck).setNotNull(true)
fqnSymbol.createProperty("out_" + DefinedInS.label, OType.LINKBAG).setNotNull(true)
fileCheck.createProperty("in_" + DefinedInS.label, OType.LINKBAG).setNotNull(true)
but I am confused about why I need the last two lines at all. Aren't they redundant (or at least implied by the fileLink properties)? Could somebody please explain why they are needed?
In addition, for this example I want exactly one link from a fqnSymbol to a fileCheck, but this seems to require that LINKBAG is used (it fails if I use LINK). Is that something I should be allowed to do?
Furthermore, is there any performance benefit to be gained from adding an index on the edge? My use case is such that I will always have a fqnSymbol at hand when I want to look up a fileCheck.
I raised https://github.com/orientechnologies/orientdb/issues/5494 to request better documentation in this area.
When one creates an edge (that is, an instance of E), the points of connection are stored at both endpoints (the vertices):
(vertex) -> [edge] -> (vertex)
It's my understanding that if the edge is an immediate instance of E, then those endpoints are properties named out_ and in_. (Similarly, if the edge is an immediate instance of some subclass of E, say EE, then they would be named out_EE and in_EE.) Often these details don't matter (e.g. outE() collects all outgoing edges), but sometimes they do (as when defining constraints on properties).
Regarding the multiplicity constraint:
I want exactly one link from a fqnSymbol to a fileCheck ...
This constraint can be enforced (at least to a degree) using MIN and MAX:
alter property fqnSymbol.out_ MIN 1;
alter property fqnSymbol.out_ MAX 1;
(Fortunately, the MIN and MAX constraints won't prevent an fqnSymbol vertex from being created in the first place :-)
Tighter enforcement may require writing hooks or triggers.

Fastest possible string key lookup for known set of keys

Consider a lookup function with the following signature, which needs to return an integer for a given string key:
int GetValue(string key) { ... }
Consider furthermore that the key-value mappings, numbering N, are known in advance when the source code for function is being written, e.g.:
// N=3
{ "foo", 1 },
{ "bar", 42 },
{ "bazz", 314159 }
So a valid (but not perfect!) implementation for the function for the input above would be:
int GetValue(string key)
{
switch (key)
{
case "foo": return 1;
case "bar": return 42;
case "bazz": return 314159;
}
// Doesn't matter what we do here, control will never come to this point
throw new Exception();
}
It is also known in advance exactly how many times (C>=1) the function will be called at run-time for every given key. For example:
C["foo"] = 1;
C["bar"] = 1;
C["bazz"] = 2;
The order of such calls is not known, however. E.g. the above could describe the following sequence of calls at run-time:
GetValue("foo");
GetValue("bazz");
GetValue("bar");
GetValue("bazz");
or any other sequence, provided the call counts match.
There is also a restriction M, specified in whatever units are most convenient, defining the upper memory bound of any lookup tables and other helper structures that can be used by GetValue (the structures are initialized in advance; that initialization is not counted against the complexity of the function). For example, M=100 chars, or M=256 sizeof(object reference).
The question is, how to write the body of GetValue such that it is as fast as possible - in other words, the aggregate time of all GetValue calls (note that we know the total count, per everything above) is minimal, for given N, C and M?
The algorithm may require a reasonable minimal value for M, e.g. M >= char.MaxValue. It may also require that M be aligned to some reasonable boundary - for example, that it may only be a power of two. It may also require that M must be a function of N of a certain kind (for example, it may allow valid M=N, or M=2N, ...; or valid M=N, or M=N^2, ...; etc).
The algorithm can be expressed in any suitable language or other form. For runtime performance constraints on the generated code, assume that the generated code for GetValue will be in C#, VB or Java (really, any language will do, so long as strings are treated as immutable arrays of characters - i.e. O(1) length and O(1) indexing, and no other data computed for them in advance). Also, to simplify this a bit, answers which assume that C=1 for all keys are considered valid, though those answers which cover the more general case are preferred.
Some musings on possible approaches
The obvious first answer to the above is using a perfect hash, but generic approaches to finding one seem to be imperfect. For example, one can easily generate a table for a minimal perfect hash using Pearson hashing for the sample data above, but then the input key would have to be hashed for every call to GetValue, and Pearson hash necessarily scans the entire input string. But all sample keys actually differ in their third character, so only that can be used as the input for the hash instead of the entire string. Furthermore, if M is required to be at least char.MaxValue, then the third character itself becomes a perfect hash.
For a different set of keys this may no longer be true, but it may still be possible to reduce the amount of characters considered before the precise answer can be given. Furthermore, in some cases where a minimal perfect hash would require inspecting the entire string, it may be possible to reduce the lookup to a subset, or otherwise make it faster (e.g. a less complex hashing function?) by making the hash non-minimal (i.e. M > N) - effectively sacrificing space for the sake of speed.
It may also be that traditional hashing is not such a good idea to begin with, and it's easier to structure the body of GetValue as a series of conditionals, arranged such that the first checks for the "most variable" character (the one that varies across most keys), with further nested checks as needed to determine the correct answer. Note that "variance" here can be influenced by the number of times each key is going to be looked up (C). Furthermore, it is not always readily obvious what the best structure of branches should be - it may be, for example, that the "most variable" character only lets you distinguish 10 keys out of 100, but for the remaining 90 that one extra check is unnecessary to distinguish between them, and on average (considering C) there are more checks per key than in a different solution which does not start with the "most variable" character. The goal then is to determine the perfect sequence of checks.
You could use a Boyer-Moore search, but I think that a trie would be a much more efficient method. You can modify the trie to collapse the words as the hit count for a key reaches zero, thus reducing the number of searches you have to do the farther down the line you get. The biggest benefit is that you are doing array lookups by index, which is much faster than a comparison.
You've talked about a memory limitation when it comes to precomputation - is there also a time limitation?
I would consider a trie, but one where you didn't necessarily start with the first character. Instead, find the index which will cut down the search space most, and consider that first. So in your sample case ("foo", "bar", "bazz") you'd take the third character, which would immediately tell you which string it was. (If we know we'll always be given one of the input words, we can return as soon as we've found a unique potential match.)
Now assuming that there isn't a single index which will get you down to a unique string, you need to determine the character to look at after that. In theory you could precompute the trie to work out, for each branch, what the optimal character to look at next is (e.g. "if the third character was 'a', we need to look at the second character next; if it was 'o' we need to look at the first character next") but that potentially takes a lot more time and space. On the other hand, it could save a lot of time - because having gone down one character, each of the branches may have an index to pick which will uniquely identify the final string, but be a different index each time. The amount of space required by this approach would depend on how similar the strings were, and might be hard to predict in advance. It would be nice to be able to dynamically do this for all the trie nodes you can, but then when you find you're running out of construction space, determine a single order for "everything under this node". (So you don't end up storing a "next character index" on each node underneath that node, just the single sequence.) Let me know if this isn't clear, and I can try to elaborate...
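To make the idea above a bit more concrete, here is a minimal Python sketch of a tree that picks a different index to inspect at each node. The names build_tree, get_value and char_at are made up for illustration, and the heuristic (take the index with the most distinct characters) is only one possible choice: it ignores the call counts C and the memory bound M entirely.

```python
def char_at(key, index):
    # Positions past the end of a short key are represented by "".
    return key[index] if index < len(key) else ""


def build_tree(values):
    """Build a decision tree from a dict of key string -> int value.

    A leaf is the value itself; an inner node is (index, {char: subtree}).
    """
    if len(values) == 1:
        return next(iter(values.values()))
    longest = max(len(k) for k in values)
    # Heuristic: inspect the index that splits the keys into the most groups.
    best = max(range(longest),
               key=lambda i: len({char_at(k, i) for k in values}))
    groups = {}
    for k, v in values.items():
        groups.setdefault(char_at(k, best), {})[k] = v
    return best, {ch: build_tree(sub) for ch, sub in groups.items()}


def get_value(tree, key):
    # Assumes `key` is one of the keys the tree was built from.
    node = tree
    while isinstance(node, tuple):
        index, branches = node
        node = branches[char_at(key, index)]
    return node


tree = build_tree({"foo": 1, "bar": 42, "bazz": 314159})
```

For the sample keys the root inspects index 2 (the third character) and every branch is already a leaf, so each lookup reads exactly one character. In generated C# or Java the same tree would presumably be emitted as nested switch statements or arrays rather than dictionaries.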
How you represent the trie will depend on the range of input characters. If they're all in the range 'a'-'z' then a simple array would be incredibly fast to navigate, and reasonably efficient for trie nodes where there are possibilities for most of the available options. Later on, when there are only two or three possible branches, that becomes wasteful in memory. I would suggest a polymorphic Trie node class, such that you can build the most appropriate type of node depending on how many sub-branches there are.
None of this performs any culling - it's not clear how much can be achieved by culling quickly. One situation where I can see it helping is when the number of branches from one trie node drops to 1 (because of the removal of a branch which is exhausted), that branch can be eliminated completely. Over time this could make a big difference, and shouldn't be too hard to compute. Basically as you build the trie you can predict how many times each branch will be taken, and as you navigate the trie you can subtract one from that count per branch when you navigate it.
That's all I've come up with so far, and it's not exactly a full implementation - but I hope it helps...
Is a binary search of the table really so awful? I would take the list of potential strings and "minimize" them, then sort them, and finally do a binary search on the block of them.
By minimize I mean reducing them to the minimum they need to be, kind of a custom stemming.
For example if you had the strings: "alfred", "bob", "bill", "joe", I'd knock them down to "a", "bi", "bo", "j".
Then put those in to a contiguous block of memory, for example:
char *table = "a\0bi\0bo\0j\0"; // last 0 is really redundant..but
char *keys[4];
keys[0] = table;
keys[1] = table + 2;
keys[2] = table + 5;
keys[3] = table + 8;
Ideally the compiler would do all this for you if you simply go:
keys[0] = "a";
keys[1] = "bi";
keys[2] = "bo";
keys[3] = "j";
But I can't say if that's true or not.
Now you can bsearch that table, and the keys are as short as possible. If you hit the end of the key, you match. If not, then follow the standard bsearch algorithm.
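As a rough sketch of the whole scheme (written in Python rather than C just to keep it short), the following computes the minimized keys and then binary-searches them. The function names and the values 1 to 4 are invented for the example, and it assumes that no key is a prefix of another key and that only known keys are looked up.

```python
def shortest_unique_prefixes(words):
    """Map each word to its shortest prefix that no other word starts with."""
    result = {}
    for w in words:
        for n in range(1, len(w) + 1):
            prefix = w[:n]
            if not any(o != w and o.startswith(prefix) for o in words):
                break
        # If no unique prefix exists, the loop ends with the full word.
        result[w] = prefix
    return result


def lookup(table, key):
    """Binary search `table`, a sorted list of (prefix, value) pairs."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        prefix, value = table[mid]
        if key.startswith(prefix):
            # Reached the end of the stored (minimized) key: it's a match.
            return value
        if key < prefix:
            hi = mid - 1
        else:
            lo = mid + 1
    raise KeyError(key)


values = {"alfred": 1, "bob": 2, "bill": 3, "joe": 4}
prefixes = shortest_unique_prefixes(list(values))
table = sorted((prefixes[w], v) for w, v in values.items())
```

Here prefixes comes out as "a", "bo", "bi", "j" for the four names, matching the example above, and each probe compares at most as many characters as the minimized key has.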
The goal is to get all of the data close together and keep the code itty bitty so that it all fits in to the CPU cache. You can process the key from the program directly, no pre-processing or adding anything up.
For a reasonably large number of keys that are reasonably distributed, I think this would be quite fast. It really depends on the number of strings involved. For smaller numbers, the overhead of computing hash values etc. is more than that of searching something like this. For larger numbers, hashing is worth it. Just what those numbers are depends on the algorithms etc.
This, however, is likely the smallest solution in terms of memory, if that's important.
This also has the benefit of simplicity.
Addenda:
You don't have any specifications on the inputs beyond 'strings'. There's also no discussion about how many strings you expect to use, their length, their commonality or their frequency of use. These can perhaps all be derived from the "source", but not planned upon by the algorithm designer. You're asking for an algorithm that creates something like this:
inline int GetValue(char *key) {
return 1234;
}
For a small program that happens to use only one key all the time, all the way up to something that creates a perfect hash algorithm for millions of strings. That's a pretty tall order.
Any design going after "squeezing every single bit of performance possible" needs to know more about the inputs than "any and all strings". That problem space is simply too large if you want it the fastest possible for any condition.
An algorithm that handles strings with extremely long identical prefixes might be quite different than one that works on completely random strings. The algorithm could say "if the key starts with "a", skip the next 100 chars, since they're all a's".
But if these strings are sourced by human beings, and they're using long strings of the same letters, and not going insane trying to maintain that data, then when they complain that the algorithm is performing badly, you reply that "you're doing silly things, don't do that". But we don't know the source of these strings either.
So, you need to pick a problem space to target the algorithm. We have all sorts of algorithms that ostensibly do the same thing because they address different constraints and work better in different situations.
Hashing is expensive, laying out hashmaps is expensive. If there's not enough data involved, there are better techniques than hashing. If you have large memory budget, you could make an enormous state machine, based upon N states per node (N being your character set size -- which you don't specify -- BAUDOT? 7-bit ASCII? UTF-32?). That will run very quickly, unless the amount of memory consumed by the states smashes the CPU cache or squeezes out other things.
You could possibly generate code for all of this, but you may run in to code size limits (you don't say what language either -- Java has a 64K method byte code limit for example).
But you don't specify any of these constraints. So, it's kind of hard to get the most performant solution for your needs.
What you want is a look-up table of look-up tables.
If memory cost is not an issue you can go all out.
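const int POSSIBLE_CHARCODES = 256; // 256 for 8-bit character codes, 65536 for 16-bit Unicode
class LutMap {
    public int value;
    // One slot per character code; null where no key continues with that character
    public LutMap[] next = new LutMap[POSSIBLE_CHARCODES];
}
int GetValue(string key) {
    LutMap node = Global.AlreadyCreatedLutMap;
    for (int x = 0; x < key.Length; x++) {
        int c = key[x];
        if (node.next[c] == null) {
            // No key continues past this node, so it already identifies the key
            return node.value;
        }
        node = node.next[c];
    }
    // The whole key was consumed; the value sits on the node we ended on
    return node.value;
}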
I reckon that it's all about finding the right hash function. As long as you know what the key-value relationship is in advance, you can do an analysis to try and find a hash function to meet your requirements. Taking the example you've provided, treat the input strings as binary integers:
foo = 0x666F6F (hex value)
bar = 0x626172
bazz = 0x62617A7A
The last column present in all of them (the third character) is different in each. Analyse further, taking the low nibble of that character:
foo = 0xF = 1111
bar = 0x2 = 0010
bazz = 0xA = 1010
Bit-shift to the right twice, discarding overflow, you get a distinct value for each of them:
foo = 0011
bar = 0000
bazz = 0010
Bit-shift to the right twice again, adding the overflow to a new buffer:
foo = 0010
bar = 0000
bazz = 0001
You can use those to query a static 3-entry lookup table. I reckon this highly personal hash function would take 9 very basic operations: get the nibble (2), bit-shift (2), bit-shift and add (4) and query (1). A lot of these operations can be compressed further through clever assembly usage. This might well be faster than taking run-time information into account.
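A quick way to check the first part of this is the small Python sketch below. It only covers the nibble plus the first two-bit shift, which already gives distinct slots 3, 0 and 2 for the sample keys, so it uses a 4-entry table with one unused slot; the second shift-and-add step that gets down to 3 entries is left out. The names tiny_hash, TABLE and get_value are invented for the example.

```python
def tiny_hash(key):
    # Low nibble of the third character, shifted right by two bits.
    return (ord(key[2]) & 0xF) >> 2


# 4-entry table; slot 1 stays unused for the sample keys.
TABLE = [None] * 4
for k, v in {"foo": 1, "bar": 42, "bazz": 314159}.items():
    slot = tiny_hash(k)
    assert TABLE[slot] is None, "collision: this hash does not fit the key set"
    TABLE[slot] = v


def get_value(key):
    # Assumes `key` is one of the known keys.
    return TABLE[tiny_hash(key)]
```

Because the hash is tailored to exactly these three keys, adding or changing a key means redoing the analysis; the assert in the table-building loop is there to catch that.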
Have you looked at TCB? Perhaps the algorithm used there can be used to retrieve your values. It sounds a lot like the problem you are trying to solve. And from experience I can say TCB is one of the fastest key store lookups I have used. It has a constant lookup time, regardless of the number of keys stored.
Consider using Knuth–Morris–Pratt algorithm.
Pre-process given map to a large string like below
String string = "{foo:1}{bar:42}{bazz:314159}";
int length = string.length();
According to KMP, preprocessing the string takes O(length) time.
Searching for any word/key then takes O(w), where w is the length of the word/key.
You will need to make 2 modifications to the KMP algorithm:
the keys should appear in order in the joined string
instead of returning true/false, it should parse the number and return it
I hope this gives you some good hints.
Here's a feasible approach to determine the smallest subset of chars to target for your hash routine:
let:
k be the amount of distinct chars across all your keywords
c be the max keyword length
n be the number of keywords
in your example (padded shorter keywords w/spaces):
"foo "
"bar "
"bazz"
k = 7 (f,o,b,a,r,z, ), c = 4, n = 3
We can use this to compute a lower bound for our search. We need at least log_k(n) chars to uniquely identify a keyword; if log_k(n) >= c, then you'll need to use the whole keyword and there's no reason to proceed.
Next, eliminate one column at a time and check if there are still n distinct values remaining. Use the distinct chars in each column as a heuristic to optimize our search:
2 2 3 2
f o o .
b a r .
b a z z
Eliminate columns with the lowest distinct chars first. If you have <= log_k(n) columns remaining you can stop. Optionally you could randomize a bit and eliminate the 2nd lowest distinct col or try to recover if the eliminated col results in less than n distinct words. This algorithm is roughly O(n!) depending on how much you try to recover. It's not guaranteed to find an optimal solution but it's a good tradeoff.
Once you have your subset of chars, proceed with the usual routines for generating a perfect hash. The result should be an optimal perfect hash.
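The elimination step can be sketched in a few lines of Python. This is the plain greedy version only: it tries each column once, lowest distinct count first, with no randomization, no recovery and no early stop at the log_k(n) bound. The function name select_columns is made up for the example.

```python
def select_columns(keywords):
    """Return the column indexes that remain after greedy elimination."""
    width = max(len(w) for w in keywords)
    # Pad shorter keywords with spaces, as in the example above.
    padded = [w.ljust(width) for w in keywords]
    n = len(set(padded))

    def distinct(col):
        return len({w[col] for w in padded})

    kept = list(range(width))
    # Try to drop the columns with the fewest distinct chars first.
    for col in sorted(range(width), key=distinct):
        trial = [c for c in kept if c != col]
        projected = {tuple(w[c] for c in trial) for w in padded}
        if len(projected) == n:  # still n distinct keywords, so drop it
            kept = trial
    return kept


columns = select_columns(["foo", "bar", "bazz"])
```

For the sample keywords this leaves only column 2 (the third character), which is then the only input the perfect hash needs to consider.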

Documenting Scala functional chains [closed]

Scala (and functional programming in general) advocates a style of programming where you produce functional "chains" of the form
collection.operation1(...).operation2(...)...
where the operations are various combinations of map, filter, etc.
Where the equivalent Java code might require 50 lines, the Scala code can be done in 1 or 2 lines. The functional chain can change an input collection to something completely different.
The disadvantage of the Scala code is that 10 minutes later (never mind 6 months later), I can't figure out what I was thinking, because the notation is so compact, and lacks type information (because of implied types).
How do you document this? Do you put a large block comment before the chain, changing an elegant 1 line solution into a bulky 40 line solution consisting of 39 lines of comment? Do you intersperse your comments like this?
collection.
// Select the items that meet condition X
filter(predicate_function).
// Change these items from A's to B's
map(transformation_function).
// etc.
Something else? No documentation? (Leave them guessing. They'll never "downsize" you then, because no one else can maintain the code. :-))
If you find yourself writing comments at that detail level, you're just repeating what the code says.
For long functional chains, define new functions to replace parts of the chain. Give these meaningful names. Then you might be able to avoid comments. The names of these functions themselves should explain what they do.
The best comments are the ones that explain why the code does something. Well-written code should make the "how" obvious from the code itself.
I don't write that code to begin with (unless it's a script for one-time use or playing around in the REPL).
If I can explain what the code does in one comment and the code reads okay, then I keep it as a one-liner:
// Find all real-valued square roots and group them in integer bins
ds.filter(_ >= 0).map(math.sqrt).groupBy(_.toInt).map(_._2)
If I can't understand this by reading carefully through the chain of commands, then I should break it up more into functionally distinct units. For example, if I expected someone to not realize that the square root of a negative number is not real-valued, I would say:
// Only non-negative numbers have a real-valued square root
val nonneg = ds.filter(_ >= 0)
// Find square roots and group them in integer bins
nonneg.map(math.sqrt).groupBy(_.toInt).map(_._2)
In particular, if someone doesn't know the Scala collections library well, and doesn't have the patience to spend five to ten minutes understanding one line of code, then either they shouldn't be working on my code (nor on anything else that accomplishes something nontrivial that they don't understand and don't have the patience to understand), or I should know in advance that I'm providing a tutorial (e.g. on the language and the mathematics) in addition to writing working code. In that case I would write a paragraph explaining how the following line works, or break it out command by command, or include comments at the start of each anonymous function explaining what is going on (as appropriate).
Anyway, if you can't understand what it does, you probably need some intermediate values. They are very helpful for mental-resetting ("I can't see how to get from A to C!...but...okay, I can understand A to B. And I can understand B to C.")
If your chained operations are all monadic transforms: map, flatMap, filter, then it's often much, much clearer to rewrite the logic as a for-comprehension.
coll.filter(predicate).map(transform)
could become
for (elem <- coll if predicate(elem)) yield transform(elem)
It's even easier to show off the power of the technique if you have a longer sequence of operations, such as with Kassen's example:
def eligibleCustomers(products: Seq[Product]) = for {
  product <- products
  customer <- product.customers
  if customer.isPremium
  if customer.age < 20
} yield customer
If you don't want to split it in multiple methods as hammar suggested you can split the line and give the intermediate values names (and optionally types).
def eligibleCustomers: List[Customer] = {
val customers = products.flatMap(_.customers)
val paying = customers.filter(_.isPremium)
val eligible = paying.filter(_.age < 20)
eligible
}
Line length is a somewhat natural indicator of when your chain is getting too long. :)
Of course, it will depend upon how trivial the chain is:
customerdata.filter (_.age < 40).filter (_.city == "Rio").
filter (_.income > 3000).filter (_.joined < 2005).
filter (_.sex == 'f'). ...
I recently had the same impression with an application of 3 files, one of them a bit lengthy, consisting of 4 classes, one of them not trivial, and about 10 to 20 methods. Each method was about 5 to 10 lines, and any 2 of them could easily have been combined into a larger one. But I had to convince myself that, although measuring elegance in lines of code saved isn't completely wrong, saving lines isn't the goal in itself.
Splitting a method into two often lowers the complexity per line, but not the overall complexity of understanding the whole program.
If the problem domain is complex - filter data at different levels, rowwise, columnwise, map it, group it, build averages, build graphs, paginate them ... - the complicated job has to be done somewhere.
The program isn't easier to understand; you just have to hit page down less often. It is a readjustment: you have to read each line of code more slowly.
It doesn't bother me that much now I'm used to Scala. If you want to be more explicit with types, you can always, for example, replace things like map(_.foo) with map { a:A => a.foo } to make the code more readable in lengthy/complex operations. Not that I usually find the need to do that.