Field size: natural language proportion of ems ('W's or 'm's) - user-input

This has been a dilemma for me for years, it just occurred that I should ask.
Say I have a character limit for an input or textarea I want to allow enough visible area for the user to enter the maximum amount number of characters.
Limit: 10
iiiiiiiiii
vs.
wwwwwwwww
Initially I would guess the size based on lorum or random text.
If you think it through: the maximum size is based on the number of em sized characters in the text.
So I went through a phase of making the minimum visible field size em x character limit, though obviously this is ridiculous as it's impossible to construct a sentence made entirely of 'w's and 'm's and textareas get stupidly big.
So my question is: what is the absolute most em-sized characters you could possibly put in a sentence?
Or how would you go about finding this information? As a developer I put the question here, but is there a language-person-overflow that would know?
Would a woodchuck be it?:
How much wood would a woodchuck chuck If a woodchuck would chuck wood? A woodchuck would chuck all the wood he could chuck If a woodchuck would chuck wood.
The proportion here is 14(ems)/155(chars) -- say just under 10%.
Is there some more technical way to determine this? Is there some outlier circumstance where there would be more than this ~10% (say www.www.com)?
Hopefully with this information I can then make inputs/textareas approximately as big as they need to be: no bigger; no smaller.

Related

Why does my Google Optimize experiment show no clear winner

I did a very simple text, the footer contact form on the left of the website or the right of the website. The results showed "no clear winner". But the below data shows that one has 5 conversions vs 1, which I consider to be significant (albeit low numbers). It also says there is a 95% probability that this one will be better.
What am I not understanding about this data? Is it that the numbers are too low in volume to give a reading or is it a bug or is there something I've missed?
Its probably because your AB Test did not have a lot of traffic, in each variant. So 5 conversions vs 1, is not really a big difference between the two.

Purpose of bitcoin mining puzzle

I understand how Bitcoin mining requires a long effort to guess the nounce until one is able to produce hash with leading zeros.
I have two particular questions here --
Why is Bitcoin mining made so computationally expensive in the first place? If the purpose is to just choose a random winner for block placement, why not use a simple and faster proof-of-work algorithm? (one example could be to generate a random number between 0-1 and the one with the smallest/largest value wins the round). By making the puzzle less computationally expensive, we should save lot of electric energy globally.
Is there any specific advantage of choosing a puzzle to produce resulting hash with leading zeros?
The difficulty of the algorithm is precisely what makes it difficult to cheat/steal on the Bitcoin network. If the algorithm was easy, then anyone could recreate old blocks and delete old spends so it looks like they never spent any Bitcoin after they buy something, for example. The purpose is not to pick a random winner, the purpose is to reward the miners doing the most work. It's true the winner is random, but the more work you do (more hashpower) the higher the probability that you will win. The probability is equal to the proportion of hashpower you're spending in relation to the total hashpower of the network.
Leading zeros is not what makes the hash valid, it just has to be under a threshold value. Leading zeros just happens because the number is low. It's like writing 1000 or 001000, it's still the same number, but the hash is 32 bytes so the leading zeros are there so you can see all 32 bytes.
I highly recommend reading the Bitcoin Whitepaper on proof-of-work. Also check out the Bitcoin Wiki - PoW

Image based steganography that survives resizing?

I am using a startech capture card for capturing video from the source machine..I have encoded that video using matlab so every frame of that video will contain that marker...I run that video on the source computer(HDMI out) connected via HDMI to my computer(HDMI IN) once i capture the frame as bitmap(1920*1080) i re-size it to 1280*720 i send it for processing , the processing code checks every pixel for that marker.
The issue is my capture card is able to capture only at 1920*1080 where as the video is of 1280*720. Hence in order to retain the marker I am down scaling the frame captured to 1280*720 which in turn alters the entire pixel array I believe and hence I am not able to retain marker I fed in to the video.
In that capturing process the image is going through up-scaling which in turn changes the pixel values.
I am going through few research papers on Steganography but it hasn't helped so far. Is there any technique that could survive image resizing and I could retain pixel values.
Any suggestions or pointers will be really appreciated.
My advice is to start with searching for an alternative software that doesn't rescale, compress or otherwise modify any extracted frames before handing them to your control. It may save you many headaches and days worth of time. If you insist on implementing, or are forced to implement a steganography algorithm that survives resizing, keep on reading.
I can't provide a specific solution because there are many ways this can be (possibly) achieved and they are complex. However, I'll describe the ingredients a solution will most likely involve and your limitations with such an approach.
Resizing a cover image is considered an attack as an attempt to destroy the secret. Other such examples include lossy compression, noise, cropping, rotation and smoothing. Robust steganography is the medicine for that, but it isn't all powerful; it may be able to provide resistance to only specific types attacks and/or only small scale attacks at that. You need to find or design an algorithm that suits your needs.
For example, let's take a simple pixel lsb substitution algorithm. It modifies the lsb of a pixel to be the same as the bit you want to embed. Now consider an attack where someone randomly applies a pixel change of -1 25% of the time, 0 50% of the time and +1 25% of the time. Effectively, half of the time it will flip your embedded bit, but you don't know which ones are affected. This makes extraction impossible. However, you can alter your embedding algorithm to be resistant against this type of attack. You know the absolute value of the maximum change is 1. If you embed your secret bit, s, in the 3rd lsb, along with setting the last 2 lsbs to 01, you guarantee to survive the attack. More specifically, you get xxxxxs01 in binary for 8 bits.
Let's examine what we have sacrificed in order to survive such an attack. Assuming our embedding bit and the lsbs that can be modified all have uniform probabilities, the probability of changing the original pixel value with the simple algorithm is
change | probability
-------+------------
0 | 1/2
1 | 1/2
and with the more robust algorithm
change | probability
-------+------------
0 | 1/8
1 | 1/4
2 | 3/16
3 | 1/8
4 | 1/8
5 | 1/8
6 | 1/16
That's going to affect our PSNR quite a bit if we embed a lot of information. But we can do a bit better than that if we employ the optimal pixel adjustment method. This algorithm minimises the Euclidean distance between the original value and the modified one. In simpler terms, it minimises the absolute difference. For example, assume you have a pixel with binary value xxxx0111 and you want to embed a 0. This means you have to make the last 3 lsbs 001. With a naive substitution, you get xxxx0001, which has a distance of 6 from the original value. But xxx1001 has only 2.
Now, let's assume that the attack can induce a change of 0 33.3% of the time, 1 33.3% of the time and 2 33.3%. Of that last 33.3%, half the time it will be -2 and the other half it will be +2. The algorithm we described above can actually survive a +2 modification, but not a -2. So 16.6% of the time our embedded bit will be flipped. But now we introduce error correcting codes. If we apply such a code that has the potential to correct on average 1 error every 6 bits, we are capable of successfully extracting our secret despite the attack altering it.
Error correction generally works by adding some sort of redundancy. So even if part of our bit stream is destroyed, we can refer to that redundancy to retrieve the original information. Naturally, the more redundancy you add, the better the error correction rate, but you may have to double the redundancy just to improve the correction rate by a few percent (just arbitrary numbers here).
Let's appreciate here how much information you can hide in a 1280x720 (grayscale) image. 1 bit per pixel, for 8 bits per letter, for ~5 letters per word and you can hide 20k words. That's a respectable portion of an average novel. It's enough to hide your stellar Masters dissertation, which you even published, in your graduation photo. But with a 4 bit redundancy per 1 bit of actual information, you're only looking at hiding that boring essay you wrote once, which didn't even get the best mark in the class.
There are other ways you can embed your information. For example, specific methods in the frequency domain can be more resistant to pixel modifications. The downside of such methods are an increased complexity in coding the algorithm and reduced hiding capacity. That's because some frequency coefficients are resistant to changes but make embedding modifications easily detectable, then there are those that are fragile to changes but they are hard to detect and some lie in the middle of all of this. So you compromise and use only a fraction of the available coefficients. Popular frequency transforms used in steganography are the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT).
In summary, if you want a robust algorithm, the consistent themes that emerge are sacrificing capacity and applying stronger distortions to your cover medium. There have been quite a few studies done on robust steganography for watermarks. That's because you want your watermark to survive any attacks so you can prove ownership of the content and watermarks tend to be very small, e.g. a 64x64 binary image icon (that's only 4096 bits). Even then, some algorithms are robust enough to recover the watermark almost intact, say 70-90%, so that it's still comparable to the original watermark. In some case, this is considered good enough. You'd require an even more robust algorithm (bigger sacrifices) if you want a lossless retrieval of your secret data 100% of the time.
If you want such an algorithm, you want to comb the literature for one and test any possible candidates to see if they meet your needs. But don't expect anything that takes only 15 lines to code and 10 minutes of reading to understand. Here is a paper that looks like a good start: Mali et al. (2012). Robust and secured image-adaptive data hiding. Digital Signal Processing, 22(2), 314-323. Unfortunately, the paper is not open domain and you will either need a subscription, or academic access in order to read it. But then again, that's true for most of the papers out there. You said you've read some papers already and in previous questions you've stated you're working on a college project, so access for you may be likely.
For this specific paper, table 4 shows the results of resisting a resizing attack and section 4.4 discusses the results. They don't explicitly state 100% recovery, but only a faithful reproduction. Also notice that the attacks have been of the scale 5-20% resizing and that only allows for a few thousand embedding bits. Finally, the resizing method (nearest neighbour, cubic, etc) matters a lot in surviving the attack.
I have designed and implemented ChromaShift: https://www.facebook.com/ChromaShift/
If done right, steganography can resiliently (i.e. robustly) encode identifying information (e.g. downloader user id) in the image medium while keeping it essentially perceptually unmodified. Compared to watermarks, steganography is a subtler yet more powerful way of encoding information in images.
The information is dynamically multiplexed into the Cb Cr fabric of the JPEG by chroma-shifting pixels to a configurable small bump value. As the human eye is more sensitive to luminance changes than to chrominance changes, chroma-shifting is virtually imperceptible while providing a way to encode arbitrary information in the image. The ChromaShift engine does both watermarking and pure steganography. Both DRM subsystems are configurable via a rich set of of options.
The solution is developed in C, for the Linux platform, and uses SWIG to compile into a PHP loadable module. It can therefore be accessed by PHP scripts while providing the speed of a natively compiled program.

Choosing a minimum hash size for a given allowable number of collisions

I am parsing a large amount of network trace data. I want to split the trace into chunks, hash each chunk, and store a sequence of the resulting hashes rather than the original chunks. The purpose of my work is to identify identical chunks of data - I'm hashing the original chunks to reduce the data set size for later analysis. It is acceptable in my work that we trade off the possibility that collisions occasionally occur in order to reduce the hash size (e.g. 40 bit hash with 1% misidentification of identical chunks might beat 60 bit hash with 0.001% misidentification).
My question is, given a) number of chunks to be hashed and b) allowable percentage of misidentification, how can one go about choosing an appropriate hash size?
As an example:
1,000,000 chunks to be hashed, and we're prepared to have 1% misidentification (1% of hashed chunks appear identical when they are not identical in the original data). How do we choose a hash with the minimal number of bits that satisifies this?
I have looked at materials regarding the Birthday Paradox, though this is concerned specifically with the probability of a single collision. I have also looked at materials which discuss choosing a size based on an acceptable probability of a single collision, but have not been able to extrapolate from this how to choose a size based on an acceptable probability of n (or fewer) collisions.
Obviously, the quality of your hash function matters, but some easy probability theory will probably help you here.
The question is what exactly are you willing to accept, is it good enough that you have an expected number of collisions at only 1% of the data? Or, do you demand that the probability of the number of collisions going over some bound be something? If its the first, then back of the envelope style calculation will do:
Expected number of pairs that hash to the same thing out of your set is (1,000,000 C 2)*P(any two are a pair). Lets assume that second number is 1/d where d is the the size of the hashtable. (Note: expectations are linear, so I'm not cheating very much so far). Now, you say you want 1% collisions, so that is 10000 total. Well, you have (1,000,000 C 2)/d = 10,000, so d = (1,000,000 C 2)/10,000 which is according to google about 50,000,000.
So, you need a 50 million ish possible hash values. That is a less than 2^26, so you will get your desired performance with somewhere around 26 bits of hash (depending on quality of hashing algorithm). I probably have a factor of 2 mistake in there somewhere, so you know, its rough.
If this is an offline task, you cant be that space constrained.
Sounds like a fun exercise!
Someone else might have a better answer, but I'd go the brute force route, provided that there's ample time:
Run the hashing calculation using incremental hash size and record the collision percentage for each hash size.
You might want to use binary search to reduce the search space.

What's the biggest number in a computer?

Just asked by my 5 year old kid: what is the biggest number in the computer?
We are not talking about max number for a specific data types, but the biggest number that a computer can represent.
Infinity is not allowed.
UPDATE my kid always wants to print as
well, so lets say the computer needs
to print this number and the kid to
know that its a big number. Of course,
in practice we won't print because
theres not enough trees.
This question is actually a very interesting one which mathematicians have devoted a fair bit of thought to. You can read about it in this article, which is a fascinating and accessible read.
Briefly, a guy named Tibor Rado set out to find some really big, but still well-defined, numbers by defining a sequence called the Busy Beaver numbers. He defined BB(n) to be the largest number of steps any Turing Machine could take before halting, given an input of n symbols. Note that this sequence is by its very nature not computable, so the numbers themselves, while well-defined, are very difficult to pin down. Here are the first few:
BB(1) = 1
BB(2) = 6
BB(3) = 21
BB(4) = 107
... wait for it ...
BB(5) >= 8,690,333,381,690,951
No one is sure how big exactly BB(5) is, but it is finite. And no one has any idea how big BB(6) and above are. But at least these numbers are completely well-defined mathematically, unlike "the largest number any human has ever thought of, plus one." ;)
So how about this:
The biggest number a computer can represent is the most instructions a program small enough to fit in its available memory can perform before halting.
Squared.
No, wait, cubed. No, raised to the power of itself!
Dammit!
Bits are not numbers. You, as a programmer, give them the meaning you want, possibly numbers.
Now, I decide that 1 represents "the biggest number ever thought by a human plus one".
Errr this is a five year old?
How about something along the lines of: "I'd love to tell you but the number is so big and would take so long to say, I'd die before I finished telling you".
// wait to see
for(;;)
{
printf("9");
}
roughly 2^AVAILABLE_MEMORY_IN_BITS
EDIT: The above is for actually storing a number and treats all media (RAM, HD, cloud etc.) as memory. Subtracting the OS footprint (measured in KB) doesn't make "roughly" less accurate...
If you want to "represent" a number in a meaningful way, then you probably want to go with what the CPU provides: unsigned 32 bit integers (roughly 4 Gigs) or unsigned 64 bit integers for most computers your kid will come into contact with.
NOTE for talking to 5-year-olds: Often, they just want a factoid. Give him a really big and very accurate number (lots of digits), like 4'294'967'295. Then, once the glazing leaves his eyes, try to see how far you can get with explaining how computers represent numbers.
EDIT #2: I once read this article: Who Can Name the Bigger Number that should provide a whole lot of interesting information for your kid. Obviously he's not your normal five-year-old. So this might get you started in a cool direction about numbers and computation.
The answer to life (and this kids question): 42
That depends on the datatype you use to represent it. The computer only stores bits (0/1). We, as developers, give the bits meaning. (65 can be a number or the letter A).
For example, I can define my datatype as 1^N where N is unsigned and represented by an array of bits of arbitrary size. The next person can come up with 10^N which would be ten times larger than my biggest number.
Sure, there would be gaps but if you don't need them, that doesn't matter.
Therefore, the question is meaningless since it doesn't have context.
Well I had the same question earlier this day, so thought why not to make a little c++ codes to see where the computer gonna stop ...
But my laptop wasn't with me in class so I used another, well the number was to big but it never ends, i'll run it again for a night then i'll share the number
you can try the code is stupid
#include <stdlib.h>
#include <stdio.h>
int main() {
int i = 0;
for (i = 0; i <= i; i++) {
printf("%i\n", i);
i++;
}
}
And let it run till it stops ^^
The size will obviously be limited by the total size of hard drives you manage to put into your PC. After all, you can store a number in a text file occupying all disk space.
You can have 4x2Tb drives even in a simple box so around 8Tb available. if you store as binary, then the biggest number is 2 pow 64000000000000.
If your hard drive is 1 TB (8'000'000'000'000 bits), and you would print the number that fits on it on paper as hex digits (nobody would do that, but let's assume), that's 2,000,000,000,000 hex digits.
Each page would contain 4000 hex digits (40 x 100 digits). That's 500,000,000 pages.
Now stack the pages on top of each other (let's say each page is 0.004 inches / 0.1 mm thick), then the stack would be as 5 km (about 3 miles) tall.
I'll try to give a practical answer.
Common Lisp number crunching is particularly powerful. It has something called "bignums" which are integers that can be arbitrarily large, limited by the amount of available.
See: http://en.wikibooks.org/wiki/Common_Lisp/Advanced_topics/Numbers#Fixnums_and_Bignums
Don't know much about theory, but I far as I understood from your question, is: what is the largest number that the computer can represent (and I add: in a reasonable time, and not printing "9" until the Earth will "be eaten by the Sun"). And I put my PC to make one simple calculation (in PHP or whatever language): echo pow(2,1023) - resulting: 8.9884656743116E+307. So I guess this is the largest number that my PC can calculate. On the other side, I think the respresentation of the largest negative number can be: -0,(0)1
LE: That computed value was obataind through PHP, but I tried to figure out what's the largest number that my windows calculator can compute, and it is pow(2, 33219) = 8.2304951207588748764521361245002E+9999. Now I guess this is the largest number my PC can handle.
I think you should be very proud that your 5 year old is already asking questions like this.
And you should continue to promote that! This is truly amazing! With that said, I would say that saying Infinity does not
count is thinking incorrectly about what numbers mean in computer memory.
I feel like this way of thinking is a handicap.
Mathematicians will never be able to write out ALL the digits of pi or eulers number, BUT we FULLY understand it.
Pi, as an example, is perfectly represented by infinite this series: (Pi / 4) = 1 - 1/3 + 1/5 - 1/7 + 1/9 - …
Just because you literally can’t go to inf. or print every single digit in a console means nothing.
You could have printed the symbol representing pi and therefore capturing the inf. series.
Computer Algebra Systems (CAS) represent numbers symbolically all the time. Pi, for instance,
may be a Symbolic object in memory (the binary in memory did not DIRECTLY represent the number. It represents an "mathematical algorithm" for producing the answer to arbitrary precision).
Then you do some math with it, transforming from one expression to the next.
At no point in time did we not represent the number COMPLETELY.
At the end, you can do 2 things with this:
A) Evaluate the expression, turning it into a number of some kind (or Matrix or whatever).
BUT this number could very well be an approximation (say like 20 digits of pi).
B) Keep it in its symbolic form for reference. Obviously we don’t like staring at symbols because we
need to eventually turn the nobs on the apparatii.
NOTE: sometimes you can get a finite (non-irrational) number perfectly represented in memory (like number 1)
by taking limits or going to inf. Not literally having an inf. number in memory, but symbolically representing it.
Just throw this in Wolfram alpha: Lim[Exp[-x], x --> Inf]; It gives you the number 0. Which is EXACT.
In short:
It was the HUMANS need to have some binary in memory that DIRECTLY represented the number that caused
the number to degrade. Symbolically it was perfectly represented. You could design some algorithm that
just continues to calculate the next digits of pi or eulers number giving you an arbitrary amount of precision (Now, this is obviously not practical of course).
I hope this was at least somewhat useful or interesting to you, even if you disagree =)
Depends on how much the computer can handle. Although there are some times when the computer can handle numbers greater than (2^(bits-1)-1)... For example:
My computer is 64 bit (9223372036854775807), however the calculator that comes with the computer itself can handle numbers of up to 10^9999.
Many other supercomputers can exceed these limits, and the one with the most memory (bits) might as well be the one with the record (current largest number that can be held by computers).
Or, if it comes to visually seeing it on computers, you can just make a program that, on monitor, repeats writing 9 and not skips that line to form an ever-growing bunch of 9. :P
go on chrome then go on three dots above and click them then go on tools and then go on developer tool click on console and type Number.MAX_VALUE