I know that a program exhibits both temporal and spatial locality.
But what about the implementation of computer memory as a memory hierarchy? I was sure that only temporal locality is exploited, but the wiki section on "Spatial and temporal locality usage" is pretty vague and it confused me.
It has been shown that CRC32C provides better results (an improved Hamming distance and a faster implementation) than CRC32. Why is Ethernet still using the old CRC-32 and not CRC-32C?
"Faster implementation" is not true for the hardware that is normally used to implement the data link layer.
You may be referring to the fact that one particular processor architecture, x86-64, has a CRC instruction that uses the CRC-32C polynomial. However the ARM architecture (aarch64) CRC instruction uses the CRC-32 polynomial. Go figure.
One could argue that yet another polynomial should be used, since Koopman has characterized many polynomials with better performance than either of the ones you mention. But none of that really matters, since ...
All of the legacy hardware has to support the original CRC, and there is little motivation to provide an alternate CRC, whose use would need to somehow be negotiated between the transmitter and receiver. There is no noticeable performance advantage for typical noise sources, which produce rare single-bit errors.
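If you want to see how the two checks differ in practice, here is a minimal sketch using the JDK's built-in implementations (java.util.zip.CRC32, and java.util.zip.CRC32C, which requires Java 9+); the payload bytes are made up:

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;

// CRC32 uses the classic IEEE 802.3 polynomial; CRC32C (Castagnoli) is the
// one accelerated by the x86-64 SSE4.2 instruction.
public class CrcCompare {
    public static void main(String[] args) {
        byte[] payload = "example frame payload".getBytes(StandardCharsets.UTF_8);

        CRC32 crc32 = new CRC32();
        crc32.update(payload, 0, payload.length);

        CRC32C crc32c = new CRC32C();
        crc32c.update(payload, 0, payload.length);

        // Same input, different check values: the two CRCs are not
        // interchangeable, which is why switching would have to be
        // negotiated by both endpoints.
        System.out.printf("CRC-32 : %08x%n", crc32.getValue());
        System.out.printf("CRC-32C: %08x%n", crc32c.getValue());
    }
}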
I was trying to find evaluation mechanisms for a collaborative K-Nearest-Neighbor algorithm, but I am confused about how to evaluate it. How can I be sure that the recommendations made by this algorithm are correct or good? I have also developed an algorithm of my own that I want to compare with it, but I am not sure how to compare and evaluate the two. The dataset I am using is MovieLens.
Any help on evaluating this recommender system will be highly appreciated.
Evaluating recommender systems is a major concern of the recsys research and industry communities. Look at "Evaluating collaborative filtering recommender systems", a paper by Herlocker et al. The people who publish the MovieLens data (the GroupLens research lab at the University of Minnesota) also publish many papers on recsys topics, and the PDFs are often freely available at http://grouplens.org/publications/.
Check out https://scholar.google.com/scholar?hl=en&q=evaluating+recommender+systems.
In short, you should use a method that hides some data. You will train your model on a portion of the data (called "training data") and test on the remainder of the data that your model has never seen before. There's a formal way to do this called cross-validation, but the general concept of visible training data versus hidden test data is the most important.
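To make the idea concrete, here is a minimal holdout-evaluation sketch in Java; the Rating record, the hard-coded sample data, and the per-user-mean "model" are hypothetical stand-ins for your own recommender, and RMSE is just one common accuracy metric:

import java.util.*;

public class HoldoutEval {
    record Rating(int user, int item, double value) {}

    public static void main(String[] args) {
        List<Rating> ratings = loadRatings();          // e.g., parsed from MovieLens
        Collections.shuffle(ratings, new Random(42));  // fixed seed for repeatability

        int cut = (int) (ratings.size() * 0.8);
        List<Rating> train = ratings.subList(0, cut);  // the model sees only this part
        List<Rating> test  = ratings.subList(cut, ratings.size());

        // "Train": per-user mean rating as a trivial baseline predictor.
        Map<Integer, Double> userMean = new HashMap<>();
        Map<Integer, Integer> userCount = new HashMap<>();
        for (Rating r : train) {
            userMean.merge(r.user(), r.value(), Double::sum);
            userCount.merge(r.user(), 1, Integer::sum);
        }
        userMean.replaceAll((u, sum) -> sum / userCount.get(u));

        // Test only on data the model has never seen.
        double se = 0;
        for (Rating r : test) {
            double predicted = userMean.getOrDefault(r.user(), 3.0); // fallback guess
            se += Math.pow(predicted - r.value(), 2);
        }
        System.out.printf("RMSE on held-out data: %.4f%n", Math.sqrt(se / test.size()));
    }

    static List<Rating> loadRatings() {
        // Tiny hard-coded sample; in practice, parse the MovieLens ratings file.
        return new ArrayList<>(List.of(
                new Rating(1, 10, 4.0), new Rating(1, 11, 5.0), new Rating(1, 12, 3.0),
                new Rating(2, 10, 2.0), new Rating(2, 13, 4.0), new Rating(2, 11, 3.5),
                new Rating(3, 12, 5.0), new Rating(3, 13, 4.5), new Rating(3, 10, 3.0),
                new Rating(4, 11, 2.5)));
    }
}

To compare your algorithm against the KNN recommender, run both on the same train/test split and compare their scores.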
I also recommend https://www.coursera.org/learn/recommender-systems, a Coursera course on recommender systems taught by GroupLens folks. In that course you'll learn to use LensKit, a recommender systems framework in Java that includes a large evaluation suite. Even if you don't take the course, LensKit may be just what you want.
Hi, I have collected process data for 3 years and I want to mimic a prospective EWMA analysis, to see if my chosen smoothing parameter would have detected all the important changes (without too many false alarms).
It seems like most textbooks and literature that I have looked at use a mean and standard deviation to calculate the control limits. This is usually the "in-control" mean and standard deviation from some historical data, or the mean and sd of the population from which the samples are drawn. I don't have either piece of information.
Is there another way to calculate the Control Limits?
Is there a variation of the EWMA chart that does not use mean and standard deviation?
Any creative ideas?
Thank you in advance
From a practical/operational perspective, the use of statistical analysis of historical data alone is rare. Yes, it provides some guidance on how the process (and its control system) is performing, but the most important thing by far is to have a good understanding and knowledge of the "engineering limits".
I refer to the operational limits, which are determined by the specifications and performance characteristics of the various pieces of equipment. This allows one to develop a good understanding of how the process is supposed to behave (in terms of optimal operating point and upper/lower control limits) and where the areas of greatest deviation from optimal are. This has very little to do with statistical analysis of historical data, and a great deal to do with process engineering/metallurgy - depending on the type of process you are dealing with.
The control limits are ultimately determined from what the Process Manager / Process Engineer WANTS, which are usually (but not always) within the nameplate capacity of the equipment.
If you are working within the operational limits, and you are in the realm of process optimisation, then yes, statistical analysis is more widely used and can offer good insight. Depending upon the variability of your process, how well your control system is set up, and the homogeneity of your feed product, the upper/lower control limits that are selected will vary. A good starting point is the optimal operating point (e.g. 100 m3/hr), then use a sensible amount of historical data to calculate a standard deviation, and make your upper limit 100 + 1 standard dev, and your lower limit 100 - 1 standard dev. This is by no means a "hard and fast" rule, but it is a sensible starting point.
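To illustrate the statistical route, here is a minimal retrospective EWMA sketch in Java; the data values, the choice of the first 6 points as a presumed in-control window, and the lambda and L settings are all assumptions you would replace with your own process knowledge:

import java.util.Arrays;

public class EwmaChart {
    public static void main(String[] args) {
        double[] x = { 100.2, 99.8, 100.5, 101.0, 100.1,   // your 3 years of data
                       99.7, 100.3, 104.2, 104.8, 105.1 };
        double lambda = 0.2;  // the smoothing parameter you want to stress-test
        double L = 3.0;       // width of the limits in sigma units

        int window = 6;       // presumed in-control span used to estimate mu, sigma
        double mu = Arrays.stream(x, 0, window).average().getAsDouble();
        double var = Arrays.stream(x, 0, window)
                           .map(v -> (v - mu) * (v - mu)).sum() / (window - 1);
        double sigma = Math.sqrt(var);

        double z = mu;  // standard initialization: start the EWMA at the baseline
        for (int t = 1; t <= x.length; t++) {
            z = lambda * x[t - 1] + (1 - lambda) * z;
            // Exact (time-varying) EWMA limits; they widen toward the asymptote.
            double se = sigma * Math.sqrt(lambda / (2 - lambda)
                        * (1 - Math.pow(1 - lambda, 2.0 * t)));
            boolean alarm = Math.abs(z - mu) > L * se;
            System.out.printf("t=%2d  x=%6.2f  ewma=%7.3f  limit=±%.3f  %s%n",
                    t, x[t - 1], z, L * se, alarm ? "ALARM" : "");
        }
    }
}

Replaying your 3 years of data through this loop with different lambda values is one way to check, after the fact, which changes would have been flagged and how many false alarms you would have tolerated.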
I was going through the K-means algorithm in Mahout and, while debugging, I noticed that when creating the first clusters it runs the following code:
// the policy appears to encapsulate the k-means-specific settings (e.g., the convergence test)
ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
// the classifier wraps the initial (prior) clusters together with that policy
ClusterClassifier prior = new ClusterClassifier(clusters, policy);
// the prior cluster state is then written out as sequence files
prior.writeToSeqFiles(priorClustersPath);
I was reading the description of these classes and it was not clear to me.
I was wondering: what is the meaning of this cluster classifier and policy?
Is it related to hierarchical clustering, centroid-based clustering, distribution-based clustering, etc.?
I ask because I do not know the benefit of, or the reason for, using this cluster classifier and policy in Mahout's K-means implementation.
The implementation shares code with other variants of k-means and similar algorithms such as Canopy pre-clustering and GMM.
These classes encode only the difference between these algorithms.
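As a rough illustration (hypothetical names, not Mahout's actual API), the pattern looks something like this: shared driver code iterates over the data, while the policy supplies only the algorithm-specific choices, such as the assignment rule and the convergence test.

// Hypothetical sketch of the strategy pattern, not Mahout's real interfaces.
interface ClusteringPolicy {
    int select(double[] point, double[][] centroids);    // which cluster claims the point
    boolean converged(double[][] oldC, double[][] newC); // algorithm-specific stop test
}

class KMeansPolicy implements ClusteringPolicy {
    private final double delta;
    KMeansPolicy(double convergenceDelta) { this.delta = convergenceDelta; }

    @Override public int select(double[] p, double[][] c) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int k = 0; k < c.length; k++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) {
                d += (p[i] - c[k][i]) * (p[i] - c[k][i]);
            }
            if (d < bestD) { bestD = d; best = k; }   // hard assignment = k-means
        }
        return best;
    }

    @Override public boolean converged(double[][] oldC, double[][] newC) {
        for (int k = 0; k < oldC.length; k++)
            for (int i = 0; i < oldC[k].length; i++)
                if (Math.abs(oldC[k][i] - newC[k][i]) > delta) return false;
        return true;
    }
}

A GMM-style policy would return soft assignments and a likelihood-based convergence test behind the same interface, which is why the driver code can be shared.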
Mahout is not a good place to study the k-means algorithm; the implementation is quite a mess. It's also slow. As in really, really slow. Most of the time, a single-CPU implementation will outright beat Mahout on anything that fits into memory, and maybe even on anything that fits on the disk of a single machine, because of all the map-reduce overhead.
Is there a machine learning concept (an algorithm or a multi-classifier system) that can detect variants of network attacks (or at least try to)?
One of the biggest problems for signature based intrusion detection systems is the inability to detect new or variant attacks.
Reading up, anomaly detection still seems to be a statistics-based endeavour: it refers to detecting patterns in a given data set, which isn't the same as detecting variation in packet payloads. An anomaly-based NIDS monitors network traffic and compares it against an established baseline of a normal traffic profile. The baseline characterizes what is "normal" for the network, such as the normal bandwidth usage, the common protocols used, correct combinations of port numbers and devices, etc.
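As I understand it, that baseline comparison amounts to something like this toy sketch (the bandwidth numbers and the 3-sigma threshold are made up for illustration):

public class BaselineAnomaly {
    public static void main(String[] args) {
        // Hypothetical per-minute bandwidth samples from a quiet training period.
        double[] baseline = {12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.4};
        double mu = 0, sd = 0;
        for (double b : baseline) mu += b;
        mu /= baseline.length;
        for (double b : baseline) sd += (b - mu) * (b - mu);
        sd = Math.sqrt(sd / (baseline.length - 1));

        // New observations are scored by how many sigmas they sit from "normal".
        double[] live = {12.3, 12.0, 29.7};
        for (double v : live) {
            double z = Math.abs(v - mu) / sd;
            System.out.printf("value=%5.1f  z=%5.2f  %s%n",
                    v, z, z > 3 ? "anomalous" : "normal");
        }
    }
}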
Say someone uses Virus A to propagate through a network, and someone then writes a rule to stop Virus A, but another person writes a "variation" of Virus A called Virus B purely for the purpose of evading that initial rule, while still using most if not all of the same tactics/code. Is there not a way to detect such variance?
If there is, what's the umbrella term it would come under? I've been under the impression that anomaly detection was it.
Could machine learning be used for pattern recognition (rather than pattern matching) at the packet payload level?
I think your intuition to look at machine learning techniques is correct, or will turn out to be correct ("One of the biggest problems for signature-based intrusion detection systems is the inability to detect new or variant attacks."). The superior performance of ML techniques is in general due to the ability of these algorithms to generalize (a multiplicity of soft constraints rather than a few hard constraints) and to adapt (updates based on new training instances to frustrate simple countermeasures): two attributes that I would imagine are crucial for identifying network attacks.
The theoretical promise aside, there are practical difficulties with applying ML techniques to problems like the one described in the OP. By far the most significant is the difficulty of gathering data to train the classifier. In particular, reliably labeling data points as "intrusion" is probably not easy; likewise, my guess is that these instances are sparsely distributed in the raw data.
I suppose it's this limitation that has led to the increased interest (as evidenced at least by the published literature) in applying unsupervised ML techniques to problems like network intrusion detection.
Unsupervised techniques differ from supervised techniques in that the data is fed to the algorithm without a response variable (i.e., without the class labels). In these cases you are relying on the algorithm to discern structure in the data, i.e., some inherent ordering of the data into reasonably stable groups or clusters (possibly what you, the OP, had in mind by "variance"). So with an unsupervised technique, there is no need to explicitly show the algorithm instances of each class, nor is it necessary to establish baseline measurements, etc.
The most frequently used unsupervised ML technique applied to problems of this type is probably the Kohonen Map (also sometimes called a self-organizing map, or SOM).
I use Kohonen Maps frequently, but so far not for this purpose. There are, however, numerous published reports of their successful application in your domain of interest, e.g.:
Dynamic Intrusion Detection Using Self-Organizing Maps
Multiple Self-Organizing Maps for Intrusion Detection
I know MATLAB has at least one available implementation of Kohonen Map--the SOM Toolbox. The homepage for this Toolbox also contains a brief introduction to Kohonen Maps.
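If you want to see the mechanics before committing to a toolbox, here is a minimal SOM training sketch in Java; the grid size, the learning-rate and radius schedules, and the random "feature vectors" are all placeholder assumptions (real NIDS use would need real packet/flow features):

import java.util.Random;

public class TinySom {
    public static void main(String[] args) {
        int gridW = 5, gridH = 5, dim = 4;       // 5x5 map over 4-d inputs
        double[][][] w = new double[gridW][gridH][dim];
        Random rnd = new Random(1);
        for (double[][] row : w) {
            for (double[] unit : row) {
                for (int i = 0; i < dim; i++) unit[i] = rnd.nextDouble();
            }
        }

        // Hypothetical training vectors (e.g., normalized flow statistics).
        double[][] data = new double[200][dim];
        for (double[] v : data) {
            for (int i = 0; i < dim; i++) v[i] = rnd.nextDouble();
        }

        for (int epoch = 0; epoch < 100; epoch++) {
            double lr = 0.5 * (1.0 - epoch / 100.0);         // decaying learning rate
            double radius = 2.0 * (1.0 - epoch / 100.0) + 1; // shrinking neighbourhood
            for (double[] v : data) {
                int bx = 0, by = 0;
                double best = Double.MAX_VALUE;
                for (int x = 0; x < gridW; x++) {             // find best-matching unit
                    for (int y = 0; y < gridH; y++) {
                        double d = 0;
                        for (int i = 0; i < dim; i++)
                            d += (v[i] - w[x][y][i]) * (v[i] - w[x][y][i]);
                        if (d < best) { best = d; bx = x; by = y; }
                    }
                }
                for (int x = 0; x < gridW; x++) {             // pull neighbourhood toward input
                    for (int y = 0; y < gridH; y++) {
                        double g = Math.exp(-((x - bx) * (x - bx) + (y - by) * (y - by))
                                / (2 * radius * radius));
                        for (int i = 0; i < dim; i++)
                            w[x][y][i] += lr * g * (v[i] - w[x][y][i]);
                    }
                }
            }
        }
        System.out.println("trained; map units now approximate the input density");
    }
}

After training, inputs whose best-matching unit is unusually far away are candidates for "not like anything seen before", which is the property that makes SOMs attractive for detecting attack variants.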