How does the link type "adjusted complete" work for agglomerative hierarchical clustering in WEKA?

The only descriptions I can find about "adjusted complete" linkage say something like: "same as complete linkage, but with largest within cluster distance"
What is meant by "within cluster distance"?
How is the distance between two clusters finally calculated using this linkage approach?
Thanks for your replies!

One of the great things about open-source software is that you can find out exactly how it works. The code below is from the source of Weka's HierarchicalClusterer; specifically, it is the part that implements the COMPLETE and ADJCOMPLETE link types. The difference is as follows:
Just like the COMPLETE linkage method, compute the maximum distance between one node from cluster 1 and one node from cluster 2 and store this in fBestDist
Then, find the largest distance between nodes within cluster 1 or cluster 2 and store this in fMaxDist
Finally subtract fMaxDist from fBestDist
So the distance between two clusters calculated using ADJCOMPLETE as linkType corresponds to the COMPLETE distance minus the largest distance between 2 nodes within either cluster 1 or cluster 2.
Adjusted Complete-Link was proposed in the following paper:
Sepandar Kamvar, Dan Klein and Christopher Manning (2002). Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach. In Proceedings of 19th International Conference on Machine Learning (ICML-2002)
According to it (section 4.2), Adjusted Complete-Link is a version of Complete-Link which should be used if the clusters have varying radii (see Figure 10).
case COMPLETE:
case ADJCOMLPETE:
  // find complete link distance aka maximum link, which is the largest distance between
  // any item in cluster1 and any item in cluster2
  fBestDist = 0;
  for (int i = 0; i < cluster1.size(); i++) {
    int i1 = cluster1.elementAt(i);
    for (int j = 0; j < cluster2.size(); j++) {
      int i2 = cluster2.elementAt(j);
      double fDist = fDistance[i1][i2];
      if (fBestDist < fDist) {
        fBestDist = fDist;
      }
    }
  }
  if (m_nLinkType == COMPLETE) {
    break;
  }
  // calculate adjustment, which is the largest within cluster distance
  double fMaxDist = 0;
  for (int i = 0; i < cluster1.size(); i++) {
    int i1 = cluster1.elementAt(i);
    for (int j = i + 1; j < cluster1.size(); j++) {
      int i2 = cluster1.elementAt(j);
      double fDist = fDistance[i1][i2];
      if (fMaxDist < fDist) {
        fMaxDist = fDist;
      }
    }
  }
  for (int i = 0; i < cluster2.size(); i++) {
    int i1 = cluster2.elementAt(i);
    for (int j = i + 1; j < cluster2.size(); j++) {
      int i2 = cluster2.elementAt(j);
      double fDist = fDistance[i1][i2];
      if (fMaxDist < fDist) {
        fMaxDist = fDist;
      }
    }
  }
  fBestDist -= fMaxDist;
  break;
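To see the same rule outside of Weka, here is a minimal standalone sketch. This is not Weka code; the distance matrix and clusters are invented for illustration, and the method names are my own:

```java
import java.util.List;

public class AdjCompleteDemo {
    // Largest pairwise distance between a member of a and a member of b
    static double completeLink(double[][] dist, List<Integer> a, List<Integer> b) {
        double best = 0;
        for (int i : a)
            for (int j : b)
                best = Math.max(best, dist[i][j]);
        return best;
    }

    // Largest distance between two members of the same cluster
    static double maxWithin(double[][] dist, List<Integer> c) {
        double max = 0;
        for (int i = 0; i < c.size(); i++)
            for (int j = i + 1; j < c.size(); j++)
                max = Math.max(max, dist[c.get(i)][c.get(j)]);
        return max;
    }

    // ADJCOMPLETE = complete-link distance minus the largest
    // within-cluster distance over both clusters
    static double adjustedComplete(double[][] dist, List<Integer> a, List<Integer> b) {
        double within = Math.max(maxWithin(dist, a), maxWithin(dist, b));
        return completeLink(dist, a, b) - within;
    }

    public static void main(String[] args) {
        // Toy symmetric distance matrix over 4 points
        double[][] dist = {
            {0, 1, 4, 5},
            {1, 0, 3, 6},
            {4, 3, 0, 2},
            {5, 6, 2, 0}
        };
        List<Integer> c1 = List.of(0, 1);  // within-cluster max distance: 1
        List<Integer> c2 = List.of(2, 3);  // within-cluster max distance: 2
        System.out.println(completeLink(dist, c1, c2));      // 6.0
        System.out.println(adjustedComplete(dist, c1, c2));  // 6.0 - 2.0 = 4.0
    }
}
```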

Related

How to calculate RMS in flutter?

Is there any technique or package that calculates Root Mean Square (RMS) in Flutter?
I searched many blogs and packages but didn't find a useful resource on implementing RMS in Dart.
These are the steps for computing the Root Mean Square, according to the link below:
Step 1: Get the squares of all the values
Step 2: Calculate the average of the obtained squares
Step 3: Finally, take the square root of the average
and this is the Dart implementation:
import 'dart:math';

void main() {
  List<int> values = [1, 3, 5, 7, 9];
  num result = 0;
  num rootMeanSquare = 0;
  for (var i = 0; i < values.length; i++) {
    result = result + (values[i] * values[i]);
  }
  rootMeanSquare = sqrt(result / values.length);
  print(rootMeanSquare);
}
https://byjus.com/maths/root-mean-square/#:~:text=Root%20Mean%20Square%20Formula&text=For%20a%20group%20of%20n,x%20n%202%20%20N

Using zoomCallback, how can I "snap" the zoom to existing x values?

I'm trying to use the zoomCallback function to set up interaction between my dygraphs chart and a map chart. My x values are timestamps in seconds, but since the sample rate is about 100 Hz, the timestamps are stored as floating-point numbers.
The goal is that when dygraphs chart is zoomed in, the new x1 and x2 will be used to extract a piece of GPS track (lat, lng points). The extracted track will be used to re-fit the map boundaries - this will look like a "zoom in" on the map chart.
In my dygraphs options I specified the callback:
zoomCallback: function(x1, x2) {
  let x1Index = graphHolder.getRowForX(x1);
  let x2Index = graphHolder.getRowForX(x2);
  // further code
}
But it looks like the zoom is not "snapped" to existing timestamp points so both x1Index and x2Index are null. Only when I zoom out, they'll correctly point to row 0 and the last row of data.
So the question is - is there a way to make the zoom snap only to the nearest existing x value so the row number can be returned? Or, is there an alternative to do what I want?
Thanks for any insights!
You can access the x-axis values via g.getValue(row, 0). From this you can either do a linear scan to find the first row in the range or (fancier but faster) use a binary search.
Here's a way to do the linear scan:
const [x1, x2] = g.xAxisRange();
let lowRow = null, highRow = null;
for (let i = 0; i < g.numRows(); i++) {
  if (g.getValue(i, 0) >= x1) {
    lowRow = i;
    break;
  }
}
for (let i = g.numRows() - 1; i >= 0; i--) {
  if (g.getValue(i, 0) <= x2) {
    highRow = i;
    break;
  }
}
const dataX1 = g.getValue(lowRow, 0);
const dataX2 = g.getValue(highRow, 0);
For larger data sets you might want to do a binary search using something like lodash's _.sortedIndex.
Update: Here's a binary search implementation. No promises about the exact behavior at the boundaries (i.e., whether it always returns indices that are inside the visible range or indices which contain the visible range).
function dygraphBinarySearch(g, x) {
  let low = 0;
  let high = g.numRows() - 1;
  while (high > low) {
    let i = Math.floor(low + (high - low) / 2);
    const xi = g.getValue(i, 0);
    if (xi < x) {
      low = i + 1;
    } else if (xi > x) {
      high = i - 1;
    } else {
      return i;
    }
  }
  return low;
}

function getVisibleDataRange(g) {
  const [x1, x2] = g.xAxisRange();
  let lowI = dygraphBinarySearch(g, x1);
  let highI = dygraphBinarySearch(g, x2);
  return [lowI, highI];
}
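If you want a true "snap to the nearest x" rather than an insertion point, the search needs one extra comparison against the left neighbour. The step is the same in any language; here is a Java sketch, with a plain sorted array standing in for the dygraphs x column (all names are invented for illustration):

```java
public class NearestSnap {
    // Return the index of the value in xs (sorted ascending) closest to x.
    static int nearestIndex(double[] xs, double x) {
        int low = 0, high = xs.length - 1;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (xs[mid] < x) low = mid + 1;
            else high = mid;
        }
        // low is now the first index with xs[low] >= x; the nearest value
        // is either that one or its left neighbour.
        if (low > 0 && x - xs[low - 1] <= xs[low] - x) return low - 1;
        return low;
    }

    public static void main(String[] args) {
        double[] timestamps = {0.00, 0.01, 0.02, 0.03, 0.04};
        System.out.println(nearestIndex(timestamps, 0.017)); // 2
        System.out.println(nearestIndex(timestamps, 0.012)); // 1
    }
}
```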

Implementing convolution in C++ using fftw 3

UPDATE
See my fundamental based question on DSP stackexchange here
UPDATE
I am still experiencing crackling in the output. The crackles are now less pronounced and are only audible when the volume is turned up.
UPDATE
Following the advice given here has removed the crackling sound from my output. I will test with other available HRIRs to see if the convolution is indeed working properly and will answer this question once I've verified that my code now works
UPDATE
I have made some progress, but I still think there is an issue with my convolution implementation.
The following is my revised program:
#define HRIR_LENGTH 512
#define WAV_SAMPLE_SIZE 256

while (signal_input_wav.read(&signal_input_buffer[0], WAV_SAMPLE_SIZE) >= WAV_SAMPLE_SIZE)
{
#ifdef SKIP_CONVOLUTION
    // Copy the input buffer over
    std::copy(signal_input_buffer.begin(),
              signal_input_buffer.begin() + WAV_SAMPLE_SIZE,
              signal_output_buffer.begin());
    signal_output_wav.write(&signal_output_buffer[0], WAV_SAMPLE_SIZE);
#else
    // Copy the first segment into the buffer
    // with zero padding
    for (int i = 0; i < HRIR_LENGTH; ++i)
    {
        if (i < WAV_SAMPLE_SIZE)
        {
            signal_buffer_fft_in[i] = signal_input_buffer[i];
        }
        else
        {
            signal_buffer_fft_in[i] = 0; // zero pad
        }
    }
    // Dft of the signal segment
    fftw_execute(signal_fft);
    // Convolve in the frequency domain by multiplying filter kernel with dft signal
    for (int i = 0; i < HRIR_LENGTH; ++i)
    {
        signal_buffer_ifft_in[i] = signal_buffer_fft_out[i] * left_hrir_fft_out[i]
            - signal_buffer_fft_out[HRIR_LENGTH - i] * left_hrir_fft_out[HRIR_LENGTH - i];
        signal_buffer_ifft_in[HRIR_LENGTH - i] = signal_buffer_fft_out[i] * left_hrir_fft_out[HRIR_LENGTH - i]
            + signal_buffer_fft_out[HRIR_LENGTH - i] * left_hrir_fft_out[i];
        //double re = signal_buffer_out[i];
        //double im = signal_buffer_out[BLOCK_OUTPUT_SIZE - i];
    }
    // inverse dft back to time domain
    fftw_execute(signal_ifft);
    // Normalize the data
    for (int i = 0; i < HRIR_LENGTH; ++i)
    {
        signal_buffer_ifft_out[i] = signal_buffer_ifft_out[i] / HRIR_LENGTH;
    }
    // Overlap-add method
    for (int i = 0; i < HRIR_LENGTH; ++i)
    {
        if (i < WAV_SAMPLE_SIZE)
        {
            signal_output_buffer[i] = signal_overlap_buffer[i] + signal_buffer_ifft_out[i];
        }
        else
        {
            signal_output_buffer[i] = signal_buffer_ifft_out[i];
            signal_overlap_buffer[i] = signal_output_buffer[i]; // record into the overlap buffer
        }
    }
    // Write the block to the output file
    signal_output_wav.write(&signal_output_buffer[0], HRIR_LENGTH);
#endif
}
The resulting output sound file contains crackling sounds, presumably artefacts left over from the buggy fftw implementation. Also, writing blocks of 512 (HRIR_LENGTH) seems to result in some aliasing: on playback the sound file sounds like a vinyl record being slowed down. Writing out blocks of size WAV_SAMPLE_SIZE (256, half of the fft output) seems to play back at normal speed.
However, irrespective of this the crackling sound remains.
ORIGINAL
I'm trying to implement convolution using the fftw library in C++.
I can load my filter perfectly fine, and am zero padding both the filter (of length 512) and the input signal (of length 513) in order to get a signal output block of 1024 and using this as the fft size.
Here is my code:
#define BLOCK_OUTPUT_SIZE 1024
#define HRIR_LENGTH 512
#define WAV_SAMPLE_SIZE 513
#define INPUT_SHIFT 511

while (signal_input_wav.read(&signal_input_buffer[0], WAV_SAMPLE_SIZE) >= WAV_SAMPLE_SIZE)
{
#ifdef SKIP_CONVOLUTION
    // Copy the input buffer over
    std::copy(signal_input_buffer.begin(),
              signal_input_buffer.begin() + WAV_SAMPLE_SIZE,
              signal_output_buffer.begin());
    signal_output_wav.write(&signal_output_buffer[0], WAV_SAMPLE_SIZE);
#else
    // Zero pad input
    for (int i = 0; i < INPUT_SHIFT; ++i)
        signal_input_buffer[WAV_SAMPLE_SIZE + i] = 0;
    // Copy to the signal convolve buffer
    for (int i = 0; i < BLOCK_OUTPUT_SIZE; ++i)
    {
        signal_buffer_in[i] = signal_input_buffer[i];
    }
    // Dft of the signal segment
    fftw_execute(signal_fft);
    // Convolve in the frequency domain by multiplying filter kernel with dft signal
    for (int i = 1; i < BLOCK_OUTPUT_SIZE; ++i)
    {
        signal_buffer_out[i] = signal_buffer_in[i] * left_hrir_fft_in[i]
            - signal_buffer_in[BLOCK_OUTPUT_SIZE - i] * left_hrir_fft_in[BLOCK_OUTPUT_SIZE - i];
        signal_buffer_out[BLOCK_OUTPUT_SIZE - i]
            = signal_buffer_in[BLOCK_OUTPUT_SIZE - i] * left_hrir_fft_in[i]
            + signal_buffer_in[i] * left_hrir_fft_in[BLOCK_OUTPUT_SIZE - i];
        double re = signal_buffer_out[i];
        double im = signal_buffer_out[BLOCK_OUTPUT_SIZE - i];
    }
    // inverse dft back to time domain
    fftw_execute(signal_ifft);
    // Normalize the data
    for (int i = 0; i < BLOCK_OUTPUT_SIZE; ++i)
    {
        signal_buffer_out[i] = signal_buffer_out[i] / i;
    }
    // Overlap and add with the previous block
    if (first_block)
    {
        first_block = !first_block;
        for (int i = 0; i < BLOCK_OUTPUT_SIZE; ++i)
        {
            signal_output_buffer[i] = signal_buffer_out[i];
        }
    }
    else
    {
        for (int i = WAV_SAMPLE_SIZE; i < BLOCK_OUTPUT_SIZE; ++i)
        {
            signal_output_buffer[i] = signal_output_buffer[i] + signal_buffer_out[i];
        }
    }
    // Write the block to the output file
    signal_output_wav.write(&signal_output_buffer[0], BLOCK_OUTPUT_SIZE);
#endif
}
In the end, the resulting output file contains garbage, but is not all zeros.
Things I have tried:
1) Using the standard complex interface fftw_plan_dft_1d with the appropriate fftw_complex type. Same issues arise.
2) Using a smaller input sample size and iterating over the zero padded blocks (overlap-add).
I also note that it's not a fault of libsndfile; toggling SKIP_CONVOLUTION does successfully copy the input file to the output file.
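Independent of FFTW, the overlap-add bookkeeping itself can be sanity-checked against a single direct convolution. The sketch below is illustrative only: it does not use FFTW or the buffers above, and convolves each block in the time domain instead of via the DFT. Whatever the transform, the overlap-added result must match a full convolution of the whole signal:

```java
public class OverlapAddDemo {
    // Direct (time-domain) linear convolution; output length = x.length + h.length - 1
    static double[] convolve(double[] x, double[] h) {
        double[] y = new double[x.length + h.length - 1];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < h.length; j++)
                y[i + j] += x[i] * h[j];
        return y;
    }

    // Overlap-add: convolve x block by block; each block's result is
    // (h.length - 1) samples longer than the block, and that tail must be
    // added into the head of the next block's output.
    static double[] overlapAdd(double[] x, double[] h, int blockLen) {
        double[] y = new double[x.length + h.length - 1];
        for (int start = 0; start < x.length; start += blockLen) {
            int len = Math.min(blockLen, x.length - start);
            double[] block = new double[len];
            System.arraycopy(x, start, block, 0, len);
            double[] yb = convolve(block, h);  // len + h.length - 1 samples
            for (int i = 0; i < yb.length; i++)
                y[start + i] += yb[i];         // tail overlaps the next block
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] h = {0.5, 0.25, 0.125};
        double[] full = convolve(x, h);
        double[] ola = overlapAdd(x, h, 3);
        for (int i = 0; i < full.length; i++)
            if (Math.abs(full[i] - ola[i]) > 1e-12)
                throw new AssertionError("mismatch at " + i);
        System.out.println("overlap-add matches direct convolution");
    }
}
```

If a frequency-domain version of the same loop does not reproduce the direct convolution on a toy input like this, the bug is in the spectrum multiplication or the tail handling, not in the audio I/O.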

relation between harris detector results in matlab and opencv

I am working on corner feature detection using the Harris detector. I wrote a program in MATLAB to detect features in an image, using the following code to detect Harris features:
corners = detectHarrisFeatures(img, 'MinQuality', 0.0001);
S = corners.selectStrongest(100);
Then I ported the whole program from MATLAB to OpenCV, where I used the following code to detect Harris corner points:
int thresh = 70;
for (int j = 0; j < dst_norm.rows && cont < 100; j++)
{
    for (int i = 0; i < dst_norm.cols && cont < 100; i++)
    {
        if ((int) dst_norm.at<float>(j, i) > thresh)
        {
            S.at<int>(cont, 0) = i;
            S.at<int>(cont, 1) = j;
            I.at<int>(cont, 0) = i;
            I.at<int>(cont, 1) = j;
            cont = cont + 1;
        }
    }
}
The extracted region was different in the two programs, and I discovered that the corner points detected in MATLAB are not the same as those detected in OpenCV.
How can I make the detected corner points from both programs the same?
Is dst_norm an array of Harris corner metric values? In that case you are choosing first 100 pixels with the corner metric above the threshold, which is incorrect.
In your MATLAB code, detectHarrisFeatures finds points which are local maxima of the corner metric. Then selectStrongest method selects 100 of those points with the highest metric. So, first you have to find the local maxima. Then you have to sort them, and take the top 100.
Even then, the results will not be exactly the same, because detectHarrisFeatures locates the corners with sub-pixel accuracy, using interpolation.
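For reference, the find-local-maxima, sort, take-top-N pipeline can be sketched independently of OpenCV and MATLAB. In this sketch a plain 2-D array stands in for dst_norm, and the class and method names are invented; it keeps only 8-neighbourhood local maxima above the threshold, sorts them by metric, and returns the strongest N (without the sub-pixel refinement that detectHarrisFeatures adds):

```java
import java.util.ArrayList;
import java.util.List;

public class StrongestCorners {
    // Keep (row, col) points that are strict local maxima of the metric
    // over their 8-neighbourhood and exceed the threshold, then return
    // the n points with the highest metric, strongest first.
    static List<int[]> selectStrongest(float[][] metric, float thresh, int n) {
        List<int[]> candidates = new ArrayList<>();
        int rows = metric.length, cols = metric[0].length;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                float v = metric[r][c];
                if (v <= thresh) continue;
                boolean isMax = true;
                for (int dr = -1; dr <= 1 && isMax; dr++)
                    for (int dc = -1; dc <= 1 && isMax; dc++) {
                        if (dr == 0 && dc == 0) continue;
                        int rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols
                                && metric[rr][cc] >= v)
                            isMax = false;
                    }
                if (isMax) candidates.add(new int[]{r, c});
            }
        }
        // Sort by metric value, strongest first, and keep the top n
        candidates.sort((a, b) -> Float.compare(metric[b[0]][b[1]], metric[a[0]][a[1]]));
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        float[][] metric = {
            {0, 1, 0, 0},
            {1, 5, 1, 0},
            {0, 1, 0, 9},
            {0, 0, 0, 0}
        };
        for (int[] p : selectStrongest(metric, 0.5f, 2))
            System.out.println("(" + p[0] + ", " + p[1] + ")");  // (2, 3) then (1, 1)
    }
}
```

The key difference from the raster-scan loop in the question is that selection happens by metric strength, not by pixel order.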

Avg distance between points in a cluster

Sounds like I've got the concept, but I can't seem to get the implementation correct. I have a cluster (an ArrayList) with multiple points, and I want to calculate the avg distance. Ex: Points in cluster (A, B, C, D, E, F, ..., N): Distance A-B, Distance A-C, Distance A-D, ..., Distance A-N, Distance B-C, Distance B-D, ..., Distance B-N, ...
Thanks in advance.
You don't want to double count any segment, so your algorithm should be a double for loop. The outer loop goes from A to M (you don't need to check N, because there'll be nothing left for it to connect to), each time looping from the point after the current one through N, calculating each distance. You add all the distances and divide by the number of pairs, n(n-1)/2. Should be pretty simple.
There aren't any standard algorithms for improving on this that I'm aware of, and this isn't a widely studied problem. I'd guess that you could get a pretty reasonable estimate (if an estimate is useful) by sampling distances from each point to a handful of others. But that's a guess.
(After seeing your code example) Here's another try:
public double avgDistanceInCluster() {
    double totDistance = 0.0;
    for (int i = 0; i < bigCluster.length - 1; i++) {
        for (int j = i + 1; j < bigCluster.length; j++) {
            totDistance += distance(bigCluster[i], bigCluster[j]);
        }
    }
    // average over the n*(n-1)/2 unordered pairs
    return totDistance / (bigCluster.length * (bigCluster.length - 1) / 2.0);
}
Notice that the limit for the first loop is different.
Distance between two points is probably sqrt((x1 - x2)^2 + (y1 - y2)^2).
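Putting the loop and the distance formula together, a self-contained version for 2-D points might look like this (the Point record and all names here are invented for illustration; substitute whatever type your ArrayList holds):

```java
import java.util.List;

public class AvgClusterDistance {
    record Point(double x, double y) {}

    // Euclidean distance: sqrt((x1 - x2)^2 + (y1 - y2)^2)
    static double distance(Point a, Point b) {
        return Math.hypot(a.x() - b.x(), a.y() - b.y());
    }

    // Average over all n*(n-1)/2 unordered pairs of points
    static double avgDistance(List<Point> cluster) {
        int n = cluster.size();
        double total = 0;
        for (int i = 0; i < n - 1; i++)
            for (int j = i + 1; j < n; j++)
                total += distance(cluster.get(i), cluster.get(j));
        return total / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        List<Point> cluster = List.of(new Point(0, 0), new Point(3, 4), new Point(0, 4));
        System.out.println(avgDistance(cluster)); // (5 + 4 + 3) / 3 = 4.0
    }
}
```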
Thanks for all the help. Sometimes after explaining the question on a forum, the answer just pops into your mind. This is what I ended up doing.
I have a cluster of points, and I need to calculate the avg distance between pairs of points in the cluster. So this is what I did. I am sure someone will come up with a better answer; if so, please drop a note. Thanks in advance.
/**
 * Calculate avg distance between pairs of points in the cluster
 * @return the average pairwise distance
 */
public double avgDistanceInCluster() {
    double totalDistance = 0.0;
    int pairs = 0;
    for (int i = 0; i < cluster.size(); i++) {
        for (int j = i + 1; j < cluster.size(); j++) {
            // for 1-D values the distance is the absolute difference;
            // use the Euclidean formula instead for 2-D points
            totalDistance += Math.abs(cluster.get(i) - cluster.get(j));
            pairs++;
        }
    }
    return pairs == 0 ? 0.0 : totalDistance / pairs;
}