Matlab HDF5: Read DIMENSION_LIST attribute

I'm trying to read HDF5 files with Matlab. I created the files in Fortran, which is only relevant in that I used h5dsattach_scale_f to attach scale datasets to each dimension of my primary dataset. Most of my logic works well, but I'm having trouble reading the attributes of my primary dataset in order to get at the attached scales.
I start by iterating through each dataset in the file. Once I know I have my primary dataset, I iterate through its attributes with this call:
[status, index_out, SD] = H5A.iterate(dset_id, 'H5_INDEX_NAME', 'H5_ITER_NATIVE', 0, @hdf5_sds_attr_iter, SD);
That calls this function for every attribute:
function [status, SD] = hdf5_sds_attr_iter(dset_id, attr_name, info, SD)
status = 0;
disp(attr_name);
if ~strcmp(attr_name, 'DIMENSION_LIST')
return;
end
attr_id = H5A.open(dset_id, attr_name, 'H5P_DEFAULT');
space = H5A.get_space (attr_id);
[~, dims, ~] = H5S.get_simple_extent_dims(space);
info2 = H5A.get_info(attr_id);
disp(info2);
rdata = H5A.read(attr_id, 'H5ML_DEFAULT');
disp(rdata);
for i = 1:dims
disp(rdata{i});
end
H5S.close(space);
H5A.close(attr_id);
end
This is the output:
DIMENSION_LIST
3
corder_valid: 1
corder: 0
cset: 0
data_size: 48
[8x1 uint8]
[8x1 uint8]
[8x1 uint8]
184
17
0
0
0
0
0
0
32
28
0
0
0
0
0
0
240
29
0
0
0
0
0
0
If I do h5dump on the dataset, this is what that attribute looks like:
ATTRIBUTE "DIMENSION_LIST" {
DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
DATA {
(0): (DATASET 1400 /beamdata scale rank 1 ),
(1): (DATASET 6512 /beamdata scale rank 2 ),
(2): (DATASET 6976 /beamdata scale rank 3 )
}
}
Since those numbers (1400, 6512, 6976) do not appear elsewhere in the dump, I don't know how to use them or the output of H5A.read (rdata) to actually get at the scale data. The Matlab HDF5 documentation is rather silent on what to do with attribute data. Does anyone know how to process attribute reference data correctly?
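One possible way to follow the references is sketched below. It assumes that each cell returned by H5A.read holds 8-byte object references (which the [8x1 uint8] output above suggests) and that H5R.dereference with reference type 'H5R_OBJECT' accepts such a reference. The function name read_dimension_scales is made up for illustration, and the sketch has not been run against the file from the question.

```matlab
function scales = read_dimension_scales(dset_id)
% Hypothetical helper: read the scale dataset attached to each dimension
% of the already opened dataset dset_id via its DIMENSION_LIST attribute.
attr_id = H5A.open(dset_id, 'DIMENSION_LIST', 'H5P_DEFAULT');
rdata = H5A.read(attr_id, 'H5ML_DEFAULT');   % one cell per dimension
H5A.close(attr_id);

scales = cell(numel(rdata), 1);
for i = 1:numel(rdata)
    ref = rdata{i};
    if isempty(ref)
        continue;                            % no scale attached to this dimension
    end
    % Assumption: the first column is the reference to the (first) attached scale.
    scale_id = H5R.dereference(dset_id, 'H5R_OBJECT', ref(:, 1));
    disp(H5I.get_name(scale_id));            % path of the scale dataset
    scales{i} = H5D.read(scale_id);          % scale values
    H5D.close(scale_id);
end
end
```

With an open dataset identifier, the call would be scales = read_dimension_scales(dset_id); each non-empty cell then holds the values of one attached scale.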

Related

get values from a text file with a mix of floats and strings

I am struggling with a text file that I have to read in. In this file, there are two types of line:
133 0102764447 44 11 54 0.4 0 0.89 0 0 8 0 0 7 Attribute_Name='xyz' Type='string' 02452387764447 884
134 0102256447 44 1 57 0.4 0 0.81 0 0 8 0 0 1 864
What I want to do here is to textscan all the lines and then try to determine the number of 'xyz' (and the total number of lines).
I tried to use:
fileID = fopen('test.txt','r') ;
data=textscan(fileID, '%d %d %d %d %d %d %d %d %d %d %d %d %d %s %s %d %d','\n') ;
And then I would try to access data{i,16} to count how many are equal to Attribute_Name='xyz'; it doesn't seem to be efficient, though.
What would be a proper way to read the data? (What interests me is counting how many Attribute_Name='xyz' entries I have.) Thanks
You could simply use count.
In your case you could use it in this way:
filetext = fileread("test.txt");
A = count(filetext , "xyz")
fileread reads the whole text file into a single string. You can then process that string with count, which returns the number of occurrences of the given pattern.
An alternative for older versions of MATLAB is the following. It will work with R2006a and above (note the single quotes, since double-quoted strings are not available in older releases).
filetext = fileread('test.txt');
A = length(strfind(filetext, 'xyz'));
strfind returns an array of starting indices, so the length of that array is the number of occurrences of the specified string.
There is the option of strsplit. You may do something like the following:
count = 0;
fid = fopen('test.txt','r');
while ~feof(fid)
    line = fgetl(fid);
    words = strsplit(line);
    % Counts at most one instance per line; use nnz(...) instead of any(...) to count every instance
    if any(strcmpi(words, 'Attribute_Name=''xyz'''))
        count = count + 1;
    end
end
fclose(fid);
So at the end count will give you the number.
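Since the question also asks for the total number of lines, here is a minimal sketch that gets both numbers from one fileread call. The sample file is written to a temporary location purely so the example is self-contained, and the variable names (numLines, numXyz) are illustrative only.

```matlab
% Write a small sample file (shortened versions of the two line types).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '133 0102764447 44 Attribute_Name=''xyz'' Type=''string'' 884\n');
fprintf(fid, '134 0102256447 44 1 57 864\n');
fclose(fid);

% Read everything at once and split into lines.
filetext = fileread(fname);
lines = regexp(strtrim(filetext), '\r?\n', 'split');
numLines = numel(lines);

% A line counts if it contains the attribute at least once.
hits = strfind(lines, 'Attribute_Name=''xyz''');
numXyz = nnz(~cellfun(@isempty, hits));

fprintf('%d lines, %d with Attribute_Name=''xyz''\n', numLines, numXyz);
delete(fname);
```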

Compute the Frequency of bigrams in Matlab

I am trying to compute and plot the distribution of bigram frequencies.
First I generated all possible bigrams, which gives 1296 bigrams.
Then I extract the bigrams from a given file and save them in words1.
My question is: how do I compute the frequency of these 1296 bigrams for the file a.txt?
If some bigrams do not appear in the file at all, their frequencies should be zero.
a.txt is any text file.
clear
clc
%************create bigrams 1296 ***************************************
chars ='1234567890abcdefghijklmonpqrstuvwxyz';
chars1 ='1234567890abcdefghijklmonpqrstuvwxyz';
bigram='';
for i=1:36
for j=1:36
bigram = sprintf('%s%s%s',bigram,chars(i),chars1(j));
end
end
temp1 = regexp(bigram, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(@(x,y) [x '' y],temp1(1:end-1)', temp1(2:end)','un',0);
bigrams = temp2;
bigrams = unique(bigrams);
bigrams = rot90(bigrams);
bigram = char(bigrams(1:end));
all_bigrams_len = length(bigrams);
clear temp temp1 temp2 i j chars1 chars;
%****** 1. Cleaning Data ******************************
collection = fileread('e:\a.txt');
collection = regexprep(collection,'<.*?>','');
collection = lower(collection);
collection = regexprep(collection,'\W','');
collection = strtrim(regexprep(collection,'\s*',''));
%*******************************************************
temp = regexp(collection, sprintf('\\w{1,%d}', 1), 'match');
temp2 = cellfun(@(x,y) [x '' y],temp(1:end-1)', temp(2:end)','un',0);
words1 = rot90(temp2);
%*******************************************************
words1_len = length(words1);
vocab1 = unique(words1);
vocab_len1 = length(vocab1);
[vocab1,void1,index1] = unique(words1);
frequencies1 = hist(index1,vocab_len1);
I. Character counting problem for a string
bsxfun based solution for counting characters -
counts = sum(bsxfun(@eq,[string1-0]',65:90))
Output -
counts =
2 0 0 0 0 2 0 1 0 0 ....
If you would like to get a tabulate output of counts against each letter -
out = [cellstr(['A':'Z']') num2cell(counts)']
Output -
out =
'A' [2]
'B' [0]
'C' [0]
'D' [0]
'E' [0]
'F' [2]
'G' [0]
'H' [1]
'I' [0]
....
Please note that this was a case-sensitive counting for upper-case letters.
For a lower-case letter counting, use this edit to this earlier code -
counts = sum(bsxfun(@eq,[string1-0]',97:122))
For a case insensitive counting, use this -
counts = sum(bsxfun(@eq,[upper(string1)-0]',65:90))
II. Bigram counting case
Let us suppose that you have all the possible bigrams saved in a 1D cell array bigrams1 and the incoming bigrams from the file are saved into another cell array words1. Let us also assume certain values in them for demonstration -
bigrams1 = {
'ar';
'de';
'c3';
'd1';
'ry';
't1';
'p1'}
words1 = {
'de';
'c3';
'd1';
'r9';
'yy';
'de';
'ry';
'de';
'dd';
'd1'}
Now, you can get the counts of the bigrams from words1 that are present in bigrams1 with this code -
[~,~,ind] = unique(vertcat(bigrams1,words1));
bigrams_lb = ind(1:numel(bigrams1)); %// label bigrams1
words1_lb = ind(numel(bigrams1)+1:end); %// label words1
counts = sum(bsxfun(@eq,bigrams_lb,words1_lb'),2)
out = [bigrams1 num2cell(counts)]
The output on code run is -
out =
'ar' [0]
'de' [3]
'c3' [1]
'd1' [2]
'ry' [1]
't1' [0]
'p1' [0]
The result shows that the first element, ar, from the list of all possible bigrams does not occur in words1; the second element, de, occurs three times in words1; and so on.
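The same counts can also be obtained without bsxfun. The following is a minimal alternative sketch that uses ismember and accumarray instead of the unique-based labelling; it reuses the demonstration values from this answer and is not part of the original solution.

```matlab
bigrams1 = {'ar'; 'de'; 'c3'; 'd1'; 'ry'; 't1'; 'p1'};
words1   = {'de'; 'c3'; 'd1'; 'r9'; 'yy'; 'de'; 'ry'; 'de'; 'dd'; 'd1'};

% loc(k) is the row of bigrams1 that words1{k} matches (0 if there is no match).
[tf, loc] = ismember(words1, bigrams1);

% Add 1 to the matching row for every hit; rows without hits stay at zero.
counts = accumarray(loc(tf), 1, [numel(bigrams1) 1]);

out = [bigrams1 num2cell(counts)]
```

Passing the size [numel(bigrams1) 1] to accumarray is what keeps bigrams that never occur in the result with a count of zero.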
Similar to Dennis's solution, you can just use histc():
string1 = 'ASHRAFF'
histc(string1,'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
This counts the number of entries in the bins defined by the string 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', which is hopefully the alphabet (I wrote it quickly, so no guarantee). The result is:
Columns 1 through 21
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0
Columns 22 through 26
0 0 0 0 0
Just a little modification of my solution:
string1 = 'ASHRAFF'
alphabet1='A':'Z'; %%// as stated by Oleg Komarov
data=histc(string1,alphabet1);
results=cell(2,26);
for k=1:26
results{1,k}= alphabet1(k);
results{2,k}= data(k);
end
If you look at results now, you can easily check whether it works or not :D
This answer creates all bigrams, loads the file, does a little cleanup, and then uses a combination of unique and histc to count the rows.
Generate all Bigrams
Note that the order here is important: unique will sort the array, so the bigrams are created presorted and the output matches expectation.
[y,x] = ndgrid(['0':'9','a':'z']);
allBigrams = [x(:),y(:)];
Read The File
this removes capitalisation and just pulls out any 0-9 or a-z character then creates a column vector of these
fileText = lower(fileread('d:\loremipsum.txt'));
cleanText = regexp(fileText,'([a-z0-9])','tokens');
cleanText = cell2mat(vertcat(cleanText{:}));
create bigrams from file by shifting by one and concatenating
fileBigrams = [cleanText(1:end-1),cleanText(2:end)];
Get Counts
the set of all bigrams is added to our set (so the values are created for all possible). Then a value ∈{1,2,...,1296} is assigned to each unique row using unique's 3rd output. Counts are then created with histc with the bins equal to the set of values from unique's output, 1 is subtracted from each bin to remove the complete set bigrams we added
[~,~,c] = unique([fileBigrams;allBigrams],'rows');
counts = histc(c,1:1296)-1;
Display
to view counts against text
[allBigrams, counts+'0']
or for something potentially more useful...
[sortedCounts,sortInd] = sort(counts,'descend');
[allBigrams(sortInd,:), sortedCounts+'0']
ans =
or9
at8
re8
in7
ol7
te7
do6 ...
Did not look into the entire code fragment, but from the example at the top of your question, I think you are looking to make a histogram:
string1 = 'ASHRAFF'
nr = histc(string1,'A':'Z')
Will give you:
2 0 0 0 0 2 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
(Got a working solution with hist, but as @The Minion shows, histc is easier to use here.)
Note that this solution only deals with upper case letters.
You may want to do something like so if you want to put lower case letters in their correct bin:
string1 = 'ASHRAFF'
nr = histc(upper(string1),'A':'Z')
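Or if you want them to be shown separately:
string1 = 'ASHRaFf'
% The bin edges must be non-decreasing ('A':'Z' before 'a':'z'), and upper() is dropped to keep the cases apart
nr = histc(string1,['A':'Z' 'a':'z'])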
bi_freq1 = zeros(1,all_bigrams_len);
for k=1: vocab_len1
for i=1:all_bigrams_len
if char(vocab1(k)) == char(bigrams(i))
bi_freq1(i) = frequencies1(k);
end
end
end

How to automatically loop over the combinations [duplicate]

New version:
The Edited Part of main program and function
AID=[30,50,52,53,54,56,57,72,75,77];
SID=[30,50,52,53,54,56,57,72,75,77];
[AID,SID]=meshgrid(AID,SID)
myfunction=@(SID,AID)myfunc(Blink,SID,AID);
[rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4]=arrayfun(Blink,AID,SID)
Function
function [rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4] = arrayfun(Blink,AID,SID)
for i=1:length(BlinkSetList)
S=cell2mat(BlinkSetList(i));
for j=1:length(S)
if S(j).AID==AID & S(j).SID==SID
if S(j).AnchorChan==0 & S(j).SourceChan==0
y=S(j).agc;
rss_dB1(i)= -(33+y*(89-33)/(29-1));
else
rss_dB1(i)=0;
isempty(rss_dB1(i))
end
if S(j).AnchorChan==0 & S(j).SourceChan==1
y=S(j).agc;
rss_dB2(i)= -(33+y*(89-33)/(29-1));
else
rss_dB2(i)=0;
isempty(rss_dB2(i))
end
if S(j).AnchorChan==1 & S(j).SourceChan==0
y=S(j).agc;
rss_dB3(i)= -(33+y*(89-33)/(29-1));
else
rss_dB3(i)=0;
isempty(rss_dB3(i))
end
if S(j).AnchorChan==1 & S(j).SourceChan==1
y=S(j).agc;
rss_dB4(i)= -(33+y*(89-33)/(29-1));
else
rss_dB4(i)=0;
isempty(rss_dB4(i))
end
end
end
end
rss_dB1(rss_dB1==0)=[];
rss_dB2(rss_dB2==0)=[];
rss_dB3(rss_dB3==0)=[];
rss_dB4(rss_dB4==0)=[];
y1=std(rss_dB1);
y2=std(rss_dB2);
y3=std(rss_dB3);
y4=std(rss_dB4);
rss_dBm1=sum(rss_dB1(:))/length(rss_dB1);
rss_dBm2=sum(rss_dB2(:))/length(rss_dB2);
rss_dBm3=sum(rss_dB3(:))/length(rss_dB3);
rss_dBm4=sum(rss_dB4(:))/length(rss_dB4);
disp([sprintf('The rssi value with A-Chan 0 and S-Chan 0 is %0.0f',rss_dBm1)]);
disp([sprintf('The rssi value with A-Chan 0 and S-Chan 1 is %0.0f',rss_dBm2)]);
disp([sprintf('The rssi value with A-Chan 1 and S-Chan 0 is %0.0f',rss_dBm3)]);
disp([sprintf('The rssi value with A-Chan 1 and S-Chan 1 is %0.0f',rss_dBm4)]);
and I get output followed by the error as
AID =
Columns 1 through 10
30 50 52........
30 50 52
30 50 52
.
.
SID =
Columns 1 through 10
30 30 30
50 50 50
52 52 52......
.
.
.
??? Undefined function or variable "rss_dB1".
Error in ==> arrayfun at 54
rss_dB1(rss_dB1==0)=[];
Error in ==> main_reduced at 38
[rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4]=arrayfun(Blink,AID,SID)
but I want my result as
Result for all combinations for example: AID=30 SID=50 , AID=50 SID=54 , AID= 54 SID=57 .......
The rss value with A-Chan 0 and S-Chan 0 is -68 % combination of AID=30 SID=50
The rss value with A-Chan 0 and S-Chan 1 is -73 % with all pairs of anchor and source channel (0,0),(0,1),(1,0),(1,1)
The rss value with A-Chan 1 and S-Chan 0 is -73
The rss value with A-Chan 1 and S-Chan 1 is -76
The rss value with A-Chan 0 and S-Chan 0 is -68 % combination of AID=50 SID=54
The rss value with A-Chan 0 and S-Chan 1 is -73% with all pairs of anchor and source channel (0,0),(0,1),(1,0),(1,1)
The rss value with A-Chan 1 and S-Chan 0 is -73
The rss value with A-Chan 1 and S-Chan 1 is -76
The rss value with A-Chan 0 and S-Chan 0 is -68 % combination of AID=54 SID=57
The rss value with A-Chan 0 and S-Chan 1 is -73 % with all pairs of anchor and source channel (0,0),(0,1),(1,0),(1,1)
The rss value with A-Chan 1 and S-Chan 0 is -73
The rss value with A-Chan 1 and S-Chan 1 is -76
rss_dBm1 =-68
rss_dBm2 =-72.8621
rss_dBm3 =-73
rss_dBm4 = -76
rss_dBm1 =-68
rss_dBm2 =-72.8621
rss_dBm3 =-73
rss_dBm4 = -76
rss_dBm1 =-68
rss_dBm2 =-72.8621
rss_dBm3 =-73
rss_dBm4 = -76
Note: At times the channel pair combinations or the AID and SID combinations do not exist, in which case it should simply return NaN (that's why I used isempty).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
First of all, I appreciate anyone who reads this post and tries to give a solution. Thanks in advance.
My question is as follows.
A part of main Program
AID=30;
SID=50;
[rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4]=sample(Blink,AID,SID)
Note: The AID has different ID's as 30,50,52,54,55,57 (same as SID)
The SID has different ID's as 30,50,52,54,55,57 ( same as AID)
Here AID and SID are entered manually by the user. The function checks the anchor channel and source channel conditions below and, if such a combination exists, displays the rss values; otherwise it returns NaN.
calling function
function [rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4]=sample(Blink,AID,SID)
for i=1:length(Blink) %Blink=<500x1 cell> inside which several blinks are present
S=cell2mat(Blink(i)); % with information on AID,SID,agc
for j=1:length(S)
if S(j).AID==AID && S(j).SID==SID
if S(j).AnchorChannel==0 && S(j).SourceChannel==0 %Anchor-source channel
y=S(j).agc; %agc is present in every blink to calculate rss %combination
rss_dB1(i)= -(33+y*(89-33)/(29-1));
else
rss_dB1(i)=0;
isempty(rss_dB1(i))
end
if S(j).AnchorChannel==0 && S(j).SourceChannel==1
y=S(j).agc;
rss_dB2(i)= -(33+y*(89-33)/(29-1));
else
rss_dB2(i)=0;
isempty(rss_dB2(i))
end
if S(j).AnchorChannel==1 && S(j).SourceChannel==0
y=S(j).agc;
rss_dB3(i)= -(33+y*(89-33)/(29-1));
else
rss_dB3(i)=0;
isempty(rss_dB3(i))
end
if S(j).AnchorChannel==1 && S(j).SourceChannel==1
y=S(j).agc;
rss_dB4(i)= -(33+y*(89-33)/(29-1));
else
rss_dB4(i)=0;
isempty(rss_dB4(i))
end
end
end
end
rss_dB1(rss_dB1==0)=[];
rss_dB2(rss_dB2==0)=[];
rss_dB3(rss_dB3==0)=[];
rss_dB4(rss_dB4==0)=[];
y1=std(rss_dB1);
y2=std(rss_dB2);
y3=std(rss_dB3);
y4=std(rss_dB4);
rss_dBm1=sum(rss_dB1(:))/length(rss_dB1);
rss_dBm2=sum(rss_dB2(:))/length(rss_dB2);
rss_dBm3=sum(rss_dB3(:))/length(rss_dB3);
rss_dBm4=sum(rss_dB4(:))/length(rss_dB4);
disp([sprintf('The rss value with A-Chan 0 and S-Chan 0 is %0.0f',rss_dBm1)]);
disp([sprintf('The rss value with A-Chan 0 and S-Chan 1 is %0.0f',rss_dBm2)]);
disp([sprintf('The rss value with A-Chan 1 and S-Chan 0 is %0.0f',rss_dBm3)]);
disp([sprintf('The rss value with A-Chan 1 and S-Chan 1 is %0.0f',rss_dBm4)]);
Now my problem is how to automatically check the different combinations of AID and SID without requiring user input. It should loop over every combination and return the "rss" result for all possible combinations of AID and SID with anchor channel and source channel.
Result for one combination: AID=30 SID=50
The rss value with A-Chan 0 and S-Chan 0 is -68
The rss value with A-Chan 0 and S-Chan 1 is -73
The rss value with A-Chan 1 and S-Chan 0 is -73
The rss value with A-Chan 1 and S-Chan 1 is -76
rss_dBm1 =-68
rss_dBm2 =-72.8621
rss_dBm3 =-73
rss_dBm4 = -76
y1 = 1.4142
y2 = 1.4072
y3 = 0
y4 = 1.1547
The above is the result for one combination, AID=30 and SID=50. But I want to loop over pairs such as AID=50 SID=52, AID=52 SID=55, and AID=57 SID=54 (these are just some examples). I want the result to look like the output above, except that it should also include the pairs I mentioned, each with the four different channel combinations.
Note: The output for the combination above must be included together with the combinations mentioned below.
Example: AID=50 SID=52, AID=52 SID=55, AID=57 SID=54 with anchor and source channel pairs (0,0),(0,1),(1,0),(1,1). In a few cases the anchor and source channel pair does not exist, so it should then automatically return 0 or NaN.
Suppose you have a function that takes sid and aid arguments and returns a single struct with all the data you need:
function res = sample(Blink, AID, SID)
As I said, res here is a struct with fields rss_dBm1, rss_dBm2, etc...
And you also have two arrays:
SIDS = [30,50,52,54,55,57];
AIDS = [30,50,52,54,55,57];
To obtain all pairs of sids and aids you can use the meshgrid function:
[sid, aid] = meshgrid(SIDS, AIDS);
And to call your function for each pair you can use the arrayfun function:
fn = @(sid, aid) sample(Blink, sid, aid);
data = arrayfun(fn, sid, aid);
data here is a length(AIDS) x length(SIDS) structure matrix. I used an anonymous function (lambda expression) to do a partial application of the first argument of your sample function.
If you don't want to change your function, you can use it as it is:
fn = @(sid, aid) sample(Blink, sid, aid);
[rss_dBm1,rss_dBm2,rss_dBm3,rss_dBm4,y1,y2,y3,y4] = arrayfun(fn, sid, aid);
In this case each returned variable will be a length(AIDS) x length(SIDS) array.
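The meshgrid and arrayfun pattern can be tried in isolation with a small sketch. The two-output anonymous function below is only a hypothetical stand-in for sample(Blink, sid, aid), and the shortened ID lists are chosen so the shape of the result is easy to see.

```matlab
SIDS = [30 50 52];
AIDS = [30 50];

% Every (sid, aid) pair: both grids are length(AIDS) x length(SIDS).
[sid, aid] = meshgrid(SIDS, AIDS);

% Hypothetical stand-in for sample(Blink, sid, aid) with two scalar outputs.
fn = @(s, a) deal(s + a, s - a);

% arrayfun calls fn once per pair and collects each output in its own matrix.
[total, delta] = arrayfun(fn, sid, aid);

disp(size(total));    % 2 3
disp(total(2, 3));    % pair sid = 52, aid = 50
```

If the real function can return an empty value for a pair that does not exist, arrayfun would need 'UniformOutput', false (giving cell arrays), or the function could return NaN instead.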

Implementation of correlation matrix in MATLAB

I want to implement the following formula in MATLAB, where u_i^(k) means the i,k element. However, I get different results from the ones I compute by hand... I believe that something is wrong with my MATLAB code. For instance,
I should get:
L_ii =
0.1022 0 0
0 0.1657 0
0 0 2.7321
U_ij =
0.7514 0.3104 0.5823
-0.6513 0.4901 0.5793
-0.1055 -0.8145 0.5704
1,1=1-(0.1022*(+0.7514)^2+0.1657*(+0.3104)^2+2.7321*(+0.5823)^2)=-0.000049
2,2=1-(0.1022*(-0.6513)^2+0.1657*(+0.4901)^2+2.7321*(+0.5793)^2)=-0.000015
3,3=1-(0.1022*(-0.1055)^2+0.1657*(-0.8145)^2+2.7321*(+0.5704)^2)=+0.000030
Any ideas? Please help me fix Epsilon first (we may not even need to move on to Rho).
EDIT: Here is a sample code:
E_squared_ii = ONES_j - diag(L_ii)' * (U_ij'.^ 2)
And here is the wrong result I get at the moment:
E_squared_ii =
1.0e-15 *
0.444089209850063 0.333066907387547 -0.222044604925031
If I use your values and code, I get the expected result:
>> L_ii
L_ii =
0.1022 0 0
0 0.1657 0
0 0 2.7321
>> U_ij
U_ij =
0.7514 0.3104 0.5823
-0.6513 0.4901 0.5793
-0.1055 -0.8145 0.5704
>> ONES_j
ONES_j =
1 1 1
>> E_squared_ii = ONES_j - diag(L_ii)' * (U_ij'.^ 2)
E_squared_ii =
1.0e-04 *
-0.4935 -0.1451 0.2985
Presumably this means that something isn't the value you think it is...
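One possible explanation, which the thread itself does not confirm, is that the hand calculation uses values rounded to four decimals while the workspace variables hold full precision. For a correlation matrix the diagonal is exactly 1, so the full-precision residual is at the level of floating-point round-off, whereas rounded inputs leave a much larger residual. The sketch below uses a made-up correlation matrix R, not the asker's data.

```matlab
% Hypothetical symmetric correlation matrix (unit diagonal).
R = [1.00 0.80 0.90;
     0.80 1.00 0.85;
     0.90 0.85 1.00];

[U, L] = eig(R);          % columns of U are eigenvectors, L holds eigenvalues
ONES_j = ones(1, 3);

% Same formula as in the question, evaluated at full precision.
E_full = ONES_j - diag(L)' * (U'.^2);

% Same formula after rounding the inputs to four decimals, as in the hand calculation.
U4 = round(U * 1e4) / 1e4;
L4 = round(L * 1e4) / 1e4;
E_rounded = ONES_j - diag(L4)' * (U4'.^2);

disp(max(abs(E_full)));      % round-off level
disp(max(abs(E_rounded)));   % limited by the four-decimal rounding
```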

Why is XOR often used in Java hashCode() while other bitwise operators are rarely used?

I often see code like
int hashCode(){
return a^b;
}
Why XOR?
Of all bit operations, XOR has the best bit-shuffling properties.
This truth table explains why:
A B AND
0 0 0
0 1 0
1 0 0
1 1 1
A B OR
0 0 0
0 1 1
1 0 1
1 1 1
A B XOR
0 0 0
0 1 1
1 0 1
1 1 0
As you can see, AND and OR do a poor job at mixing bits.
OR will on average produce 3/4 one-bits. AND on the other hand will produce on average 3/4 null-bits. Only XOR has an even one-bit vs. null-bit distribution. That makes it so valuable for hash-code generation.
Remember that for a hash-code you want to use as much information of the key as possible and get a good distribution of hash-values. If you use AND or OR you'll get numbers that are biased towards either numbers with lots of zeros or numbers with lots of ones.
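The 3/4 and 1/4 figures can be checked directly from the truth tables. The sketch below is written in MATLAB only to match the other examples on this page; it enumerates the four input combinations and reports the fraction of one-bits each operator produces.

```matlab
% All four input combinations of two bits, as in the truth tables above.
A = logical([0 0 1 1]);
B = logical([0 1 0 1]);

fracAnd = mean(A & B);       % fraction of one-bits produced by AND
fracOr  = mean(A | B);       % fraction of one-bits produced by OR
fracXor = mean(xor(A, B));   % fraction of one-bits produced by XOR

fprintf('AND: %.2f  OR: %.2f  XOR: %.2f\n', fracAnd, fracOr, fracXor);
% prints "AND: 0.25  OR: 0.75  XOR: 0.50"
```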
XOR has the following advantages:
It does not depend on order of computation i.e. a^b = b^a
It does not "waste" bits. If you change even one bit in one of the components, the final value will change.
It is quick, a single cycle on even the most primitive computer.
It preserves uniform distribution. If the two pieces you combine are uniformly distributed so will the combination be. In other words, it does not tend to collapse the range of the digest into a narrower band.
The XOR operator is reversible. Suppose I have a bit string 0 0 1 and I XOR it with another bit string 1 1 1; then the output is
0 xor 1 = 1
0 xor 1 = 1
1 xor 1 = 0
Now I can XOR the first string with the result again to get the second string back, i.e.
0 xor 1 = 1
0 xor 1 = 1
1 xor 0 = 1
So that makes the second string a key. This behavior is not found with AND or OR.
Please see this for more info --> Why is XOR used on Cryptography?
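The round trip described above can be reproduced with bitxor, again sketched in MATLAB to match the other examples on this page. The bit strings are the ones from this answer; the bitand line is added only to show that AND does not give the second string back.

```matlab
first  = bin2dec('001');
second = bin2dec('111');

result    = bitxor(first, second);   % first XOR second
recovered = bitxor(first, result);   % XOR with the first string again

disp(dec2bin(result, 3));            % 110
disp(dec2bin(recovered, 3));         % 111, the second string is recovered

% AND is not reversible in this way: the second string cannot be recovered.
lost = bitand(first, bitand(first, second));
disp(dec2bin(lost, 3));              % 001
```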
There is another use case: objects in which (some) fields must be compared without regarding their order. For example, if you want a pair (a, b) be always equal to the pair (b, a).
XOR has the property that a ^ b = b ^ a, so it can be used in hash function in such cases.
Example:
definition:
final class Connection {
public final int A;
public final int B;
// some code omitted
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Connection that = (Connection) o;
return (A == that.A && B == that.B || A == that.B && B == that.A);
}
@Override
public int hashCode() {
return A ^ B;
}
// some code omitted
}
usage:
HashSet<Connection> s = new HashSet<>();
s.add(new Connection(1, 3));
s.add(new Connection(2, 3));
s.add(new Connection(3, 2));
s.add(new Connection(1, 3));
s.add(new Connection(2, 1));
s.remove(new Connection(1, 2));
for (Connection x : s) {
System.out.println(x);
}
// output:
// Connection{A=2, B=3}
// Connection{A=1, B=3}