Similarity hash function (simhash) - hash

I have a problem with hash functions. I have to assign a number (128-bit or 64-bit) to every word in a document, such that the hash value of "similarity" is close to that of "similar". For example, if the hash of "similarity" is 10022, then the hash of "similar" might be 10025. Words of the same kind should also get close values: the hash of "john" should be close to that of "michel" or "sita", and so on. Does anybody have an idea how to do this?
Thanks in advance. :)

It doesn't work that way. First you have to find a general model for the sample values of the available data, and then use it for the streaming log messages.

There is a library called OpenNLP; using it, you can find out what type of word each one is. Then, as you said for similar words like names, you can write a hash function in which names, verbs, and so on get similar hash values.
Thanks.
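For the spelling-similarity half of the question, here is a minimal sketch of a simhash over character bigrams. It rests on several assumptions of mine that are not in the question: bigrams as features, MD5 as the per-feature hash, and the helper names char_ngrams, simhash and hamming. Note that simhash makes similar inputs close in Hamming distance (few differing bits), not numerically close like 10022 and 10025, and it knows nothing about word meaning, so "john" and "michel" would still need the word-type approach described above.

```python
import hashlib


def char_ngrams(word, n=2):
    """Return the character n-grams of a word (the whole word if it is too short)."""
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def simhash(word, bits=64, n=2):
    """Build a simhash fingerprint (bits may be 64 or 128) from character n-grams."""
    counts = [0] * bits
    for gram in char_ngrams(word, n):
        # MD5 yields 16 bytes, enough for a 64-bit or 128-bit feature hash.
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        feature_hash = int.from_bytes(digest[:bits // 8], "big")
        # Each feature votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (feature_hash >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if counts[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

To compare two words, compute hamming(simhash("similarity"), simhash("similar")); words that share most of their bigrams tend to give a smaller distance than unrelated words, but for very short words the effect is weak.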

Related

Reading CSV file with Spring Batch and mapping to domain objects based on the first field, then inserting them into the DB accordingly [duplicate]

How can we implement pattern matching in Spring Batch? I am using org.springframework.batch.item.file.mapping.PatternMatchingCompositeLineMapper.
I found out that I can only use ? or * to create my pattern.
My requirement is as follows:
I have a fixed-length record file, and in each record the 35th and 36th positions give the record type.
For example, in the record below, "05" is the record type at the 35th and 36th positions, and the total length of the record is 400.
0000001131444444444444445589868444050MarketsABNAKKAAAAKKKA05568551456...........
I tried to write a regular expression, but it does not work; I found out that only two special characters can be used, * and ?.
In that case I can only write something like this:
??????????????????????????????????05?????????????..................
but it does not seem to be a good solution.
Please suggest how I can solve this. Thanks a lot for your help in advance.
The PatternMatchingCompositeLineMapper uses an instance of org.springframework.batch.support.PatternMatcher to do the matching. It's important to note that PatternMatcher does not use true regular expressions. It uses something closer to ant patterns (the code is actually lifted from AntPathMatcher in Spring Core).
That being said, you have two options:
1. Use a pattern like the one you are referring to (since there is no shorthand way to specify the number of ? that should be checked, like there is in regular expressions).
2. Create your own composite LineMapper implementation that uses regular expressions to do the mapping.
For the record, if you choose option 2, contributing it back would be appreciated!
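As a language-neutral illustration of both options, here is a rough Python sketch (not Spring Batch code). The names ant_pattern, record_type_of and map_line are hypothetical, and the offset of 34 simply encodes the "35th and 36th position" from the question. The first helper generates the long ?-pattern instead of typing it by hand, assuming a trailing * is acceptable to the matcher; the rest shows the regex-based dispatch that a custom composite mapper would perform.

```python
import re

# Skip 34 characters, then capture the two-character record type
# (positions 35 and 36 when counting from 1).
RECORD_TYPE = re.compile(r"^.{34}(.{2})")


def ant_pattern(record_type, offset=34):
    """Build the ?/* style pattern: `offset` single-char wildcards, the type, then *."""
    return "?" * offset + record_type + "*"


def record_type_of(line):
    """Return the record type of a fixed-length line, or None if the line is too short."""
    match = RECORD_TYPE.match(line)
    return match.group(1) if match else None


def map_line(line, mappers):
    """Dispatch a line to the mapper registered for its record type."""
    record_type = record_type_of(line)
    if record_type not in mappers:
        raise ValueError("no mapper registered for record type %r" % record_type)
    return mappers[record_type](line)
```

In a real job the generated pattern string would be used as a key when configuring PatternMatchingCompositeLineMapper, while the regex dispatch corresponds to what a custom LineMapper would do internally.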

Swift3 URL from URLComponents, how to add OR query items

I'm constructing a URL from query items using URLComponents and I want to add some query items as OR conditions rather than AND. I'm not sure what the proper terminology for this is. Anyway, I would like roughly the following:
website.com/things?param1=thing&param2=thing|param3=thing|param4=thing
but by appending query items I can only get
website.com/things?param1=thing&param2=thing&param3=thing&param4=thing
My goal is to check 3 different parameters for the term I pass in, and return any results that match any of the 3. If I were constructing the URL from a string, I could just use a pipe instead of an ampersand (I think; please correct me if I'm wrong), but I'm using URLComponents and am not sure how to do this.
Perhaps I'm going about this incorrectly. I don't have a ton of experience with this. If this is the wrong approach, please point me in the right direction. I'm not sure how to word this question appropriately, and that makes it hard to search for an answer.
"I'm not sure what the proper terminology for this is"
There is no terminology for it; it doesn't exist. What you're trying to do is nonstandard. There is no such thing as a query item OR condition. Standard separators are semicolon and ampersand, with ampersand used almost universally. You can't use a pipe to separate query items.
Thus, for example, if you paste website.com/things?param1=thing&param2=thing|param3=thing|param4=thing into the parser at http://www.freeformatter.com/url-parser-query-string-splitter.html, it doesn't know what to make of the pipes; it thinks that param2 must be thing|param3=thing|param4=thing.
Thus, URLComponents is not going to insert the pipe for you. Its goal and purpose is to make a valid URL, and you are attempting to make an invalid one.
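The same behaviour can be reproduced with any standards-following query parser. Here is a small sketch using Python's urllib.parse rather than Swift; the http:// scheme is added only so the URL parses cleanly and is an assumption, as is the helper name query_pairs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def query_pairs(url):
    """Split a URL's query string into (name, value) pairs on the standard & separator."""
    return parse_qsl(urlsplit(url).query)


piped = "http://website.com/things?param1=thing&param2=thing|param3=thing|param4=thing"

# The pipes are not separators, so everything after "param2=" is one value.
pairs = query_pairs(piped)

# Building a query from items always joins them with &, as URLComponents does.
joined = urlencode([("param1", "thing"), ("param2", "thing"), ("param3", "thing")])
```

Here pairs contains only two items, with param2 holding the whole "thing|param3=thing|param4=thing" string. Any OR semantics across param2, param3 and param4 would therefore have to be something the server itself implements.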

I didn't understand what 's' is referring to in s="1", s="2" and so on in the SpreadsheetDocument in OpenXML

Please tell me what 's' is referring to in SpreadsheetML.
The s attribute in a c (for CELL) element gives the index of the STYLE applied to the cell.
Excel Open XML uses a lot of "shorthand" (very short element and attribute names) in order to keep file size as small as possible. Thus, things aren't terribly descriptive.
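To make the lookup concrete, here is a minimal sketch in Python's standard library (not the Open XML SDK). The two XML fragments are made-up, heavily trimmed stand-ins for a worksheet part and a styles part, and cell_styles is a hypothetical helper. It treats s as a zero-based index into the cellXfs list of the styles part, with a missing s meaning index 0.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Trimmed, illustrative worksheet part: cells A1 and B1 carry an s attribute.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" s="1"><v>42</v></c>
      <c r="B1" s="2"><v>3.5</v></c>
      <c r="C1"><v>7</v></c>
    </row>
  </sheetData>
</worksheet>"""

# Trimmed, illustrative styles part: three cell formats at indexes 0, 1 and 2.
STYLES_XML = """<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cellXfs count="3">
    <xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
    <xf numFmtId="14" fontId="0" fillId="0" borderId="0"/>
    <xf numFmtId="2" fontId="1" fillId="0" borderId="0"/>
  </cellXfs>
</styleSheet>"""


def cell_styles(sheet_xml, styles_xml):
    """Map each cell reference to (style index, attributes of the matching xf element)."""
    xfs = ET.fromstring(styles_xml).findall("m:cellXfs/m:xf", NS)
    result = {}
    for cell in ET.fromstring(sheet_xml).iterfind(".//m:c", NS):
        index = int(cell.get("s", "0"))  # a missing s attribute means style 0
        result[cell.get("r")] = (index, dict(xfs[index].attrib))
    return result
```

Calling cell_styles(SHEET_XML, STYLES_XML) pairs A1 with the second xf entry and B1 with the third, which is the same resolution Excel performs when it renders the cell.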
See also:
http://officeopenxml.com/SSstyles.php
https://msdn.microsoft.com/en-us/library/documentformat.openxml.spreadsheet.cell%28v=office.14%29.aspx?f=255&MSPPError=-2147217396

Accessing values in multiple structures with similar names

I'm reasonably new to Matlab, and have been trying to teach myself. I have looked for a similar question, but can't find one that's quite right.
In my workspace I have several structures with similar names. These structures will always start with the same word ('Base'), though the rest of the name will change ('1', '2', '3'), so for example Base1, Base2, Base3... etc. These variables were generated using the data cursor tool in a figure, so contain the fields Target, Position and DataIndex. I am only interested in the value in Base*.Position(1,1). I would like to extract this value from each structure, as many times as there are structures (in one instance there may be 6 structures, another time only 4).
I am considering using the eval function, but it seems to work on exact strings rather than only the first part of a name. Additionally, a lot of documentation seems to advise against using eval.
So far I have:
clearvars -except Base*
list_variables=who;
for i=1:length(list_variables)
BaseTS(i) = eval('Base1.Position(1,1)');
end
It's the for loop I'm stuck on, as I don't know how to generalise so it will extract the value .Position(1,1) for each different structure name.
Thanks in advance
Instead of having many structures called Base1, Base2, etc., put your structures in an array. Then you can call Base(1).Position(1,1), Base(2).Position(1,1), etc. Your code will be more flexible and manageable this way.
So I suggest when you export using the data cursor, export to a variable called Base_temp and then immediately stick this into the next element of an array:
Base(end+1) = Base_temp
or even:
Position(end+1) = Base_temp.Position(1,1);
Then it's just a case of pressing up and enter after each time you export with the data cursor.
What you have read about avoiding eval is correct; it's very rare (if ever) that eval is a good idea. It makes your code hard to read and very hard to debug. But since you're learning, this is how you could fix your loop. (But don't do it this way, seriously don't; use arrays instead.)
for i=1:length(list_variables)
BaseTS(i) = eval(['Base', num2str(i), '.Position(1,1)']);
end
In other words, use string concatenation to build up your string and use the loop variable (i) to get the different numbers. You'll need num2str to convert from the number to a string. But don't do it this way. This is a bad way.
Dan's suggestion about avoiding eval is very valid. But if you decide to keep the structures you have in your workspace, here's something without explicit loops, although cellfun seems to use loops internally. So I guess this could be an alternative solution, with the not-so-popular eval:
list1 = who('Base*')
list2 = cellstr(strcat('BaseTS(',num2str([1:numel(list1)]'),')='));
ev1 = strcat(list2,list1,'.Position(1,1)');
cellfun(@evalc,ev1,'uni',0)

Get Multiple Locations of the Same String in Array

Is there any way to get the indices of the same string (that appears more than once) in a single array? I know I can find a specific string's location using:
[nameOfArray indexOfObject:@"apple"]
Of course, I could create a for loop essentially using the same code above and ignoring the previous "apples" found. I can't help but feel that there is a simpler (built-in) way to do this in Objective-C. Am I right?
Thank you all in advance.
You could use indexesOfObjectsPassingTest with the "test" block being a block that tests for equality.
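The idea behind that method, sketched in Python rather than Objective-C: walk the array once and collect every index whose element passes a caller-supplied test. The function name indexes_passing_test and the (object, index) argument order of the test are illustrative choices of mine, loosely mirroring the block's first two arguments.

```python
def indexes_passing_test(items, test):
    """Return the indexes of all items for which test(item, index) is true."""
    return [index for index, item in enumerate(items) if test(item, index)]


fruits = ["apple", "pear", "apple", "plum", "apple"]

# An equality test plays the role of the block passed to the Objective-C method.
apple_indexes = indexes_passing_test(fruits, lambda item, index: item == "apple")
```

In Objective-C the result comes back as an NSIndexSet rather than a list, but the shape of the solution is the same: one pass, one predicate, all matching indexes.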