Break down keyword in URL, then check whether the keywords exist in content page

Break down keyword in URL, then check whether the keywords exist in content page - matlab

1) Can MATLAB break down the key words in URL?
eg:http://en.wikipedia.org/wiki/Hostname,
output:wikipedia wiki Hostname
2) After the output of keywords in URL then check whether the keywords exist in the content of the page like the content below, if yes then return 1, else return 0
Contents:
Hostname From Wikipedia, the free encyclopedia Jump to: navigation, search In computer networking, a hostname (archaically nodename[1]) is a label that is assigned to a device connected to a computer network and that is used to identify the device in various forms of electronic communication such as the World Wide Web, e-mail or Usenet. Hostnames may be simple names consisting of a single word or phrase, or they may be structured. On the Internet, hostnames may have appended the name of a Domain Name System (DNS) domain, separated from the host-specific label by a period ("dot"). In the latter form, a hostname is also called a domain name.
Example of output:
wikipedia [1]
wiki [0]
Hostname [1]

Here is a possible solution:
str = 'http://en.wikipedia.org/wiki/Hostname'
Paragraph = 'Hostname From Wikipedia, the free encyclopedia Jump to: navigation, search In computer networking, a hostname (archaically nodename[1]) blah blah'
SplitStrings = regexp(str,'[/.]','split')
c = containers.Map;
for it = SplitStrings
c( it{1} ) = strfind(Paragraph, it{1} )
end
Issues:
You will need to find out a way of including relevant and irrelevant parts of the URL. Currently, it takes http and en as valid parts of string.
You will need to see if you want the case to be respected or not.
It is algorithmically inefficient since it is making as many passes through the data as keywords. I will think about improving on this.

Related

how do duckduckgo spice IA secondary API calls get their parameters?

I have been looking through the spice instant answer source code. Yes, I know it is in maintenance mode, but I am still curious.
The documentation makes it fairly clear that the primary spice to API gets its numerical parameters $1, $2, etc. from the handle function.
My question: should there be secondary API calls included with spice alt_to as, say, in the movie spice IA, where do the numerical parameters to that API call come from?
Note, for instance, the $1 in both the movie_image and cast_image secondary API calls in spice alt_to at the preceding link. I am asking which regex capture returns those instances of $1.

I believe I see how this works now. The flow of information is still a bit murky to me, but at least I see how all of the requisite information is there.
I'll take the cryptocurrency instant answer as an example. The alt_to element in the perl package file at that link has a key named cryptonator. The corresponding .js file constructs a matching endpoint:
var endpoint = "/js/spice/cryptonator/" + from + "/" + to;
Note the general shape of the "remainder" past /js/spice/cryptonator: from/to, where from and to will be two strings.
Back in the perl package the hash alt_to->{cryptonator} has a key from which receives, I think, this remainder from/to. The value corresponding to that key is a regex meant to split up that string into its two constituents:
from => '([^/]+)/([^/]*)'
Applied to from/to, that regex will return $1=from and $2=to. These, then, are the $1 and $2 that go into
to => 'https://api.cryptonator.com/api/full/$1-$2'
in alt_to.
In short:
The to field of alt_to->{blah} receives its numerical parameters by having the from regex operate on the remainder past /js/spice/blah/ of the name of the corresponding endpoint constructed in the relevant .js file.

Obtaining Name of Simscape Physical Port

I am working on a model generation script to automatically generate Simscape models from a library of components with varying ports. Due to the vast number of ports that need to be connected across the model, I am looking for a good way to set up which ports need to be connected to each other. The best solution I have come up with so far is to name each port with a unique tag that indicates what other ports in the system it should be connected to in the generated model. However, I am not able to obtain the name of any physical port. It is labeled on the mask, but the 'Name' parameter always comes back empty. Here's what I've tried:
h = get_param(gcb,'PortConnectivity')
port = h(1).Type %This only returns the physical port #, not custom name
h = get_param(gcb,'PortHandles')
port = get_param(h.LConn(1),'Name') %This returns an empty cell array
Not sure where to go from here. Any ideas on how to solve this? Thanks!

You can use:
name = get_param(gcb, 'Name');
to get the port name. A general tip for finding the right block property, run:
get(get_param(gcb, 'object'))
this will display all block properties and their values.

Using Powershell to find Lotus Notes internet email address

Using Powershell, I need to retrieve an Internet email address from a Lotus Notes names.nsf address book.
Within Notes, I can see the internet email address I want to retrieve. It is on the Basics tab, under the Mail section, in a field called "Internet Address". However, I have not been able to find the view it is under, or a way to query it or derive it. It would also be helpful to filter the email addresses I would like to find by the Company on the "Work/Home" tab in Lotus Notes.
Opening one of the names.nsf files in IE, I see a number of the fields I want, but a formatted internet email address isn't in there. All I see is the Lotus notes style of email address:
firstname lastname/mycompany/abc # abc
(The column name that is in is named $16).
Is there a way to pull the full internet email addresses from a Lotus Notes names.nsf address book? If so, how? If they are in some of the "hidden" views available, how do you query the values in those hidden views?
Thanks!

In many cases, the best view to use for finding Person documents is the hidden view called "$Users". It is indexed by just about every variation of the name that you can think of, so lookups just tend to work. You can find it by opening the Person document from the view and reading the NotesItem named "InternetAddress", or you can read it directly from the view column that is labled "InternetAddress", which I believe is the 17th column.

You can access all of the properties for a given user from the NAB using powershell. You must run the 32 bit version of PS if your Notes client is 32 bit. You will be prompted for your Notes ID password.
$notes=new-object -comobject Lotus.NotesSession;
$notes.Initialize("");
$ndb = $notes.getdatabase("admin","names.nsf");
$nview = $ndb.getview('($Users)');
$searchkey = "Schmoe"; #whatever
$doc =$nview.getdocumentbykey($searchkey,$true);
$mailaddress = $doc.getitemvalue('mailaddress')[0];
$internetaddress = $doc.getitemvalue('internetaddress')[0];

Feasibility of extracting arbitrary locations from a given string? [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
I have many spreadsheets with travel information on them amongst other things.
I need to extract start and end locations where the row describes travel, and one or two more things from the row, but what those extra fields are shouldn't be important.
There is no known list of all locations and no fixed pattern of text, all that I can look for is location names.
The field I'm searching in has 0-2 locations, sometimes locations have aliases.
The Problem
If we have this:
00229 | 445 | RTF | Jan | trn_rtn_co | Chicago to Base1
00228 | 445 | RTF | Jan | train | Metroline to home coming from Base1
00228 | 445 | RTF | Jan | train_s | Standard train journey to Friends
I, for instance (though it will vary), will want this:
RTF|Jan|Chicago |Base1
RTF|Jan|Home |Base1
RTF|Jan|NULL |Friends
And then to go though, look up what Base1 and Friends mean for that person (whose unique ID is RTF) and replace them with sensible locations (assuming they only have one set of 'friends'):
RTF|Jan|Chicago |Rockford
RTF|Jan|Home |Rockword
RTF|Jan|NULL |Milwaukee
What I need
I need a way to pick out key words from the final column, such as: Metroline to home coming from Base1.
There are three types of words I'm looking for:
Home LocationsThese are known and limited, I can get these from a list
Home AliasesThese are known and limited, I can get these from a list
Away LocationsThese are unknown but cities/towns/etc in the UK I don't know how to recognize these in the string. This is my main problem
My Ideas
My go to program I thought of was awk, but I don't know if I can reliably search to find where a proper noun (i.e. location) is used for the location names.
Is there a package, library or dictionary of standard locations?
Can I get a program to scour the spreadsheets and 'learn' the names of locations?
This seems like a problem that would have been solved already (i.e. find words in a string of text), but I'm not certain what I'm doing, and I'm only a novice programmer.
Any help on what I can do would be appreciated.
Edit:
Any answer such as "US_Locations_Cities is something you could check against", "Check for strings mentioned in a file in awk using ...", "There is a library for language X that will let a program learn to recognise location names, it's not RegEx, but it might work", or "There is a dictionary of location names here" would be fine.
Ultimately anything that helps me do what I want to do (i.e get the location names!) would be excellent.

Sorry to tell you, but i think this is not 100% programmable.
The best bet would be to define some standard searches:
Chicago to Base1
[WORD] to [WORD]:
where "to" is fixed and you look for exactly one word before and after. the word before then is your source and word after your target
Metroline to home coming from Base1
[WORD] to [WORD] coming from [WORD]:
where "to" and "coming from" is fixed and you look for three words in the appropriate slots.
etc
if you can match a source and target -> ok
if you cannot match something then throw an error for that line and let the user decide or even better implement an appropiate correction and let the program automatically reevaluate that line.
these are non-trivial goals.
consider:
Cities out of us of a
Non english text entries
Abbreviations
for automatic error corrections try to match the found [WORD]'s with a list of us or other cities.
if the city is not found throw an error. if you find that error either include that not found city to your city list or translate a city name in a publicly known (official) name.

The best I can suggest is that, as long as your locations are all US cities, you can use a database of zip codes such as this one.
I don't know how you expect any program to pick up things like Friends or Base1

I have to agree with hacktick that as it stands now, it is not programmable. It seems that the only solution is to invent a language or protocol.
I think an easy implementation follows:
In this language you have two keywords: to and from (you could also possibly allocate at as a keyword synoym for from as well).
These keywords define a portion of string that follows as a "scan area" for
recognizing names
I'm only planning on implementing the simplest scan, but as indicated at the end of the post allows you to do your fallback.
In the implementation you have a "Preferred Name" hash, where you define the names that you want displayed for things that appear there.
{ Base1 => 'Rockford'
, Friends => 'Milwaukee'
, ...
}
You could split your sentences by chunks of text between the keywords, using the following rules:
A. First chunk, if not a keyword is taken as the value of 'from'.
A. On this or any subsequent chunk, if keyword then save the next chunk
after that for that value.
A. Each value is "scanned" for a preferred phrase before being stored
as the value.
my #chunks
= grep {; defined and ( s/^\s+//, s/\s+$//, length ) }
split /\b(from|to)\s+/i, $note
;
my %parts = ( to => '', from => '' );
my $key;
do {
last unless my $chunk = shift #chunks;
if ( $key ) {
$parts{ $key } = $preferred_title{ $chunk } // $chunk;
$key = '';
}
elsif ( exists $parts{ lc $chunk } ) {
$key = lc $chunk;
}
elsif ( !$parts{from} ) {
$parts{from} = $preferred_title{ $chunk } // $chunk;
}
} while ( #chunks );
say join( '|', $note, #parts{ qw<from to> } );
At the very least, collecting these values and printing them out can give you a sieve to decide on further courses of action. This will tell you that 'home coming' is perceived as a 'from' statement, as well as 'Standard train journey'.
You *could fix the 'home coming' by amending the regex thusly:
/\b(?:(?:coming )?(from)|(to))\s+/i
And we could add the following key-value pair to our preferred_title hash:
home => 'Home'
We could simply define 'Standard train journey' => '', or we could create a list of rejection patterns, where we reject a string as a meaningful value if they fit a pattern.
But they allow you to dump out a list of values and refine your scan of data. Another idea is that as it seems that your pretty consistent with your use of capitals (except for 'home') for places. So we could increase our odds of finding the right string by matching the chunk with
/\b(home|\p{Upper}.*)/
Note that this still considers 'Standard train journey' a proper location. So this would still need to be handled by rejection rules.
Here I reiterate that this can be a minimal approach to scanning the data to the point that you can make sense of what it this system takes to be locations and "80/20" it down: that is, hopefully those rules handle 80 percent of the cases, and you can tune the algorithm to handle 80 percent of the remaining 20, and iterate to the point that you simply have to change a handful of entries at worst.
Then, you have a specification that you would need to follow in creating travel notes from then on. You could even scan the notes as they were entered and alert something like
'No destination found in note!'.

RESTful URL design for search

I'm looking for a reasonable way to represent searches as a RESTful URLs.
The setup: I have two models, Cars and Garages, where Cars can be in Garages. So my urls look like:
/car/xxxx
xxx == car id
returns car with given id
/garage/yyy
yyy = garage id
returns garage with given id
A Car can exist on its own (hence the /car), or it can exist in a garage. What's the right way to represent, say, all the cars in a given garage? Something like:
/garage/yyy/cars ?
How about the union of cars in garage yyy and zzz?
What's the right way to represent a search for cars with certain attributes? Say: show me all blue sedans with 4 doors :
/car/search?color=blue&type=sedan&doors=4
or should it be /cars instead?
The use of "search" seems inappropriate there - what's a better way / term? Should it just be:
/cars/?color=blue&type=sedan&doors=4
Should the search parameters be part of the PATHINFO or QUERYSTRING?
In short, I'm looking for guidance for cross-model REST url design, and for search.
[Update] I like Justin's answer, but he doesn't cover the multi-field search case:
/cars/color:blue/type:sedan/doors:4
or something like that. How do we go from
/cars/color/blue
to the multiple field case?

For the searching, use querystrings. This is perfectly RESTful:
/cars?color=blue&type=sedan&doors=4
An advantage to regular querystrings is that they are standard and widely understood and that they can be generated from form-get.

The RESTful pretty URL design is about displaying a resource based on a structure (directory-like structure, date: articles/2005/5/13, object and it's attributes,..), the slash / indicates hierarchical structure, use the -id instead.
Hierarchical structure
I would personaly prefer:
/garage-id/cars/car-id
/cars/car-id #for cars not in garages
If a user removes the /car-id part, it brings the cars preview - intuitive. User exactly knows where in the tree he is, what is he looking at. He knows from the first look, that garages and cars are in relation. /car-id also denotes that it belongs together unlike /car/id.
Searching
The searchquery is OK as it is, there is only your preference, what should be taken into account. The funny part comes when joining searches (see below).
/cars?color=blue;type=sedan #most prefered by me
/cars;color-blue+doors-4+type-sedan #looks good when using car-id
/cars?color=blue&doors=4&type=sedan #also possible, but & blends in with text
Or basically anything what isn't a slash as explained above.
The formula: /cars[?;]color[=-:]blue[,;+&], though I wouldn't use the & sign as it is unrecognizable from the text at first glance if that's your thing.
** Did you know that passing JSON object in URI is RESTful? **
Lists of options
/cars?color=black,blue,red;doors=3,5;type=sedan #most prefered by me
/cars?color:black:blue:red;doors:3:5;type:sedan
/cars?color(black,blue,red);doors(3,5);type(sedan) #does not look bad at all
/cars?color:(black,blue,red);doors:(3,5);type:sedan #little difference
possible features?
Negate search strings (!)
To search any cars, but not black and red:
?color=!black,!red
color:(!black,!red)
Joined searches
Search red or blue or black cars with 3 doors in garages id 1..20 or 101..103 or 999 but not 5
/garage[id=1-20,101-103,999,!5]/cars[color=red,blue,black;doors=3]
You can then construct more complex search queries. (Look at CSS3 attribute matching for the idea of matching substrings. E.g. searching users containing "bar" user*=bar.)
Conclusion
Anyway, this might be the most important part for you, because you can do it however you like after all, just keep in mind that RESTful URI represents a structure which is easily understood e.g. directory-like /directory/file, /collection/node/item, dates /articles/{year}/{month}/{day}.. And when you omit any of last segments, you immediately know what you get.
So.., all these characters are allowed unencoded:
unreserved: a-zA-Z0-9_.-~
Typically allowed both encoded and not, both uses are then equivalent.
special characters: $-_.+!*'(),
reserved: ;/?:#=&
May be used unencoded for the purpose they represent, otherwise they must be encoded.
unsafe: <>"#%{}|^~[]`
Why unsafe and why should rather be encoded: RFC 1738 see 2.2
Also see RFC 1738#page-20 for more character classes.
RFC 3986 see 2.2
Despite of what I previously said, here is a common distinction of delimeters, meaning that some "are" more important than others.
generic delimeters: :/?#[]#
sub-delimeters: !$&'()*+,;=
More reading:
Hierarchy: see 2.3, see 1.2.3
url path parameter syntax
CSS3 attribute matching
IBM: RESTful Web services - The basics
Note: RFC 1738 was updated by RFC 3986

Although having the parameters in the path has some advantages, there are, IMO, some outweighing factors.
Not all characters needed for a search query are permitted in a URL. Most punctuation and Unicode characters would need to be URL encoded as a query string parameter. I'm wrestling with the same problem. I would like to use XPath in the URL, but not all XPath syntax is compatible with a URI path. So for simple paths, /cars/doors/driver/lock/combination would be appropriate to locate the 'combination' element in the driver's door XML document. But /car/doors[id='driver' and lock/combination='1234'] is not so friendly.
There is a difference between filtering a resource based on one of its attributes and specifying a resource.
For example, since
/cars/colors returns a list of all colors for all cars (the resource returned is a collection of color objects)
/cars/colors/red,blue,green would return a list of color objects that are red, blue or green, not a collection of cars.
To return cars, the path would be
/cars?color=red,blue,green or /cars/search?color=red,blue,green
Parameters in the path are more difficult to read because name/value pairs are not isolated from the rest of the path, which is not name/value pairs.
One last comment. I prefer /garages/yyy/cars (always plural) to /garage/yyy/cars (perhaps it was a typo in the original answer) because it avoids changing the path between singular and plural. For words with an added 's', the change is not so bad, but changing /person/yyy/friends to /people/yyy seems cumbersome.

To expand on Peter's answer - you could make Search a first-class resource:
POST /searches # create a new search
GET /searches # list all searches (admin)
GET /searches/{id} # show the results of a previously-run search
DELETE /searches/{id} # delete a search (admin)
The Search resource would have fields for color, make model, garaged status, etc and could be specified in XML, JSON, or any other format. Like the Car and Garage resource, you could restrict access to Searches based on authentication. Users who frequently run the same Searches can store them in their profiles so that they don't need to be re-created. The URLs will be short enough that in many cases they can be easily traded via email. These stored Searches can be the basis of custom RSS feeds, and so on.
There are many possibilities for using Searches when you think of them as resources.
The idea is explained in more detail in this Railscast.

Justin's answer is probably the way to go, although in some applications it might make sense to consider a particular search as a resource in its own right, such as if you want to support named saved searches:
/search/{searchQuery}
or
/search/{savedSearchName}

I use two approaches to implement searches.
1) Simplest case, to query associated elements, and for navigation.
/cars?q.garage.id.eq=1
This means, query cars that have garage ID equal to 1.
It is also possible to create more complex searches:
/cars?q.garage.street.eq=FirstStreet&q.color.ne=red&offset=300&max=100
Cars in all garages in FirstStreet that are not red (3rd page, 100 elements per page).
2) Complex queries are considered as regular resources that are created and can be recovered.
POST /searches => Create
GET /searches/1 => Recover search
GET /searches/1?offset=300&max=100 => pagination in search
The POST body for search creation is as follows:
{
"$class":"test.Car",
"$q":{
"$eq" : { "color" : "red" },
"garage" : {
"$ne" : { "street" : "FirstStreet" }
}
}
}
It is based in Grails (criteria DSL): http://grails.org/doc/2.4.3/ref/Domain%20Classes/createCriteria.html

This is not REST. You cannot define URIs for resources inside your API. Resource navigation must be hypertext-driven. It's fine if you want pretty URIs and heavy amounts of coupling, but just do not call it REST, because it directly violates the constraints of RESTful architecture.
See this article by the inventor of REST.

In addition i would also suggest:
/cars/search/all{?color,model,year}
/cars/search/by-parameters{?color,model,year}
/cars/search/by-vendor{?vendor}
Here, Search is considered as a child resource of Cars resource.

There are a lot of good options for your case here. Still you should considering using the POST body.
The query string is perfect for your example, but if you have something more complicated, e.g. an arbitrary long list of items or boolean conditionals, you might want to define the post as a document, that the client sends over POST.
This allows a more flexible description of the search, as well as avoids the Server URL length limit.

RESTful does not recommend using verbs in URL's /cars/search is not restful. The right way to filter/search/paginate your API's is through Query Parameters. However there might be cases when you have to break the norm. For example, if you are searching across multiple resources, then you have to use something like /search?q=query
You can go through http://saipraveenblog.wordpress.com/2014/09/29/rest-api-best-practices/ to understand the best practices for designing RESTful API's

Though I like Justin's response, I feel it more accurately represents a filter rather than a search. What if I want to know about cars with names that start with cam?
The way I see it, you could build it into the way you handle specific resources:
/cars/cam*
Or, you could simply add it into the filter:
/cars/doors/4/name/cam*/colors/red,blue,green
Personally, I prefer the latter, however I am by no means an expert on REST (having first heard of it only 2 or so weeks ago...)

My advice would be this:
/garages
Returns list of garages (think JSON array here)
/garages/yyy
Returns specific garage
/garage/yyy/cars
Returns list of cars in garage
/garages/cars
Returns list of all cars in all garages (may not be practical of course)
/cars
Returns list of all cars
/cars/xxx
Returns specific car
/cars/colors
Returns lists of all posible colors for cars
/cars/colors/red,blue,green
Returns list of cars of the specific colors (yes commas are allowed :) )
Edit:
/cars/colors/red,blue,green/doors/2
Returns list of all red,blue, and green cars with 2 doors.
/cars/type/hatchback,coupe/colors/red,blue,green/
Same idea as the above but a lil more intuitive.
/cars/colors/red,blue,green/doors/two-door,four-door
All cars that are red, blue, green and have either two or four doors.
Hopefully that gives you the idea. Essentially your Rest API should be easily discoverable and should enable you to browse through your data. Another advantage with using URLs and not query strings is that you are able to take advantage of the native caching mechanisms that exist on the web server for HTTP traffic.
Here's a link to a page describing the evils of query strings in REST: http://web.archive.org/web/20070815111413/http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful
I used Google's cache because the normal page wasn't working for me here's that link as well:
http://rest.blueoxen.net/cgi-bin/wiki.pl?QueryStringsConsideredHarmful

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Break down keyword in URL, then check whether the keywords exist in content page - matlab

Related

how do duckduckgo spice IA secondary API calls get their parameters?

Obtaining Name of Simscape Physical Port

Using Powershell to find Lotus Notes internet email address

Feasibility of extracting arbitrary locations from a given string? [closed]

RESTful URL design for search

Categories

Resources