Regex for URL filtering - regex-group

I have a URL which needs to be filtered from certain logs if it contains user-specific information.
URL looks something like this:
/v1/info/infor1/users/ABC
(/v1/info/:info/users/:userID #info and userID are parameters)
If I write a regex like /v1/info/.*/users/.*, will it filter all URLs starting with /v1/info? If yes, how do I make sure the filtering happens only for URLs containing the userID?
I tried /v1/info/.*/users/.* and it matches URLs like /v1/info/infor1/users/ABC but not /v1/info/details, and I am not sure how it works. Does .* not match everything from that point on?
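The behaviour described above can be checked with a small Python sketch. `.*` does match any run of characters, but the literal `/users/` that follows it must still appear in the URL, which is why /v1/info/details is not matched. The stricter `[^/]+` pattern and the helper name `contains_user_id` are assumptions added for illustration; they are not part of the question.

```python
import re

# Pattern from the question: ".*" is greedy, but the literal "/users/" is still required.
LOOSE = re.compile(r"/v1/info/.*/users/.*")

# Stricter variant (an assumption): exactly one path segment for :info and one for :userID.
STRICT = re.compile(r"^/v1/info/[^/]+/users/[^/]+$")


def contains_user_id(path: str) -> bool:
    """Return True when the path has the /v1/info/:info/users/:userID shape."""
    return STRICT.search(path) is not None


if __name__ == "__main__":
    for path in ("/v1/info/infor1/users/ABC", "/v1/info/details", "/v1/info/a/b/users/"):
        # Print: path, whether the loose pattern matches, whether the strict one does.
        print(path, bool(LOOSE.search(path)), contains_user_id(path))
```

The loose pattern also accepts paths such as /v1/info/a/b/users/ (extra segments, empty user ID), so the stricter form may be preferable if only the exact route should be filtered.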

Related

VWO Split URL test no redirect link

I'm using VWO to do some split URL testing. Is there a query param you can pass in the URL so that a visitor isn't put into the A/B test and doesn't get redirected?
For example, let's say I'm running a split URL test on
mywebsite.com/a vs. mywebsite.com/b
I want to give someone a link to mywebsite.com/a without being included in the split URL test, ensuring that they actually get to mywebsite.com/a and not mywebsite.com/b
Is there some query param or some other way I can ensure this?
(example: mywebsite.com/a?vwo_testing=false)
You can exclude someone from the test by using a Query Parameter segmentation to exclude the query parameter. You can refer to the article below to create a Custom Visitor Segment for this.
https://vwo.com/knowledge/how-to-define-custom-visitor-segments/
Segmentation would look like -
Query Parameter: vwo_testing IS NOT EQUAL TO false
You can reach out to support@vwo.com in case you have any further queries.
In my use case I was able to go to https://vwo.com/opt-out/ and opt out of being redirected. Another option is to add ?vwo_opt_out=1 param to the url.

nginx: rewrite a LOT (2000+) of urls with parameters

I have to migrate a lot of URLs with params, which look like this:
/somepath/somearticle.html?p1=v1&p2=v2 --> /some-other-path-a
and also the same URL without params:
/somepath/somearticle.html --> /some-other-path-b
The tricky part is that the two destination URLs are totally different pages in the new system, whereas in the old system the params just indicated which tab to open by default.
I tried different rewrite rules, but came to the conclusion that parameters are not considered by nginx rewrites. I found a way using location directives, but having 2000+ location directives just feels wrong.
Does anybody know an elegant way how to get this done? It may be worth noting that beside those 2000+ redirects, I have another 200.000(!) redirects. They already work, because they're rather simple. So what I want to emphasize is that performance should be key!
You cannot match the query string (anything from the ? onwards) in location and rewrite expressions, as it is not part of the normalized URI. See this document for details.
The entire URI is available in the $request_uri parameter. Using $request_uri may be problematic if the parameters are not sent in a consistent order.
To process many URIs, use a map directive, for example:
map $request_uri $redirect {
    default 0;
    /somepath/somearticle.html?p1=v1&p2=v2 /some-other-path-a;
    /somepath/somearticle.html /some-other-path-b;
}
server {
    ...
    if ($redirect) {
        return 301 $redirect;
    }
    ...
}
You can also use regular expressions in the map, for example, if the URIs also contain optional unmatched parameters. See this document for more.
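With 2000+ entries, the map block is probably easier to generate than to write by hand. The following is a minimal Python sketch of such a generator, not part of the original answer. The function name `build_redirect_map` is hypothetical, and the sketch assumes every URI is a plain token: it rejects values that start with `~` (which nginx would treat as a regular expression) or that contain quotes, whitespace, `$`, `;`, `#`, or braces, rather than trying to quote them.

```python
from typing import Mapping

# Characters that would need quoting or escaping in an nginx config value.
_UNSAFE = set('"\'\\$;{}# \t\n')


def build_redirect_map(redirects: Mapping[str, str]) -> str:
    """Render a `map $request_uri $redirect { ... }` block from old -> new URIs."""
    lines = ["map $request_uri $redirect {", "    default 0;"]
    for old_uri, new_uri in redirects.items():
        for uri in (old_uri, new_uri):
            # Keep the sketch simple: reject anything that is not a plain token.
            if uri.startswith("~") or _UNSAFE & set(uri):
                raise ValueError(f"URI needs manual quoting: {uri!r}")
        lines.append(f"    {old_uri} {new_uri};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_redirect_map({
        "/somepath/somearticle.html?p1=v1&p2=v2": "/some-other-path-a",
        "/somepath/somearticle.html": "/some-other-path-b",
    }))
```

The generated text could be written to its own file and pulled into the http block with an include directive, so the main configuration stays readable.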

How to crawl data from encrypted url?

I'm trying to use scrapy to collect the university's professors' contact information from its directory. Since I can't post more than 2 links, I put all links in the following picture.
I set last name equals from the drop-down menu as shown in the picture. Then I search all professors by last name.
Usually, on other universities' websites, the URL follows some pattern. However, for this one, the original URL is (1). It becomes (2) when I search 'An' as last name. It seems like 'An' is replaced by something like 529385FD5FF90A198625819E002B8B41? I'm not sure. Is there any way I can get the URL that I need to send as a request? I mean, this time I search 'An'. If I search another last name like Lee, it will be another request. They are irregular. I can't find a pattern.
The scraping is not as complex as you think it is. The form just makes a POST call, and that results in a GET request. The following should work:
import scrapy
from scrapy import FormRequest
from scrapy.utils.response import open_in_browser

class univSpider(scrapy.Spider):
    name = "univ"
    start_urls = ["http://appl103.lsu.edu/dir003.nsf/(NoteID)/5903C096337C2AA28625819E0038E3E4?OpenDocument"]

    def parse(self, response):
        # Fill in and submit the search form found on the start page.
        yield FormRequest.from_response(response, formname="_DIRVNAM",
                                        formdata={"LastName": "Lalwani"},
                                        callback=self.search_result)

    def search_result(self, response):
        # Open the results page in a browser for inspection and print the raw body.
        open_in_browser(response)
        print(response.body)

REST Resource route naming for get and ResourceByID and ResourceByName

I am trying to write two REST GET methods:
1) Get user by Id
2) Get user by userName
I need to know if there is any resource naming convention for this. Both my id and username are strings.
I came up with:
1) /api/{v}/users/{userid}
2) /api/{v}/users/username/{username}
However, 2) doesn't seem correct and if I change 2) to /api/{v}/users/{username}, I am mapping to 1) as both id and username are strings. Or is it considered acceptable to use /api/{v}/userbyName/{username}?
How should I name my resource route in case 2) ?
First of all: https://vimeo.com/17785736 (15 minutes which will solve all your questions)
And what is unique? Is the username unique or only the id or both are unique?
There is a simple rule for all that:
/collection/item
"However, 2) doesn't seem correct and if I change 2) to /api/{v}/users/{username}, I am mapping to 1) as both id and username are strings."
If your item can be identified by an id and also by a unique username, it doesn't matter which of the two is passed: simply look up both (your backend needs to handle that, of course) and return the item.
According to your needs this would be correct:
/api/{v}/users/{userid}
/api/{v}/users/{username}
but I would use only /api/{v}/users/{userid} and filter by username with a query parameter (described below).
"Also will I break any rules if I come up with
/api/{v}/userbyName/{username}"
Yes, /api/{v}/userbyName/{username} breaks the /collection/item rule, because userByName is clearly not a collection; it would be a function, and in a truly RESTful API there are no functions in the path.
Another way to get the user by name is to use a filter/query parameter, so the ID stays available as the path parameter and the username is used only as a filter. That would look like this:
/api/{v}/users/?username={username}
This doesn't break any rules either, because the query parameter simply filters the whole collection and returns only the item where username = {username}.
"How should I name my resource route in case 2) ?"
Your 2) breaks a rule, so I can't/won't suggest a way to do it like that.
Have a look at this: https://vimeo.com/17785736. This simple presentation will help you a lot in understanding REST.
Why would you go this way?
Have you ever looked at a JavaScript framework such as Ember (Ember REST adapter)? If you follow the idea described above, and maybe also look at the JSON format used by Ember and its REST adapter, your frontend developers can work faster, which saves a lot of time and money.
With REST you send back links, which can contain URI templates. For example: /api/{v}/users/{userid} in your case, where v and userid are template variables. Since the URI structure does not matter from a client perspective, you can use whatever structure you want. Of course, it is more convenient to use nice and short URIs, because it is easier to write the routing with them.
According to the URI standard, the path contains the hierarchical part of the URI and the query contains the non-hierarchical part, but this is only a loose constraint; in practice people use both.
/api/{v}/users/name/{username}
/api/{v}/users/name:{username}
/api/{v}/users?name="{username}"
Of course, you can use a custom convention. For example, I use the following:
I don't use plural resource names for collections
I end collection paths with a slash
I use a slash when reducing a collection to sub-collections or individual items
I don't use a slash to give the value of a variable in the path; I use a colon instead
I use as few variables and as short a URI as I can
I use the query when reducing a collection to sub-collections, especially when defining complex filters with logical operators
So in your case my solution would be
/api/{v}/user/
/api/{v}/user/name:{username}
/api/{v}/user/{userid}
and
/api/{v}/user/?firstName="John"
/api/{v}/user/?firstName="John|Susan"&birthYear="1980-2005"
or
/api/{v}/user/firstName:John/
/api/{v}/user/firstName:John|Susan/birthYear:1980-2005/
etc...
But that's just my own set of constraints.
Each resource should have a unique URI.
GET /users/7
{
    "id": 7,
    "username": "jsmith",
    "country": "USA"
}
Finding the user(s) that satisfy a certain predicate should be done with query parameters.
GET /users?username=jsmith
[
    "/users/7"
]
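The two lookups above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `USERS` dictionary and the `handle_get` function are hypothetical stand-ins for a real data store and a real framework's routing, and returning `None` stands in for a 404 response.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory data, for illustration only.
USERS = {"7": {"id": 7, "username": "jsmith", "country": "USA"}}


def handle_get(url: str):
    """Resolve GET /users/{id} to one user, and GET /users?username=... to a list of URIs."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if segments == ["users"]:
        # Collection request: the query parameter only filters the collection.
        wanted = parse_qs(parts.query).get("username")
        return [
            f"/users/{user_id}"
            for user_id, user in USERS.items()
            if wanted is None or user["username"] in wanted
        ]
    if len(segments) == 2 and segments[0] == "users":
        # Item request: the path segment is always the id, never the username.
        return USERS.get(segments[1])
    return None
```

Because the username never appears as a path segment, the id route and the username filter cannot collide even though both values are strings.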

What is the best RESTful uri for a resource's tags

Say you have a resource of recipes and a recipe can have tags. Then you want to get a list of all tags used across all recipes, what would the URI be?
/recipes/tags
Seems like it might work but it breaks the convention of not pointing to a specific id such as:
/recipes/1/tags
You could also just use:
/tags
But I only want tags for recipes not any other resource. So would I use query params, such as:
/tags?type=recipes
And FTR, tags are only used for recipes, not any other resource, so it seems misleading to use the query param since it would never be anything but recipes.
Filters are tricky -- especially when your filter is also a resource.
You are correct that /recipes/tags is problematic. You go from a path variable of /resource/{resourceId} to /resource/{filter}, which can open a whole world of pain.
So for your first example go for something like
/recipes?tags={tags to filter on}
Returns a list of Recipes based on the tag
The problem is you don't have any lookup method for what acceptable tags are. I would expect /tags to return a list of tags and then consume it like this
/tags
Returns a list of all tags
/tags/{tagID}
Returns metadata about the specific tag (what the tag is used for, which is superfluous in this case if you only have recipe tags, but it is more flexible)
/tags/{tagID}/recipes
Returns a list of recipes associated with that ID.
Then you have to decide if you want to support a linked resource like this
/tags/eggs/recipes/1
Recipe details
Which would still be the same thing as
/recipes/1
It's acceptable to link them, but it might get confusing if /recipes/2 doesn't have any eggs in it and you try to access it using /tags/eggs/recipes/2
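A rough Python sketch of the routes discussed above, including the confusing linked case. The `RECIPES` data, the `route` function, and the choice to return `None` (standing in for a 404) when a recipe is requested through a tag it does not carry are all assumptions made for illustration.

```python
# Hypothetical data: recipe id -> recipe with its tags.
RECIPES = {
    1: {"name": "Omelette", "tags": {"eggs", "breakfast"}},
    2: {"name": "Toast", "tags": {"breakfast"}},
}


def get_recipe(raw_id: str):
    """Return the recipe for a numeric id string, or None if it does not exist."""
    return RECIPES.get(int(raw_id)) if raw_id.isdigit() else None


def route(path: str):
    segments = [s for s in path.split("/") if s]
    if segments == ["tags"]:
        # /tags: every tag used by any recipe.
        return sorted({tag for recipe in RECIPES.values() for tag in recipe["tags"]})
    if len(segments) == 2 and segments[0] == "recipes":
        # /recipes/{id}: the canonical recipe resource.
        return get_recipe(segments[1])
    if len(segments) >= 3 and segments[0] == "tags" and segments[2] == "recipes":
        tag = segments[1]
        if len(segments) == 3:
            # /tags/{tagID}/recipes: URIs of the recipes carrying that tag.
            return [f"/recipes/{rid}" for rid, r in sorted(RECIPES.items()) if tag in r["tags"]]
        if len(segments) == 4:
            # /tags/{tagID}/recipes/{id}: same resource as /recipes/{id}, if the tag applies.
            recipe = get_recipe(segments[3])
            return recipe if recipe and tag in recipe["tags"] else None
    return None
```

Here /tags/eggs/recipes/1 resolves to the same recipe as /recipes/1, while /tags/eggs/recipes/2 yields nothing because recipe 2 has no eggs tag.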