How to disallow URLs by query string in array format - robots.txt

I want to disallow all URLs with a certain query parameter in array format.
For example I have this kind of URL:
https://example.com/site/?param[index]=1&param[index2]=5
and I want to stop all crawlers from crawling any URL that has the param parameter, in any array variation.
Second question: Is it possible to disallow only if certain array parameter occurs? For instance, param[index3]? (I do not need it, but it could be useful for other people)

The robots.txt syntax doesn't support this.
The closest you can get would be to add <meta name="ROBOTS" content="NOINDEX"> to any page with that parameter in the query string.
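How a page decides that it carries "that parameter" is up to the site. As a rough sketch only, the check could look like the following. The function name hasArrayParam and its arguments are hypothetical, and the sketch assumes a runtime that provides the standard URL class. The optional third argument covers the second question (matching only one specific key such as param[index3]).

```javascript
// Hypothetical helper: returns true when the URL's query string contains
// the given parameter in array form, e.g. param[index]=1.
// If `index` is given, only that exact key (e.g. param[index3]) matches.
function hasArrayParam(url, name, index) {
  const keys = Array.from(new URL(url).searchParams.keys());
  return keys.some(function (key) {
    if (index !== undefined) {
      return key === name + '[' + index + ']';
    }
    // Any key of the form name[...]
    return key.startsWith(name + '[') && key.endsWith(']');
  });
}
```

A page template could call such a check and emit the NOINDEX meta tag only when it returns true.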

Polylang: How does the linking of right page by changing the attribute "href" work?

I have a question about linking the right page with Polylang.
I have a hard-coded anchor, which is basically a “back home” link.
It looks like this:
<a href="magazin" class="article-type-inner"><?php pll_e('Close'); ?></a>
I have already implemented a string and it works fine in the posts in both languages, but how can I change the “Href” to the right language?
For example, my default language is English and the other language is French. If I am on a French post, the link takes me back to the English page. Is there any solution?
Thank you.
Ideally you should not hardcode any strings or URLs. Here are some options for you:
Use pll_home_url() if your intention is to redirect to the home page. pll_home_url() accepts optional parameter $slug (2-letters code of the language) to switch between languages if needed.
Use get_permalink(), the_permalink() or get_the_permalink(). You can pass page_id as first argument. Make sure post/pages/cpt's are linked properly. E.g. the_permalink(100).
Worst case scenario - use if/else in combination with pll_current_language(). Not recommended.

No Rich Snippet for AggregateOffer because my #id results in a 404?

Pricerunner gets its prices shown in the SERPs, so I wanted to do this as well. But for some reason, I don't get any prices in my results.
When I test with Google's structured data tool, I get:
But on my page I get:
Apparently, the only difference (besides review), is the #ids.
If I follow Pricerunner's IDs, they are active links, but if I follow my IDs, they result in a 404. The problem is that I haven't set up any IDs.
If you take the first ID on my page, it's: /lelo-lyla-2/product-2348
The ID is set to product-2348, which is standard WooCommerce, but it's being appended to the URL, so the URL becomes /lelo-lyla-2/product-2348, which results in a 404.
Same with the last ID in aggregated offer: /lelo-lyla-2/price-list
Where does /price-list come from? The div? Should I remove the id="price-list" from the div, in order to make it work, or?
<div id="price-list" itemtype="http://schema.org/AggregateOffer" itemscope itemprop="offers">
<meta content="675" itemprop="lowPrice">
<meta content="1039" itemprop="highPrice">
<meta content="7" itemprop="offerCount">
<meta content="DKK" itemprop="priceCurrency">
</div>
When I run the page through Google's test tool, it gets 0 errors. But I suspect it's because of the "invalid" IDs, or?
I think it’s a bug in Google’s SDTT that it takes the value of the id attribute and uses it as the identifier for the item. That’s the job of the itemid attribute in Microdata. I would suggest ignoring the extracted #ids that result from this and only caring about those coming from itemid.
For Google’s Product rich result (with aggregate offer) it doesn’t state that an identifier would be required to begin with. So the problem that you don’t get the rich result has most likely nothing to do with this.
As per Google's Product-specific usage guidelines, "Adult-related products are not supported."

What's the best/most RESTful way to simulate a procedure call over HTTP?

I have a service that takes an .odt template file and some text values, and produces an .odt as its output. I need to make this service available via HTTP, and I don't quite know what is the most RESTful way to make the interface work.
I need to be able to supply the template file, and the input values, to the server - and get the resulting .odt file sent back to me. The options I see for how this would work are:
1. PUT or POST the template to the server, then do a GET request, passing along the URI of the template I just posted, plus the input values - the GET response body would have the .odt
2. Send the template and the parameters in a single GET request - the template file would go in the GET request body.
3. Like (2) above except do the whole thing as a single POST request instead of GET.
The problem with (1) is that I do not want to store the template file on the server. This adds complexity and storing the file is not useful to me beyond the fact that it's a very RESTful approach. Also, a single request would be better than 2, all other things being equal.
The problem with (2) is that putting a body in a GET request is bordering on abuse of HTTP - it is supported by the software I'm using now, but may not always be.
Number (3) seems misleading since this is more naturally a 'read' or 'get' operation than a 'post'.
What I am doing is inherently like a function call - I need to pass a significant amount of data in, and I am really just using HTTP as a convenient way of exposing my code across the network. Perhaps what I'm trying to do is inherently un-RESTful, and there is no REST-friendly solution? Can anyone advise? Thank you!
Wow, so this answer escalated quickly...
Over the last year or so I've attempted to gain a much better understanding of REST through books, mailing lists, etc. For some reason I decided to pick your question as a test of what I've learned.
Sorry :P
Let's make this entire example one step simpler. Rather than worry about the user uploading a file, we'll instead assume that the user just passes a string. So, really, they are going to pass a string, in addition to the arguments of characters to replace (a list of key/values). We'll deal with the file upload part later.
Here's a RESTful way of doing it which doesn't require anything to be stored on the server. I will use some HTML (albeit broken, I'll leave out stuff like HEAD) as my media type, just because it's fairly well known.
A Sample Solution
First, the user will need to access our REST service.
GET /
<body>
<a rel="http://example.com/rels/arguments" href="/arguments">
Start Building Arguments
</a>
</body>
This basically gives the user a way to start actually interacting with our service. Right now they have only one option: use the link to build a new set of arguments (the name/value pairings that will eventually be used in the string replacement scheme). So the user goes to that link.
GET /arguments
<body>
<a rel="self" href="/arguments"/>
<form rel="http://example.com/rels/arguments" method="get" action="/arguments?{key}={value}">
<input id="key" name="key" type="text"/>
<input id="value" name="value" type="text"/>
</form>
<form rel="http://example.com/rels/processed_string" action="/processed_string/{input_string}">
<input id="input_string" name="input_string" />
</form>
</body>
This brings us to an instance of an "arguments" resource. Notice that this isn't a JSON or XML document that returns to you just the plain data of the key/value pairings; it is hypermedia. It contains controls that direct the user to what they can do next (sometimes referred to as allowing the user to "follow their nose"). This specific URL ("/arguments") represents an empty list of key/value pairings. I could very well have named the URL "/empty_arguments" if I wanted to. This is an example of why it's silly to think about REST in terms of URLs: it really shouldn't matter what the URL is.
In this new HTML, the user is given three different resources that they can navigate to:
They can use the link to "self" to navigate to the same resource they are currently on.
They can use the first form to navigate to a new resource which represents an argument list with the additional name/value pairing that they specify in the form.
They can use the second form to provide the string that they wish to finally do their replacement on.
Note: You probably noticed that the first form has a strange "action" URL:
/arguments?{key}={value}
Here, I cheated: I'm using URI Templates. This allows me to specify how the arguments are going to be placed onto the URL, rather than using the default HTML scheme of just using <input-name>=<input-value>. Obviously, for this to work, the user can't use a browser (as browsers don't implement this): they would need to use software that understands HTML and URI templating. Of course, I'm using HTML as an example, your REST service could use some kind of XML that supports URI Templating as defined by the URI Template spec.
Anyway, let's say the user wants to add their arguments. The user uses the first form (e.g., filling in the "key" input with "Author" and the "value" input with "John Doe"). This results in...
GET /arguments?Author=John%20Doe
<body>
<a rel="self" href="/arguments?Author=John%20Doe"/>
<form rel="http://example.com/rels/arguments" method="get" action="/arguments?Author=John%20Doe&{key}={value}">
<input id="key" name="key" type="text"/>
<input id="value" name="value" type="text"/>
</form>
<form rel="http://example.com/rels/processed_string" action="/processed_string/{input_string}?Author=John%20Doe">
<input id="input_string" name="input_string" />
</form>
</body>
This is now a brand new resource. You can describe it as an argument list (key/value pairs) with a single key/value pair: "Author"/"John Doe". The HTML is pretty much the same as before, with a few changes:
The "self" link now points to the current resource's URL (changed from "/arguments" to "/arguments?Author=John%20Doe").
The "action" attribute of the first form now has the longer URL, but once again we use URI Templates to allow us to build a larger URI.
The second form's "action" attribute now carries the argument list in its query string ("?Author=John%20Doe").
The user now wants to add a "Date" argument, so they once again submit the first form, this time with key of "Date" and a value of "2003-01-02".
GET /arguments?Author=John%20Doe&Date=2003-01-02
<body>
<a rel="self" href="/arguments?Author=John%20Doe&Date=2003-01-02"/>
<form rel="http://example.com/rels/arguments" method="get" action="/arguments?Author=John%20Doe&Date=2003-01-02&{key}={value}">
<input id="key" name="key" type="text"/>
<input id="value" name="value" type="text"/>
</form>
<form rel="http://example.com/rels/processed_string" action="/processed_string/{input_string}?Author=John%20Doe&Date=2003-01-02">
<input id="input_string" name="input_string" />
</form>
</body>
Finally, the user is ready to process their string, so they use the second form and fill in the "input_string" variable. This once again uses URI Templates, bringing the user to the next resource. Let's say that the string is the following:
{Author} wrote some books in {Date}
The results would be:
GET /processed_string/%7BAuthor%7D+wrote+some+books+in+%7BDate%7D?Author=John%20Doe&Date=2003-01-02
<body>
<a rel="self" href="/processed_string/%7BAuthor%7D+wrote+some+books+in+%7BDate%7D?Author=John%20Doe&Date=2003-01-02"/>
<span class="results">John Doe wrote some books in 2003-01-02</span>
</body>
PHEW! That's a lot of work! But it's (AFAIC) RESTful, and it fulfills the requirement of not needing to actually store ANYTHING on the server side (including the argument list, or the string that you eventually want to process).
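To show why nothing needs to be stored, here is a minimal sketch of what the server behind the "processed_string" resource might compute. The function name processString is hypothetical, http://example.com is only a placeholder base for parsing the relative URL, and the sketch assumes (as in the request above) that "+" in the path stands for a space and that the template contains no "/".

```javascript
// Hypothetical sketch of what the /processed_string resource computes.
// Everything it needs arrives in the request URL, so nothing is stored.
function processString(requestUrl) {
  const url = new URL(requestUrl, 'http://example.com');
  // The template is the last path segment; '+' stands for a space here,
  // as in the example request above.
  const segment = url.pathname.split('/').pop();
  const template = decodeURIComponent(segment.replace(/\+/g, ' '));
  // Replace each {key} with its query-string value; unknown keys are kept.
  return template.replace(/\{([^{}]+)\}/g, function (match, key) {
    return url.searchParams.has(key) ? url.searchParams.get(key) : match;
  });
}
```

Because both the template and the arguments live in the URL, the same GET always yields the same representation, which is what lets the server stay stateless here.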
Important Things to Note
One thing that is important here is that I wasn't just talking about URLs. In fact, the majority of the time I'm talking about the HTML. The HTML is the hypermedia, and hypermedia is a huge part of REST that is often forgotten. All those APIs that call themselves "RESTful" but say "do a GET on this URL with these parameters and POST on this URL with a document that looks like this" are not practicing REST. Roy Fielding (who literally wrote the book on REST) made this observation himself.
Another thing to note is that it was quite a bit of pain to just set up the arguments. After the initial GET / to get to the root (you can think of it as the "menu") of the service, you would need to do five more GET calls just to build up your argument resource to make an argument resource of four key/value pairings. This could be alleviated by not using HTML. For example, I already did use URI Templates in my example, there's no reason to say that HTML just isn't good enough for REST. Using a hypermedia format (like some derivation of XML) that supports something similar to forms, but with the ability to specify "mappings" of values, you could do this in one go. For example, we could extend the HTML media type to allow another input type called "mappings"...
So long as the client using our API understands what a "mappings" input type is, they will be able to build their arguments resource with a single GET.
At that point, you might not even need an "arguments" resource. You could just skip right to the "processed_string" resource that contains the mapping and the actual string...
What about file upload?
Okay, so originally you mentioned file uploads, and how to get this without needing to store the file. Well, basically, we can use our existing example, but replace the last step with a file.
Here, we are basically doing the same thing as before, except we are uploading a file. What is important to note is that now we are hinting to the user (through the "method" attribute on the form) that they should do a POST rather than a GET. Note that even though you hear everywhere that POST is a non-safe (it could cause changes on the server), non-idempotent operation, nothing says that it MUST change state on the server.
Finally, the server can return the new file (even better would be to return some hypermedia or LOCATION header with a link to the new file, but that would require storage).
Final Comments
This is just one take on this specific example. While I hope you have gained some sort of insight, I would caution you against accepting this as gospel. I'm sure there have been things that I have said that are not really "REST". I plan on posting this question and answer to the REST-Discuss Mailing List and see what others have to say about it.
One main thing I hope to express through this is that your easiest solution might simply be to use RPC. After all, what was your original attempt at making it RESTful trying to accomplish? If you are trying to be able to tell people that you accomplish "REST", keep in mind that plenty of APIs have called themselves "RESTful" that have really just been RPC disguised by URLs with nouns rather than verbs.
If it was because you have heard some of the benefits of REST, and how to gain those benefits implicitly by making your API RESTful, the unfortunate truth is that there's more to REST than URLs and whether you GET or POST to them. Hypermedia plays a huge part.
Finally, sometimes you will encounter issues that mean you might do things that SEEM non-RESTful. Perhaps you need to do a POST rather than a GET because the URI (which in theory has no length limit, but in practice has plenty of technical limitations) would get too long. Well then, you need to do POST. Maybe
More resources:
REST-Discuss
My e-mail on this answer to REST-Discuss
RESTful Web Services Cookbook
Hypermedia APIs with HTML5 and Node (Not specifically about REST, but a VERY good introduction to Hypermedia)
What you are doing is not REST-ful - or, at least, is difficult to express in REST, because you are thinking about the operation first, not the objects first.
The most REST-ful expression would be to create a new "OdtTemplate" resource (or get the URI of an existing one), create a new "SetOfValues" resource, then create a "FillInTemplateWithValues" job resource that was tied to both those inputs, and which could be read to determine the status of the job, and to obtain a pointer to the final "FilledInDocument" object that contained your result.
REST is all about creating, reading, updating, and destroying objects. If you can't model your process as a CRUD database, it isn't really REST. That means you do need to, eg, store the template on the server.
You might be better off, though, just implementing an RPC over HTTP model, and submitting the template and values, then getting the response synchronously - or one of the other non-REST patterns you named... since that is just what you want.
If there is no value in storing the templates then option 2 is the most RESTful, but as you are aware there is the possibility of having your GET body dropped.
However, if I was a user of this system, I would find it very wasteful to have to upload the template each time I would like to populate it with values. Instead it would seem more appropriate to have the template stored and allow different requests with different values to populate the resulting documents.

innerHTML of an element without id or name attributes

Why is the following code NOT working without id or name attribute specified for the anchor element?
<html>
<body>
<a href="#">First link</a>
<p>innerHTML of the first anchor:
<script>document.write(document.anchors[0].innerHTML);</script>
</p>
</body>
</html>
But if I add an id (or name) attribute, like that:
<a id="first" href="#">First link</a>
It starts to work.
Why is id or name attribute so important? I don't refer to it in my javascript code. I don't use "getElementById" or anything, but it still wants an id to be specified.
P.S. I tested only in IE7 (not the best browser, but I don't have access to anything better at the moment, and it can't stop me from learning :)
UPDATE:
Thanks to Raynos who gave me an idea of HTMLCollection in his answer, I've gotten a deeper understanding of what's going on here, by searching the web.
When we use document.anchors collection, we're actually referring to a collection of a elements with the name attribute that makes an a element behave as an anchor, and not (only) as a link.
We don't have to specify the name attribute if we want to refer to a elements as links. In this case we just need to use a different instance of HTMLCollection object which is document.links.
So the original code will work without name attribute if we modify it to:
document.write(document.links[0].innerHTML);
What a nice feeling of enlightenment! :)
WHATWG says:
The anchors attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with name attributes.
The document.anchors collection needs <a> elements with a name attribute.
IE is known to have bugs where it treats ids and names as the "same" thing. So that would probably explain why it works for <a> elements with an id attribute.
As an aside, document.write and .innerHTML are evil.
Why don't you use this:
document.getElementsByTagName('a')[0].innerHTML
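To make the difference between document.anchors and document.links concrete, here is a small sketch that imitates the two filters on plain objects. It is a simplified model, not the DOM API: the names anchorsOf and linksOf and the { tag, attrs } object shape are made up for illustration.

```javascript
// Simplified model: each element is a plain object with a tag name and a
// map of attributes. This only imitates the filters; it is not the DOM API.
function anchorsOf(elements) {
  // document.anchors: <a> elements that have a name attribute
  return elements.filter(function (el) {
    return el.tag === 'a' && 'name' in el.attrs;
  });
}

function linksOf(elements) {
  // document.links: <a> and <area> elements that have an href attribute
  return elements.filter(function (el) {
    return (el.tag === 'a' || el.tag === 'area') && 'href' in el.attrs;
  });
}
```

With the element from the question, modeled as { tag: 'a', attrs: { href: '#' } }, anchorsOf finds nothing while linksOf finds it, which mirrors why document.anchors[0] was undefined but document.links[0] worked.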

How do I access a Post Slug in a Tumblr theme?

I want to write a canonical tag into my Tumblr theme, and I need the slug for the (full) URL. How can I access the post's slug within the template? I only have access to the PostID. My current code looks like this:
<link rel="canonical" href="http://domain.com/blog/{block:PostTitle}post/{PostID}{/block:PostTitle}" />
What I want to have is something like this:
<link rel="canonical" href="http://domain.com/blog/{block:PostTitle}post/{PostID}/{PostSlug}{/block:PostTitle}" />
I tried the following tags (which obviously did not work...):
{slug}
{PostSlug}
{Postslug}
What amuses me is, that the API gives out a slug-key on every post, try:
http://(YOU).tumblr.com/api/read?debug=1
Thanks for any hints and suggestions.
Edit: I already scanned http://www.tumblr.com/docs/en/custom_themes for hints - but found nothing useful.
The post slug is not available as a token in Tumblr’s theme DSL. I’m not sure if this is an intentional omission, as post slugs are optional on Tumblr (you can manually set one, but if you don’t your post just goes by its numeric ID). However, you can parse it out of the link inserted by the {Permalink} token, i.e. include it in some hidden element in your template along the lines of
<span class="permalink-url">{Permalink}</span>
(hide the span if you will), then retrieve and parse it with JavaScript:
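var plTags = document.querySelectorAll('.permalink-url');
// Loop with "<", not "<=", so we never index past the last element
for (var i = 0; i < plTags.length; i++) {
  // Read the permalink text from the element, then strip everything
  // up to and including the last slash
  var postSlug = plTags[i].textContent.trim().replace(/.+\//, '');
  // do whatever you want with the slug
}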
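Since slugs are optional, stripping everything up to the last slash returns the numeric ID for posts that have no slug. A slightly stricter sketch is shown below. It assumes permalinks of the form http://(YOU).tumblr.com/post/<numeric id>/<slug>, and the function name slugFromPermalink is made up for illustration.

```javascript
// Hypothetical helper: extracts the slug from a permalink of the assumed
// form http://example.tumblr.com/post/<numeric id>/<slug>.
// Returns null when the post has no slug (the URL ends with the numeric ID).
function slugFromPermalink(permalink) {
  const match = /\/post\/\d+(?:\/([^\/?#]+))?\/?(?:[?#].*)?$/.exec(permalink);
  return match && match[1] ? match[1] : null;
}
```

Inside the loop above, this could replace the plain regex call, with a null result telling you to fall back to the ID-only canonical URL.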