Remove Google cache of thousands of pages - google-search-console

Webmaster Tools has a cache removal option, but it only accepts one URL at a time. I want to remove the cache for around 50k URLs, and doing it 50k times is a tedious job.
The URLs look like the following:
Say user profiles are being cached.
The URLs are "myinfo.com/profiles/1", "myinfo.com/profiles/2", "myinfo.com/profiles/3", "myinfo.com/profiles/4" and so on.
If I enter the path http://myinfo.com/profiles/, will Google remove the cache of all profiles?
Or is there any way to submit URLs in bulk?

You've already figured out how to request removal of a single URL, so that part is covered.
The same tool also lets you request removal of an entire directory, provided the URLs are organized in a predictable way like yours. (Since you mentioned you have thousands of URLs, I'd hope they're at least somewhat organized.) So try it that way: submit the directory path once and it should cover everything underneath it.


How to delete items from a subset with REST API

I'm wondering what the best way is to delete items from a subset in a RESTful way. I have users and series; each user has their own list of series (watching, completed, etc.). For example, if we want to get a user's list we can do it with: GET /users/:id_user/series
If we want to remove a series from that user's list (but we don't want to delete the series itself), how should it be done?
I thought about the possibility of using DELETE /users/:id_user/series/:id_serie, but I'm not sure if it's the correct way for this case (maybe PATCH?).
I have another case with series and reviews. We can get the reviews like this: GET /series/:serie_id/reviews. In the first case we didn't want to delete the series itself when removing it from a user's list, but in this case we do want to delete the review, because its existence depends on the series. So I guess in this case DELETE /series/:serie_id/reviews/:review_id is correct.
Is this difference important when choosing the REST operation to delete the object/item from the subset?
How would you do it on the web?
You'd follow a link to a form with input controls. You might have something like a dropdown if you wanted to delete one series at a time, or lots of checkboxes if you wanted to support a bulk delete. The user would provide input, hit the submit button, and the browser would create an application/x-www-form-urlencoded document and send it to the server.
What method would be used? Normally POST, because we intend an edit to some resource on the server.
What resource would we be editing? Well, in truth, it could be anything -- the server would include that information in the form metadata, so the client can just do what it is told.
So where should the server tell it to submit the form? Again, it could be anywhere... but a useful approach is to think about what resource in the client's cache is being updated. Because if we send the request to that resource, we get intelligent cache invalidation "for free".
So on the web, we would expect to see:
POST /users/:id_user/series
Does it have to be POST? On the HTML web, maybe it does, because the ubiquitous client of the web is a browser, not an editor.
It is okay to use POST.
But a perfectly valid alternative would be to edit the local copy of /users/:id_user/series, and then send back to the server a complete copy of the new version (PUT) or a patch document describing the edits (PATCH). Notice that with both of these choices, the target URI is still /users/:id_user/series, so we still get the cache invalidation magic.
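For illustration, here's a minimal sketch of the PATCH variant, assuming the server accepts JSON Patch (RFC 6902) documents; the host name and the position-based patch path are hypothetical:

    import requests

    # A sketch of the PATCH alternative, assuming the server understands
    # JSON Patch (RFC 6902). Host and path layout are illustrative only.
    patch = [
        {"op": "remove", "path": "/3"},  # drop the 4th series in the list
    ]
    resp = requests.patch(
        "https://api.example.com/users/42/series",
        json=patch,
        headers={"Content-Type": "application/json-patch+json"},
    )
    resp.raise_for_status()

The Content-Type header is doing real work here: application/json-patch+json tells the server to interpret the body as edit instructions rather than as a replacement document.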
Creating a new resource in your model just to have something to DELETE is probably the wrong idea.
There are cases where an edit, or a delete, will necessarily impact more than one resource.
There are some specific circumstances when you can get the right magic cache invalidation with two resources (for instance, delete one document, and send back an updated copy of another).
But we don't, today, have a general-purpose cache invalidation mechanism. (The closest thing I've been able to find is this, which seems to have stalled out in 2012.)

REST - GET response with temporarily uploaded file

I know the title is not quite right, but I don't know what else to call this problem...
I'm currently trying to design my first REST API, for a conversion service: the user submits an input file to the server and gets back the converted file.
The problem I have is that the converted file should be accessible with a simple GET /conversionservice/my/url. However, it is not possible to upload the input file within a GET request. A POST would be necessary (am I right?), but POST isn't cacheable.
Now my question is: what's the right way to design this? I know I could upload the input file to the server beforehand and then access it with my GET request, but those input files could be anything!
Thanks for your help :)
A POST request is indeed needed for a file upload. The fact that it is not cacheable should not bother the service, because no intermediary (the browser, the server, a proxy, etc.) can know anything about the content of the file. If you need cacheability, you would have to implement it yourself, probably with a hash (MD5, SHA-1, etc.) of the uploaded file. This would keep you from having to perform the actual conversion twice, but you would have to hash each file that was uploaded, which slows you down on a "cache miss".
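A rough sketch of that hash-based cache, assuming SHA-256, an on-disk cache directory, and a placeholder convert callable (all of these are illustrative choices, not a prescribed design):

    import hashlib
    from pathlib import Path

    CACHE_DIR = Path("converted-cache")  # hypothetical on-disk cache location

    def convert_with_cache(upload: bytes, convert) -> bytes:
        """Return the converted bytes, reusing an earlier result when the
        exact same file was uploaded before. `convert` stands in for the
        real conversion routine."""
        digest = hashlib.sha256(upload).hexdigest()  # every upload is hashed: the "cache miss" cost
        cached = CACHE_DIR / digest
        if cached.exists():
            return cached.read_bytes()  # hit: skip the expensive conversion
        result = convert(upload)
        CACHE_DIR.mkdir(exist_ok=True)
        cached.write_bytes(result)
        return result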
The only other way I can think of to solve the problem would be to require the user to pass an accessible URL to the file in the query string; then you could handle GET requests, but your users would have to make the file accessible over the internet. This would allow caching but limit usability.
Perhaps a hybrid approach would be possible where you accept a POST for a file upload and a GET for a URL; this would increase the complexity of the service but maximize usability.
Also, look into which caches you are interested in leveraging, as a lot of them have limits on the size of a cache entry, meaning a sufficiently large file would not be cached anyway.
In the end, I would advise you to stick to the standards already established. Accept the POST request for the file upload, and if you are interested in speeding up the user experience, consider making the upload persist; this would allow the user to upload a file once and download it in many different formats.
Your sequence of events can be as follows:
Upload your file or files using POST. For an immediate response, you can return the required information using your own headers. (The response should include a document key for accessing the file later.)
Then you can use GET for further operations, passing the above-mentioned document key as a query string.
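From the client's side, a hedged sketch of that two-step flow (host, path layout, and response shape are all assumptions, not a real API):

    import requests

    BASE = "https://converter.example.com"  # hypothetical service root

    # Step 1: POST the input file; the response carries the document key.
    with open("input.docx", "rb") as f:
        resp = requests.post(f"{BASE}/conversionservice/documents", files={"file": f})
    resp.raise_for_status()
    key = resp.json()["document_key"]  # assumed response shape

    # Step 2: all further operations are plain, cacheable GETs keyed by the document.
    converted = requests.get(
        f"{BASE}/conversionservice/documents/{key}",
        params={"format": "pdf"},
    )
    converted.raise_for_status()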

Best DB solution for storing large files

I must provide a solution where users can upload files that are stored together with some metadata, and this may grow really big.
Access to these files must be controlled, so they want me to just store them as BLOBs in the DB, but I fear PostgreSQL won't handle that properly over time.
My first idea was to use some NoSQL DB solution, but I couldn't find any that would replace a good RDBMS and elegantly store files alongside the data. Then I thought about just saving these files on disk somewhere the web server won't serve them, naming them after their table ID, and loading them into RAM to serve them with the proper content type.
Could anyone suggest a better solution for this?
I had the requirement to store many images (with some metadata) and allow controlled access to them; here is what I did.
To the cloud™
I save the image files in Amazon S3. My local database holds the metadata with the S3 location of the file as one column. When an authenticated and authorized user needs to see the file they hit a URL in my system (where the authentication and authorization checks occur) which then generates a pre-signed, expiring URL for the image and sends a redirect back to the browser. The browser is then able to load the image for a given amount of time (as specified in the signature within the URL.)
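The pre-signed URL step might look roughly like this with boto3 (bucket and key names are made up; the authentication and authorization checks happen before this code runs):

    import boto3

    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-image-bucket", "Key": "images/1234.jpg"},
        ExpiresIn=300,  # the URL stops working after five minutes
    )
    # Respond with a 302 redirect to `url`; the browser then loads the
    # image straight from S3 while the signature is still valid.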
With this solution I have user level access to the resources and I don't have to store them as BLOBs or anything like that which may grow unwieldy over time. I also don't use MY bandwidth to stream the files to the client and get cheap, redundant storage for them. Obviously the suitability of this solution will depend on the nature of the binary files you are looking to store and your level of trust in Amazon. The world doesn't end if there is a slip and someone sees an image from my system they shouldn't. YMMV.

Facebook image URLs - how are they kept from unauthorised users?

I'm interested in social networks and have stumbled upon something which makes me curious.
How does Facebook keep people from playing with URLs and gaining access to photos they should not see?
Let me expand; here's an altered example of a Facebook image URL that came up in my feed:
https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-prn1/s480x480/{five_digit_number}_{twelve_digit_number}_{ten_digit_number}_n.jpg
So, those with more web application experience will presumably know the answer to this. I suspect it's well understood, but what is to stop me from changing the numbers and seeing other people's photos that I'm possibly not supposed to see?
[I understand that this doesn't work, I'm just trying to understand how they maintain security and avoid this problem]
Many thanks in advance,
Nick
There are a couple of ways you can achieve it.
The first is to link to a script or action that authenticates the request and then returns an image. You can find an example with ASP.NET MVC here. The downside is that it's pretty inefficient, and you run the risk of using twice the bandwidth for each request (once for your server to grab the image from wherever it's stored, and once to serve it to your users).
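A rough sketch of that first approach, using Flask here purely for illustration; the auth helpers are hypothetical stand-ins for whatever your application actually uses:

    from flask import Flask, abort, send_file

    app = Flask(__name__)

    # Hypothetical helpers standing in for your real auth and data layers.
    def get_current_user(): ...
    def load_photo(photo_id): ...
    def user_may_view(user, photo): ...

    @app.route("/photos/<int:photo_id>")
    def photo(photo_id):
        user = get_current_user()
        record = load_photo(photo_id)
        if record is None or not user_may_view(user, record):
            abort(404)  # don't even reveal that the photo exists
        # The server fetches and streams the bytes itself, which is where
        # the doubled-bandwidth cost mentioned above comes from.
        return send_file(record.path, mimetype="image/jpeg")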
The second option: you can do like Facebook and just generate obscure URLs for each photo. As Thomas said in his comment, you're not going to guess a 27-digit number.
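If you go that route, the identifier should be cryptographically random rather than sequential; a tiny sketch (the CDN host is made up):

    import secrets

    # token_urlsafe(20) gives ~160 bits of randomness; for comparison,
    # a 27-digit number is only about 90 bits.
    token = secrets.token_urlsafe(20)
    url = f"https://cdn.example.com/photos/{token}.jpg"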
The third option is, I think, the best, especially if you're using something like Microsoft Azure or Amazon S3. Azure Blob Storage supports Shared Access Signatures, which let you generate temporary URLs for private files. These can be set to expire in a few minutes or to last a lifetime. The files are served directly to the user, and there's no risk if the URL leaks after the expiration period.
Amazon S3 has something similar with Query String Authentication.
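For completeness, here is roughly what minting such a signature looks like with the azure-storage-blob SDK; the account, container, and blob names are made up, and in practice the account key would come from configuration:

    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    sas = generate_blob_sas(
        account_name="myaccount",
        container_name="photos",
        blob_name="1234.jpg",
        account_key="<account-key>",  # placeholder, load from config
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(minutes=5),
    )
    url = f"https://myaccount.blob.core.windows.net/photos/1234.jpg?{sas}"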
Ultimately, you need to figure out your threat model and make a decision weighing the pros and cons of each approach. On Facebook, these are images that have presumably been shared with hundreds of friends, so there's a significantly lower expectation of privacy, and authenticating every request may be overkill. A random, hard-to-guess URL is probably sufficient, lets them serve data through their CDN, and minimizes the amount of processing per request. With option 3, you still have the overhead of generating those signed URLs.

Does Facebook have its own cache?

While developing Facebook applications I have faced this problem many times: if I delete an image, it still appears in the application while testing; even if I delete the whole file, the application still executes successfully. So I want to know: does Facebook have its own cache from which files are served?
If so, is there any solution to this problem?
If not, why is this happening?
Best Regards & Thanks in advance
I'm not sure about image files (they reside in a CDN), but Facebook uses memcached servers to cache their stuff.
It's not that it has a cache, but that its main backing store doesn't provide any more coherency than is strictly necessary. Coherency has a cost, so if you don't need it, it makes sense not to pay for it.
When operations have no enforced order between them, they may complete as if they were executed in either order. If your retrieval and your delete have no enforced order, they may complete as if they were executed in either order. This applies even if one operation received its response before the other operation was sent.
My understanding is that there is a cache, especially for images and styles.
I have frequently made changes to CSS and updated images, only to be left wondering why I cannot see these updates.
I always change my CSS URL to something like styles/styles.css?time= with a fresh value appended, which remedies everything.
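For instance, the changing value can be generated server-side; a hypothetical sketch:

    import time

    # A fresh query-string value makes the stylesheet URL unique, so any
    # cache (Facebook's or the browser's) fetches the file anew.
    css_url = f"styles/styles.css?time={int(time.time())}"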
As for the images: right-click on the image in the application and view it in the browser. Refresh to get the updated image, and then go back to your application.