How to get one robots.txt for each store - Magento 2

I have a Magento 2 website with two stores. At the moment, I can edit the global website configuration and its content is applied to both stores.
What I want to do is change that behaviour so that each store gets its own robots.txt file.
But I really have no idea how I should do that.
Currently, if I go in the back office to Content > Design > Configuration > (Store Edit) > Search Engine Robots,
all the fields are disabled at store level and can't be modified.
But if I go to the global Content > Design > Configuration > (Global Edit) > Search Engine Robots, I can of course modify them.
I also have 3 robots.txt files in my storage, but none of them seems to match the information saved in the global Search Engine Robots configuration:
src/robots.txt
src/app/design/frontend/theme1/jeunesse/robots.txt
src/app/design/frontend/theme2/jeunesse/robots.txt
I found these two links, but neither of them helped me: https://inchoo.net/online-marketing/editing-robots-txt-in-magento-2-admin/ and https://support.hypernode.com/knowledgebase/create-robots-txt-magento-2/
The first one says that if I have a robots.txt file in my storage it should override the configuration, but that doesn't seem to be the case: I do have robots.txt files, yet their content isn't what shows up when I go to website/robots.txt. I only get the content from the global configuration again.
The second one says that saving the configuration should write the robots.txt file to storage, but once again, that's not what is happening.
Thanks for your help. Let me know if there are pieces of code I can show; I really don't know which ones would be relevant at this point.

I'm the author of the first link. It's a two-year-old article; Magento 2 has since introduced a few improvements to the built-in robots.txt functionality.
The robots.txt content you save under Content > Design > Configuration has "website" scope, meaning you can edit it at website level; if you need it to vary through this configuration, that is only possible if you have multiple websites.
It is unclear from the question itself whether you have multiple websites or whether you have set up multiple stores and/or store views under the same website.
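If you do have multiple websites, that website-scope value can also be set per website from the command line. A minimal sketch, assuming hypothetical website codes base and second_site and placeholder rule content (replace both with your own):

# one custom robots.txt instruction set per website (codes and rules are examples)
bin/magento config:set design/search_engine_robots/custom_instructions "User-agent: *" --scope=websites --scope-code=base
bin/magento config:set design/search_engine_robots/custom_instructions "User-agent: *" --scope=websites --scope-code=second_site
bin/magento cache:flush

Note that the store-level fields being greyed out is expected either way, because the setting simply does not exist at store scope; what matters is whether your stores sit under one website or two.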

Related

Copy and paste TYPO3 sites between 2 backends

I am managing around 12 TYPO3 backends with almost identical content. Is it possible to copy and paste a created site between independent backends? Right now I'm creating 12 sites with the same content by hand. There has to be an easier way.
Well, there is not much I could try. Within TYPO3 I don't see any option to export/import sites from other backends.
First of all, you should merge those 12 sites into one backend with multiple root sites and trees. Then you can easily handle different domains and/or languages via the site configurations for those roots.
Of course you can then make use of shared sys_folder pages that contain the content elements that should be available to multiple sites. To make them available on a specific site, you can then use references.
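To illustrate the multiple-root-sites part: each root page gets its own site configuration, which is easiest to create via the Sites backend module. A minimal sketch with hypothetical identifiers, domains and root page uids (the languages section that the module generates is left out here):

# config/sites/company-a/config.yaml
base: 'https://www.company-a.example/'
rootPageId: 2

# config/sites/company-b/config.yaml
base: 'https://www.company-b.example/'
rootPageId: 3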
You can export a page tree and import it into another instance.
On the other hand you can duplicate an instance by copying the complete database and original files.
That necessarily includes the fileadmin/ and uploads/ folders.
typo3conf/ should be duplicated by your deployment but might differ in the files typo3conf/LocalConfiguration.php and typo3conf/AdditionalConfiguration.php (e.g. each instance should use its own database).
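A rough sketch of such a full copy, with hypothetical paths, database names and credentials that you would need to adjust to your own setup:

# dump the source database and load it into the target instance
mysqldump -u dbuser -p source_db > typo3_dump.sql
mysql -u dbuser -p target_db < typo3_dump.sql
# copy the editor-managed files
rsync -a /var/www/source/fileadmin/ /var/www/target/fileadmin/
rsync -a /var/www/source/uploads/ /var/www/target/uploads/
# typo3conf/ comes from deployment; keep LocalConfiguration.php and
# AdditionalConfiguration.php specific to each instance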
You can use the core extension impexp to import/export content parts. There is even a context menu entry for it.
Be aware of some drawbacks:
if assets are exported and imported along with the records, you can hit your memory_limit
take extra care about which uids are used; e.g. forcing uids can lead to problems
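Depending on your TYPO3 version, impexp also ships a CLI import command, so exported .t3d/.xml files can be imported without using the backend. This is an assumption about your version, so check with vendor/bin/typo3 list whether the command exists before relying on it:

# import an export file into the page with uid 123 (file name and uid are examples)
vendor/bin/typo3 impexp:import fileadmin/export.t3d 123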
Of course there are other options as well like:
create a custom extension which exports/imports the content on the fly, using either something like a custom endpoint or, if possible, fetching directly from the database
use one installation, as discussed already
if you are using e.g. ext:news, use something like RSS feeds for a poor man's import/export with ext:news_importicsxml

TYPO3 9+: Multiple user_uploads related to page tree

Is it possible to use different user_upload folders per page tree in TYPO3 9+?
Information:
Multiple companies are using the same TYPO3 instance for their websites.
Example #1:
An editor places images in a content element on a page of a page tree with drag'n'drop. Normally the files are uploaded to the user_upload folder. If you have a huge number of backend editors, the user_upload folder becomes a mess in a short time and nobody feels responsible for the files. I know that you can see whether a file is used via its references in the info section, but it would be much easier if each company had its own user_upload directory (maybe a subfolder of it).
Has anyone faced the same problem and found a solution for this?
Thanks in advance!
I think you should configure your user groups and their users in such a way that they use the correct storage. You can read about it here:
https://docs.typo3.org/m/typo3/reference-tsconfig/master/en-us/UserTsconfig/Options.html#defaultuploadfolder
https://docs.typo3.org/m/typo3/reference-coreapi/master/en-us/ApiOverview/AccessControl/MoreAboutFileMounts/Index.html
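A minimal sketch of the defaultUploadFolder option from the first link, assuming one backend group per company and a hypothetical folder layout inside storage 1 (fileadmin). Put this into the user TSconfig of each group and combine it with matching file mounts so editors only see their own folder:

# user TSconfig of the "Company A editors" group
options.defaultUploadFolder = 1:/user_upload/company_a/

# user TSconfig of the "Company B editors" group
options.defaultUploadFolder = 1:/user_upload/company_b/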

Noindex in a robots.txt

I've always stopped Google from indexing my website using a robots.txt file. Recently I've read an article by a Google employee where he stated you should do this using meta tags. Does this mean robots.txt won't work? Since I'm working with a CMS my options are very limited, and it's a lot easier to just use a robots.txt file. My question is: what's the worst that could happen if I proceed using a robots.txt file instead of meta tags?
Here's the difference in simple terms:
A robots.txt file controls crawling. It instructs robots (a.k.a. spiders) that are looking for pages to crawl to “keep out” of certain places. You place this file in your website’s root directory.
A noindex tag controls indexing. It tells spiders that the page should not be indexed. You place this tag in the code of the relevant web page.
Use the robots.txt file when you want control at the directory level or across your site. However, keep in mind that robots are not required to follow these directives. Most will, such as Googlebot, but it is safer to keep any highly sensitive information out of publicly-accessible areas of the site.
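For example, a robots.txt in the site root that keeps all crawlers out of two (hypothetical) directories looks like this:

User-agent: *
Disallow: /private/
Disallow: /tmp/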
A noindex tag will also exclude a page from search results. The page can still be crawled, but it won't be indexed. Use these tags when you want control at the individual page level.
An aside on the difference between crawling and indexing: crawling (via spiders) is how a search engine discovers the pages of your website; the results of the crawling go into the search engine's index. Storing this information in an index speeds up the return of relevant search results: instead of scanning every page related to a search, the index (a smaller database) is searched to optimize speed.
If there were no index, the search engine would look at every single bit of data or info in existence related to the search term, and we'd all have time to make and eat a couple of sandwiches while waiting for search results to display. The index uses spiders to keep its database up to date.
Here is an example of the tag:
<meta name="robots" content="noindex,follow"/>
Now that you have read and understood the above information, I think you are able to answer your question on your own ;)
Indeed, Googlebot used to allow the following directives in robots.txt:
Noindex
Nofollow
Crawl-delay
But as announced on the Google webmaster blog, they no longer support those (used on only about 0.001% of sites) directives since September 2019. So to be safe for the future, you should only use meta tags for these on your pages.

Find all <forms> used on a site

Is there, for example, a crawler that can find all pages on my site that have forms (and list the form action etc.)?
I'd like to log all pages with unique form actions so I can audit them further.
Norconex HTTP Collector is an open source web crawler that can certainly help you. Its "Importer" module has a "TextBetweenTagger" feature to extract text between any start and end text and store it in a metadata field of your choice. You can then filter out those that have no such text extracted (look at the EmptyMetadataFilter option for this).
You can do this without writing code. As for storing the results, the product uses "Committers". A few Committers are readily available (including a filesystem one), but you may want to write your own to "commit" your crawled data wherever you like (e.g. into a database).
Check its configuration page for ideas.
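If you just want to spot-check single pages rather than run a full crawl, the same idea can also be sketched in a few lines of Python using only the standard library. This is not Norconex-specific, and the URL is a placeholder; it simply lists the unique form actions found on one page:

from html.parser import HTMLParser
from urllib.request import urlopen

class FormFinder(HTMLParser):
    """Collect the action attribute of every <form> tag on a page."""
    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.actions.append(dict(attrs).get("action") or "(no action)")

url = "https://www.example.com/"  # replace with a page of your site
page = urlopen(url).read().decode("utf-8", errors="replace")
finder = FormFinder()
finder.feed(page)
for action in sorted(set(finder.actions)):  # unique actions only
    print(action)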

tinymce file browsers multiple file source

I am doing some updates to a site I have developed over the last few years. It has grown rather erratically (I tried to plan ahead, but with this site it has taken some odd turns).
Anyway, the site has a community blog (blog.domain.com, which used to be domainblog.com) and users with personal areas (user1.domain.com, user2.domain.com, etc.).
The personal areas have standard page content that the user can use, or add snippets of text to partially customize. Now the owner wants the users to be able to create their own content.
Everything is done up to the point of needing a file browser.
I need a browser that will allow me to do the following:
the browser needs to be able to browse the common files at blog.domain.com/files and the user files at user_x.domain.com/files
the browser will also need to be able to differentiate between the two and generate the appropriate image URL.
of course, the browser access to the user files will need to be dynamic and only show those files particular to the user (along with the common files)
I also need to be able to set a maximum file size for images
the admin area is in a different directory than either the blog or the user subdomains.
general directory structure
--webdir--
|--client --
|--clientsite--
|--blog (blog.domain.com)
|--sites--
|--main site (domain.com)
|--admin (admin.domain.com)
|--users--
|--user1 (user1.domain.com)
|--user2 (user2.domain.com)
...etc.
I have tried several different browsers and using symlinks, but the browsers don't seem to be able to follow them. I am also having trouble even setting them to use a directory that isn't the default.
What browser would you recommend, and what would I need to customize to make it work?
TIA
OK, since I have not had any responses to this question, I guess I will have to do a workaround and then see about writing a custom file browser down the road.