I wanted to check something before I break everything. It may be obvious to you, but I'm naturally cautious and wanted to see whether I'm OK to do what I'm thinking.
I currently have docs on RTD (and have had them there for 3 years). We have moved pages around as versions changed, so have a set of about a hundred exact redirects, something like this:
/en/stable/introduction/intro.html -> /en/stable/introduction/setup-guide.html
Now I have a custom domain and want to point everything with a canonical link over there.
What will happen to those redirects? Are they applied first, so that someone with a bookmark to intro.html is pointed to setup-guide.html on project.readthedocs.io, and then the redirect to myprojectname.com/docs/ is applied?
That is, will this happen?
/en/stable/introduction/intro.html -> myprojectname.com/docs/en/stable/introduction/setup-guide.html
Thanks in advance!
I haven't tried this yet. I'm too scared I'll break the whole edifice and end world order.
We have a site which sometimes delivers the wrong content for a specific URL.
The page has a plugin and by default should show the record listing (or rather the first part of the listing, as the records are grouped by initial letter). After clicking a link, individual records can be viewed in detail on the same page.
Every now and then a cache problem occurs:
Instead of the listing, a detailed record is shown.
Although we use RealURL, all of the problems also occur with the plain URLs.
For an overview I will only write the URL parameters; assume www.domain.tld/index.php? in front.
The page to call is id=61.
What I see is
cHash=3df3421afc42d3d5bfa1bc50603ea00d&id=61&tx_citkoegovservicelight_ansprechpartner%5Baction%5D=show&tx_citkoegovservicelight_ansprechpartner%5Bansprechpartner%5D=282.
In the HTML-source of the page I show the page calling parameters with the extension page_params. Here I see:
tx_citkoegovservicelight_ansprechpartner[action]=show&tx_citkoegovservicelight_ansprechpartner[ansprechpartner]=282&tx_citkoegovservicelight_ansprechpartner[letter]=kontakt&id=61
Two strange things: there is no cHash parameter, and there is an additional parameter tx_citkoegovservicelight_ansprechpartner[letter], which should never be used with the detail view and should never have the value kontakt (only single characters are used, for the listing of all records starting with that letter; no detail view).
Using these parameters does not show the detail view but the list view (for letter 'A').
I cannot find a reason why this particular URL should be called (there is no link to it), and I don't know why TYPO3 should cache content which belongs to another URL.
And it is a problem with the TYPO3 cache, as everything works correctly once I clear the cache of this single page.
Please check my answer to another issue. The accepted answer is right in that case, but in your case the problem can really be caused by a failed cHash calculation, because it is not related to RealURL.
Try to clear the cache and then, right after that, go to tx_citkoegovservicelight_ansprechpartner[action]=show&tx_citkoegovservicelight_ansprechpartner[ansprechpartner]=282&tx_citkoegovservicelight_ansprechpartner[letter]=kontakt&id=61.
Then simply open the page id=61. If you see the wrong cached result, the reason is a combination of the following factors:
The plugin's action is cached
cHash failures are allowed in the installation
The cHash calculation failed
To prevent this you should enable pageNotFoundOnCHashError in the Install Tool. Then the problematic link above will simply trigger a 404 and will not force TYPO3 to render (and cache) the page.
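For reference, this is (as far as I know) the same flag the Install Tool writes to your configuration file, so you can also set it there directly:

// typo3conf/LocalConfiguration.php (localconf.php on older TYPO3 versions)
$GLOBALS['TYPO3_CONF_VARS']['FE']['pageNotFoundOnCHashError'] = true;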
As to the question of where the link is coming from: if the website is already live, it could be anything, from a crawler which somehow built the link itself to a user who tried playing with the parameters.
We're looking for the best way to deal with the situation where multiple authors are working on the same page. If the first author pushes in their content, the second should have a way to merge it when they try to publish. Launches appear to be a way to take care of this, but they don't seem to handle content merging. Is there any way an author can view the diff (and/or do a merge) of content that might have been pushed by another author while they were working concurrently?
Please help with any pointers.
Page modifications happen in real time to the underlying structure. They also happen at as small a level as possible; i.e., if you go into a text area and modify the text there, the text node is changed on the server; you aren't saving the entire page.
The only way that person A could interfere with what person B is doing is if they were working on the exact same area of the page, which, honestly, is a process issue. I say this because the answer to your question is that there is nothing out of the box to handle this type of scenario, and if you are on 6.0 or higher you are looking at JCR3. JCR3 handles this far worse than the older version did; last time I checked it didn't support nodes at all.
Adding to what Bailey said, AEM OOB allows multiple users to edit the same page in real time, though multiple users working on the same node will be a cause of conflict. Such cases can be managed by defining a process like:
1. Take a lock on the page and then edit it, or
2. Create versions of the page and publish a version
I get the following errors from the Google Rich Snippet Tool for my website http://iancrowther.co.uk/
hcard
Warning: This information will not appear as a rich snippet in search results, because it seems to describe an organization. Google does not currently display organization information in rich snippets
Warning: At least one field must be set for Hcard.
Warning: Missing required field "name (fn)".
I'm experimenting with vCard and Schema.org and am wondering if I'm missing something or the validator is playing up. I have added vCard and Schema.org markup to the body, which may be causing confusion. Also, I am making the assumption that I can use both methods to mark up my code.
Update:
I guess with the body tag, I'm just trying to let Google discover the elements which make up the schema object within the page. I'm not sure if this is a good or bad way to approach things? However, it keeps my content free of dedicated blocks of markup. I guess this is open to discussion, but I like the idea of having a natural flow to the content that's decorated in the background. Do you think there is any negative impact? I'm undecided.
I am in favour of the Person structure; this was a good call, as it is more representative of the current site content. I am a freelance developer and as such use this page as my Organisation landing page, so I guess I have to make a stronger decision about the site's goals and tailor the content accordingly, i.e. Organisation or Person.
I understand that there are no immediate rich snippet gains, but I'm a web guy so I have a keen interest in these kinds of things.
With schema testing, I find it easiest to start from the most obvious problem and work our way deeper from there. Note: I have zero experience with hCard, but I don't believe the error you mentioned actually has anything to do with your hCard properties.
The most obvious problem I see is that your body tag has an itemtype of schema.org/Organization. When you set an itemtype on a DOM element, you are saying that everything inside of that element is going to help describe that itemtype. Since you've placed this on your body element, you are quite literally telling Google that your entire page is about an organization.
From the content of your page, I would recommend changing that itemtype to schema.org/Person. This would seem to be a more accurate description. Once you make that change and run the scanner again, you may see more errors relating to the schema, and we can work through those too (for example, you'll probably need to set familyName and givenName).
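For illustration, a minimal sketch of what that could look like (the property names come from the schema.org Person type; the values are placeholders for your own content):

<body itemscope itemtype="http://schema.org/Person">
  <h1 itemprop="name">Ian Crowther</h1>
  <p>
    <span itemprop="givenName">Ian</span>
    <span itemprop="familyName">Crowther</span>,
    <span itemprop="jobTitle">freelance developer</span>
  </p>
</body>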
With all of that said, you should know that currently there are no rich snippets that you will gain from adding this schema data. Properly setting this up on your page is still worth doing, especially since we don't know what rich snippets Google or others will expose in the future, but currently you won't see any additional rich snippets in Google search results from adding these tags. I don't want to discourage you from setting this up properly; I just want to set your expectations.
I understand that no matter what I do, someone will be able to copy it. However, I can still make them work hard for it. What are some good ways of making data not easily copied, using PHP-compatible code?
--- Added ----
The data is a listing of results for certain local sports events. We send people out to collect the information, post it, make corrections, and so on. However, a competing website takes our results (I know they are directly copying them) and never updates them, which causes people to call our office and complain.
---- Answer for my Use ----
I picked one of them; however, I am going to use several of your answers. I am going to add my link using the copy-pasta trick, and I am going to put fake hidden text into the page. I am also going to do the fake hidden text trick with different fake versions of the div tag (making it even harder to scrape, or to copy into a text editor and clean up with a simple search-and-replace), and I am going to talk to a lawyer as well about legal recourse and what I can do to make it illegal for them to copy the data (such as creative bios or something cool like that). Thanks for your help.
Joe, you can't really make them work hard to get your data. It's essentially just a single request to any of your pages. Your best option is to explicitly state that you own the rights to all of your content, and that any infringement of that ownership will lead to legal ramifications*.
* Not a lawyer
Your data will be copied to every computer that requests the page, and it will stay there until the person clears their cache. To answer your question: you can't.
What you can do is create a CSS style such as:
.copy-pasta { display: none; }
And then throughout your content, add something like this:
<p class="copy-pasta">Content provided via [your website here]</p>
This will increase your page rank when copy-pasters blatantly steal your content, meaning you will show up first in search results.
Place some <div style="display: inline; position: absolute; overflow: hidden; width: 0px">useless words</div> in the text. It won't display for reading, but if someone copies and pastes... "WOW, where did that come from? WTF!! *CRY*"
How about putting links to your site in with the displayed data? No big fanfare, just a suggestion that for the most up-to-date figures, they can go to the real website that publishes them.
Most of what you try will only work for a time. Until you exceed their laziness factor. (What they're doing suggests a high laziness factor.)
Laws don't protect publicly available data, but you may be able to protect the packaging and presentation.
Programs used to copy out data look for it using pattern matching. You could 'decorate' your data with randomly chosen tags (one row wrapped in a span tag, the next row in a div, etc.; see the sketch after the clarification below). Just a thought.
Clarification:
With screen-scrapers at least, the user of the program specifies what HTML comes before the data they want and what HTML comes after it. You can make it more difficult for them to retrieve the data automatically.
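A minimal sketch of that idea in PHP (assuming your rows already sit in an array; you would normalise the rendering of the different tags with CSS):

<?php
// Wrap each result row in a randomly chosen tag and class name, so a
// scraper can't rely on one fixed "HTML before / HTML after" pattern.
$tags = array('div', 'span', 'p');
foreach ($rows as $row) {               // $rows: your hypothetical result set
    $tag   = $tags[array_rand($tags)];
    $class = 'r' . mt_rand(1000, 9999); // random, meaningless class name
    echo "<$tag class=\"$class\">" . htmlspecialchars($row) . "</$tag>\n";
}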
Why are people calling your office to complain if the data is on a competing website? If they have a domain name similar enough to yours that people are confusing the two of you, or if they've put something on their site that makes it look like you've endorsed them, then you've got them for trademark infringement.
Disabling the context menu is a start.
$(document).bind('contextmenu', function(e) {
    return false;
});
Or
<body oncontextmenu="return false;">
Forbidding people to get the data is almost impossible. You can mess up your tags and make the code really dirty and hard to parse... but it's not really enough. You could also generate a big image with the data in it, which would be painful to parse! ... but you don't want to do that.
Because you said...
However a competing website takes our results (I know they are directly copying them) and never updates them which causes people to call our office and complain.
... my call would be to take this the other way and create an API allowing people to get your content in a way that YOU designed.
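A minimal sketch of such an endpoint in PHP (get_results() is a hypothetical stand-in for however you already fetch the data):

<?php
// results.php: a read-only JSON feed with attribution baked into the payload.
header('Content-Type: application/json');
echo json_encode(array(
    'source'  => 'http://www.yourdomain.tld/results', // hypothetical URL
    'updated' => date('c'),
    'results' => get_results(),                       // your existing data layer
));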
Also, if they are just shamelessly stealing your data and don't have the right to do it, consider legal options.
Another option is to use PHP code to generate images from the site's HTML. You would use the images to display the content, instead of HTML which can be easily copied out. Example code is here, and I bet you could find more code to do this by Googling:
http://www.acasystems.com/en/web-thumb-activex/faq-php-convert-html-to-image.htm
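For what it's worth, here is a minimal sketch using PHP's bundled GD extension (assuming GD is installed; a real version would deal with fonts, sizing and caching):

<?php
// Render one result row as a PNG instead of copyable text.
$text = '1. Jane Doe - 12:34';                   // hypothetical result row
$img  = imagecreatetruecolor(400, 20);
$bg   = imagecolorallocate($img, 255, 255, 255); // white background
$fg   = imagecolorallocate($img, 0, 0, 0);       // black text
imagefilledrectangle($img, 0, 0, 399, 19, $bg);
imagestring($img, 4, 5, 2, $text, $fg);          // built-in GD font no. 4
header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);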
Try Copyscape. It won't prevent your content from being copied, but it will make finding the copies very easy.
You may encrypt the data on the page and have an obfuscated JavaScript decoding routine that decodes it for your viewers. You may switch keys and encryption algorithms from time to time. The same JavaScript could also disable the ability to select text and/or copy it, to prevent manual copy-pasting.
They won't be able to copy manually, and their scraper would have to be able to run JavaScript to get the data.
The caveat is that the data won't be visible to Google, but if the data is mostly numeric it might not be such a big loss.
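A minimal sketch of the idea, using base64 rather than real encryption (so this is obfuscation only; any scraper that runs JavaScript will still get the data):

<?php
// Server side: emit the sensitive text as a base64 blob only.
$blob = base64_encode('Results: ...');           // hypothetical content
echo '<span id="res" data-blob="' . $blob . '"></span>';
echo '<script>
  // Client side: decode for real visitors; curl/wget only see the blob.
  var el = document.getElementById("res");
  el.textContent = atob(el.getAttribute("data-blob"));
</script>';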
If they scrape automatically and very often, you may also try to pinpoint their IP by observing the most active IPs on your site and serve them fake data.
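A sketch of that last idea (the IP list and both functions are hypothetical; you would fill the list from your access-log analysis):

<?php
$scraper_ips = array('203.0.113.7');             // found in your access logs
$rows = in_array($_SERVER['REMOTE_ADDR'], $scraper_ips)
    ? get_fake_results()                         // decoy data for the scraper
    : get_results();                             // real data for everyone else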
Please don't use lawyers; that's hitting below the belt.
Use SWF (Flash) to display your data, just like other online books do.