What are some good ways of keeping content from being copied to other sites - copy-protection

I understand that no matter what I do, someone will be able to copy it. However, I can still make them work hard for it. What are some good ways of making data not easily copied, using PHP-compatible code?
--- Added ----
The data is a listing of results for certain local sports events. We send people out to collect the information, post it, make corrections, and so on. However, a competing website takes our results (I know they are directly copying them) and never updates them, which causes people to call our office and complain.
---- Answer for my Use ----
I picked one of them; however, I am going to use several of your answers. I am going to add my link using the copy-pasta trick. I am going to put fake hidden text into the page, and do the fake-hidden-text trick with randomized fake div tags as well (making it even harder to scrape, or to copy into a text editor and clean up with a simple find-and-replace). I am also going to talk to a lawyer about legal recourse and what I can do to make it illegal for them to copy the data (such as creative bios or something cool like that). Thanks for your help.

Joe, you can't really make them work hard to get your data. It's essentially just a single request to any of your pages. Your best option is to explicitly state that you own the rights to all of your content, and that any infringement of that ownership will lead to legal ramifications*.
* Not a lawyer

Your data will be copied to every computer that requests the page and it will stay there until the person clears their cache. To answer your question, you can't.
What you can do is create a CSS style such as:
.copy-pasta { display: none; }
And then throughout your content, add something like this:
<p class="copy-pasta">Content provided via [your website here]</p>
This will increase your page rank when copy-pasters blatantly steal your content, meaning you will show up first in search results.
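If you are assembling the page in PHP anyway, you could sprinkle those hidden attribution paragraphs in automatically. A minimal sketch; the function name, the placeholder URL, and the insertion interval are all illustrative:
<?php
// Insert the hidden attribution paragraph after every $interval closing
// </p> tags. Relies on the .copy-pasta CSS class above to hide it.
function add_copy_pasta($html, $interval = 3) {
    $attribution = '<p class="copy-pasta">Content provided via example.com</p>';
    $count = 0;
    return preg_replace_callback('~</p>~i', function ($matches) use (&$count, $interval, $attribution) {
        $count++;
        return ($count % $interval === 0) ? '</p>' . $attribution : '</p>';
    }, $html);
}

echo add_copy_pasta($pageHtml);   // $pageHtml: your generated content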

Place some <div style="display: inline; position: absolute; overflow: hidden; width: 0px">useless words</div> in the text. It won't display on screen, but it will come along whenever someone copies and pastes, leaving them wondering where the extra words came from.

How about putting links to your site in with the displayed data? No big fanfare; just suggest that for the most up-to-date figures, readers can go to the real website that publishes them.
Most of what you try will only work for a time, until you exceed their laziness factor. (What they're doing suggests a high laziness factor.)
Laws don't protect publicly available data, but you may be able to protect the packaging and presentation.

Programs used to copy out data look for the data using pattern-matching. You could 'decorate' your data with randomly-chosen tags (like one row would have a span tag surrounding it, the next row a div, etc...). Just a thought.
Clarification:
With a screen scraper, at least, the user of the program specifies what HTML comes before the data they want and what HTML comes after it. You can make it more difficult for them to retrieve the data automatically.
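A minimal PHP sketch of that decoration; the row markup and the $results variable are invented for illustration:
<?php
// Wrap each result row in a randomly chosen tag with a random id, so the
// HTML before and after the data differs from row to row.
function decorate_row($row) {
    $tags = array('span', 'div', 'section', 'p');
    $tag  = $tags[array_rand($tags)];
    $id   = 'r' . bin2hex(random_bytes(4));   // random noise for the matcher
    return "<$tag id=\"$id\">$row</$tag>";
}

foreach ($results as $row) {   // $results: array of row-HTML strings
    echo decorate_row($row), "\n";
}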

Why are people calling your office to complain if the data is on a competing website? If they have a domain name that is similar enough to yours that people are confusing the two of you or if they've put something on their site that makes it look like you've endorsed them, then you've got them for trademark infringement.

Disabling the context menu is a start.
$(document).bind('contextmenu', function (e) {
    return false;
});
Or
<body oncontextmenu="return false;">

Preventing people from getting your data is almost impossible. You can mess up your tags and make the markup really dirty and hard to parse... but it's not really enough. You could also generate a big image with the data in it, which would be painful to parse! ... but you don't want to do that.
Because you said...
However a competing website takes our results (I know they are directly copying them) and never updates them which causes people to call our office and complain.
... my call would be to take this the other way and create an API allowing people to get your content in a way that YOU designed.
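For instance, a bare-bones read-only endpoint might look like this (the table, fields, and connection details are invented for the sketch):
<?php
// results.php - serve the latest results as JSON, with attribution baked in.
header('Content-Type: application/json');

$pdo  = new PDO('mysql:host=localhost;dbname=sports', 'user', 'password');
$rows = $pdo->query('SELECT event, team, score, updated_at FROM results')
            ->fetchAll(PDO::FETCH_ASSOC);

echo json_encode(array(
    'source'  => 'https://www.example.com',   // placeholder for your site
    'results' => $rows,
));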
Also, if they are just shamelessly stealing your data and they don't have the right to do it, consider the legal option.

Another option is to use PHP code to generate images from the site's HTML, and display the images instead of HTML, which can be easily copied out. Example code is here, and I bet you could find more by Googling:
http://www.acasystems.com/en/web-thumb-activex/faq-php-convert-html-to-image.htm
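If plain text is enough (rather than rendering full HTML), PHP's bundled GD extension can do it without third-party components. A rough sketch, assuming GD is installed; the text and dimensions are placeholders:
<?php
// score-image.php - render one result line as a PNG so it cannot be
// selected or scraped as text. Requires the GD extension.
$text = 'Tigers 3 - 1 Bears';   // placeholder result line

$img = imagecreatetruecolor(300, 30);
$bg  = imagecolorallocate($img, 255, 255, 255);
$fg  = imagecolorallocate($img, 0, 0, 0);
imagefilledrectangle($img, 0, 0, 299, 29, $bg);
imagestring($img, 4, 5, 7, $text, $fg);   // 4 = one of GD's built-in fonts

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);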

Try Copyscape. It won't prevent your content from being copied, but it will make finding the copies very easy.

You could encrypt the data on the page and have an obfuscated JavaScript routine decode it for your viewers. You could switch keys and encryption algorithms from time to time. The same JavaScript could also disable text selection and copying, to prevent manual copy-pasting.
They won't be able to copy manually, and their scraper would have to be able to run JavaScript to get the data.
The caveat is that the data won't be visible to Google either, but if the data is mostly numeric that might not be much of a loss.
If they scrape automatically and very often, you could also try to pinpoint their IP by watching for the most active IPs on your site, and serve them fake data.
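A crude sketch of that last idea, assuming the APCu extension is available for the per-IP counters (the threshold and helper functions are hypothetical):
<?php
// Count requests per IP and serve fake results to heavy scrapers.
$ip  = $_SERVER['REMOTE_ADDR'];
$key = 'hits_' . $ip;

apcu_add($key, 0, 3600);          // create the counter with a 1-hour TTL
$hits = apcu_inc($key);           // increment and read it

if ($hits > 1000) {               // arbitrary "this looks like a scraper" line
    $results = load_fake_results();   // hypothetical helper
} else {
    $results = load_real_results();   // hypothetical helper
}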
Please don't use lawyers, that's hitting below the belt.

Use SWF (Flash) to display your data, just like some online books do.

Related

internal links in Lektor's markdown blocks

I want to build a website, maybe similar to a movie database, where every page has, say, actors, director, year (it seems that Lektor can deal very well with such structured metadata), and I am thinking about how to realize internal links between pages on that site.
Say I have a text such as
just like in [his previous movie](link), he shows again ...
then I guess I could use the absolute path of the linked page as link target, but that makes me very inflexible with respect to changing URL structure. Can I somehow just use the ID of the target content?
Or, better yet, can I somehow automatically obtain the title of the linked page?
just like in his previous movie <<link:title>>, he shows again ...
Can I use the standard Markdown blocks for that or would I have to add some handcrafted database lookup logic?
If some of the content will change in the future, I think you can use the databag feature to implement this. You just modify the databag whenever a change is needed.
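A minimal sketch of that idea; the bag name and keys are made up, so check the databag docs for the exact syntax. Put the link data in a databag, e.g. databags/movies.ini:
previous-movie-title = His Previous Movie
previous-movie-url = /movies/previous-movie/
and pull it into a template with Lektor's bag() function:
just like in <a href="{{ bag('movies', 'previous-movie-url') }}">{{ bag('movies', 'previous-movie-title') }}</a>, he shows again ...
Note that bag() is a template function, so for links inside plain Markdown bodies you may still need a small plugin or template hook.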

confluence display content by user

I am trying to get a Confluence Cloud wiki page to display content based on a specific user. The scenario is that there are links on a page but only one should display; which one displays depends on whoever is logged in.
I have been told that a macro is the way forward, but I have read the documentation and I am at a loss. I do not understand what I have to do or how to write a Confluence macro. Could someone help me out with either an example or some links? I have searched like crazy; maybe I am not asking the right questions, but hopefully you can all help me out.
There's a plugin for this:
https://marketplace.atlassian.com/plugins/net.customware.confluence.plugin.visibility
But I'm not sure how thoroughly it hides the content. It might still be visible if users view the page source. If you're trying to hide content which needs to be really protected, you'll probably need to do something else.
Depending on how many users are going to be using the page, you could also just make separate spaces for them, add the permissions to those spaces, and then use a page-include on your "main" page to display the content. If they don't have access it shouldn't show up. You might experience some formatting issues with that solution, however.
Finally, you could grab the username with jquery and display stuff based on that. This solution will be pretty easy if you are familiar with javascript/jquery.
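A rough sketch of that jQuery approach, assuming each link carries a shared class plus an id derived from the username; Confluence exposes the current user in AJS.params.remoteUser on many pages, but verify it is populated on yours:
// Hide all the user-specific links, then reveal the current user's one
AJS.toInit(function () {
    var user = AJS.params.remoteUser;   // empty for anonymous users
    AJS.$('.user-specific-link').hide();
    if (user) {
        AJS.$('#link-' + user).show();
    }
});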
Edit: Here are some helpful resources on how to use javascript and jquery within confluence:
https://confluence.atlassian.com/display/CONFKB/How+to+Use+JavaScript+in+Confluence
https://developer.atlassian.com/confdev/confluence-plugin-guide/writing-confluence-plugins/including-javascript-and-css-resources

Google Rich Snippets warnings for hCard

I get the following errors from the Google Rich Snippet Tool for my website http://iancrowther.co.uk/
hcard
Warning: This information will not appear as a rich snippet in search results, because it seems to describe an organization. Google does not currently display organization information in rich snippets.
Warning: At least one field must be set for Hcard.
Warning: Missing required field "name (fn)".
I'm experimenting with vCard and Schema.org and am wondering if I'm missing something or the validator is playing up. I have added vCard and Schema.org markup to the body, which may be causing confusion. Also, I am making the assumption that I can use both methods to mark up my code.
Update:
I guess with the body tag, I'm just trying to let Google discover the elements which make up the schema object within the page. I'm not sure if this is a good or bad way to approach things. However, it keeps my markup free of specific blocks of markup. I guess this is open to discussion, but I like the idea of having a natural flow to the content that's decorated in the background. Do you think there is any negative impact? I'm undecided.
I am in favour of the Person structure; this was a good call, as it is more representative of the current site content. I am a freelance developer, and as such use this page as my organisation landing page, so I guess I have to make a stronger decision about the site's goals and tailor the content accordingly, i.e. Organisation or Person.
I understand that there are no immediate rich-snippet gains, but I'm a web guy, so I have a keen interest in these kinds of things.
With schema testing, I find it easiest to start from the most obvious problem and work our way deeper from there. Note: I have zero experience with hcard, but I don't believe the error you mentioned actually has anything to do with your hcard properties.
The most obvious problem I see is that your body tag has an itemtype of schema.org/Organization. When you set an itemtype on a DOM element, you are saying that everything inside that element helps describe that itemtype. Since you've placed this on your body element, you are quite literally telling Google that your entire page is about an organization.
From the content of your page, I would recommend changing that itemtype to schema.org/Person. This would seem to be a more accurate description. Once you make that change and run the scanner again, you may see more errors relating to the schema, and we can work through those too (for example, you'll probably need to set familyName and givenName).
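For illustration, the corrected markup could look something like this (the name and job-title values are placeholders to complete the pattern):
<body itemscope itemtype="http://schema.org/Person">
  <h1 itemprop="name">Ian Crowther</h1>
  <p>
    <span itemprop="givenName">Ian</span>
    <span itemprop="familyName">Crowther</span> is a
    <span itemprop="jobTitle">freelance developer</span>.
  </p>
</body>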
With all of that said, you should know that currently there are no rich snippets that you will gain from adding this schema data. Setting it up properly is still worth doing, especially since we don't know what rich snippets Google or others will expose in the future, but currently you won't see any additional rich snippets in Google search results from adding these tags. I don't want to discourage you from setting this up properly; I just want to set your expectations.

How do websites change content daily?

I just started learning HTML and CSS, with no knowledge of other languages such as JavaScript, PHP, and so forth. Websites like Refdesk.com boast fresh content every day; there has to be some way they have new content every day other than changing it by hand. Some Google searches came up with nothing but RSS feeds.
How is this done?
Thanks for the helpful answers; they answer half of my question. But does this also mean that the owner has to manually add the new content each day, or can they add content for several days in advance and have it displayed day after day automatically?
Most dynamic websites derive their page content from a database. Change the content in the database, and the content on the pages changes to follow suit.
Likely they have some form of content management system which allows non-technical users to update the site. In some systems, the content manager itself can get quite advanced. Here's a description of the latest version of the one used at the BBC, CPS, which drives the many BBC websites and more.
They most probably use a database where they store the content, and the newest entries are retrieved from this database and displayed. This requires a server-side language like PHP, Java, or Python.
The HTML is generated dynamically.
The answers about databases combined with a server-side language like PHP are pretty good and very direct, but depending on how new you are to web development they might not be conceptual enough.
The first thing you need to understand is that a database is a collection of tables, each like a spreadsheet you might be familiar with in Excel.
For example, one table in your database might be named "daily_links" and it might have two columns, one named "Date", and one named "Link". So every time you want to publish a new link, you just make a new row.
So now you are halfway there.
Now what the server-side scripting language does is go to the database, look at your "daily_links" table, and bring back all the information it finds there.
From there it can do anything with that information, like make a new anchor tag in HTML for each row it found and give it an href of the value in the "Link" column.
That is the rough idea, in (very) general terms.
I hope that is easy to understand.
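To make that concrete, here is a minimal PHP sketch against the hypothetical "daily_links" table (connection details are placeholders):
<?php
// Fetch today's rows from daily_links and print each Link as an anchor tag.
$pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'password');
$stmt = $pdo->prepare('SELECT Link FROM daily_links WHERE Date = ?');
$stmt->execute(array(date('Y-m-d')));

foreach ($stmt as $row) {
    $url = htmlspecialchars($row['Link'], ENT_QUOTES);
    echo "<a href=\"$url\">$url</a><br>\n";
}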

MySource Matrix - Opinions

Has anyone had experience with MySource Matrix as a content management system? If so, thoughts/opinions/comments?
Thanks in advance.
Absolutely excellent. It takes a little while to get used to how it does things with its asset structure, but it is really flexible and powerful. The simple edit interfaces are great too.
Make sure you give it enough hardware. If you want dynamic content without caching you need heaps of grunt to make it hum.
Hands down the best CMS I have ever used. We use it on the Pacific Union College website, as well as many side projects. I am still amazed at all it has to offer compared to other products that are not free.
Give it a good look, and take some time to get past the learning curve, but once you do, it will be more than worth it. :)
I've recently been trying to use it in an organization where many non-power users are generating content. It has many interface bugs and odd behaviors, so many simple tasks (e.g. loading images) often have to be done by a power user (i.e. me).
When you are editing the HTML of page content, whitespace is not preserved. If you were to format the HTML in the WYSIWYG editor, save your changes, and then come back, the whitespace you added will have been removed. In fact, when you switch the WYSIWYG editor into HTML mode it doesn't show you the exact HTML, and it does some silly things, like inserting non-breaking spaces when you press Enter, but it doesn't show them until you save and re-enter HTML mode.
It is a number of little details like this that make it generally frustrating to use and disliked by everyone here.