Is it possible to protect JSON-LD from email harvesters?

I want to use JSON-LD for SEO purposes, but I'm not sure how to prevent an automated email harvester from picking up the address(es) from the page source.
In the Person schema you supply an email address in the email property. I've always obfuscated email addresses in some way, either by using JS to display them or by other methods. This has helped stop spam so far.
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Person",
"address": {
"@type": "PostalAddress",
"addressLocality": "Seattle",
"addressRegion": "WA",
"postalCode": "98052",
"streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
},
"colleague": [
"http://www.xyz.edu/students/alicejones.html",
"http://www.xyz.edu/students/bobsmith.html"
],
"email": "mailto:jane-doe@xyz.edu",
"image": "janedoe.jpg",
"jobTitle": "Professor",
"name": "Jane Doe",
"telephone": "(425) 123-4567",
"url": "http://www.janedoe.com"
}
</script>
The only way I could think of doing it is using JS to dynamically create the above, which I would expect harvesters to not be able to interpret for the most part, but then that would most likely break search engine support. Is there any solution to this?

Unless you can detect the malicious bot (and serve it a version without the email address), there is no sensible solution. One of the main reasons for using structured data is giving bots easy access, so this is by design.
You could try to make getting the email address harder:
Schema.org’s email property expects Text as value, so obfuscation could be used (e.g., jane-doe at {this domain}).
Hope: bots don’t understand your obfuscation method by default.
If the use of Schema.org’s email property is not required: FOAF’s mbox_sha1sum property expects a SHA1 hashed email address.
Hope: bots don’t try to (or didn’t already) find the corresponding email address.
You could use JavaScript to add the email property (Google supports it, for example).
Hope: bots don’t execute JavaScript.
But this makes it harder for good bots too, of course, and at a certain point you might want to consider not providing the email address at all.
If you only want to provide the email address to certain consumers, you could serve these consumers the document that contains the email address, and all other bots the one without. But search engine bots might not like this method. And you disadvantage new consumers, or consumers you don’t know.
I would just provide the email address unobfuscated and for everyone, making the life of visitors (humans as well as bots) easier. Spam should be your problem, not theirs; and it’s a problem that can be handled.

JSON-LD makes data readily available to robots, including email harvesters, which can easily spoof the identity of other bots. I suggest leaving the email addresses out of the JSON-LD: it won't hurt the SEO, and the owners of those addresses will love you for it. Otherwise you will make their mailboxes a constant target of spam.
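Leaving the address out is easy to enforce when the JSON-LD is generated server-side. A minimal Go sketch, assuming the markup is rendered from a struct; the Person type here is a hypothetical subset of schema.org/Person, not a library type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person models a small, hypothetical subset of schema.org/Person. Email is
// tagged omitempty, so it disappears from the output when left blank.
type Person struct {
	Context  string `json:"@context"`
	Type     string `json:"@type"`
	Name     string `json:"name"`
	JobTitle string `json:"jobTitle,omitempty"`
	URL      string `json:"url,omitempty"`
	Email    string `json:"email,omitempty"`
}

// jsonLD renders p for a <script type="application/ld+json"> element.
// json.Marshal escapes <, > and & (e.g. "<" becomes \u003c), so a value
// cannot close the surrounding script tag early.
func jsonLD(p Person) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p := Person{
		Context:  "http://schema.org",
		Type:     "Person",
		Name:     "Jane Doe",
		JobTitle: "Professor",
		URL:      "http://www.janedoe.com",
		// Email deliberately left empty, following the advice above.
	}
	out, err := jsonLD(p)
	if err != nil {
		panic(err)
	}
	fmt.Printf("<script type=\"application/ld+json\">%s</script>\n", out)
}
```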

Related

LD+JSON - Use for SameAs for a Product

Assume you have an eCommerce site from which you sell some product. You also sell that product on Amazon. What would be the SEO implications of using the sameAs property in your eCommerce site's ld+json to link to your Amazon URL as well? Is this a valid practice and would you gain anything by it?
For example:
{
"@context":"https://schema.org/",
"@type":"Product",
"sameAs":[AMAZON URL HERE],
"name":"My Product",
"image":"myproductimage.jpg",
"description":"my description",
"brand":{
"@type":"Thing",
"name":"Brand"
},
"sku":"SKU",
"mpn":"MPN",
"offers":{ ... }
}
N.B. I had previously asked this question here: https://opendata.stackexchange.com/questions/16016/ldjson-use-for-sameas-product since I tagged it as linked-data and that tag suggested posting there instead. But, I am not sure if it actually makes more sense to ask here since I've also seen several ld-json questions.
Description of this property from Schema.org:
URL of a reference Web page that unambiguously indicates the item's identity.
Since we are discussing activity in the digital domain, it makes sense to check Wikipedia's definition of digital identity:
A digital identity is information on an entity used by computer systems to represent an external agent. That agent may be a person, organization, application, or device. ISO/IEC 24760-1 defines identity as "set of attributes related to an entity".
In my humble opinion, the web page for selling your product on Amazon can hardly be called digital identification.
Pages from Wikipedia, Wikidata, DBpedia, and similar resources are more suitable for identifying a product.

URL links in email - resolve correct url?

Before I start tackling this programmatically, does anyone know whether it is possible to extract the real URL from an email link that is essentially a tracking redirect?
Our work email system automatically blocks tracking URLs in emails, so I am thinking of writing something where people can paste the tracking link into a program and get the real URL back.
Is this even possible, given the way email tracking works?
Here is an example of a URL in an email that I recently received:
http://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiNTE0MTQ4NSIsImRlbGl2ZXJ5X2lkIjoiOTI0NzI2MTU0IiwidXJsIjoiaHR0cHM6Ly93d3cuYXhzaWVkLmNvbS9nY3NlLWNvbXB1dGVyLXNjaWVuY2Uvb2NyLW5lYS1ndWlkZS8_X19zPXphb2txcDVpaWN4NGkxZndtYmNnIn0
Our system blocks these. It eventually resolves to:
https://www.axsied.com/gcse-computer-science/ocr-nea-guide/?__s=zaokqp5iicx4i1fwmbcg
(got our network admin to check it for me)
I want a system that gets the right url from the ugly mess that is blocked so we can actually view links from emails.
Thanks in advance for any help.
The data in tracking URLs are typically a unique ID pointing to some entry in a database, or are encrypted with a private key, so there's no way to obtain any meaningful information from them. (see answers to this related question: Generate unique link for each website visitor)
More naive approaches simply encode the data, in which case you may be able to extract useful information from them. Funnily enough, your example URL contains an unpadded base64url-encoded JSON object holding the link itself:
{
"account_id": "5141485",
"delivery_id": "924726154",
"url": "https://www.axsied.com/gcse-computer-science/ocr-nea-guide/?__s=zaokqp5iicx4i1fwmbcg"
}
In this case you could actually resolve the URL on your own, but this type of approach is uncommon for that very reason.
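Following that observation, links in this particular format can be resolved locally without contacting the tracker. The Go sketch below assumes only what the example shows (the token is the last path segment, encoded as unpadded base64url, wrapping JSON with a "url" key); the dripPayload type is hypothetical, and links carrying an opaque ID or encrypted data will simply fail to decode.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/url"
	"path"
	"strings"
)

// dripPayload mirrors the JSON decoded from the example link. Other senders,
// or later versions of this one, may use entirely different formats.
type dripPayload struct {
	AccountID  string `json:"account_id"`
	DeliveryID string `json:"delivery_id"`
	URL        string `json:"url"`
}

// resolveTrackingURL decodes the last path segment as base64url ("-" and "_"
// instead of "+" and "/") and returns the "url" field of the JSON inside.
func resolveTrackingURL(tracking string) (string, error) {
	u, err := url.Parse(tracking)
	if err != nil {
		return "", err
	}
	token := strings.TrimRight(path.Base(u.Path), "=") // tolerate optional padding
	raw, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return "", fmt.Errorf("not base64url: %w", err)
	}
	var p dripPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", fmt.Errorf("not a JSON payload: %w", err)
	}
	if p.URL == "" {
		return "", fmt.Errorf("payload has no url field")
	}
	return p.URL, nil
}

func main() {
	link := "http://t.dripemail2.com/c/eyJhY2NvdW50X2lkIjoiNTE0MTQ4NSIsImRlbGl2ZXJ5X2lkIjoiOTI0NzI2MTU0IiwidXJsIjoiaHR0cHM6Ly93d3cuYXhzaWVkLmNvbS9nY3NlLWNvbXB1dGVyLXNjaWVuY2Uvb2NyLW5lYS1ndWlkZS8_X19zPXphb2txcDVpaWN4NGkxZndtYmNnIn0"
	target, err := resolveTrackingURL(link)
	if err != nil {
		fmt.Println("could not resolve:", err)
		return
	}
	fmt.Println(target)
}
```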

Send variable to 3rd party online form

In golang, is there a way to pipe a variable to part of a web form?
For example, sending "123 Random St." to the street address field of https://www.dominos.com/en/pages/order/#/locations/search/ and so on? I found pizza_party*, but the GUI it uses is no longer available. I have also found pizzadash**, but it pays by credit card, whereas I want to pay cash. I even found a list of Go packages, but the links they use don't work anymore.***
So my goal is: order a pizza in Go through the Domino's website API!
NOTE: Please suggest a package or function with example!
NOTE: I do not want to make a web scraper/data getter.
NOTE: Your answer must work on at least one box of my linked website.
NOTE: I want to fill out links similar to the provided link from the linux command line.
*https://github.com/coryarcangel/Pizza-Party-0.1.b
**https://github.com/bhberson/pizzadash
***https://golanglibs.com/top?q=pizza
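This is how you post form values to an online form, provided you know the POST endpoint of the service:
package main
import (
	"log"
	"net/http"
	"net/url"
)
func main() {
	targetPostUrlHere := "https://example.com/form-endpoint" // placeholder: the form's real action URL
	resp, err := http.PostForm(targetPostUrlHere, url.Values{
		"Service_Type": {"Delivery"}, "Address_Type_Select": {"House"},
		"Street": {"123 E 24th St"}, "Address_Line_2": {"4D"},
		"City": {"New York"}, "Region": {"NY"}, "Postal_Code": {"10027"}})
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close() // release the connection when done
	log.Println("status:", resp.Status)
}
Note: The field keys and values are guesses. You must inspect the form to find the key names it actually expects.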
In your case, https://www.dominos.com/en/pages/order/ is the URL of the page that displays the form. Once the form is filled in and submitted, the data is sent with the POST method, as in the code above, to a dedicated create endpoint (the C in CRUD), which can normally be found in the action attribute of the <form> HTML tag:
<form action="posttargetendpoint" method="POST">...</form>
Once the POST operation is successful, usually a web service would redirect you to another page. In your case, it is https://www.dominos.com/en/pages/order/#/section/Food/category/AllEntrees/
However, a good web service won't necessarily expose the POST endpoint in the clear, since it is the most vulnerable point of attack. You're welcome to find out by inspecting the Domino's page source and adjusting the field values in the Go code accordingly.
Now, to wrap a command-line interface around the PostForm code, I suggest looking into https://github.com/codegangsta/cli, which is a very nice package for creating quick command-line apps.
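If you would rather avoid a third-party dependency, here is a hedged alternative using only the standard library's flag package (a swapped-in technique, not the package the answer recommends). It turns command-line flags into the url.Values that PostForm takes; the field names are the same guesses as above and the flag names are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"net/url"
	"os"
)

// formValues maps command-line flags onto (guessed) form field names.
func formValues(args []string) (url.Values, error) {
	fs := flag.NewFlagSet("order", flag.ContinueOnError)
	street := fs.String("street", "", "street address")
	city := fs.String("city", "", "city")
	region := fs.String("region", "", "state or region")
	postal := fs.String("postal", "", "postal code")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if *street == "" {
		return nil, fmt.Errorf("-street is required")
	}
	return url.Values{
		"Service_Type": {"Delivery"},
		"Street":       {*street},
		"City":         {*city},
		"Region":       {*region},
		"Postal_Code":  {*postal},
	}, nil
}

func main() {
	vals, err := formValues(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// Pass vals to http.PostForm once the real endpoint is known;
	// printing the encoded form keeps this sketch free of network calls.
	fmt.Println(vals.Encode())
}
```

Usage would look like `go run . -street "123 Random St." -city "New York" -region NY -postal 10027`.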
I assume you mean pipe information originating from your backend to another site on behalf of a user?
The standard way of passing information between domains is via HTTP parameters, usually in a GET request, but the remote site would have to support this through an established protocol. You can also use an iframe to embed another site's page in your own, but you wouldn't be able to interact with it remotely, call its JS code, or even query the page at all. Cross-domain security safeguards justifiably prohibit such capabilities, and, generally speaking, acting on behalf of the user through their browser is also restricted for security reasons.
However, if you're looking to emulate user behavior, such as with a bot or web scraper running from your own host or browser, then that's a different story. There are tons of frameworks that provide rich capabilities for interacting with a page. I'd recommend checking out Selenium, which acts as a virtual browser. There are also tons of Python libraries for processing HTML and structured data. You might want to check out Beautiful Soup and Scrapy.
Hope this helps.

Mailgun: Messages "Accepted" but taking long time to be delivered (or not being delivered)

I'm using Mailgun for a site I maintain. Usually Mailgun works great, but I am encountering a strange problem. My script calls the HTTP API to send messages, and they show up in my log as "accepted", but then take a very long time to be "delivered"; often they are never delivered at all and simply remain "accepted". Has anyone experienced a similar error, or could anyone suggest a way to fix it? I'm guessing the problem is in the arguments supplied to the API, but I can't for the life of me figure it out.
The problem exists for different recipient domains and different times of day.
The JSON log of a problematic message is below. I have, of course, changed addresses and domains.
{
"tags": [],
"timestamp": 1411498829.247304,
"envelope": {
"targets": "my-own-email@address.com",
"transport": "",
"sender": "noreply@the-site-in-question.com"
},
"recipient-domain": "address.com",
"event": "accepted",
"campaigns": [],
"user-variables": {},
"flags": {
"is-authenticated": true,
"is-system-test": false,
"is-test-mode": false
},
"message": {
"headers": {
"to": "my-own-email@address.com",
"message-id": "20140923190027.112157.29352@the-site-in-question.com",
"from": "\"the-site-in-question.com\" <noreply@the-site-in-question.com>",
"subject": "Dom, your password was reset."
},
"attachments": [],
"recipients": [
"my-own-email@address.com"
],
"size": 556
},
"recipient": "my-own-email@address.com",
"method": "http"
}
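To compare against the arguments in your own script, here is a minimal Go sketch of a send call to Mailgun's HTTP API. It assumes the v3 messages endpoint with basic auth (user name "api", password your API key) and only the from, to, subject and text fields; the domain and key are placeholders, and the request is built but not sent.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newMailgunRequest builds, but does not send, a POST to Mailgun's v3
// messages endpoint. Accounts in the EU region use api.eu.mailgun.net.
func newMailgunRequest(domain, apiKey, from, to, subject, text string) (*http.Request, error) {
	form := url.Values{
		"from":    {from},
		"to":      {to},
		"subject": {subject},
		"text":    {text},
	}
	endpoint := "https://api.mailgun.net/v3/" + domain + "/messages"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth("api", apiKey) // Mailgun expects the literal user name "api"
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := newMailgunRequest("the-site-in-question.com", "key-PLACEHOLDER",
		"noreply@the-site-in-question.com", "my-own-email@address.com",
		"Dom, your password was reset.", "Plain-text body")
	if err != nil {
		panic(err)
	}
	// http.DefaultClient.Do(req) would send it. A successful response only
	// means the message was accepted (queued), not that it was delivered.
	fmt.Println(req.Method, req.URL)
}
```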
If this is happening regularly, it is very likely Mailgun has you on one of their low-tier IP addresses. I imagine this is the default for free accounts, since they don't want to "pollute" their good addresses with new users who may not be serious / legit.
You can check the "quality" of the IP address at a site like Sender Score. Find this line in the Mailgun log of a delivered message:
"sending-ip": "XXX.XX.XXX.XXX"
If Sender Score shows a score in the 70s, that's your problem. Send Mailgun a support ticket, as Chris suggested, and see if they can get you onto a higher quality IP address. I did so and my emails are now being sent from an IP address with a score in the 90s. Haven't had a single delay since.
Whenever a message shows as "Accepted" in the Mailgun logs this indicates that Mailgun has accepted the message and queued it for delivery. The message should be delivered fairly quickly, however it can be queued for a bit of time if you submitted a large amount of messages at once or if the recipient ESP is throttling messages from the IP/domain on your account.
I'd recommend opening a support ticket via your Mailgun account and providing some of the message IDs, so the support team can investigate the exact cause of these delays after the message is submitted to Mailgun.
Emails sent to your domain name itself are not delivered because there are no Routes on the account. As a note, Flex plans cannot create routes; only Foundation and higher plans can use this feature.
For anyone using the default Mailgun sandbox domain who ends up here like I did:
Mailgun logged my email as 'accepted' and 'delivered', but no email arrived in my inbox.
To fix this, you need to add the recipient email address as an authorized recipient of the sandbox domain.

ColdFusion - Sending out a pretty email, mint style

I've used ColdFusion for sending text emails for years. I'm now interested in learning how to send those pretty emails you see from companies like Mint.
Anyone know of a good ColdFusion tutorial to teach me how to make this work and not get hit by bugs or spam filters?
As Ray said, ColdFusion supports HTML email, which is how you make an email "pretty". A quick down and dirty sample looks like this:
<cfmail from="bob@bob.com" to="someguy@email.com" subject="Check this out!" type="HTML">
<HTML>
<head><title>My Email</title>
</head>
<body>
<!--- Style Tag in the Body, not Head, for Email --->
<style type="text/css">
body { font-size: 14px; }
</style>
This is the text of my email.
</body>
</HTML>
</cfmail>
That's it, you've just sent an email. Notice how there is nothing preventing you from sticking in any old from email address you like? That leads me to my next point, in which you're wondering how to avoid getting hit by Spam filters:
The short answer is: You can't.
Oh sure, you can do intelligent things, like not including the word "VIAGRA" in your email (unless you're trying to send out penile enlargement emails and want to know how to get past spam filters, in which case I'm disinclined to help), but let's assume you just want to avoid obvious pitfalls.
I can think of two things that might help:
Send email only from addresses on a domain that your server is registered to send for. I didn't make the rules, but this one can be a pain. I.e., if you try to send out proxy emails for myorg.com and your server does not host myorg.com, some spam filters are going to block them. What is usually done is to apply some branding to the from address, like this:
<cfmail from="MyOrg.Com <DONOTREPLY@registeredsite.com>" replyto="bob@myorg.com" to="someguy@email.com" subject="Test" type="HTML">
</cfmail>
In this case the email is sent from your server at registeredsite.com, with the proxy address as the reply-to. Spam filters will probably be okay with this, since the from address, *@registeredsite.com, resolves to your server. Try to send with bob@myorg.com in the from, and you'll definitely run into some places that will block you.
Use a physical server, not a cloud site. I'm running into this very issue right now, but if you don't use a physical server that is located at a dedicated IP to send out your email, and if this server is not the originator of the email, some places are going to block it. This means no EC2 or Rackspace cloud site--sorry, some sysadmins are inclined to put down the banhammer on anything that originates from one of these providers, seeing as it is so easy to churn up your own little spam factory using EC2 or Rackspace for very little cost.
Even if you take these precautions, however, you'll run into a situation where someone gets a hold of your domain name and drags it through the mud. They'll send out thousands of emails to the internet in your name--or rather, in your domain's name--and because of the insecurity of email, your domain will get added to someone's blacklist after a thousand occurrences of hotlove4u@registeredsite.com hit the sysadmin's inbox. There's nothing you can do about it, either.
Or you can decide to run a cloud app and use a remote mail server. But some jokers will get one look at the originator being EC2 and will say, "Nope, sorry. Denied." They don't care about the legitimacy of your organization, only the origin of the email.
Email is an antiquated technology that was rushed into mass usage before we really were able to think of a better protocol. As a protocol, it's terrible... and yet we're stuck with it, for backwards compatibility reasons. You cannot possibly avoid the spam filter. 95% of the email on the internet is junk mail, and never even reaches the intended recipient. Just absorb the enormity of that statistic for a moment, and pull your ideas back to reality. Many of the spam-prevention techniques being used today are unnecessarily aggressive, and create a great many 'false positives'. You can shoot for, say, 80% of your email being delivered, but what it really comes down to is this: as soon as the email has been fired off, it's completely out of your control. You can only take responsibility for so much.
What do you mean by "pretty" - HTML based? CF supports html email. Just use type="html". You can also use cfmailpart to send both text and html versions of the same content.
Here's a good article on making HTML email using CSS:
http://articles.sitepoint.com/article/code-html-email-newsletters
Ray's answer is right on the money about the CF part, but most of making this work is about HTML, CSS and testing testing testing.
And I would add to all this that you can check whether a mail will display correctly, and whether it will get hit by a spam filter, by using a website called Litmus (litmusapp). You can send your test newsletter to one of their email addresses, and they will give you screenshots of how the newsletter looks in each type of email client. It also checks the newsletter against a few popular spam blockers and gives you advice on what to change.
I would start by finding an HTML email template that you like. Then put it inside the cfmail tags with type set to html, as mentioned above. You might want to consider sending a multipart email to handle plain-text (and BlackBerry) users.
I subscribe to the Campaign Monitor Newsletter & they also have a list of very useful articles here: http://www.campaignmonitor.com/resources/
Might want to check out this ebook from MailChimp. Email apps render HTML in some unusual ways, so be prepared to use tables for layout.
Remember that when you set a font or background color inside cfmail, you need to escape the # by doubling it: write ##FF0000 instead of #FF0000. Otherwise ColdFusion treats the # as the start of an expression and throws an error.