What tag export formats are there?

I'm writing an importer for a CMS to import tags from various platforms/sources. I want to be able to import tags from WordPress, Movable Type, Blogger; basically all of the big boys.
Are there any generic, standard tag export formats?

Based on your question, I'd assume you're looking for a common format that all of the big boys export to, which can then be consumed by your application. From what I can see, WordPress eXtended RSS appears to be pseudo-supported amongst WordPress, Movable Type, and Blogger. You might want to consider reading here for more information:
http://codex.wordpress.org/Importing_Content
This link does double duty for your question, since it also provides a comprehensive list of common blog engines, their export assumptions, and what formats you should consider supporting.
As for the WXR file format itself, a specification is surprisingly hard to find via either Google or WordPress's own search. Here's the most relevant result I could find:
http://olalindberg.com/blog/2008/12/13/write-your-own-wxr-file-to-migrate-custom-cms-to-wordpress
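If you do settle on WXR, the tags themselves are straightforward to pull out: in exports seen in the wild, each `<item>` lists its tags as `<category domain="post_tag">` elements (site-wide tag definitions also appear in `wp:`-namespaced channel elements). A minimal Python sketch — element names are based on real-world exports rather than a formal spec, so verify against your own files:

```python
import xml.etree.ElementTree as ET

def extract_tags(wxr_xml: str) -> set:
    """Collect post tags from a WXR document string.

    In WXR exports, each <item> lists its tags as
    <category domain="post_tag"> elements.
    """
    root = ET.fromstring(wxr_xml)
    return {
        cat.text.strip()
        for cat in root.iter("category")
        if cat.get("domain") == "post_tag" and cat.text
    }

# A stripped-down sample in the shape of a real export:
sample = """<rss version="2.0">
  <channel>
    <item>
      <title>Hello world</title>
      <category domain="category" nicename="news">News</category>
      <category domain="post_tag" nicename="cms">cms</category>
      <category domain="post_tag" nicename="import">import</category>
    </item>
  </channel>
</rss>"""

print(sorted(extract_tags(sample)))  # prints ['cms', 'import']
```

Note how categories and tags share the `<category>` element and are distinguished only by the `domain` attribute — that's the main gotcha when importing.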
Good luck!

Related

Zend Feed Writer And Media RSS (Yahoo)

Attempting to create a feed with Zend_Feed_Writer but need to include images.
I checked and such support is offered by Media RSS (Yahoo) and their namespace: http://search.yahoo.com/mrss.
Unfortunately this is not supported by ZendFramework and I am wondering what is the best approach to create such a feed through ZF.
I believe it can be addressed via Extensions but the documentation is poor. Anyone had this need as well?
You can simply create your own Feed_Writer class that extends Zend_Feed_Writer and add methods to support the elements from the Media RSS specifications.
From the tags on your question I'm guessing you're using ZF2, right? I couldn't find an example for that version, but here's a good example of creating custom feed and entry classes in ZF1. It shouldn't be too hard to grasp the concept and translate it to ZF2.
Hope this helps, good luck.
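Whichever way you wire up the extension, the target output is defined by the Media RSS spec, so it may help to know what the extension ultimately needs to emit. A minimal item carrying an image might look like this (URLs are illustrative):

```xml
<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Example post</title>
  <link>http://example.com/post</link>
  <media:content url="http://example.com/img/photo.jpg"
                 type="image/jpeg" medium="image">
    <media:title>Photo</media:title>
  </media:content>
  <media:thumbnail url="http://example.com/img/photo_thumb.jpg"
                   width="75" height="75"/>
</item>
```

Your custom writer extension then only needs to register the `media` namespace and append these elements to each entry.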

How to scrape and virtually combine wiki articles?

So our company has a large number of internal wiki sites for different departments and I'm looking for a way to unify them. We keep trying to get everybody to use the same wiki but it never works; they keep wanting to create new ones. What I want to do as an alternative is to scrape each wiki and create a new wiki with articles that combine information from each source.
In terms of implementation I've looked at Nutch (http://nutch.apache.org/) and (http://scrapy.org/) to do the web crawling and using MediaWiki as the frontend. Basically I'd use the crawler as the front end to scrape each wiki, write some code in the middle (I'm thinking of using Python or Perl) to make sense of it and create new articles, writing to MediaWiki using its API.
Wasn't sure if anybody had similar experience and a better way to do this, trying to do some R&D before I get too deep into the project.
I did something very similar a little while back. I wrote a little Python script that scrapes a page hierarchy in our Confluence wiki, saves the resulting html pages locally and converts them into DITA XML topics for processing by our documentation team.
Python was a good choice - I used mechanize for my browsing/scraping needs, and the lxml module for making sense of the xhtml (it has quite a nice range of XML traversal/selection methods). Worked out nicely!
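For anyone taking the same route without third-party modules, the extraction step can be sketched with the standard library alone (the answer above used mechanize and lxml; this stripped-down version just pulls the page title and paragraph text, which is usually what you want to feed into the merge step):

```python
from html.parser import HTMLParser

class WikiTextExtractor(HTMLParser):
    """Collects the page title and body paragraph text from a wiki page."""

    def __init__(self):
        super().__init__()
        self._stack = []      # open-tag stack, to know where we are
        self._buf = []        # text of the paragraph currently open
        self.title = ""
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)
        if tag == "p":
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buf:
            self.paragraphs.append("".join(self._buf).strip())
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if "title" in self._stack:
            self.title += data
        elif "p" in self._stack:
            self._buf.append(data)

parser = WikiTextExtractor()
parser.feed("<html><head><title>Dept Wiki - Backups</title></head>"
            "<body><p>Run the nightly job.</p></body></html>")
print(parser.title)        # prints Dept Wiki - Backups
print(parser.paragraphs)   # prints ['Run the nightly job.']
```

From there, the merged text can be pushed to the combined wiki through the MediaWiki API, as the question suggests.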
Please don't do screenscraping, you make me cry.
If you just want to regularly merge all wikis into one and have them under a "single wiki", export each wiki to XML and import the XML of each wiki into its own namespace of the combined wiki.
If you want to integrate the wikis more tightly and on a live basis, you need crosswiki transclusion on the combined wiki to load the HTML from a remote wiki and show it as a local page. You can build on existing solutions:
the DoubleWiki extension,
Wikisource's interwiki transclusion gadget.

Google Rich Snippets warnings for hCard

I get the following errors from the Google Rich Snippet Tool for my website http://iancrowther.co.uk/
hcard
Warning: This information will not appear as a rich snippet in search results, because it seems to describe an organization. Google does not currently display organization information in rich snippets
Warning: At least one field must be set for Hcard.
Warning: Missing required field "name (fn)".
I'm experimenting with vCard and Schema.org and am wondering if I'm missing something or the validator is playing up. I have added vCard and Schema.org markup to the body, which may be causing confusion. Also, I am making the assumption that I can use both methods to mark up my code.
Update:
I guess with the body tag, I'm just trying to let Google discover the elements which make up the schema object within the page. I'm not sure if this is a good / bad way to approach things? However it lets my markup be free of specific blocks of markup. I guess this is open to discussion but I like the idea of having a natural flow to the content that's decorated in the background. Do you think there is any negative impact? I'm undecided.
I am in favour of the Person structure; this was a good call as it is more representative of the current site content. I am a freelance developer and as such use this page as my Organisation landing page, so I guess I have to make a firmer decision about the site's goals and tailor the content accordingly, i.e. Organisation or Person.
I understand that there are no immediate rich snippet gains, but I'm a web guy so I have a keen interest in these kinds of things.
With schema testing, I find it easiest to start from the most obvious problem, and try to work our way deeper from there. Note, I have zero experience with hcard, but I don't believe the error you mentioned actually has anything to do with your hcard properties.
The most obvious problem I see is that your body tag has an itemtype of schema.org/Organization. When you set an itemtype on a DOM element, you are saying that everything inside that element helps describe that itemtype. Since you've placed this on your body element, you are quite literally telling Google that your entire page is about an organization.
From the content of your page, I would recommend changing that itemtype to schema.org/Person, which seems a more accurate description. Once you make that change and run the scanner again, you may see more errors relating to the schema, and we can work through those too (for example, you'll probably need to set familyName and givenName).
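For illustration, a scoped version might look like the fragment below — the names and properties are just examples, the point being that the itemtype sits on a wrapper element that actually describes the person rather than on `<body>`:

```html
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Ian Crowther</span>
  (<span itemprop="givenName">Ian</span>
   <span itemprop="familyName">Crowther</span>)
  <span itemprop="jobTitle">Freelance developer</span>
  <a itemprop="url" href="http://iancrowther.co.uk/">iancrowther.co.uk</a>
</div>
```

Everything outside that `div` is then left out of the structured-data scope, so page chrome and unrelated content no longer confuse the validator.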
With all of that said, you should know that there are currently no rich snippets you will gain from adding this schema data. Setting it up properly is still worthwhile, especially since we don't know what rich snippets Google or others will expose in the future, but right now you won't see any additional rich snippets in Google search results from adding these tags. I don't want to discourage you from setting this up properly; I just want to set your expectations.

I'd like to generate a SCORM compliant test or assessment from a flat-file or XML format

Here's the deal. We've got a bunch of test questions that have been exported from another system ... and they aren't in a SCORM compliant format.
Even if they were, we really need to get all of this data into a real learning content authoring tool.
The incumbent tool is Articulate, and as any search of the Articulate support site shows, there's no way to actually import a test question into Articulate.
Since we've got a lot of data that we'd prefer not to re-key, my question is, what's a good course authoring tool that can generate a SCORM 2004 assessment, and has a good import from flat file function for its question data?
Googling isn't really getting me too far.
Thanks!
SCORM is used to create SCOs (shareable content objects, aka 'lessons' or 'courses') which may optionally contain questions, but SCORM isn't a quiz/assessment framework. Because it isn't an assessment framework, there is no importer for turning an XML file into a SCORM assessment.
If you can't get Articulate to work for you, then you'll probably need to roll your own SCORM SCO and build a quiz system for it (with the ability to import your custom XML files). Ideally, each quiz question would be set up as an interaction (using cmi.interactions) in SCORM.
You may want to look at some open-source SCORM SCO building tools, such as eXe and Reload, though I'm not sure how helpful they'll be for you.
Sorry I don't know of any easier solutions.
EDIT:
BTW there's a workaround for importing XML into Articulate: import the XML containing the questions into Quizmaker 2, then import your Quizmaker 2 quiz into Quizmaker '09. Not the easiest, but still easier than building your own SCO. See http://www.articulate.com/forums/articulate-quizmaker/3239-securing-quizmaker-xml-file.html
Disclaimer - I haven't worked with IMS-QTI personally, I just know of it.
You may want to take a look at IMS-QTI, and see if that format would work for you. IMS-QTI stands for IMS Question and Test Interoperability. There may be other formats, but IMS-QTI is the only one I'm aware of, and I'm sure there would be tools out there which support it.
That would change your search to finding a tool which supports IMS-QTI, and you may have better luck with that. :-)
There are some general examples of what kinds of questions it supports in the implementation guide.
Hope that helps!
I don't think Captivate or Articulate supports question import in any easy workflow. Your honest fastest route might be to author your own SCORM package format that will import questions from XML or JSON. Then write a converter to put your CSV content into XML or JSON. There are lots of SCORM API wrappers out there to use, and you'll have more control over any issues you find with LMS vs authorware interpretations of SCORM if you just build your own player.
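The converter half of that suggestion is the easy part. A minimal sketch, assuming your legacy export has question/correct/distractor columns — the column names here are assumptions, so adjust them to whatever your system actually produced:

```python
import csv
import io
import json

def csv_questions_to_json(csv_text: str) -> str:
    """Convert exported quiz rows into a JSON question bank.

    Assumed columns: question, correct, distractor1..distractor3.
    """
    questions = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        questions.append({
            "prompt": row["question"],
            "choices": [row["correct"],
                        row["distractor1"],
                        row["distractor2"],
                        row["distractor3"]],
            "answerIndex": 0,  # correct answer listed first; shuffle at runtime
        })
    return json.dumps({"questions": questions}, indent=2)

sample = ("question,correct,distractor1,distractor2,distractor3\n"
          "What does SCO stand for?,Shareable Content Object,"
          "Simple Course Outline,Standard Courseware Object,Scored Class Output\n")
print(csv_questions_to_json(sample))
```

Your custom SCORM player then just needs to fetch that JSON, render the questions, and report results through the SCORM API wrapper you pick.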
This feature is now available in Claro:
http://feedback.dominknow.com/knowledgebase/articles/312552-how-do-i-import-test-questions-from-excel

Sanitizing title URLs in ExpressionEngine 1.6.x

I run a blog where the blog title is either an external link or an internal link to a longer piece similar to what you’ve seen on similar blogs. For some reason, ExpressionEngine (1.6.x) does nothing to sanitize such things as ampersands in the URLs provided.
I use Markdown in the body text, which seems to do a great job of sanitizing all URLs. Yet ExpressionEngine's own handling of the titles doesn't cut it. I have tried formatting the "title URLs" in Markdown and failed miserably, and damn if I know what the hell it is in ExpressionEngine that prevents me from using it.
So the question boils down to what other ExpressionEngine 1.6.x users do and have done, or whether someone can come up with a MacGyver-esque solution. Because I’ve been stumped upwards of half a year.
The XML Encode Plugin for EE1 from Rick Ellis of EllisLab will convert your special characters to HTML entities.
The plugin was originally designed to convert reserved XML characters to HTML entities in the ExpressionEngine RSS templates, but should work for what you need.
To use the plugin, wrap your {title_link} custom field in between its tag pairs:
{exp:xml_encode}
{title_link}
{/exp:xml_encode}
This would result in:
http://www.google.com/search?q=nytimes&btnG=Google+Search
being converted into:
http://www.google.com/search?q=nytimes&amp;btnG=Google+Search
Other EE1 plugins offering similar but more advanced features are Cleaner by utilitEEs (Oliver Heine) and Low Replace by Lodewijk Schutte.