Retrieve information from a MediaWiki in a more structured way (REST)

We would like to use a self-hosted MediaWiki as a lightweight CMS to retrieve information from. However, the basic REST API is very limited in how it can retrieve content, probably because most information on wikis is unstructured.
Is it possible to add your own ID system to a MediaWiki so that, instead of getting a whole page or section worth of information, you could search for specific IDs, or even request content by ID in a REST-like way, for example /:heading/:subheading/:sub-subheading?
Or, if not, is there at least a way of adding your own IDs so you could parse the information within a section in a more structured way?

Solved by using:
- The default REST API, simplified using npm package nodemw.
- Parsing wikitext to HTML using npm package instaview.
- Accessing / modifying the HTML server-side using npm package cheerio.
Long live free, unstructured BLOBs of text! go wikimedia go! omg.
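
The answer used nodemw, instaview and cheerio; the sketch below is not that code. It shows the same idea in Python with only the standard library: once a page is available as HTML, the headings can act as the "IDs", so each block of text is indexed under a path such as Install/Linux. The class and function names are made up for illustration, and it assumes the rendered HTML uses plain h1 to h6 tags for section headings.

```python
from html.parser import HTMLParser


class SectionIndexer(HTMLParser):
    """Index page text by heading path, e.g. 'Install/Linux'."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.path = []              # (level, title) of the currently open headings
        self.sections = {}          # "a/b/c" -> list of text chunks
        self._heading_level = None  # set while we are inside a heading tag
        self._heading_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading_level = self.HEADINGS[tag]
            self._heading_text = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading_level is not None:
            level = self._heading_level
            title = "".join(self._heading_text).strip()
            # Close headings at the same or a deeper level, then open this one.
            while self.path and self.path[-1][0] >= level:
                self.path.pop()
            self.path.append((level, title))
            self.sections.setdefault(self._key(), [])
            self._heading_level = None

    def handle_data(self, data):
        if self._heading_level is not None:
            self._heading_text.append(data)
        elif self.path and data.strip():
            self.sections[self._key()].append(data.strip())

    def _key(self):
        return "/".join(title for _, title in self.path)


def index_sections(html):
    """Return a dict mapping 'heading/subheading' paths to their text."""
    parser = SectionIndexer()
    parser.feed(html)
    parser.close()
    return {key: " ".join(chunks) for key, chunks in parser.sections.items()}
```

A lookup like index_sections(html)["Install/Linux"] then plays the role of the /:heading/:subheading request from the question. Text that appears before the first heading is ignored in this sketch.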

How to upload files and attachments to the sobject record using REST API?

Salesforce has two different UIs and, accordingly, two different ways of storing attached files.
Two files were uploaded via the classic UI and they are marked as 'attachments'. Other files were uploaded through the new UI and they are marked as 'files'.
I want to upload all of these files using the REST API. I cannot find the proper documentation. Can somebody help me with this?
That's not 100% true. In the SF Classic UI you were able to upload Files too. It's "just" about knowing the right API name of the table, and you'll find lots of examples online.
The Attachment and Document objects have exactly those API names. You can view their definitions in the SOAP API definition or in the REST API explorer (there was something which you can still see in a screenshot in here; it seems to be down now, maybe they're moving it to another area of the documentation...)
The Files (incl. "Chatter Files") are stored in the ContentDocument and ContentVersion objects. The name is unexpected because a long time ago SF purchased another company's product and it was called "Salesforce Content". In the beginning it was a bit of a mess; now it's better integrated into the whole platform, but some things still lurk: File folders are sometimes called Libraries in the documentation, but the actual API name is ContentWorkspace. The entity relationship diagram can help a bit: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_erd_content.htm
ContentDocument is a header to which many places in SF link (imagine a file taking up space on disk only once but being cross-linked from multiple records). It has at least one version, and if you need to update the document you upload a new version; the links in the org don't change, they still point to the header.
So, how to use it?
REST API guide: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_sobject_insert_update_blob.htm
or maybe Chatter API guide (you tagged it with chatter so chances are you already use it): https://developer.salesforce.com/docs/atlas.en-us.chatterapi.meta/chatterapi/connect_resources_files.htm
Some of my answers here might help (shameless plug). They're about uploading and reading data too, and one is even about Data Loader... but you might experiment with exporting files first, to get familiar with the structure before you load?
https://stackoverflow.com/a/48668673/313628
https://stackoverflow.com/a/56268939/313628
https://stackoverflow.com/a/60284736/313628
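
To make the ContentVersion route a bit more concrete, here is a rough sketch of the request you would build for a small file. It only constructs the URL and JSON body; sending it (with an Authorization: Bearer header and Content-Type: application/json) is left out. The function name, the instance URL and the API version are placeholders, and it assumes the file is small enough to send base64-encoded in JSON; for bigger files the multipart form described in the REST guide linked above is the documented route.

```python
import base64
import json


def build_content_version_request(instance_url, api_version, file_name, data, record_id=None):
    """Build (url, json_body) for inserting a ContentVersion row.

    instance_url and api_version are placeholders for your org's values.
    """
    body = {
        "Title": file_name,
        "PathOnClient": file_name,
        # The binary content goes into VersionData as base64 text.
        "VersionData": base64.b64encode(data).decode("ascii"),
    }
    if record_id:
        # Links the new file to a record when the first version is published.
        body["FirstPublishLocationId"] = record_id
    url = f"{instance_url}/services/data/v{api_version}/sobjects/ContentVersion"
    return url, json.dumps(body)
```

Inserting the ContentVersion creates the ContentDocument header for you, which matches the header/version model described above.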

Importing tags into AEM from a CSV file

Our client has a spreadsheet of about two thousand tags they want to start using on their AEM-based website.
I need a quick way to automatically import them as AEM tags.
I was thinking of writing a script to parse the document and issue a number of POST requests to AEM to create the content at /etc/tags.
As an alternative, I considered uploading the CSV file to the repository and handling the creation of tags by means of a custom component or by running a Groovy script in the AEM Groovy console.
Both solutions would require a lot of work and I'm a bit short on time. I also wouldn't like to reinvent the wheel. I don't think there's a way to complete this task using OOTB functionality, but is there any way to speed up the process?
You could use the Tag Maker provided by ACS AEM Tools.
You can find it in Tools > ACS AEM Tools > Tag Maker after installing the AEM Tools package on your instance.
It allows you to import tag hierarchies from CSV files and has a number of pre-defined converters that infer tag names and titles.
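
If installing the package is not an option, the script idea from the question could look roughly like the sketch below. This is not what Tag Maker does internally. It assumes a CSV layout where each row is a tag hierarchy written as titles from left to right (for example Products,Shoes), that the namespace node already exists, and that tags are created through the Sling POST servlet as cq:Tag nodes. The function names and the node-naming rule are made up for illustration, and the property set should be checked against a tag created by hand on your instance. The code only builds the (path, form fields) pairs; posting them with credentials is left out.

```python
import csv
import io
import re


def node_name(title):
    """Derive a JCR-friendly node name from a tag title (illustrative rule)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def tag_requests_from_csv(csv_text, namespace, base_path="/etc/tags"):
    """Return a list of (path, form_fields), parents before children."""
    requests = []
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        path = f"{base_path}/{namespace}"
        for title in (cell.strip() for cell in row):
            if not title:
                continue
            path = f"{path}/{node_name(title)}"
            if path in seen:
                continue  # this level was already emitted by an earlier row
            seen.add(path)
            requests.append((path, {
                "jcr:primaryType": "cq:Tag",
                "jcr:title": title,
                "sling:resourceType": "cq/tagging/components/tag",
            }))
    return requests
```

Each pair would then become one POST of the form fields to the given path on the AEM author instance.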

Find all <forms> used on a site

Is there, e.g., a crawler that can find all pages on my site that have forms (and list the form action etc.)?
I'd like to log all pages with unique actions to then audit further.
Norconex HTTP Collector is an open source web crawler that can certainly help you. Its "Importer" module has a "TextBetweenTagger" feature to extract text between any start and end text and store it in a metadata field of your choice. You can then filter out those that have no such text extracted (look at the EmptyMetadataFilter option for this).
You can do this without writing code. As for storing the results, the product uses "Committers". A few committers are readily available (including a filesystem one), but you may want to write your own to "commit" your crawled data wherever you like (e.g. in a database).
Check its configuration page for ideas.
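
For comparison, the extraction step itself is small if you do end up scripting it. The sketch below is not Norconex configuration; it is a minimal Python illustration of what needs to be pulled out of each page once a crawler has fetched it. It assumes you already have the pages as a dict of URL to HTML, and the names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Collect (absolute action URL, method) for every <form> on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            # A missing or empty action means the form posts to the page itself.
            action = urljoin(self.page_url, attrs.get("action") or "")
            method = (attrs.get("method") or "get").lower()
            self.forms.append((action, method))


def find_forms(page_url, html):
    finder = FormFinder(page_url)
    finder.feed(html)
    return finder.forms


def pages_by_action(pages):
    """Map each unique form action to the sorted list of pages using it."""
    index = {}
    for url, html in pages.items():
        for action, _method in find_forms(url, html):
            index.setdefault(action, set()).add(url)
    return {action: sorted(urls) for action, urls in index.items()}
```

The result of pages_by_action gives one entry per unique action, which is the list to audit further.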

How do I serve my web app files with CouchDB

I would like to develop an application with CouchDB. I believe it is possible to use ONLY CouchDB to serve html, css, fonts, icons, js, etc. files as well as to store the data and handle it.
The problem I am facing is:
How do I serve my files using CouchDB (without having to use any middleware like nodejs)? What I found is that I can upload them as attachments to a _design document, but I don't find that practical to do for every single file.
You are looking for couchapps. There are tools that take care of the uploading part for you like erica and couchapp.
Couchapp documentation is in the wiki part of the repo. Here is the file structure to design doc mapping guide.
For erica everything is in the readme.
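
To give an idea of what such tools automate, here is a small sketch that turns a directory into a design document with inline attachments. It is not how couchapp or erica are implemented, just the underlying idea; the function name and the default document name "app" are made up for illustration. The resulting dict would be sent as JSON in a PUT to /yourdb/_design/app (including the current _rev when updating an existing document), after which the files should be reachable under /yourdb/_design/app/index.html and so on.

```python
import base64
import mimetypes
from pathlib import Path


def design_doc_from_dir(root, name="app"):
    """Build a design document whose attachments mirror the files under root."""
    root = Path(root)
    attachments = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Attachment names keep the relative path, e.g. "css/site.css".
        rel = path.relative_to(root).as_posix()
        content_type = mimetypes.guess_type(rel)[0] or "application/octet-stream"
        attachments[rel] = {
            "content_type": content_type,
            "data": base64.b64encode(path.read_bytes()).decode("ascii"),
        }
    return {"_id": f"_design/{name}", "_attachments": attachments}
```

Inline attachments keep the whole upload to a single request, which is convenient for small sites; for large assets, uploading attachments one by one is probably the better fit.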

Web CMS That Outputs to Flat Static Pages (.html) via FTP to Remote Server?

I have a web app project that I will be starting to work on shortly. One of the features included is going to be a content management system where users can add content and then that content will be combined with a template and then output as a regular .html file. This .html file would then be FTPed to their own web host.
As I've always believed in not reinventing the wheel, I figured I'd see if there are any quality customizable CMSes out there that already do this. For instance, Blogger.com allows you to post all of your content to your account there, but offers the option to let you use your own hosting. Any time you publish a new article, a new .html page is generated (as well as an updated index page with links to the new article) and then the updated content is FTPed to your own server.
What I would like is something like this that I can modify to more closely suit my needs.
Required features:
- Able to host on my own server
- Written in PHP
- Users add content through their account, then when posted it is FTPed as .html to their server
- Any appropriate pages are also updated to link to the new content (like the index page or whatnot)
- Templateable
- Customizable
Optional (but very much desired) features:
- Written in CodeIgniter or a similar PHP framework
While CodeIgniter isn't strictly required, I would very much prefer it. It speeds up development time and makes things much easier to implement.
So - any suggestions? I've stumbled across a few CMSes that push to remote servers as static pages, but the ones I've found are all hosted on the developers' servers, which means that I cannot modify them at all.
Adobe Contribute might work for your situation. A developer/designer creates a set of templates with Dreamweaver and publishes the templates. Authorized users can then create pages based on the templates and only make changes within the editable regions. It includes systems for drafts and reviews prior to publishing (via many options, including ftp) and incorporates automatic version control. It can work with static html pages or dynamic pages like php.
Sounds like you need a separate application that can do this for you.
For example, you should be able to write something that queries Drupal's menu router, saves the output (with curl) to a directory and then runs rsync to push your content where you want it to go.
Otherwise your requirements are likely to be outside the scope of a typical CMS. Separating this functionality will give you better options.
You'd need to write a filter for your URLs too. It's a bit of work...
Hope that helps!