Tool to diff webpage semantic structure rather than content

Does anyone know of any tools that can diff the semantic markup of two web pages rather than their content?
Cheers.

No, but you might have more success if you break it down into two steps:
Remove content
Diff
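Those two steps can be sketched in a few lines. This is only a rough illustration using Python's standard library (html.parser and difflib), not an existing tool; the names Skeleton, skeleton and structure_diff are made up for the example, and it assumes that "structure" means tag names and nesting only, with attributes and text ignored.

```python
import difflib
from html.parser import HTMLParser

# Elements that never take a closing tag (subset of the HTML void elements).
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Skeleton(HTMLParser):
    """Step 1: keep the tags, throw away text, comments and attributes."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + f"<{tag}>")
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(0, self.depth - 1)
            self.lines.append("  " * self.depth + f"</{tag}>")


def skeleton(html_text):
    """Return the tag skeleton of a page as a list of indented lines."""
    parser = Skeleton()
    parser.feed(html_text)
    parser.close()
    return parser.lines


def structure_diff(old_html, new_html):
    """Step 2: run an ordinary line diff over the two skeletons."""
    return list(difflib.unified_diff(
        skeleton(old_html), skeleton(new_html),
        fromfile="old", tofile="new", lineterm=""))


if __name__ == "__main__":
    for line in structure_diff("<div><p>Hello</p></div>",
                               "<div><span>Goodbye</span></div>"):
        print(line)
```

Pages that differ only in their text produce an empty diff. If class or id values matter for your definition of structure, they could be appended to the line emitted in handle_starttag.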

You could try using the Pretty Diff tool. It would require a minor customization to the markup beautification component so that content components are set to empty strings.
Look at http://prettydiff.com/markup_beauty.js
Change lines 554, 557, and 560 to:
build.push("know text");
These changes would actually need to occur in the larger prettydiff.com/prettydiff.js, but now you know where to look. Once done, you can run all of this locally.
All you will need is:
HTML of http://prettydiff.com/
prettydiff.com/diffview.css
prettydiff.com/pd.js - this is the DOM interface between the application and the HTML
prettydiff.com/prettydiff.js - this is the actual application code.
I may write this concept of ignoring content into the tool as an option.

Related

TinyMCE Text editor security with HTML

I'm using the free TinyMCE JS plugin and am interested in preventing HTML injection through the text editor.
I've added this property to the INIT:
invalid_elements: 'script' (just for this example)
However, nothing happens. The editor still "accepts" the script tag and passes it on.
I looked at https://www.tiny.cloud/docs/tinymce/6/content-filtering/#invalid_elements and it should work but I don't see any change once it's added.
Am I doing something wrong?
Is there a way to limit some HTML elements with this editor?
Any other tips on how to use the editor and prevent malicious HTML?
TinyMCE certainly has a variety of configuration options to help you control what content is created in the editor but you can never assume that data provided to you client side is "clean" or "safe". Nefarious people can bypass your front end and all of its validation if their goal is to cause harm to your system.
You should still configure your front end appropriately. TinyMCE's content filtering/validation options (https://www.tiny.cloud/docs/configure/content-filtering/) let you allow only the types of tags you want created, including:
https://www.tiny.cloud/docs/configure/content-filtering/#valid_elements
https://www.tiny.cloud/docs/configure/content-filtering/#extended_valid_elements
https://www.tiny.cloud/docs/configure/content-filtering/#valid_children
https://www.tiny.cloud/docs/configure/content-filtering/#schema
https://www.tiny.cloud/docs/configure/content-filtering/#invalid_elements
However, regardless of the front end design, you should always re-check submitted content on the server to ensure it is safe. There is simply no way around that need. What constitutes "safe" is likely a business decision based on what your application does and who uses it.
There are many different libraries you can use server side to do this sort of validation/cleansing so depending on your specific server side setup you can find libraries that allow you to "sanitize/purify" the submitted HTML.
I would note that TinyMCE should not allow <script> tags in your content by default, so the behavior you are seeing is likely due to your current configuration.
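To make the server-side re-check concrete, here is a minimal allowlist sketch in Python using only the standard library. It is an illustration of the idea, not a vetted sanitizer: the allowed tag set, the decision to keep only http(s) href values, and the names safe_href, AllowlistSanitizer and sanitize are all assumptions made for the example. For production you would more likely reach for a maintained sanitizing library for your platform.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed policy for this example; what counts as "safe" is your decision.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}
DROP_WITH_CONTENT = {"script", "style"}


def safe_href(value):
    """Return the URL only if it parses with an http(s) scheme, else None."""
    try:
        scheme = urlsplit(value or "").scheme
    except ValueError:
        return None
    return value if scheme in ("http", "https") else None


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags; all text is escaped on the way out."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        rendered = ""
        if tag == "a":
            # Every attribute is dropped except a checked href on links.
            href = safe_href(dict(attrs).get("href"))
            if href is not None:
                rendered = f' href="{escape(href, quote=True)}"'
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in self.open_tags:
            # Close anything still open inside this element, then the element.
            while True:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text reduced to the allowlisted tags and escaped text."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi <script>alert(1)</script>there</p>'))
```

The point of the design is that the output is rebuilt from scratch: nothing from the submitted markup is passed through verbatim, so anything not explicitly allowed disappears regardless of what the editor was configured to do on the client.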

Strike through page header in DokuWiki

I am looking for a way to strike through a page header in DokuWiki. I checked the DokuWiki site but couldn't find it. Can anyone please help me with this?
Another alternative would be to use an external tool like http://manytools.org/facebook-twitter/strikethrough-text/ to generate Unicode text that is already struck through, and then copy and paste it into DokuWiki as your headline.
One advantage is that then also your links will display the headline as stricken through.
Also, you don't have to meddle with custom CSS or additional plugins.
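If you would rather not depend on a website for this, the same effect can be produced locally. The sketch below assumes such generators work by appending the Unicode combining character U+0336 (COMBINING LONG STROKE OVERLAY) to each character, which I have not verified for that particular tool; the function name strike is just for illustration.

```python
# U+0336 COMBINING LONG STROKE OVERLAY draws a line through the
# character that precedes it.
COMBINING_LONG_STROKE = "\u0336"


def strike(text):
    """Return text with a combining stroke appended to every character."""
    return "".join(ch + COMBINING_LONG_STROKE for ch in text)


if __name__ == "__main__":
    # Paste the printed result into the headline.
    print(strike("Old page title"))
```

How well the result renders depends on the font, so it is worth checking the headline in the browsers your readers use.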
It's a limitation of how headlines work in DokuWiki.
See explanation here.

How to serve System.Web.UI.DataVisualization chart images from different domain?

I have a set of System.Web.UI.DataVisualization charts (ASP.NET 4.0) that work without problems and save their image files to disk.
The challenge is that I am serving all of my static image and assets from a cookieless domain -- and eventually maybe from a CDN -- and unfortunately, the chart's IMG SRC is always relative. I can't seem to find a way to override that so that I can specify the root domain of the static asset server.
Some possible workarounds I haven't tried yet are:
Programmatically generate the charts and manually add an image control to the page
URL rewrite any requests for images (extra server hit, may not work)
Search and replace the SRC manually before markup is sent to browser (ugh)
Are there any other possibilities I'm not aware/thinking of? Thanks!
To answer my own question: the first option, programmatically generating the charts (instead of having the control markup in the HTML) and manually adding an image control to the page, seems to work just fine.
It removes the ability to style things in the markup though, so any other ideas are welcome.
Edit: I decided to leave the styling markup in place, programmatically hide the chart control, and add the image control separately. Works great!
Edit: There is one caveat to adding the image control separately: if you use tooltips for each data point, they will no longer work after this change, since they are generated by the chart control itself.

How to define custom wicket tag

I could not find a Wicket tag like wicket:include. Can anyone suggest something? I want to include/inject raw source into HTML files. If there is no such utility, any suggestions on how to develop one?
Update: I am looking for something like jsp:include. The inclusion is expected to be handled on the server side.
To do this, you'll need to implement your own IComponentResolver.
This blog article shows an example somewhat resembling what you're after.
Is it raw markup that you want to include, or Wicket content?
If it's raw markup, even a simple Label can do that for you. If you call setEscapeModelStrings(false), the string value of the model will be copied straight into the markup. (Watch out for potential XSS attacks, though.)
"Including" Wicket markup is done via Panels (or occasionally Fragments).
Update: If you add more detail about the actual problem you need to solve, there's a good chance that we can find a more "wickety" solution. After all, JSP and Wicket are two different worlds, and the mindset of one doesn't work very well in the other.

TinyMCE writes terrible HTML!

I've currently got TinyMCE incorporated into the backend editor of a simple blogging/page-editing app, but I'm extremely unhappy with the HTML code it creates. It does all sorts of messy things like:
Adding inline style information to span tags that you can't ever find to get rid of without editing the HTML directly.
Nesting tags in nonsense ways (e.g. <p><strong><p><span>some text</span></p><strong></p> just to make something bold.)
Adding empty <p> </p> lines where they don't belong and I'm not trying to create blank lines.
EDIT: I've looked at lists of the other editors out there (including on SO), but I want to know if people have firsthand experience getting cleaner code out of their WYSIWYG editors.
Any recommendations for one that outputs better code behind the scenes?
How about a rather drastic alternative: using a WYMIWYG (What You Mean Is What You Get) editor rather than another WYSIWYG editor? That way the author is in full control of the schematic markup as well as the content he/she is entering.
Unfortunately I haven't found one that is as feature rich and usable as TinyMCE, but it seems to have come a long way - see http://www.wymeditor.org/demo/
Use HTML Purifier before saving the content into the database.
I found JoomlaFCK to be a very good alternative to TinyMCE.
Hope you like it.
bye
BTW I know it is an old thread but someone might use it. ;)