Is there, for example, a crawler that can find all pages on my site that contain forms (and list each form's action, etc.)?
I'd like to log all pages with unique form actions so I can audit them further.
Norconex HTTP Collector is an open-source web crawler that can certainly help you. Its "Importer" module has a "TextBetweenTagger" feature that extracts the text between any start and end strings and stores it in a metadata field of your choice. You can then filter out the pages that had no such text extracted (look at the EmptyMetadataFilter option for this).
You can do this without writing code. As for storing the results, the product uses "Committers". A few committers are readily available (including a filesystem one), but you may want to write your own to "commit" your crawled data wherever you like (e.g. to a database).
Check its configuration page for ideas.
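If you'd rather script something small yourself, here is a minimal sketch in Python (using requests and BeautifulSoup; the starting URL is a placeholder) that crawls same-site links and logs each unique form action together with the first page it was seen on:

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "http://example.com/"  # placeholder: your site's root URL
seen, queue = set(), [START]
actions = {}  # form action -> first page it was seen on

while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    try:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    except requests.RequestException:
        continue
    # Record each form's action attribute, resolved against the page URL.
    for form in soup.find_all("form"):
        actions.setdefault(urljoin(url, form.get("action", "")), url)
    # Follow same-site links only.
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"]).split("#")[0]
        if urlparse(link).netloc == urlparse(START).netloc:
            queue.append(link)

for action, page in sorted(actions.items()):
    print(f"{action}  (first seen on {page})")
```

A dedicated crawler will handle robots.txt, throttling, and JavaScript-rendered pages for you; the script above only sees forms present in the raw HTML.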
Salesforce has two different UIs, and accordingly it can store attached files in different ways.
Two files were uploaded via the Classic UI and are marked as "attachments". Other files were uploaded through the new UI and are marked as "files".
I want to upload all of these files using the REST API, but I cannot find the proper documentation. Can somebody help me with this?
That's not 100% true. In the SF Classic UI you were able to upload Files too. It's "just" a matter of knowing the right API name of the table, and you'll find lots of examples online.
The Attachment and Document objects have exactly those API names; you can view their definitions in the SOAP API definition or in the REST API Explorer (there was something you can still see in the screenshot here; it seems to be down now, maybe they're moving it to another area of the documentation...).
The Files (incl. "Chatter Files") are stored in the ContentDocument and ContentVersion objects. The names are unexpected because, a long time ago, SF purchased another company's product called "Salesforce Content". In the beginning it was a bit of a mess; it's now better integrated into the whole platform, but some quirks still lurk, e.g. file folders are sometimes called Libraries in the documentation, but the actual API name is ContentWorkspace. The entity relationship diagram can help a bit: https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/sforce_api_erd_content.htm
ContentDocument is a header record that many places in SF link to (imagine a file taking up disk space only once but being cross-linked from multiple records). It has at least one version, and if you need to update the document you upload a new version; all links in the org stay unchanged because they still point to the header.
So, how to use it?
REST API guide: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/dome_sobject_insert_update_blob.htm
or maybe the Chatter API guide (you tagged the question with chatter, so chances are you already use it): https://developer.salesforce.com/docs/atlas.en-us.chatterapi.meta/chatterapi/connect_resources_files.htm
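For instance, here is a minimal Python sketch of uploading a file as a ContentVersion through the REST API (the instance URL, API version, token, and file name are placeholders; you'd obtain the token via OAuth first):

```python
import base64

import requests

INSTANCE = "https://yourInstance.my.salesforce.com"  # placeholder org URL
TOKEN = "00D...your_oauth_access_token"              # placeholder token

with open("report.pdf", "rb") as fh:
    body = base64.b64encode(fh.read()).decode()

# Insert a new file (ContentVersion); for a brand-new file Salesforce
# creates the ContentDocument header automatically.
resp = requests.post(
    f"{INSTANCE}/services/data/v52.0/sobjects/ContentVersion",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "Title": "report",
        "PathOnClient": "report.pdf",
        "VersionData": body,  # file contents, base64-encoded
    },
)
resp.raise_for_status()
print(resp.json())  # e.g. {'id': '068...', 'success': True, ...}
```

A classic Attachment works the same way, except you POST to /sobjects/Attachment with Name, ParentId, and a base64-encoded Body field instead.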
Some of my answers here might help (shameless plug). They're about uploading and reading data too, and one is even about Data Loader... but you might experiment with exporting files first, to get familiar with the structure before you load:
https://stackoverflow.com/a/48668673/313628
https://stackoverflow.com/a/56268939/313628
https://stackoverflow.com/a/60284736/313628
I am doing some updates to a site I have developed over the last few years. It has grown rather erratically (I tried to plan ahead, but with this site it has taken some odd turns).
Anyway, the site has a community blog (blog.domain.com, which used to be domainblog.com) and users with personal areas (user1.domain.com, user2.domain.com, etc.).
The personal areas have standard page content that the user can use as-is, or partially customize by adding snippets of text. Now the owner wants the users to be able to create their own content.
Everything is done up to the point of integrating a file browser.
I need a browser that will allow me to do the following:
the browser needs to be able to browse the common files at blog.domain.com/files and the user files at user_x.domain.com/files
the browser will also need to be able to differentiate between the two and generate the appropriate image URL
of course, the browser's access to the user files will need to be dynamic and only show the files belonging to that user (along with the common files)
I also need to be able to set a file size limit for images
the admin area is in a different directory than either the blog or the user subdomains
general directory structure:

--webdir--
  |--client--
    |--clientsite--
      |--blog (blog.domain.com)
      |--sites--
        |--main site (domain.com)
        |--admin (admin.domain.com)
        |--users--
          |--user1 (user1.domain.com)
          |--user2 (user2.domain.com)
          |--...etc.
I have tried several different file browsers and using symlinks, but the browsers don't seem to be able to follow them. I am also having trouble even setting them to use a directory that isn't the default.
What file browser would you recommend? What would I need to customize to make it work?
TIA
OK, since I have not had any responses to this question, I guess I will have to do a workaround and then see about writing a custom file browser down the road.
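For what it's worth, if you do end up rolling your own, the core per-user listing logic is small. A minimal sketch in Python (the paths and domains are assumptions based on the directory layout above):

```python
import os

# Hypothetical paths matching the directory layout above; adjust to your setup.
COMMON_DIR = "/webdir/client/clientsite/blog/files"
USERS_DIR = "/webdir/client/clientsite/sites/users"

def browsable_files(user: str) -> dict:
    """Map each file name a user may browse to the URL it is served from:
    common files live on the blog subdomain, user files on the user's own."""
    files = {
        name: f"http://blog.domain.com/files/{name}"
        for name in os.listdir(COMMON_DIR)
    }
    user_dir = os.path.join(USERS_DIR, user, "files")
    files.update(
        {
            name: f"http://{user}.domain.com/files/{name}"
            for name in os.listdir(user_dir)
        }
    )
    return files
```

A real file browser connector would also enforce the image file size limit at upload time and refuse paths outside these two directories.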
The view file macro allows embedding documents (.ppt, .pdf, etc.) on a Confluence wiki page. The limitation is that the documents must be attachments.
So the question is: is there a way to dynamically load a file located in an SCM repository?
P.S. Current SCM: Perforce.
UPDATE: As far as I can see, there is no official Perforce plugin.
You can of course include a link to that file, if Perforce provides a way to link to items. We use that a lot to include content that is stored in Subversion, and then document its status, usage, and so on in Confluence. The user has to click the link to get the file, but I think that is necessary anyway, because Confluence doesn't know your authorization rules.
I have a web app project that I will be starting to work on shortly. One of the features included is going to be a content management system where users can add content and then that content will be combined with a template and then output as a regular .html file. This .html file would then be FTPed to their own web host.
As I've always believed in not reinventing the wheel, I figured I'd see if there are any quality, customizable CMSes out there that already do this. For instance, Blogger.com allows you to post all of your content to your account there, but offers the option to use your own hosting. Any time you publish a new article, a new .html page is generated (as well as an updated index page with links to the new article) and the updated content is FTPed to your own server.
What I would like is something like this that I can modify to more closely suit my needs.
Required Features:
Able to host on my own server
Written in PHP
Users add content through their account, then when posted it is FTPed as .html to their server
Any appropriate pages are also updated to link to the new content (like the index page or whatnot)
Templateable
Customizable
Optional (but very much desired) features:
Written in CodeIgniter or a similar PHP framework
While CodeIgniter isn't strictly required, I would very much prefer it. It speeds up development time and makes things much easier to implement.
So - any suggestions? I've stumbled across a few CMSes that push to remote servers as static pages, but the ones I've found are all hosted on the developers' servers, which means that I cannot modify them at all.
Adobe Contribute might work for your situation. A developer/designer creates a set of templates with Dreamweaver and publishes the templates. Authorized users can then create pages based on the templates and only make changes within the editable regions. It includes systems for drafts and reviews prior to publishing (via many options, including FTP) and incorporates automatic version control. It can work with static HTML pages or dynamic pages like PHP.
Sounds like you need a separate application that can do this for you.
For example, you should be able to write something that queries Drupal's menu router, saves the output (with curl) to a directory, and then runs rsync to push your content where you want it to go.
Otherwise your requirements are likely to be outside the scope of a typical CMS. Separating this functionality will give you better options.
You'd need to write a filter for your URLs too. It's a bit of work...
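As a rough sketch of that pipeline in Python (the CMS URL, page list, and FTP credentials are all placeholders):

```python
import ftplib
import pathlib

import requests

SITE = "http://localhost/mycms"           # placeholder: your CMS front end
PAGES = ["index", "about", "articles/1"]  # placeholder: paths to snapshot
OUT = pathlib.Path("build")

# 1. Render each page through the CMS and save it as a static .html file.
for page in PAGES:
    html = requests.get(f"{SITE}/{page}").text
    target = OUT / f"{page}.html"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")

# 2. Push the snapshot to the user's own host over FTP
#    (remote subdirectories are assumed to exist already).
with ftplib.FTP("ftp.example.com", "user", "password") as ftp:
    for path in OUT.rglob("*.html"):
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {path.relative_to(OUT)}", fh)
```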
Hope that helps!
I'm creating a CMS and have not yet settled on where to store the complete URL for a given page in the structure.
Every page has a slug (a URL-friendly name for the page), a nullable parent (null for top-level pages), and children.
Where do I store the complete URL (/first-page/sub-page) for a given page? Should this go in the database along with the other properties of the page, or in some cache?
Update
It's not the database design I'm asking about, but rather where to store the complete URL of a given page, so I don't need to traverse the entire URL to find the page the user requested (/first-page/sub-page).
Update 2
I need to find which page belongs to the currently requested URL. If the requested URL is /first-page/sub-page, I don't want to split the URL and loop through the database (obviously).
I'd rather have the entire URL in the table so that I can just do a single query (WHERE url = '/first-page/sub-page'), but this doesn't seem ideal: what if I change the slug of the parent page? Then I also need to update the url field for all of its descendants.
How do other people solve this issue? Are they putting it in the database? In a cache that maps /first-page/sub-page to the page's id? Or are they splitting the requested URL and looping through the database?
Thanks
Anders
Store it in a cache, because the web servers will need to look up URLs constantly. Unless you expect the URLs of pages to change very rapidly, caching will greatly reduce the load on the database, which is usually the bottleneck in database-driven web sites.
Basically, you want a dictionary that maps URL -> whatever you need to render the page. Many web servers will automatically use the operating system's file system as that dictionary, and often have a built-in cache that recognizes when a file changes in the file system. This will probably be much more efficient than anything you can write in your CMS. It might be better, therefore, to have your CMS implement the structure directly in the file system and handle additional mappings with hard or soft links.
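A minimal sketch of that dictionary idea in Python (the pages table and its columns are assumptions):

```python
import sqlite3

conn = sqlite3.connect("cms.db")   # placeholder database
url_cache = {}                     # URL -> page id, kept in memory

def page_id_for(url):
    """Resolve a requested URL to a page id: cache first, DB on a miss."""
    if url in url_cache:
        return url_cache[url]
    row = conn.execute(
        "SELECT id FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row:
        url_cache[url] = row[0]  # remember it for subsequent requests
        return row[0]
    return None  # no such page -> 404
```

In a multi-process setup you'd use a shared cache (e.g. memcached) instead of a per-process dictionary, but the lookup pattern is the same.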
I just did this for MvcCms. I went with the idea of content categories/subcategories and content pages. When a content category/subcategory is created, I go recursively through the parents, build the entire route, and then store it in the category table. Then, when a page is requested, I can find the correct content page, and while building the nav structure I can tell whether the item being built is the current/active route.
This approach requires some rules about what happens when a category is edited. The rule right now is that once the full path is set for a subcategory, it can't be changed later with the normal tools.
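A sketch of that approach in Python (the categories schema is an assumption), including the prefix rewrite you'd need if you did allow a slug to change:

```python
def build_route(conn, category_id):
    """Walk up the parent chain and assemble the full materialized route."""
    slugs, current = [], category_id
    while current is not None:
        slug, parent = conn.execute(
            "SELECT slug, parent_id FROM categories WHERE id = ?",
            (current,),
        ).fetchone()
        slugs.append(slug)
        current = parent
    return "/" + "/".join(reversed(slugs))

def rename_slug(conn, category_id, new_slug):
    """Change a slug and rewrite the stored routes of the whole subtree."""
    old = conn.execute(
        "SELECT route FROM categories WHERE id = ?", (category_id,)
    ).fetchone()[0]
    conn.execute(
        "UPDATE categories SET slug = ? WHERE id = ?", (new_slug, category_id)
    )
    new = build_route(conn, category_id)
    # Prefix rewrite: the category itself and everything under it.
    conn.execute(
        "UPDATE categories SET route = ? || substr(route, ?) "
        "WHERE route = ? OR route LIKE ? || '/%'",
        (new, len(old) + 1, old, old),
    )
```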
The source is at mvccms.codeplex.com.