How can I create a bookmarkable stateful page without versioning? - wicket

Wicket offers these concepts for pages and page links (afaik):
Bookmarkable Links do not depend on any session information. The URL may not contain session ids, version numbers, etc.
Stateful Pages are stored on the server so they can be used later in the session (e.g. for AJAX communication or for the browser's back function). Stateless pages are always created freshly for each request.
Page Versioning creates one version of a page instance per request and stores it in the session. Each version has a session-unique id that is used in the page links to address a specific version directly. The URL looks like this (the '8' indicates the 8th version of the profile page within this session): http://.../wicket7/profile?8
The Wicket documentation mentions these dependencies:
Stateless pages always have bookmarkable links (makes sense...)
Non-bookmarkable links always point to stateful pages (ok, the logical inverse...)
Stateful pages may have both, bookmarkable and non-bookmarkable links
It seems that stateful pages are always versioned. But I believe that there are situations where you want your pages stored, but not versioned. Furthermore, it seems to me that versioned pages have no bookmarkable link since the version id relies on the session. So these are my questions:
Are stateful pages always versioned? Is there a good practice to switch off versioning but keep storing stateful pages?

Frank,
If you don't want the version in the URL, I recommend adding the following code to your Application.init():
getRequestCycleSettings().setRenderStrategy(RenderStrategy.ONE_PASS_RENDER);
Look into RenderStrategy for more information.

Dynamically creating static routes from database using Next JS

I'm trying to understand how Next.js does dynamic routing, and I'm a little confused about how to implement it properly on my own website. Basically, I have a MySQL database of content that keeps growing (let's say blog posts), with images stored in GCS. From what I understand, you can create a pages/[id].js file in your pages folder that handles dynamically creating routes for new pages. But to get a good SEO score, Google's crawlers need to see your content before any JavaScript runs or data requests are made, so the pages have to be physically available for the content to appear instantly on load.
So if I have pages/[id].js and content is added to the database daily, how are physical content files supposed to spontaneously populate the pages folder? And if page files keep getting created, how do I prevent my disk from running out of space? I'm sure there is something I'm not understanding.
I read on nextjs.org that you can have a function getStaticPaths that needs to return a list of possible values for 'id'. I'm wondering, if my site is live and new content (pages) is constantly being added to the database with their own unique ids, how is it "aware" of those ids? Do I need to write a program or message queue system that constantly appends new ids to a file that is read by getStaticPaths? What if I have to duplicate my site on multiple servers around the world, or if a server dies, do I have to keep track of the file's contents in order to boot up a new server with the same content?
From what I understand, in order for Google to see any sort of content on your website, the page's text (content) needs to be static and quickly available via physical files. Images and everything else can be loaded later since Google's crawlers mainly care about text. So if every post needs to be a physical file in your app's pages folder, how do new page files get created when the content is added to the database?
TL;DR: My main concern is having my content readily available for Google's crawlers in order to get a good score for my website. How do I achieve that if content keeps being added to my database?
As you stated before, you can set up getStaticPaths to provide a list of values for id at build time. If I understand correctly, you are most concerned about what happens to new content added after the initial build.
For this you have to return the fallback key from getStaticPaths.
If fallback is false, any id not returned at build time will result in a 404, and you'd need to rebuild the app every time you add new content. This is what you don't want.
If you set it to true, the initial values are prerendered just like before, but new values will NOT return a 404. Instead, the first user visiting a path with a new id triggers the rendering of that new page. This allows you to check for new content dynamically when a request hits an id that wasn't available at build time.
Note that the first visitor will temporarily see a 'fallback' version of the page while Next.js processes the request. On that fallback you would usually just show a loading spinner. The server then passes the data to the client so it can render the full page. So in practice, the user first sees a loading indicator, then the page updates itself with the actual content. Subsequent visitors get the now-prerendered result immediately.
You may now be worried about crawlers hitting that fallback page and not getting SEO content. This concern has been addressed here: https://github.com/vercel/next.js/discussions/12482
Apart from being able to serve new pages after build, the fallback strategy has another use in that it allows you to prerender only a small subset of your website (like your most visited pages), while the other pages will be generated only when necessary.
From the docs, under "When is fallback: true useful?":
You may statically generate a small subset of pages and use fallback:
true for the rest. When someone requests a page that’s not generated
yet, the user will see the page with a loading indicator. Shortly
after, getStaticProps finishes and the page will be rendered with the
requested data. From now on, everyone who requests the same page will
get the statically pre-rendered page.
This ensures that users always have a fast experience while preserving
fast builds and the benefits of Static Generation.
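As a rough sketch of the fallback approach described in this answer, a pages/[id].tsx file could look something like the following. The Post type, the in-memory posts map and the helpers fetchPostIds and fetchPostById are hypothetical stand-ins for the MySQL queries. The result types are declared locally only to keep the sketch self-contained (a real project would import GetStaticPaths and GetStaticProps from next), and the React page component is omitted. The notFound return value assumes a reasonably recent Next.js version.

```typescript
// Hypothetical post shape and in-memory stand-in for the MySQL table.
type Post = { id: string; title: string; body: string };

const posts = new Map<string, Post>([
  ["1", { id: "1", title: "First post", body: "Hello" }],
  ["2", { id: "2", title: "Second post", body: "World" }],
]);

// Hypothetical data-access helpers; a real app would query the database here.
async function fetchPostIds(): Promise<string[]> {
  return Array.from(posts.keys());
}

async function fetchPostById(id: string): Promise<Post | undefined> {
  return posts.get(id);
}

// Local approximations of the Next.js result shapes, declared here so the
// sketch runs without the `next` package installed.
type StaticPathsResult = {
  paths: { params: { id: string } }[];
  fallback: boolean | "blocking";
};
type StaticPropsResult = { props: { post: Post } } | { notFound: true };

// Runs at build time: prerender every id that is known right now.
export async function getStaticPaths(): Promise<StaticPathsResult> {
  const ids = await fetchPostIds();
  return {
    paths: ids.map((id) => ({ params: { id } })),
    // true: ids that are not in `paths` are rendered on first request
    // instead of returning a 404.
    fallback: true,
  };
}

// Runs at build time for known ids, and on the server the first time an
// id that was not prerendered is requested.
export async function getStaticProps(context: {
  params: { id: string };
}): Promise<StaticPropsResult> {
  const post = await fetchPostById(context.params.id);
  if (!post) {
    // The id is not in the database either, so respond with a 404.
    return { notFound: true };
  }
  return { props: { post } };
}
```

With something like this in place, content added to the database after the build is picked up the first time its URL is requested, so no id file or message queue is needed. Newer Next.js versions also accept fallback: 'blocking', which makes the first request wait for the server-rendered page instead of showing a loading state.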

News Component Requirement

We have a requirement to create a News component. There will be news pages that we author, each containing a title, an image and a description. I will store these under one node, for example content\myproject\newsnode\news1, news2, and so on.
On the homepage, I want to show the descriptions of the 3 most recently authored news items. For that, I'm thinking of using a news component.
I thought of creating two components and mapping them: one component for the news page, and one component on the homepage that shows the latest 3 news items with title, tile image and a small description. I am thinking of using the Query Builder to fetch the latest news to show on the homepage.
Is there any other approach to this?
If you are using a Dispatcher, the QueryBuilder servlet is blocked by default, and it should stay blocked: exposing it publicly would let anyone run arbitrary queries against your repository.
Since your question is general, I will try to answer generally and on a high level.
There are two possible options I can think of:
1. Make a servlet to retrieve the last 3 news component information and expose them as JSON. Then send an AJAX request from your browser and change the view accordingly with jQuery or your front-end framework of choice.
Advantages: No caching, you'll always get the latest news.
Disadvantages: SEO, if you care about that in this case. Search engines may not index the news on the page, since it is not part of the initial markup (not server-side rendered).
2. Create a service to get the last 3 news component info then render them on your component via HTL or JSP. Basically server-side render them.
Advantages: SEO, same as the reason above.
Disadvantages: You have to invalidate the cache for your page every time a new news component is added to make sure your end users get the latest.
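Both options share the same core step: picking the three most recent news items. Purely as an illustration, here is a small TypeScript sketch of that selection as it might run in the browser for option 1, once the JSON has arrived. The NewsItem shape and its field names are assumptions, not an AEM API; for option 2 the same logic would live in the server-side service instead.

```typescript
// Hypothetical JSON shape returned by the custom servlet from option 1.
type NewsItem = {
  title: string;
  image: string;
  description: string;
  published: string; // ISO 8601 timestamp, e.g. "2020-05-01T09:00:00Z"
};

// Return the `count` most recently published items, newest first.
function latestNews(items: NewsItem[], count = 3): NewsItem[] {
  return items
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => Date.parse(b.published) - Date.parse(a.published))
    .slice(0, count);
}
```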
Hope this helps.

How do sites like Facebook ensure vanity URLs don't conflict with internal pages/folders?

Sites like FB offer you a shallow "vanity" URL of your choice to give out to people, like facebook.com/examplestore.
But because it's so shallow in depth, it could easily conflict with internal pages. For example, if I register the vanity URL facebook.com/jobs, then Facebook can never add a "Jobs" section to their site in the future without changing its URL. If they try to get rid of the existing page (maybe it's a fan page for Steve Jobs), then it would mess up SEO for that page anyway.
I have thought that there should be a list of reserved words stored in a database table and the choice of vanity URL could be checked against that. But is this the only way to do it?
It would mean having to brainstorm every single possible section/page you may EVER have on your site (e.g. settings, index,... potentially any other section in the site) and even then you could miss a few. Is there any alternative/better way to do it?
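For reference, the reserved-word check described in the question could be sketched roughly like this in TypeScript. The reserved names, the format rule and the function names are all assumptions made for illustration; in practice the reserved list and the set of taken names would come from the database table mentioned above.

```typescript
// Hypothetical reserved names: existing and planned top-level sections.
const reservedNames = new Set<string>(["jobs", "settings", "index", "login", "help"]);

// Lowercase and trim so "Jobs" and " jobs " cannot bypass the check.
function normalizeVanityName(name: string): string {
  return name.trim().toLowerCase();
}

type VanityCheck = { ok: true; name: string } | { ok: false; reason: string };

// Validate a requested vanity name against the format rule, the reserved
// list, and the names that other users have already claimed.
function checkVanityName(requested: string, taken: Set<string>): VanityCheck {
  const name = normalizeVanityName(requested);
  // Assumed format rule: 3 to 50 characters; letters, digits and dots only.
  if (!/^[a-z0-9.]{3,50}$/.test(name)) {
    return { ok: false, reason: "invalid format" };
  }
  if (reservedNames.has(name)) {
    return { ok: false, reason: "reserved" };
  }
  if (taken.has(name)) {
    return { ok: false, reason: "already taken" };
  }
  return { ok: true, name };
}
```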

Why are wicket pages serialized?

When I ask for a page, I see it serialized and stored on disk (and in the 2nd level cache), after it is rendered, so in the detach phase. Also the page itself is stored in the session.
When I ask for the page again, it is found in the session. So the serialized page is not consulted.
When I ask for the page in another session, the page is created anew. I thought that in this case the serialized page would have been used.
So can you give me an example, a scenario, where the serialized page is read from disk (or 2nd level cache)?
See this url trace:
Direct your browser to your app:
http://localhost:8080/
Wicket creates an instance of the homepage and redirects to:
http://localhost:8080/?0
Direct your browser to your app once again:
http://localhost:8080/
Wicket creates another instance of the homepage and redirects to:
http://localhost:8080/?1
Now press the back button so your browser requests the first instance again:
http://localhost:8080/?0
The first page instance is now deserialized from disk.
The HTTP session keeps a live reference only to the page that was used in the last request cycle. Any older pages exist only on disk. If your users use the browser's back button, the old instance is loaded from disk.
One file on disk stores the pages for each session, i.e. different users have different files with their own pages. Sharing the files would be a security issue; it would be like sharing the HTTP sessions.
Extra info: The disk storage is part of Wicket and used as default persistent storage. WicketStuff-DataStores module provides implementations with Redis, Hazelcast, Cassandra and Memcached. They could be used in case you want the old pages to be available in a cluster of web servers.

How best to setup 301 redirects from an old site that has many duplicate entries indexed on Google?

I am currently working with a client to redevelop their website. One of the final things I need to do before launch is to make sure that their old website's pages are correctly redirected to the URL structure of the new website.
Unfortunately, when I check Google to see how their current site is indexed, this relatively small website appears to have over 1500 pages indexed.
When I look at the indexed links on Google, many appear to be duplicates of the same page, but because of the terrible URI structure used on the old website, Google treats them differently.
For example, the 'Map' page is indexed at least twice on Google, under the following 2 URLs:
www.website.com/frame_page-map.html?mp_session=iris7k85851j05q55piqci31u3&mp_session=iris7k85851j05q55piqci31u3?page_code=map&mp_session=iris7k85851j05q55piqci31u3&mp_session=iris7k85851j05q55piqci31u3
www.website.com/frame_page-map.html?mp_session=sel6m8j5cu8lulep4dqa32sne7&mp_session=sel6m8j5cu8lulep4dqa32sne7?page_code=map&mp_session=sel6m8j5cu8lulep4dqa32sne7&mp_session=sel6m8j5cu8lulep4dqa32sne7
Only the session name is different in the URL (and I have no idea why it is repeated four times in a single URL, either).
For reference, the replacement URL for this page is:
www.website.com/contact/map
My question is: How do I set up a redirect for these multiple records on Google? Do I simply set up the redirect for the old URL minus all of the URI parameters (i.e. www.website.com/frame_page-map.html), or is there a better method?
Thanks for any help you might be able to offer!
It depends on what your goals are. If you don't care about the query strings, then set up a 301 (permanent redirect) that points to just your root page (map.html). To prevent Google from indexing query string parameters as separate pages, use the canonical tag and have it reference the parent. This isn't guaranteed to work, but Google takes your canonical into consideration when indexing.
If you care about the query string values, then you will have to set up a redirect for each one. There is a querystring parameter that you can append to your redirects that will tell it to be ignored so you don't have to write a regex that detects it.
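For the first option (ignoring the query string entirely), the lookup logic could be sketched as below. This only illustrates the idea in TypeScript and is not the configuration syntax of any particular web server; the redirects map and the resolveRedirect function are hypothetical names, and only the map page from the question is filled in.

```typescript
// Old path (without query string) mapped to its new location.
const redirects = new Map<string, string>([
  ["/frame_page-map.html", "/contact/map"],
]);

// Drop everything from the first "?" onwards, then look up the bare path.
// Returns null when the old path has no mapped replacement.
function resolveRedirect(
  requestTarget: string
): { status: 301; location: string } | null {
  const queryStart = requestTarget.indexOf("?");
  const path =
    queryStart === -1 ? requestTarget : requestTarget.slice(0, queryStart);
  const location = redirects.get(path);
  return location === undefined ? null : { status: 301, location };
}
```

Because the session parameters are discarded before the lookup, every indexed variant of the old map page ends up at the same new URL with a single rule.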