
Hi Community, 

 

We are planning to implement the Global Server Selector, and I'm wondering how the custom healthchecks work. We can achieve what we need with the onboard healthchecks, but I'm curious about the custom healthchecks since it is nowhere described how to actually implement one.

 

Has anybody already implemented a custom healthcheck and can share more insights on them? 

 

Thx 

@augustineas & @dmuscat do you have any input here? I feel as though I remember an old discussion where @augustineas mentioned health checks.

In case you haven’t seen them yet and they could be of some use we do have some Feature of the Month videos on these topics: 

 


I can’t speak to custom health checks in GSS specifically, but I have learned from sad experience that for modern complex web apps, simple HTTP checks looking at the response code often aren’t enough (especially when apps don’t use standard response codes the way people expect). Even checking for “normal” application content in the response hasn’t been enough to tell if the application is fully “up”, especially with the dynamic page content that is typical these days.  And there are different schools of thought when it comes to doing health checks and monitoring in general, and you will need to research what makes sense for your organization and potentially each app.

That said, for what it's worth, what we ended up doing for more complex apps at the time was what we called “canary” (as in “canary in a coal mine”) pages. These were written by the developers so that when the particular page was requested, the backend system would test all important functions of the application and return some response indicating that everything was working as expected. This had the benefit of being a developer-maintained test, so the people who (arguably) knew the app the best could design the health check to suit their specific needs, making it as complex or simple as needed, and that kept control of the monitoring inside the app itself. Also, if a failure wasn’t detected, or in the opposite case a non-failure was reported as a failure, people knew that the developers were responsible, rather than blaming the load balancers for not doing the right thing. This kept the complexity out of the health checks on the load balancer side (which was important for scale when checking thousands of things), and allowed us to go back to the simpler HTTP check method, looking for the specific responses that we standardized on with developer input to indicate that everything truly was “up”.
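To make the idea a bit more concrete, here is a minimal sketch of a canary page in Python. It is not GSS-specific and not what we actually ran. The marker strings (`CANARY_OK`, `CANARY_FAIL`), the function names, and the example checks are all hypothetical placeholders for whatever your developers standardize on.

```python
from typing import Callable, Dict, Tuple

# Hypothetical marker string (an assumption for illustration); use whatever
# response your developers and load balancer team agree on.
OK_MARKER = "CANARY_OK"


def run_canary(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """App side: run every named check and build the canary page response."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises is treated as a failed check.
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, "CANARY_FAIL: " + ", ".join(failed)
    return 200, OK_MARKER


def is_healthy(status: int, body: str) -> bool:
    """Checker side: require both the status code and the agreed marker."""
    return status == 200 and OK_MARKER in body


if __name__ == "__main__":
    # Hypothetical checks; real ones would query the database, cache, etc.
    checks = {"database": lambda: True, "cache": lambda: False}
    status, body = run_canary(checks)
    print(status, body)              # 503 CANARY_FAIL: cache
    print(is_healthy(status, body))  # False
```

The point of the split is that all the application knowledge lives in `run_canary`, while the check on the load balancer side stays a plain "status code plus expected string" match.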

It doesn’t really answer your question about validating whether the custom health checks are working as intended, though.


@augustineas I was so fascinated to read about this. Although, as you mentioned, it doesn’t completely answer the question, I think it’s a super valuable addition. Perhaps someone else can weigh in on this one. Out of curiosity, is “canary” a commonly used term for this kind of tactic? Or was it something just your stakeholders used? Asking in case others want to research it and try searching for the term.


The use of the term “canary” combined with something else is somewhat common in our industry. See, for example, https://kzero.com/resources/glossary/canary-cyber-security-definition/ for several variants in the security world, or https://semaphore.io/blog/what-is-canary-deployment for a DevOps-related term.

That said, I suspect the term “canary page”, as we used it at least, was unique to the company at the time. It may well have derived from the term “warrant canary” (https://en.wikipedia.org/wiki/Warrant_canary), which was becoming popular at the time, or from other canary terms, but I couldn’t say for certain.


Really appreciate the clarification and added context around the term, @augustineas 🙌🏻

@dmuscat @rharolde have either of you implemented custom health checks and have any perspective to share? I feel as though I am remembering some discussions from the past, but I could be mistaken.

