Banking and Modern Observability: Dynatrace Insights

Andreas (Andi) Grabner

How Observability is Changing

Observability has existed for as long as software engineering itself, but it has clearly changed over the years.

If we look back 15 years, the classic three-tier application stack (web server, app server, database) with application runtimes such as Java and .NET gave rise to a new category of observability tools called Application Performance Monitoring (APM).

These APM tools focused on monitoring relatively static infrastructure and the dynamic behaviour of application runtimes and business transactions. Financial organisations that connected these three-tier applications to their backend mainframe systems also started leveraging APM, as some vendors expanded their support to z/OS, CICS and IMS to monitor end-to-end financial transactions.

Because those applications changed only occasionally, problems and incidents could be detected relatively easily by analysing the collected metrics, logs and distributed traces.

Fast forward to today: infrastructure is no longer static. It runs on a multi-layer virtualised stack of VMs, container orchestration platforms (such as Kubernetes or OpenShift), or even serverless PaaS (Platform as a Service) offerings distributed across global data centres. Applications are no longer three-tier, but rather a mesh of services ranging from self-developed cloud services to all kinds of SaaS-hosted third-party services.

Why It’s Time for a New Approach to Observability for Banks

Applications change continuously as new features are released, and the virtual infrastructure is continuously adapted to changes in load and resource consumption. Deployments are now decoupled from releases through progressive delivery techniques such as canary deployments and feature flagging, which reduce the risk of failure, especially for the critical banking systems that power globally connected financial transaction networks.
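To make the decoupling of deployment from release more concrete, here is a minimal Python sketch of a percentage-based rollout, one common way canary releases and feature flags expose new code to only a subset of users. It assumes each user is assigned to a stable bucket by hashing a flag key together with their user ID; the function names and the flag key are hypothetical and not taken from any particular feature flagging product.

```python
import hashlib


def rollout_bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket between 0 and 99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer and reduce it to 0-99.
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """Release the feature only to users whose bucket is below the rollout percentage."""
    return rollout_bucket(flag_key, user_id) < rollout_percent


if __name__ == "__main__":
    # Hypothetical flag key: the code is deployed everywhere but released to 10% of users.
    users = [f"customer-{i}" for i in range(1000)]
    released = [u for u in users if is_enabled("new-payments-ui", u, 10)]
    print(f"{len(released)} of {len(users)} users see the new feature")
```

Because each user's bucket is stable, raising the rollout from 10% to 50% keeps the original 10% of users enabled, and dropping it back to 0% acts as a release rollback without any redeployment.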

Services are becoming increasingly complex, though, and the shift towards releasing new features dynamically also means that exposure to security vulnerabilities and attacks is rising sharply.

This explosion of complexity (and the need to provide resilient and secure services) requires a new approach to observability. This new approach needs to be able to cope with constantly changing distributed virtual and cloud-based infrastructure. It also needs to handle continuously changing services and configurations, and the increased need to observe and mitigate any potential security vulnerabilities.

Without adopting modern observability practices, many organisations will not be able to:

• Identify and fix problems that impact end-user experience and result in lost business

• Identify resource usage inefficiencies that result in higher operational costs

• Identify and mitigate security threats that jeopardise your business’s integrity

• Identify and roll back progressive delivery changes that threaten your systems’ stability

What are the Hurdles to Modernising for Banks?

Like most industries, banks and financial institutions need to modernise their technology stack and software delivery processes for multiple reasons, including:

• Delivering modern and competitive digital experiences for end users

• Speeding up feature delivery to react faster to market changes

• Attracting new software engineering talent

• Providing and consuming APIs to connect with the larger financial ecosystem

Any type of change to an existing system comes with a certain risk, though, and there is understandable hesitancy to change at the organisational level. For example, new cloud-native environments are complex, and microservice architectures and dependencies on third-party API providers are often seen as blockers.

An adapted, modern approach to observability is still the way forward, though. With modern observability, you can better understand the potential risk, impact and root cause of changes.

Shifting Left to Modern Observability

Traditionally, observability was implemented in production, giving operations teams insight into an otherwise unknown system (a black box) so that they could identify and address potential problem areas when issues arose.

With the increased complexity of modern cloud-native environments, observability can no longer be treated as an afterthought. It must be included as a software development requirement, hence the term “shift-left”.

Shift-left means that developers (who know the critical components and code best) must define what level of observability they need to determine whether their software runs as expected. This can be done through agent-based observability solutions that automatically instrument code, or through developers leveraging standard frameworks like OpenTelemetry to emit traces, metrics or logs directly from their custom code.
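As an illustration of developers emitting telemetry from their own code, the sketch below hand-rolls a span-like record around a business operation using only the Python standard library. It is deliberately not the OpenTelemetry API: with OpenTelemetry you would obtain a tracer via `trace.get_tracer(__name__)`, wrap the work in `tracer.start_as_current_span(...)`, call `span.set_attribute(...)`, and let an exporter ship the data to your backend. The `span` helper, the `EXPORTED_SPANS` list and the `transfer_funds` function are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink standing in for a real telemetry exporter.
EXPORTED_SPANS = []


@contextmanager
def span(name, **attributes):
    """Time a block of custom code and record it as a span-like dict."""
    record = {
        "span_id": uuid.uuid4().hex[:16],
        "name": name,
        "attributes": dict(attributes),
        "status": "OK",
    }
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        # Mark the span as failed and keep the exception type as an attribute.
        record["status"] = "ERROR"
        record["attributes"]["exception.type"] = type(exc).__name__
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000.0
        EXPORTED_SPANS.append(record)


def transfer_funds(amount, currency):
    """Hypothetical business operation instrumented by its developer."""
    with span("transfer_funds", currency=currency) as s:
        # The developer decides which business context is worth observing.
        s["attributes"]["amount_bucket"] = "large" if amount >= 10_000 else "small"
        return amount
```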

A shift-left approach also means that observability data must be collected and analysed in every stage of your software delivery pipeline. Software quality gates must include checks on whether expected observability data is collected successfully and whether systems are behaving as expected. The relevant observability data must also automatically be forwarded to the right people and tools to make automated decisions in case of any anomalies.
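A quality gate of this kind can be sketched as a function that compares the metrics observed in a pipeline stage against objectives and fails when data is missing or a threshold is breached. This is a minimal illustration, assuming the metrics have already been queried from the observability platform into a plain dictionary; the metric names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    metric: str        # hypothetical metric name, e.g. "p95_latency_ms"
    max_value: float   # the gate fails if the observed value exceeds this


def evaluate_gate(observed: dict, objectives: list) -> tuple:
    """Return ("pass" | "fail", reasons) for one pipeline stage."""
    reasons = []
    for obj in objectives:
        value = observed.get(obj.metric)
        if value is None:
            # Missing data counts as a failure: the expected signal was not collected.
            reasons.append(f"no data for {obj.metric}")
        elif value > obj.max_value:
            reasons.append(f"{obj.metric}={value} exceeds {obj.max_value}")
    return ("pass" if not reasons else "fail", reasons)


# Hypothetical objectives for a pre-production stage.
objectives = [Objective("p95_latency_ms", 250.0), Objective("error_rate", 0.01)]
```

Treating missing data as a failure is a deliberate choice: a stage that emits no telemetry should not pass simply because nothing looked wrong.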

How to Shift to Modern Observability

Some First Steps

Observability must now be a primary focus. It can no longer be seen as a cost centre or an afterthought. So how can you shift the approach to observability in your teams’ processes?

First, it’s important to recognise that observability is a business differentiator and has to become part of the software development process by adding it as a non-functional requirement to every newly created software component.

Best practices for doing this include educating developers on the available observability signals (logs, metrics, traces, events, etc.) and on the use of modern observability frameworks (such as OpenTelemetry). This can be achieved through training sessions, through development processes, and by shifting team culture and mindset so that observability is treated as part of development.

Secondly, observability platforms that collect all observable data should be made available as a self-service offering, from the first developer environment up to production, so that teams can access and analyse the data with minimal or no effort.

Additionally, just as software is tested during development, observability data should be validated as part of the software delivery process. Modern observability platforms integrate well with CI/CD solutions, so you can validate that all logs, metrics, traces and so on are captured and analysed the way developers coded or configured them. This ensures that all relevant observability data is available in production to support healthy operations and the troubleshooting of unexpected problems.
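One way to validate that telemetry is captured as coded is a check that runs after the automated tests and compares the spans they produced against what the developers declared. The sketch below assumes the captured spans are available as dictionaries with `name` and `attributes` keys (for example, collected by a test exporter); the expectation format and span names are hypothetical.

```python
def telemetry_problems(expected: dict, captured_spans: list) -> list:
    """Compare captured spans with declared expectations; an empty list means all checks passed."""
    by_name = {}
    for s in captured_spans:
        by_name.setdefault(s["name"], []).append(s)

    problems = []
    for span_name, required_attrs in expected.items():
        spans = by_name.get(span_name, [])
        if not spans:
            problems.append(f"span '{span_name}' was never emitted")
            continue
        for attr in required_attrs:
            # Every emitted instance of the span must carry the attribute.
            if not all(attr in s.get("attributes", {}) for s in spans):
                problems.append(f"span '{span_name}' lacks attribute '{attr}'")
    return problems


# Hypothetical expectations declared alongside the code.
EXPECTED = {
    "transfer_funds": ["currency", "amount_bucket"],
    "fraud_check": ["risk_score"],
}
```

In a CI job, a non-empty result would fail the build (for example by exiting with a non-zero status), so missing instrumentation is caught before it reaches production.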

Many observability vendors also have experts on hand who can work with you to assess your current development processes and help implement changes.

Observability in the Context of Progressive Delivery

How Can You Modernise With Observability and Feature Flags?

Progressive delivery, the technique used to decouple deployments from releases, is a key component in keeping your systems reliable, resilient and secure. Feature flags have become a very popular part of this over the last few years. Use cases range from rolling out new end-user features to supporting Operations teams with auto-remediation tasks by dynamically changing code behaviour without having to redeploy.

Feature flags can only be used successfully when their use and impact are observable. It is therefore necessary to make feature flags observable by default.

Modern feature flagging and modern observability solutions can work together to provide out-of-the-box insights into things like:

• Which feature flags are used by which end user

• How the feature flags impact the users’ experience

• Whether a new feature negatively impacts the underlying dynamic infrastructure
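To show what flag-aware telemetry might capture, here is a minimal sketch that records each request together with the flag variant the user received, then answers two of the questions listed above: which users saw which variant, and how each variant affected response time and error rate. It assumes an in-memory store and hypothetical flag names; in practice these records would be attributes on traces or events in the observability platform.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class FlagTelemetry:
    # Each record: (flag, variant, user_id, response_ms, is_error)
    records: list = field(default_factory=list)

    def record(self, flag, variant, user_id, response_ms, is_error):
        self.records.append((flag, variant, user_id, response_ms, is_error))

    def users_by_variant(self, flag):
        """Which end users received which variant of a flag."""
        users = defaultdict(set)
        for f, variant, user_id, _, _ in self.records:
            if f == flag:
                users[variant].add(user_id)
        return dict(users)

    def impact_by_variant(self, flag):
        """Average response time and error rate per variant, for side-by-side comparison."""
        rows = defaultdict(list)
        for f, variant, _, ms, err in self.records:
            if f == flag:
                rows[variant].append((ms, err))
        return {
            variant: {
                "requests": len(vals),
                "avg_ms": mean(ms for ms, _ in vals),
                "error_rate": sum(err for _, err in vals) / len(vals),
            }
            for variant, vals in rows.items()
        }
```

Comparing variants side by side is what lets a team spot that a newly enabled feature is slowing down or breaking requests before widening the rollout.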

For compliance reasons, observability tools also keep track of when features were enabled and disabled, providing an audit trail in case a feature had an undesired impact.
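The audit requirement can be sketched as an append-only log of flag changes that can answer “what was this flag’s state at time T?” when investigating an incident. This is a minimal illustration with hypothetical field and class names, not how any particular tool stores its audit data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FlagChange:
    timestamp: datetime   # when the change took effect (timezone-aware UTC)
    flag: str
    enabled: bool
    changed_by: str       # user or automation that made the change


class FlagAuditLog:
    """Append-only history of flag changes (hypothetical structure)."""

    def __init__(self) -> None:
        self._changes: list = []

    def record(self, flag: str, enabled: bool, changed_by: str,
               when: Optional[datetime] = None) -> FlagChange:
        timestamp = when if when is not None else datetime.now(timezone.utc)
        change = FlagChange(timestamp, flag, enabled, changed_by)
        if self._changes and change.timestamp < self._changes[-1].timestamp:
            raise ValueError("audit entries must be recorded in chronological order")
        self._changes.append(change)
        return change

    def state_at(self, flag: str, when: datetime) -> Optional[bool]:
        """Return the flag's state at `when`, or None if it had not been set yet."""
        state = None
        for change in self._changes:
            if change.timestamp > when:
                break  # entries are chronological, so nothing later can apply
            if change.flag == flag:
                state = change.enabled
        return state

    def history(self, flag: str) -> list:
        return [c for c in self._changes if c.flag == flag]
```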

What’s more, modern observability can be applied to the feature flagging solutions themselves to ensure they are always able to deliver the right feature flagging configuration to the business app that is using them. Feature flagging tools and their backends are becoming critical system components and they have to be as resilient and available as all other components.
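Monitoring the flagging system itself can start in the client: if the flag backend cannot be reached, the application should keep serving the last configuration it fetched successfully and count the failure so that it shows up in monitoring. The sketch below assumes a hypothetical `fetch` callable that loads the flag configuration; it is not the API of any particular feature flagging SDK.

```python
from typing import Callable, Dict, Optional


class ResilientFlagClient:
    """Serve flags from the last known-good configuration when the backend fails."""

    def __init__(self, fetch: Callable[[], Dict[str, bool]], defaults: Dict[str, bool]) -> None:
        self._fetch = fetch                  # hypothetical loader for the flag backend
        self._defaults = dict(defaults)      # safe values used before the first successful fetch
        self._last_good: Optional[Dict[str, bool]] = None
        self.fetch_failures = 0              # counter to export as a health metric

    def refresh(self) -> bool:
        """Try to reload the configuration; keep the previous one on failure."""
        try:
            config = self._fetch()
        except Exception:
            self.fetch_failures += 1
            return False
        self._last_good = dict(config)
        return True

    def is_enabled(self, flag: str) -> bool:
        source = self._last_good if self._last_good is not None else self._defaults
        # Unknown flags fall back to the defaults, and finally to "off".
        return bool(source.get(flag, self._defaults.get(flag, False)))
```

Exporting `fetch_failures` (or the age of the last successful refresh) as a metric is what turns this fallback from a silent workaround into something operations teams can see and alert on.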

(This is where the Flagsmith-Dynatrace integration comes into play, letting you send flag change events from Flagsmith into your Dynatrace event stream.)

CONCLUSION

Modern Observability Leads to More Secure Development for Banks

Banking organisations that implement modern observability can make it part of their software delivery requirements and enable developers to think about observability right from the start. Those organisations end up delivering innovation faster, more reliably and more securely.

This article is part of Modern Development Practices in Banking: A Playbook.