Written by Harini Krish
Lead Technical Content Writer
For a modern, software-defined business, a platform for data in motion is essential. It connects every part of the digital architecture across the organization and harnesses the flow of data between databases, applications, cloud ecosystems, and much more. Businesses use platforms like Apache Kafka to respond to ever-changing streams of data in real time. When data stops moving, the business comes to a halt. Because data in motion is the backbone of business-critical applications, any halt or interruption can bring down mission-critical processes and applications, causing major business disruption and data loss.
The new release of Confluent Platform 6.2 introduces Health+, which provides the tools and visibility needed to ensure the health of data-in-motion infrastructure and reduce business disruption. Health+ delivers three primary benefits that enhance the reliability of that infrastructure: intelligent alerts, cloud-based monitoring and visualizations, and an efficient support experience.
In this blog post, we will explore each of these benefits of Health+, along with additional enhancements in this release.
More and more enterprises use Kafka to power delightful customer experiences and data-driven backend operations in real time. However, Kafka lacks effective alerting mechanisms to help you find and troubleshoot issues within its ecosystem, which jeopardizes the success of these business-critical initiatives. It is also highly challenging to guarantee availability and resilience, and organizations often have to devote significant time, resources, and expertise to dealing with Kafka-related issues. Health+ helps reduce the risk of downtime and data loss with intelligent alerts that surface potential problems before they occur and prevent business disruption.
As cluster metadata is analyzed continuously through an extensive library of expert-tested rules and algorithms, Health+ provides insights into cluster performance and health, identifying potential problems before they occur. In addition, it provides a growing set of intelligent alerts accompanied by Confluent-backed recommendations to solve critical issues. It enables self-managed deployments to power reliable, durable real-time applications and systems without the heavy operational burden required to keep Kafka running smoothly.
There are currently ten validations available:
Request handler idle percentage
Network processor idle percentage
Active controller count
Offline partitions
Unclean leader elections
Under replicated partitions
Under min in-sync replicas
Disk usage
Unused topics
No metrics from a cluster in one hour
Further enhancements to these Health+ alerts are on the roadmap.
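To make the idea of rule-based validation concrete, here is a minimal Python sketch of how a metrics snapshot could be checked against a few of the validations listed above. It is not how Health+ is implemented: the metric keys, the thresholds, and the recommendation texts are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str                            # validation name from the list above
    metric: str                          # hypothetical key in the metrics snapshot
    is_healthy: Callable[[float], bool]  # True when the value looks fine
    advice: str                          # illustrative recommendation text


# Thresholds are illustrative assumptions, not Health+ defaults.
RULES: List[Rule] = [
    Rule("Request handler idle percentage", "request_handler_idle_pct",
         lambda v: v >= 0.30, "Brokers may be overloaded; consider adding capacity."),
    Rule("Active controller count", "active_controller_count",
         lambda v: v == 1, "Exactly one broker should be the active controller."),
    Rule("Offline partitions", "offline_partitions",
         lambda v: v == 0, "Some partitions have no leader; check broker availability."),
    Rule("Under replicated partitions", "under_replicated_partitions",
         lambda v: v == 0, "Replicas are lagging; check broker and network health."),
    Rule("Disk usage", "disk_used_fraction",
         lambda v: v < 0.80, "Free up disk space or review retention settings."),
]


def evaluate(snapshot: Dict[str, float]) -> List[str]:
    """Return one alert message per rule whose metric is present and unhealthy."""
    alerts = []
    for rule in RULES:
        value = snapshot.get(rule.metric)
        if value is not None and not rule.is_healthy(value):
            alerts.append(f"{rule.name}: {value} ({rule.advice})")
    return alerts


if __name__ == "__main__":
    # Hypothetical snapshot: two of the five rules should fire.
    sample = {
        "request_handler_idle_pct": 0.65,
        "active_controller_count": 1,
        "offline_partitions": 3,
        "under_replicated_partitions": 0,
        "disk_used_fraction": 0.95,
    }
    for alert in evaluate(sample):
        print(alert)
```

Each rule pairs a check with a recommendation, mirroring the way the alerts described above come with suggested remedies. Metrics missing from the snapshot are skipped rather than treated as failures.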
Health+ is built on a deep understanding of data-in-motion infrastructure, rooted in Confluent's experience managing thousands of clusters in Confluent Cloud, which carries a 99.95% uptime SLA. Confluent operates some of the largest use cases for data in motion and solves problems across thousands of clusters every week. That expertise matters because, without it, teams find it very challenging to determine the right metrics to track.
Additionally, you can customize the types of notifications that you receive and opt to receive them via Slack, email, or webhook, so that they fit seamlessly into your day-to-day workflows and operations. Each notification aims to prevent larger downtime or data loss by helping you identify minor issues before they become significant problems that disrupt the business.
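As a rough illustration of per-alert notification preferences, the following Python sketch maps alert types to the delivery channels mentioned above. The preference structure, the alert type keys, and the `route` function are hypothetical and do not reflect the actual Health+ configuration format.

```python
from typing import Dict, List, Set

# Delivery channels named in this post.
CHANNELS: Set[str] = {"slack", "email", "webhook"}


def route(alert_type: str, preferences: Dict[str, List[str]]) -> List[str]:
    """Return the channels configured for alert_type.

    Alert types with no configured preference produce no notification.
    """
    selected = preferences.get(alert_type, [])
    unknown = [c for c in selected if c not in CHANNELS]
    if unknown:
        raise ValueError(f"unsupported channel(s): {unknown}")
    return selected


if __name__ == "__main__":
    # Hypothetical preferences: critical issues go to chat and email,
    # low-priority ones only to a webhook.
    prefs = {
        "offline_partitions": ["slack", "email"],
        "unused_topics": ["webhook"],
    }
    print(route("offline_partitions", prefs))
    print(route("disk_usage", prefs))
```

Keeping the routing table separate from the alert rules lets a team change who gets notified without touching the checks themselves.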
Kafka is frequently used for real-time, mission-critical applications. Still, it lacks GUI-driven monitoring to ensure that systems are built to deliver on the promises of data availability and business SLAs. Monitoring therefore requires third-party tooling, and nearly all of these tools fail to provide the most critical metrics around the health, performance, and availability of the Kafka environment. Health+ helps ensure the performance and stability of your environments and lets you troubleshoot issues quickly through historical visualizations and real-time monitoring data.
As customers embrace data in motion throughout their organizations, their monitoring needs increase as well. Health+ lets you view the most important monitoring metrics in a single dashboard.
Not only can you view real-time and historical monitoring data for the connected Confluent Platform services, but you also receive insights and recommendations. Health+ helps businesses that run the self-managed platform for data in motion gain the advantages of a cloud-native solution, making the troubleshooting process targeted, efficient, and context-driven.
Supporting Kafka in-house can be expensive and resource-intensive, and it often leads to long and costly issue resolution times. Additionally, when problems occur, the operations support team often lacks easy access to the information needed to identify and address issues immediately and reduce interruption.
The metadata provided for Health+ is the same data that you would send in a support ticket—it is just provided securely on an ongoing basis to ensure the continued health of your environment. Manually sending the data is cumbersome and error-prone, especially if a critical issue arises and you have to provide us with the data while under pressure.
With Health+, you can securely share contextual data without manual entry for a smoother enterprise support experience. Sharing this contextual information helps resolve issues significantly faster and removes many of the manual steps needed to arrive at a targeted solution. The streamlined support experience helps diagnose problems quickly and significantly reduces the time to resolution.
In addition to Health+, Confluent Platform 6.2 delivers improvements to existing features that make it a comprehensive platform for data in motion, able to implement mission-critical use cases end to end.
Confluent Platform 6.2 adds a failover command to Cluster Linking, making it easier to recover from disaster events. For example, if you have created a backup cluster in another region and synced data to it with Cluster Linking, you can fail over to the backup cluster with a single command. This makes disaster recovery failover easy, intuitive, and quick, allowing operators to achieve lower recovery time objectives (RTOs) with an easier-to-manage infrastructure while ensuring high availability and resiliency of the Kafka deployment.
Royal Cyber is a Confluent Partner and has deep expertise in implementing Confluent Services for several global customers. Royal Cyber can be your efficient partner in implementing the latest release of Confluent in your business. Reach out to us by writing to info@royalcyber.com or visit www.royalcyber.com.