BankNext Case Study - Troubleshoot Production w/ ServiceMesh Istio Metrics - Part 1

Vijay Redkar
Nov 20, 2021

Proactively aggregate metrics across the entire production estate w/ ServiceMesh Istio configuration, without any code change

BankNext faces a formidable challenge: consolidating a real-time view of its huge number of production microservices (> 300). Given the sheer number of mSvcs, manually changing each of them is practically infeasible. Without real-time health metrics, the bank is blind and incapable of proactively preempting a brewing production disaster.

Current Architecture Challenges

  1. Multiple tools maintenance
    a. Grafana: machine-level statistics, CPU, Heap, PODs
    b. Kibana APM: txn TPS, avg/peak throughput
    c. Zipkin: tracing txns
    d. Jaeger: span details
    e. VisualVM: Memory, CPU, Thread, Deadlock, Garbage collection stats
    f. JProfiler: Memory, CPU, Thread, Deadlock, Garbage collection stats
  2. Operational challenges
    a. Requires enhancement in each mSvc to enable metrics
    b. Manual changes require tedious regression testing
    c. Maintaining multiple tools is an operational nightmare
    d. Inability to preempt a production breakdown
    e. Exposes business to high risk & unpredictability
    f. Excessive system recovery time to emerge from a production incident
  3. Houston, we have a problem!

Solution w/ New Istio ServiceMesh Metrics Approach

  1. Engineering Objectives
    a. End-to-end view of the entire production application flow
    b. Integrated view of Logs + Metrics + Traces
    c. Ease of use: no additional coding
  2. Solution Approach & Capabilities
    a. Utilize Istio ServiceMesh’s Kiali metrics aggregation
    b. Provides a full view of mSvc workloads across namespaces and components
    c. Seamless navigation from logs to traces to span details
    d. Complete dashboard view of health metrics
    e. Real-time statistics and error views, updated dynamically
    f. Effective & expedited problem troubleshooting
  3. Part 1 of this article explains how to set up this architecture
  4. Part 2 will explore production incident root cause analysis w/ Kiali
Kiali integrated view — creditCheck mSvcs w/ Kafka + Mongo

Detailed implementation steps

  1. Application setup
    Create the ServiceMesh with Docker, K8s and Istio sidecars
    GitHub: creditCheck mSvcs with Kafka + Mongo + Kiali integration
minikube stop
minikube delete
minikube start --driver=docker
docker login
eval $(minikube docker-env)   # point the docker CLI at minikube's Docker daemon
istioctl install --set profile=demo -y
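Before running the setup above, it can help to confirm that every CLI this walkthrough relies on is on the PATH. Below is a minimal POSIX-shell sketch. The function name `missing_tools` is hypothetical and is not part of the author's repo.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is NOT found on the PATH.
# Prints nothing and returns 0 when every tool is available; returns 1 otherwise.
missing_tools() (
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  exit "$status"   # exits only the subshell that forms the function body
)

# Example: the CLIs used throughout this walkthrough.
# missing_tools minikube kubectl istioctl docker mvn || echo "install the tools listed above first"
```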

2. Update application config w/ Istio

  • Obtain the Istio ingress gateway IP (the minikube node IP)
    minikube ip   # e.g. 192.168.49.2
  • Obtain the Istio ingress gateway port (NodePort of the http2 port)
kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'   # e.g. 32511
  • Update kyc-aggregator-mgt
\kyc-aggregator-mgt\src\main\resources\application.properties
istio-base-url=http://192.168.49.2:
istio-gateway-port=32511
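minikube can hand out a new IP or NodePort after each restart, so these two values may change. Instead of editing application.properties by hand, a small script can rewrite them. This is a sketch that assumes the property keys shown above and a plain `key=value` file. The function `set_prop` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a .properties file,
# replacing an existing KEY line or appending one if the key is absent.
set_prop() (
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  # awk is used instead of sed to avoid escaping the ':' and '/' in URLs.
  awk -v k="$key" -v v="$value" '
    BEGIN { done = 0 }
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
)

# Example usage (requires a running minikube with Istio installed):
# PROPS=kyc-aggregator-mgt/src/main/resources/application.properties
# PORT=$(kubectl -n istio-system get service istio-ingressgateway \
#   -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
# set_prop "$PROPS" istio-base-url "http://$(minikube ip):"
# set_prop "$PROPS" istio-gateway-port "$PORT"
```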

3. Kafka on Minikube w/ Istio enabled

cd /c/Vijay/Java/projects/minikube-kafka-cluster
kubectl apply -f 00-namespace/
kubectl label namespace kafka-ca1 istio-injection=enabled
kubectl label namespace default istio-injection=enabled
kubectl apply -f 01-zookeeper/
kubectl apply -f 02-kafka/
kubectl get pods -n kafka-ca1 -o wide
echo " Check POD status. Ensure all PODs 2/2 "
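"2/2" means both the application container and the injected istio-proxy sidecar are ready. The READY column can be checked mechanically instead of by eye. The sketch below uses a hypothetical helper, `not_ready_pods`, which reads `kubectl get pods --no-headers` output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the name of every pod whose READY column is not "n/n".
# Returns non-zero if any pod is not fully ready (empty input counts as ready).
not_ready_pods() {
  awk '
    {
      split($2, r, "/")            # READY column, e.g. "1/2"
      if (r[1] != r[2]) { print $1; bad = 1 }
    }
    END { exit bad + 0 }
  '
}

# Example usage:
# kubectl get pods -n kafka-ca1 --no-headers | not_ready_pods && echo "all pods ready"
```

Alternatively, kubectl can block until pods report Ready, e.g. `kubectl wait --for=condition=Ready pods --all -n kafka-ca1 --timeout=300s`.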

4. Mongo on Minikube w/ Istio enabled

  • Mongo deploy
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/networking
kubectl create ns mongo
kubectl label namespace mongo istio-injection=enabled
kubectl label namespace default istio-injection=enabled
kubectl create -f operations_mongo-deployment.yml -n mongo
kubectl get pods -n mongo -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • Update kyc-credit-check-advanced
\kyc-credit-check-advanced\src\main\resources\application.properties
spring.data.mongodb.uri=mongodb://mongo-nodeport-svc.mongo:27017/kyc
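The host in that URI follows Kubernetes service DNS naming. Any pod in the cluster can resolve `<service>.<namespace>`. The fully qualified form is `<service>.<namespace>.svc.cluster.local`, assuming the default `cluster.local` domain. The hypothetical helper below makes the parts explicit.

```shell
#!/bin/sh
# Hypothetical helper: build a MongoDB URI from Kubernetes service coordinates.
# Usage: mongo_uri SERVICE NAMESPACE PORT DATABASE [fqdn]
mongo_uri() (
  host="$1.$2"
  # Optional 5th arg "fqdn" appends the cluster domain (assumed to be cluster.local).
  if [ "${5:-}" = "fqdn" ]; then
    host="$host.svc.cluster.local"
  fi
  echo "mongodb://$host:$3/$4"
)

# Example: reproduces the value used in kyc-credit-check-advanced.
# mongo_uri mongo-nodeport-svc mongo 27017 kyc
```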

5. Docker images

  • kyc-aggregator-mgt
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-aggregator-mgt
docker rmi kyc-aggregator-mgt:latest
docker rmi -f kyc-aggregator-mgt:latest
docker rmi -f vijayredkar/kyc-aggregator-mgt:latest
mvn clean install
docker build -t kyc-aggregator-mgt -f Dockerfile .
docker image ls
docker tag kyc-aggregator-mgt vijayredkar/kyc-aggregator-mgt:latest
docker push vijayredkar/kyc-aggregator-mgt
  • kyc-credit-check-basic
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-basic
docker rmi kyc-credit-check-basic:latest
docker rmi -f kyc-credit-check-basic:latest
docker rmi -f vijayredkar/kyc-credit-check-basic:latest
mvn clean install
docker build -t kyc-credit-check-basic -f Dockerfile .
docker image ls
docker tag kyc-credit-check-basic vijayredkar/kyc-credit-check-basic:latest
docker push vijayredkar/kyc-credit-check-basic
  • kyc-credit-check-advanced
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-advanced
docker rmi kyc-credit-check-advanced:latest
docker rmi -f kyc-credit-check-advanced:latest
docker rmi -f vijayredkar/kyc-credit-check-advanced:latest
mvn clean install
docker build -t kyc-credit-check-advanced -f Dockerfile .
docker image ls
docker tag kyc-credit-check-advanced vijayredkar/kyc-credit-check-advanced:latest
docker push vijayredkar/kyc-credit-check-advanced
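The three image builds above differ only in the module name, so they can be folded into one loop. This sketch assumes the same directory layout and the `vijayredkar` Docker Hub namespace used above. `build_and_push` and the `DRY_RUN` switch are hypothetical additions; with `DRY_RUN` set, the commands are printed instead of run.

```shell
#!/bin/sh
# Project root and registry user, defaulting to the values used in the steps above.
PROJECT_ROOT=${PROJECT_ROOT:-/c/Vijay/Java/projects/kyc-k8-docker-istio}
REGISTRY_USER=${REGISTRY_USER:-vijayredkar}

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: rebuild, tag and push one module's image.
build_and_push() (
  svc=$1
  run cd "$PROJECT_ROOT/$svc" || exit 1
  # Remove stale local tags; ignore "No such image" errors.
  run docker rmi -f "$svc:latest" "$REGISTRY_USER/$svc:latest" || true
  run mvn clean install || exit 1
  run docker build -t "$svc" -f Dockerfile . || exit 1
  run docker tag "$svc" "$REGISTRY_USER/$svc:latest" || exit 1
  run docker push "$REGISTRY_USER/$svc:latest"
)

# Example:
# for svc in kyc-aggregator-mgt kyc-credit-check-basic kyc-credit-check-advanced; do
#   build_and_push "$svc" || break
# done
```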

6. PODs

  • kyc-aggregator-mgt
kubectl create ns consumer
kubectl label namespace consumer istio-injection=enabled
kubectl create -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-aggregator-mgt-k8-istio.yml -n consumer
kubectl get pods -n consumer -o wide
kubectl get services -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • kyc-credit-check-basic
kubectl create ns basic
kubectl label namespace basic istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-basic-k8-istio.yml -n basic
kubectl get pods -n basic -o wide
kubectl get services -n basic -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • kyc-credit-check-advanced
kubectl create ns advanced
kubectl label namespace advanced istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-advanced-k8-istio.yml -n advanced
kubectl get pods -n advanced -o wide
kubectl get services -n advanced -o wide
echo " Check POD status. Ensure all PODs 2/2 "
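Step 6 repeats the same create/label/apply sequence for each namespace. The sketch below captures that pattern in one function. `deploy_ns` and `DRY_RUN` are hypothetical; the manifest names are the ones used above.

```shell
#!/bin/sh
NETWORKING_DIR=${NETWORKING_DIR:-/c/Vijay/Java/projects/kyc-k8-docker-istio/networking}

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: create an Istio-injected namespace and deploy one manifest into it.
# Usage: deploy_ns NAMESPACE MANIFEST_FILE
deploy_ns() (
  ns=$1 manifest=$2
  # Tolerate re-runs: ignore "AlreadyExists" from create.
  run kubectl create ns "$ns" || true
  # Sidecars are injected at pod creation, so the label must be set before applying.
  run kubectl label namespace "$ns" istio-injection=enabled --overwrite || exit 1
  run kubectl apply -f "$NETWORKING_DIR/$manifest" -n "$ns"
)

# Example:
# deploy_ns consumer operations_kyc-aggregator-mgt-k8-istio.yml
# deploy_ns basic    operations_kyc-credit-check-basic-k8-istio.yml
# deploy_ns advanced operations_kyc-credit-check-advanced-k8-istio.yml
```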

7. Istio configurations

  • Gateway create
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/networking
kubectl apply -f operations_kyc-istio-gateway.yml
  • Virtual Services create
kubectl apply -f operations_kyc-istio-virtualsvc-basic-routing-headers.yml
  • Routing Rules create
kubectl apply -f operations_kyc-istio-virtualsvc-advanced-routing-headers.yml
  • Destination Rules create
kubectl apply -f operations_kyc-istio-destrule-basic.yaml
kubectl apply -f operations_kyc-istio-destrule-advanced.yaml
  • Verify resource status
kubectl get services -n basic -o wide
kubectl get services -n advanced -o wide
kubectl get services -n consumer -o wide
kubectl get pods -n basic -o wide
kubectl get pods -n advanced -o wide
kubectl get pods -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "
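A missing `istio-injection=enabled` label is a common reason for pods showing 1/1 instead of 2/2. `kubectl get namespace -L istio-injection` shows the label as an extra column. The hypothetical helper below flags listed namespaces where that column is anything other than `enabled`, whether it is blank or holds another value. It assumes `--no-headers` output with the label in the fourth column.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get namespace -L istio-injection --no-headers`
# output on stdin and print every namespace from the argument list whose
# istio-injection label is not "enabled" (or that is absent from the output).
uninjected_namespaces() {
  awk -v want="$*" '
    BEGIN { n = split(want, ns, " ") }
    { label[$1] = $4 }                  # 4th column: label value (empty if unset)
    END {
      for (i = 1; i <= n; i++)
        if (label[ns[i]] != "enabled") { print ns[i]; bad = 1 }
      exit bad + 0
    }
  '
}

# Example:
# kubectl get namespace -L istio-injection --no-headers \
#   | uninjected_namespaces consumer basic advanced mongo kafka-ca1
```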

Generate traffic for test simulations

# create traffic from kyc-aggregator-mgt to credit-check-basic
for((i=1;i<=3;i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items[0].metadata.name}')" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-basic.basic:8080/credit-check/basic -s -o /dev/null -w "%{http_code}\n"
done
sleep 20s
# create traffic from kyc-aggregator-mgt to credit-check-advanced
for((i=1;i<=3;i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items[0].metadata.name}')" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-advanced.advanced:8080/credit-check/advanced -s -o /dev/null -w "%{http_code}\n"
done

echo " repeat the script run if you need to create more test data "
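With more iterations, printing one status code per request becomes hard to read. The sketch below summarizes the codes instead. `tally_codes` is a hypothetical helper that reads one HTTP status per line on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count how many times each HTTP status code appears on stdin
# and print "CODE COUNT" lines sorted by code.
tally_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Example: 20 requests to credit-check-basic, summarized.
# POD=$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items[0].metadata.name}')
# for i in $(seq 1 20); do
#   kubectl exec "$POD" -c kyc-aggregator-mgt -n consumer -- \
#     curl http://kyc-credit-check-basic.basic:8080/credit-check/basic -s -o /dev/null -w "%{http_code}\n"
# done | tally_codes
```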

Prometheus & Jaeger deploy

# Run the commands below in a new BASH terminal
# (addon manifests should come from the same release branch as the installed Istio)
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.14/samples/addons/jaeger.yaml
# run only if the Jaeger UI is needed; it may cause memory overload
#istioctl dashboard jaeger
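The addon manifests are published per Istio release branch, and the commands above mix `release-1.13` and `release-1.14`. A hypothetical helper that derives the branch from the installed version keeps all addons aligned. It assumes `istioctl version --remote=false` prints a bare `MAJOR.MINOR.PATCH` string.

```shell
#!/bin/sh
# Hypothetical helper: build the raw GitHub URL of an Istio sample addon
# (prometheus, jaeger, kiali, grafana) for the release branch matching VERSION.
# Usage: addon_url VERSION ADDON, e.g. addon_url 1.13.2 prometheus
addon_url() (
  version=$1 addon=$2
  branch=${version%.*}          # 1.13.2 -> 1.13 (drops the patch number)
  echo "https://raw.githubusercontent.com/istio/istio/release-$branch/samples/addons/$addon.yaml"
)

# Example:
# ISTIO_VERSION=$(istioctl version --remote=false)
# for addon in prometheus jaeger; do
#   kubectl apply -f "$(addon_url "$ISTIO_VERSION" "$addon")"
# done
```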

Kiali deploy & launch

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/kiali.yaml
sleep 180s   # allow some time for supporting PODs to be ready
istioctl dashboard kiali
Dashboard — Services Across All Namespaces
Traffic Flows - Integrated View w/ Realtime Metrics
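A fixed `sleep 180s`, used here and again for Grafana, either wastes time or is not long enough. Polling is an alternative. In the sketch below, `retry_until` is a hypothetical helper. `kubectl rollout status` is the standard kubectl subcommand that waits for a deployment to finish rolling out; wrapping it in a retry also covers the window before the deployment exists.

```shell
#!/bin/sh
# Hypothetical helper: re-run a command every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
# Usage: retry_until TIMEOUT INTERVAL command [args...]
retry_until() (
  timeout=$1 interval=$2
  shift 2
  waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      exit 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
)

# Example: wait up to 5 minutes for Kiali, then open the dashboard.
# retry_until 300 10 kubectl rollout status deployment/kiali -n istio-system --timeout=10s \
#   && istioctl dashboard kiali
```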

Grafana deploy & launch

# Windows only: run the next two commands in an elevated (Administrator) prompt to restart the Windows NAT service
net stop winnat
net start winnat
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/grafana.yaml
sleep 180s   # allow some time for PODs to be ready
istioctl dashboard grafana
Grafana View

Conclusion — Objectives Accomplished

1- Achieved an integrated view of the entire production system
2- Real-time, dynamic health statistics without any code change
3- Provided the capability to preempt production breakdowns
4- Enabled expedited production incident recovery
5- Part 2 will explore real-life production incident troubleshooting w/ Kiali

Vijay Redkar

15+ years Java professional with extensive experience in Digital Transformation, Banking, Payments, eCommerce, Application architecture and Platform development