BankNext Case Study - Troubleshoot Production w/ ServiceMesh Istio Metrics - Part 1

Proactively aggregate metrics across the entire production environment w/ ServiceMesh Istio configuration, without any code change

BankNext faces a formidable challenge: consolidating a real-time view of its large number of production microservices (> 300). Given the sheer number of mSvcs, manually changing each one is practically infeasible. Without real-time health metrics, the bank is blind and unable to proactively preempt a brewing production disaster.

Current Architecture Challenges

  1. Multiple tools maintenance
    a. Grafana : machine level statistics, CPU, Heap, PODs
    b. Kibana APM : txn TPS, avg/peak throughput
    c. Zipkin : tracing txns
    d. Jaeger : span details
    e. VisualVM : Memory, CPU, Thread, Deadlock, Garbage collection stats
    f. JProfiler : Memory, CPU, Thread, Deadlock, Garbage collection stats

Solution w/ New Istio ServiceMesh Metrics Approach

  1. Engineering Objectives
    a. End-end view of the entire production application flow
    b. Integrated view of Logs + Metrics + Traces
    c. Ease of use - no additional coding
Kiali integrated view — creditCheck mSvcs w/ Kafka + Mongo

Detailed implementation steps

  1. Application setup
    Create ServiceMesh with docker k8 istio sidecar
    GitHub — creditCheck mSvcs with Kafka + Mongo + Kiali integration
minikube stop
minikube delete
minikube start --driver=docker
docker login
minikube docker-env
istioctl install --set profile=demo -y

2. Update application config w/ Istio

  • Obtain Istio Gateway IP
minikube ip   # e.g. 192.168.49.2
kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'   # e.g. 32511
  • Update kyc-aggregator-mgt
\kyc-aggregator-mgt\src\main\resources\application.properties
istio-base-url=http://192.168.49.2:
istio-gateway-port=32511
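
The two lookups above can be wrapped in small helpers so the values do not have to be copied by hand. This is only a sketch: the function names `gateway_url` and `gateway_props` are hypothetical, and it assumes the gateway is reached through the Minikube node IP and the http2 nodePort, as in the step above.

```shell
# Hypothetical helper: join a node IP and a nodePort into one base URL.
gateway_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Hypothetical helper: print the two properties in the form shown above
# for application.properties (the base URL keeps its trailing colon).
gateway_props() {
  printf 'istio-base-url=http://%s:\n' "$1"
  printf 'istio-gateway-port=%s\n' "$2"
}

# Usage against a running cluster (not executed here):
#   ip="$(minikube ip)"
#   port="$(kubectl -n istio-system get service istio-ingressgateway \
#           -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')"
#   gateway_props "$ip" "$port"
```

Both values change whenever the Minikube cluster is recreated, so regenerating them after each `minikube start` is probably safer than hardcoding them.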

3. Kafka on Minikube w/ Istio enabled

cd /c/Vijay/Java/projects/minikube-kafka-cluster
kubectl apply -f 00-namespace/
kubectl label namespace kafka-ca1 istio-injection=enabled
kubectl label namespace default istio-injection=enabled
kubectl apply -f 01-zookeeper/
kubectl apply -f 02-kafka/
kubectl get pods -n kafka-ca1 -o wide
echo " Check POD status. Ensure all PODs 2/2 "
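
The "2/2" check can be automated rather than read by eye. The sketch below assumes each pod runs one application container plus the injected Istio sidecar, so a healthy pod reports READY as 2/2. The function name `not_ready_pods` is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name and
# READY value of every pod whose READY column (field 2) is not 2/2.
# The exit status is 0 only when every pod is 2/2.
not_ready_pods() {
  awk '$2 != "2/2" { print $1, $2; bad = 1 } END { exit (bad ? 1 : 0) }'
}

# Usage against a running cluster (not executed here):
#   kubectl get pods -n kafka-ca1 --no-headers | not_ready_pods \
#     && echo "all PODs 2/2"
```

The same pipe works for the other namespaces used in this walkthrough by changing the `-n` argument.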

4. Mongo on Minikube w/ Istio enabled

  • Mongo deploy
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/networking
kubectl create ns mongo
kubectl label namespace mongo istio-injection=enabled
kubectl label namespace default istio-injection=enabled
kubectl create -f operations_mongo-deployment.yml -n mongo
kubectl get pods -n mongo -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • Update kyc-credit-check-advanced
\kyc-credit-check-advanced\src\main\resources\application.properties
spring.data.mongodb.uri=mongodb://mongo-nodeport-svc.mongo:27017/kyc

5. Docker images

  • kyc-aggregator-mgt
cd  /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-aggregator-mgt
docker rmi kyc-aggregator-mgt:latest
docker rmi -f kyc-aggregator-mgt:latest
docker rmi -f vijayredkar/kyc-aggregator-mgt:latest
mvn clean install
docker build -t kyc-aggregator-mgt -f Dockerfile .
docker image ls
docker tag kyc-aggregator-mgt vijayredkar/kyc-aggregator-mgt:latest
docker push vijayredkar/kyc-aggregator-mgt
  • kyc-credit-check-basic
cd  /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-basic
docker rmi kyc-credit-check-basic:latest
docker rmi -f kyc-credit-check-basic:latest
docker rmi -f vijayredkar/kyc-credit-check-basic:latest
mvn clean install
docker build -t kyc-credit-check-basic -f Dockerfile .
docker image ls
docker tag kyc-credit-check-basic vijayredkar/kyc-credit-check-basic:latest
docker push vijayredkar/kyc-credit-check-basic
  • kyc-credit-check-advanced
cd  /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-advanced
docker rmi kyc-credit-check-advanced:latest
docker rmi -f kyc-credit-check-advanced:latest
docker rmi -f vijayredkar/kyc-credit-check-advanced:latest
mvn clean install
docker build -t kyc-credit-check-advanced -f Dockerfile .
docker image ls
docker tag kyc-credit-check-advanced vijayredkar/kyc-credit-check-advanced:latest
docker push vijayredkar/kyc-credit-check-advanced
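
The three image builds above repeat the same sequence, so they could be driven by one function. This is a sketch, not the author's script: `run`, `build_and_push`, `PROJECT_ROOT`, `DOCKER_USER` and `DRY_RUN` are hypothetical names, with defaults copied from the commands above. Setting `DRY_RUN=1` only prints the commands.

```shell
# Assumed defaults, taken from the commands above; override as needed.
PROJECT_ROOT="${PROJECT_ROOT:-/c/Vijay/Java/projects/kyc-k8-docker-istio}"
DOCKER_USER="${DOCKER_USER:-vijayredkar}"

# run: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# build_and_push: rebuild, tag and push the image for one service.
# The body is a subshell, so the `cd` does not leak into the caller.
build_and_push() (
  svc="$1"
  run cd "$PROJECT_ROOT/$svc" || exit 1
  # Removing stale images may fail if none exist yet; that is not an error.
  run docker rmi -f "$svc:latest" "$DOCKER_USER/$svc:latest" || true
  run mvn clean install || exit 1
  run docker build -t "$svc" -f Dockerfile . || exit 1
  run docker tag "$svc" "$DOCKER_USER/$svc:latest" || exit 1
  run docker push "$DOCKER_USER/$svc" || exit 1
)

# Usage (not executed here):
#   for svc in kyc-aggregator-mgt kyc-credit-check-basic kyc-credit-check-advanced; do
#     build_and_push "$svc" || break
#   done
```

Stopping at the first failed step avoids pushing an image built from a failed Maven build, which the flat command list above would not catch.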

6. PODs

  • kyc-aggregator-mgt
kubectl create ns consumer
kubectl label namespace consumer istio-injection=enabled
kubectl create -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-aggregator-mgt-k8-istio.yml -n consumer
kubectl get pods -n consumer -o wide
kubectl get services -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • kyc-credit-check-basic
kubectl create ns basic
kubectl label namespace basic istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-basic-k8-istio.yml -n basic
kubectl get pods -n basic -o wide
kubectl get services -n basic -o wide
echo " Check POD status. Ensure all PODs 2/2 "
  • kyc-credit-check-advanced
kubectl create ns advanced
kubectl label namespace advanced istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-advanced-k8-istio.yml -n advanced
kubectl get pods -n advanced -o wide
kubectl get services -n advanced -o wide
echo " Check POD status. Ensure all PODs 2/2 "

7. Istio configurations

  • Gateway create
kubectl apply -f operations_kyc-istio-gateway.yml
  • Virtual Services create
kubectl apply -f operations_kyc-istio-virtualsvc-basic-routing-headers.yml
  • Routing Rules create
kubectl apply -f operations_kyc-istio-virtualsvc-advanced-routing-headers.yml
  • Destination Rules create
kubectl apply -f operations_kyc-istio-destrule-basic.yaml
kubectl apply -f operations_kyc-istio-destrule-advanced.yaml
  • Verify resource status
kubectl get services -n basic -o wide
kubectl get services -n advanced -o wide
kubectl get services -n consumer -o wide
kubectl get pods -n basic -o wide
kubectl get pods -n advanced -o wide
kubectl get pods -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "

Generate traffic for test simulations

# create traffic from kyc-aggregator-mgt to credit-check-basic
for((i=1;i<=3;i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items..metadata.name}')" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-basic.basic:8080/credit-check/basic -s -o /dev/null -w "%{http_code}\n"
done
sleep 20s
# create traffic from kyc-aggregator-mgt to credit-check-advanced
for((i=1;i<=3;i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items..metadata.name}')" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-advanced.advanced:8080/credit-check/advanced -s -o /dev/null -w "%{http_code}\n"
done

echo " repeat the script run if you need to create more test data "
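
Each loop iteration above prints one HTTP status code per line, so a longer run is easier to read as a tally. A minimal sketch, with the hypothetical function name `tally_codes`, assuming the loop output is piped in unchanged:

```shell
# Read one HTTP status code per line on stdin and print "<code> <count>"
# for each distinct code, sorted by code.
tally_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (not executed here): wrap either traffic loop above, e.g.
#   for((i=1;i<=3;i++)); do
#     kubectl exec ... -- curl ... -s -o /dev/null -w "%{http_code}\n"
#   done | tally_codes
```

Any code other than 200 in the tally is a hint to inspect the corresponding service in Kiali before generating more traffic.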

Prometheus & Jaeger deploy

# Run the below commands in a new BASH terminal
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.14/samples/addons/jaeger.yaml
# echo " run only if the Jaeger UI is needed. May cause memory overload "
# istioctl dashboard jaeger

Kiali deploy & launch

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/kiali.yaml
sleep 180s   # allow some time for supporting PODs to be ready
istioctl dashboard kiali
Dashboard - Services Across All Namespaces
Traffic Flows - Integrated View w/ Realtime Metrics
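
The fixed `sleep 180s` before launching Kiali can be replaced by waiting on the rollout itself, which returns as soon as the pods are ready. This sketch assumes the addon manifest creates a Deployment named `kiali` in the `istio-system` namespace. The `wait_for_addon` function and the `KUBECTL` override variable are hypothetical.

```shell
# Assumption: kubectl is on the PATH; KUBECTL can be overridden for testing.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_addon: block until the named Deployment in istio-system has rolled
# out, or fail after the timeout (default 180s, matching the sleep above).
wait_for_addon() {
  "$KUBECTL" rollout status "deployment/$1" -n istio-system --timeout="${2:-180s}"
}

# Usage (not executed here):
#   wait_for_addon kiali && istioctl dashboard kiali
```

If the rollout does not finish within the timeout, the function returns a non-zero status and the dashboard is not launched, which is easier to diagnose than a dashboard opened against pods that are still starting.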

Grafana deploy & launch

# run below cmds as an Admin
net stop winnat
net start winnat
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/grafana.yaml
sleep 180s   # allow some time for PODs to be ready
istioctl dashboard grafana
Grafana View

Conclusion — Objectives Accomplished

1- Accomplished integrated view of the entire production system
2- Realtime dynamic health statistics without any code change
3- Provided capability to preempt production breakdowns
4- Enabled expedited production incident recovery
5- Part 2 will explore real-life production incident troubleshooting w/ Kiali

15+ years Java professional with extensive experience in Digital Transformation, Banking, Payments, eCommerce, Application architecture and Platform development

Vijay Redkar
