BankNext Case Study - Troubleshoot Production w/ ServiceMesh Istio Metrics - Part 1
Proactively aggregate metrics across the entire production environment w/ ServiceMesh Istio configuration, without any code change
BankNext faces a formidable challenge: consolidating a real-time view of its huge number of production microservices (> 300). Given the sheer number of mSvcs, it is practically infeasible to manually change each of them to emit metrics. Without real-time health metrics, the whole bank is blind and incapable of proactively preempting a brewing production disaster.
Current Architecture Challenges
- Multiple tools maintenance
a. Grafana : machine level statistics, CPU, Heap, PODs
b. Kibana APM : txn TPS, avg/peak throughput
c. Zipkin : tracing txns
d. Jaeger : span details
e. VisualVM : Memory, CPU, Thread, Deadlock, Garbage collection stats
f. JProfiler : Memory, CPU, Thread, Deadlock, Garbage collection stats
- Operational challenges
a. Requires enhancement in each mSvc to enable metrics
b. Manual changes require tedious regression testing
c. Maintaining multiple tools is an operational nightmare
d. Inability to preempt a production breakdown
e. Exposes business to high risk & unpredictability
f. Excessive system recovery time to emerge from a production incident
- Houston, we have a problem!
Solution w/ New Istio ServiceMesh Metrics Approach
- Engineering Objectives
a. End-to-end view of the entire production application flow
b. Integrated view of Logs + Metrics + Traces
c. Ease of use - no additional coding
- Solution Approach & Capabilities
a. Utilize Istio ServiceMesh’s Kiali metrics aggregation
b. Provides full view of mSvc workloads across namespaces and components
c. Seamless navigation from the logs to traces to span details
d. Complete dashboard view of health metrics
e. Real time statistics and error views updated dynamically
f. Effective & expedited problem troubleshooting
- Part 1 of this article explains how to set up this architecture
- Part 2 will explore production incident root cause analysis w/ Kiali
Detailed implementation steps
1. Application setup
Create the ServiceMesh with Docker, Kubernetes (k8s) and Istio sidecars
GitHub: creditCheck mSvcs with Kafka + Mongo + Kiali integration
minikube stop
minikube delete
minikube start --driver=docker
docker login
eval $(minikube docker-env)
istioctl install --set profile=demo -y
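The setup commands assume that minikube, kubectl, istioctl and docker are already installed, and the later image builds also need mvn. A small pre-flight check can fail fast before anything is deleted or recreated. The sketch below uses a hypothetical `require_tools` function (not part of the article's repo) built only on the POSIX `command -v` builtin.

```shell
# Hypothetical pre-flight check (not part of the article's repo):
# returns non-zero and lists every CLI that is not on the PATH.
require_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before running the setup commands:
# require_tools minikube kubectl istioctl docker mvn || exit 1
```

Once `istioctl install` finishes, `kubectl get pods -n istio-system` should list the istiod, ingress gateway and egress gateway pods that the demo profile installs.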
2. Update application config w/ Istio
- Obtain Istio Gateway IP
minikube ip # e.g. 192.168.49.2
- Obtain Istio Gateway port
kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}' # e.g. 32511
- Update kyc-aggregator-mgt
\kyc-aggregator-mgt\src\main\resources\application.properties
istio-base-url=http://192.168.49.2:
istio-gateway-port=32511
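The gateway IP and NodePort can change once the cluster is recreated (for example after `minikube delete`), so copying them by hand is easy to get wrong. A minimal sketch, assuming a hypothetical `istio_properties` helper, prints the two property lines straight from the values returned by the commands above:

```shell
# Hypothetical helper: prints the two kyc-aggregator-mgt properties for a given
# Istio ingress gateway IP ($1) and NodePort ($2).
istio_properties() {
  local gateway_ip="$1" gateway_port="$2"
  printf 'istio-base-url=http://%s:\n' "$gateway_ip"
  printf 'istio-gateway-port=%s\n' "$gateway_port"
}

# Usage (assumes kubectl and minikube point at the cluster from step 1):
# istio_properties "$(minikube ip)" \
#   "$(kubectl -n istio-system get service istio-ingressgateway \
#        -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')"
```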
3. Kafka on Minikube w/ Istio enabled
cd /c/Vijay/Java/projects/minikube-kafka-cluster
kubectl apply -f 00-namespace/
kubectl label namespace kafka-ca1 istio-injection=enabled
kubectl label namespace default istio-injection=enabled
kubectl apply -f 01-zookeeper/
kubectl apply -f 02-kafka/
kubectl get pods -n kafka-ca1 -o wide
echo " Check POD status. Ensure all PODs 2/2 "
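"2/2" in the READY column means both the application container and its injected istio-proxy sidecar are ready. Rather than eyeballing it, the check can be scripted. The sketch below is a hypothetical `all_pods_ready` filter that reads standard `kubectl get pods` output (header line included) on stdin and fails if any pod is not fully ready, or if no pods are listed at all.

```shell
# Hypothetical helper: succeeds only if every pod row has READY x/x
# (e.g. 2/2 = app container + istio-proxy sidecar) and at least one pod is listed.
all_pods_ready() {
  awk 'NR > 1 {
         rows++
         split($2, r, "/")
         if (r[1] != r[2]) { print "not ready: " $1; bad = 1 }
       }
       END {
         if (rows == 0) { print "no pods found"; exit 1 }
         exit (bad ? 1 : 0)
       }'
}

# Usage:
# kubectl get pods -n kafka-ca1 | all_pods_ready
# Built-in alternative that blocks until the pods are Ready (or times out):
# kubectl wait --for=condition=Ready pods --all -n kafka-ca1 --timeout=300s
```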
4. Mongo on Minikube w/ Istio enabled
- Mongo deploy
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/networking
kubectl create ns mongo
kubectl label namespace mongo istio-injection=enabled
kubectl label namespace default istio-injection=enabled --overwrite
kubectl create -f operations_mongo-deployment.yml -n mongo
kubectl get pods -n mongo -o wide
echo " Check POD status. Ensure all PODs 2/2 "
- Update kyc-credit-check-advanced
\kyc-credit-check-advanced\src\main\resources\application.properties
spring.data.mongodb.uri=mongodb://mongo-nodeport-svc.mongo:27017/kyc
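The host `mongo-nodeport-svc.mongo` works because Kubernetes DNS resolves `<service>.<namespace>` (short for `<service>.<namespace>.svc.cluster.local`) from pods in any namespace. That is how kyc-credit-check-advanced in the `advanced` namespace reaches Mongo in the `mongo` namespace, and the same pattern appears later as `kyc-credit-check-basic.basic:8080`. A small sketch with hypothetical helper names builds such addresses:

```shell
# Hypothetical helper: short in-cluster DNS name <service>.<namespace>.
svc_host() {
  printf '%s.%s\n' "$1" "$2"
}

# Hypothetical helper: MongoDB URI from service ($1), namespace ($2), port ($3), database ($4).
mongo_uri() {
  printf 'mongodb://%s:%s/%s\n' "$(svc_host "$1" "$2")" "$3" "$4"
}

# Usage:
# mongo_uri mongo-nodeport-svc mongo 27017 kyc
# To reach the same service from the host for a quick check:
# kubectl -n mongo port-forward svc/mongo-nodeport-svc 27017:27017
```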
5. Docker images
- kyc-aggregator-mgt
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-aggregator-mgt
docker rmi kyc-aggregator-mgt:latest
docker rmi -f kyc-aggregator-mgt:latest vijayredkar/kyc-aggregator-mgt:latest
mvn clean install
docker build -t kyc-aggregator-mgt -f Dockerfile .
docker image ls
docker tag kyc-aggregator-mgt vijayredkar/kyc-aggregator-mgt:latest
docker push vijayredkar/kyc-aggregator-mgt
- kyc-credit-check-basic
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-basic
docker rmi kyc-credit-check-basic:latest
docker rmi -f kyc-credit-check-basic:latest vijayredkar/kyc-credit-check-basic:latest
mvn clean install
docker build -t kyc-credit-check-basic -f Dockerfile .
docker image ls
docker tag kyc-credit-check-basic vijayredkar/kyc-credit-check-basic:latest
docker push vijayredkar/kyc-credit-check-basic
- kyc-credit-check-advanced
cd /c/Vijay/Java/projects/kyc-k8-docker-istio/kyc-credit-check-advanced
docker rmi kyc-credit-check-advanced:latest
docker rmi -f kyc-credit-check-advanced:latest vijayredkar/kyc-credit-check-advanced:latest
mvn clean install
docker build -t kyc-credit-check-advanced -f Dockerfile .
docker image ls
docker tag kyc-credit-check-advanced vijayredkar/kyc-credit-check-advanced:latest
docker push vijayredkar/kyc-credit-check-advanced
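The three build sequences differ only in the service name, so they can be folded into one loop. The sketch below is a hypothetical `build_and_push` helper; `PROJECT_ROOT` and `REGISTRY_USER` default to the path and Docker Hub account used in this article, and it assumes each service has its `pom.xml` and `Dockerfile` at its root. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical helper: rebuild, tag and push several images in one pass.
# PROJECT_ROOT and REGISTRY_USER default to the values used in this article;
# DRY_RUN=1 prints the commands instead of running them.
PROJECT_ROOT="${PROJECT_ROOT:-/c/Vijay/Java/projects/kyc-k8-docker-istio}"
REGISTRY_USER="${REGISTRY_USER:-vijayredkar}"

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_and_push() {
  local svc dir image
  for svc in "$@"; do
    dir="$PROJECT_ROOT/$svc"
    image="$REGISTRY_USER/$svc:latest"
    # remove stale local tags; a missing image is not an error here
    run docker rmi -f "$svc:latest" "$image" || true
    run mvn -f "$dir/pom.xml" clean install || return 1
    run docker build -t "$svc" -f "$dir/Dockerfile" "$dir" || return 1
    run docker tag "$svc" "$image" || return 1
    run docker push "$image" || return 1
  done
}

# Usage:
# build_and_push kyc-aggregator-mgt kyc-credit-check-basic kyc-credit-check-advanced
```

Stopping at the first failed build keeps a broken image from being pushed over a working one.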
6. PODs
- kyc-aggregator-mgt
kubectl create ns consumer
kubectl label namespace consumer istio-injection=enabled
kubectl create -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-aggregator-mgt-k8-istio.yml -n consumer
kubectl get pods -n consumer -o wide
kubectl get services -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "
- kyc-credit-check-basic
kubectl create ns basic
kubectl label namespace basic istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-basic-k8-istio.yml -n basic
kubectl get pods -n basic -o wide
kubectl get services -n basic -o wide
echo " Check POD status. Ensure all PODs 2/2 "
- kyc-credit-check-advanced
kubectl create ns advanced
kubectl label namespace advanced istio-injection=enabled
kubectl apply -f /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-advanced-k8-istio.yml -n advanced
kubectl get pods -n advanced -o wide
kubectl get services -n advanced -o wide
echo " Check POD status. Ensure all PODs 2/2 "
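Each service repeats the same create namespace, label, apply and check sequence. A sketch of a hypothetical `deploy_to_namespace` helper makes the steps safe to re-run: namespace creation goes through `create --dry-run=client -o yaml | apply`, the label uses `--overwrite`, and it waits on the Deployment `Available` condition instead of a manual check. It assumes each manifest defines a Deployment; `KUBECTL` can be overridden with a stub and defaults to `kubectl`.

```shell
# Hypothetical helper (not part of the article's scripts): idempotently create a
# namespace, enable Istio sidecar injection, apply a manifest and wait for it.
deploy_to_namespace() {
  local ns="$1" manifest="$2"
  # "create --dry-run=client -o yaml | apply" does not fail if the namespace already exists
  ${KUBECTL:-kubectl} create namespace "$ns" --dry-run=client -o yaml | ${KUBECTL:-kubectl} apply -f - || return 1
  # --overwrite keeps the label step idempotent as well
  ${KUBECTL:-kubectl} label namespace "$ns" istio-injection=enabled --overwrite || return 1
  ${KUBECTL:-kubectl} apply -f "$manifest" -n "$ns" || return 1
  # Available needs ready pods, and a pod is ready only when app + istio-proxy are both ready (2/2)
  ${KUBECTL:-kubectl} wait --for=condition=Available deployment --all -n "$ns" --timeout=300s
}

# Usage:
# deploy_to_namespace basic /c/Vijay/Java/projects/kyc-k8-docker-istio/networking/operations_kyc-credit-check-basic-k8-istio.yml
```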
7. Istio configurations
- Gateway create
kubectl apply -f operations_kyc-istio-gateway.yml
- Virtual Service create (basic, header-based routing)
kubectl apply -f operations_kyc-istio-virtualsvc-basic-routing-headers.yml
- Virtual Service create (advanced, header-based routing)
kubectl apply -f operations_kyc-istio-virtualsvc-advanced-routing-headers.yml
- Destination Rules create
kubectl apply -f operations_kyc-istio-destrule-basic.yaml
kubectl apply -f operations_kyc-istio-destrule-advanced.yaml
- Verify resource status
kubectl get services -n basic -o wide
kubectl get services -n advanced -o wide
kubectl get services -n consumer -o wide
kubectl get pods -n basic -o wide
kubectl get pods -n advanced -o wide
kubectl get pods -n consumer -o wide
echo " Check POD status. Ensure all PODs 2/2 "
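The Gateway and VirtualService manifests applied above are not reproduced in this article. To show roughly what such resources contain, the sketch below renders a minimal Gateway bound to Istio's default ingress gateway (selector `istio: ingressgateway`) and a VirtualService that routes on a request header. Every specific value is an assumption, not the contents of the `operations_kyc-istio-*` files: the name `kyc-gateway`, placing it in the `consumer` namespace, the header `x-kyc-route`, and port 8080, which is taken from the curl commands in the traffic test.

```shell
# Hypothetical sketch: minimal Gateway on the default Istio ingress gateway.
render_gateway() {
  cat <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: kyc-gateway
  namespace: consumer
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
EOF
}

# Hypothetical sketch: VirtualService routing gateway traffic whose (hypothetical)
# x-kyc-route header equals $3 to service $1 in namespace $2 on port 8080.
render_virtual_service() {
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: $1
  namespace: $2
spec:
  hosts:
  - "*"
  gateways:
  - consumer/kyc-gateway
  http:
  - match:
    - headers:
        x-kyc-route:
          exact: $3
    route:
    - destination:
        host: $1.$2.svc.cluster.local
        port:
          number: 8080
EOF
}

# Usage:
# render_gateway | kubectl apply -f -
# render_virtual_service kyc-credit-check-basic basic basic | kubectl apply -f -
# istioctl analyze --all-namespaces   # reports common Istio misconfigurations
```

Rendering to stdout keeps the YAML reviewable (`render_gateway | less`) before anything reaches the cluster.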
Generate traffic for test simulations
# create traffic from kyc-aggregator-mgt to credit-check-basic
for ((i=1; i<=3; i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath={.items..metadata.name})" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-basic.basic:8080/credit-check/basic -s -o /dev/null -w "%{http_code}\n"
done
sleep 20s
# create traffic from kyc-aggregator-mgt to credit-check-advanced
for ((i=1; i<=3; i++)); do
  kubectl exec "$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath={.items..metadata.name})" -c kyc-aggregator-mgt -n consumer -- curl http://kyc-credit-check-advanced.advanced:8080/credit-check/advanced -s -o /dev/null -w "%{http_code}\n"
done
echo " repeat the script run if you need to create more test data "
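The loops print one HTTP status code per request, which becomes hard to read once the request count grows. A sketch with two hypothetical helpers: `send_requests` wraps the same `kubectl exec` + curl call, and `tally_codes` summarises the codes as `<code> <count>` pairs. A jsonpath of `{.items..metadata.name}` returns every matching pod name, so the sketch picks only the first pod with `{.items[0].metadata.name}` to keep the exec target valid when kyc-aggregator-mgt runs more than one replica.

```shell
# Hypothetical helper: reads one HTTP status code per line on stdin and
# prints "<code> <count>" for each distinct code, sorted by code.
tally_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# Hypothetical helper: $1 = target URL, $2 = number of requests;
# prints one status code per request.
send_requests() {
  local url="$1" count="$2" pod i=1
  # first matching pod only, so the exec target is a single name
  pod="$(kubectl get pod -l app=kyc-aggregator-mgt -n consumer -o jsonpath='{.items[0].metadata.name}')" || return 1
  while [ "$i" -le "$count" ]; do
    kubectl exec "$pod" -c kyc-aggregator-mgt -n consumer -- \
      curl -s -o /dev/null -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done
}

# Usage:
# send_requests http://kyc-credit-check-basic.basic:8080/credit-check/basic 20 | tally_codes
```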
Prometheus & Jaeger deploy
# Run the below commands in a new BASH terminal
# Use addon manifests from the Istio release that matches your istioctl version
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/prometheus.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.14/samples/addons/jaeger.yaml
# run only if the Jaeger UI is needed; it may cause memory overload
# istioctl dashboard jaeger
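Kiali builds its graph and health views from the metrics Prometheus scrapes from the Envoy sidecars, chiefly Istio's standard `istio_requests_total` counter. To cross-check what Kiali shows, the same data can be queried directly through the Prometheus HTTP API. The sketch assumes the addon's `prometheus` service listens on port 9090 in `istio-system`, as in the Istio sample manifests, and uses a hypothetical `request_rate_query` helper that builds a PromQL query for the per-response-code request rate into one namespace.

```shell
# Hypothetical helper: PromQL for the request rate into namespace $1 over window $2,
# split by HTTP response code, using Istio's standard istio_requests_total metric.
request_rate_query() {
  printf 'sum by (response_code) (rate(istio_requests_total{destination_service_namespace="%s"}[%s]))\n' "$1" "$2"
}

# Usage:
# kubectl -n istio-system port-forward svc/prometheus 9090:9090 &
# curl -sG http://localhost:9090/api/v1/query \
#   --data-urlencode "query=$(request_rate_query basic 5m)"
```

`curl -G --data-urlencode` takes care of escaping the braces, quotes and brackets in the query string.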
Kiali deploy & launch
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/kiali.yaml
sleep 180s # allow some time for supporting PODs to be ready
istioctl dashboard kiali
Grafana deploy & launch
# run below cmds as an Admin
net stop winnat
net start winnat
kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.13/samples/addons/grafana.yaml
sleep 180s # allow some time for PODs to be ready
istioctl dashboard grafana
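A fixed `sleep 180s` either waits longer than needed or not long enough on a slow machine. A sketch of a hypothetical `wait_for_addons` helper blocks on `kubectl rollout status` for each addon Deployment instead. The Deployment names `prometheus`, `jaeger`, `kiali` and `grafana` are an assumption based on the Istio sample addon manifests; `KUBECTL` can be overridden with a stub and defaults to `kubectl`.

```shell
# Hypothetical helper: block until each named addon Deployment in istio-system
# has finished rolling out, instead of sleeping for a fixed time.
wait_for_addons() {
  local deploy
  for deploy in "$@"; do
    ${KUBECTL:-kubectl} -n istio-system rollout status "deployment/$deploy" --timeout=300s || return 1
  done
}

# Usage (deployment names assumed from the Istio sample addon manifests):
# wait_for_addons prometheus jaeger kiali grafana && istioctl dashboard kiali
```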
Conclusion — Objectives Accomplished
1- Achieved an integrated view of the entire production system
2- Real-time dynamic health statistics without any code change
3- Provided the capability to preempt production breakdowns
4- Enabled expedited production incident recovery
5- Part 2 will explore real-life production incident troubleshooting w/ Kiali