Vijay Redkar
5 min read · Jul 30, 2024

BankNext Case Study: GenAi Multimodal Local LLM to Detect Memory Leaks

GenAi with the vision-capable LLM llava to pre-empt critical application performance issues.

Problem Statement :

BankNext’s ambitious digital transformation requires fast-paced production deployments of ~100 microservices. The performance testing (PT) team is struggling to certify these applications in time. In the rush, PT inadvertently fails to report metrics that point to serious degradation. Memory leaks creep in, causing a serious knock-on effect on the pace and quality of deliveries. Business is losing confidence. BankNext needs a practical solution, and it needs it fast.

Current State : Laborious manual inspections

1. Every application is subjected to rigorous performance testing.
2. Metrics are evaluated with standard monitoring tools like Grafana.
3. The PT team diligently examines the stats to look for potential problems.
4. Heap memory graphs are critical in detecting potential memory leaks.
5. Manually examining volumes of such stats is laborious and error-prone.
6. Warning stats pointing to a potential memory leak get overlooked.
7. The production environment suffers serious performance degradation.
8. Business is severely affected and losses mount.
9. Houston, we have a problem!

Solution : Generative Ai Multimodal LLM

GenAi MultiModal LLM Architecture — Memory Leak Detect

1. GitHub : multimodal local LLM to detect performance issues
2. Pre-trained large language model - llava:v1.6
3. Local LLM server for inferencing engine - Ollama
4. Java-based tool for LLM interactions - LangChain4J
5. Embedding model - AllMiniLmL6V2EmbeddingModel
6. General-purpose framework - SpringBoot/Java
7. Docker - my Docker runtime setup
8. Hardware requirements:
- local machine: at least RAM:16GB, Storage:10GB, Cores:4
- else execute on cloud with the qParam switch shown in the demo.
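Under this setup, LangChain4J ultimately talks to the local Ollama server over its REST API. Below is a rough plain-Java sketch of the request that stack ends up assembling (class and helper names are hypothetical; the /api/generate body fields follow Ollama's REST API, with the image passed base64-encoded):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaLlavaRequest {

    // Hypothetical helper: builds the JSON body for Ollama's /api/generate
    // endpoint ("model", "prompt", "images" fields per the Ollama REST API).
    static String buildBody(String model, String prompt, String imageBase64) {
        return "{"
                + "\"model\":\"" + model + "\","
                + "\"prompt\":\"" + prompt + "\","
                + "\"images\":[\"" + imageBase64 + "\"],"
                + "\"stream\":false"
                + "}";
    }

    public static void main(String[] args) {
        String body = buildBody("llava:v1.6",
                "Does this JVM memory graph indicate a memory leak?",
                "<BASE64_ENCODED_PNG>");
        System.out.println(body);

        // Actually sending it requires a running Ollama server (ollama serve)
        // with the llava:v1.6 model pulled; left here as a sketch:
        // HttpRequest req = HttpRequest.newBuilder()
        //         .uri(URI.create("http://localhost:11434/api/generate"))
        //         .header("Content-Type", "application/json")
        //         .POST(HttpRequest.BodyPublishers.ofString(body))
        //         .build();
        // String answer = HttpClient.newHttpClient()
        //         .send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```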

System Design : Workflow

1. The solution utilizes the vision capabilities of multimodal LLMs.
2. The chosen LLM is the open-source llava:v1.6.
3. PT-generated memory utilization graphs are located here.
4. System Message:
You are an expert performance tester. This picture shows a graph of JVM memory utilized. A well-behaved application shows graph patterns with long sharp increases in consumption followed by long sharp decreases. If the utilized-memory graph increases continuously without dropping sharply, this indicates that a memory leak problem exists.
5. User Prompt:
Based on the provided graph, please state in your response whether the possibility of a memory leak problem exists or not. Your response should be in 1 line.
6. Image source for LLM analysis.
7. Input to the local LLM server : System msg + User prompt + Image link
8. Ollama along with the llava LLM analyzes the picture.
9. The llava model interprets that the graph line shows the memory consumed.
10. AI can now answer the user’s question with reasonable accuracy.
11. cURL to test :

#build and launch the application
git clone https://github.com/vijayredkar/gen-ai-llm-multimodal-analyze.git
cd <YOUR/PROJECT/LOCATION>/gen-ai-llm-vision-app-performance
mvn clean install
java -jar target/gen-ai-llm-vision-app-performance.jar

#cURL to test
# if you wish to execute this heavy workload on your local machine set endpoint qParam executeOnLocalMachine=Y
# if your local machine is not powerful enough then utilize this Cloud resource
# 1- login/signup https://auth.instill.tech/
# 2- create/use tokens https://instill.tech/settings/api-tokens
# 3- update application.properties cloud.resource.token=<YOUR_TOKEN>
# 4- when invoking the endpoint ensure that the executeOnLocalMachine= is blank
curl --request POST \
--url 'http://localhost:8888/gen-ai/v1/llm/vision-examine?executeOnLocalMachine=' \
--header 'Content-Type: application/json' \
--data '{
"text":"You are an expert performance tester. This picture shows the graph of JVM memory utilized. The happy path scenario should show a line patterns with sharp increase and decrease in usage. If the utilized memory continuously increases without dropping then this indicates that a memory leak problem exists. Based on the provided graph, please state in your response if the possibility of memory leak problem exists or not. Your response should be in 1 line.",
"imgSrc":"https://github.com/vijayredkar/vijayredkar.github.io/blob/main/memory-usage-1.png?raw=true"
}'
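The cURL above passes the image as a link, while a local vision model needs the actual pixels. A plausible intermediate step (an assumption about the app's internals — LangChain4J may handle this itself) is to download the graph and base64-encode it before handing it to the LLM:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class ImageToBase64 {

    // Base64-encode raw image bytes, since Ollama's vision API expects
    // images as base64 strings rather than URLs.
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Download the graph image from the imgSrc link, then encode it.
    static String fetchAndEncode(String imgSrc) throws Exception {
        HttpResponse<byte[]> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(imgSrc)).build(),
                HttpResponse.BodyHandlers.ofByteArray());
        return encode(resp.body());
    }

    public static void main(String[] args) {
        // Example with in-memory bytes; real usage would call
        // fetchAndEncode("https://.../memory-usage-1.png?raw=true")
        System.out.println(encode(new byte[] {1, 2, 3})); // prints AQID
    }
}
```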

Output : Scenario specific

AI detects memory leak
AI detects normal memory utilization

Application Setup :

1. Refer to the section “Application Setup” here
2. Incorporate multimodal LLM llava.

#integrate with llava vision LLM
#choose 1 model based on your local machine capacity
ollama run llava:latest #4.7GB (small but less accurate)
ollama run llava:v1.6 #4.7GB (recommended)
ollama run llava:7b #4.7GB (less accurate)
ollama run llava:13b #8GB (heavy on resources)

#application.properties
#provide the specific model name that you chose above
llm.model.name=llava:v1.6

3. Modes of operation : execute workload on local machine or cloud

If you wish to execute this heavy workload on your local machine:
1- set endpoint qParam executeOnLocalMachine=Y

If your local machine is not powerful enough, then utilize this Cloud resource:
1- login/signup https://auth.instill.tech/
2- create/use tokens https://instill.tech/settings/api-tokens
3- update application.properties cloud.resource.token=<YOUR_TOKEN>
4- set endpoint qParam executeOnLocalMachine= (i.e. blank)
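The two modes above boil down to a single branch on the executeOnLocalMachine query parameter. A minimal sketch of that routing (the class name and the cloud URL placeholder are hypothetical, not taken from the repo):

```java
public class InferenceTargetRouter {

    static final String LOCAL_OLLAMA = "http://localhost:11434/api/generate";
    // Hypothetical placeholder; the real cloud endpoint comes from the
    // Instill console, authenticated with cloud.resource.token.
    static final String CLOUD_ENDPOINT = "https://api.instill.tech/...";

    // Mirrors the endpoint's qParam contract: "Y" -> local Ollama server,
    // blank or absent -> fall back to the cloud resource.
    static String resolveTarget(String executeOnLocalMachine) {
        boolean local = "Y".equalsIgnoreCase(executeOnLocalMachine);
        return local ? LOCAL_OLLAMA : CLOUD_ENDPOINT;
    }

    public static void main(String[] args) {
        System.out.println(resolveTarget("Y")); // local Ollama server
        System.out.println(resolveTarget(""));  // cloud fallback
    }
}
```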

Application Video:

GenAi application demo end-end

Conclusion :

  1. Positives:
    - BankNext successfully applied the multimodal, vision-capable llava GenAi.
    - Drastically reduced occurrences of memory leak oversight.
    - Tremendously helped lower the performance testers’ workload.
    - Significantly improved production deployment success rates.
    - Created a flexible solution that can be extended to a multitude of scenarios.
  2. Negatives:
    - Requires massive GPU computing resources.
    - Latency is high on regular CPU-based machines.
    - Response content varies with almost every run.
    - Unexpected/inaccurate LLM responses, at times.
    - Additional manual review of the output is a must.
  3. Accomplishment:
    The multimodal GenAi solution provided a high level of performance confidence in applications deployed to production.
    It massively lowered operational costs while maintaining an acceptable level of accuracy.

Written by Vijay Redkar

15+ years Java professional with extensive experience in Digital Transformation, Banking, Payments, eCommerce, Application architecture and Platform development
