Running an ML/AI Model — Part 2
Welcome to Part 2 of the AI/ML tutorial from AquilaX.
In this part, we’ll run a simple model designed for code-oriented purposes. There are numerous models available, each pre-trained on specific datasets to solve particular problems. Some larger models (30GB+) aim to be all-encompassing solutions, but at AquilaX, we believe in minimizing size to achieve faster results and lower costs. This approach may require using multiple models for different tasks, but we find it more efficient.
Let’s get started:
We will run the model on a GPU-powered pod via RunPod. Follow these steps to create your environment:
1. Sign up at [RunPod](https://www.runpod.io/).
2. Add credits to your account ([Billing](https://www.runpod.io/console/user/billing)) — $25 should be sufficient.
3. Navigate to the [pods section](https://www.runpod.io/console/pods) and click “Deploy”.
4. Select the RTX A5000 Ada (at $0.44/hour) or another GPU machine, and click “Deploy on demand”.
5. After a few seconds, you will have access to your pod with the GPU.
6. Click “Connect” and then “Connect to Jupyter Lab”.
7. A new tab will open, providing access to a shell and notebook interface to create and run your code.
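Before moving on, you can optionally confirm that the GPU is attached to the pod: open a terminal in Jupyter Lab and run `nvidia-smi`. The output should list the GPU you selected when deploying (the exact details will vary with your pod):

```bash
# Show the GPU(s) attached to the pod, along with driver and CUDA versions
nvidia-smi
```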
Once inside the pod, we suggest opening two tabs in Jupyter Lab:
The first tab will be the shell/terminal where you execute the following commands:
```bash
pip install transformers
pip install accelerate
```
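Optionally, before downloading the model you can verify that PyTorch sees the pod's GPU. A minimal check, run from the terminal (`python`) or a notebook cell, looks like this:

```python
import torch

# Should print True, followed by the name of the GPU selected at deployment
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```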
The second tab will be for a Python file named lab1.py (or any other name). Paste the code below into this file:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import time

device = "cuda"  # or "cpu"
model_path = "ibm-granite/granite-3b-code-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

def run_this(text_input):
    # wrap the prompt in the chat format expected by the instruct model
    chat = [
        {"role": "user", "content": text_input},
    ]
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    input_tokens = tokenizer(chat, return_tensors="pt")
    # move the input tensors to the same device as the model
    for i in input_tokens:
        input_tokens[i] = input_tokens[i].to(device)
    # generate up to 1000 new tokens and print the decoded output
    output = model.generate(**input_tokens, max_new_tokens=1000)
    output = tokenizer.batch_decode(output)
    for i in output:
        print(i)

questions = [
    "Write a code in java to show the median of an array of integers",
    "Write a simple go lang code",
    "what is XSS?",
    "write a python script to expose a simple API",
    "What is an example of SQL Injection in Java",
    "what is the capital of France",
    "My application is developed in java and have 2 vulnerabilites, write a summary about executives"
]

# run each question and report how long the answer took
for question in questions:
    start_ts = time.time()
    run_this(question)
    end_ts = time.time()
    print("##############", question, "executed in", (end_ts - start_ts))
```
To run the model, go back to the terminal and execute:
```bash
python lab1.py
```
Voilà, the model should run in your environment. The first run will take some time to download over 10GB of data, but subsequent runs will be much faster.
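If you prefer to warm the pod before that first run, you can pre-download the weights into the local Hugging Face cache; a minimal sketch using `huggingface_hub` (pulled in as a dependency of transformers) would be:

```python
from huggingface_hub import snapshot_download

# download the model files ahead of time into the local Hugging Face cache
snapshot_download(repo_id="ibm-granite/granite-3b-code-instruct")
```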
Below are the results of the execution in our environment.
Stay tuned for Part 3, where we will expose an API service for our model.