Friday, March 7, 2025

LLM-Powered Java Development: Using Quarkus and Langchain4J to Run Python-Generated Code



Code on my github 


You may have heard that autoregressive LLMs (Large Language Models) are not good at certain tasks, giving wrong or inaccurate results. A classic example is the prompt: count the number of 'r's in the word strawberry


Asking local Llama 3.2 to count the number of 'r' in the word 'strawberry' and it wrongly says 2


Or which number is bigger: 9.11 or 9.9

Asking Llama 3.2 the prompt "which number is bigger: 9.11 or 9.9" and it wrongly says that 9.11 is bigger

Of course modern LLMs may be trained to target such cases and may get them right, but if you try similar prompts you may still get inaccurate results, or even make the LLM "think" that it should answer something else and change the answer to something wrong. Still, there are issues that LLMs simply can't solve due to the tokenization process and the context length. This is discussed in depth in a recent Andrej Karpathy video about LLMs:



In this same video Karpathy suggests using code to solve such issues. For this purpose he uses a Tool to execute Python code, and since LLMs happen to be very good at generating code, the results are consistently correct.


Andrej Karpathy showing how to solve issues with LLMs and the execution of Python code


Unfortunately the code execution tool does not seem to be available to free users in ChatGPT, nor in other free chat LLMs. On the other hand, we can use Java, Quarkus and LangChain4J with free LLMs to get the same behavior locally!


Tools and LLMs


An important feature of LLMs is Tools, which allow the model to emit special tokens that a framework or program can use to execute code that you develop. These tokens are meant to be consumed by programs, not by the user, so they are generally hidden from the response: the program you use to talk to the LLM identifies these tokens, runs your code, gets the result and feeds it back into the LLM, which then continues generating with the tool result in its context.

For example, let's say you ask for traffic information for a given street. LLMs do not "know" real-time information, they are trained on past data, but you can register Tools with the model, so it can output information that tells the program running the LLM that it wants to call a tool, in this case the get_traffic_info tool:

1. User Input: Let's consider that the user asked the LLM the following:

"What's the current traffic like on Main Street?"

2. LLM Processing and Tool Calling: The LLM recognizes that it needs to fetch real-time traffic data and decides to call an external tool (e.g., a traffic API), generating the following JSON inside a special token that the program uses to identify that the LLM is trying to call a tool (e.g. tool_call).

Sample JSON generated by a LLM that wants to call an external Tool
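The exact payload depends on the model and the framework, but as an illustration it could look something like this (the field names are invented for the example):

{
  "name": "get_traffic_info",
  "arguments": {
    "street": "Main Street",
    "city": "YourCity"
  }
}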

3. Tool Execution (External API Call): The system then executes the get_traffic_info tool by calling a traffic API (e.g., Google Maps API, TomTom API, etc.). Here's a sample response sent back to the LLM:

Sample response sent back to the LLM
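Again just as an illustration, the tool result fed back to the LLM could be something like this (field names invented for the example):

{
  "street": "Main Street",
  "city": "YourCity",
  "congestion_level": "moderate",
  "average_speed_mph": 25,
  "incidents": []
}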


4. LLM Processes the API Response: After receiving the response from the Traffic API, the LLM processes the data and generates a human-readable response for the user:

"The current traffic on Main Street in YourCity is moderate, with an average speed of 25 mph. There are no reported incidents at this time."

As a programmer you would have to read the LLM response and parse the special tokens that indicate user messages, system messages, end of messages and others. As you know, when using a tool such as qwen.ai or chat.deepseek.com you don't see these tokens. The same is true for the tokens used to identify tool calls. To interpret Python code with a locally running LLM we will need to use a tool, but we won't be doing this alone: instead we will use the Quarkus LangChain4J extension!


Quarkus LangChain4J extension



Creating applications with Large Language Models is not easy, but to help in this mission we have the popular framework called LangChain. Inspired by it we also have a Java framework called LangChain4J, and it can integrate with the best cloud native Java framework, Quarkus, using the LangChain4J extension.

I know, it is a lot of names and concepts, but you can create a project using all of it with the Code Quarkus tool: simply select LangChain4J and a backend to run the LLM, such as Ollama, and it will set up a project for you!

Our project

Finally it is time for our code! We followed the steps below to set up the project:

1. Make sure the prerequisites are met: We are using Java 21 and Maven. I also installed Ollama, which is a great tool to experiment with LLMs locally, but you can use any LLM backend supported by LangChain4J.
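If you go with Ollama, make sure the model you want to use is available locally; for example, to download llama3.2 (the model used later in this post):

ollama pull llama3.2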

2. Generate the Quarkus project: In my case I used code.quarkus.io, but it is also possible to use the Quarkus CLI. I also added the extensions quarkus-langchain4j-ollama and quarkus-rest.
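If you prefer the CLI, a command along these lines should generate an equivalent project (the group and artifact ids are just placeholders):

quarkus create app org.acme:llm-python-tool --extension='quarkus-langchain4j-ollama,quarkus-rest'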

3. Configure LangChain4J: After the project is generated we must edit application.properties. Since we are using Ollama, we select a language model. In my case I am using llama3.2, which is small and fast to run on my machine. I also increased the timeout!

# llama3.2 is small and fast; other tool-calling models such as qwq also work
quarkus.langchain4j.ollama.chat-model.model-id=llama3.2
quarkus.langchain4j.ollama.timeout=600s


4. Import the project in an IDE: I am using VSCode with Java extensions

5. Create a Tool: Let's finally write some code. Create a new class that will run our code. Notice that this class will be executed by LangChain4J, so we need to annotate the method that executes the code with the "Tool" annotation, and the class must have the "ApplicationScoped" annotation to make it a bean that Quarkus recognizes (it also tells Quarkus to create the class once and only destroy it when the application shuts down). Here's our class code:


import java.io.StringWriter;
import org.python.util.PythonInterpreter;
import dev.langchain4j.agent.tool.Tool;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class CodeTool {

    @Tool("Execute the given Python code")
    public String execCode(String code) {
        try (var pyInterp = new PythonInterpreter()) {
            var output = new StringWriter();
            System.out.println("Running the following code: \n" + "*".repeat(10) + "\n" + code + "\n" + "*".repeat(10));
            // Redirect the interpreter's standard output so we can capture what the script prints
            pyInterp.setOut(output);
            pyInterp.exec(code);

            var result = output.toString().trim();
            System.out.println("Result is: " + result);
            return result;
        } catch (Exception e) {
            // Return the error message so the LLM can react to failures
            String error = "Error running code: " + e.getMessage();
            System.out.println(error);
            return error;
        }
    }
}


This code receives a Python script as a parameter, runs it, captures the output and returns it back to the LLM. This code was taken from this great Baeldung blog post: How to Call Python From Java!
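One thing to notice is that PythonInterpreter comes from Jython, which the Quarkus generator does not add for you, so you need to declare the dependency in your pom.xml yourself. Something like this should work (check Maven Central for the latest 2.7.x version):

<dependency>
    <groupId>org.python</groupId>
    <artifactId>jython-standalone</artifactId>
    <version>2.7.3</version>
</dependency>

Also keep in mind that Jython implements Python 2.7, so the code the LLM generates has to be compatible with it.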

6. Register your AI Service: Our last piece of code is a Java interface to register our AI Service. It has a method whose system message is injected by LangChain4J before the user prompt is passed to the LLM. The interface has the RegisterAiService annotation, which we also use to register our Tool; without this, the tool is not registered with the service and will not be used. The method input has the SystemMessage annotation with the message we want. Here's our code:

import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Registering CodeTool here makes it available to the LLM during this service's calls
@RegisterAiService(tools = { CodeTool.class })
public interface TestService {

    @SystemMessage("You are an assistant that generates Python code and executes it.")
    String input(String input);

}



Bear in mind that this is a service that could be used inside your application; you could even provide a user message with the UserMessage annotation and more, but for our tests this is enough!
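For example, since we added the quarkus-rest extension, a minimal sketch of how the service could be exposed over HTTP might look like the resource below (the AskResource class and the /ask path are just illustrative names):

import jakarta.inject.Inject;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.Path;

@Path("/ask")
public class AskResource {

    // Quarkus injects an implementation of our AI service interface
    @Inject
    TestService testService;

    @POST
    public String ask(String prompt) {
        // The prompt goes to the LLM; tool calls to CodeTool happen behind the scenes
        return testService.input(prompt);
    }
}

With something like that in place you could POST a prompt to localhost:8080/ask and get the final answer back.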

7. Finally run your project and test it: Just run mvn compile quarkus:dev in the root folder of your project, wait for it to start and go to localhost:8080/q/dev-ui/io.quarkiverse.langchain4j.quarkus-langchain4j-core/chat to start chatting with the LLM. Have fun!








My experiments with the Python Interpreter Tool


I had a lot of fun experimenting with this tool. I also tried different LLMs, including variations of Llama, Qwen and the awesome new QwQ. You can check all the models supported by Ollama; just bear in mind that some models may not run on your machine, or a model may not support tool calling (remember it must be trained with special tokens to "understand" it). Here are some screenshots of my tests:


Asking Llama 3.2 to respond to the classic "is 9.11 bigger than 9.9", this time with code


Just asking llama3.2 to run some code that it wants


Asking qwq to run some code that it wants to run



Conclusion


In this post we shared how we created a tool to execute Python code generated by an LLM from a Java application!

Check the code on my github!





