Friday, February 14, 2025

Building a Retrieval-Augmented Generation (RAG) Application with Llama 3.2, Ollama, and Spring Boot

This blog post demonstrates how to build a Retrieval-Augmented Generation (RAG) application using the Llama 3.2 model served locally through Ollama, with Spring Boot providing the REST API. RAG combines information retrieval with large language models (LLMs) so that answers are grounded in your own documents rather than in the model's training data alone. We'll leverage Docker Desktop for containerization and pgvector for vector storage.

Project Setup

We'll use Spring Boot version 3.3.7 for this project. Here's a breakdown of the key components and configurations:

1. Dependencies (Gradle):

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
    implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter'
    // Needed for the spring.docker.compose.* lifecycle property used below
    developmentOnly 'org.springframework.boot:spring-boot-docker-compose'
}

This includes the necessary Spring Boot starters, Jackson for Kotlin support, the Spring AI starters for Ollama and pgvector integration, and the Docker Compose module that backs the lifecycle-management property configured below.
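
At this Spring Boot version the Spring AI starters are milestone releases, resolved from the Spring milestone repository and versioned through the Spring AI BOM. A minimal sketch of that wiring, assuming the io.spring.dependency-management plugin is applied (the 1.0.0-M4 version is an assumption; pin whichever milestone your project targets):

repositories {
    mavenCentral()
    // Spring AI milestones are not published to Maven Central
    maven { url 'https://repo.spring.io/milestone' }
}

dependencyManagement {
    imports {
        // Assumed milestone version; align with your Spring AI release
        mavenBom 'org.springframework.ai:spring-ai-bom:1.0.0-M4'
    }
}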

2. application.properties:

spring.application.name=spring-boot-ai
server.port=8082

spring.ai.ollama.embedding.options.model=mxbai-embed-large
spring.ai.ollama.chat.options.model=llama3.2

spring.datasource.url=jdbc:postgresql://localhost:5432/sbdocs
spring.datasource.username=admin
spring.datasource.password=password

spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=1024

spring.docker.compose.lifecycle-management=start_only

This configuration sets the application name, port, Ollama model names, database connection details, and pgvector settings. The dimensions value is 1024 because mxbai-embed-large produces 1024-dimensional embeddings. The spring.docker.compose.lifecycle-management=start_only property tells Spring Boot to start the services defined in the project's compose file on application startup and to leave them running when the application shuts down.
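
The compose file itself isn't shown in the post. A minimal compose.yaml sketch that Spring Boot could pick up (service names, image tags, and the volume name are assumptions, not taken from the original project):

services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama   # keeps pulled models across container restarts
  pgvector:
    image: pgvector/pgvector:0.8.0-pg17
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: password
      POSTGRES_DB: sbdocs
    ports:
      - "5432:5432"
volumes:
  ollama-data:

With the appropriate service-connection modules on the classpath, Spring Boot can derive connection details from these containers automatically; otherwise the explicit spring.datasource.* properties above apply.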

3. RagConfiguration.kt:


import org.springframework.ai.embedding.EmbeddingModel
import org.springframework.ai.reader.TextReader
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import org.springframework.ai.vectorstore.SimpleVectorStore
import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.core.io.Resource
import java.io.File
import kotlin.io.path.Path

@Configuration
open class RagConfiguration {

    // Literal value injection: the file name under which the vectors are serialized
    @Value("myDataVector.json")
    lateinit var myDataVectorName: String

    @Value("classpath:/docs/myData.txt")
    lateinit var originalArticle: Resource

    @Bean
    open fun getVector(embeddingModel: EmbeddingModel): SimpleVectorStore {
        val simpleVectorStore = SimpleVectorStore(embeddingModel)
        val vectorStoreFile = getVectorStoreFile()
        if (vectorStoreFile.exists()) {
            // Reuse previously computed embeddings instead of re-embedding on every start
            simpleVectorStore.load(vectorStoreFile)
        } else {
            val textReader = TextReader(originalArticle)
            textReader.customMetadata["filename"] = "myData.txt"
            val documents = textReader.get()
            // Split the document into token-sized chunks before embedding
            val splitDocs = TokenTextSplitter().split(documents)
            simpleVectorStore.add(splitDocs)        // embeds each chunk via the Ollama embedding model
            simpleVectorStore.save(vectorStoreFile)
        }
        return simpleVectorStore
    }

    private fun getVectorStoreFile(): File {
        val path = Path("src", "main", "resources", "docs", myDataVectorName)
        return path.toFile()
    }
}
This configuration class creates a SimpleVectorStore bean. If the serialized myDataVector.json file already exists it is loaded; otherwise the bean reads myData.txt, splits it into chunks, embeds them with the configured embedding model, and saves the result to that file. Note that SimpleVectorStore is an in-memory store persisted to a JSON file; it does not use the pgvector database configured above.
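
Since the project also pulls in the pgvector starter, you could persist the embeddings in PostgreSQL instead of a JSON file. A hedged sketch of that alternative (the PgVectorLoader class is my illustration, not from the post; with the pgvector starter on the classpath, Spring AI auto-configures a database-backed VectorStore bean):

import org.springframework.ai.reader.TextReader
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import org.springframework.ai.vectorstore.VectorStore
import org.springframework.beans.factory.annotation.Value
import org.springframework.boot.ApplicationArguments
import org.springframework.boot.ApplicationRunner
import org.springframework.core.io.Resource
import org.springframework.stereotype.Component

@Component
class PgVectorLoader(
    private val vectorStore: VectorStore,   // backed by pgvector via auto-configuration
    @Value("classpath:/docs/myData.txt") private val article: Resource
) : ApplicationRunner {

    override fun run(args: ApplicationArguments) {
        // Read, chunk, embed, and persist into the vector_store table on startup.
        // (A production loader would first check whether the documents are already stored.)
        val chunks = TokenTextSplitter().split(TextReader(article).get())
        vectorStore.add(chunks)
    }
}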

4. RagController.kt:


import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor
import org.springframework.ai.vectorstore.SearchRequest
import org.springframework.ai.vectorstore.SimpleVectorStore
import org.springframework.beans.factory.annotation.Value
import org.springframework.core.io.Resource
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController

@RestController
@RequestMapping("/rag")
class RagController(chatClientBuilder: ChatClient.Builder, val vectorStore: SimpleVectorStore) {

    // Spring AI auto-configures a ChatClient.Builder rather than a ChatClient bean
    private val chatClient: ChatClient = chatClientBuilder.build()

    // Loaded for a custom RAG prompt template (see the sketch below)
    @Value("classpath:/prompts/ragPrompt.st")
    lateinit var ragPrompt: Resource

    @GetMapping("question")
    fun getAnswer(@RequestParam(name = "question", defaultValue = "What is the latest news about Olympics?") question: String): String? {
        return chatClient.prompt()
            // Retrieve relevant chunks from the vector store and add them to the prompt as context
            .advisors(QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
            .user(question)
            .call()
            .content()
    }
}
This controller defines a /rag/question endpoint that takes a question request parameter. The QuestionAnswerAdvisor runs a similarity search against the vectorStore, injects the matching chunks into the prompt as context, and the ChatClient sends the augmented prompt to the Llama 3.2 model to generate the answer.
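
Note that ragPrompt is loaded but never wired into the call above. The milestone releases of QuestionAnswerAdvisor have a constructor that also accepts the prompt template text, in which a {question_answer_context} placeholder is replaced with the retrieved chunks. A hedged sketch of wiring it in (the topK value is an assumed tuning choice):

return chatClient.prompt()
    .advisors(
        QuestionAnswerAdvisor(
            vectorStore,
            SearchRequest.defaults().withTopK(4),               // retrieve the 4 most similar chunks
            ragPrompt.inputStream.bufferedReader().readText()   // custom template from ragPrompt.st
        )
    )
    .user(question)
    .call()
    .content()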

Running the Application with Docker

1. Start the pgvector Docker container (you can skip this step if you let Spring Boot's Docker Compose support start it, as configured above):

docker run --name pgvector-container -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=password -e POSTGRES_DB=sbdocs -d -p 5432:5432 pgvector/pgvector:0.8.0-pg17

2. Pull Ollama Models:

Open a terminal, exec into the springboot-ai-ollama-1 container (for example from Docker Desktop's Containers view), and run:

ollama pull llama3.2
ollama pull mxbai-embed-large
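
Equivalently, run the pulls from the host:

docker exec -it springboot-ai-ollama-1 ollama pull llama3.2
docker exec -it springboot-ai-ollama-1 ollama pull mxbai-embed-large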

3. Run the Spring Boot Application:

Start your Spring Boot application. Because spring.docker.compose.lifecycle-management=start_only is set, Spring Boot will bring up the services in the compose file automatically and leave them running when the application stops.

4. Access the API:

You can now access the RAG API at http://localhost:8082/rag/question?question=Your question here (URL-encode the question), for example:
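
curl "http://localhost:8082/rag/question?question=Summarize%20myData.txt"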

This setup provides a robust and repeatable way to run RAG applications against Llama 3.2 via Ollama. Docker and Spring Boot simplify deployment and management. Remember to replace placeholder values like database credentials and file paths with your actual values. This example provides a foundation that you can extend to build more complex RAG applications.
