Showing posts with label spring-boot. Show all posts

Monday, May 19, 2025

Enhancing LLM Responses with Prompt Stuffing in Spring Boot AI


Large Language Models (LLMs) like OpenAI's GPT series are incredibly powerful, but they sometimes need a little help to provide the most accurate or context-specific answers. One common challenge is their knowledge cut-off date or their lack of access to your private, domain-specific data. This is where "prompt stuffing" (a basic form of Retrieval Augmented Generation or RAG) comes into play.

In this post, we'll explore how you can use Spring Boot with Spring AI to "stuff" relevant context into your prompts, guiding the LLM to generate more informed and precise responses. We'll use a practical example involving fetching information about a hypothetical IPL 2025 schedule.

What is Prompt Stuffing?

Prompt stuffing, in simple terms, means providing the LLM with relevant information or context directly within the prompt you send it. Instead of just asking a question, you give the LLM a chunk of text (the "stuffing" or "context") and then ask your question based on that text. This helps the LLM focus its answer on the provided information, rather than relying solely on its pre-trained knowledge.

This technique is particularly useful when:

  • You need answers based on very recent information not yet in the LLM's training data.
  • You're dealing with private or proprietary documents.
  • You want to reduce hallucinations and ensure answers are grounded in specific facts.

Setting Up Our Spring Boot Project

First, let's look at the essential dependencies and configuration for our Spring Boot application.

Dependencies (build.gradle)

We'll need Spring Web, Spring AI, and the Spring AI OpenAI starter. Here's a snippet from our build.gradle (Groovy DSL):


plugins {
    // ... other plugins
    id 'org.springframework.boot' version '3.3.7'
    id 'io.spring.dependency-management' version '1.1.7'
    id 'org.jetbrains.kotlin.jvm' version '1.9.25' // Or your Kotlin version
}

// ... group, version, java toolchain

ext {
    set('springAiVersion', "1.0.0-M4") // Use the latest stable/milestone Spring AI version
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.jetbrains.kotlin:kotlin-reflect'
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
    // ... other dependencies like jackson-module-kotlin, lombok
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
    }
}
        

Configuration (application.properties)

Next, configure your OpenAI API key and desired model in src/main/resources/application.properties. Remember to keep your API key secure and never commit it to public repositories!


spring.application.name=spring-boot-ai
server.port=8082

# Replace with your actual OpenAI API Key or use environment variables
spring.ai.openai.api-key=YOUR_OPENAI_API_KEY_PLACEHOLDER
spring.ai.openai.chat.options.model=gpt-4o-mini # Or your preferred model
        

Using gpt-4o-mini is a good balance for cost and capability for many tasks, but you can choose other models like gpt-3.5-turbo or gpt-4 depending on your needs.

The Core: Prompt Template and Context Document

The magic of prompt stuffing lies in how we structure our prompt and the context we provide.

1. The Prompt Template (promptToStuff.st)

We use a prompt template to structure our request to the LLM. This template will have placeholders for the context we want to "stuff" and the actual user question.

src/main/resources/prompts/promptToStuff.st

Use the following pieces of context to answer the question at the end. If you don't know the answer just say "I'm sorry but I don't know the answer to that".

{context}

Question: {question}
        

Here, {context} will be replaced by the content of our document, and {question} will be the user's query.
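If you're curious what happens under the hood, the template rendering boils down to placeholder substitution. Here's a minimal, hand-rolled sketch (not Spring AI's actual renderer) of how {context} and {question} get filled in:

```kotlin
// Hypothetical stand-in for Spring AI's template rendering:
// plain placeholder substitution over the .st template text.
fun renderPrompt(template: String, params: Map<String, String>): String =
    params.entries.fold(template) { acc, (key, value) ->
        acc.replace("{$key}", value)
    }

fun main() {
    val template = """
        Use the following pieces of context to answer the question at the end.

        {context}

        Question: {question}
    """.trimIndent()

    val rendered = renderPrompt(
        template,
        mapOf(
            "context" to "IPL 2025 was suspended on May 9.",
            "question" to "Why was IPL stopped in 2025?"
        )
    )
    println(rendered)
}
```

Spring AI does this for you via userSpec.param(...), but it helps to see that the final prompt is just the template text with the document spliced in.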

2. The Context Document (Ipl2025.txt)

This is a simple text file containing the information we want the LLM to use. For our example, it's about IPL 2025 schedule.

src/main/resources/docs/Ipl2025.txt

IPL 2025 will resume on May 17 and end on June 3, as per the revised schedule announced by the BCCI on Monday night.

The remainder of the tournament, which was suspended on May 9 for a week due to cross-border tensions between India and Pakistan, will be played at six venues: Bengaluru, Jaipur, Delhi, Lucknow, Mumbai and Ahmedabad.
The venues for the playoffs will be announced later, but the matches will be played on the following dates: Qualifier 1 on May 29, the Eliminator on May 30, Qualifier 2 on June 1 and the final on June 3. A total of 17 matches will be played after the resumption, with two double-headers, both of which will be played on Sundays.
... (rest of the document content) ...
        

Implementing the Stuffing Logic in Spring Boot (Kotlin)

Now, let's see how to tie this all together in a Spring Boot controller using Kotlin.

src/main/kotlin/com/swapnil/spring_boot_ai/stuffPrompt/OlympicController.kt

package com.swapnil.spring_boot_ai.stuffPrompt

import org.slf4j.LoggerFactory
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.client.ChatClient.PromptUserSpec
import org.springframework.beans.factory.annotation.Value
import org.springframework.core.io.Resource
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController
import java.nio.charset.Charset

@RestController
@RequestMapping("stuff/")
class OlympicController(builder: ChatClient.Builder) {

    val log: org.slf4j.Logger? = LoggerFactory.getLogger(OlympicController::class.java)

    private val chatClient: ChatClient = builder.build()

    // Load the prompt template
    @Value("classpath:/prompts/promptToStuff.st")
    lateinit var promptToStuff: Resource

    // Load the context document
    @Value("classpath:/docs/Ipl2025.txt")
    private lateinit var stuffing: Resource

    @GetMapping("ipl2025")
    fun get(
        @RequestParam(
            value = "message",
            defaultValue = "Why IPL was stopped in 2025?"
        ) message: String,
        @RequestParam(value = "isStuffingEnabled", defaultValue = "false") isStuffingEnabled: Boolean
    ): String {

        // Read the content of our context document
        val contextDocumentContent: String = stuffing.getContentAsString(Charset.defaultCharset())
        log?.info("Context Document Loaded. Length: {}", contextDocumentContent.length)

        // Use ChatClient to build and send the prompt
        return chatClient.prompt()
            .user { userSpec: PromptUserSpec ->
                userSpec.text(promptToStuff) // Our template resource
                userSpec.param("question", message)
                // Conditionally add the context
                userSpec.param("context", if (isStuffingEnabled) contextDocumentContent else "")
            }
            .call()
            .content() ?: "Error: Could not get response from LLM!"
    }
}
        

Key Parts Explained:

  • ChatClient.Builder and ChatClient: Spring AI provides ChatClient as a fluent API to interact with LLMs. It's injected and built in the constructor.
  • @Value annotation: We use this to inject our promptToStuff.st template and Ipl2025.txt context document as Spring Resource objects.
  • Reading Context: stuffing.getContentAsString(Charset.defaultCharset()) reads the entire content of our Ipl2025.txt file.
  • Dynamic Prompting:
    • chatClient.prompt().user { ... } starts building the user message.
    • userSpec.text(promptToStuff) sets the base prompt template.
    • userSpec.param("question", message) injects the user's actual question into the {question} placeholder.
    • userSpec.param("context", if (isStuffingEnabled) contextDocumentContent else "") is the crucial part. If isStuffingEnabled is true, it injects the content of Ipl2025.txt into the {context} placeholder. Otherwise, it injects an empty string.
  • .call().content(): This sends the constructed prompt to the LLM and retrieves the response content.

Seeing it in Action!

Let's test our endpoint. You can use tools like curl, Postman, or even your browser.

Consider the question: "Why IPL was stopped in 2025?"

Scenario 1: Stuffing Disabled (isStuffingEnabled=false)

Request URL: http://localhost:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=false

Since we are not providing any context, and the LLM (e.g., gpt-4o-mini) doesn't know about the IPL 2025 suspension from its training data, it will likely fall back to the instruction in our prompt template:


I'm sorry but I don't know the answer to that.
        
Expected response when prompt stuffing is disabled.

Scenario 2: Stuffing Enabled (isStuffingEnabled=true)

Request URL: http://localhost:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=true

Now, the content of Ipl2025.txt is "stuffed" into the prompt. The LLM uses this provided context to answer.

Expected Response (based on the provided Ipl2025.txt):


The remainder of the tournament, which was suspended on May 9 for a week due to cross-border tensions between India and Pakistan, will be played at six venues: Bengaluru, Jaipur, Delhi, Lucknow, Mumbai and Ahmedabad.
        

Or a more direct answer like:


IPL 2025 was suspended on May 9 for a week due to cross-border tensions between India and Pakistan.
        
Expected response when prompt stuffing is enabled, using the provided context.

Here's an example of how you might make these requests using curl (as captured in the project's request.http file):


# Request with stuffing disabled
curl -L -X GET 'http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=false'

# Request with stuffing enabled
curl -L -X GET 'http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=true'
        

Making requests to the API endpoint using curl.

Benefits of This Approach

  • Improved Accuracy: LLMs can answer questions based on specific, up-to-date, or private information you provide.
  • Reduced Hallucinations: By grounding the LLM in provided text, you lessen the chance of it inventing facts.
  • Contextual Control: You decide what information the LLM should consider for a particular query.
  • Simplicity: Spring AI makes it relatively straightforward to implement this pattern.

Conclusion

Prompt stuffing is a powerful yet simple technique to significantly enhance the quality and relevance of LLM responses. By leveraging Spring Boot and Spring AI, you can easily integrate this capability into your Java or Kotlin applications, allowing you to build more intelligent and context-aware AI-powered features.

This example focused on a single document, but you can extend this concept to more sophisticated RAG pipelines where relevant document chunks are dynamically retrieved from a vector database based on the user's query before being "stuffed" into the prompt. Spring AI also offers support for these more advanced scenarios.

Happy coding, and I hope this helps you build amazing AI applications!

Friday, February 14, 2025

Building a Retrieval-Augmented Generation (RAG) Application with Ollama 3.2 and Spring Boot


This blog post demonstrates how to build a Retrieval-Augmented Generation (RAG) application using Ollama 3.2 for large language models (LLMs) and Spring Boot for creating REST APIs. RAG combines information retrieval with LLMs to provide more accurate and contextually relevant answers. We'll leverage Docker Desktop for containerization and pgvector for vector storage.

Project Setup

We'll use Spring Boot version 3.3.7 for this project. Here's a breakdown of the key components and configurations:

1. Dependencies (Gradle):

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-jdbc'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
    implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter'
    implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter'
}

This includes the necessary Spring Boot starters, Jackson for Kotlin support, and the Spring AI libraries for Ollama and pgvector integration.

2. application.properties:

spring.application.name=spring-boot-ai
server.port=8082

spring.ai.ollama.embedding.model=mxbai-embed-large
spring.ai.ollama.chat.model=llama3.2

spring.datasource.url=jdbc:postgresql://localhost:5432/sbdocs
spring.datasource.username=admin
spring.datasource.password=password

spring.ai.vectorstore.pgvector.initialize-schema=true
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=1024

spring.docker.compose.lifecycle-management=start_only

This configuration sets the application name, port, Ollama model names, database connection details, and pgvector settings. Notably, spring.docker.compose.lifecycle-management=start_only tells Spring Boot to start the Docker Compose services when the application starts, but to leave them running when it shuts down.

3. RagConfiguration.kt:


import org.springframework.ai.embedding.EmbeddingModel
import org.springframework.ai.reader.TextReader
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import org.springframework.ai.vectorstore.SimpleVectorStore
import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.core.io.Resource
import java.io.File
import kotlin.io.path.Path

@Configuration
open class RagConfiguration {

    @Value("myDataVector.json")
    lateinit var myDataVectorName: String

    @Value("classpath:/docs/myData.txt")
    lateinit var originalArticle: Resource

    @Bean
    open fun getVector(embeddingModel: EmbeddingModel): SimpleVectorStore {
        val simpleVectorStore = SimpleVectorStore(embeddingModel)
        val vectorStoreFile = getVectorStoreFile()
        if (vectorStoreFile.exists()) {
            // Reuse previously computed embeddings
            simpleVectorStore.load(vectorStoreFile)
        } else {
            // Read the source document, split it into chunks, embed, and persist
            val textReader = TextReader(originalArticle)
            textReader.customMetadata["filename"] = "myData.txt"
            val documents = textReader.get()
            val splitDocs = TokenTextSplitter().split(documents)
            simpleVectorStore.add(splitDocs)
            simpleVectorStore.save(vectorStoreFile)
        }
        return simpleVectorStore
    }

    private fun getVectorStoreFile(): File {
        val path = Path("src", "main", "resources", "docs", myDataVectorName)
        return path.toFile()
    }
}
    
This configuration class creates a SimpleVectorStore bean. It loads existing vector data from a previously saved JSON file or, if none exists, generates it by reading the myData.txt file, splitting it into chunks, embedding them with the configured embedding model, and saving the result to disk.
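To make the splitting step concrete, here's a simplified stand-in for what TokenTextSplitter does. Real splitters budget by tokens and often overlap chunks; this sketch just cuts the text into fixed-size word groups:

```kotlin
// Simplified stand-in for TokenTextSplitter: split a document into
// fixed-size word chunks (real splitters count tokens, not words,
// and may overlap adjacent chunks to preserve context).
fun splitIntoChunks(text: String, chunkSize: Int): List<String> =
    text.split(Regex("\\s+"))
        .filter { it.isNotBlank() }
        .chunked(chunkSize)
        .map { it.joinToString(" ") }
```

Each resulting chunk is then embedded separately, so that retrieval can later return only the pieces relevant to a question instead of the whole document.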

4. RagController.kt:


import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor
import org.springframework.ai.vectorstore.SearchRequest
import org.springframework.ai.vectorstore.SimpleVectorStore
import org.springframework.beans.factory.annotation.Value
import org.springframework.core.io.Resource
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController

@RestController
@RequestMapping("/rag")
class RagController(val chatClient: ChatClient, val vectorStore: SimpleVectorStore) {

    @Value("classpath:/prompts/ragPrompt.st")
    lateinit var ragPrompt: Resource

    @GetMapping("question")
    fun getAnswer(
        @RequestParam(name = "question", defaultValue = "What is the latest news about Olympics?") question: String
    ): String? {
        return chatClient.prompt()
            // Retrieve relevant chunks from the vector store and stuff them into the prompt
            .advisors(QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
            .user(question)
            .call()
            .content()
    }
}
    
This controller defines a /rag/question endpoint that takes a question as a parameter. It uses the ChatClient and QuestionAnswerAdvisor to query the Ollama model, retrieving relevant context from the vectorStore and generating an answer.
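Conceptually, the QuestionAnswerAdvisor embeds the question and pulls the most similar chunks from the vector store before the LLM is called. Here's a toy sketch of that similarity search, using cosine similarity over pre-computed embeddings (the function names are illustrative, not Spring AI's API):

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors: 1.0 means the
// vectors point the same way, 0.0 means they are orthogonal.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0.0; var normA = 0.0; var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]; normA += a[i] * a[i]; normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Rank stored chunks by similarity to the query embedding and
// return the k best matches (the "context" that gets stuffed).
fun topK(query: DoubleArray, chunks: Map<String, DoubleArray>, k: Int): List<String> =
    chunks.entries
        .sortedByDescending { cosineSimilarity(query, it.value) }
        .take(k)
        .map { it.key }
```

The real vector store does the same ranking with an HNSW index for speed, but the retrieve-then-stuff idea is exactly this.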

Running the Application with Docker

1. Start pgvector Docker Container:

docker run --name pgvector-container -e POSTGRES_USER=admin -e POSTGRES_PASSWORD=password -e POSTGRES_DB=sbdocs -d -p 5432:5432 pgvector/pgvector:0.8.0-pg1

2. Pull Ollama Models:

Open a terminal in Docker Desktop, exec into the springboot-ai-ollama-1 container, and run:

ollama pull llama3.2
ollama pull mxbai-embed-large

3. Run the Spring Boot Application:

Start your Spring Boot application. Because of the spring.docker.compose.lifecycle-management property, Spring Boot will manage the Docker Compose file.

4. Access the API:

You can now access the RAG API at http://localhost:8082/rag/question?question=Your question here.

This setup provides a robust and scalable way to use Ollama 3.2 for RAG applications. The use of Docker and Spring Boot simplifies deployment and management. Remember to replace placeholder values like database credentials and file paths with your actual values. This example provides a foundation that you can extend to build more complex RAG applications.

Tuesday, December 31, 2024

Securing Microservices with JWT Authentication and Data Encryption


In modern microservices architectures, securing communication and data integrity are paramount. This article explores how JWT (JSON Web Token) authentication and data encryption can bolster security, ensuring that data exchanges between services remain confidential and trusted.

What is JWT Authentication?

JWT is a compact, URL-safe token format that securely transmits information between parties as a JSON object. It is widely used in microservices for its simplicity and efficiency.

Parts of a JWT Token

A JSON Web Token (JWT) consists of three parts, separated by periods (.):

  • Header: Specifies the token type (JWT) and signing algorithm (e.g., HS256 or RS256).
  • Example: { "alg": "HS256", "typ": "JWT" }
  • Payload: Contains claims about the user or the token itself. Claims can be:
    • Registered claims: Predefined fields like iss (issuer), sub (subject), exp (expiration time), etc.
    • Public claims: Custom claims, such as user roles or permissions.
    • Private claims: Claims specific to the application, like user IDs.
    Example: { "sub": "1234567890", "name": "John Doe", "admin": true, "iat": 1516239022 }
  • Signature: Ensures the token's integrity and authenticity. It is generated by signing the encoded header and payload with a secret or private key.
    Example for HMAC-SHA256:
    HMACSHA256( base64UrlEncode(header) + "." + base64UrlEncode(payload), secret )
A full JWT might look like this: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
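Since the header and payload are just Base64URL-encoded JSON, you can inspect them without any JWT library. Here's a small sketch (note: this only decodes; it performs no signature verification, so never trust a token based on this alone):

```kotlin
import java.util.Base64

// Decode the header and payload of a JWT. Both are Base64URL-encoded
// JSON; the third part (the signature) is NOT checked here.
fun decodeJwtParts(jwt: String): Pair<String, String> {
    val parts = jwt.split(".")
    require(parts.size == 3) { "A JWT must have exactly three parts" }
    val decoder = Base64.getUrlDecoder()
    val header = String(decoder.decode(parts[0]))
    val payload = String(decoder.decode(parts[1]))
    return header to payload
}
```

Running this on the example token above yields the header and payload JSON shown earlier, which is also why sensitive data should never be placed in an unencrypted JWT payload.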

Shared Key vs. Public Key JWT in Microservices

Shared Key-Based JWT:

  1. How It Works:
    • A single secret key is used for both signing and verifying the token.
    • This secret must be shared between the microservices.
  2. Advantages:
    • Simple setup.
    • Suitable for small-scale systems with fewer services.
  3. Disadvantages:
    • Security Risk: If the key is compromised, all services relying on it are at risk.
    • Key Distribution: Sharing the key securely across multiple services can be challenging.

Public Key-Based JWT:

  1. How It Works:
    • The authentication server uses a private key to sign the JWT.
    • Microservices use a public key to verify the token's signature.
  2. Advantages:
    • Better Security: The private key remains on the authentication server, and only the public key is distributed.
    • Scalability: New services can independently verify tokens without needing access to the private key.
    • No Shared Secrets: Eliminates the need to distribute a secret key.
  3. Disadvantages:
    • Slightly more complex setup due to key management.
    • Requires a system to distribute the public key, like a JWKS (JSON Web Key Set) endpoint.
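The public-key flow can be sketched with plain java.security using SHA256withRSA, which is the algorithm behind RS256 (signToken/verifyToken are illustrative names, not a library API):

```kotlin
import java.security.KeyPairGenerator
import java.security.PrivateKey
import java.security.PublicKey
import java.security.Signature

// Auth server side: sign the token content with the PRIVATE key.
fun signToken(content: ByteArray, key: PrivateKey): ByteArray =
    Signature.getInstance("SHA256withRSA").run {
        initSign(key)
        update(content)
        sign()
    }

// Any microservice: verify with only the PUBLIC key.
// The private key never leaves the auth server.
fun verifyToken(content: ByteArray, signature: ByteArray, key: PublicKey): Boolean =
    Signature.getInstance("SHA256withRSA").run {
        initVerify(key)
        update(content)
        verify(signature)
    }
```

In a real deployment the content would be the Base64URL-encoded header and payload, and services would typically fetch the public key from a JWKS endpoint rather than hard-coding it.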

Data Encryption in Microservices

Encryption ensures sensitive data remains confidential and secure during transmission and storage.

Types of Encryption

  • Symmetric Encryption: Uses the same key for encryption and decryption.
  • Asymmetric Encryption: Utilizes a public key for encryption and a private key for decryption.
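As a concrete example of the symmetric case, here's a minimal AES-GCM sketch using javax.crypto (the helper names are illustrative). The same key encrypts and decrypts; the IV must be unique per message and travels alongside the ciphertext, since it is not secret:

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import javax.crypto.spec.GCMParameterSpec

// Encrypt with AES-GCM: returns the random 12-byte IV together with
// the ciphertext (which includes the 128-bit authentication tag).
fun encrypt(plaintext: ByteArray, key: SecretKey): Pair<ByteArray, ByteArray> {
    val iv = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, GCMParameterSpec(128, iv))
    return iv to cipher.doFinal(plaintext)
}

// Decrypt with the same key and the IV that was sent with the message;
// GCM also verifies integrity and throws if the data was tampered with.
fun decrypt(iv: ByteArray, ciphertext: ByteArray, key: SecretKey): ByteArray {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, GCMParameterSpec(128, iv))
    return cipher.doFinal(ciphertext)
}
```

The asymmetric case mirrors the RSA example in the JWT section: encrypt with the recipient's public key, decrypt with their private key.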

Encryption in Microservices Communication

  • Transport-Level Encryption: Secures data in transit using TLS (HTTPS).
  • Message-Level Encryption: Encrypts specific message payloads for added confidentiality.

Combining JWT and Encryption

  • Token Encryption: Adds a layer of security to JWTs by making intercepted tokens unreadable.
  • Public Key Infrastructure: Manages keys securely for token validation and encrypted communication.

Best Practices

  • Set reasonable expiration times for tokens and use refresh tokens for longer sessions.
  • Rotate encryption keys periodically to minimize security risks.
  • Audit and log token usage to detect anomalies.

Conclusion

JWT authentication and encryption are foundational to building secure microservices. By combining these technologies, you can ensure robust authentication, data confidentiality, and integrity across your system. Follow best practices to simplify implementation and focus on delivering high-quality services.
