Large Language Models (LLMs) like OpenAI's GPT series are incredibly powerful, but they sometimes need a little help to provide the most accurate or context-specific answers. Two common limitations are their training-data cut-off and their lack of access to your private, domain-specific data. This is where "prompt stuffing" (a basic form of Retrieval Augmented Generation, or RAG) comes into play.
In this post, we'll explore how you can use Spring Boot with Spring AI to "stuff" relevant context into your prompts, guiding the LLM to generate more informed and precise responses. We'll use a practical example involving fetching information about a hypothetical IPL 2025 schedule.
What is Prompt Stuffing?
Prompt stuffing, in simple terms, means providing the LLM with relevant information or context directly within the prompt you send it. Instead of just asking a question, you give the LLM a chunk of text (the "stuffing" or "context") and then ask your question based on that text. This helps the LLM focus its answer on the provided information, rather than relying solely on its pre-trained knowledge.
This technique is particularly useful when:
- You need answers based on very recent information not yet in the LLM's training data.
- You're dealing with private or proprietary documents.
- You want to reduce hallucinations and ensure answers are grounded in specific facts.
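To make this concrete, here's a quick illustration using the IPL schedule document we'll introduce below. Without stuffing, the prompt is just the bare question; with stuffing, the relevant text travels along with it:

Without stuffing:
Question: Why IPL was stopped in 2025?

With stuffing:
Use the following pieces of context to answer the question at the end. ...
IPL 2025 will resume on May 17 and end on June 3, as per the revised schedule ...
Question: Why IPL was stopped in 2025?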
Setting Up Our Spring Boot Project
First, let's look at the essential dependencies and configuration for our Spring Boot application.
Dependencies (build.gradle)
We'll need Spring Web and the Spring AI OpenAI starter (which pulls in Spring AI itself). Here's a snippet from our build.gradle (Groovy DSL):
plugins {
    // ... other plugins
    id 'org.springframework.boot' version '3.3.7'
    id 'io.spring.dependency-management' version '1.1.7'
    id 'org.jetbrains.kotlin.jvm' version '1.9.25' // Or your Kotlin version
}

// ... group, version, java toolchain

ext {
    set('springAiVersion', "1.0.0-M4") // Use the latest stable/milestone Spring AI version
}

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-web'
    implementation 'org.jetbrains.kotlin:kotlin-reflect'
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
    // ... other dependencies like jackson-module-kotlin, lombok
}

dependencyManagement {
    imports {
        mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
    }
}
Configuration (application.properties)
Next, configure your OpenAI API key and desired model in src/main/resources/application.properties. Remember to keep your API key secure and never commit it to public repositories!
spring.application.name=spring-boot-ai
server.port=8082
# Replace with your actual OpenAI API Key or use environment variables
spring.ai.openai.api-key=YOUR_OPENAI_API_KEY_PLACEHOLDER
# Or your preferred model
spring.ai.openai.chat.options.model=gpt-4o-mini

(Note: in .properties files, a # after a value is treated as part of the value, so comments must sit on their own line.) Using gpt-4o-mini is a good balance of cost and capability for many tasks, but you can choose other models like gpt-3.5-turbo or gpt-4 depending on your needs.
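To keep the key out of source control entirely, you can lean on Spring's standard property placeholder syntax, which resolves the value from an environment variable at startup:

spring.ai.openai.api-key=${OPENAI_API_KEY}

Export OPENAI_API_KEY in your shell (or set it in your deployment environment) before starting the application.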
The Core: Prompt Template and Context Document
The magic of prompt stuffing lies in how we structure our prompt and the context we provide.
1. The Prompt Template (promptToStuff.st)
We use a prompt template to structure our request to the LLM. This template will have placeholders for the context we want to "stuff" and the actual user question.
src/main/resources/prompts/promptToStuff.st
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say "I'm sorry but I don't know the answer to that".
{context}
Question: {question}
Here, {context} will be replaced by the content of our document, and {question} will be the user's query.
2. The Context Document (Ipl2025.txt)
This is a simple text file containing the information we want the LLM to use. For our example, it contains details about the revised IPL 2025 schedule.
src/main/resources/docs/Ipl2025.txt
IPL 2025 will resume on May 17 and end on June 3, as per the revised schedule announced by the BCCI on Monday night.
The remainder of the tournament, which was suspended on May 9 for a week due to cross-border tensions between India and Pakistan, will be played at six venues: Bengaluru, Jaipur, Delhi, Lucknow, Mumbai and Ahmedabad.
The venues for the playoffs will be announced later, but the matches will be played on the following dates: Qualifier 1 on May 29, the Eliminator on May 30, Qualifier 2 on June 1 and the final on June 3. A total of 17 matches will be played after the resumption, with two double-headers, both of which will be played on Sundays.
... (rest of the document content) ...
Implementing the Stuffing Logic in Spring Boot (Kotlin)
Now, let's see how to tie this all together in a Spring Boot controller using Kotlin.
src/main/kotlin/com/swapnil/spring_boot_ai/stuffPrompt/OlympicController.kt
package com.swapnil.spring_boot_ai.stuffPrompt

import org.slf4j.Logger
import org.slf4j.LoggerFactory
import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.client.ChatClient.PromptUserSpec
import org.springframework.beans.factory.annotation.Value
import org.springframework.core.io.Resource
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController
import java.nio.charset.Charset

@RestController
@RequestMapping("stuff/")
class OlympicController(builder: ChatClient.Builder) {

    private val log: Logger = LoggerFactory.getLogger(OlympicController::class.java)

    private val chatClient: ChatClient = builder.build()

    // Load the prompt template
    @Value("classpath:/prompts/promptToStuff.st")
    lateinit var promptToStuff: Resource

    // Load the context document
    @Value("classpath:/docs/Ipl2025.txt")
    private lateinit var stuffing: Resource

    @GetMapping("ipl2025")
    fun get(
        @RequestParam(
            value = "message",
            defaultValue = "Why IPL was stopped in 2025?"
        ) message: String,
        @RequestParam(value = "isStuffingEnabled", defaultValue = "false") isStuffingEnabled: Boolean
    ): String {
        // Read the content of our context document
        val contextDocumentContent: String = stuffing.getContentAsString(Charset.defaultCharset())
        log.info("Context Document Loaded. Length: {}", contextDocumentContent.length)

        // Use ChatClient to build and send the prompt
        return chatClient.prompt()
            .user { userSpec: PromptUserSpec ->
                userSpec.text(promptToStuff) // Our template resource
                userSpec.param("question", message)
                // Conditionally add the context
                userSpec.param("context", if (isStuffingEnabled) contextDocumentContent else "")
            }
            .call()
            .content() ?: "Error: Could not get response from LLM!"
    }
}
Key Parts Explained:
- ChatClient.Builder and ChatClient: Spring AI provides ChatClient as a fluent API to interact with LLMs. It's injected and built in the constructor.
- @Value annotation: We use this to inject our promptToStuff.st template and Ipl2025.txt context document as Spring Resource objects.
- Reading Context: stuffing.getContentAsString(Charset.defaultCharset()) reads the entire content of our Ipl2025.txt file.
- Dynamic Prompting: chatClient.prompt().user { ... } starts building the user message. userSpec.text(promptToStuff) sets the base prompt template, and userSpec.param("question", message) injects the user's actual question into the {question} placeholder. userSpec.param("context", if (isStuffingEnabled) contextDocumentContent else "") is the crucial part: if isStuffingEnabled is true, it injects the content of Ipl2025.txt into the {context} placeholder; otherwise, it injects an empty string.
- .call().content(): This sends the constructed prompt to the LLM and retrieves the response content.
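If you'd rather render the template yourself before sending it, Spring AI's PromptTemplate gives you an equivalent path. Here's a minimal sketch; the answerWithTemplate helper and its parameters are hypothetical stand-ins for the controller fields above:

import org.springframework.ai.chat.client.ChatClient
import org.springframework.ai.chat.prompt.PromptTemplate
import org.springframework.core.io.Resource

// Hypothetical helper: renders the {context} and {question} placeholders
// up front, then sends the finished text as a plain user message.
fun answerWithTemplate(
    chatClient: ChatClient,
    promptResource: Resource, // e.g. the promptToStuff.st template
    context: String,          // document content, or "" to disable stuffing
    question: String
): String {
    val renderedPrompt = PromptTemplate(promptResource)
        .render(mapOf("context" to context, "question" to question))
    return chatClient.prompt()
        .user(renderedPrompt)
        .call()
        .content() ?: "Error: Could not get response from LLM!"
}

Both approaches produce the same final prompt; the lambda style simply defers rendering to Spring AI.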
Seeing it in Action!
Let's test our endpoint. You can use tools like curl, Postman, or even your browser.
Consider the question: "Why IPL was stopped in 2025?"
Scenario 1: Stuffing Disabled (isStuffingEnabled=false)
Request URL: http://localhost:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=false
Since we are not providing any context, and the LLM (e.g., gpt-4o-mini) doesn't know about the IPL 2025 suspension from its training data, it will likely fall back on the instruction in our prompt template:
I'm sorry but I don't know the answer to that.
Expected response when prompt stuffing is disabled.
Scenario 2: Stuffing Enabled (isStuffingEnabled=true)
Request URL: http://localhost:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=true
Now, the content of Ipl2025.txt is "stuffed" into the prompt, and the LLM uses this provided context to answer.
Expected Response (based on the provided Ipl2025.txt):
The remainder of the tournament, which was suspended on May 9 for a week due to cross-border tensions between India and Pakistan, will be played at six venues: Bengaluru, Jaipur, Delhi, Lucknow, Mumbai and Ahmedabad.
Or a more direct answer like:
IPL 2025 was suspended on May 9 for a week due to cross-border tensions between India and Pakistan.
Expected response when prompt stuffing is enabled, using the provided context.
Here's an example of how you might make these requests using curl (as shown in the project's request.http file):
# Request with stuffing disabled
curl -L -X GET 'http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=false'
# Request with stuffing enabled
curl -L -X GET 'http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=true'
Making requests to the API endpoint using curl.
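If you prefer IntelliJ's built-in HTTP client, the request.http file mentioned above would look roughly like this (reconstructed from the curl commands, not copied from the repository):

### Request with stuffing disabled
GET http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=false

### Request with stuffing enabled
GET http://127.0.0.1:8082/stuff/ipl2025?message=Why%20IPL%20was%20stopped%20in%202025%3F&isStuffingEnabled=true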
Benefits of This Approach
- Improved Accuracy: LLMs can answer questions based on specific, up-to-date, or private information you provide.
- Reduced Hallucinations: By grounding the LLM in provided text, you lessen the chance of it inventing facts.
- Contextual Control: You decide what information the LLM should consider for a particular query.
- Simplicity: Spring AI makes it relatively straightforward to implement this pattern.
Conclusion
Prompt stuffing is a powerful yet simple technique to significantly enhance the quality and relevance of LLM responses. By leveraging Spring Boot and Spring AI, you can easily integrate this capability into your Java or Kotlin applications, allowing you to build more intelligent and context-aware AI-powered features.
This example focused on a single document, but you can extend this concept to more sophisticated RAG pipelines where relevant document chunks are dynamically retrieved from a vector database based on the user's query before being "stuffed" into the prompt. Spring AI also offers support for these more advanced scenarios.
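As a taste of what that retrieval step might look like, here's a rough Kotlin sketch. It assumes a configured VectorStore bean and the milestone-era SearchRequest API; neither is part of this example project:

import org.springframework.ai.vectorstore.SearchRequest
import org.springframework.ai.vectorstore.VectorStore

// Rough sketch: fetch the three chunks most similar to the question
// and join them into a single string to use as the {context} value.
fun retrieveContext(vectorStore: VectorStore, question: String): String =
    vectorStore.similaritySearch(SearchRequest.query(question).withTopK(3))
        .joinToString("\n\n") { doc -> doc.content }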
Happy coding, and I hope this helps you build amazing AI applications!