Streaming Agent Responses using Bedrock Agent API

AI agents are quickly replacing traditional automation solutions and chatbots. It’s safe to say that they are the next big thing in the tech landscape that organizations cannot afford to overlook. They bring a level of autonomy and operational efficiency that revolutionizes the way businesses operate. While agents demonstrate remarkable capabilities in handling complex tasks, their response times in customer-facing applications can present challenges, leading to frustration and a less engaging experience for users.
A solution to this challenge is streaming responses, which allows partial results to be displayed as soon as they are available. In this blog, we’ll demonstrate how to implement a streaming solution using AWS Bedrock Agents, Boto3 API, Flask, and ReactJS. We’ll walk you through creating a Flask API to stream responses from Bedrock Agents, building a ReactJS frontend to handle the streaming API, and combining these components into a seamless real-time user experience.
Addressing Response Latency with Real-time Streaming
When invoking Bedrock Agents, one significant challenge is the response generation time, which can take up to 15–20 seconds depending on the complexity of the request. This latency can create a perception of a slow application and diminish user engagement, as waiting for the complete response delays interaction and reduces communication efficiency. To address this challenge, we implement a streaming solution that delivers responses incrementally to the frontend, allowing users to see partial outputs in real-time, enhancing responsiveness and usability. Let us see how to implement this solution.
Prerequisites:
Before we dive into the implementation, make sure you have:
- Python 3.10 or higher installed
- Node.js and npm installed
- An IDE of your choice
- AWS credentials configured on your machine
- Basic understanding of Flask and React
Backend: Flask API with Streaming
First, let’s set up our Python environment and install the necessary packages.
- Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required Python packages:
pip install flask boto3 #And other packages to run your application
- The backend will use Flask to set up an API endpoint. It streams the agent’s response using AWS Bedrock’s invoke_agent method. Here’s a simplified implementation:
from flask import Flask, Response, request
import boto3
import os
import json

app = Flask(__name__)

# Initialize the Bedrock Agent runtime client
# (invoke_agent lives on 'bedrock-agent-runtime', not 'bedrock-runtime')
bedrock_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

@app.route('/api/stream-agent-response', methods=['POST'])
def stream_agent_response():
    user_input = request.json.get('input')
    session_id = request.json.get('sessionId')
    agent_id = os.getenv('AGENT_ID')

    def generate():
        try:
            response = bedrock_client.invoke_agent(
                agentAliasId=os.getenv('ALIAS_ID'),
                agentId=agent_id,
                enableTrace=False,
                endSession=False,
                inputText=user_input,
                sessionId=session_id,
                streamingConfigurations={'streamFinalResponse': True}
            )
            if response.get('completion'):
                for event_chunk in response['completion']:
                    if 'chunk' in event_chunk and 'bytes' in event_chunk['chunk']:
                        chunk_text = event_chunk['chunk']['bytes'].decode('utf-8')
                        yield f"data: {json.dumps({'response': chunk_text, 'sessionId': session_id})}\n\n"
        except Exception as e:
            error_response = json.dumps({'error': str(e)})
            yield f"data: {error_response}\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)
Key Points
- Generator Function: Streams chunks of data as they are received from Bedrock
- Error Handling: Captures exceptions and streams error messages to the client
- Session Management: Supports unique sessions for different users or conversations
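Each event the endpoint emits follows the Server-Sent Events convention: a `data: ` prefix, a JSON payload, and a blank-line terminator. The small helper below (a standalone sketch for testing, not part of the application code) shows how a client can split a raw stream buffer back into payloads, mirroring what the frontend does when it splits on double newlines:

```python
import json

def parse_sse_events(buffer: str):
    """Split a raw text/event-stream buffer into decoded JSON payloads."""
    payloads = []
    for message in buffer.split("\n\n"):
        message = message.strip()
        if message.startswith("data: "):
            payloads.append(json.loads(message[len("data: "):]))
    return payloads

# Example: two chunks framed exactly as the Flask endpoint emits them
raw = (
    'data: {"response": "Hello, ", "sessionId": "abc"}\n\n'
    'data: {"response": "world!", "sessionId": "abc"}\n\n'
)
events = parse_sse_events(raw)
full_text = "".join(e["response"] for e in events)
print(full_text)  # Hello, world!
```

This is handy as a quick unit test of the framing before wiring up the React client.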
Frontend: ReactJS with Streaming API
The frontend will consume the streaming API using fetch and display the incremental responses in real time. Here’s an implementation snippet:
const handleSendMessage = async (query, languageCode) => {
  const request = {
    input: query,
    sessionId: sessionId,
  };
  try {
    const response = await fetch(`${backendURL}api/stream-agent-response`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let ongoingBotResponse = "";
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      const messages = chunk.split("\n\n");
      for (const message of messages) {
        if (message.trim().startsWith("data: ")) {
          try {
            const jsonStr = message.trim().slice(6);
            const data = JSON.parse(jsonStr);
            if (data.response) {
              ongoingBotResponse += data.response;
              // Update the in-progress agent message in place instead of
              // appending a new message for every chunk
              setSessionConversation((prev) => {
                const updated = [...prev];
                const last = updated[updated.length - 1];
                if (last && last.type === "AGENT") {
                  updated[updated.length - 1] = { type: "AGENT", body: ongoingBotResponse };
                } else {
                  updated.push({ type: "AGENT", body: ongoingBotResponse });
                }
                return updated;
              });
            }
          } catch (e) {
            console.warn("Error parsing chunk:", e);
          }
        }
      }
    }
  } catch (error) {
    console.error("Streaming error:", error);
  }
};
Key Points
- Streaming Responses: Processes chunks of data in real-time
- Error Handling: Logs errors and avoids breaking the UI
- State Management: Updates the conversation dynamically
End-to-End Workflow
Here’s how the components work together:
- User Interaction: The user submits a query via the ReactJS frontend
- API Call: The frontend sends the query to the Flask backend
- Streaming Response: Flask streams partial responses from Bedrock to the frontend
- Real-Time Updates: ReactJS updates the UI with each chunk of data
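This flow can be simulated end to end without touching AWS by swapping the Bedrock call for a fake event stream. The sketch below is purely illustrative: `fake_completion` is a stand-in for the `completion` stream returned by invoke_agent, not a real Bedrock API, and the accumulation loop plays the role of the frontend:

```python
import json

def fake_completion():
    # Stand-in for Bedrock's completion event stream (illustration only)
    for text in ["Agents ", "can ", "stream."]:
        yield {"chunk": {"bytes": text.encode("utf-8")}}

def generate(session_id):
    # Same SSE framing the Flask endpoint uses
    for event_chunk in fake_completion():
        chunk_text = event_chunk["chunk"]["bytes"].decode("utf-8")
        yield f"data: {json.dumps({'response': chunk_text, 'sessionId': session_id})}\n\n"

# Frontend-style accumulation: append each partial response as it arrives
ongoing = ""
for message in generate("session-1"):
    data = json.loads(message.strip()[len("data: "):])
    ongoing += data["response"]
print(ongoing)  # Agents can stream.
```

Because the framing and accumulation logic are identical to the production path, a test like this catches serialization mistakes before any cloud resources are involved.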
Conclusion
Agents are here to stay in customer-facing applications, valued not only for their conversational abilities but also for their adaptability, automation, and contextual understanding.
Minimizing response delays is crucial for delivering a frictionless user experience. As explored in this blog, streaming agent responses can significantly enhance engagement by reducing perceived latency from 15–20 seconds to just 5–6 seconds. This makes applications feel faster, more responsive, and intuitive.
By leveraging the power of AWS Bedrock, Flask, and React, you can build highly efficient, real-time AI-driven applications that provide seamless, intelligent, and instant assistance — ensuring users stay engaged and satisfied.
Author
Sai Chandan — www.linkedin.com/in/sai-chandan
Contributor
Bakrudeen — https://www.linkedin.com/in/bakrudeen-k-6790219b/
