Streaming Agent Responses using Bedrock Agent API

AI agents are quickly replacing traditional automation solutions and chatbots. It’s safe to say that they are the next big thing in the tech landscape that organizations cannot afford to overlook. They bring a level of autonomy and operational efficiency that revolutionizes the way businesses operate. While agents demonstrate remarkable capabilities in handling complex tasks, their response times in customer-facing applications can present challenges, leading to frustration and a less engaging experience for users.
A solution to this challenge is streaming responses, which allows partial results to be displayed as soon as they are available. In this blog, we’ll demonstrate how to implement a streaming solution using AWS Bedrock Agents, Boto3 API, Flask, and ReactJS. We’ll walk you through creating a Flask API to stream responses from Bedrock Agents, building a ReactJS frontend to handle the streaming API, and combining these components into a seamless real-time user experience.
Addressing Response Latency with Real-time Streaming
When invoking Bedrock Agents, one significant challenge is the response generation time, which can take up to 15–20 seconds depending on the complexity of the request. This latency can create a perception of a slow application and diminish user engagement, as waiting for the complete response delays interaction and reduces communication efficiency. To address this challenge, we implement a streaming solution that delivers responses incrementally to the frontend, allowing users to see partial outputs in real-time, enhancing responsiveness and usability. Let us see how to implement this solution.
Prerequisites:
Before we dive into the implementation, make sure you have:
- Python 3.10 or higher installed
- Node.js and npm installed
- An IDE of your choice
- AWS credentials configured on your machine
- Basic understanding of Flask and React
Backend: Flask API with Streaming
First, let’s set up our Python environment and install the necessary packages.
- Create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required Python packages:
pip install flask boto3 #And other packages to run your application
- The backend will use Flask to set up an API endpoint. It streams the agent’s response using AWS Bedrock’s invoke_agent method. Here’s a simplified implementation:
from flask import Flask, Response, request
import boto3
import os
import json

app = Flask(__name__)

# Initialize the Bedrock Agent runtime client
# (invoke_agent lives on 'bedrock-agent-runtime', not 'bedrock-runtime')
bedrock_client = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

@app.route('/api/stream-agent-response', methods=['POST'])
def stream_agent_response():
    user_input = request.json.get('input')
    session_id = request.json.get('sessionId')
    agent_id = os.getenv('AGENT_ID')

    def generate():
        try:
            response = bedrock_client.invoke_agent(
                agentAliasId=os.getenv('ALIAS_ID'),
                agentId=agent_id,
                enableTrace=False,
                endSession=False,
                inputText=user_input,
                sessionId=session_id,
                streamingConfigurations={'streamFinalResponse': True}
            )
            if response.get('completion'):
                for event_chunk in response['completion']:
                    if 'chunk' in event_chunk and 'bytes' in event_chunk['chunk']:
                        chunk_text = event_chunk['chunk']['bytes'].decode('utf-8')
                        yield f"data: {json.dumps({'response': chunk_text, 'sessionId': session_id})}\n\n"
        except Exception as e:
            error_response = json.dumps({'error': str(e)})
            yield f"data: {error_response}\n\n"

    return Response(generate(), mimetype='text/event-stream')

if __name__ == '__main__':
    app.run(debug=True)
Key Points
- Generator Function: Streams chunks of data as they are received from Bedrock
- Error Handling: Captures exceptions and streams error messages to the client
- Session Management: Supports unique sessions for different users or conversations
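Each event the endpoint emits follows the Server-Sent Events convention: a `data: ` prefix, a JSON payload, and a blank-line terminator. The small helper below (a standalone sketch for testing, not part of the application code) shows how a client can split a raw stream buffer back into payloads, mirroring what the frontend does when it splits on double newlines:

```python
import json

def parse_sse_events(buffer: str):
    """Split a raw text/event-stream buffer into decoded JSON payloads."""
    payloads = []
    for message in buffer.split("\n\n"):
        message = message.strip()
        if message.startswith("data: "):
            payloads.append(json.loads(message[len("data: "):]))
    return payloads

# Example: two chunks framed exactly as the Flask endpoint emits them
raw = (
    'data: {"response": "Hello, ", "sessionId": "abc"}\n\n'
    'data: {"response": "world!", "sessionId": "abc"}\n\n'
)
events = parse_sse_events(raw)
full_text = "".join(e["response"] for e in events)
print(full_text)  # Hello, world!
```

This is handy as a quick unit test of the framing before wiring up the React client.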
Frontend: ReactJS with Streaming API
The frontend will consume the streaming API using fetch and display the incremental responses in real time. Here’s an implementation snippet:
const handleSendMessage = async (query, languageCode) => {
  const request = {
    input: query,
    sessionId: sessionId,
  };
  try {
    const response = await fetch(`${backendURL}api/stream-agent-response`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(request),
    });
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let ongoingBotResponse = "";
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      const messages = chunk.split("\n\n");
      for (const message of messages) {
        if (message.trim().startsWith("data: ")) {
          try {
            const jsonStr = message.trim().slice(6);
            const data = JSON.parse(jsonStr);
            if (data.response) {
              ongoingBotResponse += data.response;
              // Update the in-progress agent message in place instead of
              // appending a new message for every chunk
              setSessionConversation((prev) => {
                const updated = [...prev];
                const last = updated[updated.length - 1];
                if (last && last.type === "AGENT") {
                  updated[updated.length - 1] = { type: "AGENT", body: ongoingBotResponse };
                } else {
                  updated.push({ type: "AGENT", body: ongoingBotResponse });
                }
                return updated;
              });
            }
          } catch (e) {
            console.warn("Error parsing chunk:", e);
          }
        }
      }
    }
  } catch (error) {
    console.error("Streaming error:", error);
  }
};
Key Points
- Streaming Responses: Processes chunks of data in real-time
- Error Handling: Logs errors and avoids breaking the UI
- State Management: Updates the conversation dynamically
End-to-End Workflow
Here’s how the components work together:
- User Interaction: The user submits a query via the ReactJS frontend
- API Call: The frontend sends the query to the Flask backend
- Streaming Response: Flask streams partial responses from Bedrock to the frontend
- Real-Time Updates: ReactJS updates the UI with each chunk of data
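This flow can be simulated end to end without touching AWS by swapping the Bedrock call for a fake event stream. The sketch below is purely illustrative: `fake_completion` is a stand-in for the `completion` stream returned by invoke_agent, not a real Bedrock API, and the accumulation loop plays the role of the frontend:

```python
import json

def fake_completion():
    # Stand-in for Bedrock's completion event stream (illustration only)
    for text in ["Agents ", "can ", "stream."]:
        yield {"chunk": {"bytes": text.encode("utf-8")}}

def generate(session_id):
    # Same SSE framing the Flask endpoint uses
    for event_chunk in fake_completion():
        chunk_text = event_chunk["chunk"]["bytes"].decode("utf-8")
        yield f"data: {json.dumps({'response': chunk_text, 'sessionId': session_id})}\n\n"

# Frontend-style accumulation: append each partial response as it arrives
ongoing = ""
for message in generate("session-1"):
    data = json.loads(message.strip()[len("data: "):])
    ongoing += data["response"]
print(ongoing)  # Agents can stream.
```

Because the framing and accumulation logic are identical to the production path, a test like this catches serialization mistakes before any cloud resources are involved.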
Conclusion
Agents are here to stay in customer-facing applications, valued not only for their conversational abilities but also for their adaptability, automation, and contextual understanding.
Minimizing response delays is crucial for delivering a frictionless user experience. As explored in this blog, streaming agent responses can significantly enhance engagement by reducing perceived latency from 15–20 seconds to just 5–6 seconds. This makes applications feel faster, more responsive, and intuitive.
By leveraging the power of AWS Bedrock, Flask, and React, you can build highly efficient, real-time AI-driven applications that provide seamless, intelligent, and instant assistance — ensuring users stay engaged and satisfied.
Author
Sai Chandan — www.linkedin.com/in/sai-chandan
Contributor
Bakrudeen — https://www.linkedin.com/in/bakrudeen-k-6790219b/
