When you enter a question in the front end, it sends a `POST` request to the `/api/chat` endpoint. The request body must include the user's question in the following format:
```json
{
  "question": "the question goes here"
}
```
The response from the application is an event stream, as defined in the Server-Sent Events (SSE) specification. The server returns events to the client in the following sequence:
1. `data: [SESSION_ID] session-id-assigned-to-this-chat-session`
2. `data: [SOURCE] json-formatted-document`, repeated for each relevant document source that was identified
3. `data: response chunk`, repeated for each response chunk returned by the LLM
4. `data: [DONE]`
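The following Python sketch shows roughly how a client might consume this sequence. It splits the stream into the session ID, the list of sources, and the concatenated answer. It is not part of the application: `parse_chat_stream()` is a hypothetical helper. It assumes that each event is a single `data:` line, that the tags are exactly the ones listed above, that each `[SOURCE]` payload is one line of JSON, and that response chunks contain no newlines.

```python
import json


def parse_chat_stream(lines):
    """Split a /api/chat event stream into (session_id, sources, answer).

    Hypothetical helper. Assumes one "data: ..." line per event, with blank
    separator lines in between, and the tag strings listed above.
    """
    session_id = None
    sources = []
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("data: "):
            continue  # skip the blank lines that separate events
        payload = line[len("data: "):]
        if payload.startswith("[SESSION_ID] "):
            session_id = payload[len("[SESSION_ID] "):]
        elif payload.startswith("[SOURCE] "):
            # Assumption: each source document is serialized as JSON.
            sources.append(json.loads(payload[len("[SOURCE] "):]))
        elif payload == "[DONE]":
            break
        else:
            # Anything else is a chunk of the LLM's answer; keep it verbatim.
            chunks.append(payload)
    return session_id, sources, "".join(chunks)


stream = [
    "data: [SESSION_ID] abc-123\n", "\n",
    'data: [SOURCE] {"name": "doc1"}\n', "\n",
    "data: Hello\n", "\n",
    "data:  world\n", "\n",
    "data: [DONE]\n",
]
print(parse_chat_stream(stream))
# → ('abc-123', [{'name': 'doc1'}], 'Hello world')
```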
The client can ask a follow-up question in the same chat session by adding a `session_id` query string argument to the request URL. Its value is the session ID received in the `[SESSION_ID]` event.
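A client can build both the initial request and a follow-up request with only the Python standard library. In this sketch, the helper name `build_chat_request()` and the base URL are hypothetical. The path, the JSON body, and the `session_id` query argument follow the description above. The function only constructs a `urllib.request.Request`; passing it to `urllib.request.urlopen()` would send it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_chat_request(base_url, question, session_id=None):
    """Build (but do not send) a POST request for /api/chat.

    Hypothetical helper. base_url is an assumed value such as
    "http://localhost:5000".
    """
    url = base_url.rstrip("/") + "/api/chat"
    if session_id is not None:
        # Follow-up question: reuse the session ID from the [SESSION_ID] event.
        url += "?" + urlencode({"session_id": session_id})
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


first = build_chat_request("http://localhost:5000", "What is RAG?")
follow_up = build_chat_request("http://localhost:5000", "Tell me more", session_id="abc-123")
print(follow_up.full_url)
# → http://localhost:5000/api/chat?session_id=abc-123
```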
The high-level logic for the chatbot endpoint is in the `api_chat()` function of the Flask application, in the file `api/app.py`:
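```python
from uuid import uuid4
from flask import Response, jsonify, request
# app (the Flask instance) and ask_question() (from api/chat.py) are defined elsewhere.

@app.route("/api/chat", methods=["POST"])
def api_chat():
    request_json = request.get_json()
    question = request_json.get("question")
    if question is None:
        return jsonify({"msg": "Missing question from request JSON"}), 400
    # Follow-ups pass session_id in the query string; otherwise start a new session.
    session_id = request.args.get("session_id", str(uuid4()))
    # Stream the generator's output to the client as Server-Sent Events.
    return Response(ask_question(question, session_id), mimetype="text/event-stream")
```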
The `ask_question()` function in `api/chat.py` is a generator function that streams the events described above. It relies on Flask's response streaming feature, which is based on the `yield` keyword: Flask sends each value the generator yields to the client as it is produced.
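```python
from flask import stream_with_context
# SESSION_ID_TAG and DONE_TAG are constants defined elsewhere in api/chat.py.

@stream_with_context
def ask_question(question, session_id):
    yield f"data: {SESSION_ID_TAG} {session_id}\n\n"
    # ...
    yield f"data: {DONE_TAG}\n\n"
```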
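The excerpt above elides the middle of `ask_question()`. The standalone sketch below shows how the full event sequence could be framed, with retrieval and the LLM call replaced by plain Python inputs. It is hypothetical: `ask_question_sketch()` and `sse_event()` are illustrative names. The tag strings are written out literally instead of coming from `SESSION_ID_TAG` and `DONE_TAG`, and serializing each source with `json.dumps()` is an assumption about the `json-formatted-document` payload.

```python
import json


def sse_event(payload):
    """Frame one payload as an SSE event: a data line followed by a blank line."""
    return f"data: {payload}\n\n"


def ask_question_sketch(session_id, sources, chunks):
    """Hypothetical stand-in for ask_question() with retrieval and the LLM stubbed out.

    sources: already-retrieved documents (assumed JSON-serializable dicts).
    chunks: an iterable of text chunks standing in for the LLM's streamed output.
    """
    yield sse_event(f"[SESSION_ID] {session_id}")
    for doc in sources:
        yield sse_event(f"[SOURCE] {json.dumps(doc)}")
    for chunk in chunks:
        yield sse_event(chunk)
    yield sse_event("[DONE]")


for event in ask_question_sketch("abc-123", [{"name": "doc1"}], ["Hello", " world"]):
    print(repr(event))
```

Each event ends with a blank line (`\n\n`), which is what separates events in an SSE stream. In the real function, the `@stream_with_context` decorator keeps Flask's request context available while the generator is still running after `api_chat()` has returned its `Response`.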