The typewriter effect makes users feel that the answer starts appearing the moment they ask the question. Even if the complete answer takes a while to finish, the perceived responsiveness is completely different.

This is a little tricky to implement in Streamlit, which normally renders a complete piece of output at once:

import streamlit as st
full_string="hello world"
st.markdown(full_string)
st.markdown("other content")

result:

hello world
other content

To make "hello world" appear letter by letter without disturbing the content below it:

  • First, create a placeholder with st.empty()
  • Then repeatedly write a progressively longer prefix of the string into that placeholder.
import streamlit as st

full_string = "hello world"
display_string = ""
show_string = st.empty()
st.markdown("other content")

for c in full_string:
    display_string += c
    show_string.markdown(display_string)

Now you can see the typewriter effect.
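The accumulation logic itself can be checked outside Streamlit with a stand-in for the placeholder. The `FakePlaceholder` class below is hypothetical, purely to illustrate that each call overwrites the previous render rather than appending a new line:

```python
# Stand-in for st.empty(): it remembers only the latest thing "rendered",
# just as a Streamlit placeholder replaces its previous content.
class FakePlaceholder:
    def __init__(self):
        self.current = None
        self.renders = 0

    def markdown(self, text):
        self.current = text
        self.renders += 1

full_string = "hello world"
display_string = ""
show_string = FakePlaceholder()

for c in full_string:
    display_string += c
    show_string.markdown(display_string)  # each call replaces the last render

print(show_string.current)  # → hello world
print(show_string.renders)  # → 11 (one render per character)
```

This is why the content below the placeholder is undisturbed: only the placeholder's own slot is rewritten on each iteration.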

Now let's consider calling GPT through langchain's wrappers. Langchain uses a mechanism called callbacks: when GPT is set to stream the response, a callback is invoked every time a new token is returned. To display the streaming response in Streamlit, we need to:

  • Set an st.empty() placeholder.
  • Write a callback handler that appends each new token and re-renders the growing text in the placeholder.
from langchain.callbacks.base import BaseCallbackHandler

class StreamDisplayHandler(BaseCallbackHandler):
    def __init__(self, container, initial_text="", display_method='markdown'):
        self.container = container
        self.text = initial_text
        self.display_method = display_method

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token

        display_function = getattr(self.container, self.display_method, None)
        if display_function is not None:
            display_function(self.text)
        else:
            raise ValueError(f"Invalid display_method: {self.display_method}")

    def on_llm_end(self, response, **kwargs) -> None:
        self.text = ""

When constructing the handler, pass the placeholder container as its first argument. Everything else works the same way as in the simple typewriter example above.

Here's a small trick: display_function = getattr(self.container, self.display_method, None). Since the text could be rendered with st.markdown or another method such as st.write, the method name is passed in as a string and looked up on the container. If the container has a method with that name, it is called; otherwise a ValueError is raised.
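The getattr fallback pattern can be seen in isolation with any plain object. The `Container` class here is a made-up stand-in for a Streamlit placeholder:

```python
# Minimal stand-in for a Streamlit placeholder with one display method.
class Container:
    def markdown(self, text):
        return f"md:{text}"

c = Container()

# Look up a method by name; the third argument makes getattr return None
# instead of raising AttributeError when the name does not exist.
fn = getattr(c, "markdown", None)
print(fn("hi"))         # the method exists, so it can be called

missing = getattr(c, "no_such_method", None)
print(missing is None)  # unknown names fall back to None
```

Checking the result against None is what lets the handler raise its own, clearer ValueError rather than an AttributeError from deep inside the callback.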

How to call this callback:

chat_box = st.empty()
display_handler = StreamDisplayHandler(
    chat_box,
    display_method='write')
chat = ChatOpenAI(
    max_tokens=100, streaming=True,
    callbacks=[display_handler])
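To see the token-accumulation behaviour without an API key or langchain installed, the same logic can be sketched with plain classes. `FakeContainer`, `DisplayHandler`, and the token list below are all made up for illustration; they mirror, not reproduce, the langchain-based handler above:

```python
# Fake placeholder that records the last text written to it.
class FakeContainer:
    def __init__(self):
        self.last = None

    def write(self, text):
        self.last = text

# Langchain-free sketch of the callback: same accumulate-and-render logic.
class DisplayHandler:
    def __init__(self, container, display_method="markdown"):
        self.container = container
        self.text = ""
        self.display_method = display_method

    def on_llm_new_token(self, token):
        self.text += token
        fn = getattr(self.container, self.display_method, None)
        if fn is None:
            raise ValueError(f"Invalid display_method: {self.display_method}")
        fn(self.text)

box = FakeContainer()
handler = DisplayHandler(box, display_method="write")
for token in ["Hel", "lo ", "world"]:  # pretend these arrive from the model
    handler.on_llm_new_token(token)
print(box.last)  # → Hello world
```

Each token triggers one re-render of the full accumulated text, which is exactly what produces the typewriter effect in the Streamlit placeholder.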

Furthermore, because the callback fires for every token, it can also drive near-real-time tasks such as almost-synchronous speech. Speaking the reply word by word would sound choppy, so the tokens can instead be accumulated into a full sentence and spoken in one go. The approach is essentially the same as above; for the full code, please refer to the gist.
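A minimal sketch of that sentence-buffering idea, assuming a hypothetical `speak` function standing in for a real text-to-speech call: tokens accumulate in a buffer, and the buffer is flushed whenever it ends in sentence-final punctuation.

```python
spoken = []

def speak(sentence):
    # Hypothetical stand-in: a real version would call a TTS engine here.
    spoken.append(sentence)

buffer = ""
SENTENCE_ENDS = ".!?"

def on_new_token(token):
    """Accumulate streamed tokens; flush a whole sentence at a time."""
    global buffer
    buffer += token
    if buffer and buffer[-1] in SENTENCE_ENDS:
        speak(buffer.strip())
        buffer = ""

# Pretend these tokens arrive from the streaming model one by one.
for t in ["Hello", " world", ".", " How", " are", " you", "?"]:
    on_new_token(t)

print(spoken)  # → ['Hello world.', 'How are you?']
```

In the real handler this logic would live inside on_llm_new_token, alongside (or instead of) the display call.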