It's the typewriter effect that makes users feel the answer starts appearing the moment they ask the question. Even though the full answer may still take a while to finish streaming, the psychological feeling is very different.
It is a little tricky to implement in Streamlit, though. Normally Streamlit renders a complete output all at once, as in:
import streamlit as st

full_string = "hello world"
st.markdown(full_string)
st.markdown("other content")
result:
hello world
other content
If you want "hello world" to appear letter by letter without disrupting the content below it, the method is as follows:
- First, create a placeholder with st.empty().
- Then repeatedly write a gradually lengthening string into that placeholder.
import streamlit as st

full_string = "hello world"
display_string = ""
show_string = st.empty()        # placeholder that will be rewritten on every iteration
st.markdown("other content")    # content below is rendered once and stays in place
for c in full_string:
    display_string += c
    show_string.markdown(display_string)
Now you can see the typewriter effect.
Now let's look at streaming a GPT reply through LangChain. LangChain uses a mechanism called callbacks: when the model is configured to stream its response, a callback is invoked every time a new token is returned. To display the streaming response in Streamlit, we need to:
- Set an st.empty() placeholder.
- Write a callback that keeps writing the gradually growing text into it.
from langchain.callbacks.base import BaseCallbackHandler

class StreamDisplayHandler(BaseCallbackHandler):
    def __init__(self, container, initial_text="", display_method='markdown'):
        self.container = container              # a Streamlit placeholder, e.g. st.empty()
        self.text = initial_text
        self.display_method = display_method

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # called once per streamed token; re-render the accumulated text
        self.text += token
        display_function = getattr(self.container, self.display_method, None)
        if display_function is not None:
            display_function(self.text)
        else:
            raise ValueError(f"Invalid display_method: {self.display_method}")

    def on_llm_end(self, response, **kwargs) -> None:
        # reset the buffer so the handler can be reused for the next reply
        self.text = ""
At initialization, the placeholder container is passed to the StreamDisplayHandler. Everything else works the same way as in the simple typewriter example above.
Here's a little trick: display_function = getattr(self.container, self.display_method, None). This line means we don't hard-code which Streamlit method is used to display the text. It could be st.markdown or something else such as st.write, so we pass the method name as a string and look it up on the container with getattr. If the container has such a method, it is used to display the text; otherwise a ValueError is raised.
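As a quick illustration of the same lookup outside the class (the names here are only for demonstration):

import streamlit as st

container = st.empty()
fn = getattr(container, "write", None)   # resolves to the bound method container.write
if fn is not None:
    fn("hello")                          # equivalent to container.write("hello")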
How to hook up this callback:
from langchain.chat_models import ChatOpenAI

chat_box = st.empty()
display_handler = StreamDisplayHandler(chat_box, display_method='write')

chat = ChatOpenAI(
    max_tokens=100,
    streaming=True,
    callbacks=[display_handler],   # callbacks must be passed as a list
)
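To actually start the stream, the model still has to be invoked. A minimal sketch, where the prompt text is just a placeholder:

from langchain.schema import HumanMessage

# calling the model kicks off streaming; each token triggers on_llm_new_token,
# which rewrites the chat_box placeholder with the text received so far
chat([HumanMessage(content="Tell me about the typewriter effect.")])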
Furthermore, since the callback fires for every token the model returns, it can also be used for "real-time" tasks such as near-synchronous speech. Speaking the reply word by word would not sound good, so the tokens can be accumulated into a full sentence and spoken in one go. The method is basically the same as above; for the detailed code, please refer to the gist.
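As a rough sketch of that idea (this is not the gist code; speak() stands in for whatever text-to-speech call you use), the handler buffers tokens and flushes whenever a sentence ends:

class SpeechStreamHandler(BaseCallbackHandler):
    def __init__(self, speak):
        self.speak = speak      # hypothetical TTS callable, e.g. speak("Hello there.")
        self.buffer = ""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.buffer += token
        # flush once the buffer ends with sentence-final punctuation
        if self.buffer.rstrip().endswith((".", "!", "?")):
            self.speak(self.buffer.strip())
            self.buffer = ""

    def on_llm_end(self, response, **kwargs) -> None:
        # speak whatever is left over at the end of the reply
        if self.buffer.strip():
            self.speak(self.buffer.strip())
        self.buffer = ""

Such a handler can be passed in the same callbacks list alongside StreamDisplayHandler, so the same stream is both displayed and spoken.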