
Retrieval-Augmented Generation (RAG) ChatBot

Result

Backend

  • built with Flask
    • exposes two endpoints: “/setup” and “/chat”
    • “/setup” initializes the connections to HuggingFace/OpenAI and Pinecone
    • “/chat” processes chat messages sent from the client
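
    As a minimal sketch (the request and response shapes and the placeholder logic are assumptions, not from the repo), the two endpoints could look like this:

      from flask import Flask, request, jsonify

      app = Flask(__name__)

      @app.route("/setup", methods=["POST"])
      def setup():
          # Initialize the HuggingFace/OpenAI models and the Pinecone connection
          # (the actual init logic is described in the bullets below).
          config = request.get_json()
          app.config["MODEL_PROVIDER"] = config.get("provider", "huggingface")
          return jsonify({"status": "ready"})

      @app.route("/chat", methods=["POST"])
      def chat():
          # Run one client message through the RAG pipeline and return the answer.
          query = request.get_json()["message"]
          answer = f"(echo) {query}"  # placeholder for the RAG pipeline below
          return jsonify({"answer": answer})
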
  • chat service
    • Uses the LLM ‘google/flan-t5-xxl’ and the embedding model ‘intfloat/e5-small-v2’ from HuggingFace
        from langchain.embeddings import HuggingFaceInstructEmbeddings
        from langchain.llms import HuggingFaceHub

        {
            "embedding": HuggingFaceInstructEmbeddings(model_name="intfloat/e5-small-v2"),
            "dimension": 384,
            "metric": "cosine",
            "chat": HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.7, "max_length": 512})
        }
      
    • Option: the ChatGPT LLM (‘gpt-3.5-turbo’) and the embedding model ‘text-embedding-ada-002’ from OpenAI (gives better results)
        import os

        from langchain.chat_models import ChatOpenAI
        from langchain.embeddings import OpenAIEmbeddings

        {
            "embedding": OpenAIEmbeddings(model="text-embedding-ada-002"),
            "dimension": 1536,
            "metric": "cosine",
            "chat": ChatOpenAI(
                openai_api_key=os.environ["OPENAI_API_KEY"],
                model="gpt-3.5-turbo"
            )
        }
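
      The “/setup” endpoint can then switch between the two configurations. A minimal sketch, assuming the two dicts above are bound to the hypothetical names huggingface_config and openai_config:

        # Hypothetical names for the two configuration dicts defined above.
        MODEL_CONFIGS = {"huggingface": huggingface_config, "openai": openai_config}

        def get_model_config(provider: str) -> dict:
            # "provider" would come from the /setup request body (an assumed field).
            return MODEL_CONFIGS[provider]

        config = get_model_config("openai")
        embedding, chat_model = config["embedding"], config["chat"]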
      
    • Use the vector database Pinecone for persistence, so documents don’t have to be re-extracted and re-embedded on every run
        import os

        import pinecone

        pinecone.init(
            api_key=os.getenv("PINECONE_API_KEY"),
            environment=os.getenv("PINECONE_ENVIRONMENT")
        )
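
      On top of this connection, the index only needs to be created once; later runs reuse the stored vectors. A sketch, assuming a hypothetical index name and the config dict from the sketch above:

        index_name = "rag-chatbot"  # hypothetical index name

        # Create the index on the first run only; afterwards it is reused.
        if index_name not in pinecone.list_indexes():
            pinecone.create_index(
                name=index_name,
                dimension=config["dimension"],  # 384 for e5-small-v2, 1536 for ada-002
                metric=config["metric"],
            )
        index = pinecone.Index(index_name)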
      
    • Use spaCy to extract keywords from the user’s question to get better query results from Pinecone
        import spacy

        # extract keywords from question, en_core_web_trf is focused on accuracy
        nlp = spacy.load("en_core_web_trf")
        doc = nlp(query)
        # Extract keywords based on POS tags (e.g., NOUN, PROPN for proper nouns, VERB)
        keywords = [token.text for token in doc if token.pos_ in ('NOUN', 'PROPN', 'VERB')]
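
      The keyword string can then drive the similarity search. One way to wire this up is LangChain’s Pinecone vector store; the plumbing below is an assumption, not necessarily how the repo does it:

        from langchain.vectorstores import Pinecone

        # Wrap the Pinecone index from above; "text" is the metadata field
        # that stores the raw document text.
        vectorstore = Pinecone(index, embedding.embed_query, "text")

        # Query with the extracted keywords instead of the raw question.
        results = vectorstore.similarity_search(" ".join(keywords), k=3)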
      
    • Customize the prompt so the LLM can draw on the knowledge retrieved from the vector database as well as the chat history
      source_knowledge = "\n".join([x.page_content for x in results])
      history = "\n".join([x.content for x in chat_history])
      augmented_prompt = f"""Based on the contexts below and the chat history, summarize the contexts and answer the question in your own words. Answer "I don't know" if you cannot find related information in the contexts.

      Contexts:
      {source_knowledge}

      Chat history:
      {history}

      Question: {query}"""
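
      The augmented prompt is then sent to the chat model along with the running history. A sketch assuming the ChatOpenAI configuration and LangChain’s message classes (the HuggingFaceHub LLM would take the prompt string directly instead):

        from langchain.schema import AIMessage, HumanMessage, SystemMessage

        # Done once at startup; each turn appends to this list.
        chat_history = [SystemMessage(content="You are a helpful assistant.")]

        # Ask the model, then record both turns so later questions keep context.
        response = chat_model(chat_history + [HumanMessage(content=augmented_prompt)])
        chat_history.append(HumanMessage(content=query))
        chat_history.append(AIMessage(content=response.content))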
    

Frontend

  • built with React, with a UI modeled on ChatGPT’s
  • keeps the chat history in the same shape as the backend
  • uses axios to send requests to the backend

Source Code

Available on GitHub

git clone git@github.com:zhaojun-szh-9815/rag-chatbot.git
cd rag-chatbot

cd server
pip install -r requirements.txt
flask run

cd client
npm install
npm start
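
With both parts running, the backend can also be exercised directly. A sketch using Python’s requests library, assuming the request and response fields from the Flask sketch above (the exact payload shapes are assumptions):

import requests

BASE = "http://localhost:5000"  # Flask's default development address

requests.post(f"{BASE}/setup", json={"provider": "openai"})
reply = requests.post(f"{BASE}/chat", json={"message": "What is RAG?"})
print(reply.json()["answer"])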