Building AI chatbot using Rasa NLU, Flask and Telegram


Posted by Vishnu Vardhan Chikoti

Posted on 05-Aug-2018 11:19:28


Some of you might already be aware of chatbots, bots that automatically respond to users' chat messages. There are a number of ways businesses are now using chatbots - customer support, shopping assistance, FAQ bots, admin queries in colleges and offices, etc. Thanks to the fast-paced improvements in technology and tools, building chatbots has become simpler than before. There is a lot of work going on in this area across companies, and a couple of years from now things might be simpler and better than they are today.

In this article, I will explain how to build a chatbot using Rasa NLU, Flask and Telegram.

Basic functionalities of a chatbot

Before jumping into the technical details, here are a few basic functional requirements for chatbots.

  • The bot should understand what the user is looking for. Technically, this is broken into understanding the user's intent and the entities. For example, for a restaurant search bot (the famous example from most blogs), if a user asks "can you help find a good restaurant in hitec city", the user's intent is to find a good restaurant, where "good" is a quality, "restaurant" is the type of food outlet (entity) and "hitec city" is the location (entity). Users might also ask about a specific food, say "where can I get the best Haleem near hitec city"; here the intent is to find the best outlet, with "Haleem" as the food (entity) and "hitec city" as the location (entity). A small sketch of this appears after this list.

This first part of identifying the intent and entity is the most difficult part of a chatbot and technically falls in the area of Natural Language Understanding (NLU).

  • The bot should then respond with an answer, technically referred to as an action, so it needs a knowledge database. This is the easy part. Instead of looking up the database from a UI request with a typeahead/search-as-you-type (SAYT) search or a traditional search with filters, the lookup will be based on the entities extracted by the bot.

  • Since this is a chatbot, and a chat is a conversation between the user and the bot, the bot should maintain the context of the conversation. For example, once the bot responds to the first question about a good restaurant in Hitec city, the user might ask a follow-up question like "what about Gachibowli". This next question is only partial and relates to the previous intent of searching for a restaurant or searching for Haleem. Unless the bot maintains the context, it cannot continue a meaningful conversation.

With a conversation, there is also the possibility that after the bot suggests a restaurant, the user might say "not this one". The bot should then suggest a new one within the same context and with the same entities.

  • The points above cover a couple of examples of a conversational flow within a context. However, there can be numerous conversational flows, with the user switching to a different context in between. The user might say "thank you" at the end and close the conversation. The bot should be able to handle all these possible flows.

These conversational flows are technically referred to as Dialogues or Stories.
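To make the first requirement concrete, here is a minimal sketch of the kind of structured data the bot needs to extract from the restaurant query above. The intent and entity names are hypothetical, purely for illustration; the actual Rasa output format appears later in this article.

# Illustrative only: hypothetical intent and entity names for the example query.
user_message = "can you help find a good restaurant in hitec city"

parsed = {
    "intent": "restaurant_search",
    "entities": [
        {"entity": "quality", "value": "good"},
        {"entity": "food_outlet", "value": "restaurant"},
        {"entity": "location", "value": "hitec city"},
    ],
}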

The choices

With these as the basic requirements, there are a number of available options to achieve these technically.

  • Dialogflow (formerly api.ai) by Google

  • Wit.ai by Facebook

  • Lex by Amazon

  • Watson Assistant by IBM

  • Microsoft Bot framework

  • LUIS.ai

  • Rasa NLU and Rasa Core by Rasa

From these, I chose Rasa. Rasa Core is a dialogue engine which lets you configure actions, maintain context/slots, train the model with stories (conversational flows), etc. Rasa NLU is the natural language interpreter. Rasa Core together with Rasa NLU covers all of the requirements above for a chatbot.

I chose Rasa because it is open source, I can install it on my own local machine or a cloud server, and I can configure it with a choice of NLP and ML libraries. Dialogflow, Wit.ai, Lex, etc., are services, and what happens behind the scenes is abstracted from us. Also, I don't have to share my training data or model with those services. With Rasa, I just train a model and that model stays with me.

The configuration options Rasa gives are to choose between spaCy or MITIE for NLP, sklearn-crfsuite for Conditional Random Field (CRF) based Named Entity Recognition (NER), and MITIE or scikit-learn for intent classification. Internally, it also uses TensorFlow and NumPy.

However, for the purpose of this article, I will cover Rasa NLU, which is the interpreter and handles the first, most important and most difficult requirement. We will worry about dialogue, context handling and creating a knowledge database in future articles.

Installation

Rasa NLU can be installed using pip.

pip install rasa_nlu --user

It is safe to run the same command with the --upgrade option to get any latest changes that were not pulled with the first command. For some reason (most likely because I had installed it and only started using it after a couple of days), I had a stale file that caused issues, and it got fixed when I upgraded.

pip install rasa_nlu --upgrade

You will most likely run into some kind of installation failure that you have to fix, as a number of required modules get installed along with Rasa NLU, and there may be other requirements or conflicts with them.

For example, I had to:

a) upgrade my Mac OS from OS X Yosemite to macOS High Sierra, as the required TensorFlow version is not supported on OS X. You can use wheels (.whl) to forcibly install the required TensorFlow version based on the TensorFlow docs, but that didn't help me.

b) use the --user option with pip, as there were modules like six that already ship with Mac OS but with a different version than what Rasa NLU required. I faced the same problem with six while installing Rasa NLU on PythonAnywhere (PAW itself is hosted on AWS internally but is a suitable hosting option for apps built using Python). Using a virtual environment would probably have avoided the --user option, but I didn't try that.

Configuration

Once installed, a choice has to be made about which configuration to use for Rasa NLU. Rasa NLU processes input messages with different components, one after the other, and this is called a processing pipeline. It is this pipeline that needs to be configured.

To use spaCy and sklearn in the pipeline, the config.yml file is very simple, just 2 lines. This turned out to be the fastest and best option for me. spaCy itself claims to be the fastest at natural language processing.

language: "en"
pipeline: "spacy_sklearn"

To use the English language with spacy, you have to download the English package.

python -m spacy download en

I ran into the following error on PythonAnywhere and most likely those installing on AWS will also have the same problem.

error: could not create '/usr/local/lib/python2.7/dist-packages/en_core_web_sm': Permission denied

Thanks to PythonAnywhere support, they pointed to the below workaround which fixed my problem.

pip2.7 install --user https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz


python -m spacy link en_core_web_sm en_default
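A quick sanity check, a minimal sketch assuming the link name en_default from the command above, confirms that spaCy can load the linked model:

import spacy

# This should load without errors if the download and link steps worked.
nlp = spacy.load('en_default')
print(nlp(u'hello world'))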

In order to use MITIE and sklearn, the following pipeline configuration has to be kept in the config.yml file. The total_word_feature_extractor.dat is a file you need to download from MITIE, and it is fairly big at around 330 MB.

language: "en"


pipeline:


- name: "nlp_mitie"


model: "data/total_word_feature_extractor.dat"


- name: "tokenizer_mitie"


- name: "ner_mitie"


- name: "ner_synonyms"


- name: "intent_entity_featurizer_regex"


- name: "intent_featurizer_mitie"


- name: "intent_classifier_sklearn"

We have seen 2 configs above: spaCy + sklearn and MITIE + sklearn. Now, which one is best? I have read in enough forums that MITIE is better at entity extraction with a smaller number of training examples, and that sklearn is good at intent classification. The forums also pointed out that spaCy is known for its faster training time but needs more examples, somewhere in the 5000 range.

Just like many others would have, I chose MITIE and sklearn, as slower training sounded OK to me. However, only after adding some 100 examples and 10 entities did I realize that the word slower means a few hours. I ran the MITIE training on my Mac (with 4 GB RAM) and it had not even completed after 12 hours. Initially, when I had 2-3 entities and some 15-20 training examples, it was finishing within a couple of minutes.

I then killed the process that was trying to build the trained model using MITIE and switched to spaCy. To no surprise, as mentioned in the forums, training of the model finished in 1-2 seconds.

However, as opposed to what was stated in the forums, spaCy was better even with fewer than 10 examples, at least based on a friend's trial of the bot.

Training

In order to train the model, a JSON file is needed with data in the format specified by Rasa. Here is a sample file with an example each of regex features, entity synonyms and common text examples, with entities marked for the model to learn.

{
  "rasa_nlu_data": {
    "regex_features": [
      {
        "name": "greet",
        "pattern": "hi[^\\s]*"
      }
    ],
    "entity_synonyms": [
      {
        "value": "Stock",
        "synonyms": ["Security", "Securities", "Stock"]
      }
    ],
    "common_examples": [
      {
        "text": "hey",
        "intent": "greet",
        "entities": []
      },
      {
        "text": "which stock to buy today",
        "intent": "requestAnswer",
        "entities": [
          {
            "start": 6,
            "end": 11,
            "value": "Stock",
            "entity": "Instrument"
          },
          {
            "start": 15,
            "end": 18,
            "value": "Buy",
            "entity": "Direction"
          },
          {
            "start": 19,
            "end": 24,
            "value": "Today",
            "entity": "Time"
          }
        ]
      }
    ]
  }
}

In order to train the model, just run something like the below. The input JSON file mentioned above is in this case in the data sub-directory, and the model will be generated in the projects directory.

sudo python -m rasa_nlu.train --config config_spacy.yml --data data/boKnowledge.json --path projects
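If you prefer to do this from Python instead of the command line, here is a minimal sketch using the Rasa NLU training API as I understand it for the version current at the time of writing, with the same config and data paths as in the command above:

from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# Load the training examples and the pipeline configuration used above.
training_data = load_data('data/boKnowledge.json')
trainer = Trainer(config.load('config_spacy.yml'))

# Train and persist the model under the projects directory.
trainer.train(training_data)
model_directory = trainer.persist('projects')
print(model_directory)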

Testing the model

Once the model is trained, it's all about testing it with some text. As the interpreter expects Unicode text, the u prefix in front of the string is needed (in Python 2). The output is JSON. It clearly gives you all 3 entity/value pairs that it extracted using NER-CRF (Named Entity Recognition with Conditional Random Fields).

This is followed by the intent with the highest confidence, which in this case is requestAnswer. For reference, it also gives the ranking against the other possible intents in the model.

>>> from rasa_nlu.model import Metadata, Interpreter


>>> interpreter = Interpreter.load('/Users/vishnu/Documents/Bots/boBot/projects/default/model_20180609-131219/')


>>> interpreter.parse(u"which stock to buy today")


{u'entities': [{u'extractor': u'ner_crf', u'confidence': 0.7263402714333376, u'end': 11, u'processors': [u'ner_synonyms'], u'value': u'Stock', u'entity': 'Instrument', u'start': 6}, {u'extractor': u'ner_crf', u'confidence': 0.8529053977518952, u'end': 18, u'processors': [u'ner_synonyms'], u'value': u'Buy', u'entity': 'Direction', u'start': 15}, {u'extractor': u'ner_crf', u'confidence': 0.7752217065011565, u'end': 24, u'processors': [u'ner_synonyms'], u'value': u'Today', u'entity': 'Time', u'start': 19}], u'intent': {u'confidence': 0.9876618781642348, u'name': u'requestAnswer'}, 'text': u'which stock to buy today', u'intent_ranking': [{u'confidence': 0.9876618781642348, u'name': u'requestAnswer'}, {u'confidence': 0.003703342270580323, u'name': u'goodbye'}, {u'confidence': 0.003387133203970305, u'name': u'abuse'}, {u'confidence': 0.00230407785555639, u'name': u'deny'}, {u'confidence': 0.0015590872488458996, u'name': u'affirm'}, {u'confidence': 0.0013844812568123326, u'name': u'greet'}]}
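From that parse result, the fields that the webhook code later in this article relies on can be pulled out with plain dictionary access; for example:

result = interpreter.parse(u"which stock to buy today")

# Highest-confidence intent and a simple entity -> value mapping.
intent_name = result['intent']['name']    # 'requestAnswer'
entities = {e['entity']: e['value'] for e in result['entities']}
# entities -> {'Instrument': 'Stock', 'Direction': 'Buy', 'Time': 'Today'}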

Setting it up for end users

So far we have seen how to install Rasa NLU, train the model and test the trained model. Now it is time to expose this to end users via a chat interface.

There are a couple of options in this case:

  • Build your own chat interface.

  • Use existing apps like Facebook Messenger, Telegram, Slack, etc

I chose to expose the model via a Telegram Bot. The reason is its simplicity and ease of setting up. A new bot can be created through the BotFather.

Open a chat window with the BotFather and type the command /newbot. It will then ask for a username and a name of your choice, and generate a token to be used with the Telegram API.

The newly created Telegram bot can be accessed via Telegram, which gets the user input via the chat window. This input now needs to be sent somewhere to be interpreted against the trained model. That somewhere is the webhook. A Telegram bot needs a webhook to which the user input will be sent and from which responses will be received.


As it's a webhook, we need a web app for it. This is exactly where I used the Flask framework. Flask apps are simpler to create than with Django or any other approach.

Flask can be installed with the pip command.

pip install flask --user

Flask is a minimalistic framework that is REST based and WSGI (Web Server Gateway Interface) compliant, internally using Werkzeug. There are not many restrictions other than using Jinja templates.

I also chose to use the telepot package. The webhook is basically exposed via a secret URL so that only Telegram knows it and no one else does. No one else can access the URL via a web browser, nor can they attack it. It's also important to note that communication between Telegram and the webhook needs to be secured and hence needs HTTPS.
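How you generate that secret is up to you; here is a minimal sketch (my own approach, not anything required by telepot or Telegram) that derives it from a few random bytes:

import binascii
import os

# A random hexadecimal string used as the secret part of the webhook URL.
secret = binascii.hexlify(os.urandom(32)).decode('ascii')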

The Flask app code would look something like the below. View functions in Flask are defined using decorators, and the HTTP method is mentioned within them; in this case it is a POST method. The try/except is important, as Telegram keeps retrying if the call to the webhook throws an unhandled exception; in my case I noticed Telegram calling every minute for a message that returned an exception from the webhook. The message text from Telegram is Unicode text, so no explicit conversion is required. In my case the app runs behind Nginx on PythonAnywhere and is set up to use a custom domain.

from flask import Flask, request
from rasa_nlu.model import Interpreter
import telepot

from answersdb import getBotResponse

# Secret part of the webhook URL, so that only Telegram knows the endpoint.
secret = '<some-random-hexadecimal-value>'

# Telegram bot client, using the token generated by the BotFather.
bot = telepot.Bot('API-KEY-FROM-BOT-FATHER')
bot.setWebhook("https://www.<domain-name>.com/{}".format(secret), max_connections=1)

app = Flask(__name__)

@app.route('/{}'.format(secret), methods=["POST"])
def telegram_webhook():
    update = request.get_json()
    # Load the trained Rasa NLU model.
    interpreter = Interpreter.load('<model-path>/model_20180513-103752/')
    if "message" in update:
        # Telegram sends the text as Unicode, so no explicit conversion is needed.
        text = update["message"]["text"]
        chat_id = update["message"]["chat"]["id"]
        parseddata = interpreter.parse(text)
        botresponse = ''
        if parseddata['intent'] and parseddata['intent']['name'] == 'requestAnswer':
            if parseddata['entities']:
                # Look up the knowledge database using the first extracted entity.
                answerjson = getBotResponse(parseddata['entities'][0]['entity'],
                                            parseddata['entities'][0]['value'].upper(),
                                            '', '', '', '', 'What')
                if answerjson:
                    botresponse = answerjson['answer']
        try:
            bot.sendMessage(chat_id, botresponse)
        except Exception:
            print("Error sending message")
    # Always return a success response so Telegram does not keep retrying.
    return "OK"

if __name__ == '__main__':
    app.run()

Hope you found this article useful. Let me know if you have any comments or feedback. In the next part, I will hopefully talk about handling dialogue, contexts and managing the knowledge database for the Bot.


About the author

Vishnu Vardhan Chikoti is a co-author for the book "Hands-on Site Reliability Engineering". He is a technology leader with diverse experience in the areas of Application and Database design and development, Micro-services & Micro-frontends, DevOps, Site Reliability Engineering and Machine Learning.
