As smart and adaptive technologies become increasingly integrated into our personal lives, assistive robots are expected to become true partners and companions of the human users they serve. Although general-purpose robots are not yet ready, emerging technologies can already help people in the near future. One of the fundamental capabilities a socially assistive robot needs in order to be accepted is communication. Modern domotics (home automation) likewise implies interaction through natural language commands.

For the Romanian language, voice interaction is still a great challenge, with only a few encouraging but limited experiments so far. The communication context of interest for this project is that of a situational dialogue, which implies reference to the immediate reality. One of the most effective methodologies for designing the situational communication component in natural language is based on micro-world scenarios. The great advantage of this approach is the ability to predict the intentions of the human partner and the most likely content of a request or order, and to formulate well-targeted requests for clarification when there is not enough knowledge to process the transmitted message.
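
How a micro-world narrows down the possible intentions and triggers a clarification request can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the device and room inventory, the two Romanian verbs, and the `interpret` function are hypothetical and are not part of the project's resources.

```python
# Hypothetical smart-house micro-world: which devices exist in which rooms.
DEVICE_ROOMS = {
    "lumina": ["bucătărie", "dormitor", "sufragerie"],  # a light in three rooms
    "televizorul": ["sufragerie"],                       # a single TV set
}

# Hypothetical mapping from Romanian imperative verbs to actions.
ACTIONS = {"aprinde": "turn_on", "stinge": "turn_off"}


def interpret(utterance: str) -> dict:
    """Map a command to an intent, or return a clarification question."""
    tokens = utterance.lower().split()
    action = next((ACTIONS[t] for t in tokens if t in ACTIONS), None)
    device = next((t for t in tokens if t in DEVICE_ROOMS), None)
    if action is None or device is None:
        # Nothing recognisable in this micro-world: ask an open question.
        return {"status": "clarify", "question": "Ce doriți să fac?"}

    rooms = DEVICE_ROOMS[device]
    mentioned = [room for room in rooms if room in tokens]
    if len(mentioned) == 1:
        room = mentioned[0]
    elif len(rooms) == 1:
        # Only one candidate exists, so the micro-world resolves the reference.
        room = rooms[0]
    else:
        # Insufficient knowledge: ask only about the missing detail.
        options = ", ".join(rooms)
        return {"status": "clarify", "question": f"În ce cameră: {options}?"}

    return {"status": "ok", "action": action, "device": device, "room": room}
```

For example, `interpret("Stinge televizorul")` is resolved without a question because the assumed micro-world contains a single TV set, while `interpret("Aprinde lumina")` returns a question listing the three candidate rooms.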


The purpose of the ROBIN-DIALOG project is to develop a series of scenarios for several micro-worlds, as well as Romanian language processing technology for situational dialogues in these micro-worlds.
This technology will be validated on the researched scenarios and micro-worlds, but it will be developed in such a way that it can easily be applied to other scenarios and/or micro-worlds. Its generic nature will be ensured by deep learning methods and by specifying resources (e.g. knowledge bases) in standard languages (e.g. XML / RDF), so that the system can operate in any micro-world and/or scenario for which training data and specific resources are available.
The solution will also be used in the development of computer vision methods that address a wider and more sophisticated range of computer vision tasks, in the development of intelligent “hands-off driving” and “automated driving” functions, and in a prototype system that will be tested, during the course of the project, on a semi-autonomous electric vehicle made available to the consortium by PRIME Motors Industry. The system will be able to observe, recognize and monitor the scene, the road, and the objects and people in the outside environment, as well as the driver’s expression, and will provide the driver with the information needed for piloting and decision-making in a non-invasive way (including voice interaction through simple commands).

The specific objectives of the project are:

  1. Designing the scenarios and the situational dialogue system for the Romanian language, which involves the following activities:
    a. Building a lexicon of words and expressions representative of the target micro-world. Examples of micro-worlds are: (i) a smart house; (ii) a robot acting in a specified environment or space.
    b. Automatically extending, using “vector spaces”, the lexicon created manually in step 1a.
    c. Creating the universe of discourse for the selected micro-world or scenario. This step involves identifying the semantic relationships established between words; these become predicates that will be validated (true / false) in the context of the dialogue.
  2. The entries of the lexico-semantic resource created in step 1b will be phonetically transcribed and, where they occur in CoRoLa, aligned with the corresponding speech signal.
  3. The automatic speech recognition (ASR) and text-to-speech (TTS) engines will be fed with the results of step 2. The ASR and TTS systems will then be tested and validated.
  4. Implementing a cooperative dialogue system for the selected micro-worlds.
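
The automatic lexicon extension with vector spaces mentioned in the objectives can be sketched as a nearest-neighbour search by cosine similarity. This is only an illustrative sketch under stated assumptions: the three-dimensional vectors are invented toy values standing in for real distributional embeddings, and the `extend_lexicon` function and its `threshold` parameter are hypothetical, not the project's actual method.

```python
import math

# Toy vectors invented for illustration; real embeddings would be
# learned from a corpus and have hundreds of dimensions.
EMBEDDINGS = {
    "lampă": [0.9, 0.1, 0.0],
    "veioză": [0.85, 0.15, 0.05],
    "bec": [0.8, 0.2, 0.1],
    "fereastră": [0.1, 0.9, 0.1],
    "geam": [0.15, 0.85, 0.1],
    "câine": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_lexicon(seed, embeddings, threshold=0.95):
    """Add every word whose vector is close enough to some seed word.

    Candidates are compared only against the original seed words, so the
    lexicon does not drift through chains of newly added words.
    """
    known_seeds = [word for word in seed if word in embeddings]
    extended = set(seed)
    for word, vector in embeddings.items():
        if word in extended:
            continue
        if any(cosine(vector, embeddings[s]) >= threshold for s in known_seeds):
            extended.add(word)
    return extended


manual_lexicon = {"lampă", "fereastră"}
extended_lexicon = extend_lexicon(manual_lexicon, EMBEDDINGS)
```

With these toy values, the words close to the seeds ("veioză", "bec", "geam") are added, while the unrelated "câine" stays out. In practice the threshold would need tuning, and the candidates would likely be reviewed manually before entering the lexicon.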

Originality and innovation:

The originality of the proposed solution lies primarily in the fact that no similar solution has been implemented for the Romanian language. Combining the scenario methodology with distributional semantics in the development of a dialogue monitoring system is, to our knowledge, a new and ambitious approach, given the level of generality it aims for.

The envisaged technological solutions will be implemented modularly, using deep learning techniques, which allow adaptation to new micro-worlds as well as easy migration to new applications. The universe of discourse and its correspondence with the words and phrases of the Romanian language will be specified in a standardized way (e.g. XML / RDF files).
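
To make the idea of a standardized specification more concrete, the sketch below loads a small XML description of a micro-world, links Romanian words to entities, and checks whether a predicate holds. The XML vocabulary (`microworld`, `entity`, `label`, `fact`) is a hypothetical, simplified stand-in invented for this example; it is not valid RDF and is not the project's actual schema. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified specification of a smart-house micro-world.
SPEC = """
<microworld name="smart-house">
  <entity id="lamp1" type="Lamp">
    <label lang="ro">lampă</label>
    <label lang="ro">veioză</label>
  </entity>
  <entity id="kitchen" type="Room">
    <label lang="ro">bucătărie</label>
  </entity>
  <fact predicate="locatedIn" subject="lamp1" object="kitchen"/>
  <fact predicate="isOn" subject="lamp1"/>
</microworld>
"""


def load_spec(xml_text):
    """Return (lexicon, facts) parsed from the specification.

    lexicon maps a Romanian word to an entity id; facts is a set of
    (predicate, subject, object) triples, with object None when absent.
    """
    root = ET.fromstring(xml_text.strip())
    lexicon = {}
    for entity in root.findall("entity"):
        for label in entity.findall("label"):
            lexicon[label.text.strip()] = entity.get("id")
    facts = {
        (fact.get("predicate"), fact.get("subject"), fact.get("object"))
        for fact in root.findall("fact")
    }
    return lexicon, facts


def holds(facts, predicate, subject, obj=None):
    """Validate a predicate (true / false) against the known facts."""
    return (predicate, subject, obj) in facts


lexicon, facts = load_spec(SPEC)
lamp_in_kitchen = holds(facts, "locatedIn", lexicon["veioză"], lexicon["bucătărie"])
```

Because the words, entities and facts live in the specification rather than in the code, moving to another micro-world would, under this assumption, only require a different file.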