Waverley has partnered with a company of thinkers and innovators to implement a new kind of virtual assistant that can turn the idea-creation process upside down.
The customer is a group of innovators united by the idea of a product that helps people and companies drive their creative processes. Inspired by the capabilities of virtual smart assistants like Alexa, Siri, and Google Home, they came up with the concept of a virtual smart assistant that works as an Ideation Facilitator. Traditionally, the role of Ideation Facilitator is filled by a person who leads group or individual brainstorming sessions. This person directs the flow of a discussion so that it naturally results in an executable idea, doable action items, or effective problem solving. The customer suggested this role could be accomplished just as well by an AI-powered machine.
The client envisioned the end product as an Ideation Tool in the form of a physical smart speaker that acts as a virtual Ideation Facilitator. It should perceive and reproduce human speech well enough to substitute for a live person and hold a natural conversation on virtually any topic. By asking relevant directive questions and drawing on a set of creativity tools and brainstorming techniques, this virtual facilitator must guide the user towards creative ideas or brand-new conclusions that help the user invent and develop something novel. Moreover, the Ideation Tool has to assist in validating the idea, identifying its business value, checking its novelty, filling out patent submission forms, and other activities related to bringing concepts to reality.
For Waverley engineers, this turned out to be a proof-of-concept R&D project with plenty of challenges and a continuous search for improvement. We kicked off with a prototyping phase and a limited budget, which meant relying on ready-to-use solutions. We chose the Google tech stack as the most advanced toolkit for real-time speech recognition and natural language processing: Google Cloud Platform for hosting (Kubernetes, Cloud SQL), the Speech-to-Text and Text-to-Speech APIs, Dialogflow, and the Google Home smart speaker as the product hardware.
As the client brought us new ideas and requirements, our software engineers realized that some of the ready-to-use solutions were not a good fit. For example, the Google Home smart speaker with Google NLP services cannot be used for continuous dictation due to privacy restrictions. Also, the Dialogflow service does not process speech that converts to more than 256 characters of text, and we did not want to impose such a restriction on target users: nobody can be expected to count the characters they pronounce. Moreover, the goal was to let users speak for as long as they want. Meeting this need would mean developing custom software from scratch for bare smart-speaker hardware.
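To make the 256-character constraint concrete: every long transcript would have to be chunked before each Dialogflow request. The sketch below is purely illustrative (the function name and constant are ours, not project code), and it shows why the limit is awkward for free-flowing dictation:

```python
# Illustrative sketch: splitting a dictation transcript into chunks that
# fit Dialogflow's 256-character text input limit. Names are hypothetical.
DIALOGFLOW_TEXT_LIMIT = 256

def split_transcript(text: str, limit: int = DIALOGFLOW_TEXT_LIMIT) -> list[str]:
    """Split text at word boundaries so each chunk stays within `limit`."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word[:limit]  # a single overlong token is truncated
    if current:
        chunks.append(current)
    return chunks
```

Chunking like this breaks up utterances mid-thought, which is exactly what a natural brainstorming conversation cannot tolerate.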
As a result of our research, we pivoted away from the Google Home solution with its NLP services and Dialogflow. The idea of a smart speaker was temporarily put on hold when the client realized that developing custom embedded software requires more time, effort, and financial resources than they had hoped. We therefore shifted our main focus to developing a web application that supports chat and voice interfaces, creativity tools, idea storage, collaboration features, patent search, and patent form submission. It is designed to function as a supporting tool for a live ideation facilitator rather than as a substitute for a human being.
Waverley developers continued using Google services such as Speech-to-Text and Text-to-Speech for their proven speech recognition and synthesis quality. The application is hosted on Google Kubernetes Engine, which handles container orchestration; Docker is used for local containerization. MySQL was chosen as the database. The back end is written in TypeScript with the NestJS framework, which enables fast development and serves as a good analog to Java Spring. We engaged our JavaScript developers with React expertise for front-end development. The ML-powered component is built in Python as a separate microservice.
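The local development setup described above could be pictured with a docker-compose sketch. Everything here (service names, ports, credentials, and image tags) is an assumption for illustration, not the project's actual configuration:

```yaml
# Illustrative local setup; all names, ports, and versions are assumptions.
services:
  api:                       # NestJS (TypeScript) back end
    build: ./api
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: mysql://app:app@db:3306/ideation
    depends_on: [db, ml]
  ml:                        # Python ML microservice
    build: ./ml
    ports: ["8000:8000"]
  db:                        # MySQL database
    image: mysql:8.0
    environment:
      MYSQL_DATABASE: ideation
      MYSQL_USER: app
      MYSQL_PASSWORD: app
      MYSQL_ROOT_PASSWORD: root
```

In production the same containers run on Google Kubernetes Engine instead of Compose.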
Waverley’s Data Science specialists were tasked with meeting two major challenges in the project.
The tech stack for these tasks is based on the Python ecosystem, including NumPy, scikit-learn, NLTK, and the fastText library.
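In the actual project this work relies on scikit-learn, NLTK, and fastText, but the underlying idea of routing a user's utterance to a category can be shown with a dependency-free sketch. The categories, prototype phrases, and function names below are invented for illustration:

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy category prototypes, invented for this sketch.
CATEGORIES = {
    "problem_definition": "what problem are you trying to solve",
    "idea_generation": "what other ways could this idea work",
    "validation": "who would pay for this and why",
}

def categorize(question: str) -> str:
    """Route a question to the closest category prototype."""
    q = bow(question)
    return max(CATEGORIES, key=lambda c: cosine(q, bow(CATEGORIES[c])))
```

The production service replaces raw word counts with trained embeddings and a proper classifier, but the routing principle is the same.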
Initially, the project did not involve any BA expertise, but the client soon realized that a Business Analysis expert was needed to increase the overall business efficiency of the project. At this point, the main task of our business analyst is to make the chatbot's discussion flow with the user as natural as possible. To reach this aim, we are working out an effective workflow for the chatbot: it should provide guidance in the form of subtle, unobtrusive, organic responses and questions, and switch smoothly between the brainstorming tools and methods in its toolkit.
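One simple way to keep tool-switching from feeling mechanical is to avoid immediately repeating the technique that was just used. This is a toy sketch of that idea, not the business analyst's actual workflow; the toolkit contents are well-known brainstorming techniques listed here only as examples:

```python
import random

# Example toolkit of common brainstorming techniques; the real product's
# toolkit and selection logic are assumed to be more elaborate.
TOOLKIT = ["SCAMPER", "Six Thinking Hats", "Random Word", "Five Whys"]

def next_tool(history: list[str]) -> str:
    """Pick the next technique, never repeating the one just used,
    so the switch between methods feels organic."""
    options = [t for t in TOOLKIT if not history or t != history[-1]]
    return random.choice(options)
```

A real selector would also weigh conversation signals, such as which category the user's last questions fell into.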
As we work with the client in short development cycles, we communicate with the subject matter expert, an actual ideation facilitator, to discuss possible improvements. As a result, we make changes, add and try out new tools and functions, and decide what to keep and what to omit. We are still working on improving relevance and refining question categorization and labeling so the system gets better cues about the discussion flow.
We have built an AI-powered product capable of facilitating individual and group brainstorming sessions. The application works in several phases: initial data collection, ideation with creativity tools and brainstorming methodologies, detail clarification, and creation of an idea submission form to start the patenting process. It can recognize and analyze voice input, provide relevant output as text and synthesized speech, and offer storage and collaboration capabilities. A set of additional features is still in development, for example a built-in demonstration of the app's functionality, an even more natural discussion flow, improved pause-detection algorithms, and question labeling. We also look forward to resuming our work on custom embedded software to implement the initially planned smart speaker vision.
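The phases listed above map naturally onto a small state machine. The sketch below uses our own phase names and a strictly linear transition rule as simplifying assumptions; the real application's flow is richer:

```python
# Illustrative phase flow: collect -> ideate -> clarify -> submit.
# Phase names and the linear ordering are assumptions for this sketch.
TRANSITIONS = {
    "collect": "ideate",   # enough background gathered, start brainstorming
    "ideate": "clarify",   # a promising idea emerged, drill into details
    "clarify": "submit",   # details settled, prepare the submission form
    "submit": "submit",    # terminal phase
}

class DiscussionFlow:
    def __init__(self) -> None:
        self.phase = "collect"

    def advance(self) -> str:
        """Move to the next phase once the current phase's goal is met."""
        self.phase = TRANSITIONS[self.phase]
        return self.phase
```

In practice, deciding *when* a phase's goal is met is the hard part; that is where the question categorization and pause-detection work feeds in.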