Amazon sets a new standard and enters the chatbot competition by releasing a new language model
When OpenAI released ChatGPT to the public, it catapulted the AI-powered chatbot into the center of mainstream discourse, sparking debates about how it could change business, education, and more. Now Amazon has released a new language model that outperforms GPT-3.5, raising the question of whether Amazon is entering the chatbot competition.
Tech giants such as Google and, in China, Baidu have launched chatbots to demonstrate to the public that their so-called "generative AI", a technology capable of producing conversational text, graphics, and more, is also ready for prime time. Large language models (LLMs) can now perform well on tasks requiring complex reasoning thanks to recent technological advances. This is accomplished through chain-of-thought (CoT) prompting, the practice of spelling out the intermediate reasoning steps that lead to an answer. Industry giants are already working to set the standard for chatbot development, and Amazon has joined the fray. Other businesses must rise to the occasion; this competition will undoubtedly pave the way for better solutions and products.
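To give a rough sense of what chain-of-thought prompting looks like in practice, the small sketch below builds a prompt that includes a worked example with its reasoning written out, nudging the model to reason step by step. The exemplar question and wording are purely illustrative and are not taken from Amazon's work.

```python
# Minimal, illustrative chain-of-thought (CoT) prompt builder.
# The exemplar below is hypothetical; any LLM completion call could
# consume the string this function returns.

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example whose reasoning steps are spelled out,
    so the model is encouraged to produce step-by-step reasoning too."""
    exemplar = (
        "Q: A tray holds 3 rows of 4 cookies. How many cookies are there?\n"
        "A: Each row has 4 cookies and there are 3 rows, "
        "so 3 x 4 = 12. The answer is 12.\n\n"
    )
    return exemplar + f"Q: {question}\nA: Let's think step by step."

if __name__ == "__main__":
    print(build_cot_prompt("A box holds 5 bags of 6 apples. How many apples are there?"))
```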
Multimodal-CoT divides multi-step problems into intermediate reasoning steps that lead to the final answer, even when the inputs come from different modalities such as language and vision. One common approach is to collapse information from multiple modalities into a single modality before asking an LLM for CoT reasoning. This method has drawbacks, however, one of which is that a great deal of information is lost when data is converted from one format to another. Small, fine-tuned language models can also perform multimodal CoT reasoning by combining aspects of language and vision, but they tend to generate hallucinated reasoning patterns that significantly degrade the final answer.

To mitigate these errors, Amazon researchers developed Multimodal-CoT, which incorporates visual features in a decoupled training framework. The framework splits the reasoning process into two stages: rationale generation and answer inference. Including vision in both stages strengthens the model's rationales and helps it reach more accurate answers. It is the first work of its kind to study CoT reasoning across different modalities. On the ScienceQA benchmark, the technique achieved state-of-the-art performance, surpassing GPT-3.5 by 16 percentage points in accuracy and even exceeding human performance.
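To make the two-stage idea concrete, here is a minimal sketch of how rationale generation and answer inference could be wired together. The class and function names (Example, generate_rationale, infer_answer) are placeholders rather than Amazon's actual implementation; the real Multimodal-CoT fuses vision features inside fine-tuned Transformer models, not with the trivial stand-in logic shown here.

```python
# Sketch of a two-stage Multimodal-CoT pipeline (illustrative only).
# Stage 1 produces a rationale from the question plus vision features;
# Stage 2 produces the answer from the question, the generated rationale,
# and the same vision features. The "models" below are trivial stand-ins.

from dataclasses import dataclass
from typing import List

@dataclass
class Example:
    question: str
    options: List[str]
    image_features: List[float]  # stand-in for extracted vision features

def generate_rationale(example: Example) -> str:
    """Stage 1: rationale generation (placeholder for a fine-tuned seq2seq
    model that attends over both text tokens and vision features)."""
    return (f"Reasoning about '{example.question}' "
            f"using {len(example.image_features)} visual features.")

def infer_answer(example: Example, rationale: str) -> str:
    """Stage 2: answer inference, conditioned on the question, the stage-1
    rationale, and the vision features (placeholder logic)."""
    # A real model scores each option; here we simply return the first one.
    return example.options[0]

def multimodal_cot(example: Example) -> str:
    rationale = generate_rationale(example)   # stage 1
    return infer_answer(example, rationale)   # stage 2

if __name__ == "__main__":
    ex = Example(
        question="Which property do these objects have in common?",
        options=["hard", "soft"],
        image_features=[0.12, 0.87, 0.33],
    )
    print(multimodal_cot(ex))
```

The point of the decoupling is that the rationale is generated and supervised separately from the answer, with vision features available at both stages, which is what the authors credit for reducing hallucinated rationales.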