MotionGPT: text-to-motion | Turtles AI

MotionGPT: text-to-motion
DukeRem, 26 July 2023
Researchers have built MotionGPT, a large language model (LLM) that can also handle human motion data. By representing motion as "motion tokens", MotionGPT can generate human motions from text descriptions and performs tasks like motion captioning and prediction at state-of-the-art levels. The authors have released the original paper and a GitHub repository.

While current LLMs have achieved impressive results, they lack the ability to model multimodal data such as human motion. The researchers propose MotionGPT, a unified motion-language model that can handle tasks related to human motion. They use vector quantization to represent human motion as discrete "motion tokens", similar to word tokens. They then perform language modeling on both motion and text tokens, treating human motion as a "foreign language". MotionGPT is pre-trained on a mixture of motion-language data and fine-tuned using prompt-based question-answering tasks.

Their experiments show that MotionGPT achieves state-of-the-art performance on various motion tasks, including text-driven motion generation, motion captioning, motion prediction and motion interpolation. While still in its early stages, MotionGPT shows promise as a first step towards incorporating human motion into large language models.
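
To make the "motion tokens" idea more concrete, here is a minimal sketch of the vector quantization step in plain Python. It is a toy illustration, not the authors' implementation: the real model learns its codebook and encoder from data, whereas the tiny 2-D codebook, the sample frames, the `quantize` helper and the `<motion_id_N>` token naming below are all assumptions made up for this example.

```python
def quantize(frames, codebook):
    """Map each motion feature vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook vector.
        dists = [sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def to_vocab_strings(tokens):
    """Render token indices as strings that could sit next to word tokens."""
    return [f"<motion_id_{t}>" for t in tokens]


# Hypothetical 4-entry codebook over 2-D motion features (real features are much larger).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical per-frame motion features, e.g. the output of a motion encoder.
frames = [[0.1, -0.1], [0.9, 0.2], [0.8, 0.9], [0.1, 1.2]]

motion_tokens = quantize(frames, codebook)

# Motion tokens and text share one sequence, so a language model can read both.
prompt = "Describe the motion: " + " ".join(to_vocab_strings(motion_tokens))

print(motion_tokens)  # [0, 1, 3, 2]
print(prompt)
```

Once motion is expressed as a sequence of discrete indices like this, it can be mixed with ordinary text tokens in a single vocabulary, which is what allows the same language model to treat motion as a "foreign language".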