Parametric modeling is a prevailing 3D modeling approach in design, architecture, and engineering. The emergence of multimodal large language models (LLMs) brings a new opportunity to lower the entry barrier to this powerful tool. However, describing 3D geometries through natural language alone can be fuzzy and challenging. We introduce co-speech gestures as a natural and expressive interaction modality that complements text prompts for LLM-empowered generative parametric modeling. We first conducted an elicitation study to explore and categorize co-speech gesture expressions. Based on the findings, we designed a multimodal fusion pipeline that parametrizes gestures and fuses them with concurrent speech. This approach reduces language ambiguity by translating implicit user intentions into explicit parametric attributes, thereby improving model generation performance. We conducted a two-session user study to evaluate our approach and compare it with traditional language-only and sketch-based inputs. This work streamlines the parametric modeling workflow and explores novel multimodal interaction paradigms for LLM-empowered design and creation.
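As a rough illustration of the general idea (not the authors' pipeline), the sketch below shows how a hypothetical fusion step might turn a tracked two-hand "stretch" gesture into an explicit numeric attribute and attach it to the accompanying spoken request before it is sent to an LLM. All names, units, and data structures here are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical sketch: convert a tracked two-hand gesture into an explicit
# parametric attribute and fuse it with the spoken prompt. These names and
# units are illustrative assumptions, not the paper's implementation.

@dataclass
class GestureSample:
    """Distance between the user's hands (metres) at one point in time."""
    timestamp: float
    hand_distance: float

def parametrize_stretch(samples: list[GestureSample], scale: float = 100.0) -> float:
    """Map the change in hand separation to a length attribute in centimetres."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure a stretch gesture")
    start, end = samples[0], samples[-1]
    return abs(end.hand_distance - start.hand_distance) * scale

def fuse_with_speech(speech: str, attribute_name: str, value: float) -> str:
    """Rewrite the fuzzy spoken request as an explicit, parameterized prompt."""
    return f"{speech} (set {attribute_name} = {value:.1f} cm)"

if __name__ == "__main__":
    gesture = [GestureSample(0.0, 0.10), GestureSample(0.8, 0.45)]
    length_cm = parametrize_stretch(gesture)
    prompt = fuse_with_speech("make the table about this long", "table_length", length_cm)
    print(prompt)  # -> make the table about this long (set table_length = 35.0 cm)
```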
ACM CHI Conference on Human Factors in Computing Systems