This poster was presented at the annual Singapore HCI Meetup at the National University of Singapore.

As a visiting Research Engineer at Singapore Management University, I developed a prototype of ControlChat using the Shoelace.js framework and OpenAI GPT-4o API. With support from Neil Chulpongsatorn, Tianyi Zhang, and Yuki Onishi, and the supervision of Jiannan Li and Tony Tang, we set out to explore how interactive user interface elements could address the ambiguous aspects of conversations and reduce the need for laborious prompt instructions.
For example, if a user asks a code-generating LLM to produce “A spinning bright ball on the page”, most systems today will jump straight into programming a result. In reality, the produced result is highly unlikely to match what the user had in mind: there are simply too many ambiguous aspects of this request to get it right on the first try. How fast should the ball spin? What kind of ball is it? What color should it be? Where should it be placed on the page? Should it be an image of a ball, a vector graphic, or something else entirely? Meanwhile, once a result is produced, the AI-generated code is often difficult to update and makes assumptions that can be challenging to reverse. While additional prompting can clarify instructions, it is taxing to write out and specify every requirement for a given task.
We propose that future generative AI systems should embrace ambiguity in the requests provided to them, and supplement their solutions with generated user interface elements that enable end users to interactively modify results.
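To make this concrete, here is a minimal sketch of the kind of structured output such a system might ask the model to emit alongside its code: a list of the ambiguous parameters it had to guess, each with a default and a suggested control type. The object shape and all field names here are illustrative assumptions, not the prototype's actual schema.

```javascript
// Hypothetical "ambiguity spec" the LLM could emit next to generated code.
// Each entry maps to one interactive control, so the user adjusts a guess
// instead of writing another clarifying prompt.
const ambiguitySpec = {
  request: "A spinning bright ball on the page",
  parameters: [
    { id: "spinSpeed", label: "Spin speed (rpm)", control: "slider",
      min: 0, max: 120, default: 30 },
    { id: "ballColor", label: "Ball color", control: "color-picker",
      default: "#ffdd00" },
    { id: "ballKind", label: "Ball type", control: "select",
      options: ["sphere", "soccer ball", "image"], default: "sphere" },
    { id: "placement", label: "Placement", control: "select",
      options: ["center", "top-left", "top-right"], default: "center" }
  ]
};

// Collect the model's guesses as the initial state of the result.
const defaults = Object.fromEntries(
  ambiguitySpec.parameters.map(p => [p.id, p.default])
);
console.log(defaults.spinSpeed); // → 30
```

The point of the sketch is that each guessed value stays live and reversible, rather than being baked invisibly into the generated code.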

After some initial success manually testing the idea directly in ChatGPT + VSCode, I created some rough mockups of the system and a loose architecture diagram showing the flow of interactions and API requests expected to take place. The mockup UI itself was rather aspirational, as the ability of LLMs to produce compelling, consistent UI designs still had to be tested. Nonetheless, this served as an early direction for the project and helped ground our initial prototype.


Next, I developed a web-based prototype of the system, which took instructions via a text field and translated them into a 3D BabylonJS scene. The LLM also generated controls on the right-hand side of the interface to address ambiguous aspects of the request; I experimented with ChakraUI and Shoelace.js as the source component library for these controls. Some results of the initial prototypes are shown below.
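As a rough illustration of the mapping between an ambiguity entry and a Shoelace component, the sketch below renders one control's markup from a spec object. The prototype actually let the LLM emit markup directly; this function, and the spec shape it consumes, are hypothetical, though `sl-range`, `sl-color-picker`, and `sl-select`/`sl-option` are real Shoelace components.

```javascript
// Illustrative sketch: map one parameter from a control spec onto Shoelace
// markup. The spec fields (control, label, min, max, default, options) are
// assumptions for this example, not the prototype's real schema.
function renderControl(p) {
  switch (p.control) {
    case "slider":
      return `<sl-range label="${p.label}" min="${p.min}" max="${p.max}" value="${p.default}"></sl-range>`;
    case "color-picker":
      return `<sl-color-picker label="${p.label}" value="${p.default}"></sl-color-picker>`;
    case "select":
      return `<sl-select label="${p.label}" value="${p.default}">` +
        p.options.map(o => `<sl-option value="${o}">${o}</sl-option>`).join("") +
        `</sl-select>`;
    default:
      // Fall back to a plain text input for unrecognized control types.
      return `<sl-input label="${p.label}" value="${p.default}"></sl-input>`;
  }
}

const sliderHtml = renderControl({
  control: "slider", label: "Spin speed (rpm)", min: 0, max: 120, default: 30
});
```

Wiring each rendered control's change event back into the generated scene code is the interesting part, and is where most of the prototyping effort went.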










Notably, while the system managed to generate plausibly useful controls with the JS libraries, it struggled with spatially positioning elements in 3D (see the screenshots with the black backgrounds above). This led us to pivot towards generating 2D websites instead of 3D scenes. The results (below) were much more consistent; I hypothesize this is because many web layouts use responsive designs that rely less on the absolute positioning of elements.
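The hypothesis can be sketched in a few lines. In a 3D scene the model must commit to absolute coordinates for every object, whereas a responsive 2D layout can delegate placement to the layout engine entirely. Both snippets below are illustrative values I made up, not prototype output.

```javascript
// 3D: the model must guess explicit world coordinates (BabylonJS-style values).
// A wrong guess puts the ball off-screen or inside another object.
const scene3d = {
  ball: { position: { x: 0, y: 1.2, z: -3 } } // every number is a commitment
};

// 2D: flex flow centers the element with no coordinates at all, so an
// under-specified request still lands somewhere sensible.
const card2d = `<div style="display:flex;justify-content:center;">
  <div class="ball"></div>
</div>`;
```

Under this framing, responsive 2D layouts are more forgiving of the ambiguity the model is forced to resolve, which matches the more consistent results we observed after the pivot.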














