Although OpenAI’s ChatGPT has already become remarkably articulate, the company is far from satisfied. According to The Information, OpenAI is preparing to extend its ambitions into the realm of physical devices, with its core technological focus shifting decisively toward audio-based interaction. To support this goal, OpenAI has reportedly carried out a sweeping internal reorganization over the past two months, reallocating significant resources toward the development of audio models. All of these moves are said to be laying the groundwork for a long-rumored, mysterious AI hardware product expected to debut roughly a year from now, around early 2027.
Most current AI voice assistants—including ChatGPT Voice—still rely on a familiar three-stage pipeline: speech-to-text (STT), processing by a text-based model, and text-to-speech (TTS). While functional, this architecture inevitably introduces latency, since each stage must finish before the next can begin, and industry insiders note that today’s audio models generally lag behind pure text models in reasoning capability.
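To make the latency argument concrete, here is a minimal sketch of that cascaded pipeline. All function names and millisecond figures below are hypothetical placeholders for illustration, not real APIs or measurements:

```python
# Illustrative sketch: three serial stages whose latencies add up.
# The stage functions and latency numbers are invented stand-ins.

def speech_to_text(audio: bytes) -> tuple[str, int]:
    return "what's the weather", 300          # transcript, latency in ms

def text_model(prompt: str) -> tuple[str, int]:
    return f"reply to: {prompt}", 800         # text reply, latency in ms

def text_to_speech(text: str) -> tuple[bytes, int]:
    return text.encode(), 200                 # stand-in for synthesized audio

def cascaded_reply(audio: bytes) -> tuple[bytes, int]:
    # Stages run one after another, so end-to-end latency is the sum of
    # all three; tone and cadence are also discarded at the STT step,
    # since only the transcript text reaches the model.
    transcript, t1 = speech_to_text(audio)
    reply_text, t2 = text_model(transcript)
    reply_audio, t3 = text_to_speech(reply_text)
    return reply_audio, t1 + t2 + t3

reply_audio, total_ms = cascaded_reply(b"raw-audio")
print(total_ms)  # prints 1300: the additive cost of the three-stage design
```

An audio-first model, as described next, would collapse these three stages into one, removing both the summed latency and the lossy text bottleneck.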
The report suggests that OpenAI’s newly formed team is working on an “audio-first” model designed to let AI understand and generate sound directly, eliminating the intermediate text conversion step. This approach promises not only dramatically more responsive conversations, but also a heightened sensitivity to emotional nuance embedded in tone and cadence. As for what this enigmatic hardware might actually look like, speculation is rife.
Across Silicon Valley, AI development appears to be shifting away from screen-centric devices toward wearables. Google is pushing voice-driven search through its Audio Overviews initiative, while Meta has enjoyed early success with its Ray-Ban smart glasses and has reportedly acquired Limitless, a startup focused on wearable AI audio recording.
OpenAI, for its part, has hinted that its device will be “more than just a pair of glasses.” While concrete details remain closely guarded, the hardware is said to emphasize an “always-on” mode of operation.
This suggests a device that does not require waking or unlocking like a smartphone, but instead functions as a discreet, ever-present assistant—continuously listening, sensing its surroundings, and standing ready to help at any moment. The vision aligns neatly with Silicon Valley’s growing fascination with “screenless computing,” where AI fades into the background and surfaces only when needed.

Further reports indicate that OpenAI may be developing not one, but at least three distinct hardware designs. One, codenamed “Gumdrop,” is rumored to take the form of an “AI pen.” Earlier speculation also described devices designed to be worn or clipped onto clothing, reminiscent of the AI Pin developed by Humane before its acquisition by HP.
As for the “Gumdrop” device, OpenAI was reportedly considering Luxshare as a manufacturing partner. However, amid ongoing U.S.–China trade tensions and the prospect of steep tariffs on China-made products, production may instead shift to Foxconn facilities in Vietnam—or possibly even to Foxconn assembly lines within the United States.
In my view, OpenAI’s pivot toward audio represents a strikingly precise strategic call. Looking back at 2024 and 2025, the failure of devices like the Humane AI Pin or the Rabbit r1 stemmed largely from two shortcomings: sluggish response times and insufficient intelligence. If OpenAI can truly deliver native audio models capable of “zero-latency” conversation infused with emotional awareness, then whether the hardware takes the form of glasses, a pendant, or earbuds becomes largely incidental.
If, a year from now, we see a device that requires no phone, no wake word, and allows users to speak as naturally as they would to another human, that moment may well mark the true “iPhone moment” of AI hardware.