Controlling your computer with voice: Windows and Linux
Controlling a computer with voice sounds simple: speak a command, and the system should respond. In practice, the quality of the experience depends on speed, accuracy, setup, and how efficiently the system turns spoken commands into real actions.
Voice control is useful for accessibility, for people managing repetitive strain injury, for hands-free work, and for faster interaction with software. But not every voice-control option is built for the same purpose. Some tools are mainly for dictation. Some are speech-recognition engines for developers. Some can control parts of the desktop. A smaller number are designed for full hands-free computer control.
This article covers what is available today on Windows and Linux, and where a more specialized tool like BenASR fits.
Windows Voice Access
Windows includes built-in voice control through Microsoft Voice Access. It allows users to control the PC, open apps, switch between windows, click buttons, scroll, select items, and enter text using spoken commands.
To enable it on Windows 11:
- Open Settings
- Go to Accessibility
- Open Speech
- Turn on Voice access
Once Voice Access is enabled, a control bar appears at the top of the screen. From there, you can start using spoken commands to interact with Windows.
Useful commands include:
| Command | Action |
|---|---|
| “Voice access wake up” | Start listening |
| “Voice access sleep” | Stop active listening |
| “What can I say?” | Show available commands |
| “Open File Explorer” | Open File Explorer |
| “Open Settings” | Open Windows Settings |
| “Switch to Chrome” | Switch to an open app |
| “Click Recycle Bin” | Click an item by name |
| “Double-click Recycle Bin” | Double-click an item |
| “Right-click” | Perform a right-click |
| “Scroll down” | Scroll the current page or window |
| “Delete that” | Delete the last dictated phrase |
| “Correct that” | Correct recent text |
Voice Access is useful because it is already part of Windows and does not require a separate commercial tool. It is a good starting point for users who want to test hands-free control.
The downside is efficiency. Many tasks require several spoken steps, and recognition or command execution can feel delayed. When a user has to speak, wait, check the result, correct a mistake, and repeat the process, the workflow becomes slower. For occasional use, this may be fine. For daily full-computer control, the feedback loop can become frustrating.
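To make the cost of that loop concrete, here is a rough back-of-the-envelope sketch. It assumes each spoken step costs speaking time, waiting time, and checking time, and that a failed step is simply retried until it works. The function name and every number below are hypothetical illustrations, not measurements of Voice Access or any other tool.

```python
def expected_task_seconds(steps, speak_s, wait_s, check_s, error_rate):
    """Expected time for a task made of `steps` spoken commands.

    Each attempt costs speak + wait + check seconds. A failed attempt is
    retried, so the expected attempts per step are 1 / (1 - error_rate).
    """
    if not 0 <= error_rate < 1:
        raise ValueError("error_rate must be in [0, 1)")
    per_attempt = speak_s + wait_s + check_s
    return steps * per_attempt / (1 - error_rate)


# Hypothetical numbers chosen only to show the shape of the trade-off.
slow = expected_task_seconds(steps=4, speak_s=1.0, wait_s=1.5, check_s=0.5, error_rate=0.2)
fast = expected_task_seconds(steps=2, speak_s=0.5, wait_s=0.2, check_s=0.3, error_rate=0.05)
print(f"slow loop: {slow:.1f} s, fast loop: {fast:.1f} s")
```

Under this simple model, the number of steps, the delay per step, and the error rate multiply together, which is why a workflow that is only a little worse on each factor can feel much slower overall.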
Windows Voice Typing
Windows also includes voice typing for dictation. This is separate from full computer control.
To start voice typing:
- Click inside any text field
- Press Windows key + H
- Wait for the listening prompt
- Speak normally
Useful voice typing commands include:
| Command | Action |
|---|---|
| “Stop listening” | Stop dictation |
| “Delete that” | Delete the last phrase |
| “Select that” | Select the last phrase |
| “Press Enter” | Insert a line break |
| “Press Backspace” | Delete backward |
| “Undo that” | Undo the previous action |
Voice typing is helpful for entering text quickly, but it is not the same as controlling the computer. It is best understood as a dictation feature, not a complete hands-free workflow.
Linux Voice Control
Linux does not have one polished, universal voice-control system built into the operating system. Instead, users usually rely on third-party tools, open-source engines, scripts, and custom workflows.
Some Linux options include:
| Tool | What it is | Best for |
|---|---|---|
| Julius | Open-source speech recognition engine | Developers and research projects |
| CMU Sphinx / PocketSphinx | Open-source speech recognition toolkit | Custom offline recognition projects |
| Voice2JSON | Offline speech and intent recognition toolkit | Small command-based workflows |
| Google2Ubuntu | Older Linux voice-command project | Legacy/experimental setups |
| Talon | Voice, noise, and eye-tracking control system | Power users, programmers, RSI/accessibility workflows |
The main issue on Linux is fragmentation. These tools can be powerful, but many require technical setup, scripting, configuration, or community command packs. Some are older, and some are closer to research or hobby projects than polished products. For a technical user, Linux voice control is possible. For most users, it is not plug-and-play.
Older Linux guides often include setup commands like these, run as root:
add-apt-repository ppa:benoitfra/google2ubuntu
apt update
apt install google2ubuntu
That kind of setup shows the problem clearly: many Linux voice-control paths are possible, but they often feel like engineering projects rather than finished daily-use tools.
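For a sense of what the "scripts and custom workflows" side of Linux voice control can look like, here is a minimal sketch of the glue code a user might write themselves. It assumes some speech engine has already produced a text phrase, and it maps that phrase to an `xdotool` invocation. The `key` and `click` subcommands of `xdotool` are real, but the phrases, the key bindings (which vary by desktop environment), and the function names are hypothetical. `xdotool` targets X11, so this would likely not work as-is on a Wayland session.

```python
import shlex
import shutil
import subprocess

# Hypothetical phrase-to-command table. The bindings are illustrative and
# depend on the desktop environment's own keyboard shortcuts.
PHRASE_TO_ARGV = {
    "open terminal": ["xdotool", "key", "ctrl+alt+t"],
    "left click": ["xdotool", "click", "1"],
    "close window": ["xdotool", "key", "alt+F4"],
}


def build_command(phrase):
    """Normalize a recognized phrase and return its argv list, or None."""
    key = " ".join(phrase.lower().split())
    argv = PHRASE_TO_ARGV.get(key)
    return list(argv) if argv else None


def run_phrase(phrase, dry_run=True):
    """Run (or, by default, only print) the command for a phrase.

    Returns False when the phrase is not a known command.
    """
    argv = build_command(phrase)
    if argv is None:
        return False
    if dry_run or shutil.which(argv[0]) is None:
        # Print instead of executing when dry-running or xdotool is missing.
        print("would run:", shlex.join(argv))
        return True
    subprocess.run(argv, check=False)
    return True


run_phrase("Open terminal")
```

Even this small sketch leaves the user responsible for choosing a recognizer, maintaining the phrase table, and handling differences between desktops, which is the fragmentation described above.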
Third-Party Alternatives
There are also several third-party tools worth knowing about. They do not all solve the same problem, so the best choice depends on whether you want dictation, transcription, coding control, meeting notes, or full desktop control.
| Tool | Platform | Best for | Limitations |
|---|---|---|---|
| Dragon Professional | Windows | Professional dictation, transcription, custom voice commands | Expensive, mostly focused on speech-to-text and professional documentation |
| Microsoft Dictate | Microsoft 365 apps | Dictation inside Word, Outlook, OneNote, and PowerPoint | Mainly text entry, not full PC control |
| Google Docs Voice Typing | Google Docs in browser | Free dictation inside Google Docs | Limited to Google Docs/Slides workflows |
| Talon | Windows, Linux, macOS | Hands-free coding, accessibility, advanced customization | Powerful but technical; setup takes time |
| Otter | Web and mobile | Meeting transcription, summaries, speaker identification | Not designed for controlling the computer |
| Notta | Web and mobile | Transcription and note-taking | More about converting speech/audio to text than controlling software |
| Julius | Linux, Unix-like systems, Windows via ports | Speech-recognition research and custom systems | Engine/toolkit, not a finished desktop control product |
| CMU Sphinx | Cross-platform | Offline speech-recognition projects | Developer toolkit, not a modern full-control interface |
These tools are useful, but they serve different jobs. Dragon is strong for dictation. Otter and Notta are better for meetings and transcription. Google Docs Voice Typing is convenient for writing in Docs. Talon is powerful for hands-free coding and accessibility, but it requires commitment and customization. Julius and CMU Sphinx are more like building blocks for developers.
What Practical Voice Control Needs
A practical voice-control system needs more than speech recognition. It needs:
- Fast response time
- High command accuracy
- Short spoken commands
- Keyboard control
- Mouse control
- Window control
- App-specific commands
- A workflow that does not require constant correction
If a voice-control system is accurate but slow, it breaks concentration. If it is fast but command-heavy, it becomes tiring. If it only supports dictation, it cannot replace the mouse and keyboard. For full computer control, the system has to be designed around action, not just transcription.
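Two items from the list above, short spoken commands and app-specific commands, can be sketched as a layered lookup. This is a minimal illustration under stated assumptions, not how any particular product works: the command phrases, action names, app names, and the `resolve` function are all hypothetical. It uses Python's standard `difflib.get_close_matches` to tolerate small recognition errors while rejecting weak matches.

```python
from difflib import get_close_matches

# Hypothetical command tables; phrases and action names are illustrative only.
GLOBAL_COMMANDS = {
    "scroll down": "mouse.scroll_down",
    "next window": "key.alt_tab",
    "press enter": "key.enter",
}
APP_COMMANDS = {
    "browser": {"new tab": "key.ctrl_t", "scroll down": "key.space"},
    "editor": {"save file": "key.ctrl_s"},
}


def resolve(phrase, active_app, cutoff=0.8):
    """Return the action name for a recognized phrase, or None."""
    text = " ".join(phrase.lower().split())
    # App-specific commands override global ones that share a phrase.
    table = {**GLOBAL_COMMANDS, **APP_COMMANDS.get(active_app, {})}
    if text in table:
        return table[text]
    # Accept a near miss only if it is close enough to a known phrase.
    close = get_close_matches(text, list(table), n=1, cutoff=cutoff)
    return table[close[0]] if close else None


print(resolve("scroll down", "browser"))
print(resolve("scroll down", "editor"))
```

The design choice worth noting is the cutoff: returning `None` for a weak match means the system does nothing instead of doing the wrong thing, which reduces the constant correction that makes voice control tiring.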
Where BenASR Fits
BenASR is built specifically for hands-free voice control on Windows and Linux. It focuses on short commands, low-latency recognition, custom voice training, local daily use, keyboard control, mouse control, global shortcuts, and application-specific commands.
That makes it different from tools that are mainly for dictation or meeting transcription. BenASR is aimed at users who want to control the computer itself: switching windows, pressing keys, clicking, scrolling, navigating apps, triggering shortcuts, and working hands-free.
BenASR also includes a dictation mode, so it can be used for text entry as well. But its main strength is computer control.
If built-in Windows tools feel slow, Linux tools feel too patchy, and dictation apps are not enough, BenASR is worth exploring.
Visit BenASR.com to learn more.