The NSA has created a tool for transcribing phone calls en masse and converting them into searchable text, according to documents released by the whistleblower Edward Snowden.
Called "Google for Voice", the nine-year-old programme enabled spies to extensively search conversations using keywords, and included an algorithm for flagging particular records.
Dan Froomkin, a journalist at the Intercept, released the latest files, which claimed the tool was used in war zones such as Iraq and Afghanistan but may have been employed more widely.
"Spying on international telephone calls has always been a staple of NSA surveillance, but the requirement that an actual person do the listening meant it was effectively limited to a tiny percentage of the total traffic," he wrote on the publication’s website.
"By leveraging advances in automated speech recognition, the NSA has entered the era of bulk listening."
A document released by the Intercept showed that the British spy agency GCHQ had been investigating speech-to-text tools since at least 2001, when IBM informed them that its speech recognition technology was not yet ready for use in the field.
One obstacle in the British use of the technology was that early systems were mostly tested on American accents, which prompted GCHQ to set up its own scheme to test it on British ones, including 56 hours of intercepted calls from Northern Ireland.
However, the document also claims that the automatic transcription systems used in 2009 had word error rates of 30-40%, and that such programmes required significant investment in training before they could be exploited.
Because of these costs and the limits of the systems, the chair of the Speech Technology Working Group within GCHQ recommended that British spying agencies collaborate to maximise their potential.
However, the chair also said: "[Speech-to-text technology] has still to prove itself in large-scale applications, but the potential for major benefits in productivity in the future is clear, given sufficient investment in further developing the systems for our target speech."
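For readers unfamiliar with the metric, the word error rate cited in the document is conventionally calculated as the number of word substitutions, deletions and insertions needed to turn a machine transcript into the correct one, divided by the number of words in the correct transcript. The sketch below illustrates that standard definition only; it is not drawn from the leaked files, the function name is hypothetical, and the two sentences are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); implements the standard
    WER definition, not any agency's actual tooling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra word transcribed)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented sentences: a 10-word reference and a flawed machine transcript.
reference = "the meeting is at nine tomorrow near the old bridge"
hypothesis = "the meeting is at night tomorrow near old fridge"
wer = word_error_rate(reference, hypothesis)
print(f"{wer:.0%}")  # two substitutions and one deletion over 10 words
```

In this invented case three of the ten reference words are wrong or missing, giving a rate of 30%, the lower end of the range the document reports for 2009.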