AWS User Data is Being Stored, Used Outside User's Chosen Regions

UPDATED July 10, 11:15 BST with minor changes, to reflect AWS’s updated guidance.

AWS customers are sharing sensitive AI data sets, including biometric data and voice inputs, with Amazon by default, and many didn’t even know.

The cloud provider is using customers’ “AI content” for its own product development purposes. It also reserves the right in its small print to store this material outside the geographic regions that AWS customers have explicitly selected.

It may also share this with AWS “affiliates” it says, without naming them.

The move breaks widespread assumptions about data sovereignty, even if this is arguably on customers for not reading the small print. The cloud provider’s users may need to have read through 15,000+ words of service terms to notice this fact.

(The company says it also makes this clear and visible in product FAQs. Those seeking full definitions of “your content” and “AI content” will, however, need to read the service terms, which define “your content” as “any ‘company content’ and any ‘customer content’” and “AI content” as any such content that is processed by an AI service.)

Many appear not to have noticed that they were opted in by default. Until recently, AWS required customers to raise a support ticket if they wanted to stop this happening (assuming they had noticed it was happening in the first place).

Less detail-oriented AWS users, who opted instead to just read 100 words of AWS’s data privacy FAQs — “AWS gives you ownership and control over your content through simple, powerful tools that allow you to determine where your content will be stored” — may be in for something of a shock. (Always read the small print…)

Wait, What?

The issue, startling for many, was flagged this week by Scott Piper, an ex-NSA staffer who now heads up Summit Route, an AWS security training consultancy.

He spotted it after the company updated its opt-out options to make it easier for customers to opt out via the console, API or command line.

Piper is a well-regarded AWS expert with a sustained interest in some of the cloud provider’s arcana. He says he fears many did not know this was happening; he certainly didn’t. He told Computer Business Review: “It looks like it’s been in the terms since December 2, 2017 according to what I could find in archive.org.

“Apparently no one [sic] noticed this until now.

“This breaks some assumptions people have about what AWS does with their data. Competitors like Walmart are going to take notice.”

(AWS writes to Computer Business Review to emphasise a distinction it says it draws between “content” and “data”. It has not provided definitions of either, but appears to want to differentiate between customer data at large and explicit AI workloads.)

Numerous AWS services are named by the company as doing this, including CodeGuru Profiler, which collects runtime performance data from live applications, Rekognition, a biometrics service, and Transcribe, an automatic speech recognition service.

Policy “Breaks Assumptions About Data Sovereignty”

Piper added: “The fact that AWS may move your data outside of the region breaks assumptions about data sovereignty. AWS has frequently made the claim about how your data doesn’t leave the region you put it in. That has been given as the reason why you have to specify the region for an S3 bucket for example, and AWS has advertised this point when comparing themselves to other cloud providers.

“The fact [is] that until now the only way you could opt out of this was to 1) know about it in the first place and 2) file a support ticket.”

AWS declined to comment on the record.

The company’s terms make it clear that AWS sees it as users’ responsibility to clearly notify their own customers that this is happening.

i.e.: 50.4 “You are responsible for providing legally adequate privacy notices to End Users of your products or services that use any AI Service and obtaining any necessary consent from such End Users for the processing of AI Content and the storage, use, and transfer of AI Content as described under this Section 50.”

How many AWS customers have pushed such privacy notices down to end-users remains an open question.

The revelation was also news to one experienced cloud user, Steve Chambers.

Chambers, who is an AWS consultant, told Computer Business Review: “The question should be: Why would anyone opt-in to this? If they wouldn’t opt-in by default, then surely the default should be opt-out? There’s a difference between using telemetry data about customer use of AI services, which I think should be fair game, but using the actual content — it’s like AWS accessing the records inside my RDS database (which they don’t do… do they?) rather than collecting telemetry about how I’m using RDS.”

(Editor’s note: No, AWS does not access records inside customers’ RDS databases. This is only AI workload content for product training).

AWS User Data: Storage/Use Opt-Out Updated

A document updated this week by AWS gives guidance to organisations on opting out and a new tool allows users to set a policy that activates it across their estate.

It notes: “AWS artificial intelligence (AI) services collect and store data as part of operating and supporting the continuous improvement life cycle of each service. As an AWS customer, you can choose to opt out of this process to ensure that your data is not persisted within AWS AI service data stores or used for service improvements.”

(Users can go to console > AI services opt-out policies, or do so through the command line interface or API. CLI: aws organizations create-policy; AWS API: CreatePolicy.)
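For readers who want to see what the API route might look like, here is a minimal Python sketch. It is not taken from AWS’s guidance as quoted in this article: the policy document shape (a "services" map with an "@@assign": "optOut" entry), the AISERVICES_OPT_OUT_POLICY type name and the use of boto3 are assumptions based on the AWS Organizations policy syntax as we understand it, and the policy name and description are hypothetical. Check the current AWS documentation before relying on it.

```python
import json


def build_opt_out_policy(services=None):
    """Return the JSON text of an AI services opt-out policy.

    Assumed syntax: with no arguments the "default" key opts out of every
    AI service; otherwise only the named service keys are opted out.
    """
    keys = services or ["default"]
    body = {
        "services": {
            key: {"opt_out_policy": {"@@assign": "optOut"}} for key in keys
        }
    }
    return json.dumps(body, indent=2)


def create_opt_out_policy(name="ai-opt-out-all"):
    """Create the policy via the Organizations CreatePolicy API.

    Needs boto3 and credentials for the organisation's management account.
    The name and description are placeholders, not AWS-mandated values.
    """
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("organizations")
    return client.create_policy(
        Name=name,
        Description="Opt out of AI service content use",
        Type="AISERVICES_OPT_OUT_POLICY",
        Content=build_opt_out_policy(),
    )


if __name__ == "__main__":
    # Print the policy document only; nothing is sent to AWS here.
    print(build_opt_out_policy())
```

Creating a policy is unlikely to be sufficient on its own. In AWS Organizations the policy type generally has to be enabled on the organisation root and the policy attached to a root, organisational unit or account before it takes effect.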

Which AWS Services Do This?

AWS Terms 50.3 mention CodeGuru Profiler, Lex, Polly, Rekognition, Textract, Transcribe, and Translate. 60.4 also mentions this for SageMaker. 75.3 mentions this for Fraud Detector. 76.2 mentions this for Mechanical Turk and Augmented AI.

Summit Route’s Scott Piper notes: “Interestingly, the new opt-out ability that was added today mentions Kendra as being one of the services you can opt out of having AWS use your data from, but the service terms do not mention that service. If AWS was using customer data from that service already, I think that is going to get them in trouble.”

UPDATED: AWS says this was an oversight and the opt-out guidance has been updated.

Nicky Stewart, commercial director at UKCloud, a British cloud provider, said: “It’s always really important to read the small print in any contract.

“Even the AWS G-Cloud terms (which are ‘bespoked’ to an extent) have hyperlinks out to the service terms which give AWS rights to use Government’s valuable data (which AWS can then profit from) and to move the data into other jurisdictions.

“Given the highly sensitive nature of some of Government’s data that AWS is processing and storing… it would be great to have an assurance from Government that the opt out is being applied as a de facto policy.”

Telemetry, Customer Data Use Are Getting Controversial

The revelation (for many) comes a week after Europe’s data protection watchdog said Microsoft had carte blanche to unilaterally change the rules on how it collected data on 45,000+ European officials, with the contractual remedies in place for institutions that didn’t like the changes essentially “meaningless in practice.”

The EDPS warned EU institutions to “carefully consider any purchases of Microsoft products and services… until after they have analysed and implemented the recommendations of the EDPS”, saying buyers could have little to no control over where data was processed, how, and by whom.
