GitHub issue: #23055
We plan to build an AWS S3 sink, in which we write to an S3 bucket under the end user's control. We'll need to integrate with AWS IAM so that our users can grant their Materialize region the ability to write to their desired S3 bucket.
Integration with AWS IAM would also be useful for facilitating IAM authentication with existing source types such as Amazon RDS and Amazon MSK. While we support RDS and MSK today via traditional username/password authentication, many users prefer to use AWS IAM with these services, as it eliminates the need to manage additional user accounts and passwords.
- Allow users to create an AWS connection with AWS credentials or AWS IAM AssumeRole.
- Support `VALIDATE CONNECTION` for AWS connections.
Creating an AWS connection with credentials should be quite straightforward. The SQL would look like:
```sql
CREATE SECRET aws_secret_key = 'secret';

CREATE CONNECTION aws_conn TO AWS (
    ACCESS KEY ID = 'access_key',
    SECRET ACCESS KEY = SECRET aws_secret_key
);
```
The `aws_conn` connection is now available to be used in supported queries.
Users will need to first create a role on their end, say `MaterializeConn`, and note down its ARN, e.g. `arn:aws:iam::001234567890:role/MaterializeConn`. The user then creates a connection in Materialize with that ARN:
```sql
CREATE CONNECTION aws_conn TO AWS (
    ASSUME ROLE ARN = 'arn:aws:iam::001234567890:role/MaterializeConn'
);
```
On Materialize's end, we generate an external ID and create an entry in a new `mz_aws_connections` table. Users should be able to retrieve the external ID and the principal by querying this table:
```sql
SELECT principal, external_id
FROM mz_aws_connections aws
JOIN mz_connections c ON aws.id = c.id
WHERE c.name = 'aws_conn';
```
Users should then attach a trust policy to the role in their AWS account to grant Materialize access:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<principal>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<external_id>"
                }
            }
        }
    ]
}
```
They can also retrieve a fully rendered version of the above policy from the `example_trust_policy` column in the `mz_aws_connections` table.
We already have some existing code that can be reused to handle the AssumeRole scenario. The parser already supports a `ROLE ARN` option, which can be renamed to `ASSUME ROLE ARN` to better reflect its usage. We will need to add an `ASSUME ROLE SESSION NAME` option as well.
The AWS connection create statement with all the options will now look like:

```sql
CREATE CONNECTION <connection_name> TO AWS (
    ACCESS KEY ID [[=] <value>],             -- existing
    SECRET ACCESS KEY [[=] SECRET <value>],  -- existing
    SESSION TOKEN [[=] SECRET <value>],      -- rename existing from TOKEN
    ASSUME ROLE ARN [[=] <value>],           -- rename existing from ROLE ARN
    ASSUME ROLE SESSION NAME [[=] <value>],  -- new option
    ENDPOINT [[=] <value>],                  -- existing
    REGION [[=] <value>]                     -- existing
);
```
Users should be able to provide either the `ACCESS KEY` options or the `ASSUME ROLE` options, but not both.
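For instance, a hypothetical statement mixing the two sets of options, like the one below, should be rejected:

```sql
-- Hypothetical invalid statement: mixes the ACCESS KEY and ASSUME ROLE
-- option sets, so it should fail to plan.
CREATE CONNECTION invalid_conn TO AWS (
    ACCESS KEY ID = 'access_key',
    SECRET ACCESS KEY = SECRET aws_secret_key,
    ASSUME ROLE ARN = 'arn:aws:iam::001234567890:role/MaterializeConn'
);
```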
When creating the connection, users should specify `WITH (VALIDATE = false)` in the SQL, because we won't be able to validate the connection until the trust policy has been added on the user's end.
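For example, the AssumeRole connection from earlier would initially be created as:

```sql
CREATE CONNECTION aws_conn TO AWS (
    ASSUME ROLE ARN = 'arn:aws:iam::001234567890:role/MaterializeConn'
) WITH (VALIDATE = false);
```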
We'll add a new `mz_aws_connections` table with the following definition:
```sql
CREATE TABLE mz_internal.mz_aws_connections (
    -- The ID of the connection in `mz_connections`.
    id text NOT NULL,
    -- The value of the `ENDPOINT` option, if specified. `NULL` otherwise.
    endpoint text,
    -- The value of the `REGION` option, if specified. `NULL` otherwise.
    region text,
    -- The value of the `ACCESS KEY ID` option, if specified as a string.
    -- `NULL` otherwise.
    access_key_id text,
    -- The ID of the `ACCESS KEY ID` secret in `mz_secrets`, if specified
    -- as a secret. `NULL` otherwise.
    access_key_id_secret_id text,
    -- The value of the `ASSUME ROLE ARN` option, if specified. `NULL` otherwise.
    assume_role_arn text,
    -- The value of the `ASSUME ROLE SESSION NAME` option, if specified.
    -- `NULL` otherwise.
    assume_role_session_name text,
    -- The AWS IAM principal Materialize will use to assume the role, if
    -- the connection configured `ASSUME ROLE ARN`. `NULL` otherwise.
    principal text,
    -- The external ID Materialize will use to assume the role, if
    -- the connection configured `ASSUME ROLE ARN`. `NULL` otherwise.
    external_id text,
    -- A fully rendered AWS IAM trust policy that can be installed on the
    -- role specified by `ASSUME ROLE ARN` to grant Materialize permission to
    -- assume the role. `NULL` if `ASSUME ROLE ARN` is not specified.
    example_trust_policy jsonb
);
```
We already have an external ID prefix available in the catalog state. The connection ID will be appended to this prefix to form the complete external ID, which is then stored in the new `mz_aws_connections` table for that connection.
For the principal, we'll create one global role named `arn:aws:iam::<account_id>:role/MaterializeConnection` which all environmentd/clusterd processes can assume. This adds some good restrictions on who can assume the customer's role and also allows for migration between environment clusters.
Note that the role is named with the end customer in mind. Internally, we might prefer a more verbose name, like `MaterializeEnvironmentConnectionIntermediatingRole`, to clearly specify this role's purpose relative to the other roles in the account. But for the end customer, this is the one and only role they'll interact with in our account, so we prefer the conciseness and simplicity of `MaterializeConnection`.
Note: there's a hard limit of 1 hour on the session duration when role chaining in AWS. We can revisit workarounds if we hit this limit.
Other alternative solutions considered were:

- `arn:aws:iam::<account_id>:root`. This would work, but it is not secure enough, since any user/role in this account could assume the customer's role.
- `arn:aws:iam::<account_id>:role/<id>`, created for each customer environment. This will not scale: there's a maximum limit of 5,000 roles we can create per account.
- `arn:aws:iam::<account_id>:user/<id>`, created for the customer's environment, similar to Snowflake. This could work if we keep the credentials in a secret and rotate them regularly, and it would give each customer a unique principal. But we'd have to rotate very frequently, and we may also need to coordinate that rotation across all users of the credentials, which seems like a lot of overhead. There's also a limit of 5,000 on the number of users in an account in a region, so this will not scale either.

We can make use of the GetCallerIdentity AWS API to check whether the credentials are valid. For AssumeRole connections, we should attempt to assume the role both with and without the given external ID to make sure that the user has a secure setup: if we are able to gain access without the external ID, we should treat that as an error.
Optionally, with `ASSUME ROLE ARN`, we can also try to obtain temporary credentials.
We should put the AWS connection behind a feature flag.
Write cloudtests to test the different AWS connections. For the AssumeRole AWS connection, we should test that the expected `principal` and `external_id` are populated in the `mz_aws_connections` table.
Switch on the feature flag for the staging environment, create an AWS connection in staging, and then run `VALIDATE CONNECTION`.
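The smoke test itself is just the validation statement; for example, with the connection named `aws_conn` as in the earlier examples:

```sql
VALIDATE CONNECTION aws_conn;
```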
Do we expect `VALIDATE CONNECTION` of an AWS connection to be comprehensive? An AWS connection can potentially be re-used across multiple services like S3 or RDS, and having permission to an S3 bucket might not mean that RDS access is set up correctly. So validating the connection will not be comprehensive, and we'll probably do some additional checks against the intended resources when we actually make use of the connection.
To properly validate the connection, we should ask the user to actually use the connection, with something like `COPY (SELECT 1) TO S3 USING aws_conn`.
We could also extend the `VALIDATE CONNECTION` SQL to optionally specify an S3 or database URL, like `VALIDATE CONNECTION aws_conn WITH (S3 PATH = 's3://prefix')`, so the user can validate whether the policies are set up correctly on their end in AWS. This might be difficult, though, because the only permissions we should require are `ListBucket` and `PutObject`, and if we test whether we can write something to the prefix without `DeleteObject`, we end up with leaked objects that we can't clean up.
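For reference, a minimal sketch of the kind of S3 permissions policy users would attach to the role, using hypothetical `<bucket>` and `<prefix>` placeholders (the exact resource ARNs would depend on the eventual S3 sink design):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<bucket>"
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        }
    ]
}
```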
Alternatively, we could have a higher-level S3 connection that uses this AWS connection, something like `CREATE CONNECTION s3_conn TO S3 (PREFIX = 'url') USING AWS CONNECTION aws_conn`. Validating `s3_conn` could verify that we are able to list the prefix and create and remove a file there. This seems like overkill, though, and the approach above would probably be better.
We will revisit this when designing the `COPY ... TO S3` feature.