# Introduction

cloudtest is a test framework that allows testing Materialize inside Kubernetes.
Using a Kubernetes environment for testing has the advantage of exercising the
same code paths used in production to orchestrate cloud resources (e.g., clusters
and secrets). Kubernetes will also be responsible for restarting any containers
that have exited.

Notable deviations from production include:

* Using [MinIO] instead of S3 for `persist` blob storage.
* Using a single-node CockroachDB installation instead of Cockroach Cloud.
* No access to AWS resources like VPC endpoints.

The framework is based on [pytest] and [kind] and uses, for the most part, the
official [`kubernetes`] Python library to control the Kubernetes cluster.

# Setup

1. Install [kubectl], the official Kubernetes command-line tool.

   On macOS, use [Homebrew] to install it:

   ```
   brew install kubectl
   ```

   On Linux, use:

   ```
   curl -fL https://dl.k8s.io/release/v1.26.6/bin/linux/amd64/kubectl > kubectl
   chmod +x kubectl
   sudo mv kubectl /usr/local/bin
   ```

   See the [official kubectl installation instructions][kubectl-installation]
   for additional installation options.

2. Install [kind], which manages local Kubernetes clusters.

   On macOS, use:

   ```
   brew install kind
   ```

   On Linux, use:

   ```
   curl -fL https://kind.sigs.k8s.io/dl/v0.29.0/kind-linux-amd64 > kind
   chmod +x kind
   sudo mv kind /usr/local/bin
   ```

   See the [official kind installation instructions][kind-installation] for
   additional installation options.

3. Create and configure a dedicated kind cluster for cloudtest:

   ```
   cd test/cloudtest
   ./setup
   ```

4. On macOS, configure Docker to use "gRPC FUSE" as the file sharing
   implementation for containers (Docker settings, "General" tab). This speeds
   up the execution of cloudtests.

# Running tests

To run all short tests:

```
./pytest
```

To run a single test:

```
./pytest -k test_name_goes_here
```

⚠️ By default, cloudtest builds Materialize in release mode. You can instead
build in debug mode by passing the `--dev` flag:

```
./pytest --dev [-k TEST]
```

⚠️ By default, cloudtest only runs short tests. To include long tests, pass
the `-m=long` flag:

```
./pytest -m=long
```

To check the cluster status:

```
kubectl --context=kind-mzcloud get all
```

Consider also using the [k9s] terminal user interface:

```
k9s --context=kind-mzcloud
```

To remove all resources from the Kubernetes cluster, so that a test can be
rerun without needing to reset the cluster:

```
./reset
```

To remove the Kubernetes cluster entirely:

```
./teardown
```

# Interactive development

cloudtest is also the recommended tool for deploying a local build of
Materialize to Kubernetes, where you can connect to the cluster and
interactively run tests by hand. Use the `test_wait` workflow, which does
nothing but wait for the default cluster to become ready:

```
./pytest --dev -k test_wait
```

# Writing tests

See the examples in `test/cloudtest/test_smoke.py`. The tests follow pytest
conventions:

```python
from materialize.cloudtest.app.materialize_application import MaterializeApplication

def test_something(mz: MaterializeApplication) -> None:
    assert ...
```

The `MaterializeApplication` object is what creates the Kubernetes cluster. It
is instantiated once per `pytest` invocation.

## Waiting for a resource to reach a particular state

```python
from materialize.cloudtest.util.wait import wait

wait(condition="condition=Ready", resource="pod/compute-cluster-u1-replica-u1-0")
```

`wait` uses `kubectl wait` behind the scenes.
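For instance, the call above is roughly equivalent to running the following
command against the cloudtest cluster (the exact flags used internally may
differ):

```
kubectl --context=kind-mzcloud wait --for=condition=Ready pod/compute-cluster-u1-replica-u1-0
```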
Here is what the `kubectl wait` documentation has to say about the possible
conditions:

```shell
# Wait for the pod "busybox1" to contain the status condition of type "Ready"
kubectl wait --for=condition=Ready pod/busybox1

# The default value of status condition is true; you can wait for other targets
# after an equal delimiter (compared after Unicode simple case folding, which is
# a more general form of case-insensitivity):
kubectl wait --for=condition=Ready=false pod/busybox1

# Wait for the pod "busybox1" to contain the status phase to be "Running"
kubectl wait --for=jsonpath='{.status.phase}'=Running pod/busybox1

# Wait for the pod "busybox1" to be deleted, with a timeout of 60s, after having
# issued the "delete" command
kubectl delete pod/busybox1
kubectl wait --for=delete pod/busybox1 --timeout=60s
```

In particular, to wait until a resource has been deleted:

```python
wait(condition="delete", resource="secret/some_secret")
```

## Running testdrive

```python
from textwrap import dedent

mz.testdrive.run(
    input=dedent(
        """
        > SELECT 1;
        1
        """
    )
)
```

Note that each invocation of `testdrive` will drop the current database and
recreate it. If you want to run multiple `testdrive` fragments within the same
test, use `no_reset=True` to prevent the cleanup and `seed=N` to make sure they
all share the same random seed:

```python
mz.testdrive.run(..., no_reset=True, seed=N)
```

## Running one-off SQL statements

If no result set is expected:

```python
mz.environmentd.sql("DROP TABLE t1;")
```

To fetch a result set:

```python
id = mz.environmentd.sql_query("SELECT id FROM mz_secrets WHERE name = 'username'")[0][0]
```

## Interacting with the Kubernetes cluster via kubectl

You can call `kubectl` and collect its output as follows:

```python
secret_description = mz.kubectl("describe", "secret", "some_secret")
```

## Interacting with the Kubernetes cluster via API

The following methods:

```python
mz.environmentd.api()
mz.environmentd.apps_api()
mz.environmentd.rbac_api()
```

return API handles that can then be used with the official [`kubernetes`]
Python module.

[Homebrew]: https://brew.sh
[`kubernetes`]: https://github.com/kubernetes-client/python
[k9s]: https://k9scli.io
[kind-installation]: https://kind.sigs.k8s.io/docs/user/quick-start/#installing-with-a-package-manager
[kind]: https://kind.sigs.k8s.io
[kubectl-installation]: https://kubernetes.io/docs/tasks/tools/
[kubectl]: https://kubernetes.io/docs/reference/kubectl/
[MinIO]: https://min.io
[pytest]: https://pytest.org

# Troubleshooting

## DNS issues

If pods are failing with what seem like DNS issues (they cannot resolve
`redpanda`, or cannot connect to Postgres), have a look at the [relevant
Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/).
In particular, the list of [known issues](https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues)
can be very relevant if your Linux distribution runs `systemd-resolved`.

In at least one case, a VPN (Mullvad) was interfering with DNS resolution. Try
deactivating your VPN, then tearing down and recreating your testing cluster to
see if that helps.

## botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied

If tests are failing almost immediately while trying to upload a file to S3, it
may be a bug in our debuginfo upload logic. You can _**unset**_ all your AWS
credentials to work around this.
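For example, assuming your credentials are supplied through the standard AWS
environment variables (rather than, say, an `~/.aws/credentials` file), you can
clear them in the shell before invoking `./pytest`:

```
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_PROFILE
```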
## Failure joining worker nodes

If `./setup` fails during the "Joining worker nodes" step and spams 404 error
messages, the kubelet has likely died on at least one node. You can
troubleshoot this by adding `--retain` to the `kind create cluster` command in
`setup`, and then running `docker exec -it "$node" bash` to access the node.
From there you can inspect the kubelet logs with `journalctl -xeu kubelet`.

Some common issues are listed at <https://kind.sigs.k8s.io/docs/user/known-issues>.
We launch many nodes, so the inotify limits are the most likely culprit.
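If the inotify limits are indeed the problem, raising them on the host usually
resolves the failure. The values below are the ones suggested by the kind
known-issues page at the time of writing; adjust as needed and add them to
`/etc/sysctl.conf` to make them persistent:

```
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
```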