# `cogex`

`cogex` is a tool for managing Cognite Data Fusion extractors written in Python. It provides
utilities for initializing a new extractor project and for building self-contained executables of
Python-based extractors.


## Important note for users running `pyenv`

`pyenv` is a neat tool for managing Python installations.

Since `cogex` uses PyInstaller to build executables, we need Python to be installed with a shared
instance of `libpython`, which `pyenv` does not do by default. To fix this, make sure to add the
`--enable-shared` flag when installing new Python versions with `pyenv`, like so:

```bash
env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.9.0
```

You can read more about it in the [PyInstaller documentation](https://pyinstaller.readthedocs.io/en/stable/development/venv.html#pyenv-and-pyinstaller).
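
To check whether an existing Python installation was built with a shared `libpython`, a small
script along these lines may help. It is only a sketch: the function name is made up for this
example, and it assumes that the `Py_ENABLE_SHARED` build variable reported by the standard
`sysconfig` module reflects the `--enable-shared` flag, which holds on POSIX systems but not
necessarily on Windows.

```python
import sysconfig


def python_has_shared_libpython() -> bool:
    """Return True if this interpreter reports a shared libpython build."""
    # Py_ENABLE_SHARED is 1 for builds configured with --enable-shared on
    # POSIX systems. It can be 0 or None otherwise (it is typically unset
    # on Windows), so any falsy value is treated as "not shared" here.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    if python_has_shared_libpython():
        print("This Python was built with a shared libpython")
    else:
        print("This Python was not built with --enable-shared")
```

Run it with the interpreter you intend to use for `cogex build`.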


## Overview of features


### Start a new extractor project

To start a new extractor project, move to the desired directory and run

```bash
cogex init
```

You will first be prompted for some information, after which `cogex` initializes the new project.


### Add dependencies

Extractor projects initialized with `cogex` use `poetry` for managing dependencies. Running
`cogex init` automatically installs the Cognite SDK and the extractor-utils framework. If your
extractor needs any other dependencies, add them using `poetry`, like so:

```bash
poetry add requests
```


### Type checking and code style

It is recommended that you run code checkers on your extractor, in particular:

 * `black` is an opinionated code formatter that enforces a consistent code style throughout
   your project. This avoids unnecessary changes and minimizes PR diffs.
 * `isort` is a tool that sorts your imports, also contributing to a consistent code style and
   minimal PR diffs.
 * `mypy` is a static type checker for Python which ensures that you are not making any type errors
   in your code that would go unnoticed before suddenly breaking your extractor in production.

`cogex` will install all of these and run them automatically on every commit. If you need to
commit despite one of them failing, you can run `git commit --no-verify`, although this is not
recommended.
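
As an illustration of the kind of mistake `mypy` is meant to catch, consider the sketch below. The
function names and the ID lookup are hypothetical and not part of `cogex` or the Cognite SDK. The
lookup may return `None`, and `mypy` would report an error if the `None` check were left out,
whereas plain Python would only fail at runtime when an unknown ID is encountered.

```python
from typing import Dict, Optional


def lookup_id(external_id: str, known: Dict[str, int]) -> Optional[int]:
    """Return the internal ID for external_id, or None if it is unknown."""
    return known.get(external_id)


def next_id(external_id: str, known: Dict[str, int]) -> int:
    """Return the internal ID plus one, raising KeyError for unknown IDs."""
    found = lookup_id(external_id, known)
    # Without this check, mypy would flag `found + 1` below, because
    # `found` has type Optional[int] and may be None.
    if found is None:
        raise KeyError(external_id)
    return found + 1
```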


### Build and package an extractor project

#### Packaging a binary of your extractor

It is not always an option to rely on a Python installation on the machine where your extractor
will be deployed. For those scenarios it is useful to package the extractor, including its
dependencies and the Python runtime, into a single self-contained executable. To do this, run

```bash
cogex build
```

This will create a new executable (for the operating system you ran `cogex build` on) in the
`dist` directory.

#### Making docker images

To build a docker image, you first need to add a `[tool.cogex.docker]` section to your
`pyproject.toml` file. The required fields are

 * `tags`: A list of tags to apply to the resulting image. Tags support simple templating:
   `{version}` is replaced with the current version of the extractor, and `{major}` is replaced
   with the current major version.
 * `entrypoint`: Which script to use in the docker image. This is only required if your
   `[tool.poetry.scripts]` includes multiple entries.

In addition, the following optional fields are available:

 * `base-image`: Which base image to use. By default, the `debian-slim` based python image for the
   python version currently running will be chosen.
 * `install-dir`: Where in the image the extractor should be installed.
 * `preamble`: Additional dockerfile statements to run at the beginning of the dockerfile.

Minimal example:

``` toml
[tool.cogex.docker]
tags = ["cognite/my-extractor:{version}"]
```

Larger example (from the DB Extractor):

``` toml
[tool.cogex.docker]
base-image = "python:3.10"
preamble = """
RUN apt-get update \
    && apt-get dist-upgrade -y dirmngr gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client gpg-wks-server \
    && gpgconf gpgsm gpgv libssl-dev libssl1.1 openssl
RUN apt-get install -y apt-utils build-essential
RUN apt-get install -y unixodbc-dev unixodbc
"""
tags = [
    "eu.gcr.io/cognite-registry/db-extractor-base:latest",
    "eu.gcr.io/cognite-registry/db-extractor-base:{version}",
    "cognite/db-extractor-base:{version}",
]
```
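
The tag templating used in the examples above can be pictured with the following sketch. It is not
the actual `cogex` implementation: `expand_tags` is a hypothetical helper, and it assumes that the
major version is simply the part of the version string before the first dot.

```python
from typing import List


def expand_tags(tags: List[str], version: str) -> List[str]:
    """Fill in {version} and {major} placeholders in each tag template."""
    # Assumption: the major version is everything before the first dot.
    major = version.split(".")[0]
    return [
        tag.replace("{version}", version).replace("{major}", major)
        for tag in tags
    ]


templates = ["cognite/my-extractor:{version}", "cognite/my-extractor:{major}"]
print(expand_tags(templates, "1.4.2"))
# prints ['cognite/my-extractor:1.4.2', 'cognite/my-extractor:1']
```

Tags without placeholders, such as `latest`, pass through unchanged.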

You can now build and tag docker images with

``` commandline
cogex build --dockerimage
```

If you just want to see the generated dockerfile, instead run

``` commandline
cogex build --dockerfile
```


### Creating a new version of your extractor

To keep track of which version of the code base is running in a given deployment, it is very useful
to version your extractor. When releasing a new version, run

```bash
poetry version [patch/minor/major]
```

to automatically bump the corresponding version number. Note that this only updates the version
number in `pyproject.toml`. When you run `cogex build`, the new version number is propagated
through the rest of the code base.

Any extractor project should follow semantic versioning, which means you should bump

 * `patch` for any minor bug fixes or improvements
 * `minor` for new features or bigger improvements that __don't__ break compatibility
 * `major` for new features or improvements that break compatibility with previous versions, in
   other words for those scenarios where the new version is not a drop-in replacement for an old
   version. For example:
   - When adding a new required config field
   - When removing a config field
   - When changing defaults in a way that could break existing deployments
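
The effect of each bump rule on a version number can be sketched as follows. This is an
illustration only, not how `poetry` implements it: `bump_version` is a hypothetical helper that
assumes a plain `MAJOR.MINOR.PATCH` version string without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part ("major", "minor" or "patch") bumped."""
    # Assumption: version has exactly three dot-separated integer components.
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # A breaking change resets both minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # A backwards-compatible feature resets patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("1.4.2", "patch"))  # prints 1.4.3
print(bump_version("1.4.2", "minor"))  # prints 1.5.0
print(bump_version("1.4.2", "major"))  # prints 2.0.0
```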
