Metadata-Version: 2.4
Name: labcas.workflow
Version: 0.1.5
Summary: Run workflows for LabCAS
Home-page: https://github.com/NASA-PDS/peppi
Download-URL: https://github.com/NASA-PDS/peppi/releases/
Author: Labcas
Author-email: labcas@jpl.nasa.gov
License: apache-2.0
Keywords: pds,planetary data,api
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy~=1.26.4
Requires-Dist: scikit-image~=0.24.0
Requires-Dist: pandas~=2.2.3
Requires-Dist: matplotlib~=3.9.4
Requires-Dist: boto3==1.35.16
Requires-Dist: dask~=2024.8.0
Requires-Dist: distributed~=2024.8.0
Provides-Extra: ml-worker
Requires-Dist: tensorflow~=2.9.1; extra == "ml-worker"
Provides-Extra: dev
Requires-Dist: black~=23.7.0; extra == "dev"
Requires-Dist: flake8~=6.1.0; extra == "dev"
Requires-Dist: flake8-bugbear~=23.7.10; extra == "dev"
Requires-Dist: flake8-docstrings~=1.7.0; extra == "dev"
Requires-Dist: pep8-naming<0.15.0,>=0.13.3; extra == "dev"
Requires-Dist: mypy~=1.5.1; extra == "dev"
Requires-Dist: pydocstyle~=6.3.0; extra == "dev"
Requires-Dist: coverage~=7.3.0; extra == "dev"
Requires-Dist: pytest~=7.4.0; extra == "dev"
Requires-Dist: pytest-cov~=4.1.0; extra == "dev"
Requires-Dist: pytest-watch~=4.2.0; extra == "dev"
Requires-Dist: pytest-xdist~=3.3.1; extra == "dev"
Requires-Dist: pre-commit~=3.3.3; extra == "dev"
Requires-Dist: sphinx~=7.2.6; extra == "dev"
Requires-Dist: sphinx-rtd-theme~=2.0.0; extra == "dev"
Requires-Dist: tox~=4.11.0; extra == "dev"
Requires-Dist: types-setuptools<74.1.1,>=68.1.0; extra == "dev"
Requires-Dist: Jinja2<3.1; extra == "dev"
Requires-Dist: docutils~=0.20.1; extra == "dev"
Dynamic: download-url

# LabCas Workflow

Run workflows for Labcas

Depending on your role, there are multiple ways of running a LabCAS workflow:

- **Developers:** local run, natively on your OS
- **Integrators:** AWS Managed Workflows for Apache Airflow (MWAA), with a local MWAA instance
- **System Administrators:** deployed and configured on AWS
- **End users:** using the AWS deployment


## Developers

The workflow tasks run independently from Airflow. TODO: integrate with the Airflow Python API.

### Install

With Python 3.11, preferably in a virtual environment:


    pip install -e '.[dev]'

### Start local dask cluster

    docker build -f docker/Dockerfile . -t labcas/workflow
    docker network create dask
    docker run --network dask -p 8787:8787 -p 8786:8786 labcas/workflow scheduler
    docker run --network dask labcas/workflow worker tcp://<scheduler ip>:8786
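Once the scheduler container is up, a client can connect to it from Python and submit work. A minimal sketch, assuming the `distributed` dependency listed above; the `inc` task and the `localhost` address are illustrative, not part of this package:

```python
def scheduler_address(host: str, port: int = 8786) -> str:
    """Build the TCP address that the Dask client and workers connect to."""
    return f"tcp://{host}:{port}"

def inc(x: int) -> int:
    """Toy task used only to check that the cluster accepts work."""
    return x + 1

if __name__ == "__main__":
    # Requires the scheduler container started above to be reachable.
    from dask.distributed import Client  # from the 'distributed' dependency

    client = Client(scheduler_address("localhost"))
    future = client.submit(inc, 41)
    print(future.result())
```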

### Set AWS connection

    ./aws-login.darwin.amd64
    export AWS_PROFILE=saml-pub

### Run/Test the client

    python src/labcas/workflow/manager/main.py

### Deploy package on pypi

Upgrade the version in file "src/labcas/workflow/VERSION.txt"
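The bump can also be scripted. A stdlib-only sketch (not part of the repository; the path matches the file named above and a MAJOR.MINOR.PATCH scheme is assumed):

```python
from pathlib import Path

def bump_patch(version: str) -> str:
    """Increment the patch component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"

def bump_version_file(path: str = "src/labcas/workflow/VERSION.txt") -> str:
    """Read, bump, and rewrite the version file; return the new version."""
    version_file = Path(path)
    new_version = bump_patch(version_file.read_text().strip())
    version_file.write_text(new_version + "\n")
    return new_version
```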

Publish the package on pypi

    pip install build
    pip install twine
    python -m build
    twine upload dist/*
   


## Integrators

### Build the Dask worker image


    docker build -f docker/Dockerfile . -t labcas/workflow

Start the scheduler:

    docker network create dask
    docker run --network dask -p 8787:8787 -p 8786:8786 labcas/workflow scheduler

Start one worker:

    docker run --network dask labcas/workflow worker tcp://<scheduler ip>:8786


Start the client, as described in the following section.


### With dask on ECS

Deploy the image created in the previous section to ECR.

Have an S3 bucket named `labcas-infra` for the Terraform state.

Other prerequisites are:
 - a VPC
 - subnets
 - a security group allowing incoming requests from where the client runs (at JPL, on EC2, or in Airflow) to ports 8786 and 8787
 - a task role allowing writes to CloudWatch
 - a task execution role that can pull images from ECR, with the standard ECS policy "AmazonECSTaskExecutionRolePolicy" attached
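The CloudWatch task role could carry a policy along these lines. This is a sketch only; in practice, scope the `Resource` field to your log group ARNs rather than `"*"`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```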
 

Deploy the ECS cluster with the following terraform command:

    cd terraform
    terraform init
    terraform apply \
        -var consortium="edrn" \
        -var venue="dev" \
        -var aws_fg_image=<uri of the docker image deployed on ECR> \
        -var aws_fg_subnets=<private subnets of the AWS account> \
        -var aws_fg_vpc=<vpc of the AWS account> \
        -var aws_fg_security_groups=<security group> \
        -var ecs_task_role=<arn of a task role> \
        -var ecs_task_execution_role=<arn of task execution role>
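The same variables can instead be kept in a `terraform.tfvars` file next to the configuration. A sketch with placeholder values; the account IDs, ARNs, and resource IDs are invented, and the list-vs-string types are assumptions about the Terraform module's variable declarations:

```hcl
consortium              = "edrn"
venue                   = "dev"
aws_fg_image            = "123456789012.dkr.ecr.us-west-2.amazonaws.com/labcas/workflow:latest"
aws_fg_subnets          = ["subnet-aaaa1111", "subnet-bbbb2222"]
aws_fg_vpc              = "vpc-cccc3333"
aws_fg_security_groups  = ["sg-dddd4444"]
ecs_task_role           = "arn:aws:iam::123456789012:role/labcas-task-role"
ecs_task_execution_role = "arn:aws:iam::123456789012:role/labcas-task-execution-role"
```

With this file in place, `terraform apply` picks the values up automatically, with no `-var` flags needed.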

## Run

Set your local AWS credentials to access the data:


    ./aws-login.darwin.amd64
    export AWS_PROFILE=saml-pub


Start the dask cluster, as described above.


Run the processing


    python ./src/labcas/workflow/manager/main.py



## Apache Airflow

Test locally using https://github.com/aws/aws-mwaa-local-runner. Clone that repository and build the image:

    ./mwaa-local-env build-image

Then from your local labcas_workflow repository:

    cd mwaa

## Update the AWS credentials

    aws-login.darwin.amd64
    cp -r ~/.aws .

## Launch the server
 
    docker compose -f docker-compose-local.yml up

## Stop 

Press `Ctrl-C` in the terminal running the server.

## Stop and re-initialize local volumes

    docker compose -f ./docker/docker-compose-local.yml down -v

    

See the console at http://localhost:8080 and log in with admin/test.

## Test the requirements.txt file
 
    ./mwaa-local-env test-requirements

## Debug the workflow import

    docker container ls

Pick the container ID of the image "amazon/mwaa-local:2_10_3", for example '54706271b7fc'.

Then open a bash interpreter in the docker container:

    docker exec -it 54706271b7fc bash

Then, at the bash prompt:

    cd dags
    python3 -c "import nebraska"
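The same check can be scripted so it reports a readable error instead of a raw traceback. A stdlib-only sketch; `nebraska` is the DAG module name used above, and `check_dag_import` is a hypothetical helper, not part of this package:

```python
import importlib
import traceback

def check_dag_import(module_name: str):
    """Try to import a DAG module; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
        return True, None
    except Exception:
        # Capture the full traceback so DAG import failures are easy to read.
        return False, traceback.format_exc()

if __name__ == "__main__":
    ok, err = check_dag_import("nebraska")
    print("OK" if ok else err)
```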








