Metadata-Version: 2.1
Name: karp-backend
Version: 6.1.3
Summary: Karp backend
Home-page: https://spraakbanken.gu.se
License: MIT
Author: Språkbanken at the University of Gothenburg
Author-email: sb-info@svenska.gu.se
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Utilities
Provides-Extra: mysql
Provides-Extra: sqlite
Requires-Dist: Deprecated (>=1.2.13,<2.0.0)
Requires-Dist: PyMySQL (>=1.0.2,<2.0.0) ; extra == "mysql"
Requires-Dist: TatSu (>=5.8.3,<6.0.0)
Requires-Dist: aiomysql (>=0.1.1,<0.2.0) ; extra == "mysql"
Requires-Dist: aiosqlite (>=0.17.0,<0.18.0) ; extra == "sqlite"
Requires-Dist: alembic (>=1.8.1,<2.0.0)
Requires-Dist: asgi-correlation-id (>=3.0.1,<4.0.0)
Requires-Dist: elasticsearch (>=6,<7)
Requires-Dist: elasticsearch-dsl (>=6,<7)
Requires-Dist: environs (>=9.3.4,<10.0.0)
Requires-Dist: fastapi (>=0.88.0,<0.89.0)
Requires-Dist: injector (>=0.20.1,<0.21.0)
Requires-Dist: json-streams (>=0.12.0,<0.13.0)
Requires-Dist: mysqlclient (>=2.1.1,<3.0.0) ; extra == "mysql"
Requires-Dist: paradigmextract (>=0.1.1,<0.2.0)
Requires-Dist: pydantic (>=1.10.2,<2.0.0)
Requires-Dist: pyjwt[crypto] (>=2.6.0,<3.0.0)
Requires-Dist: python-dotenv (>=0.19.0,<0.20.0)
Requires-Dist: python-json-logger (>=2.0.4,<3.0.0)
Requires-Dist: regex (>=2022.8.17,<2023.0.0)
Requires-Dist: sb-json-tools (>=0.9.1,<0.10.0)
Requires-Dist: sqlalchemy (>=1.4.44,<2.0.0)
Requires-Dist: sqlalchemy-json (>=0.5.0,<0.6.0)
Requires-Dist: sqlalchemy-utils (>=0.38.3,<0.39.0)
Requires-Dist: tabulate (>=0.9.0,<0.10.0)
Requires-Dist: tenacity (>=8.0.1,<9.0.0)
Requires-Dist: tqdm (>=4.64.1,<5.0.0)
Requires-Dist: typer (>=0.7.0,<0.8.0)
Requires-Dist: ulid-py (>=1.1.0,<2.0.0)
Requires-Dist: urllib3 (>=1.26.13,<2.0.0)
Project-URL: Bug Tracker, https://github.com/spraakbanken/karp-backend/issues
Project-URL: Documentation, https://github.com/spraakbanken/karp-backend
Project-URL: Repository, https://github.com/spraakbanken/karp-backend
Description-Content-Type: text/markdown

# Karp backend

[![PyPI version](https://badge.fury.io/py/karp-backend.svg)](https://badge.fury.io/py/karp-backend)
[![Build Status](https://github.com/spraakbanken/karp-backend/workflows/Build/badge.svg)](https://github.com/spraakbanken/karp-backend/actions)
[![CodeScene Code Health](https://codescene.io/projects/24151/status-badges/code-health)](https://codescene.io/projects/24151)
[![codecov](https://codecov.io/gh/spraakbanken/karp-backend/branch/main/graph/badge.svg?token=iwTQnHKOpm)](https://codecov.io/gh/spraakbanken/karp-backend)

This is version 6 of Karp backend. For the legacy version (v5), see [karp-backend-v5](https://github.com/spraakbanken/karp-backend-v5).

## Setup

This project uses [poetry](https://python-poetry.org) and
[MariaDB](https://mariadb.org/).

1. Run `make install`, or `make install-dev` for a development install
2. Install MariaDB and create a database
3. Set up environment variables (they can be placed in a `.env` file in the project root, in which case `poetry run` may set them; this is unverified):
   ```
   export DB_DATABASE=<name of database>
   export DB_USER=<database user>
   export DB_PASSWORD=<user's password>
   export DB_HOST=localhost
   export AUTH_JWT_PUBKEY_PATH=/path/to/pubkey
   ```
4. Activate the virtual environment by running: `poetry shell`
5. Run `karp-cli db up` to initialize the database
6. Run `make serve` or `make serve-w-reload` to start the development server.
   Alternatively, run `poetry shell` and then `uvicorn asgi:app`.

7. To set up Elasticsearch, download Elasticsearch 6.x and start it
8. Install the Elasticsearch Python libraries for the right version
   1. If you use Elasticsearch 6.x, run `source <VENV_NAME>/bin/activate` and `pip install -e .[elasticsearch6]`
9. Add environment variables:

   ```
   export ES_ENABLED=true
   export ELASTICSEARCH_HOST=localhost:9200
   export SEARCH_CONTEXT=es6_search_service
   ```
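
As a quick sanity check of the database variables above, the following sketch reads them and assembles a SQLAlchemy-style database URL. It is only an illustration: the names `REQUIRED_VARS` and `database_url`, the example values, and the `mysql+pymysql` driver prefix are assumptions, and Karp may build its connection string differently.

```python
import os
from urllib.parse import quote_plus

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("DB_DATABASE", "DB_USER", "DB_PASSWORD", "DB_HOST")


def database_url(env=os.environ):
    """Build a SQLAlchemy-style URL from the DB_* variables (hypothetical helper)."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Quote user and password so characters like '@' or '/' do not break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    # The 'mysql+pymysql' prefix is an assumption; Karp may use another driver.
    return f"mysql+pymysql://{user}:{password}@{env['DB_HOST']}/{env['DB_DATABASE']}"


# Example values only; in practice call database_url() to read os.environ.
example_env = {
    "DB_DATABASE": "karp",
    "DB_USER": "karp",
    "DB_PASSWORD": "p@ss",
    "DB_HOST": "localhost",
}
print(database_url(example_env))
```

Calling `database_url()` without arguments reads the real environment and raises a `RuntimeError` naming any variable that is still unset.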

## Create test resources

1. `poetry shell` and then:
2. `karp-cli entry-repo create karp/tests/data/config/places.json`
3. `karp-cli resource create karp/tests/data/config/places.json`
4. `karp-cli entries add places tests/data/places.jsonl`
5. Do the same for `municipalities`
6. `karp-cli resource publish places`
7. `karp-cli resource publish municipalities`
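
The steps above can be collected into a small script. This is a minimal sketch: the helper names `resource_commands` and `setup_commands` are hypothetical, and the paths for `municipalities` are assumed to mirror those shown for `places`. It only prints the commands; nothing is executed.

```python
import shlex


def resource_commands(resource):
    """Return the karp-cli commands that create one test resource, in order."""
    # Paths follow the 'places' example; other resources are assumed to match.
    config = f"karp/tests/data/config/{resource}.json"
    entries = f"tests/data/{resource}.jsonl"
    return [
        ["karp-cli", "entry-repo", "create", config],
        ["karp-cli", "resource", "create", config],
        ["karp-cli", "entries", "add", resource, entries],
    ]


def setup_commands(resources=("places", "municipalities")):
    """Create every resource first, then publish each one, as in the list above."""
    commands = []
    for resource in resources:
        commands.extend(resource_commands(resource))
    for resource in resources:
        commands.append(["karp-cli", "resource", "publish", resource])
    return commands


for argv in setup_commands():
    print(shlex.join(argv))
```

To run the commands instead of printing them, each `argv` could be passed to `subprocess.run(argv, check=True)` from inside the `poetry shell` environment.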

## Pre-processing data before publishing

**TODO: review this**

Pre-processing can reduce downtime, because it may run faster on a machine
other than the one that talks to Elasticsearch.
Run `create` and `import` on both machines, with the same data. Then
pre-process on machine 1 and use the result on machine 2.

1. Create resource and import data as usual.
2. Run `karp-cli preprocess --resource_id places --version 2 --filename places_preprocessed`

   `places_preprocessed` will contain a pickled dataset containing everything that is needed

3. Run `karp-cli publish_preprocessed --resource_id places --version 2 --data places_preprocessed`
4. Alternatively, if the resource was already published, run
   `karp-cli reindex_preprocessed --resource_id places --data places_preprocessed`
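
The idea of moving a pickled dataset between machines can be illustrated with the standard `pickle` module. This is a sketch only: the helpers `save_preprocessed` and `load_preprocessed` and the shape of `dataset` are hypothetical, and the actual file written by `karp-cli preprocess` may be structured differently.

```python
import pickle
import tempfile
from pathlib import Path


def save_preprocessed(dataset, path):
    """Write the dataset to a file (would run on machine 1)."""
    with open(path, "wb") as fp:
        pickle.dump(dataset, fp)


def load_preprocessed(path):
    """Read the dataset back (would run on machine 2)."""
    # Only unpickle files you produced yourself: loading a pickle can run code.
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Hypothetical stand-in for the pre-processed entries.
dataset = [{"id": "1", "entry": {"name": "example"}}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "places_preprocessed"
    save_preprocessed(dataset, path)
    restored = load_preprocessed(path)

print(restored == dataset)
```

Both machines need compatible Python versions for the file to load, which is one reason to run the same installation on each.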

## Technologies

### Python

- Python >= 3.10
- Poetry
- FastAPI
- SQLAlchemy
- Typer
- Elasticsearch
- Elasticsearch DSL

### Databases

- MariaDB
- Elasticsearch

## Development

### Version handling

The version can be bumped with [`bump2version`](https://pypi.org/project/bump2version/).

Usage:

- Increase patch number `a.b.X => a.b.(X+1)`: `make bumpversion` or `bumpversion patch`
- Increase minor number `a.X.c => a.(X+1).0`: `make bumpversion-minor` or `bumpversion minor`
- Increase major number `X.b.c => (X+1).0.0`: `make bumpversion-major` or `bumpversion major`
- Set a custom version `a.b.c => X.Y.Z`: `bumpversion --new-version X.Y.Z`
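
The bump rules listed above can be sketched in a few lines of Python. This mirrors only the arithmetic shown in the list, not how `bump2version` is implemented; the function name `bump` is hypothetical.

```python
def bump(version, part):
    """Return the version string after bumping 'patch', 'minor' or 'major'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "patch":
        patch += 1
    elif part == "minor":
        # A minor bump resets the patch number.
        minor, patch = minor + 1, 0
    elif part == "major":
        # A major bump resets both minor and patch numbers.
        major, minor, patch = major + 1, 0, 0
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{major}.{minor}.{patch}"


print(bump("6.1.3", "patch"))  # 6.1.4
print(bump("6.1.3", "minor"))  # 6.2.0
print(bump("6.1.3", "major"))  # 7.0.0
```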

`bumpversion` is configured in [`.bumpversion.cfg`](.bumpversion.cfg).

The version is changed in the following files:

- [`setup.py`](setup.py)
- [`src/karp/__init__.py`](src/karp/__init__.py)
- [`.bumpversion.cfg`](.bumpversion.cfg)
- [`doc/karp_api_spec.yaml`](doc/karp_api_spec.yaml)

