Metadata-Version: 2.1
Name: cpgdata
Version: 0.1.0
Summary: Cell painting gallery data handling and validation
Author: Ankur Kumar
Author-email: ank@leoank.me
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: click (>=8.1,<9.0)
Requires-Dist: cpgaws (>=2.15,<3.0)
Requires-Dist: cpgparser (>=0.1,<0.2)
Requires-Dist: joblib (>=1.3.2,<2.0.0)
Requires-Dist: polars (>=0.19,<0.20)
Requires-Dist: pyarrow (>=13.0,<14.0)
Requires-Dist: pydantic (>=2.4,<3.0)
Description-Content-Type: text/markdown

# Cell Painting Gallery data handling and validation

## Getting started


### Install the `cpgdata` package

```bash
pip install cpgdata
```

### Sync pre-generated index files

```bash
cpg index sync -o "path to save index files"
```
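
After syncing, it can help to confirm that the output directory actually contains index files before querying it. The following is a minimal sketch, assuming the index files carry the `.parquet` extension (as the glob in the example further down suggests). `find_index_files` is a hypothetical helper written for illustration and is not part of `cpgdata`.

```python
from pathlib import Path
from typing import List


def find_index_files(index_dir: Path) -> List[Path]:
    """Return the Parquet files directly inside index_dir, sorted by name.

    Hypothetical helper for illustration; not provided by cpgdata.
    """
    index_files = sorted(index_dir.glob("*.parquet"))
    if not index_files:
        # Fail early with a clear message instead of scanning an empty list later
        raise FileNotFoundError(f"No .parquet index files found in {index_dir}")
    return index_files
```

Calling `find_index_files(Path("path to save index files"))` then either returns the list of index files or raises immediately if the sync did not produce any.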

### Example: use the index to filter files to download from the Cell Painting Gallery

```python
from pathlib import Path
from pprint import pprint

import polars as pl
from cpgdata.utils import download_files, parallel

# Lazily scan all Parquet index files in the index directory
index_dir = Path("path to dir containing index files")
index_files = list(index_dir.glob("*.parquet"))
df = pl.scan_parquet(index_files)

df = (
    df
    .filter(pl.col("dataset_id").eq("cpg0016-jump"))
    .filter(pl.col("source_id").eq("source_4"))
    # literal=True matches "Cells.csv" as plain text rather than as a regex
    .filter(pl.col("leaf_node").str.contains("Cells.csv", literal=True))
    .select(pl.col("key"))
    .collect()
)

# Print the first 10 results
pprint(df.to_dicts()[0:10])

# Download the filtered files; pass the keys as a plain Python list
download_keys = df["key"].to_list()
parallel(download_keys, download_files)
```

