Metadata-Version: 2.1
Name: data_pilot_checker
Version: 1.6
Summary: A package for automating data quality and integrity checks with optional GPU acceleration using cuDF
Home-page: https://github.com/Sarvesh-GanesanW/datapilot
Author: Sarvesh Ganesan
Author-email: sarveshganesanwork@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: dask[dataframe]
Requires-Dist: matplotlib
Requires-Dist: aiohttp
Requires-Dist: openai
Requires-Dist: asyncio
Requires-Dist: colored

# DataPilotChecker

DataPilotChecker is a Python package that automates data quality and integrity checks for datasets. It checks for missing values, duplicate rows, and outliers, and validates data types and value ranges. The package uses cuDF for GPU acceleration when a compatible GPU is available, and falls back to Dask for parallel processing otherwise.
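The cuDF-or-Dask fallback described above can be sketched as follows. This is only an illustration of the general pattern, not the package's actual implementation: `pick_backend` is a hypothetical helper, and it assumes that a failed `cudf` import is a reasonable signal that GPU acceleration is unavailable.

```python
import importlib


def pick_backend(preferred="cudf", fallback="dask.dataframe"):
    """Return the module name of the dataframe backend to use.

    Hypothetical helper for illustration; not part of data_pilot_checker's API.
    """
    try:
        # Importing cuDF may fail when the library or a usable GPU is missing.
        importlib.import_module(preferred)
        return preferred
    except Exception:
        # Any import-time failure means the CPU path is used instead.
        return fallback


print(pick_backend())
```

On a machine without RAPIDS installed, this prints the fallback name. The caller would then import that module and run the checks with it.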

## Installation

### Basic Installation

You can install the package via pip:

```bash
pip install data_pilot_checker
```

## Installation with GPU Support

To use GPU acceleration with cuDF, you need to set up a compatible environment. Follow these steps:

### Create a conda environment with RAPIDS:

```bash
conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia \
    rapids=24.06 python=3.11 cuda-version=12.2
```

### Activate the conda environment:

```bash
conda activate rapids-24.06
```

### Install data_pilot_checker in the conda environment:

```bash
pip install data_pilot_checker
```

For other CUDA versions or installation methods, see the RAPIDS installation guide: https://docs.rapids.ai/install
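After setting up the environment, one quick way to confirm that cuDF is importable is a short check like the one below. It is a minimal sketch: `cudf_version` is a hypothetical helper, and a successful import does not by itself guarantee that every GPU operation will work.

```python
def cudf_version():
    """Return the installed cuDF version string, or None if cuDF cannot be imported.

    Hypothetical helper for illustration; not part of data_pilot_checker.
    """
    try:
        import cudf
    except Exception:
        # cuDF is not installed, or it failed to initialise in this environment.
        return None
    return cudf.__version__


version = cudf_version()
if version is None:
    print("cuDF not available; expect the Dask fallback to be used.")
else:
    print(f"cuDF {version} is available.")
```

Run it inside the activated `rapids-24.06` environment so that it checks the same interpreter the package was installed into.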
