Metadata-Version: 2.4
Name: eda_toolkit
Version: 0.0.22
Summary: A Python library for EDA, including visualizations, directory management, data preprocessing, reporting, and more.
Author: Leonid Shpaner, Oscar Gil
Author-email: "Leonid Shpaner, Oscar Gil" <lshpaner@ucla.edu>
Project-URL: Documentation, https://lshpaner.github.io/eda_toolkit/
Project-URL: Source Code, https://github.com/lshpaner/eda_toolkit/
Project-URL: Leonid Shpaner's Website, https://www.leonshpaner.com
Project-URL: Oscar Gil's Website, https://www.oscargildata.com
Project-URL: Zenodo Archive, https://zenodo.org/records/13162633
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7.4
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: jinja2==3.1.4
Requires-Dist: matplotlib<=3.9.2,>=3.5.3
Requires-Dist: nbformat<=5.10.4,>=4.2.0
Requires-Dist: numpy<=2.1.2,>=1.21.6
Requires-Dist: pandas<=2.2.3,>=1.3.5
Requires-Dist: plotly<=5.24.1,>=5.18.0
Requires-Dist: scikit-learn<=1.5.2,>=1.0.2
Requires-Dist: scipy<=1.16.3,>=1.5.4
Requires-Dist: seaborn<=0.13.2,>=0.12.2
Requires-Dist: tqdm<=4.67.1,>=4.66.4
Requires-Dist: xlsxwriter==3.2.0
Dynamic: author
Dynamic: license-file
Dynamic: requires-python

[![PyPI](https://img.shields.io/pypi/v/eda_toolkit.svg)](https://pypi.org/project/eda_toolkit/)
[![Downloads](https://pepy.tech/badge/eda_toolkit)](https://pepy.tech/project/eda_toolkit)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/lshpaner/eda_toolkit/blob/main/LICENSE.md)
[![Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.13162633.svg)](https://doi.org/10.5281/zenodo.13162633)

<br>

<img src="https://raw.githubusercontent.com/lshpaner/eda_toolkit/main/assets/eda_toolkit_logo.svg" width="300" style="border: none; outline: none; box-shadow: none;" oncontextmenu="return false;">

<br> 

Welcome to EDA Toolkit, a collection of utility functions designed to streamline exploratory data analysis (EDA). The library offers tools for directory management, data preprocessing, reporting, visualization, and more, helping you handle common data manipulation and analysis tasks efficiently.


## Prerequisites

Before you install `eda_toolkit`, ensure your system meets the following requirements:

- `Python`: Version `3.7.4` or higher.


Additionally, `eda_toolkit` depends on the following packages, which will be automatically installed when you install `eda_toolkit`:

- `jinja2`: version `3.1.4` (Exact version required)
- `matplotlib`: version `3.5.3` or higher, but capped at `3.9.2`
- `nbformat`: version `4.2.0` or higher, but capped at `5.10.4`
- `numpy`: version `1.21.6` or higher, but capped at `2.1.2`
- `pandas`: version `1.3.5` or higher, but capped at `2.2.3`
- `plotly`: version `5.18.0` or higher, but capped at `5.24.1`
- `scikit-learn`: version `1.0.2` or higher, but capped at `1.5.2`
- `scipy`: version `1.5.4` or higher, but capped at `1.16.3`
- `seaborn`: version `0.12.2` or higher, but capped at `0.13.2`
- `tqdm`: version `4.66.4` or higher, but capped at `4.67.1`
- `xlsxwriter`: version `3.2.0` (Exact version required)
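
To check an existing environment against these ranges before installing, a minimal sketch along the following lines may help. It assumes Python 3.8 or newer (for `importlib.metadata`) and plain numeric version strings; the `CONSTRAINTS` table and the helper names `parse`, `in_range`, and `check_environment` are illustrative and not part of `eda_toolkit`. The bounds are the inclusive ones declared in the package's `Requires-Dist` metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Inclusive (minimum, maximum) bounds as declared in the package metadata.
# Exact pins use the same value for both bounds.
CONSTRAINTS = {
    "jinja2": ("3.1.4", "3.1.4"),
    "matplotlib": ("3.5.3", "3.9.2"),
    "nbformat": ("4.2.0", "5.10.4"),
    "numpy": ("1.21.6", "2.1.2"),
    "pandas": ("1.3.5", "2.2.3"),
    "plotly": ("5.18.0", "5.24.1"),
    "scikit-learn": ("1.0.2", "1.5.2"),
    "scipy": ("1.5.4", "1.16.3"),
    "seaborn": ("0.12.2", "0.13.2"),
    "tqdm": ("4.66.4", "4.67.1"),
    "xlsxwriter": ("3.2.0", "3.2.0"),
}


def parse(v):
    """Turn '1.21.6' into (1, 21, 6); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_range(installed, low, high):
    """True if low <= installed <= high under simple numeric comparison."""
    return parse(low) <= parse(installed) <= parse(high)


def check_environment(constraints=CONSTRAINTS):
    """Return {package: 'ok' | 'out of range' | 'missing'}."""
    report = {}
    for name, (low, high) in constraints.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if in_range(installed, low, high) else "out of range"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

The version parsing here is deliberately simplified and ignores pre-release and local version suffixes; `pip` itself remains the authority on whether a requirement is satisfied.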


## 💾 Installation

To install `eda_toolkit`, simply run the following command in your terminal:


```bash
pip install eda_toolkit
```
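
After installation, one way to confirm that the package is visible to the active interpreter is to query its installed metadata. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; `installed_version` is an illustrative helper, not part of `eda_toolkit`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="eda_toolkit"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("eda_toolkit is not installed in this environment")
    else:
        print(f"eda_toolkit {found} is installed")
```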

## 📄 Official Documentation

https://lshpaner.github.io/eda_toolkit_docs 


## 🌐 Authors' Websites

1. [Leonid Shpaner](https://www.leonshpaner.com)
2. [Oscar Gil](https://www.oscargildata.com)


## 🙏 Acknowledgements

We would like to express our deepest gratitude to Dr. Ebrahim Tarshizi of the Shiley-Marcos School of Engineering at the University of San Diego for his mentorship in the M.S. in Applied Data Science Program. His unwavering dedication and guidance played a pivotal role in our academic journey, supporting our successful completion of the program and our pursuit of careers as data scientists.

We thank Robert Lanzafame, PhD, for his feedback, encouragement, and thoughtful discussion following our presentation at JupyterCon, and Panayiotis Petousis, PhD, and Arthur Funnell from the CTSI UCLA Health data science team for their helpful comments, constructive feedback, and continued encouragement throughout the development of this library.

Finally, Leon Shpaner would like to personally acknowledge his mentor, former manager, and friend, Gustavo Prado, who hired him at the Los Angeles Film School. Gustavo believed in him early on, gave him the opportunity to grow, and was patient as he developed professionally. He saw potential before it was fully formed and sparked an early interest in data by demonstrating the importance of tools like VLOOKUP. His guidance and trust had a lasting impact. May he rest in peace.

## ⚖️ License

`eda_toolkit` is distributed under the MIT License. See [LICENSE](https://github.com/lshpaner/eda_toolkit/blob/main/LICENSE.md) for more information.

## 🛟 Support

If you have any questions or issues with `eda_toolkit`, please open an issue on the [GitHub repository](https://github.com/lshpaner/eda_toolkit/).


## 📚 Citing `eda_toolkit`

If you use `eda_toolkit` in your research or projects, please consider citing it.

```bibtex
@software{shpaner_2024_13162633,
  author       = {Shpaner, Leonid and
                  Gil, Oscar},
  title        = {EDA Toolkit},
  month        = aug,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {0.0.22},
  doi          = {10.5281/zenodo.13162633},
  url          = {https://doi.org/10.5281/zenodo.13162633}
}
```


## 🔖 References

1. Hunter, J. D. (2007). *Matplotlib: A 2D Graphics Environment*. *Computing in Science & Engineering*, 9(3), 90-95. [https://doi.org/10.1109/MCSE.2007.55](https://doi.org/10.1109/MCSE.2007.55).

2. Kohavi, R. (1996). *Census Income*. UCI Machine Learning Repository. [https://doi.org/10.24432/C5GP7S](https://doi.org/10.24432/C5GP7S).

3. Pace, R. K., & Barry, R. (1997). *Sparse Spatial Autoregressions*. *Statistics & Probability Letters*, 33(3), 291-297. [https://doi.org/10.1016/S0167-7152(96)00140-X](https://doi.org/10.1016/S0167-7152(96)00140-X).

4. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). *Scikit-learn: Machine Learning in Python*. *Journal of Machine Learning Research*, 12, 2825-2830. [http://jmlr.org/papers/v12/pedregosa11a.html](http://jmlr.org/papers/v12/pedregosa11a.html).

5. Waskom, M. (2021). *Seaborn: Statistical Data Visualization*. *Journal of Open Source Software*, 6(60), 3021. [https://doi.org/10.21105/joss.03021](https://doi.org/10.21105/joss.03021).





