Metadata-Version: 2.1
Name: getdaft
Version: 0.0.18
Summary: A Distributed DataFrame library for large scale complex data processing.
Home-page: https://getdaft.io
License: Apache-2.0
Author: Eventual Inc
Author-email: daft@eventualcomputing.com
Maintainer: Sammy Sidhu
Maintainer-email: sammy@eventualcomputing.com
Requires-Python: >=3.7.1,<4.0.0
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Provides-Extra: aws
Provides-Extra: experimental
Provides-Extra: iceberg
Provides-Extra: serving
Requires-Dist: Pillow (>=9.2.0,<10.0.0); extra == "experimental"
Requires-Dist: PyYAML (>=6.0,<7.0); extra == "serving" or extra == "experimental"
Requires-Dist: boto3 (>=1.23.0,<2.0.0); extra == "aws" or extra == "serving" or extra == "experimental"
Requires-Dist: cloudpickle (>=2.1.0,<3.0.0); extra == "serving" or extra == "experimental"
Requires-Dist: docker (>=5.0.3,<6.0.0); extra == "serving" or extra == "experimental"
Requires-Dist: fastapi (>=0.79.0,<0.80.0); extra == "serving" or extra == "experimental"
Requires-Dist: fsspec
Requires-Dist: icebridge (>=0.0.4,<0.0.5); extra == "iceberg" or extra == "experimental"
Requires-Dist: loguru (>=0.6.0,<0.7.0)
Requires-Dist: numpy (>=1.16.6,<2.0.0)
Requires-Dist: pandas (>=1.3.5,<2.0.0)
Requires-Dist: pickle5 (>=0.0.12,<0.0.13); python_version < "3.8"
Requires-Dist: polars[timezone] (>=0.14.12,<0.15.0)
Requires-Dist: protobuf (>=3.19.0,<3.20.0)
Requires-Dist: pyarrow (>=6,<7)
Requires-Dist: pydot (>=1.4.2,<2.0.0)
Requires-Dist: ray (==1.13.0)
Requires-Dist: s3fs; extra == "aws"
Requires-Dist: tabulate (>=0.8.10,<0.9.0)
Requires-Dist: typing-extensions (>=4.0.0); python_version < "3.8"
Requires-Dist: uvicorn (>=0.18.2,<0.19.0); extra == "serving" or extra == "experimental"
Project-URL: Repository, https://github.com/Eventual-Inc/Daft
Description-Content-Type: text/x-rst

|Banner|

|CI| |PyPI| |Latest Tag|

`Website <https://www.getdaft.io>`_ • `Docs <https://www.getdaft.io>`_ • `Installation`_ • `10-minute tour of Daft <https://getdaft.io/learn/10-min.html>`_ • `Community and Support <https://github.com/Eventual-Inc/Daft/discussions>`_

Daft: the distributed Python dataframe for media data
=====================================================


`Daft <https://www.getdaft.io>`_ is a fast, Pythonic and scalable open-source dataframe library built for Python and machine learning workloads.

  **Daft is currently in its Alpha release phase - please expect bugs and rapid improvements to the project.**
  **We welcome user feedback and feature requests in our** `Discussions forum <https://github.com/Eventual-Inc/Daft/discussions>`_.

**Table of Contents**

* `About Daft`_
* `Getting Started`_
* `License`_

About Daft
----------

The Daft dataframe is a table of data with rows and columns. Columns can contain any Python objects, which allows Daft to support rich media data types such as images, audio, video and more.

1. **Any Data**: Columns can contain any Python objects, which means that the Python libraries you already use for running machine learning or custom data processing will work natively with Daft!
2. **Notebook Computing**: Daft is built for interactive development in a notebook - intelligent caching and query optimizations accelerate your experimentation and data exploration.
3. **Distributed Computing**: Rich media formats such as images can quickly outgrow your local laptop's computational resources - Daft integrates natively with `Ray <https://www.ray.io>`_ for running dataframes on large clusters of machines with thousands of CPUs/GPUs.

Getting Started
---------------

Installation
^^^^^^^^^^^^

Install Daft with ``pip install getdaft``.

Quickstart
^^^^^^^^^^

  Check out our `full quickstart tutorial <https://getdaft.io/learn/quickstart.html>`_!

In this example, we load images from an AWS S3 bucket and run a simple function to generate thumbnails for each image:

.. code:: python

    import io

    from PIL import Image

    from daft import DataFrame, lit

    def get_thumbnail(img: Image.Image) -> Image.Image:
        """Simple function to make an image thumbnail"""
        imgcopy = img.copy()
        imgcopy.thumbnail((48, 48))
        return imgcopy

    # Load a dataframe from files in an S3 bucket
    df = DataFrame.from_files("s3://daft-public-data/laion-sample-images/*")

    # Get the AWS S3 url of each image
    df = df.select(lit("s3://").str.concat(df["name"]).alias("s3_url"))

    # Download images and load as a PIL Image object
    df = df.with_column("image", df["s3_url"].url.download().apply(lambda data: Image.open(io.BytesIO(data))))

    # Generate thumbnails from images
    df = df.with_column("thumbnail", df["image"].apply(get_thumbnail))

    df.show(3)

|Quickstart Image|
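
The two per-row functions in the quickstart can be tried locally before pointing Daft at a bucket. The following is a minimal sketch that assumes only Pillow is installed and does not use Daft at all. It builds an in-memory PNG in place of a downloaded S3 object, decodes it the same way as the ``lambda`` above, and applies ``get_thumbnail``. The ``make_png_bytes`` and ``decode_image`` helpers are hypothetical names introduced for this illustration and are not part of Daft:

.. code:: python

    import io

    from PIL import Image


    def make_png_bytes(width: int, height: int) -> bytes:
        """Hypothetical helper: encode a solid-colour RGB image as PNG bytes.

        Stands in for the bytes that a download of one S3 object would return.
        """
        buf = io.BytesIO()
        Image.new("RGB", (width, height), color=(200, 30, 30)).save(buf, format="PNG")
        return buf.getvalue()


    def decode_image(data: bytes) -> Image.Image:
        """Hypothetical helper: the same decoding step as the quickstart lambda."""
        return Image.open(io.BytesIO(data))


    def get_thumbnail(img: Image.Image) -> Image.Image:
        """Simple function to make an image thumbnail (copied from the quickstart)."""
        imgcopy = img.copy()
        imgcopy.thumbnail((48, 48))
        return imgcopy


    image = decode_image(make_png_bytes(640, 480))
    thumbnail = get_thumbnail(image)

    print(image.size)      # the original image is left untouched
    print(thumbnail.size)  # fits within 48x48 with the aspect ratio preserved

Because ``Image.thumbnail`` resizes in place and preserves the aspect ratio, ``get_thumbnail`` works on a copy: the 640x480 input keeps its size while the returned thumbnail is 48x36.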


More Resources
^^^^^^^^^^^^^^

* `10-minute tour of Daft <https://getdaft.io/learn/10-min.html>`_ - learn more about Daft's full range of capabilities, including data loading from URLs, joins, user-defined functions (UDFs), groupby, aggregations and more.
* `User Guide <https://getdaft.io/learn/user_guides.html>`_ - take a deep dive into each topic within Daft.
* `API Reference <https://getdaft.io/api_docs.html>`_ - API reference for Daft's public classes and functions.

License
-------

Daft is licensed under the Apache 2.0 license - please see the LICENSE file.

.. |Quickstart Image| image:: https://user-images.githubusercontent.com/17691182/200086119-fb73037b-8b4e-414a-9060-a44122f0c290.png
   :alt: Dataframe code to load a folder of images from AWS S3 and create thumbnails
   :height: 256

.. |Banner| image:: https://user-images.githubusercontent.com/17691182/190476440-28f29e87-8e3b-41c4-9c28-e112e595f558.png
   :target: https://www.getdaft.io
   :alt: Daft dataframes can load any data such as PDF documents, images, protobufs, csv, parquet and audio files into a table dataframe structure for easy querying

.. |CI| image:: https://github.com/Eventual-Inc/Daft/actions/workflows/python-package.yml/badge.svg
   :target: https://github.com/Eventual-Inc/Daft/actions/workflows/python-package.yml?query=branch:main
   :alt: GitHub Actions tests

.. |PyPI| image:: https://img.shields.io/pypi/v/getdaft.svg?label=pip&logo=PyPI&logoColor=white
   :target: https://pypi.org/project/getdaft
   :alt: PyPI

.. |Latest Tag| image:: https://img.shields.io/github/v/tag/Eventual-Inc/Daft?label=latest&logo=GitHub
   :target: https://github.com/Eventual-Inc/Daft/tags
   :alt: latest tag

