Metadata-Version: 2.3
Name: py-pinyin-split
Version: 1.0.0
Project-URL: Documentation, https://github.com/lstrobel/pinyin-split#readme
Project-URL: Issues, https://github.com/lstrobel/pinyin-split/issues
Project-URL: Source, https://github.com/lstrobel/pinyin-split
Author-email: lstrobel <mail@lstrobel.com>, Thomas Lee <thomaslee@throput.com>
Maintainer-email: lstrobel <mail@lstrobel.com>
License: MIT
Keywords: chinese,pinyin
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Requires-Dist: pygtrie>=2.5.0
Description-Content-Type: text/markdown

# pinyin-split

A Python library for splitting Hanyu Pinyin phrases into all possible valid syllable combinations. The library supports standard syllables defined in the [Pinyin Table](https://en.wikipedia.org/wiki/Pinyin_table), handles tone marks, and optionally includes non-standard syllables.

Originally based on [pinyinsplit](https://github.com/throput/pinyinsplit) by [@tomlee](https://github.com/tomlee).

## Installation

```bash
pip install py-pinyin-split
```

## Usage

```python
from pinyin_split import split

# Basic splitting: every valid split is returned, including unlikely ones.
# Filter by number of syllables if you want to drop results like the second one.
split("nihao")
[['ni', 'hao'], ['ni', 'ha', 'o']]

# Tone marks are fully supported
split("nǐhǎo")
[['nǐ', 'hǎo'], ['nǐ', 'hǎ', 'o']]

split("Běijīng")
[['Běi', 'jīng']]

# Case preservation
split("BeijingDaxue")
[['Bei', 'jing', 'Da', 'xue'], ['Bei', 'jing', 'Da', 'xu', 'e']]

# Multiple valid splits
split("xian")  # Could be 先 or 西安
[['xian'], ['xi', 'an']]

# Punctuation and numbers are treated as syllable boundaries and dropped from the output
split("xi'an")
[['xi', 'an']]

split("bei3jing1")
[['bei', 'jing']]

# Complex phrases
split("Jiéguǒtāmenyíngle")
[
    ['Jié', 'guǒ', 'tā', 'men', 'yíng', 'le'],
    ['Jié', 'gu', 'ǒ', 'tā', 'men', 'yíng', 'le'],
    ['Ji', 'é', 'guǒ', 'tā', 'men', 'yíng', 'le'],
    ['Ji', 'é', 'gu', 'ǒ', 'tā', 'men', 'yíng', 'le']
]

# Non-standard syllables (disabled by default)
split("duang")
[['du', 'ang']]

# Enable non-standard syllables
split("duang", include_nonstandard=True)
[['duang'], ['du', 'ang']]

# Invalid input returns an empty list
split("xyz")
[]
```
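
The usage comments above suggest filtering by number of syllables to discard unlikely splits. A minimal sketch of one way to do that follows. The helper name `shortest_splits` is hypothetical and not part of this library, and the sample data is copied from the documented output of `split("nihao")` so the snippet runs without the package installed.

```python
def shortest_splits(candidates):
    """Keep only the candidate splits with the fewest syllables.

    Hypothetical helper, not part of pinyin-split. `candidates` is a
    list of splits in the shape returned by `split()`.
    """
    if not candidates:
        return []
    fewest = min(len(candidate) for candidate in candidates)
    return [candidate for candidate in candidates if len(candidate) == fewest]


# Sample data copied from the documented output of split("nihao")
candidates = [['ni', 'hao'], ['ni', 'ha', 'o']]
print(shortest_splits(candidates))  # [['ni', 'hao']]
```

This heuristic is not always what you want: applied to the output of `split("xian")` it keeps `['xian']` and discards `['xi', 'an']`, so ambiguous input may still need an apostrophe or other context to resolve.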
