Metadata-Version: 2.1
Name: PySClump
Version: 0.0.1
Summary: SClump implemented in Python.
Home-page: https://github.com/ameya98/PySClump
Author: Ameya Daigavane
Author-email: ameya.d.98@gmail.com
License: UNKNOWN
Description: # PySClump [![Build Status](https://travis-ci.com/ameya98/PySClump.svg?token=EvToDgVLa7n6xFgyBhTU&branch=master)](https://travis-ci.com/ameya98/PySClump)
        A Python implementation of 'Spectral Clustering in Heterogeneous Information Networks' from AAAI, 2019.  
        This was heavily inspired by the original [implementation](https://github.com/lixiang3776/SClump) in MATLAB.
        
        <figure>
            <p align="center">
                <img src="visualization.png">
                <figcaption>A similarity matrix represented as a graph. Nodes are coloured according to their assigned cluster.</figcaption>
            </p>
        </figure>
        
        ## References
        Li, Xiang; Kao, Ben; Ren, Zhaochun; Yin, Dawei. 'Spectral Clustering in Heterogeneous Information Networks'. Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 4221-4228.
        
        ## Installation
        PySClump is available on PyPI! Install with:
        ```
        pip install pysclump
        ```
        
        ## PathSim
        We provide PathSim as a similarity metric between pairs of nodes. However, PySClump works with any similarity metric! See the SClump section below.
        
        ```
        from pathsim import PathSim
        import numpy as np
        
        type_lists = {
            'A': ['Mike', 'Jim', 'Mary', 'Bob', 'Ann'],
            'C': ['SIGMOD', 'VLDB', 'ICDE', 'KDD'],
            'V': ['Pasadena', 'Guwahati', 'Bangalore']
        }
        
        incidence_matrices = {
            'AC': np.array([[2, 1, 0, 0], [50, 20, 0, 0], [2, 0, 1, 0], [2, 1, 0, 0], [0, 0, 1, 1]]),
            'VC': np.array([[3, 1, 1, 1], [1, 0, 0, 0], [2, 1, 0, 1]])
        }
        
        # Create PathSim instance.
        ps = PathSim(type_lists, incidence_matrices)
        
        # Get the similarity between two authors (indicated by type 'A').
        ps.pathsim('Mike', 'Jim', metapath='ACA')
        
        # Get the similarity matrix M for the metapath.
        ps.compute_similarity_matrix(metapath='ACVCA')
        ```
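        
        For intuition, PathSim along a symmetric metapath is defined as s(x, y) = 2 M[x, y] / (M[x, x] + M[y, y]), where M is the commuting matrix that counts path instances between nodes. The sketch below recomputes the 'ACA' scores with plain NumPy, independently of PySClump. The helper name `pathsim_matrix` is hypothetical (it is not part of the library), and the row order of the matrix is assumed to follow `type_lists['A']`.
        
        ```python
        import numpy as np
        
        # Author-conference incidence matrix from the example above.
        # Assumed row order: Mike, Jim, Mary, Bob, Ann.
        AC = np.array([[2, 1, 0, 0], [50, 20, 0, 0], [2, 0, 1, 0], [2, 1, 0, 0], [0, 0, 1, 1]])
        
        def pathsim_matrix(commuting):
            """Hypothetical helper: PathSim scores from a symmetric commuting matrix.
        
            Assumes every node has at least one path to itself (non-zero diagonal).
            """
            diag = np.diag(commuting)
            return 2 * commuting / (diag[:, None] + diag[None, :])
        
        # Commuting matrix for 'ACA': entry (i, j) counts A-C-A path instances
        # between authors i and j.
        M = AC @ AC.T
        S = pathsim_matrix(M)
        
        print(round(S[0, 1], 4))  # Mike vs. Jim: 2 * 120 / (5 + 2900), about 0.0826
        print(S[0, 3])            # Mike vs. Bob: identical publication records, so 1.0
        ```
        
        Jim publishes in the same venues as Mike but at a much larger volume, so the normalisation by the diagonal keeps their score low, while Mike and Bob, whose records match exactly, score 1.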
        
        ## SClump
        Once the similarity matrices are available (computed with PathSim here), running SClump takes only a few lines.
        ```
        # Construct similarity matrices, reusing the PathSim instance `ps` created above.
        similarity_matrices = {
            'ACA': ps.compute_similarity_matrix(metapath='ACA'),
            'ACVCA': ps.compute_similarity_matrix(metapath='ACVCA'),
        }
        
        # Create SClump instance (the SClump class must be imported from the package first).
        sclump = SClump(similarity_matrices, num_clusters=2)
        
        # Run the algorithm!
        labels, learned_similarity_matrix = sclump.run()
        ```
        
        If we have n nodes to be clustered into k clusters, *labels* is an n-by-1 vector with entries from 0 to (k - 1) giving the cluster index assigned to each node. *learned_similarity_matrix* is the n-by-n matrix S referenced in the paper, which holds the learned node-to-node similarities.
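        
        To read the result, the labels can be paired with the node names. The snippet below is a minimal sketch that assumes the i-th label belongs to the i-th entry of `type_lists['A']`; the label values are made up for illustration and are not actual output of `sclump.run()`.
        
        ```python
        from collections import defaultdict
        
        import numpy as np
        
        authors = ['Mike', 'Jim', 'Mary', 'Bob', 'Ann']
        
        # Hypothetical n-by-1 label vector for k = 2; real values come from sclump.run().
        labels = np.array([[0], [0], [1], [0], [1]])
        
        # Flatten the n-by-1 vector and collect the names under each cluster index.
        clusters = defaultdict(list)
        for name, label in zip(authors, np.ravel(labels)):
            clusters[int(label)].append(name)
        
        print(dict(clusters))
        ```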
        
        The clusters themselves are assigned by k-means++ clustering using the learned similarity matrix.
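        
        As a rough illustration of that last step, the sketch below runs k-means with k-means++ seeding on the rows of a toy similarity matrix. It is not PySClump's code: the helper `kmeans_pp` is hypothetical, the toy matrix is idealised, and clustering the rows of S directly (rather than, say, a spectral embedding of it) is an assumption made only for this example.
        
        ```python
        import numpy as np
        
        def kmeans_pp(X, k, n_iter=100, seed=0):
            """Hypothetical helper: Lloyd's k-means with k-means++ seeding on the rows of X.
        
            Assumes X has at least k distinct rows.
            """
            rng = np.random.default_rng(seed)
            n = X.shape[0]
        
            # k-means++ seeding: the first centre is uniform at random; each later
            # centre is drawn with probability proportional to the squared distance
            # to the nearest centre chosen so far.
            centres = [X[rng.integers(n)]]
            for _ in range(k - 1):
                d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
                centres.append(X[rng.choice(n, p=d2 / d2.sum())])
            centres = np.array(centres, dtype=float)
        
            labels = np.zeros(n, dtype=int)
            for _ in range(n_iter):
                # Assignment step: nearest centre by squared Euclidean distance.
                dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
                labels = dists.argmin(axis=1)
                # Update step: each centre moves to the mean of its points
                # (an empty cluster keeps its previous centre).
                new_centres = np.array([
                    X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                    for j in range(k)
                ])
                if np.allclose(new_centres, centres):
                    break
                centres = new_centres
            return labels
        
        # Idealised similarity matrix: nodes 0-1 and nodes 2-3 form two separate groups.
        S = np.array([
            [1.0, 1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 1.0, 1.0],
        ])
        
        labels = kmeans_pp(S, k=2)
        print(labels)  # nodes 0 and 1 share one label, nodes 2 and 3 the other
        ```
        
        Which group receives index 0 depends on the random seed, so only the partition is meaningful, not the label values themselves.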
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
