Co-Clustering Can Provide Industrial Data Pattern Discovery

Dan Yarmoulk
June 20, 2018 IoT & Automation

Data acquisition technology has developed rapidly, and the volume of collected data has exploded. Yet techniques for organizing, classifying, manipulating, and analyzing very large, diverse, heterogeneous datasets have evolved only modestly. This gap hinders the effective use and understanding of large-scale data for knowledge discovery. In an industrial setting, a visual from McKinsey illustrates the point: despite collecting data from tens of thousands of sensors, less than 1% is actually utilized.

 

Data clustering is the classification of data objects into groups (clusters) such that objects in one group are similar to one another and dissimilar from objects in other groups. Typically, homogeneous data objects, i.e. objects of the same data type, are grouped using well-known clustering algorithms. However, many real-world clustering problems arising in data mining applications are pair-wise heterogeneous: they involve two data types that need to be clustered together.

For example, in a customer relationship management (CRM) application, it is desirable to co-cluster customers and the items they purchase in order to find the items of interest to a particular category of customers. Customized product promotion campaigns can then be targeted at the appropriate prospective customers. Collaborative information filtering applications such as movie recommender systems co-cluster the accumulated ratings provided by viewers and the movies they have watched. When a new viewer submits a rating for a movie they liked, the system assigns that rating to a viewer-ratings/movies-watched co-cluster and recommends other movies from it. In some biomedical applications, co-clustering is performed on the patient symptoms and medical diagnoses stored in a database, and computer-aided diagnosis is then performed for a new patient based on the symptoms provided.

In each of these examples the two data types go hand in hand: clustering one data type induces a clustering of the other, and vice versa. Hence, applying conventional clustering algorithms separately to each data type cannot produce meaningful co-clustering results.

Typically, the data is stored in a contingency or co-occurrence matrix C whose rows represent one data type and whose columns represent the other. An entry Cij of the matrix signifies the relation between the object in row i and the object in column j. Co-clustering is the problem of deriving sub-matrices from the larger data matrix by simultaneously clustering its rows and columns. Names such as bi-clustering, bi-dimensional clustering, and block clustering, among others, are often used in the literature to refer to the same problem formulation.
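To make the matrix formulation concrete, here is a minimal Python sketch that builds a co-occurrence matrix from paired records and pulls out the sub-matrix for one co-cluster. The purchase log, the customer and item names, and the helper functions `build_cooccurrence` and `submatrix` are all hypothetical, chosen only to mirror the CRM example; the row and column groupings are supplied by hand here rather than discovered by an algorithm.

```python
def build_cooccurrence(pairs):
    """Return (row_names, col_names, C), where C[i][j] counts how often
    row object i was observed together with column object j."""
    row_names = sorted({r for r, _ in pairs})
    col_names = sorted({c for _, c in pairs})
    row_index = {name: i for i, name in enumerate(row_names)}
    col_index = {name: j for j, name in enumerate(col_names)}
    C = [[0] * len(col_names) for _ in row_names]
    for r, c in pairs:
        C[row_index[r]][col_index[c]] += 1
    return row_names, col_names, C


def submatrix(C, rows, cols):
    """Extract the block of C that belongs to one co-cluster,
    given lists of row indices and column indices."""
    return [[C[i][j] for j in cols] for i in rows]


# Hypothetical purchase log: (customer, item) pairs.
purchases = [
    ("ann", "drill"), ("ann", "saw"), ("ann", "drill"),
    ("bob", "saw"), ("bob", "drill"),
    ("cat", "novel"), ("cat", "atlas"),
    ("dan", "atlas"), ("dan", "novel"), ("dan", "saw"),
]

customers, items, C = build_cooccurrence(purchases)

# Assumed grouping: customers {ann, bob} with items {drill, saw}.
tools_block = submatrix(C, rows=[0, 1], cols=[1, 3])

for name, row in zip(customers, C):
    print(name, row)
print("tools co-cluster:", tools_block)
```

The dense block of non-zero counts in `tools_block` is what a co-clustering algorithm would try to find automatically by reordering and grouping rows and columns at the same time.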

One technique for achieving co-clustering is to approach the problem from a graph-theoretic point of view. That is, we model the relationship between the two data types using a weighted bipartite graph. The two data types correspond to the two kinds of vertices in the graph, and each non-zero matrix entry becomes a weighted edge between a vertex of one kind and a vertex of the other. Data co-clustering is achieved by partitioning the bipartite graph.

 

The square and circular vertices (m and r, respectively) denote the two data types in the co-clustering problem that are represented by the bipartite graph. Partitioning this bipartite graph leads to co-clustering of the two data types.
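As a rough illustration of the partitioning idea, the following Python sketch treats the matrix entries as edge weights between row vertices and column vertices, and alternately moves each vertex to the part that holds most of its edge weight. This is a simple greedy heuristic chosen for brevity, not the method of any particular paper or library. The weight matrix, the starting column labels, and the function names `cut_weight` and `co_cluster` are assumptions made for the example.

```python
def cut_weight(C, row_labels, col_labels):
    """Total weight of edges whose two endpoints lie in different parts."""
    return sum(
        C[i][j]
        for i in range(len(C))
        for j in range(len(C[0]))
        if row_labels[i] != col_labels[j]
    )


def co_cluster(C, col_labels, k, max_iter=20):
    """Greedy alternating heuristic: starting from initial column labels,
    repeatedly assign every row vertex, then every column vertex, to the
    part that carries most of its edge weight, until nothing changes."""
    n_rows, n_cols = len(C), len(C[0])
    col_labels = list(col_labels)
    row_labels = [0] * n_rows
    for _ in range(max_iter):
        new_rows = []
        for i in range(n_rows):
            w = [0.0] * k  # weight from row i into each part
            for j in range(n_cols):
                w[col_labels[j]] += C[i][j]
            new_rows.append(max(range(k), key=lambda p: w[p]))
        new_cols = []
        for j in range(n_cols):
            w = [0.0] * k  # weight from column j into each part
            for i in range(n_rows):
                w[new_rows[i]] += C[i][j]
            new_cols.append(max(range(k), key=lambda p: w[p]))
        if new_rows == row_labels and new_cols == col_labels:
            break
        row_labels, col_labels = new_rows, new_cols
    return row_labels, col_labels


# Hypothetical edge weights: 4 row vertices (m) by 5 column vertices (r).
weights = [
    [5, 4, 0, 0, 1],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 4],
    [0, 0, 4, 5, 5],
]

# Assumed, deliberately imperfect starting guess for the column vertices.
row_labels, col_labels = co_cluster(weights, [0, 0, 0, 1, 1], k=2)
print("row parts:   ", row_labels)
print("column parts:", col_labels)
print("cut weight:  ", cut_weight(weights, row_labels, col_labels))
```

One caveat on the design: minimizing the raw cut weight alone is trivially satisfied by putting every vertex in a single part, so this sketch depends on a reasonable starting guess. Practical methods typically optimize a size-balanced objective such as a normalized cut instead.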

I would welcome any conversation on application development to provide stronger insights for a variety of industries. We can move rapidly into Industry 4.0 by combining subject matter expertise, data collection methods and next-generation data science tools, beyond many of the "me too" products.
