Research and high-fidelity prototype

What is clustering and how to make a tool out of it

In a team with: Tudor Vacaretu, Jan de Wit

Goal

Create a prototype of a general purpose tool that will allow users to perform clustering analysis with meaningful visualisation for preliminary exploration of event sequences.

Result

High-fidelity interactive prototype
(Illustrator + Axure + D3.js)

Research

What is clustering?

Clustering groups data objects into collections (clusters) whose members are "similar" to one another and can therefore be treated collectively as one group, while being sufficiently different from the objects in other groups.

For this particular study we were interested in the clustering of sequential datasets. As an example of such a dataset we used logs of users who were playing the LIX game.

Used skills
Transitions between skills
Time used
Buckets of time between events (1s, 2s, 5s, 10s, 20s, 30s, 60s)
Number of lix interacted with
Average skills per lix
Sequential clustering of the player actions
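
As an illustration of how features like the ones listed above could be derived from a raw log, here is a minimal sketch. The log format (a list of timestamp and skill pairs), the skill names, and the reading of the buckets as inclusive upper bounds are all assumptions made for the example, not the format of the actual LIX logs.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the time buckets in seconds, as listed above.
# Treating them as inclusive upper bounds is an assumption.
BUCKET_BOUNDS = [1, 2, 5, 10, 20, 30, 60]


def bucket_label(gap_seconds):
    """Return the label of the first bucket whose upper bound covers the gap."""
    i = bisect_left(BUCKET_BOUNDS, gap_seconds)
    if i == len(BUCKET_BOUNDS):
        return ">60s"
    return f"<={BUCKET_BOUNDS[i]}s"


def session_features(events):
    """events: list of (timestamp_seconds, skill_name), sorted by time."""
    times = [t for t, _ in events]
    skills = [skill for _, skill in events]
    return {
        # How often each skill was used
        "used_skills": Counter(skills),
        # How often one skill was followed by another
        "transitions": Counter(zip(skills, skills[1:])),
        # Time between the first and the last event
        "time_used": times[-1] - times[0] if times else 0,
        # Histogram of the gaps between consecutive events
        "gap_buckets": Counter(
            bucket_label(b - a) for a, b in zip(times, times[1:])
        ),
    }


# Hypothetical session: the skill names are placeholders.
session = [(0, "digger"), (1, "digger"), (4, "builder"), (70, "blocker")]
features = session_features(session)
print(features["time_used"], dict(features["gap_buckets"]))
```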

Two different winning strategies.

We first explored the existing dataset and then applied k-means and sequential clustering.
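
For readers unfamiliar with k-means, the sketch below shows the standard Lloyd's iteration in plain Python. It is not the code used in the study; the sample points (number of lix interacted with, time used in seconds) are made up for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = centroid(members)
    return centroids, labels


# Hypothetical feature vectors: (lix interacted with, time used in seconds)
points = [(1, 10), (2, 12), (1, 11), (9, 60), (10, 58), (11, 62)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice the features would usually be scaled first, since k-means is sensitive to dimensions with larger numeric ranges.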

How do we make clustering easier?

Solution:

Create a Wizard of Oz Clustering Tool

The tool would recommend clustering algorithms to users depending on the type of uploaded data. It would guide them through several steps that lead to the clustering results:

Choose dimensions to cluster
Prepare data
Check if the data is clusterable
Choose clustering algorithm
Compute cluster analysis
Validate results

Paper Prototype Evaluation Results:

Flow and purpose were clear
Extra options and attributes were suggested
Minor usability issues

Upload the dataset

The system would automatically recognise the types of data and suggest the dependencies, taking into consideration the sequential data structure.
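
One simple way such type recognition could work is to try progressively looser parsers on each column. The sketch below is only an assumption about how this step might be implemented; the type names and the function `infer_type` are hypothetical, and the prototype itself simulated this behaviour.

```python
from datetime import datetime


def infer_type(values):
    """Guess the type of a column given its raw string values."""

    def all_parse(parser):
        try:
            for value in values:
                parser(value)
            return True
        except ValueError:
            return False

    # Try the strictest interpretation first.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "number"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "categorical"


# Hypothetical columns of an uploaded event log
columns = {
    "time": ["2016-03-01T10:00:00", "2016-03-01T10:00:04"],
    "skill": ["digger", "builder"],
    "lix_count": ["3", "7"],
}
print({name: infer_type(values) for name, values in columns.items()})
```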

Choose which data to cluster

Depending on what you are looking for, it is possible to cluster data by any number of dimensions. It is also possible, for example, to ignore repetitive actions or to filter data by its output result.
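
The two preparation options mentioned above could look roughly like this. It is a sketch under assumptions: "repetitive actions" is read as consecutive repeats of the same action, and each session is assumed to be a dictionary with hypothetical `actions` and `outcome` keys.

```python
from itertools import groupby


def collapse_repeats(actions):
    """Keep one entry for each run of identical consecutive actions."""
    return [action for action, _ in groupby(actions)]


def filter_by_outcome(sessions, outcome):
    """Keep only the sessions that ended with the given output result."""
    return [s for s in sessions if s["outcome"] == outcome]


# Hypothetical sessions
sessions = [
    {"actions": ["dig", "dig", "dig", "build", "dig"], "outcome": "won"},
    {"actions": ["build", "build", "block"], "outcome": "lost"},
]

won = filter_by_outcome(sessions, "won")
print([collapse_repeats(s["actions"]) for s in won])
```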

Choose clustering algorithm

By running special functions in the background it would be possible to recommend which clustering algorithms to use and, for example, how many clusters to create. This is especially useful for users who do not have broad knowledge of clustering but would like to use it.
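
One common way to recommend a number of clusters is to compare candidate clusterings by their mean silhouette coefficient and suggest the best-scoring one. The sketch below assumes this approach; the text does not say which background functions the tool would actually use. The function names and sample data are hypothetical.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def recommend_k(points, candidates):
    """candidates maps a cluster count k to the labels produced for that k."""
    return max(candidates, key=lambda k: silhouette(points, candidates[k]))


# Hypothetical data with two obvious groups and two candidate labellings
points = [(1, 10), (2, 12), (1, 11), (9, 60), (10, 58), (11, 62)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}
print(recommend_k(points, candidates))
```

The labellings can come from any clustering algorithm, so the same score could also be used to compare algorithms against each other.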

Cluster Details

While the user explores the visualised results of the clustering, the system dynamically displays data related to the selected cluster, such as a list of the data points belonging to it, the probability that each point is assigned to the correct cluster, and the reasons the cluster was formed.

Grouping setting

Cluster results could be grouped not only by cluster but also by output. For example, in the case of shopping logs it would be possible to group the clustering results by 'made purchase' / 'didn't purchase', or in the case of medical treatment by 'recovered' / 'didn't recover'.
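
Grouping by output amounts to a cross-tabulation of cluster labels against outcomes. A minimal sketch, with made-up labels and outcomes and a hypothetical function name:

```python
from collections import Counter, defaultdict


def group_by_outcome(labels, outcomes):
    """Count, for each outcome, how many items fall into each cluster."""
    table = defaultdict(Counter)
    for label, outcome in zip(labels, outcomes):
        table[outcome][label] += 1
    return table


# Hypothetical shopping sessions: cluster label and outcome per session
labels = [0, 0, 1, 1, 1, 2]
outcomes = [
    "made purchase", "didn't purchase", "made purchase",
    "made purchase", "didn't purchase", "didn't purchase",
]

table = group_by_outcome(labels, outcomes)
for outcome, counts in table.items():
    print(outcome, dict(counts))
```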

Layers of Clusters

It is possible to save clustering groups and map them on top of different results. This way users would see how a saved group is represented in another result, for example what percentage of the saved group belongs to each chosen cluster.
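
The percentage mentioned above can be computed by looking up each member of the saved group in the new result. A minimal sketch, assuming every data point has a stable identifier; the identifiers and the function name are hypothetical.

```python
from collections import Counter


def saved_group_distribution(saved_ids, cluster_of):
    """Percentage of the saved group that falls into each cluster.

    saved_ids: identifiers of the data points in the saved group
    cluster_of: mapping from identifier to cluster label in the new result
    """
    counts = Counter(cluster_of[i] for i in saved_ids if i in cluster_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cluster: 100 * n / total for cluster, n in counts.items()}


# Hypothetical saved group mapped onto a different clustering result
saved_ids = ["s1", "s2", "s3", "s4"]
cluster_of = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
print(saved_group_distribution(saved_ids, cluster_of))
```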

Walkthrough of the Clustering Tool Prototype
