Research and high-fidelity prototype

What is clustering and how to make a tool out of it

In a team with: Tudor Vacaretu, Jan de Wit
Goal
Create a prototype of a general purpose tool that will allow users to perform clustering analysis with meaningful visualisation for preliminary exploration of event sequences.

Result
High-fidelity interactive prototype
(Illustrator + Axure + D3.js)


Research
What is clustering?
Clustering groups data objects into clusters. A cluster is a collection of data objects that are "similar" to one another and can therefore be treated collectively as one group, while being sufficiently different from the objects in other groups.

For this particular study we were interested in the clustering of sequential datasets. As an example of such a dataset we used logs of users who were playing the LIX game.
What did we cluster to learn more about it?
  • Used skills
  • Transitions between skills
  • Time used
  • Buckets of time between events
    (1 s, 2 s, 5 s, 10 s, 20 s, 30 s, 60 s)
  • Number of lix interacted with
  • Average skills per lix
  • Sequential clustering of the player actions
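As an illustration of the features listed above, here is a minimal sketch of how skill counts, skill transitions, and time-gap buckets could be derived from one player's log. The log format (a time-sorted list of `(timestamp_seconds, skill)` pairs), the skill names, and the reading of the bucket values as upper bounds are assumptions made for this example, not the format of the actual dataset.

```python
from bisect import bisect_left
from collections import Counter

# Assumed upper bounds (in seconds) of the gap buckets.
BUCKET_BOUNDS = [1, 2, 5, 10, 20, 30, 60]

def bucket_label(gap_seconds):
    """Return the label of the smallest bucket that contains the gap."""
    i = bisect_left(BUCKET_BOUNDS, gap_seconds)
    if i == len(BUCKET_BOUNDS):
        return ">60s"
    return f"<={BUCKET_BOUNDS[i]}s"

def extract_features(events):
    """events: list of (timestamp_seconds, skill_name), sorted by time."""
    skills = [skill for _, skill in events]
    times = [t for t, _ in events]
    return {
        "skills": Counter(skills),
        # Consecutive pairs of skills, e.g. ("digger", "blocker").
        "transitions": Counter(zip(skills, skills[1:])),
        # Time between consecutive events, counted per bucket.
        "gap_buckets": Counter(
            bucket_label(b - a) for a, b in zip(times, times[1:])
        ),
        "time_used": times[-1] - times[0] if times else 0,
    }

# Hypothetical log for one player.
log = [(0, "digger"), (1, "digger"), (4, "blocker"), (16, "builder"), (90, "digger")]
features = extract_features(log)
```

Each player's feature dictionary can then be flattened into a numeric vector before clustering.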
What did we find in the data?
Two different winning strategies.
We first explored the existing dataset and then applied k-means and sequential clustering.
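For readers unfamiliar with k-means, the following is a minimal sketch of the standard algorithm (Lloyd's iteration) in plain Python. It is not the code used in the study. The input points are hypothetical per-player feature vectors, and the evenly spaced initialisation is a simplification chosen to keep the example deterministic. Sequential clustering is not covered here.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    n = len(points)
    # Deterministic start: k points spread evenly through the input order.
    centroids = [points[i * n // k] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids

# Hypothetical two-dimensional feature vectors for six players.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, k=2)
```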
How do we make clustering easier?
Solution: Create a Wizard of Oz Clustering Tool
The tool would recommend clustering algorithms to users depending on the type of uploaded data. It would guide users through several steps that lead to the clustering results:
  1. Choose dimensions to cluster
  2. Prepare data
  3. Check if the data is clusterable
  4. Choose clustering algorithm
  5. Compute cluster analysis
  6. Validate results
Paper Prototype Evaluation Results:
  • Flow and purpose were clear
  • Extra options and attributes were suggested
  • Minor usability issues
Upload the dataset
The system would automatically recognise the types of data and suggest the dependencies, taking into consideration the sequential data structure.
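One possible way to recognise column types is to try parsing every value in a column. The sketch below is only an assumption about how this step might work, since the prototype simulates it. The column names and values are hypothetical.

```python
from datetime import datetime

def infer_type(values):
    """Guess a column type from its string values."""
    def all_parse(parser):
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "categorical"

# Hypothetical columns of an uploaded event log.
columns = {
    "time": ["2015-03-01T10:00:00", "2015-03-01T10:00:04"],
    "skill": ["digger", "blocker"],
    "lix_id": ["3", "7"],
}
types = {name: infer_type(vals) for name, vals in columns.items()}
```

A timestamp column is a strong hint that the rows form event sequences, which the tool could use when suggesting dependencies.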
Choose which data to cluster
Depending on what you are looking for, it is possible to cluster data by any number of dimensions. It is also possible, for example, to ignore repetitive actions or to filter data depending on its output result.
Choose clustering algorithm
By running special functions in the background, the tool could recommend which clustering algorithms to use and, for example, how many clusters to create. This is especially useful for users who do not have broad knowledge of clustering but would like to use it.
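The prototype does not specify which background functions would be used. One common option for recommending the number of clusters is the mean silhouette coefficient, which scores how well each point fits its own cluster compared with the nearest other cluster. The sketch below compares two hypothetical candidate labellings of the same points and picks the one with the higher score.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical points and two candidate labellings (k = 2 and k = 3).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}
best_k = max(candidates, key=lambda k: silhouette(points, candidates[k]))
```

In a real tool the candidate labellings would come from running the chosen algorithm once per value of k.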
Cluster Details
While the user explores the visualised clustering results, the system dynamically displays the data related to the selected cluster: a list of the data points belonging to it, the probability that each point is assigned to the correct cluster, and the reasons the cluster was formed.
Grouping setting
Cluster results could be grouped not only by cluster but also by output. For example, in the case of shopping logs it would be possible to group the clustering results by 'made purchase' / 'didn't purchase', or in the case of medical treatment by 'recovered' / 'didn't recover'.
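Grouping by output amounts to a cross-tabulation of cluster labels against outcomes. A minimal sketch, assuming each sequence has one cluster label and one outcome (the data below is hypothetical):

```python
from collections import Counter, defaultdict

def group_by_outcome(labels, outcomes):
    """Count, for each outcome, how many sequences fall in each cluster."""
    table = defaultdict(Counter)
    for cluster, outcome in zip(labels, outcomes):
        table[outcome][cluster] += 1
    return table

# Hypothetical cluster labels and outcomes for six shopping sessions.
labels = [0, 0, 1, 1, 1, 2]
outcomes = [
    "made purchase", "didn't purchase", "made purchase",
    "made purchase", "didn't purchase", "didn't purchase",
]
table = group_by_outcome(labels, outcomes)
```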
Layers of Clusters
It is possible to save some clustering groups and map them on top of different results. This way users would see how the saved group is represented in the other result, i.e. what percentage of the saved group belongs to each chosen cluster.
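The overlay described above can be sketched as a simple percentage breakdown. The function name, the item ids, and the cluster labels are hypothetical.

```python
from collections import Counter

def overlay(saved_ids, assignment):
    """Percentage of the saved group that lands in each cluster of another result.

    assignment maps item id -> cluster label in the result being viewed;
    ids missing from it are reported under None.
    """
    counts = Counter(assignment.get(i) for i in saved_ids)
    total = sum(counts.values())
    return {cluster: 100 * n / total for cluster, n in counts.items()}

# Hypothetical saved group and a different clustering result.
saved = ["u1", "u2", "u3", "u4"]
other = {"u1": "A", "u2": "A", "u3": "B", "u4": "A", "u5": "B"}
shares = overlay(saved, other)
```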
Walkthrough of the Clustering Tool Prototype