MIT Libraries Machine Learning Studio by MITLibraries

In my last blog post, I talked about exploring concept clusters based on different disciplines’ uses of a particular word. However, I got into these neural nets in the first place because I was motivated by the question of “why does browse suck?”, and I think machine learning creates possibilities for novel interfaces which might not suck. In short, I really want to explore documents, and give people new ways of exploring them. Or, to put it another way: if you liked this thesis, what others might you like?

Visualizing an entire thesis corpus is harder than visualizing the local network around a word, because many of our departments have produced thousands of theses over the years (and also because I know approximately nothing about data visualization and d3). So what I’ve done here is break the aero-astro department’s theses down into subgraphs: each thesis is a node, and every node in a subgraph is related to at least one other node in that subgraph by at least a threshold amount. Then we can explore subgraphs of more manageable size, and finally start to see what clusters of related documents might emerge.
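For readers who want to try something similar, here is a minimal sketch of that subgraph step. The post doesn't say how relatedness is measured or what threshold was used. This sketch assumes each thesis already has an embedding vector (for example, one produced by a neural net like the ones in the last post), measures relatedness with cosine similarity, and finds connected components with union-find. The function names, thesis IDs, vectors, and the 0.8 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_subgraphs(vectors, threshold):
    """Split documents into subgraphs (connected components of a thresholded similarity graph).

    vectors: dict mapping a (hypothetical) thesis ID to its embedding vector.
    threshold: minimum similarity for two theses to count as linked.
    Returns lists of thesis IDs, largest subgraph first. Isolated theses are
    dropped, because every subgraph member must link to at least one other member.
    """
    # Union-find: every thesis starts out in its own group.
    parent = {doc: doc for doc in vectors}

    def find(doc):
        # Walk up to the group's root, halving the path as we go.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    # Compare every pair once; merge the groups of any pair at or above the threshold.
    for a, b in combinations(vectors, 2):
        if cosine_similarity(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect the members of each group under its root.
    groups = {}
    for doc in vectors:
        groups.setdefault(find(doc), []).append(doc)

    subgraphs = [sorted(g) for g in groups.values() if len(g) >= 2]
    return sorted(subgraphs, key=len, reverse=True)


# Hypothetical toy vectors standing in for real thesis embeddings.
theses = {
    "thesis-a": [1.0, 0.0, 0.1],
    "thesis-b": [0.9, 0.1, 0.0],
    "thesis-c": [0.0, 1.0, 0.0],
    "thesis-d": [0.1, 0.9, 0.2],
    "thesis-e": [0.0, 0.0, 1.0],
}
print(threshold_subgraphs(theses, 0.8))
# → [['thesis-a', 'thesis-b'], ['thesis-c', 'thesis-d']]
```

Because components chain, two theses can share a subgraph without being similar to each other, as long as a path of above-threshold links connects them. That matches the "related to at least one other node" rule. Raising the threshold breaks big components into smaller, more browsable ones. The all-pairs comparison is quadratic, but that is manageable at the scale of a few thousand theses per department.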

What do you think the labels for these clusters should be?

