About SimSeerX

SimSeerX is a search engine for finding similar documents. It accepts a document as input, applies several similarity functions to identify similar documents, and ranks the results.
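To make the idea of ranking by a similarity function concrete, here is a minimal Python sketch. It is not SimSeerX's implementation: the functions `tokenize`, `cosine_similarity`, `jaccard_similarity`, and `rank_similar` are hypothetical names chosen for illustration, and the two measures shown (term-frequency cosine and token-set Jaccard) are generic textbook examples rather than the similarity functions SimSeerX actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    ta, tb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Jaccard overlap between the sets of distinct tokens in two texts."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_similar(query, collection, similarity):
    """Score every document against the query and sort by descending score.

    `collection` maps a document id to its text (an illustrative stand-in
    for a real index); `similarity` is any function of two texts.
    """
    scored = [(doc_id, similarity(query, text)) for doc_id, text in collection.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_similar(query_text, collection, jaccard_similarity)` instead of passing `cosine_similarity` ranks the same collection under a different notion of similarity, which is the general pattern of swapping similarity functions behind a single search interface.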

Similarity in SimSeerX

SimSeerX currently supports 3 notions of similarity:

Document Collections

SimSeerX currently indexes the following document collections:

More collections coming soon!


SimSeerX is built on the Play! Framework and uses Solr/Lucene with custom similarity functions for indexing and searching. Information extraction is performed by CiteSeerExtractor (Williams et al., 2014), and keyphrase extraction is performed by Maui.


The SimSeerX API is described at http://simseerx.ist.psu.edu/api.


Kyle Williams
Kyle Williams developed, runs and maintains SimSeerX as part of his PhD research.
Prof. C. Lee Giles
Prof. C. Lee Giles is the principal investigator (PI) on the SimSeerX project.


A paper describing SimSeerX appeared in ACM Document Engineering 2014.

Williams, K., Wu, J., & Giles, C. L. (2014). SimSeerX: A similar document search engine. In Proceedings of the 2014 ACM Symposium on Document Engineering (pp. 143-146). ACM.


