Thematic Exploration of YouTube Data: A Methodology for Discovering Latent Topics

An automated research method that uses the topic modeling algorithm called Latent Dirichlet Allocation (LDA) to discover latent topics and explore potential themes in YouTube transcript data.

In a 2015 study, Ganesan, Brantley, Pan, and Chen recognized a problem with the search process when trying to visualize the correlation between a large collection of documents and a given set of topics (Ganesan, Brantley, Pan, & Chen, 2015). In a 2012 study, Chaney and Blei emphasize how important it is for science, industry, and culture to be able to explore the hidden structures found within large collections of unorganized documents (Chaney & Blei, 2012). Wang and Blei published an article in 2011 citing the difficulty of finding and recommending relevant scientific research papers to communities of researchers (Wang & Blei, 2011). Finally, a 2016 article by Roberts, Stewart, and Airoldi discusses the popularity of statistical models and how they are used to explore large collections of documents in order to measure latent linguistic, political, and psychological variables in the social sciences (Roberts, Stewart, & Airoldi, 2016).

Taken together, these studies describe a problem that continues to challenge researchers and information management practitioners who are attempting to explore a latent set of topics that form a common theme within a large collection of documents. Documents are being collected from a growing number of sources, which continue to present problems of complexity and a lack of intuitive correlation. Novel research methods that combine a mixture of technologies working together as a framework are needed to address these challenges. This research method proposes a framework that researchers and practitioners can use to discover latent topics within a target set of YouTube video transcript documents.

Author: Clinton Daniel

Cite as: Daniel, C. (2017). Thematic exploration of YouTube data: A methodology for discovering latent topics. Muma Business Review, 1(12), 141-155.