Open Access System for Information Sharing

Article

Cited 6 times in Web of Science

Cited 11 times in Scopus

Spoiler detection in TV program tweets (SCIE, SCOPUS)

Title: Spoiler detection in TV program tweets

Authors: Jeon, S; Kim, S; Yu, H

Date Issued: 2016-02-01

Publisher: ELSEVIER SCIENCE INC

Abstract: Watching TV programs at the scheduled airtime is difficult due to time differences between countries or personal circumstances. To avoid spoilers, people sometimes choose self-imposed isolation from civilization, such as staying away from the Internet, until they have seen their favorite program. However, smartphones allow people to habitually check the SNS messages posted by their friends to maintain their relationships. This habit exposes them to spoilers about their favorite TV programs. To spare people this self-imposed isolation from their friends, we need an automatic method for detecting spoilers in TV program tweets. To the best of our knowledge, two works have addressed the spoiler detection task: (1) a keyword matching method and (2) a machine-learning method based on Latent Dirichlet Allocation (LDA). However, neither was designed for short texts or for real-world systems. The keyword matching method incorrectly predicts most tweets as spoilers. Although the LDA-based method works well on large bodies of text, it fails to accurately detect spoilers in short texts such as tweets. In this work, we introduce a simple and powerful method of spoiler detection based on four representative features, which are significant indicators of spoilers. To identify and use these four features, we conduct a detailed analysis of real-world tweet data and build an SVM-based prediction model based on the results. Using tweets about Dancing with the Stars and the 2014 World Cup final, we evaluate the effectiveness of the proposed methods on spoiler detection tasks. According to the results, our method achieves greater precision than the competitors while maintaining comparable recall. At the same time, our method outperforms the competitors in processing time, showing that it is sufficiently lightweight for use in a web browser.
Furthermore, to reduce the labeling cost, we introduce a semi-supervised approach that automatically retrains the prediction model based on a small amount of labeled data. The experimental results show that the semi-supervised approach delivers performance comparable to that of the previous model. © 2015 Elsevier Inc. All rights reserved.
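
Illustrative note: the abstract states that a keyword matching method predicts most tweets as spoilers. The following is a minimal sketch of a generic keyword matcher, not the method of the cited prior work, showing one plausible reason for that behavior. The function name, the keyword list, and the example tweets are all hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable

def is_spoiler_by_keywords(tweet: str, keywords: Iterable[str]) -> bool:
    """Flag a tweet as a spoiler if it contains any keyword (case-insensitive)."""
    # Lowercase the tweet and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z0-9']+", tweet.lower()))
    # Any overlap with the keyword list counts as a spoiler.
    return not tokens.isdisjoint(keywords)

# Hypothetical keyword list and tweets (assumptions, not data from the paper).
KEYWORDS = {"germany", "argentina", "final", "goal"}
TWEETS = [
    "Germany wins it with a goal in extra time!",  # reveals the outcome
    "Can't wait for the final tonight",            # anticipation only
    "Getting snacks ready for the final",          # anticipation only
]

flags = [is_spoiler_by_keywords(t, KEYWORDS) for t in TWEETS]
print(flags)
```

In this toy example all three tweets are flagged even though only the first reveals an outcome, because program-related keywords also appear in tweets that give nothing away. That is consistent with the precision problem the abstract attributes to keyword matching.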