Data Collection in a Social Network with Weighted Seed Selection and Data Analysis Based on Rule-based Methods


In recent years, with the growing popularity of online social network sites such as Facebook, Twitter, Blogger, YouTube, LinkedIn, and MySpace, a massive amount of data has become available. Analyzing social media data can lead to a better understanding of individual and human behavior, the detection of hot topics, the identification of influential people, and the discovery of groups or communities. However, it is difficult to extract useful information from social data without automated processing because of three main characteristics of social media data sets: the data is large, noisy, and dynamic. To overcome these challenges, data seekers can use data-mining techniques to discover a diversity of perspectives that would otherwise not be accessible. Before data-mining techniques can be applied, the target data set must first be collected from the social network and prepared. Twitter allows researchers and data analysts to access a variety of its data through an Application Programming Interface (API). However, data collection from Twitter is restricted: the number of Twitter API method calls is limited. Furthermore, without an automated data collector and filter, it is impossible to collect enough data to apply data analysis techniques and to filter out unnecessary data such as spam messages. To overcome these data access problems, we design and implement our own Twitter data-collection tool, which includes data filtering and analysis capabilities. This allows us, as well as other researchers and data seekers, to build our own Twitter datasets. First, we introduce the design specifications and explain the implementation details of the Twitter Data Collecting Tool we developed. 
Unified Modeling Language (UML) diagrams are used to present these design specifications and implementation details. Next, we propose a new algorithm that selects the best seed nodes under limited resources and time, so that data related to a specific topic and keyword can be collected efficiently. The algorithm evaluates various user influence and activity factors and updates the seed nodes dynamically during the gathering process. After the gathering process, we compare two results, one obtained with this algorithm and one obtained by a specialist. In the final chapter, we analyze Twitter data gathered by the Twitter Data Collecting Tool in a case study of Super Bowl 2012 and Super Bowl 2013. The case study addresses the question of how people use Twitter and assesses the power of Twitter in creating consumer interest in brands and commercials. Its main objective is to find the relationship between Twitter and Super Bowl advertisements by analyzing Twitter data. This research shows that the Twitter Data Collecting Tool allows researchers to gather user information, follow relationships, and tweets from Twitter. Furthermore, the data collection results show that the seed selection algorithm collects keyword-related data more efficiently than the existing approach. In addition, data-mining techniques and rule-based data analysis are applied to the gathered data. These results demonstrate that the Twitter Data Collecting Tool can gather a large amount of data from Twitter and filter it for use in research. 
This paper will be valuable to those who want to build their own Twitter dataset, apply customized filtering options to remove unnecessary, noisy data, and analyze social data to discover new knowledge.
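
The abstract describes the seed selection algorithm only at a high level: it weighs user influence and activity factors and updates the seed nodes during collection. The following is a minimal sketch of that general idea, not the author's algorithm. The `User` fields, the weight values in `WEIGHTS`, and the function names `score` and `select_seeds` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    followers: int         # assumed influence factor
    tweets_per_day: float  # assumed activity factor
    keyword_hits: int      # keyword-matching tweets seen so far (assumed)


# Hypothetical weights; the source does not state which factors or values are used.
WEIGHTS = {"followers": 0.3, "tweets_per_day": 0.2, "keyword_hits": 0.5}


def score(user, maxima):
    """Weighted sum of factors, each normalized by the largest value observed."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        peak = maxima[field]
        if peak > 0:  # skip factors that are zero for every candidate
            total += weight * getattr(user, field) / peak
    return total


def select_seeds(users, k):
    """Return the k highest-scoring users as the current seed set."""
    if not users:
        return []
    maxima = {f: max(getattr(u, f) for u in users) for f in WEIGHTS}
    ranked = sorted(users, key=lambda u: score(u, maxima), reverse=True)
    return ranked[:k]


users = [
    User("a", followers=1000, tweets_per_day=5.0, keyword_hits=2),
    User("b", followers=100, tweets_per_day=10.0, keyword_hits=20),
    User("c", followers=500, tweets_per_day=1.0, keyword_hits=0),
]
print([u.name for u in select_seeds(users, 2)])  # ['b', 'a']
```

Calling `select_seeds` again after the factor values change (for example, after more keyword-matching tweets have been counted for a user) re-ranks the candidates, which is one simple way to model a dynamic seed update under a limited API call budget.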


Title: Data Collection in a Social Network with Weighted Seed Selection and Data Analysis Based on Rule-based Methods
Author: Changhyun Byun
Publisher: - 2013
ISBN-13:
