Big Data and Social Media Analytics

Business: Social network
Environment: Java, Hadoop
Platforms: Windows, Linux
Technology: Hadoop, Java+Web, Kafka, Cassandra, Hive, HBase


One of the world’s well-known social networks hired the Scand development team to analyze insights about its users: the number of new users joining the network daily, their activity and online behavioral patterns, the detection of viral patterns, collaborative filtering, etc.

In general, the customer wanted to track users’ digital profiles and, when needed, get ad hoc access to all the required information to support online ad campaigns.

The development team decided to perform a general upgrade of the customer’s online analytics system on a bleeding-edge technology platform. The upgraded solution had to provide customizable, flexible access to the analytical data of users’ digital profiles, both static and dynamic. It was also expected to track demographic statistics pulled together from a variety of sources.


First, Scand Big Data developers added anchor points (1x1 pixel images) to every page of the social network, which made it possible to mark and track all users via cookies. Users arriving without a cookie were treated as new and assigned one. The collected data was streamed through Kafka and stored in HBase.
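The cookie step can be sketched in Java as follows. This is a minimal illustration, not the project's code: the class name `PixelTracker`, the `sid` cookie name, the one-year cookie lifetime and the tab-separated record format are all assumptions. The pixel handler reads the `Cookie` request header and either reuses an existing visitor ID or generates a new one, which would be returned in a `Set-Cookie` header together with the 1x1 image.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch of the cookie logic behind a 1x1 tracking pixel.
// Class name, cookie name and record format are illustrative assumptions.
final class PixelTracker {

    // Name of the visitor-ID cookie (assumption).
    static final String COOKIE_NAME = "sid";

    // A resolved visitor: the ID plus whether it was assigned on this request.
    record Visitor(String id, boolean isNew) {}

    // Extracts COOKIE_NAME from a raw "Cookie" request header, if present.
    static Optional<String> findVisitorId(String cookieHeader) {
        if (cookieHeader == null) {
            return Optional.empty();
        }
        for (String part : cookieHeader.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].equals(COOKIE_NAME) && !kv[1].isEmpty()) {
                return Optional.of(kv[1]);
            }
        }
        return Optional.empty();
    }

    // Reuses the visitor's existing ID, or generates a new one for a first-time user.
    static Visitor resolve(String cookieHeader) {
        return findVisitorId(cookieHeader)
                .map(id -> new Visitor(id, false))
                .orElseGet(() -> new Visitor(UUID.randomUUID().toString(), true));
    }

    // Value for the Set-Cookie response header sent with the pixel to new visitors.
    static String setCookieHeader(Visitor v) {
        return COOKIE_NAME + "=" + v.id() + "; Path=/; Max-Age=31536000"; // one year
    }

    // Tab-separated view record; in the described pipeline such records would be
    // published to Kafka and stored in HBase (producer/consumer code omitted).
    static String viewRecord(Visitor v, String pageUrl, long timestampMillis) {
        return v.id() + "\t" + pageUrl + "\t" + timestampMillis;
    }
}
```

Generating the ID on the server and keeping the parsing logic free of any HTTP framework makes the core rule (known cookie means returning user, no cookie means new user) easy to test in isolation.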

Second, these marks were joined with social network profile data: age, sex, user-defined tags, etc. With that in place, the team added link translation for most ‘native’ links in users’ posts, so that clicks on them pass through the tracking system. In other words, when a user shares such a link on their wall, clicks on it and the interests they reveal are intercepted in the same way as page views. Cosine-based collaborative filtering is then applied to the set of per-user views and clicks: if users click similar links and browse similar walls, it makes sense to show them a ‘see also’ block based on the votes of other users with similar interests.
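The filtering step can be illustrated with a minimal, in-memory Java sketch of user-based collaborative filtering with cosine similarity. It assumes each user is represented by a sparse map of per-link interaction counts (views plus clicks); the class and method names (`CosineRecommender`, `seeAlso`) are hypothetical, and the project's actual scoring may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user-based collaborative filtering with cosine similarity.
// Each user is a sparse vector: item (link or wall) -> interaction count.
final class CosineRecommender {

    // Cosine similarity between two sparse count vectors (0 if either is empty).
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // Scores items the target has not interacted with by summing the
    // similarity-weighted "votes" of other users; returns the top n.
    static List<String> seeAlso(String target, Map<String, Map<String, Integer>> interactions, int n) {
        Map<String, Integer> mine = interactions.getOrDefault(target, Map.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> user : interactions.entrySet()) {
            if (user.getKey().equals(target)) {
                continue;
            }
            double sim = cosine(mine, user.getValue());
            if (sim <= 0) {
                continue; // users with no overlap contribute nothing
            }
            for (Map.Entry<String, Integer> item : user.getValue().entrySet()) {
                if (!mine.containsKey(item.getKey())) {
                    scores.merge(item.getKey(), sim * item.getValue(), Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);             // ties: alphabetical
        });
        return List.copyOf(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

For example, if Alice and Bob both read the same news and sports links and Bob also clicked a tech link, `seeAlso("alice", ...)` suggests that tech link. At the data volumes described here, the same computation would more likely run as batch jobs on Hadoop than in a single JVM.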

Once digital profiles were obtained, filters on the already gathered age groups, sex and tags were added. This increased the relevance of the proposed blocks by a further fraction of a percent.
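Such demographic filters could be applied before the similarity step, so that only votes from users with a comparable profile are counted. The Java sketch below assumes a `Profile` record holding age, sex and tags, and uses illustrative age-group boundaries; none of these names or thresholds come from the original system.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demographic filter: keeps only candidate users in the same age
// group and sex as the target who also share at least one tag.
final class DemographicFilter {

    record Profile(String userId, int age, String sex, Set<String> tags) {}

    // Buckets ages into groups; the boundaries are illustrative assumptions.
    static String ageGroup(int age) {
        if (age < 18) return "under-18";
        if (age < 25) return "18-24";
        if (age < 35) return "25-34";
        if (age < 50) return "35-49";
        return "50+";
    }

    // True when the candidate matches the target's age group and sex and shares a tag.
    static boolean matches(Profile target, Profile candidate) {
        return ageGroup(target.age()).equals(ageGroup(candidate.age()))
                && target.sex().equals(candidate.sex())
                && !Collections.disjoint(target.tags(), candidate.tags());
    }

    // IDs of candidates that pass the filter, excluding the target, sorted.
    static List<String> similarProfiles(Profile target, Collection<Profile> all) {
        return all.stream()
                .filter(p -> !p.userId().equals(target.userId()))
                .filter(p -> matches(target, p))
                .map(Profile::userId)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Filtering first also shrinks the set of users whose similarity has to be computed, which is one plausible reason such filters help at scale.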

At this point the targeted campaign was ready to launch. Control groups receiving only ‘native’ traffic were set up, while other groups were targeted with viral news and specific advertisements. Prepared news posts were used to track changes in users’ behavioral models. This gave the network owners a powerful tool for managing advertising content.
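One common way to set up control groups is to assign users deterministically by hashing the campaign and user IDs, then compare click-through rates (CTR) between the groups. The Java sketch below assumes this approach; the class `CampaignGroups`, the CRC32 hash and the CTR-lift metric are illustrative choices, not details taken from the project.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: deterministic split of users into a control group
// (native traffic only) and a targeted group, plus a simple CTR comparison.
final class CampaignGroups {

    enum Group { CONTROL, TARGETED }

    // Hashes campaign + user ID so a user lands in the same group on every
    // request; controlShare is the fraction kept as control (e.g. 0.1).
    static Group assign(String campaignId, String userId, double controlShare) {
        CRC32 crc = new CRC32();
        crc.update((campaignId + ":" + userId).getBytes(StandardCharsets.UTF_8));
        double bucket = (crc.getValue() % 10_000) / 10_000.0; // in [0, 1)
        return bucket < controlShare ? Group.CONTROL : Group.TARGETED;
    }

    // Click-through rate; 0 when there were no impressions.
    static double ctr(long clicks, long impressions) {
        return impressions == 0 ? 0.0 : (double) clicks / impressions;
    }

    // Relative lift of the targeted group's CTR over the control group's.
    static double lift(double targetedCtr, double controlCtr) {
        return controlCtr == 0 ? Double.NaN : (targetedCtr - controlCtr) / controlCtr;
    }
}
```

Including the campaign ID in the hash means each campaign gets an independent split, so the same users are not always in the control group.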

System Highlights

The Social Networking Big Data solution supports the following features:

  • Processing of huge volumes of data;
  • Automatic reporting;
  • Analytical views in the form of tables and charts;
  • Summary views;
  • Analytical dashboards;
  • Fast collection of analytics data into key-value storage via Kafka and HBase;
  • Fast file-based data capture with the help of Cassandra;
  • SQL-like queries processed through Hive;
  • Campaign targeting.


Implementing Big Data analytics for the social network, together with extensive statistics gathering and analysis, gave the project the ability to pre-provision hardware when new viral news arrives and let advertising campaigns run smoothly while reaching only the target audience. A simple analysis of ‘content boosters’ makes changes in users’ digital profiles visible, so the network’s resources are ultimately used more efficiently.
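The text does not say how new viral news was detected early enough to pre-provision hardware. One simple approach, shown here as a hypothetical Java sketch, is to keep an exponentially weighted moving average (EWMA) of view counts per time interval and flag intervals that exceed it by a fixed factor. The class name, smoothing factor and thresholds are assumptions.

```java
// Hypothetical spike detector: tracks an EWMA baseline of per-interval view
// counts and flags an interval as a possible viral surge when it exceeds the
// baseline by a chosen factor.
final class ViralSpikeDetector {
    private final double alpha;   // smoothing factor in (0, 1]
    private final double factor;  // surge threshold, e.g. 3.0 = 3x baseline
    private final long minViews;  // ignore small counts to reduce noise
    private double baseline = -1; // -1 until the first observation

    ViralSpikeDetector(double alpha, double factor, long minViews) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.factor = factor;
        this.minViews = minViews;
    }

    // Feeds one interval's view count; returns true if it looks like a surge.
    // The baseline is updated after the check, so a spike is compared against
    // the history that preceded it.
    boolean observe(long views) {
        if (baseline < 0) {
            baseline = views; // first interval only seeds the baseline
            return false;
        }
        boolean surge = views >= minViews && views > factor * baseline;
        baseline = alpha * views + (1 - alpha) * baseline;
        return surge;
    }
}
```

With `alpha = 0.3`, `factor = 3.0` and `minViews = 100`, a post hovering around 100 views per interval that suddenly receives 1,000 views is flagged, which could serve as the trigger for provisioning extra capacity.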

If you are interested in developing software with similar functionality, please do not hesitate to contact us.
