The client is one of the largest media corporations in the United States, and its website draws millions of visitors daily from around the world. It has published millions of articles, with content dating back more than a hundred years, all accessible from any device. As its data grew, the client struggled to understand its customers' psychographics, behaviors and interests, along with the trends and motivations behind their online actions, and so could not respond well to their needs.

Furthermore, the growing number of clicks per page, the traffic volume and the variety of content in new interest areas hampered performance, and the platform could not scale in real time. The existing solution struggled not only to generate insights on reader behavior but also to keep up with the growing volume of published content arriving in real time from the content management systems, and to ingest that content into search technologies for visitor access. Integrating buying behaviors with sales and marketing campaigns was time-consuming and inefficient, and the results could not be capitalized on in real time. A traditional relational or NoSQL database hindered the use of sophisticated storage and advanced in-database analytics, and it lacked rapid execution.


As part of the effort, Data Cuve search and visualization architects analyzed both sides of the platform: the publishing end and the consuming end. On the publishing side, sources included the CMS, third-party data and wire stories written by various publishers. On the consuming side, various internal and external services and applications used that content for search analytics, personalization services for readers, and other insights into behaviors, usage, pattern detection and so on. Once the business goals were defined in terms of latency, visualization requirements, speed, relevance of search content and sentient quality, we evaluated various commercial, enterprise-ready platforms and open source frameworks before arriving at a messaging-based architecture combined with a search index.

Using a real-time data pipeline built on a messaging system infrastructure, we were able to read and write streams of published content and index them immediately. As a result, every piece of content was ready for public consumption as soon as it was written, across the various feed generators and front-end applications, including websites and native apps. Everything was done in real time and, more importantly, without any data loss.
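
To make the idea concrete, the flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the architecture that was actually deployed: the `MessageBus` and `SearchIndex` classes, the `articles.published` topic name and the article fields are all hypothetical, and an in-process bus stands in for a real messaging system. It shows one published message being indexed and delivered to a feed in the same step.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process stand-in for a messaging system (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable that receives every message on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver synchronously to each subscriber, in subscription order.
        for handler in self._subscribers[topic]:
            handler(message)


class SearchIndex:
    """Minimal inverted index: lowercase term -> set of article ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, article):
        text = article["title"] + " " + article["body"]
        for term in text.lower().split():
            self._postings[term].add(article["id"])

    def search(self, term):
        # Return matching article ids in ascending order.
        return sorted(self._postings.get(term.lower(), set()))


bus = MessageBus()
index = SearchIndex()
feed = []  # stands in for a feed generator or front-end consumer

# Both consumers subscribe to the same (assumed) topic.
bus.subscribe("articles.published", index.index)
bus.subscribe("articles.published", lambda article: feed.append(article["title"]))

bus.publish("articles.published",
            {"id": 1, "title": "Election results", "body": "Voters turned out early"})
bus.publish("articles.published",
            {"id": 2, "title": "Market update", "body": "Stocks rallied after election"})

print(index.search("election"))  # [1, 2]
print(feed)                      # ['Election results', 'Market update']
```

In this sketch the publisher never talks to the index or the feed directly, so new consumers (for example, an analytics subscriber) can be added without changing the publishing side. A production system would additionally need durable, replayable message storage to support the no-data-loss property mentioned above, which this toy bus does not provide.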

With the ability to subscribe to indexed data published in real time, it became possible to gain insights and provide behavioral data through an analytics and visualization layer.


  • Identify areas of opportunity for new sales and subscriber retention.
  • Visualize reader interests in real time and further enhance the experience by providing personalized engagement services on the website or in native apps.
  • Increase revenue through more engaged readers and upsell offers well aligned with their interests.