Since it became an Apache Top-Level Project in early 2008, Hadoop has established itself as the de facto industry standard for batch processing. Running data analysis over petabytes of data is no longer fiction. But the MapReduce framework has two major downsides: query latency and data freshness.
At the same time, businesses have started to exchange more and more data through REST APIs, leveraging HTTP verbs (GET, POST, PUT, DELETE) and URIs (for instance http://company/api/v2/domain/identifier), pushing the need to read data in a random-access style – from simple key/value lookups to complex queries.
Enhancing the Big Data stack with real-time search capabilities is the next natural step for the Hadoop ecosystem, because the MapReduce framework was not designed with synchronous processing in mind.
There is a lot of traction in this area today, and this project will try to answer the question of how to fill this gap with specific open-source components and build a dedicated platform that enables real-time queries on an Internet-scale data set. This project will be carried out in cooperation with VeriSign Inc.