Several existing frameworks (GraphLab, MapReduce, etc.) and systems (Cloudera, Bigtop, etc.) relate to the processing of distributed big data. However, the difficulty of their programming paradigms slows the emergence of novel products and services built on big data frameworks and systems. Furthermore, existing solutions fail to engage non-IT experts in a more direct interaction with enterprise workflows for extracting actionable knowledge from big data.
There is a lack of commonly agreed standards and frameworks, which makes data integration a very challenging and costly process. Some technologies and efforts can help accommodate data from multiple heterogeneous sources, such as standards for common semantic data models and formats, Linked Data (http://linkeddata.org), data anonymization, and data aggregation. However, the degree of data sharing and re-use remains unsatisfactory, and technology advancements are needed to foster data sharing and re-use.
- the dependent and independent variables of the verification and validation experiments to be conducted;
- a statistical power analysis to determine the number of experimental subjects required by the cross-sectorial experiments;
- ways to access and engage the required number of experimental subjects;
- a concrete and coherent experimentation schedule for real-life industrial cases;
- industrially validated benchmarks that can demonstrate significant improvements in data processing, such as the speed of data analysis and the size of data assets that can be processed;
- a verification and validation approach, including standards and benchmarks from the Big Data domain.
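As an illustration of the statistical power analysis mentioned above, the number of experimental subjects per group can be estimated from a target effect size, significance level, and power. The following sketch uses the standard normal approximation for a two-sided, two-sample t-test; the chosen effect size (Cohen's d = 0.5), 5% significance level, and 80% power are conventional illustrative values, not parameters fixed by this document:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample t-test: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the significance level
    z_beta = z(power)            # critical value for the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A medium effect (d = 0.5) at alpha = 0.05 and 80% power requires
# roughly 63 subjects per experimental group under this approximation.
print(sample_size_per_group(0.5))  # → 63
```

Larger effect sizes require fewer subjects, which is why the power analysis must precede recruitment: it determines whether the planned cross-sectorial experiments can realistically detect the expected improvements.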
The BDV Reference Model (BDV SRIA, European Big Data Value Strategic Research and Innovation Agenda, Version 4.0, October 2017) has been developed by the BDVA, taking into account input from technical experts and stakeholders along the whole Big Data Value chain, as well as interactions with other related PPPs. The BDV Reference Model may serve as a common reference framework to locate Big Data technologies on the overall IT stack. It addresses the main concerns and aspects to be considered for Big Data Value systems. The BDV Reference Model distinguishes between two different elements. On the one hand, it describes the elements that are at the core of the BDVA; on the other, it outlines the features that are developed in strong collaboration with related European activities.
The BDV Reference Model is structured into horizontal and vertical concerns.
- Horizontal concerns cover specific aspects along the data processing chain, starting with data collection and ingestion, and extending to data visualisation. It should be noted that the horizontal concerns do not imply a layered architecture. As an example, data visualisation may be applied directly to collected data (the data management aspect) without the need for data processing and analytics.
- Vertical concerns address cross-cutting issues, which may affect all the horizontal concerns. In addition, vertical concerns may also involve non-technical aspects.
- Develops data processing tools and techniques applicable in real-world settings, and demonstrates significant increase of speed of data throughput and accessibility;
- Releases a safe environment for methodological big data experimentation, for the development of new products, services, and tools;
- Develops technologies that increase the efficiency and competitiveness of all EU companies and organisations that need to manage vast and complex amounts of data;
- Offers tools and services for fast ingestion and consolidation of both realistic and fabricated data from heterogeneous sources;
- Facilitates simultaneous batch and real-time processing of Big Data;
- Offers enrichment of real-time data with batch (historical) data, off-loading of compute-intensive operations to GPUs, and parallel processing of many heterogeneous input channels;
- Provides a pool of algorithms from traditional ETL/aggregation algorithms to graph processing;
- Extends Big Data runtimes and tools to support fast and scalable data analytics;
- Takes advantage of in-database computation, indexing and filtering, and improves the programmer’s productivity through simple interfaces and a sequential programming paradigm;
- Develops a distributed large-scale framework for powerful and scalable Data Processing;
- Promotes management of heterogeneous and federated infrastructures including Cloud and GPU resources and orchestration across diverse resource providers;
- Integrates infrastructure elasticity capabilities offered by the COMPSs and Hecuba runtime environments;
- Automates procedures that would otherwise require human interaction, minimizing costs (money and time) and, in some cases, human error.
- Enables telecom companies to process the immense amounts of data constantly produced by their customers. This processing allows the extraction of critical insights about customer behavior and system performance, and enables the allocation of appropriate resources where needed, for better optimization and resource utilization.
- Enables better system maintenance, and predictive resource allocation, thus reducing customer churn and complaints.
- Probes customers for new products and markets; the big data available during such explorations can help prove preliminary efforts worthy of further investment, or justify abandoning them when they prove unpromising.
- Introduces real-time big data analytics processing on the plant floor (e.g. car manufacturing).
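The simultaneous batch and real-time processing described above is commonly realized as a lambda-style architecture: a batch view, periodically recomputed over the full history, is merged at query time with a real-time view that is updated incrementally as events stream in. The toy sketch below illustrates only the merge principle; the class name, event names, and counts are illustrative assumptions and do not correspond to any of the systems named in this section:

```python
from collections import Counter

class LambdaView:
    """Toy lambda-architecture merge: a precomputed batch view of
    historical events combined with an incrementally updated
    real-time view of events arriving after the last batch run."""

    def __init__(self, batch_events):
        # Batch layer: recomputed periodically over the full history.
        self.batch_view = Counter(batch_events)
        # Speed layer: updated per event as data streams in.
        self.realtime_view = Counter()

    def ingest(self, event):
        self.realtime_view[event] += 1

    def query(self, key):
        # Serving layer: merge both views at query time.
        return self.batch_view[key] + self.realtime_view[key]

hits = LambdaView(batch_events=["page_a", "page_a", "page_b"])
hits.ingest("page_a")        # new event from the real-time stream
print(hits.query("page_a"))  # → 3 (2 historical + 1 real-time)
```

The design choice here is that the expensive, compute-intensive work (rebuilding the batch view) can be off-loaded and scheduled independently, while queries still reflect the freshest events through the cheap incremental view.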