Databases & Data Mining
Since the 1970s, databases have developed into one of the most widespread computing technologies. Fortune 500 companies nowadays typically operate more than 10,000 major databases. Both enterprise software (e.g. SAP) and large end-user applications (e.g. Amazon, Facebook) build on database technologies. Metadatabases (e.g. computerized product catalogs) pervade engineering, the natural sciences, and medicine. A decade ago, Turing Award winner Jim Gray therefore postulated an era of data-centric science in which new scientific results stem from mining huge amounts of existing data. Data mining has since become a standard technology in marketing as well as in scientific data management.
Since the beginning of the new century, the scope of databases and data mining has widened significantly. First, the growth of the World Wide Web has extended the range of media to include massive amounts of text (e.g. Google), images (e.g. Flickr), video (e.g. YouTube), audio, and spoken language. Second, the Internet of Things generates rapidly growing data streams from mobile systems and sensor networks. Third, output data from high-performance simulations must be analyzed and stored at an unprecedented rate for later reuse. Taken together, this explosion of data far exceeds the capabilities and growth rates of present data management technologies, resulting in what is called the 'Big Data Challenge'. Highly parallel programming models such as MapReduce and distributed storage and processing frameworks such as Hadoop are being investigated, but even traditional databases are sped up significantly by novel column-oriented storage techniques, as used in SAP HANA.
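
To make the MapReduce idea mentioned above more concrete, the following is a minimal single-process sketch in Python of its three stages (map, shuffle, reduce), applied to word counting. It is an illustration only and does not use Hadoop or any distributed runtime. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` and the sample documents are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit one (word, 1) pair per word occurrence."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # In a real cluster, the map and reduce calls would run in parallel
    # on different machines; here they run sequentially (an assumption
    # made to keep the sketch self-contained).
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, for illustration only.
counts = word_count(["big data", "data mining", "big data challenge"])
print(counts)
```

Because each map call depends only on its own document and each reduce call only on the values of one key, both stages can in principle be distributed across many machines, which is what makes the model attractive for very large data sets.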