- The Apache™ Hadoop® project develops open-source software for reliable, scalable distributed computing.
Apache Hadoop software is a framework for distributed processing of large amounts of data. The technology makes it possible to create solutions that can run both on a single server and on thousands of servers.
Failure handling and the parallelization of data and computation are built into the framework.
- R is a free programming language and software environment for statistics, machine learning and graphics. It is used by Oracle, IBM, Microsoft, Google, educational and research institutions and many others.
- Statsmodels is a Python library for data exploration, estimation of statistical models and statistical testing.
It provides an extensive set of descriptive statistics, statistical tests and plotting functions. Many companies and academic institutions use it.
- leading NoSQL document database system.
- Apache Spark™ is a fast, scalable engine for large-scale data processing. Spark runs on standard, cost-effective Intel-based servers.
- leading SQL database.