RapidMiner Studio



You are viewing the RapidMiner Studio documentation for version 9.4.

New Automated Model Ops


What is RapidMiner Studio?

RapidMiner Studio is a visual workflow designer that makes data scientists more productive, from the rapid prototyping of ideas to designing mission-critical predictive models.

Once you have imported data, a local copy is stored in RapidMiner's repository, so you can choose to delete the original source file if you want. To update the data later, you need to overwrite the stored copy.

Follow the fully automated data science path: prepare your data using Turbo Prep, create prediction models via Auto Model and finally put them into production with Model Ops.

  • Deploy the most promising models with one click and score new data via flexible web services or in the UI.
  • Track model performance on an intuitive dashboard and easily swap to the best-performing model. Set up an email alert to get notified if a model outperforms the one in production.
  • Evaluate each model with respect to its financial impact instead of pure data science metrics.
  • Detect changes in data and their impact on model performance early to address problems.
  • Use our integrated dashboard to keep track of data drift and model performance.

New map visualizations

Visualize geospatial data with the new map visualizations. You can choose from multiple map types with many different configuration options, as well as dozens of maps for geographic regions, continents, and countries. Available map types:

  • Choropleth maps: Used to display numeric values associated with regions (e.g. a country or a state) via a color gradient
  • Categorical maps: Used to visualize regions that belong to a number of distinct categories
  • Point maps: These maps offer latitude and longitude support and display a marker for each coordinate on the selected map

New charts

Three new chart types have been added in addition to some tweaks and fixes to the existing charts:

  • Sunburst chart
  • Chord diagram
  • Parliament chart

Improved Auto Model

Auto Model features several improvements under the hood as well as a few more visible enhancements:

  • All predictive processes generated by Auto Model are now much cleaner, well structured, and much easier to understand.
  • Cost-sensitive learning has been added to show the costs and benefits in the validation result. This makes it possible to solve problems (e.g. fraud detection) that involve highly imbalanced data sets (e.g. credit card transaction data).
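The cost/benefit idea in the last bullet can be illustrated with a minimal Python sketch. This is not Auto Model's implementation: the cost matrix `COSTS`, the class names, and both helper functions are hypothetical and chosen only to show why expected cost, rather than raw accuracy, matters on imbalanced data.

```python
from typing import Dict, List, Tuple

# Hypothetical cost matrix keyed by (actual, predicted).
# Negative values are benefits. All numbers are made up for illustration.
COSTS: Dict[Tuple[str, str], float] = {
    ("fraud", "fraud"): -50.0,  # fraud caught: benefit
    ("fraud", "ok"): 500.0,     # fraud missed: large cost
    ("ok", "fraud"): 10.0,      # false alarm: manual review cost
    ("ok", "ok"): 0.0,
}


def total_cost(actual: List[str], predicted: List[str],
               costs: Dict[Tuple[str, str], float]) -> float:
    """Sum the cost of every (actual, predicted) pair in a validation set."""
    return sum(costs[(a, p)] for a, p in zip(actual, predicted))


def cheapest_class(confidences: Dict[str, float],
                   costs: Dict[Tuple[str, str], float]) -> str:
    """Return the prediction with the lowest expected cost."""
    classes = list(confidences)

    def expected(pred: str) -> float:
        # Weight the cost of predicting `pred` by the confidence of each actual class.
        return sum(confidences[a] * costs[(a, pred)] for a in classes)

    return min(classes, key=expected)
```

With these assumed costs, a transaction that is only 10 percent likely to be fraud is still flagged, because missing a fraud is far more expensive than a false alarm. The same expected-cost rule extends to more than two classes.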

New data prep and modeling capabilities

Several new operators have been added to ease and enhance data preparation and machine learning:

  • New operators Replace All Missings, Handle Unknown Values, One Hot Encoding and Append (Robust) to easily prepare data for modeling and scoring.
  • New operator Rescale Confidences (Logistic) to rescale confidences even for classification with more than two classes.
  • New operator Cost-Sensitive Scoring: Novel approach for cost-sensitive learning which works for more than two classes.
  • New operators Multi Label Modeling and Multi Label Performance to train and validate a combined model for multiple label columns in a single step.
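As a rough illustration of why one-hot encoding and unknown-value handling are listed together for modeling and scoring, the following Python sketch encodes a nominal column and maps values that were not seen during training to an all-zero vector. The function names and the all-zeros convention are assumptions for this example and do not describe how the RapidMiner operators behave internally.

```python
from typing import List


def fit_categories(values: List[str]) -> List[str]:
    """Remember the distinct categories seen in the training data, sorted."""
    return sorted(set(values))


def one_hot(value: str, categories: List[str]) -> List[int]:
    """Encode one value; a category unseen in training becomes all zeros."""
    return [1 if value == c else 0 for c in categories]


# Categories are learned once on training data and reused at scoring time.
categories = fit_categories(["red", "green", "red", "blue"])
known = one_hot("green", categories)
unknown = one_hot("purple", categories)  # not present in training
```

Fixing the category list at training time keeps the encoded columns identical between training and scoring, which is the property a model needs in production.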

Enhanced time series forecasting

New operators have been added for:

  • Forecasting multiple horizons of a time series with any machine learning model (Multi Horizon Forecast)
  • Validating performance of multi horizon forecasts (Multi Horizon Performance)
  • Sliding window validation for time series data science problems
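Sliding window validation for a multi-horizon forecast can be sketched in a few lines of Python. This is a generic illustration of the technique, not the RapidMiner operators: the function name `sliding_windows` and its parameters `train_size`, `horizon`, and `step` are hypothetical.

```python
from typing import Iterator, List, Tuple


def sliding_windows(n: int, train_size: int, horizon: int,
                    step: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs that move forward in time.

    Each training window is followed directly by `horizon` test points,
    so the model is never validated on data older than its training data.
    """
    start = 0
    while start + train_size + horizon <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + horizon))
        yield train, test
        start += step


# Six observations, train on three, forecast the next two.
splits = list(sliding_windows(n=6, train_size=3, horizon=2))
```

Unlike shuffled cross-validation, every split respects time order. Averaging the error per horizon position across the splits gives one performance value per forecast step.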

Enhanced data source connection framework

All RapidMiner-supported connectivity extensions on the Marketplace now use the new data source connection framework, which includes handling connections to

  • MongoDB
  • Cassandra
  • Splunk
  • Solr
  • Mozenda

Enhancements and bug fixes

The following pages describe the enhancements and bug fixes in RapidMiner Studio 9.4.1 releases:

RapidMiner is a predictive analytics and machine learning platform originally developed at the faculty for artificial intelligence at the Technische Universität Dortmund, Germany. Since 2007, RapidMiner Inc. has taken charge of further development of the software. The vendor provides the source code of the tool under the AGPL open source license and has made a limited version of the tool available as an open source version in the past.

RapidMiner has undergone a shift in strategy, limiting the capabilities of the open source version in recent years. The commercial version offers extended access to different common databases, processing of unlimited rows of data, as well as professional technical support. Due to its large user base and its continued investment in the usability and productivity of the tool, RapidMiner seems to be managing this transition well.

Current investment is being directed at enhancing the user interface, upgrades to the community website, further extension of the RapidMiner Marketplace and integration with Hadoop, Spark, TensorFlow and H2O. RapidMiner functionality can be extended with additional plugins, which are made available via the RapidMiner Marketplace. The RapidMiner Extensions Marketplace provides a platform for developers to create data analysis algorithms and publish them to the user community. The RapidMiner portfolio also offers functionality for web-based reports and the code-free creation of interactive web apps. The tool also offers possibilities to integrate with business intelligence software tools including Qlik, Tableau and Jaspersoft. Recently RapidMiner has added functionality for business-user-friendly data preparation and automated machine learning.

RapidMiner targets data scientists and advanced business analysts, and has a strong focus on machine learning methods for classification and clustering. To support novices in advanced analytics, RapidMiner includes an integrated recommendation feature (“Wisdom of Crowds”) that guides users by proposing what most RapidMiner users would do as the next step in a specific analysis.

Most customers use RapidMiner for visual analysis/data exploration (67 percent) and data preprocessing (56 percent). 38 percent use it for advanced analytics and 23 percent for operationalization. RapidMiner’s focus is on enabling machine learning, illustrated by the fact that 75 percent of users are data preparation, data discovery and advanced analytics users. However, data preparation and exploration consume most of the time in analytics projects, and many projects have not yet reached the stage of analytic modeling and operationalization. RapidMiner is mostly deployed in small and medium-sized companies, which is reflected in a rather low mean and median number of users per installation. In competitive evaluations, RapidMiner is evaluated alongside other advanced analytics software products such as KNIME, SPSS Statistics and Modeler, and Azure ML.

  • [Figure: Current vs. planned use (N=18)]
  • [Figure: 5 products most often evaluated in competition with RapidMiner (N=20)]
  • [Figure: Total number of users per company (N=21)]
  • [Figure: Advanced analytics users per company (N=21)]
  • [Figure: Advanced analytics users as a percentage of all users (N=21)]
  • [Figure: Company size by number of employees (N=21)]