Introduction to H2O and Java

Leveraging AutoML for Faster Predictions


1. Overview

Machine learning plays an important role in modern software development. We build models with various architectures, train them with different algorithms, and use the trained models, neural networks included, to improve our systems.

In this tutorial, we’ll explore the H2O platform, a powerful open-source machine learning framework. It simplifies the process of creating, training, and tuning models, making it an excellent choice for developers and data scientists. H2O's seamless integration with Java and its AutoML feature make it stand out among other ML tools.

2. Install the Open Source H2O Platform

We can download H2O from its official website. Once downloaded, starting the H2O platform is straightforward:

java -jar h2o.jar

After the application starts, access the web console at http://localhost:54321. On the main page, you'll find a list of actions and features available in the platform. Navigate to the Help option in the top menu and select Assist Me for guidance.

3. Prepare the Dataset

Before using the platform, we need to prepare a dataset for model training. For this tutorial, we’ll solve a classic classification problem using the Iris dataset.

This dataset contains various features of flowers, such as:

  • Sepal length and width
  • Petal length and width
  • The flower variety (Setosa, Versicolor, Virginica)

Download the dataset in CSV format. A sample of the dataset looks like this:

"sepal.length","sepal.width","petal.length","petal.width","variety"
5.1,3.5,1.4,.2,"Setosa"
4.9,3,1.4,.2,"Setosa"
4.7,3.2,1.3,.2,"Setosa"
...
5.8,2.7,3.9,1.2,"Versicolor"
6,2.7,5.1,1.6,"Versicolor"
5.4,3,4.5,1.5,"Versicolor"
...
6.5,3,5.2,2,"Virginica"
6.2,3.4,5.4,2.3,"Virginica"
5.9,3,5.1,1.8,"Virginica"
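
Before uploading the file, it can help to check that it parses as expected. The following is a minimal sketch in plain Java, not part of H2O, that counts the rows per variety. It assumes the variety is the last column and that no field contains a comma. The class and method names are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IrisCsvSummary {

    // Counts the rows per variety, skipping the header line.
    // Assumes the variety is the last column and no field contains a comma.
    static Map<String, Integer> countByVariety(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",");
            String variety = fields[fields.length - 1].replace("\"", "");
            counts.merge(variety, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Header plus one row per variety, taken from the sample above.
        List<String> sample = List.of(
            "\"sepal.length\",\"sepal.width\",\"petal.length\",\"petal.width\",\"variety\"",
            "5.1,3.5,1.4,.2,\"Setosa\"",
            "5.8,2.7,3.9,1.2,\"Versicolor\"",
            "6.5,3,5.2,2,\"Virginica\"");
        System.out.println(countByVariety(sample)); // {Setosa=1, Versicolor=1, Virginica=1}
    }
}
```

For the downloaded file, the lines could be read with Files.readAllLines() and passed to the same method.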


4. Train the Model

4.1. Import Dataset

Upload the dataset using the Import Files option under the Data menu. After selecting the file, click the Import button.

4.2. Prepare Training and Test Datasets


Split the dataset into training and testing sets using the Split Frame option. Apply an 80/20 ratio, so that 80% of the rows are used for training and the remaining 20% for validation.
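
To make the idea of an 80/20 split concrete, here is a small sketch in plain Java. It only illustrates the concept and does not describe how H2O's Split Frame works internally, so the platform may not produce exactly these sizes. The class name, method name, and seed are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    // Shuffles the row indices 0..rowCount-1 and returns two lists:
    // the first holds trainRatio of the rows, the second holds the rest.
    static List<List<Integer>> split(int rowCount, double trainRatio, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int trainSize = (int) Math.round(rowCount * trainRatio);
        return List.of(
            new ArrayList<>(indices.subList(0, trainSize)),
            new ArrayList<>(indices.subList(trainSize, rowCount)));
    }

    public static void main(String[] args) {
        // The Iris dataset has 150 rows.
        List<List<Integer>> parts = split(150, 0.8, 42L);
        System.out.println("train rows: " + parts.get(0).size()); // train rows: 120
        System.out.println("test rows: " + parts.get(1).size());  // test rows: 30
    }
}
```

Shuffling before cutting matters here because the CSV file is ordered by variety. Without it, the test set would contain only the last variety.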

4.3. Build the Model

Navigate to the Model menu and select the Random Forest algorithm. Configure the parameters such as:

  • training_frame: the training part of the split
  • validation_frame: the testing part of the split
  • response_column: the column to predict, variety in our case

Once set, click Build Model.

4.4. AutoML Function

H2O's AutoML feature simplifies model selection. Under the Model menu, select Run AutoML and configure parameters like max_runtime_secs, which limits how long the AutoML run may take.

AutoML trains several models and ranks them on a leaderboard by performance, making it easier to choose the best one.

4.5. Download the Model

After training, download the model artifacts:

  • Download Gen Model: A JAR file (h2o-genmodel.jar) with the library that Java applications use to load the model.
  • Download Model Deployment Package (MOJO): The model itself, packaged as a ZIP archive.


5. Use the Model Predictions From Java Application

5.1. Add the H2O Archives

Place the downloaded H2O archives in your project’s libs folder and add them to the classpath.

5.2. Dependencies

Then declare the h2o-genmodel dependency in the pom.xml, using the system scope to point it at the local JAR:

<dependency>
    <groupId>ai.h2o</groupId>
    <artifactId>h2o-genmodel</artifactId>
    <version>1.0</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/libs/h2o-genmodel.jar</systemPath>
</dependency>

5.3. Use the Manually Built Model for Prediction

Using the MOJO archive, wrap the model with the EasyPredictModelWrapper class.


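package com.baeldung.h2o;

import java.io.IOException;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import hex.genmodel.MojoModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.exception.PredictException;
import hex.genmodel.easy.prediction.MultinomialModelPrediction;

public class H2OModelLiveTest {

    Logger logger = LoggerFactory.getLogger(H2OModelLiveTest.class);

    @Test
    public void givenH2OTrainedModel_whenPredictTheIrisByFeatures_thenExpectedFlowerShouldBeReturned() throws IOException, PredictException {
        // Path to the MOJO archive downloaded from the H2O console
        String mojoFilePath = "libs/mojo.zip";

        // Load the model and wrap it for row-based predictions
        MojoModel mojoModel = MojoModel.load(mojoFilePath);
        EasyPredictModelWrapper model = new EasyPredictModelWrapper(mojoModel);

        // The keys must match the column names of the training dataset
        RowData row = new RowData();
        row.put("sepal.length", 5.1);
        row.put("sepal.width", 3.4);
        row.put("petal.length", 4.6);
        row.put("petal.width", 1.2);

        MultinomialModelPrediction prediction = model.predictMultinomial(row);

        Assertions.assertEquals("Versicolor", prediction.label);

        // Log the probability the model assigns to each class
        logger.info("Class probabilities: ");
        for (int i = 0; i < prediction.classProbabilities.length; i++) {
            logger.info("Class {}: {}", i, prediction.classProbabilities[i]);
        }
    }
}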

We create an input row with the four flower features and call the predictMultinomial() method to classify it. The test output shows the predicted label and the probability of each class:

Predicted: Versicolor
Probability: 0.9597
19:33:48.648 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class probabilities: 
19:33:48.653 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 0: 0.016846955011789237
19:33:48.653 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 1: 0.9597659357519948
19:33:48.653 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 2: 0.023387109236216036
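
The log prints class indices rather than names. The predicted label is the class with the highest probability, which can be sketched in plain Java as follows. The label order used here (Setosa, Versicolor, Virginica) is an assumption and should be checked against the actual model, for example through the wrapper's response domain values if your h2o-genmodel version exposes them. The class name is made up for this illustration.

```java
public class ArgMaxSketch {

    // Returns the index of the largest value in the array.
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed label order; verify it against the actual model.
        String[] labels = {"Setosa", "Versicolor", "Virginica"};
        // Probabilities copied from the log output above.
        double[] probabilities = {0.016846955011789237, 0.9597659357519948, 0.023387109236216036};
        int index = argMax(probabilities);
        System.out.println("Class " + index + " = " + labels[index]); // Class 1 = Versicolor
    }
}
```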

5.4. Use the AutoML Model for Prediction

Next, we load the best model proposed by AutoML and make the same prediction. Only the path to the MOJO archive changes:

@Test
public void givenH2OTrainedAutoMLModel_whenPredictTheIrisByFeatures_thenExpectedFlowerShouldBeReturned() throws IOException, PredictException {
    String mojoFilePath = "libs/automl_model.zip";

    MojoModel mojoModel = MojoModel.load(mojoFilePath);
    EasyPredictModelWrapper model = new EasyPredictModelWrapper(mojoModel);

    RowData row = new RowData();
    row.put("sepal.length", 5.1);
    row.put("sepal.width", 3.4);
    row.put("petal.length", 4.6);
    row.put("petal.width", 1.2);

    MultinomialModelPrediction prediction = model.predictMultinomial(row);

    Assertions.assertEquals("Versicolor", prediction.label);

    logger.info("Class probabilities: ");
    for (int i = 0; i < prediction.classProbabilities.length; i++) {
        logger.info("Class " + i + ": " + prediction.classProbabilities[i]);
    }
}

The AutoML model also predicts Versicolor. Its probability is lower (about 0.85, compared with about 0.96 for the Random Forest model), but it gives us a second opinion on the classification:

20:28:06.440 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class probabilities: 
20:28:06.443 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 0: 0.08536296008169375
20:28:06.443 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 1: 0.8451806663486182
20:28:06.443 [main] INFO  com.baeldung.h2o.H2OModelLiveTest - Class 2: 0.06945637356968806

6. Conclusion

In this article, we explored the H2O platform, a robust and user-friendly tool for machine learning. By leveraging its features like AutoML and Java integration, we can build powerful models without delving deep into Python-based ML stacks. H2O's ease of use and flexibility make it a valuable asset for both beginners and experienced developers aiming to implement machine learning in their applications.

Key takeaways:

  • Simplified ML model training and deployment with H2O.
  • AutoML feature for automated model selection.
  • Seamless integration with Java for production-ready applications.

Start experimenting with H2O today and unlock the potential of machine learning in your projects!
