Question 1

How do I install and use isolation-forest with Apache Spark?

Accepted Answer

Add the library as a Gradle or Maven dependency, then import the Scala classes to train models on Spark DataFrames. The README gives step-by-step code examples for building, training, and scoring.

Question 2

What's the difference between Isolation Forest and Extended Isolation Forest in this library?

Accepted Answer

Standard Isolation Forest uses axis-aligned splits, each thresholding a single feature, while Extended Isolation Forest uses random hyperplane splits to reduce directional bias. The extended version tends to do better on correlated features but may perform worse on some datasets, as shown in the benchmarks.
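To make the two split types concrete, here is a minimal pure-Python sketch of a single partitioning step in each style. It is an illustration of the idea only, not the library's Scala implementation: the function names are hypothetical, and the hyperplane threshold is drawn uniformly over the projected range, which is a simplifying assumption.

```python
import random


def axis_aligned_split(points, rng):
    """Standard style: pick one feature at random and threshold it."""
    dim = len(points[0])
    feature = rng.randrange(dim)
    values = [p[feature] for p in points]
    threshold = rng.uniform(min(values), max(values))
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return left, right


def hyperplane_split(points, rng):
    """Extended style: threshold a random linear combination of all features."""
    dim = len(points[0])
    # Random direction; each component is drawn from a standard normal.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    projections = [sum(n * x for n, x in zip(normal, p)) for p in points]
    # Assumption: threshold drawn uniformly over the projected range.
    threshold = rng.uniform(min(projections), max(projections))
    left = [p for p, proj in zip(points, projections) if proj < threshold]
    right = [p for p, proj in zip(points, projections) if proj >= threshold]
    return left, right


rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
a_left, a_right = axis_aligned_split(points, rng)
h_left, h_right = hyperplane_split(points, rng)
```

The axis-aligned cut can only be perpendicular to a feature axis, whereas the hyperplane cut can take any orientation, which is where the reduction in directional bias comes from. The hyperplane version also computes a dot product per point instead of a single comparison.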

Question 3

Can I run isolation-forest models in Python without Spark?

Accepted Answer

For inference only, you can export standard Isolation Forest models to ONNX format using the Python converter, then run them with ONNX Runtime. Training and other core operations require Spark and Scala.
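As a rough sketch of the inference side, the helpers below load an already-converted model with ONNX Runtime and run one batch through it. The helper names and the model path passed to `load_session` are hypothetical, and the input dtype and output layout of the converted model are not assumed here: the input name is read from the session and all outputs are returned as-is.

```python
def load_session(model_path):
    """Open an ONNX model file with ONNX Runtime (requires the onnxruntime package)."""
    # Imported lazily so score_batch below can be used with any session-like object.
    import onnxruntime as ort

    return ort.InferenceSession(model_path)


def score_batch(session, batch):
    """Feed one batch to the model's first input and return every output."""
    # Read the input name from the model rather than hard-coding it.
    input_name = session.get_inputs()[0].name
    # Passing None as the output list asks ONNX Runtime for all outputs.
    return session.run(None, {input_name: batch})
```

With a real model, `batch` would typically be a 2-D NumPy array whose dtype matches what the converted model declares; check `session.get_inputs()[0]` for the expected type and shape before scoring.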

Question 4

How does LinkedIn's isolation-forest compare to scikit-learn's IsolationForest?

Accepted Answer

LinkedIn's library is designed for distributed computing on Spark and scales to large datasets, while scikit-learn's IsolationForest is meant for single-machine use. LinkedIn's library also includes the Extended Isolation Forest variant, which scikit-learn lacks, but it has a steeper setup cost.

Question 5

How do I export a trained model to ONNX format?

Accepted Answer

Use the isolation-forest-onnx Python package to convert saved Spark model files. Load the Avro and metadata files from HDFS, run the converter, and save the ONNX model, as detailed in the ONNX conversion section.

Question 6

What are the performance implications of using Extended Isolation Forest?

Accepted Answer

Extended Isolation Forest can improve detection on high-dimensional datasets such as ionosphere but may degrade detection quality on others such as ForestCover. Its hyperplane splits also add computational cost, as noted in the benchmarks.
