A library enabling MongoDB to serve as an input source or output destination for Hadoop MapReduce jobs and ecosystem tools.
MongoDB Connector for Hadoop is a library that enables MongoDB to function as an input source or output destination for Hadoop MapReduce jobs and other Hadoop ecosystem tools. It allows data stored in MongoDB, or in its BSON backup files, to be processed with tools such as Spark, Pig, Hive, and Flume. The connector solves the problem of integrating NoSQL document data into big data processing pipelines without extensive up-front data transformation.
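A minimal job-driver sketch shows how the connector plugs into a MapReduce job: the connector's `MongoInputFormat` and `MongoOutputFormat` classes and the `mongo.input.uri` / `mongo.output.uri` / `mongo.input.query` properties come from the mongo-hadoop documentation, while the hosts, database names, and query are placeholders. This is a configuration sketch, not a complete runnable job (it needs the Hadoop and connector JARs plus mapper/reducer classes).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import com.mongodb.hadoop.MongoInputFormat;
import com.mongodb.hadoop.MongoOutputFormat;

public class MongoJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connector properties; URIs and collection names are placeholders.
        conf.set("mongo.input.uri", "mongodb://localhost:27017/db.events");
        conf.set("mongo.output.uri", "mongodb://localhost:27017/db.results");
        // Optional: push a MongoDB query down to the source collection
        // so only matching documents reach the mappers.
        conf.set("mongo.input.query", "{\"year\": {\"$gte\": 2015}}");

        Job job = Job.getInstance(conf, "mongo-hadoop-example");
        job.setJarByClass(MongoJobDriver.class);
        job.setInputFormatClass(MongoInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        // job.setMapperClass(...); job.setReducerClass(...);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```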
The connector is aimed at data engineers and developers working in Hadoop ecosystems who need to process MongoDB data in distributed computing workflows. It is particularly useful for teams that use MongoDB for operational data and Hadoop for analytical processing.
Developers choose this connector because it provides native integration between MongoDB and Hadoop tools, supports flexible data sourcing from various MongoDB deployments, and enables query-based filtering for efficient data processing. Its ability to work with BSON backup files on cloud storage like S3 adds deployment flexibility.
MongoDB Connector for Hadoop
Supports multiple Hadoop ecosystem tools (MapReduce, Pig, Spark, Hive, and Flume), as listed in the README, so the same MongoDB data can feed several processing platforms.
Can read data directly from MongoDB or from BSON backup files stored on S3, HDFS, or local filesystems, reducing data movement barriers as highlighted in the features.
Allows filtering source data using MongoDB's query language for targeted processing, which can improve efficiency by limiting data transfers.
Can output data in BSON format for easy import back into MongoDB using mongorestore, facilitating data pipeline round-tripping.
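The BSON output feature above enables a round trip back into MongoDB: a sketch of the workflow, with all paths, hosts, and database/collection names as placeholders. It assumes the job wrote BSON part files to HDFS and that `mongorestore` is on the path; it is not runnable without a cluster.

```shell
# Hypothetical round trip: copy a BSON output part from HDFS,
# then load it straight back into MongoDB with mongorestore.
hadoop fs -copyToLocal /output/results/part-r-00000.bson .
mongorestore --host localhost --db analytics --collection results \
    part-r-00000.bson
```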
The project is officially EOL with no further development, bug fixes, or documentation updates, as stated in the README notice, making it risky for long-term use.
The connector was only tested against older releases such as Hadoop 2.4 and Spark 1.4 per the stated requirements, so it may not work with contemporary distributions and tools.
Requires manually copying the JAR files to each node in the Hadoop cluster per the build instructions, adding setup overhead and maintenance burden.
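The manual distribution step amounts to placing the built connector JAR on every node. A sketch of that step, simulated here with local directories standing in for cluster nodes; on a real cluster you would `scp` into each host's Hadoop lib directory, and the JAR name below is a stand-in for whatever the build produces.

```shell
# Simulate distributing the connector JAR to each node's Hadoop lib dir.
mkdir -p build/libs node1/hadoop/lib node2/hadoop/lib
touch build/libs/mongo-hadoop-core.jar   # stand-in for the built JAR
for node in node1 node2; do
    cp build/libs/mongo-hadoop-core.jar "$node/hadoop/lib/"
done
ls node1/hadoop/lib node2/hadoop/lib
```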