MapR Ecosystem Pack (MEP) 3.0 Provides Security for Apache Spark

Every Apache Spark development company has reason to rejoice: MapR Technologies has released version 3.0 of the MapR Ecosystem Pack (MEP). The release brings improved security for Apache Spark, new Apache Spark connectors for MapR-DB and HBase, integrations with Drill, and a faster version of Hive.

The following provides an overview of the latest MEP 3.0 updates for the projects running on MapR:

  • Apache Drill 1.10

This release brings enhancements to BI tool integration, end-to-end application security, and overall application performance.

  • Apache Hive 2.1.1

This version of Hive is faster, thanks to many performance improvements in data processing and querying.

  • Apache Spark 2.1.0

This release brings significant improvements to enterprise-level stability and security, making Spark applications enterprise-ready.

  • Native Spark Connector for MapR-DB JSON

This connector provides tight, real-time integration between Spark and MapR-DB JSON records, which improves the efficiency of database operations in applications.

  • MapR Streams

This package provides new APIs for C and Python.

  • MapR Installer

The installer adds features that simplify upgrades and make it easier to add components to an existing MEP installation.

How will these features benefit developers?

MapR Technologies says MEP addresses a common pain point: the complexity of coordinating many community projects and their versions. MEP takes on the development, testing, and integration work for open source projects such as Apache Drill, Apache Spark, Hive, and Myriad.

Beyond this 3.0 release, new MEP versions ship quarterly. This lets developers use the latest features of community-driven software such as Apache Spark and Apache Drill.

MEP handles version compatibility problems for developers. Instead of installing and upgrading each open source project separately, outside the versions bundled in MEP, developers can install the latest MEP, which guarantees compatibility among the project versions it contains. This frees developers at an Apache Spark development company to focus on the business logic in their code rather than troubleshooting compatibility issues between projects.

Upgrading the MEP version does not change the existing core MapR platform installation. As a result, the quarterly updates, which upgrade only the open source project stack, are easier to install.

Feature additions to Apache Drill and Apache Hive

Apache Hive 2.1.1

As mentioned above, Hive 2.1.1 speeds up data processing, lowers latency for interactive queries, and increases throughput for batch queries, improving its overall big data processing capabilities.

Key features:

  • 2X faster ETL.
  • New HiveServer UI for diagnostics and monitoring tools.
  • Dynamically partitioned hash joins.
  • Vectorized query execution.

Apache Drill 1.10

As mentioned above, this update improves BI tool integration, end-to-end security, and performance.

Key features:

  • Tableau native connectivity.
  • Support for Kerberos and MapR-SASL authentication.
  • Support for the CTTAS (CREATE TEMPORARY TABLE AS SELECT) command.
  • Ability to query data with Hue 3.12 on an experimental basis.
  • Improved compatibility with Parquet files.

What is special about the Apache Spark release in MEP 3.0?

As mentioned above, Apache Spark 2.1.0 focuses on enterprise-level stability and security. The release achieves this through the following features:

  • It has scalable partition handling.
  • The data type APIs are more stable in this version.
  • It includes more than 1,200 fixes to the Apache Spark 2.x line.
  • Secure connections using MapR-SASL or Kerberos can be established for inbound client connections to the Spark Thrift Server and for Apache Spark connections to the Hive Metastore.
  • It supports impersonation for SELECT statements.

All the content shared in this post belongs to the author of Apache Spark Consulting Company.
