pondhas.blogg.se - Pentaho data integration 6.0


Pentaho data integration 6.0 driver

Configuring Pentaho

I added the driver to Pentaho’s BI Server as described here. In short, this means adding the driver jar file to the appropriate lib directory and restarting the service. Once the driver is exposed to Pentaho it can be used to configure a Data Source.

The sales data described below was denormalized using Apache Spark, so that each row contains the joined data with all dimensions. The flattened dataset was indexed in Elasticsearch without strings being analysed. This resulted in roughly 10M rows with random sales volumes per day. Obviously a very small dataset, but good enough for a proof of concept running on a single laptop. The entire data process is shown in the figure below.

Dimension data is duplicated in order to avoid joins, which makes it faster to aggregate data. This might seem very expensive with respect to storage and memory usage, but this is not necessarily so, at least not when using Elasticsearch. Under the hood Elasticsearch maintains a vocabulary with identifiers for all terms it encounters. This suits this type of data, which has a very controlled vocabulary of product names, categories, and other low-cardinality fields. Since Elasticsearch usually stores the document source, denormalization of data will require more storage. It is however possible to increase document compression or disable _source document storage altogether (see this page). When _source storage is disabled it is no longer possible to retrieve the original documents, but this is not a problem when you just want to aggregate data (as in the use case described in this post). Storage requirements for this dataset were cut in half when document storage was disabled!
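To make the two index settings mentioned here concrete (strings indexed without analysis, and _source storage disabled), here is a minimal sketch of what such a mapping could look like in the Elasticsearch 2.x syntax. The type name "sale" and all field names are hypothetical; the post does not show the actual mapping that was used.

```python
import json

# Hypothetical field names for the flattened sales rows; the original
# post does not list the exact fields that were indexed.
STRING_FIELDS = [
    "product_name", "product_category", "product_subcategory",
    "store_name", "store_country", "currency_name",
]


def build_mapping(doc_type="sale", store_source=False):
    """Build an Elasticsearch 2.x style mapping as a plain dict.

    Strings are indexed as single, unanalysed terms ("not_analyzed"),
    and _source storage can be switched off to save disk space.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed"}
        for name in STRING_FIELDS
    }
    properties["date"] = {"type": "date"}
    properties["price"] = {"type": "double"}
    properties["volume"] = {"type": "long"}
    return {
        "mappings": {
            doc_type: {
                "_source": {"enabled": store_source},
                "properties": properties,
            }
        }
    }


mapping = build_mapping()
print(json.dumps(mapping, indent=2))
```

The printed JSON is the kind of body that would be sent when creating the index. Later Elasticsearch releases use different field types for strings, so this sketch only applies to the 2.x line used in this post.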

The following sections describe the data used, how the driver was loaded and configured within Pentaho and used to generate reports.

I created a typical BI sales dataset containing a single fact (daily point of sales) with a number of dimensions:

Time: containing the date, month, week and day fields.

Product: name, category, subcategory and price.

Store: name, country, currency (see below).

Currency: name and exchange rate (dollar).
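As a rough illustration of the fact and dimensions described in this section, and of the flattening step mentioned earlier, the sketch below joins one sale with its dimensions into a single wide row. It uses plain Python rather than Apache Spark, and all class names, field names and sample values are made up for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Currency:
    name: str
    dollar_rate: float  # exchange rate against the dollar


@dataclass(frozen=True)
class Store:
    name: str
    country: str
    currency: str  # key into the currency dimension


@dataclass(frozen=True)
class Product:
    name: str
    category: str
    subcategory: str
    price: float


@dataclass(frozen=True)
class Sale:
    day: date
    product: str  # key into the product dimension
    store: str    # key into the store dimension
    volume: int


def denormalize(sale, products, stores, currencies):
    """Join one fact row with all of its dimensions into one flat dict."""
    product = products[sale.product]
    store = stores[sale.store]
    currency = currencies[store.currency]
    _, iso_week, _ = sale.day.isocalendar()
    return {
        "date": sale.day.isoformat(),
        "month": sale.day.month,
        "week": iso_week,
        "day": sale.day.day,
        "product_name": product.name,
        "product_category": product.category,
        "product_subcategory": product.subcategory,
        "price": product.price,
        "store_name": store.name,
        "store_country": store.country,
        "currency_name": currency.name,
        "dollar_rate": currency.dollar_rate,
        "volume": sale.volume,
    }


# Made-up sample data, only to show the shape of a flattened row.
currencies = {"EUR": Currency("Euro", 1.1)}
stores = {"s1": Store("Store One", "Netherlands", "EUR")}
products = {"p1": Product("Espresso", "Beverages", "Coffee", 2.5)}
sale = Sale(date(2016, 3, 14), "p1", "s1", 7)

row = denormalize(sale, products, stores, currencies)
print(row)
```

Every flattened row repeats the product, store and currency values, which is exactly the duplication that makes aggregation possible without joins at query time.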

Pentaho data integration 6.0 trial

I put the SQL4ES driver to the test by using it to expose Elasticsearch as a data source for Pentaho. For this effort I used Elasticsearch version 2.2.1, Pentaho Business Analytics 6.0.1.0 Trial and SQL4ES driver version 0.8.2.2.

Pentaho data integration 6.0 update

In one of my previous posts, I described the Elasticsearch JDBC driver we developed, called SQL4ES.

Double-clicking on the "Data Integration" icon does not start PDI. It attempts to load something, then it closes immediately. It can be started, however, from spoon.sh with no problems. Even double-clicking on mand works just fine (provided, of course, that you previously ran "chmod +x *.sh" on the "data-integration" folder).

When attempting to run "JavaApplicationStub" directly from "Data Integration.app/Contents/Mac OS/", the following text is displayed:

JavaVM: Failed to load JVM: /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/bundle/Libraries/libserver.dylib
JavaVM FATAL: Failed to load the jvm library.
JNI_CreateJavaVM() failed, error: -1

As a side note, the machine has the Oracle JDK 1.7.0 update 45 installed; a change was required in "ist" of the JDK (located at "/Library/Java/JavaVirtualMachines/jdk1.7.0_45.