Friday, 4 March 2016

tAggregateRow and tAggregateSortedRow in Talend

This is our input for demonstration.

[Image: tAggregateRow vs tAggregateSortedRow in Talend]

I know this many columns are not required, but I still wanted to use them.

tAggregateRow: Receives a flow and aggregates it based on one or more columns. Each output row provides the aggregation key and the relevant result of the set operations (min, max, sum…).

Question 1: Display the maximum quantity by continent.

Step 1: Create a simple job with the input given above, a tAggregateRow and a tLogRow.

Step 2: Connect the input to tAggregateRow and apply the following settings.

  • Add two columns to the output schema of the tAggregateRow component, for Quantity and Continent. Your final schema should look like the image below.
[Image: tAggregateRow vs tAggregateSortedRow]
  • Configure tAggregateRow as follows:
    • In the "Group by" table, add Continent as both the input and output column.
    • In the "Operations" table, add Quantity as both the input and output column and select max in the Function column. See the image below for details.
[Image: tAggregateRow vs tAggregateSortedRow]
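
As a rough illustration of what this configuration computes, here is a minimal Python sketch of "max Quantity grouped by Continent". The seven sample rows are made up for illustration (they are not the input used in this post), and the function name is hypothetical; it only mimics the idea of a hash-based group-by, not Talend's generated code.

```python
# Hypothetical sample rows (continent, quantity); NOT the data from this post.
rows = [
    ("Asia", 10),
    ("Europe", 7),
    ("Asia", 25),
    ("Africa", 4),
    ("Europe", 12),
    ("Asia", 3),
    ("Africa", 9),
]


def max_quantity_by_continent(rows):
    """Group by continent and keep the largest quantity seen per group."""
    result = {}
    for continent, quantity in rows:
        # The dictionary remembers every key, so input order does not matter.
        if continent not in result or quantity > result[continent]:
            result[continent] = quantity
    return result


print(max_quantity_by_continent(rows))
# {'Asia': 25, 'Europe': 12, 'Africa': 9}
```

Because every key is remembered until the end of the input, this style of aggregation does not need the rows to arrive in any particular order.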

That completes the basic settings. We can now execute the job and get the output without any problem; the question was easy. With tAggregateSortedRow it becomes more complicated, because the official description of tAggregateSortedRow says:

tAggregateSortedRow: Receives a sorted flow and aggregates it based on one or more columns. Each output row provides the aggregation key and the relevant result of the set operations (min, max, sum…).

Let's see how it behaves with our example job.

Add another subjob with the same input and output, but replace tAggregateRow with tAggregateSortedRow. Use the same settings as for tAggregateRow, and additionally set "Input Rows Count"=7 (we have only seven rows).

But the outputs are different. See the image below, which shows both outputs.

[Image: tAggregateRow vs tAggregateSortedRow output]

The outputs are different because the flow reaching the tAggregateSortedRow component is not sorted. This gives us our first difference:

tAggregateSortedRow works on sorted rows only, whereas tAggregateRow performs the same operation without requiring sorted rows.
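
To see why sorting matters, consider an aggregator that only compares each row with the one before it. Python's `itertools.groupby` behaves this way, so it makes a convenient analogy. This is a sketch under that assumption, with made-up sample rows, and is not taken from Talend's implementation.

```python
from itertools import groupby

# Hypothetical sample rows (continent, quantity); NOT the data from this post.
rows = [
    ("Asia", 10),
    ("Europe", 7),
    ("Asia", 25),
    ("Africa", 4),
    ("Europe", 12),
    ("Asia", 3),
    ("Africa", 9),
]


def max_over_consecutive_runs(rows):
    """Aggregate only runs of adjacent rows that share the same key."""
    return [
        (continent, max(quantity for _, quantity in group))
        for continent, group in groupby(rows, key=lambda row: row[0])
    ]


for line in max_over_consecutive_runs(rows):
    print(line)
```

On this unsorted input no two adjacent rows share a continent, so the sketch emits seven "groups", one per row, instead of three. An aggregator of this kind gives correct totals only when equal keys arrive next to each other.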

Step 2: Add tSortRow to the tAggregateSortedRow flow.

  • Add a tSortRow component after the input and connect it to the input and to tAggregateSortedRow using the main flow.
  • Configure the tSortRow component as follows.
    • Sync the columns using the sync button.
    • Inside the "Criteria" table, add one row and set
      • Schema column=continent
      • sort num or alpha?=alpha
      • Order asc or desc?=asc
  • Now execute the same job and we will get the output below.
[Image: tAggregateRow vs tAggregateSortedRow output 2]

Now the results match, but the rows come out in a different order.

This gives us our second difference:

tAggregateRow does not sort the result, but tAggregateSortedRow works on a sorted flow, which is why it produces its result in sorted order.
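
The effect of putting a sort in front of a run-based aggregator can be sketched in Python as follows. The sample rows and the function name are hypothetical, and `sorted` plus `itertools.groupby` only stand in for tSortRow and tAggregateSortedRow by analogy.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (continent, quantity); NOT the data from this post.
rows = [
    ("Asia", 10),
    ("Europe", 7),
    ("Asia", 25),
    ("Africa", 4),
    ("Europe", 12),
    ("Asia", 3),
    ("Africa", 9),
]


def max_sorted_then_grouped(rows):
    """Sort by continent (alpha, ascending), then aggregate adjacent runs."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (continent, max(quantity for _, quantity in group))
        for continent, group in groupby(ordered, key=itemgetter(0))
    ]


print(max_sorted_then_grouped(rows))
# [('Africa', 9), ('Asia', 25), ('Europe', 12)]
```

The maxima now agree with a hash-based group-by, and the groups appear in the sort order of the key rather than in order of first appearance.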

This is the final job design used for the demonstration.

[Image: tAggregateRow vs tAggregateSortedRow job]

We will now use the same job for the rest of the demonstration.

Step 3: Modify the tAggregateSortedRow settings.

  • We are working with a fixed flow input, so we know how many rows are in the input flow. To see what happens when the count is wrong, we will change it to
    • "Input Rows Count"=0
  • Execute the job and you will get the output below.
[Image: tAggregateRow vs tAggregateSortedRow output 3]

This gives us another difference:

tAggregateRow does not depend on the input row count, which means we can use the tAggregateRow component without knowing the input row count, whereas tAggregateSortedRow requires the input row count in advance.

Apart from the differences above, I did not see any major differences between these components; they behave similarly.
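
One plausible reason a streaming, run-based aggregator would need the row count is that it can only emit a group once it knows the group is finished: either the key changes, or the last row has arrived. The sketch below illustrates that idea in Python. It is a guess at the mechanism, not Talend's generated code; `aggregate_sorted_stream`, `expected_rows` and the sample rows are all hypothetical.

```python
def aggregate_sorted_stream(rows, expected_rows):
    """Emit (key, max) per run of equal keys.

    Hypothetical mechanism: the final run is flushed only when the row
    counter reaches expected_rows.
    """
    output = []
    current_key, current_max = None, None
    for index, (key, quantity) in enumerate(rows, start=1):
        if current_key is not None and key != current_key:
            # Key changed: the previous group is complete, so emit it.
            output.append((current_key, current_max))
            current_key, current_max = None, None
        if current_key is None:
            current_key, current_max = key, quantity
        else:
            current_max = max(current_max, quantity)
        if index == expected_rows:
            # Declared last row reached: flush the group in progress.
            output.append((current_key, current_max))
            current_key, current_max = None, None
    return output


# Hypothetical rows, already sorted by continent; NOT the data from this post.
sorted_rows = [
    ("Africa", 4), ("Africa", 9),
    ("Asia", 10), ("Asia", 25), ("Asia", 3),
    ("Europe", 7), ("Europe", 12),
]

print(aggregate_sorted_stream(sorted_rows, expected_rows=7))
# [('Africa', 9), ('Asia', 25), ('Europe', 12)]
print(aggregate_sorted_stream(sorted_rows, expected_rows=0))
# [('Africa', 9), ('Asia', 25)]
```

In this sketch a wrong count means the last group is never flushed. Whether Talend fails in exactly this way is not something the sketch can confirm; it only shows why such a component might ask for the count up front, while a hash-based aggregator that emits everything at the end of the input would not.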
