Monday, 28 May 2018

The differences between Unique match, First match and All matches


This article uses examples to illustrate the differences between the three match models named in the title. The tMap component provides these match models when it performs a JOIN (Inner Join / Left Outer Join) operation on data from homogeneous or heterogeneous sources.
Environment
This procedure was written with:
  • Talend Open Studio for Data Integration 5.3.1
  • Sun JDK build 1.6.0_26-b03
  • Windows XP SP3
Talend has verified that this procedure is compatible with all versions and editions of Talend Studio in which the tMap component is available.
Resolution
The following example shows how to use the match models with the tMap component.
This article assumes that you are familiar with the Talend interface and are able to create Talend Jobs with components and links.
For further information about how to create a Talend Job, see 4.2.1 How to create a Job.
For further information about the tMap component, see tMap or 9.2 tMap operation.
Source data
The main source reads as follows:
ID | Name
1  | Shong
2  | Elisa
3  | Sabrina
The lookup source reads as follows:
ID | Email
1  | Shong1@talend.com
1  | Shong2@talend.com
2  | Elisa@talend.com
3  | Sabrina@talend.com
We now perform an inner join between the main source and the lookup source, using ID as the join key, to produce the following data structure:
ID | Name | Email
Because ID 1 has two matching records in the lookup source, the result varies depending on the match model used.
Creating the Job
We use a tFixedFlowInput component to generate the main source and a second tFixedFlowInput component to generate the lookup source.
A tMap component performs the inner join and sends the result to a tLogRow component (in Table mode), which prints the result on the console. The demo Job is downloadable in the Related File section of this article.
Using the match models to generate different results
Unique match: this is the default option for the JOIN operation. When several lookup records match a main record, it outputs only the last matching record of the lookup source.
The result of the JOIN with the Unique match model reads as follows:
Starting job tMap_Match_modes at 17:46 25/09/2013.

[statistics] connecting to socket on port 3367
[statistics] connected
.--+-------+-------------------.
|          tLogRow_2           |
|=-+-------+------------------=|
|ID|Name   |Email              |
|=-+-------+------------------=|
|1 |Shong  |Shong2@talend.com  |
|2 |Elisa  |Elisa@talend.com   |
|3 |Sabrina|Sabrina1@talend.com|
'--+-------+-------------------'
[statistics] disconnected
Job tMap_Match_modes ended at 17:46 25/09/2013. [exit code=0]
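The "last match wins" behavior can be pictured with a small Python analogy. This is only a sketch, not the code Talend generates: the function name join_unique_match and the tuple layout are assumptions made for illustration, and the rows are those listed in the Source data section.

```python
# Hypothetical analogy of the Unique match model; not Talend's generated code.
# Rows as listed in the Source data section.
main_rows = [(1, "Shong"), (2, "Elisa"), (3, "Sabrina")]
lookup_rows = [
    (1, "Shong1@talend.com"),
    (1, "Shong2@talend.com"),
    (2, "Elisa@talend.com"),
    (3, "Sabrina@talend.com"),
]


def join_unique_match(main, lookup):
    """Inner join that keeps only the last lookup row for each key."""
    last_by_id = {}
    for key, email in lookup:
        # A later row with the same key overwrites the earlier one.
        last_by_id[key] = email
    # Inner join: main rows without any lookup match are dropped.
    return [(key, name, last_by_id[key]) for key, name in main if key in last_by_id]


unique_result = join_unique_match(main_rows, lookup_rows)
print(unique_result[0])  # (1, 'Shong', 'Shong2@talend.com')
```

Overwriting the dictionary entry on every duplicate key is what leaves Shong2@talend.com as the only surviving email for ID 1.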
First match: when several lookup records match a main record, it outputs only the first matching record of the lookup source.
The result of the JOIN with the First match model reads as follows:
Starting job tMap_Match_modes at 17:51 25/09/2013.

[statistics] connecting to socket on port 3942
[statistics] connected
.--+-------+-------------------.
|          tLogRow_2           |
|=-+-------+------------------=|
|ID|Name   |Email              |
|=-+-------+------------------=|
|1 |Shong  |Shong1@talend.com  |
|2 |Elisa  |Elisa@talend.com   |
|3 |Sabrina|Sabrina1@talend.com|
'--+-------+-------------------'
[statistics] disconnected
Job tMap_Match_modes ended at 17:51 25/09/2013. [exit code=0]
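As a rough Python analogy (again a sketch with hypothetical names, not Talend's implementation), First match corresponds to storing a key only the first time it is seen and ignoring later duplicates. The rows are those listed in the Source data section.

```python
# Hypothetical analogy of the First match model; not Talend's generated code.
# Rows as listed in the Source data section.
main_rows = [(1, "Shong"), (2, "Elisa"), (3, "Sabrina")]
lookup_rows = [
    (1, "Shong1@talend.com"),
    (1, "Shong2@talend.com"),
    (2, "Elisa@talend.com"),
    (3, "Sabrina@talend.com"),
]


def join_first_match(main, lookup):
    """Inner join that keeps only the first lookup row for each key."""
    first_by_id = {}
    for key, email in lookup:
        # setdefault stores the value only if the key is not present yet,
        # so later duplicates are ignored.
        first_by_id.setdefault(key, email)
    # Inner join: main rows without any lookup match are dropped.
    return [(key, name, first_by_id[key]) for key, name in main if key in first_by_id]


first_result = join_first_match(main_rows, lookup_rows)
print(first_result[0])  # (1, 'Shong', 'Shong1@talend.com')
```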
All matches: it outputs all matching records of the lookup source, producing one output row per match.
The result of the JOIN with the All matches model reads as follows:
Starting job tMap_Match_modes at 17:58 25/09/2013.

[statistics] connecting to socket on port 3381
[statistics] connected
.--+-------+-------------------.
|          tLogRow_2           |
|=-+-------+------------------=|
|ID|Name   |Email              |
|=-+-------+------------------=|
|1 |Shong  |Shong1@talend.com  |
|1 |Shong  |Shong2@talend.com  |
|2 |Elisa  |Elisa@talend.com   |
|3 |Sabrina|Sabrina1@talend.com|
'--+-------+-------------------'
[statistics] disconnected
Job tMap_Match_modes ended at 17:58 25/09/2013. [exit code=0]
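The All matches behavior can be sketched in Python by grouping the lookup rows per key and emitting one output row for every entry in the group. The function name join_all_matches is hypothetical and the code is only an illustration of the idea, not what tMap runs internally. The rows are those listed in the Source data section.

```python
from collections import defaultdict

# Hypothetical analogy of the All matches model; not Talend's generated code.
# Rows as listed in the Source data section.
main_rows = [(1, "Shong"), (2, "Elisa"), (3, "Sabrina")]
lookup_rows = [
    (1, "Shong1@talend.com"),
    (1, "Shong2@talend.com"),
    (2, "Elisa@talend.com"),
    (3, "Sabrina@talend.com"),
]


def join_all_matches(main, lookup):
    """Inner join that emits one output row per matching lookup row."""
    emails_by_id = defaultdict(list)
    for key, email in lookup:
        # Every lookup row is kept, in its original order.
        emails_by_id[key].append(email)
    result = []
    for key, name in main:
        # A key with no lookup rows yields nothing (inner join semantics).
        for email in emails_by_id.get(key, []):
            result.append((key, name, email))
    return result


all_result = join_all_matches(main_rows, lookup_rows)
print(len(all_result))  # 4
print(all_result[:2])
# [(1, 'Shong', 'Shong1@talend.com'), (1, 'Shong', 'Shong2@talend.com')]
```

With this model the main record for ID 1 appears twice in the output, once for each of its lookup records, which is why the result has four rows instead of three.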
