Tuesday, 7 November 2017

Merging the Content of Files Using TUnite

Scenario: Iterate on files and merge the content

The following Job iterates on a list of files then merges their content and displays the final 2-column content on the console.

Dropping and linking the components

  1. Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and tLogRow.

  2. Connect tFileList to tFileInputDelimited using an Iterate connection, and connect the remaining components using Row > Main links.

Configuring the components

  1. In the tFileList Basic settings view, browse to the directory where the files to merge are stored.

    The files are pretty basic and contain a list of countries and their respective score.

  2. In the Case Sensitive field, select Yes to consider the letter case.

  3. Select the tFileInputDelimited component, and display this component's Basic settings view.

  4. In the File Name/Stream field, press Ctrl+Space to open the variable completion list and select tFileList.CURRENT_FILEPATH from the global variable list, so that every file from the directory defined in tFileList is processed.

  5. Click the Edit Schema button and manually define the 2-column schema to reflect the input files' content.

    For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.

  6. Click OK to validate the setting and accept to propagate the schema throughout the Job.

  7. Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.

  8. In the Basic settings view of tLogRow, select the Table option so the output values display properly.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run on the Run console to execute the Job.

    The console shows the data from the various files, merged into one single table.
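The flow above can be sketched in plain Java as well (a minimal sketch, assuming hypothetical `*.csv` file names and a semicolon delimiter; tUnite itself simply concatenates the rows coming from each iteration):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class MergeDelimitedFiles {
    // Reads every *.csv file in the directory and concatenates the rows,
    // mirroring the tFileList -> tFileInputDelimited -> tUnite flow.
    public static List<String[]> merge(Path dir) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path file : files) {
                for (String line : Files.readAllLines(file)) {
                    if (!line.isEmpty()) {
                        rows.add(line.split(";"));   // Country;Points
                    }
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Prints the merged 2-column content, like tLogRow's Table mode.
        for (String[] row : merge(Paths.get(args[0]))) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

The iteration order over files is filesystem-dependent, just as the tFileList iteration order depends on its sort settings.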

Monday, 30 October 2017

About sharing Talend development w/ version control


Per http://www.talendforge.org/forum/viewtopic.php?id=6876 (dated Jun 2009)

Q: Is it possible to configure Talend Open Studio with a repository on a shared drive??

A1: No, TOS can only be used on a local machine. To share a project, you can use Talend Integration Suite, the subscription extension of TOS, which supports multiple users, groups, specific access for these users, specific privileges, etc. For more info, please access our website.

A2: If your drive is mounted, you can try to change the workspace location by adding the "-data" option to the shortcut: "C:\TOS\TOS-win32-x86.exe -data Z:\mySharedWorkspace"
This remote workspace will be locked by the first person who uses it, so you won't be able to work on it at the same time as a coworker: that capability is only available in TIS.

Per http://stackoverflow.com/questions/4604823/talend-project-in-svn (dated Jan 2011)

Q: I need to use Talend Open Studio with a SVN. What is the folder that I should put in the repository?

A1: Well, the "official" answer is to upgrade to Talend Integration Suite (TIS), which includes SVN integration. It goes a long way to synchronizing the activities of several developers using a shared repository.

If TIS is not an option for you, you might have some success by putting the project directory of your Talend Open Studio (TOS) under SVN control. This is the directory immediately under "workspace" that has the same name as your project. You would have to perform all SVN operations, such as commit and update, manually on this directory, for example using Tortoise (if you're on Windows). TOS might be able to use this project as if it was not under SVN. I personally haven't tried this, and would recommend using TIS instead.

A2: I prepared the following for my team. http://www.dariopardo.com/talend-open-studio/svnintegration/ It basically involves doing what was said above. (note – this one uses Subversion repository w/ TortoiseSVN client)

A3: I faced a similar issue and wrote a blog about it; see Putting Talend Open Studio projects under version control. Basically you should version your entire workspace. (note – this one uses Git)

A4: Putting a TOS workspace under SVN is strongly discouraged, as CVS/SVN are file-based by nature while a TOS workspace is directory-based. The probability that you incur workspace corruption is very high.

However, I achieved good results putting it under Git instead (have a look here for an idea). I must say it's not elegant, and putting the entire workspace under version control is a tremendous waste of space (you will commit compiled files, logs, history, temp files...), but it's the only safe solution for your metadata integrity.

Be aware! In the *<workspace_root>/.java* subtree there are external libraries and the classpath files for each job in your project that you executed at least once. These paths are absolute. This means that if you plan to use your version-controlled workspace in a shared environment, everyone on the team must place the workspace in the same absolute location in their local file system (i.e. c:/talend_git/workspace). Otherwise you'll get a class-not-found exception upon executing jobs. Sad but true.

So, my conclusion --- Talend version control for shared development is possible, but unless we need to collaborate on Talend jobs frequently and are very skilled in SVN/Git, it can be tricky to incorporate all team members' Talend projects in a repository and occasionally take over for continuing development. The learning curve to integrate Talend with version control software is likely high. As mentioned in answer A4 above, the chance of a corrupted file is very high, and I know that can ruin the whole Talend project rather than just one Talend job.

What's the alternative --- just have a repository so everyone can export his/her own projects (or jobs) for others to grab as needed. We have /home/developers/Talend-jobs on dev.berkeley.edu (pahma-dev) we can use, or set up an account on code@berkeley.edu (which I had difficulty learning previously), or maybe the easiest is to use Berkeley's "box".

Friday, 20 October 2017

Cycle Handling in PostgreSQL

Below is one way to handle cyclic data in a PostgreSQL database.

CREATE TABLE folder (id INT, name VARCHAR(255), parent INT);

INSERT INTO folder VALUES (1, '/', NULL);
INSERT INTO folder VALUES (2, 'src', 1);
INSERT INTO folder VALUES (3, 'main', 2);
INSERT INTO folder VALUES (4, 'org', 3);
INSERT INTO folder VALUES (5, 'test', 2);

WITH RECURSIVE cte (id, name, parent, pathh, path, cycle) AS
(
  SELECT id, name, parent,
         CAST(id AS VARCHAR(30)) AS pathh,
         ARRAY[folder.id] AS path,
         FALSE
  FROM folder
  WHERE parent IS NULL
  UNION ALL
  SELECT f.id, f.name, f.parent,
         CAST(f.id || '-->' || c.pathh AS VARCHAR(30)) AS pathh,
         c.path || f.id AS path,
         f.id = ANY(c.path) AS cycle
  FROM folder f, cte c
  WHERE f.parent = c.id AND NOT cycle
)
SELECT c.* FROM cte c
ORDER BY 1;


Friday, 13 October 2017

ORACLE PLSQL Exception Handling



SQL> create table a (x number(10), y number(10));
Table created.

SQL> insert into a values(1, 5);
SQL> insert into a values(2, 10);
SQL> insert into a values(3, 15);
SQL> insert into a values(3, 0);
SQL> insert into a values(4, 20);
SQL> insert into a values(5, 25);
SQL> commit;
SQL> select x, y from a order by x;

X Y
---------- ----------
1 5
2 10
3 15
3 0
4 20
5 25
SQL>
declare
  v_temp number := 30;
  v_tot  number;
  v_x    number;
  v_y    number;
  cursor c_temp is
    select x, y from a order by x;
begin
  open c_temp;
  loop
    fetch c_temp into v_x, v_y;
    exit when c_temp%notfound;
    v_tot := v_x + v_temp / v_y;
    dbms_output.put_line('Total is :' || v_tot);
  end loop;
  close c_temp;
exception
  when others then
    dbms_output.put_line('Found Oracle error: ' || SQLERRM);
end;
/

Total is :7
Total is :5
Total is :5
Found Oracle error: ORA-01476: divisor is equal to zero

PL/SQL procedure successfully completed.

Only partial records are printed: the WHEN OTHERS handler sits outside the loop, so the first divide-by-zero ends the run.

The code below prints totals for all valid rows, because the exception is handled inside the loop:

declare
  v_temp number:= 30;
  v_tot number;
  v_x number;
  v_y number;
  cursor c_temp is
    select x, y from a order by x;
begin
  open c_temp;
  loop
    fetch c_temp into v_x, v_y;
    exit when c_temp%notfound;
    begin
      v_tot:=v_x+v_temp/v_y;
      dbms_output.put_line('Total is :'||v_tot);
    exception
      when others then
      dbms_output.put_line('Found Oracle error: '||SQLERRM);
    end;
  end loop;
  close c_temp;
end;
/

Thursday, 12 October 2017

PL/SQL: Working with Cursors and Exception Handling

Simple stored procedure that opens a cursor dynamically:

create or replace procedure asf_cursor is
  TYPE EmpCurTyp IS REF CURSOR;
  c2 EmpCurTyp;
  cursor c1 is (select 1 a from dual union all select 2 a from dual);
  dynamic_query varchar2(1000);
  i int;
  var1 varchar2(100);
BEGIN
  open c1;
  loop
    fetch c1 into i;
    EXIT WHEN c1%NOTFOUND;
    dynamic_query := 'select ''' || '~' || i || ''' from dual';
    open c2 for dynamic_query;
    LOOP
      FETCH c2 INTO var1;
      EXIT WHEN c2%NOTFOUND;
    END LOOP;
    close c2;
    DBMS_OUTPUT.PUT_LINE(var1);
  end loop;
  close c1;
END;
/

set serveroutput on 
execute asf_cursor;

 

Exception Handling for Table Not Found:

-- set serveroutput on;
DECLARE
  TYPE EmpCurTyp IS REF CURSOR;
  c2 EmpCurTyp;
  var1 varchar2(30);
  -- "duall" is deliberately misspelled so the dynamic open fails with ORA-00942
  drop_tab varchar2(100) := 'Select * from duall union all Select cast(2 as varchar(5)) from duall';
  clob_out clob;
BEGIN
  /*
  open c2 for select * from duall;
  LOOP
    FETCH c2 INTO var1;
    EXIT WHEN c2%NOTFOUND;
    dbms_output.put_line(var1);
  END LOOP;
  close c2;
  */
  open c2 for drop_tab;
  loop
    fetch c2 into var1;
    EXIT WHEN c2%NOTFOUND;
    clob_out := clob_out || var1;
  end loop;
  close c2;
  dbms_output.put_line(clob_out);
EXCEPTION
  WHEN OTHERS THEN
    IF (SQLCODE = -942) THEN
      DBMS_OUTPUT.PUT_LINE('Table Not Found');
      clob_out := '~' || null;
      dbms_output.put_line(clob_out);
    END IF;
END;
/

Tuesday, 10 October 2017

Setting Different Java Version for Eclipse

  • Open eclipse config file eclipse.ini in your Eclipse folder.
  • Add -vm yourPath\Java\jre6\bin\javaw.exe, with -vm and the path each on their own line and before any -vmargs line, like:

    -startup
    plugins/org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
    --launcher.library
    plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20120522-1813
    -product
    org.eclipse.epp.package.java.product
    --launcher.defaultAction
    openFile
    --launcher.XXMaxPermSize
    256M
    -vm
    C:\Java\jre6\bin\javaw.exe
    ...

Friday, 6 October 2017

NT Authentication In Talend

The usual solution is to place ntlmauth.dll in the windows/system32 (or SysWOW64) folder. But instead of placing it in the system32 folder, we can place it in our own local folder and have it loaded by our Talend jobs while establishing a connection to MS SQL Server.


Design time configuration:

1. Copy ntlmauth.dll to the bin folder inside the Talend installation location. If the bin folder is not present, create it.

2. While executing the job, go to Advanced settings, select "Use Specific JVM arguments" and add the following additional argument:

-Djava.library.path=./bin

3. Execute the Talend job; it will use Windows authentication.



Runtime configuration

1. While creating the package, ensure the "bin" folder containing ntlmauth.dll is also bundled.
2. Edit the .bat or .sh file and add the following entry

-Djava.library.path=./bin



3. Execute the batch file; the job should run without any SSO error.
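To verify that the -Djava.library.path argument actually reached the JVM (a quick sanity check in plain Java, not a Talend component):

```java
public class LibraryPathCheck {
    // Returns the directories the JVM will search for native libraries
    // such as ntlmauth.dll.
    public static String libraryPath() {
        return System.getProperty("java.library.path", "");
    }

    public static void main(String[] args) {
        // If ./bin does not appear here, ntlmauth.dll will not be found
        // and the SSO connection will fail.
        System.out.println("java.library.path = " + libraryPath());
    }
}
```

Dropping this into a tJava component (or running it standalone with the same JVM arguments) shows immediately whether the ./bin entry was picked up.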

Tuesday, 3 October 2017

Talend Console logs to file

Redirecting Talend Console logs to file

Talend Open Studio for Data Integration does not have a built-in component to redirect the console trace to a file. This post provides a solution for redirecting Talend console logs to a file that is easy to follow and extend. A previous article, logging-using-talend, showed how to customize Talend logs; this one shows how to redirect the console output itself. (Talend Exchange also offers a custom component that performs a similar job.)

Solution



Redirecting Talend console logs is required by almost all Talend projects and is very easy to implement. Use a tJava component at the start of any job and enter the following lines into it.


java.io.File folder = new java.io.File("_logs/");
if (!folder.exists()) {
    if (folder.mkdir()) {
        System.out.println("Directory is created!");
    } else {
        System.out.println("Failed to create directory!");
    }
}
// The directory name here must match the one created above ("_logs/").
java.io.File logFile = new java.io.File("_logs/" + projectName + "_.log");
java.io.PrintStream ts = new java.io.PrintStream(new java.io.FileOutputStream(logFile));
System.setOut(ts);
System.setErr(ts);

After executing the job, the code above creates a flat file named "projectName_.log" and starts writing the data that would otherwise appear on the console. You can create a new file at each execution by prefixing or postfixing the file name with a time stamp, as in the following code.

java.io.File file = new java.io.File(projectName+"_"+TalendDate.getDate("CCYYMMDDhhmmss")+".log");

projectName is a variable provided by Talend, and you can get the date-stamp postfix from another routine, TalendDate.getDate("CCYY-MM-DD hh:mm:ss"). Remove all the special characters from the time stamp. projectName can also be replaced by another global variable, jobName.

After successfully executing this job, you can see a ProjectName_TimeStamp.log file created in the root folder of Talend. This file contains the complete console stack trace.
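Outside of Talend, the same timestamped name can be built with plain Java (a sketch: SimpleDateFormat stands in for Talend's TalendDate routine, and the project name is hypothetical):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimestampedLogName {
    // Plain-Java equivalent of TalendDate.getDate("CCYYMMDDhhmmss"):
    // builds a log file name such as MyProject_20171003141530.log
    public static String logFileName(String projectName, Date now) {
        String stamp = new SimpleDateFormat("yyyyMMddHHmmss").format(now);
        return projectName + "_" + stamp + ".log";
    }

    public static void main(String[] args) {
        System.out.println(logFileName("MyProject", new Date()));
    }
}
```

Because the pattern contains no separators, no special characters need to be stripped from the result.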

Drawback



The drawback of this strategy is that nothing is printed to the console, whether running inside the IDE or as an independent execution. This makes development and debugging harder, because the developer must open the log file after every run to find out whether the job executed successfully or generated an error.
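One way around this drawback (a sketch, not a Talend feature) is to "tee" the stream, so output still reaches the console while also being written to the file:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class TeeStream {
    // An OutputStream that duplicates every byte to two targets, so
    // System.setOut keeps writing to the console as well as the log file.
    public static OutputStream tee(OutputStream a, OutputStream b) {
        return new OutputStream() {
            @Override public void write(int c) throws IOException {
                a.write(c);
                b.write(c);
            }
            @Override public void flush() throws IOException {
                a.flush();
                b.flush();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        PrintStream both = new PrintStream(
                tee(System.out, new FileOutputStream("job.log")), true);
        System.setOut(both);
        System.out.println("visible on the console AND written to job.log");
    }
}
```

The file name "job.log" is illustrative; in a tJava component it would typically be built from projectName as shown earlier.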

Monday, 2 October 2017

Talend Logging

java.io.File file = new java.io.File("C:/temp/myTalendLog.txt");
java.io.PrintStream ps = new java.io.PrintStream(new java.io.FileOutputStream(file));
System.setOut(ps);

Sunday, 24 September 2017

Upper Casing a char

char is a primitive type that represents a single 16 bit Unicode character while Character is a wrapper class that allows us to use char primitive concept in OOP-kind of way.

Example for char,

char ch = 'a';

Example of Character,

Character.toUpperCase(ch);

It converts 'a' to 'A'.
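Putting the two together in a runnable form:

```java
public class CharUpper {
    // Character is the wrapper class around the primitive char;
    // its static helper upper-cases a single character.
    public static char upper(char ch) {
        return Character.toUpperCase(ch);
    }

    public static void main(String[] args) {
        System.out.println(upper('a')); // prints A
    }
}
```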

Thursday, 17 August 2017

Query to Generate alter statement to disable and enable foreign key constraints in SQL Server


SELECT DISTINCT
    OBJECT_NAME(f.parent_object_id) AS Reference_TableName,
    'Alter table ' + OBJECT_NAME(f.parent_object_id) + ' nocheck constraint ' + f.name AS Alter_Statement_Disable,
    'Delete from ' + OBJECT_NAME(f.referenced_object_id) AS Delete_Statement,
    'Alter table ' + OBJECT_NAME(f.parent_object_id) + ' check constraint ' + f.name AS Alter_Statement_Enable
FROM
    sys.foreign_keys AS f
INNER JOIN
    sys.foreign_key_columns AS fc
        ON f.OBJECT_ID = fc.constraint_object_id
INNER JOIN
    sys.tables t
        ON t.OBJECT_ID = fc.referenced_object_id
WHERE
    OBJECT_NAME(f.referenced_object_id) = 'dashboard'

Sunday, 30 July 2017

Overriding Context Environments and Variables

1. Passing Context Environment :-

We can create different context groups in a Talend Job and set the values of the context variables according to these groups. For example, we can create a context group for Dev (development environment) and one for Prod (production environment) and provide different values to the context variables based on the environment.

For using context group Prod defined in the Job:

H:\demo\context_demo> context_demo_run.bat --context=Prod

For using context group Dev defined in the Job:

H:\demo\context_demo> context_demo_run.bat --context=Dev

If we do not provide a context group, the Job will automatically use the default context group defined in the Job.

For using default context group defined in the Job:

H:\demo\context_demo> context_demo_run.bat

2. Overriding Context Variables - We can also override the values of individual context variables/parameters through the command line. Pass the argument --context_param, followed by the name of a parameter defined in the Job and its value, to override any parameter in the context. Use the following command to override the value of a context parameter or variable.

H:\demo\context_demo> context_demo_run.bat --context_param <param-name>=<param-value>

For example, to override the value of context parameter DB_NAME to localhost:

H:\demo\context_demo> context_demo_run.bat --context_param DB_NAME=localhost
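The generated launcher handles this parsing for you; purely as an illustration of the --context_param name=value convention, a minimal parser might look like (class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ContextParamParser {
    // Collects every "--context_param name=value" pair from the command line,
    // mimicking how overrides reach a job's context.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length - 1; i++) {
            if ("--context_param".equals(args[i])) {
                String[] pair = args[i + 1].split("=", 2);
                if (pair.length == 2) {
                    params.put(pair[0], pair[1]);
                }
            }
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse(args));
    }
}
```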

Thursday, 18 May 2017

Export Model Repository Mappings using infacmd oie command

Export Command Without Control File: 

./infacmd.sh oie ExportObjects -dn infav10_domain -un Admin -pd "@@@@@@" -sdn Native -pn Integration -rs mrs_irpo -fp /tmp/Mappings.xml -ow true

Export Command With Control File: 
./infacmd.sh oie ExportObjects -dn infav10_domain -un Admin -pd "@@@@@@" -sdn Native -pn Integration -rs mrs_irpo -fp /tmp/Mappings.xml -ow true -cp /tmp/ec_mapping.xml

Control File Samples

Control File: all mappings in the project, without considering sub-folders

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<exportParams>
<folders>
<folder recursive="false" >
<objectList type="Mapping" />
</folder>
</folders>
</exportParams>

Control File: all mappings in the project, considering sub-folder contents

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<exportParams>
<folders>
<folder recursive="true" >
<objectList type="Mapping" />
</folder>
</folders>
</exportParams>


Control File: export a particular mapping

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<exportParams>
<folders>
<folder recursive="false" >
<objectList type="Mapping">
<object name="m_PassThrough_Mapping"/>
</objectList>
</folder>
</folders>
</exportParams>

Control File: export mappings from a particular folder in a project

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<exportParams>
<folders>
<folder path="/SourceAdapter" recursive="false" >
<objectList type="Mapping" />
</folder>
</folders>
</exportParams>


Wednesday, 17 May 2017

Operation not allowed for reason code “7” on table in DB2

This issue generally occurs after altering/modifying an existing table in IBM DB2.

Execute the query below when this error occurs.

CALL sysproc.admin_cmd('REORG TABLE TABLE_NAME');

Below are the situations in which reorganizing a table in DB2 is warranted.

  • A high volume of insert, update, and delete activity on tables accessed by queries. If many rows are inserted, there may not be enough free space to keep them in the clustered sequence. If many rows are deleted, the table will still have the space allocated and a REORG can free the unnecessary space.
  • Significant changes in the performance of queries that use an index with a high cluster ratio. Some applications access groups of rows using the clustered index and may not perform well if the table becomes unclustered.
  • Executing RUNSTATS to refresh statistical information does not improve performance.
  • In some cases, a RUNSTATS utility can collect the current statistics and resolve performance problems. If a table has become unclustered or if it contains a large amount of free space, the REORG utility may be needed to improve access efficiency.
  • The REORGCHK command indicates a need to reorganize your table.
  • If Row compression is being implemented for a table, the REORG Utility can be used to build the compression dictionary and compress the data rows.
Note: an alter such as the following can put a table into the reorg-pending state: alter table "APPS"."DEPARTMENTS" alter column DEPARTMENT_ID set not null;