Tuesday, 13 March 2018

Split large files into a number of smaller files in Unix



To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:
  split [options] filename prefix
Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:
  -l linenumber

  -b bytes


  • In this simple example, assume samplefile is 3,000 lines long:
      split samplefile
    This will output three 1,000-line files: xaa, xab, and xac.
  • Working on the same file, this next example is more complex:
      split -l 500 samplefile segment
    This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.
  • Finally, assume samplefile is a 160KB file:
      split -b 40k samplefile segment
    This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.
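The examples above can be run end to end in a scratch directory. The samplefile name follows the post; generating its 3,000 lines with seq is an assumption for illustration:

```shell
# Work in a throwaway directory so the generated pieces don't clutter anything
cd "$(mktemp -d)"

# Recreate the 3,000-line samplefile used in the examples above
seq 3000 > samplefile

# Split into 500-line pieces named segmentaa, segmentab, ...
split -l 500 samplefile segment

wc -l segmentaa    # each piece holds 500 lines

# Reassemble the pieces (shell glob order matches split's naming order) and verify
cat segment* > rejoined
cmp samplefile rejoined && echo "files match"
```

Because split names its outputs in lexicographic order, `cat segment*` always reassembles the original file exactly.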

Thursday, 1 March 2018

Identifying Bottleneck Threads Using a Thread Dump



top - lists CPU and memory utilization per process (PID)

top -H -p PID (e.g. top -H -p 12100) - lists CPU utilization per thread of a multithreaded process

Generate the thread dump using jstack:

jstack -l PID > filename

e.g. jstack -l 12100 > ThreadDump_01032018.txt


Conversion of Decimal to Hexadecimal (jstack reports thread IDs as hexadecimal nid values):

printf "%x \n" PID

e.g. printf "%x \n" 12100
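Putting the steps together: a minimal sketch of the lookup, assuming 12100 is the thread ID reported in the PID column of top -H (the example value from this post). The grep is shown commented out since it needs a dump captured from a live JVM:

```shell
# Thread ID as shown in the PID column of `top -H` (example value from this post)
TID=12100

# jstack reports native thread IDs in hexadecimal, as nid=0x...
HEX=$(printf "%x" "$TID")
echo "nid=0x$HEX"    # nid=0x2f44

# With a real dump captured via `jstack -l 12100 > ThreadDump_01032018.txt`,
# the busy thread's stack can then be located with:
# grep "nid=0x$HEX" ThreadDump_01032018.txt
```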




Thursday, 15 February 2018

Code Snippet to clear Talend Hash Components


Get the MapHashFile instance used by tHashInput:

org.talend.designer.components.hashfile.common.MapHashFile mf_tHashInput_2 = org.talend.designer.components.hashfile.common.MapHashFile.getMapHashFile();


Clear the cached hash files, keyed by job name and PID:

String[] hashOutputs = { "1", "10", "2", "23", "24", "25", "26", "27", "28", "29" };
for (String n : hashOutputs) {
    mf_tHashInput_2.clearCache("tHashFile_" + jobName + "_" + pid + "_tHashOutput_" + n);
}

Saturday, 27 January 2018

Convert Long into Integer

How to convert a Long value into an Integer value in Java?

Integer i = (int) (long) theLong; // unbox to long, narrow to int (may overflow), autobox to Integer

Equivalently: Integer i = theLong.intValue(); — both throw a NullPointerException if theLong is null.


Tuesday, 7 November 2017

Merging the Content of Files Using tUnite

Scenario: Iterate on files and merge the content

The following Job iterates on a list of files then merges their content and displays the final 2-column content on the console.

Dropping and linking the components

  1. Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite, and tLogRow.

  2. Connect the tFileList to the tFileInputDelimited using an Iterate connection, and connect the other components using Row > Main links.

Configuring the components

  1. In the tFileList Basic settings view, browse to the directory where the files to merge are stored.

    The files are pretty basic and contain a list of countries and their respective score.

  2. In the Case Sensitive field, select Yes to consider the letter case.

  3. Select the tFileInputDelimited component, and display this component's Basic settings view.

  4. Fill in the File Name/Stream field: press Ctrl+Space to open the variable completion list and select tFileList.CURRENT_FILEPATH from the global variable list, so that all files from the directory defined in the tFileList are processed.

  5. Click the Edit Schema button and manually set the two-column schema to reflect the input files' content.

    For this example, the 2 columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.

  6. Click OK to validate the setting and accept to propagate the schema throughout the Job.

  7. Then select the tUnite component and display the Component view. Notice that the output schema strictly reflects the input schema and is read-only.

  8. In the Basic settings view of tLogRow, select the Table option to display the output values properly.

Saving and executing the Job

  1. Press Ctrl+S to save your Job.

  2. Press F6, or click Run in the Run console, to execute the Job.

    The console shows the data from the various files, merged into one single table.