join(C)

join -- join two relations

Syntax

join [ -a file | -v file ] [ -e str ] [ -o list ] [ -t c ] [ -j[file] field | -1 field -2 field ] file1 file2

Description

The join command prints to the standard output a join of the two relations specified by the lines of file1 and file2. Either file1 or file2 can be a dash ``-'', in which case the standard input is used.

file1 and file2 must be sorted in increasing collating sequence (defined by the current locale; see locale(M)) on the fields on which they are to be joined, normally the first in each line.

There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, followed by the rest of the line from file1, then the rest of the line from file2.

Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded.

join recognizes these options:

-a file: In addition to the normal output, produces a line for each unpairable line in file, where file is 1 or 2.
-e str: Replaces empty output fields by string str.
-j[file] field: Joins on field (1,2,3...) of file (1 or 2). If file is missing, the same field from each file is used (equivalent to -j1 field -j2 field or -1 field -2 field).
-1 field: Equivalent to -j1 field.
-2 field: Equivalent to -j2 field.
-o list: Each output line comprises the fields specified in list, each element of which has the form file.field formed from the file number (1 or 2) and the field number (1,2,3,...). The list may consist of single or multiple command line arguments, each element separated by blanks or commas.
-t c: Uses character c as a field separator. Every appearance of c in a line is significant.
-v file: Only produce output for each unpairable line in file, where file is 1 or 2.

Exit values

join returns the following values:

0: all input files were output successfully
>0: an error occurred

Examples

The first example joins /etc/passwd and /etc/group to produce a list of user names and their login groups. /etc/passwd and /etc/group are sorted on the keys to be joined. The output is written to temporary files, the join performed on these, and the output sorted by user name:

   PTMP=/tmp/passwd{$$}
   GTMP=/tmp/group{$$}
   sort -t: +3 -4 /etc/passwd > $PTMP
   sort -t: +2 -3 /etc/group > $GTMP
   join -t: -j1 4 -j2 3 -o 1.1 2.1 $PTMP $GTMP | sort -t: +0 -1
   rm $PTMP $GTMP

In the next example an employer has two files of data about his employees. The first file, E_pers, contains the name, address, and payroll number of each employee. The second file, E_dept, contains the payroll number, position, salary, and department of each employee. The fields in each employee's record are separated by colons ``:'', and each record occupies one line of a file.

The format of a record in the file E_pers would be:

   Name:Address:Payroll#

Here are some example records that might be in the file E_pers:

   Selkirk C:33 Wingfield Court, Anytown, Pa 65000:133
   Friday M:17 Awberry Court, Ubique, Ca 18480:91
   Fletcher J:14 Tulip Gardens, Appledorn, NY 10010:231

Each record in the file E_dept would have the format:

   Payroll#:Position:Salary:Department

The following are some example records that might appear in E_dept:

   91:Manager:85000:Marketing
   133:Clerk:22000:Accounts
   231:Clerk:19500:Accounts

The employer wants to know which employees earn more than $20,000.

Firstly, the fields in the file E_pers are put in the order:

   Payroll#:Name:Address

This ensures that the field that the files are to be joined on (payroll number) is the first field in both files. The reordering is performed using awk(C), and its output is piped through sort(C). The sample records from E_pers given above would become:

   133:Selkirk C:33 Wingfield Court, Anytown, Pa 65000
   231:Fletcher J:14 Tulip Gardens, Appledorn, NY 10010
   91:Friday M:17 Awberry Court, Ubique, Ca 18480

Note that the records are sorted into order of payroll number according to the collating sequence (not increasing numeric order). The reordered and sorted file is named P_tmp:

awk -F: '{print $3":"$1":"$2}' E_pers | sort -t: +0 -1 > P_tmp

Next, the records in the file E_dept are sorted on payroll number and the output is placed in D_tmp:

sort -t: +0 -1 E_dept > D_tmp

The sorted files are now joined on their first fields (payroll number), and the output is filtered using awk. A final pass through sort puts the output in alphabetical order:

join -t: -o 1.2 2.3 P_tmp D_tmp | awk -F: '$2>20000 {print $1" $"$2}' | sort

This command prints the employee's name (the second field of P_tmp) and their salary (the third field of D_tmp) if it is more than $20000. For the sample records given above, the output would be:

   Friday M $85000
   Selkirk C $22000

The output from join can be directed to a file (E_joined):

join -t: -o 1.1 1.2 1.3 2.2 2.3 2.4 P_tmp D_tmp > E_joined

Omitting the -o option causes all fields to be written to the output file, with the fields from file1 being followed by those from file2. However, the field on which the files were joined is only included once. Thus the join command above could be entered as:

join -t: P_tmp D_tmp > E_joined

The file E_joined may be processed using awk to extract any other information required. Here a list of all employees sorted by name within each department is output:

cat E_joined | awk -F: '{print $6" "$2}' | sort

For the sample records used in the example above, the output would be:

   Accounts Fletcher J
   Accounts Selkirk C
   Marketing Friday M

Limitations

With default field separation, the collating sequence is that of sort -b. With -t, the sequence is that of a plain sort.

Standards conformance

join is conformant with:

ISO/IEC DIS 99452:1992, Information technology Portable Operating System Interface (POSIX) Part 2: Shell and Utilities (IEEE Std 1003.21992);
AT&T SVID Issue 2;
X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.