Shared libraries

Importing symbols

Shared library code cannot directly use symbols defined outside a library, but an escape hatch exists. You can define pointers in the data area and arrange for those pointers to be initialized to the addresses of imported symbols. Library code then accesses imported symbols indirectly, delaying symbol binding until run time. Libraries can import both text and data symbols. Moreover, imported symbols can come from the user's code, another library, or even the library itself. In ``Imported symbols in a shared library'', the symbols _libc_ptr1 and _libc_ptr2 are imported from the user's code and the symbol _libc_malloc from the library itself.

[Figure: Imported symbols in a shared library]

The following guidelines describe when and how to use imported symbols.

Imported symbols that the library does not define

Non-shared libraries typically contain relocatable files, which allow undefined references. Although the host shared library is an archive, too, that archive is constructed to mirror the target library, which more closely resembles an a.out file. Neither target shared libraries nor a.out files can have unresolved references to symbols.

Shared libraries must import any symbols they use but do not define. Some shared libraries will derive from existing non-shared libraries. For the reasons stated above, it may not be appropriate to include all of the non-shared archive's modules in the target shared library. Remember, though, that if you exclude from the target shared library a symbol that the target shared library still references, you will have to import the excluded symbol.

Imported symbols that users must be able to redefine

Optionally, shared libraries can import their own symbols. Two standard libraries, libc and libmalloc, provide a malloc family. Even though most UNIX system commands use the malloc from the C library, they can choose either library or define their own.

Three possible strategies exist for building the shared C library. First, exclude the malloc(S) family. Other library members might still need it, however, so it would have to be an imported symbol. This works, but it yields smaller savings.

Second, include the malloc family but do not import it. This provides more savings for typical commands. However, other library routines call malloc directly, and those calls cannot be overridden. If an application tries to redefine malloc, the library calls will not use the alternate version. Furthermore, the link editor will find multiple definitions of malloc while building the application. To resolve this, the application developer will have to change the source code to remove the custom malloc, or refrain from using the shared library.

The third and most flexible strategy is to include malloc in the shared library but treat it as an imported symbol. Even though malloc is in the library, nothing else there refers to it directly; all references go through an imported symbol pointer. If the application does not redefine malloc, both application and library calls are routed to the library version. If the application does redefine malloc, all calls, including those made from inside the library, are routed to the application's version.

You might want to permit redefinition of all library symbols in some libraries. You can do this by importing all symbols the library defines, in addition to those it uses but does not define. Although this adds a little space and time overhead to the library, the technique allows a shared library to be one hundred percent compatible with an existing non-shared library at link time and run time. This is the strategy used for the installed version of the Shared C library.

Mechanics of importing symbols

For example, assume a shared library wants to import the symbol malloc. The original non-shared code and the shared library code appear below.

   Non-Shared Library

   extern char *malloc();

   funce()
   {
      ...
      p = malloc(n);
      ...
   }

   Shared Library Code

   /* See pointers.c below */
   extern char *(*_libc_malloc)();

   funce()
   {
      ...
      p = (*_libc_malloc)(n);
      ...
   }

Making this transformation is straightforward, but two sets of source code would be necessary to support both a non-shared and a shared library. Some simple macro definitions can hide the transformations and allow source code compatibility. A header file defines the macros, and a different version of this header file would exist for each type of library. The -I flag to cc(CP), documented in the Programmer's Reference, would direct the C preprocessor to look in the appropriate directory to find the desired file.

   Non-shared import.h

   /* empty */

   Shared import.h

   /*
    * Macros for importing
    * symbols. One #define
    * per symbol.
    */
   ...
   #define malloc (*_libc_malloc)
   ...
   extern char *malloc();
   ...

These header files let one source file serve both the original archive and the shared library, because they supply the indirections for imported symbols. In the shared version, the declaration of malloc in import.h actually declares the pointer _libc_malloc: after macro expansion, extern char *malloc(); becomes extern char *(*_libc_malloc)();.

   Common Source

   #include "import.h"
   

   extern char *malloc();
   

   funce()
   {
      ...
      p = malloc(n);
      ...
   }

Alternatively, one can hide the #include with #ifdef:

   Common Source

   #ifdef SHLIB
   #     include "import.h"
   #endif
   

   extern char *malloc();
   

   funce()
   {
      ...
      p = malloc(n);
      ...
   }

NOTE: When building the shared library, the #include can be turned on by defining SHLIB with the -D flag to cc(CP), for example cc -DSHLIB.

Of course the transformation is not complete. You must define the pointer _libc_malloc.

   File pointers.c
   ...
   char *(*_libc_malloc)() = 0;
   ...

NOTE: _libc_malloc is explicitly initialized to zero because it is an external data symbol, and external data in a shared library should be initialized.

Special initialization code sets the pointers. Shared library code should not use the pointer before it contains the correct value. In the example the address of malloc must be assigned to _libc_malloc. Tools that build the shared library generate the initialization code according to the library specification file.

Pointer initialization fragments

A host shared library archive member can define one or many imported symbol pointers. Regardless of the number, every imported symbol pointer should have initialization code.

This code goes into the a.out file and does two things. First, it creates an unresolved reference to make sure the symbol being imported gets resolved. Second, it sets the imported symbol pointers to their values before the process reaches main. Together, these steps ensure that whenever an imported symbol pointer can be used at run time, the imported symbol is present and the pointer is set properly.

NOTE: Initialization fragments reside in the host, not the target, shared library. The link editor copies initialization code into a.out files to set imported pointers to their correct values.

Library specification files describe how to initialize the imported symbol pointers. For example, the following specification line would set _libc_malloc to the address of malloc:

   ...
   #init pmalloc.o
   _libc_malloc    malloc
   ...

When mkshlib builds the host library, it modifies the file pmalloc.o, adding relocatable code to perform the following assignment statement:

   _libc_malloc = &malloc ;

When the link editor extracts pmalloc.o from the host library, the relocatable code goes into the a.out file. As the link editor builds the final a.out file, it resolves the unresolved references and collects all initialization fragments. When the a.out file is executed, the run time startup routines execute the initialization fragments to set the library pointers.

Selectively loading imported symbols

You can reduce unnecessary loading by writing C source files that define imported symbol pointers singly or in related groups. If an imported symbol must be individually selectable, put its pointer in its own source file (and archive member). This will give the link editor a finer granularity to use when it resolves the reference to the symbol.

For example, a single source file might define all pointers to imported symbols:

   Old pointers.c

   ...
   int (*_libc_ptr1)() = 0;
   char *(*_libc_malloc)() = 0;
   int (*_libc_ptr2)() = 0;
   ...

Allowing the loader to resolve only those references that are needed requires multiple source files and archive members. Each of the new files defines a single pointer:

   File         Contents
   ptr1.c       int (*_libc_ptr1)() = 0;
   pmalloc.c    char *(*_libc_malloc)() = 0;
   ptr2.c       int (*_libc_ptr2)() = 0;

Using the three files ensures that the link editor looks for the definition of an imported symbol, and loads the corresponding initialization code, only when the symbol is actually used.


NOTE: If one version of a library does not resolve any pointers in an archive member, it is then not possible to use any of those pointers in a future version of the shared library without breaking backward compatibility.

Referencing symbols in a shared library from another shared library

In general, import all symbols defined outside the shared library whenever possible.

However, this is not always possible, for example when code in the shared library being built performs floating-point operations. When the standard C compiler encounters such operations in C code, it generates calls to functions that perform the actual operations. These functions are defined in the C library and are normally resolved invisibly to the user when an a.out file is created, because the cc command automatically searches the relocatable (non-shared) version of the C library. In a shared library, however, these floating-point routine references must be resolved when the shared library is built, and the symbols cannot be imported, because their names and uses are invisible in the source code.

The #objects noload directive of mkshlib(CP) is provided to allow symbol references such as these to be resolved at the time the shared library is built, provided that the symbols are defined in another shared library. If there are unresolved references to symbols after the object files listed with the #objects directive have been link edited, the host shared libraries specified with the #objects noload directive are searched for absolute definitions of the symbols. The normal use of the directive is to search the shared version of the C library to resolve references to floating-point routines.

For this use, the syntax in the specification file would be

   #objects noload
      -lc_s

This would cause mkshlib to search for the host shared library libc_s.a in the default library locations and to use it to resolve references to any symbols left unresolved in the shared library being built. The -L option can be used to make mkshlib look for the specified library in locations other than the default ones.

Using or building a shared library

When building a shared library using #objects noload, you must make sure that for each symbol with an unresolved reference there is a version of the symbol with an absolute definition in the searched host shared libraries, before any relocatable version of that symbol. mkshlib will give a fatal error if this is not the case, because relocatable definitions do not have absolute addresses and therefore do not allow complete resolution of the target shared library.

When using a shared library built with references to symbols resolved from another shared library, both libraries must be specified on the cc command line. The dependent library must be specified on the command line before the libraries on which it depends. (See the section ``Building a shared library'' for more details.) If you provide a shared library which references symbols in another shared library, you should make sure that your documentation clearly states that users must specify both libraries when building a.out files.

It is possible to use #objects noload to resolve references to any symbols not defined in a shared library, as long as they are defined in some other shared library. Nevertheless, you are strongly encouraged to import as many symbols as possible and to use #objects noload only when absolutely necessary. You will probably need this feature only to resolve references to floating-point routines generated by the C compiler.

Importing symbols has several important benefits over resolving references through #objects noload. First, importing symbols is more flexible, because it allows users to define their own versions of library routines. Users can already do this with archive versions of a library; preserving this ability with the shared versions helps maintain compatibility.

Importing symbols also helps prevent unexpected name space collisions. If references to a symbol are resolved through the #objects noload mechanism and a user of the shared library also has an external definition of that symbol, the link editor will report multiple definitions of the symbol.

Finally, #objects noload has the drawback that both the library you build and all the libraries on which it depends must be available on every system involved. Anyone who wishes to create a.out files using your shared library will need the host shared libraries, and the target versions of all the libraries must be available on every system on which the a.out files are to be run.


© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003