Chapter 4
Introduction to Perl Modules
This chapter introduces you to the concepts behind references to Perl modules, packages, and classes. It also shows you how to create a few sample modules.

What Is a Perl Module?

A Perl module is a set of Perl code that acts like a library of function calls. The term module in Perl is synonymous with the word package. Packages are a feature of Perl 4, whereas modules are prevalent in Perl 5.

You can keep all your reusable Perl code specific to a set of tasks in a Perl module. Therefore, all the functionality pertaining to one type of task is contained in one file. It's easier to build an application on these modular blocks. Hence, the word module applies a bit more than package.

Here's a quick introduction to modules. Certain topics in this section will be covered in detail throughout the rest of the book. Read the following paragraphs carefully to get an overview of what lies ahead as you write and use your own modules.

What is confusing is that the terms module and package are used interchangeably in all Perl documentation, and these two terms mean the very same thing. So when reading Perl documents, just think "package" when you see "module" and vice versa.

So, what's the premise for using modules? Well, modules are there to package (pardon the pun) variables, symbols, and interconnected data items together. For example, using global variables with very common names such as $k, $j, or $i in a program is generally not a good idea. Also, a loop counter, $i, should be allowed to work independently in two different portions of the code. Declaring $i as a global variable and then incrementing it from within a subroutine will create unmanageable problems with your application code because the subroutine may have been called from within a loop that also uses a variable called $i. The use of modules in Perl allows variables with the same name to be created at different, distinct places in the same program.
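The loop-counter problem described above can be sketched in a few lines. This is only an illustration: the package name Report and the subroutine count_lines are hypothetical and do not appear in the chapter's listings.

```perl
use strict;
use warnings;

{
    package Report;              # hypothetical package, for illustration only
    our $i = 0;                  # this is $Report::i

    sub count_lines {
        my @lines = @_;
        $i = 0;
        $i++ for @lines;         # touches only $Report::i
        return $i;
    }
}

{
    package main;
    our $i;                      # this is $main::i, a separate variable

    for ($i = 0; $i < 3; $i++) {
        Report::count_lines('a', 'b');   # does not disturb the loop counter
    }
    print "main::i = $main::i, Report::i = $Report::i\n";
    # prints: main::i = 3, Report::i = 2
}
```

Had both pieces of code shared one global $i, the subroutine would have reset the counter on every pass and the loop would never have finished.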
The symbols defined for your variables are stored in an associative array, referred to as a symbol table. These symbol tables are unique to a package. Therefore, variables of the same name in two different packages can have different values.

Each module has its own symbol table of all symbols that are declared within it. The symbol table basically isolates synonymous names in one module from another. The symbol table defines a namespace, that is, a space for independent variable names to exist in. Thus, the use of modules, each with its own symbol table, prevents a variable declared in one section from overwriting the values of other variables with the same name declared elsewhere in the same program.

As a matter of fact, all variables in Perl belong to a package. The variables in a Perl program belong to the main package. All other packages within a Perl program either are nested within this main package or exist at the same level. There are some truly global variables, such as the signal handler array %SIG, that are available to all other modules in an application program and cannot be isolated via namespaces. Only those variable identifiers starting with letters or an underscore are kept in a module's symbol table. All other symbols, as well as the names STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG, are forced to be in package main.

Switching between packages affects only namespaces. All you are doing when you use one package or another is declaring which symbol table to use as the default symbol table for lookup of variable names. Only dynamic (package) variables are affected by the use of symbol tables. Variables declared by the use of the my keyword are still resolved within the code block they happen to reside in and are not referenced through symbol tables. In fact, the scope of a package declaration remains active only within the code block it is declared in.
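A minimal sketch of these scoping rules follows. The package name Bank and the variables $rate and $note are made up for the example; the point is that a package statement inside a block changes the default symbol table only until the block ends, while a my variable never enters a symbol table at all.

```perl
use strict;
use warnings;

our $rate = 5;                 # package variable: $main::rate
my  $note = 'lexical';         # lexical: not stored in any symbol table

sub show {
    package Bank;              # Bank is the default package until the block ends
    our $rate = 7;             # package variable: $Bank::rate
    return "$rate $main::rate $note";
}

my $seen = show();
print "$seen\n";               # prints: 7 5 lexical
print __PACKAGE__, "\n";       # prints: main (the switch ended with the block)

my $rate_in_table = exists $main::{rate} ? 1 : 0;
my $note_in_table = exists $main::{note} ? 1 : 0;
print "rate in %main:: = $rate_in_table, note in %main:: = $note_in_table\n";
# prints: rate in %main:: = 1, note in %main:: = 0
```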
Therefore, if you switch symbol tables by using a package within a subroutine, the original symbol table in effect when the call was made will be restored when the subroutine returns.

Switching symbol tables affects only the default lookup of dynamic variable names. You can still explicitly refer to variables, file handles, and so on in a specific package by prepending a packageName:: to the variable name. You saw what a package context was when using references in Chapter 3. A package context simply implies the use of the symbol table by the Perl interpreter for resolving variable names in a program. By switching symbol tables, you are switching the package context.

Modules can be nested within other modules. The nested module can use the variables and functions of the module it is nested within. For nested modules, you would have to use moduleName::nestedModuleName and so on. Using the double colon (::) is synonymous with using a single quote ('). However, the double colon is the preferred way of addressing variables within modules, now and in future versions of Perl.

Explicit addressing of module variables is always done with a complete reference. For example, suppose you have a module, Investment, which is the default package in use, and you want to address another module, Bond, which is nested within the Investment module. In this case, you cannot use Bond::. Instead, you would have to use Investment::Bond:: to address variables and functions within the Bond module. Using Bond:: would imply the use of a package Bond that is nested within the main module and not within the Investment module.

The symbol table for a module is actually stored in an associative array whose name is the module's name with two colons appended. The symbol table for a module called Bond will be referred to as the associative array %Bond::. The name for the symbol table for the main module is %main::, and can even be shortened to %::.
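To make the naming of symbol tables concrete, here is a small sketch that looks inside one. The contents of Investment::Bond ($yield, @coupons, and price) are invented for the example.

```perl
use strict;
use warnings;

{
    package Investment::Bond;    # contents are hypothetical
    our $yield   = 4.5;
    our @coupons = (1, 2, 3);
    sub price { return 100 }
}

# The symbol table is an ordinary hash named after the package plus "::".
my @names = sort keys %Investment::Bond::;
print "symbols: @names\n";

# A complete reference is needed to reach a variable in the nested package.
print "yield = $Investment::Bond::yield\n";      # prints: yield = 4.5

# The parent's symbol table holds an entry for the nested one.
print "Investment:: has a Bond:: entry\n" if exists $Investment::{'Bond::'};
```

The list of symbols should include yield, coupons, and price, one key per name regardless of whether the name refers to a scalar, an array, or a subroutine.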
Similarly, all nested packages have their symbols stored in associative arrays with double colons separating each nesting level. For example, in the Bond module that is nested within the Investment module, the associative array for the symbols in the Bond module will be named %Investment::Bond::.

A typeglob is really a global type for a symbol name. You can perform aliasing operations by assigning to a typeglob. One or more entries in an associative array for symbols will be used when an assignment via a typeglob is used. The actual value in each entry of the associative array is what you are referring to when you use the *variableName notation. Thus, there are two ways of referring to variable names in a package:

    *Investment::money = *Investment::bills;
    $Investment::{'money'} = $Investment::{'bills'};

In the first method, you are referring to the variables via a typeglob reference. The use of the symbol table, %Investment::, is implied here, and Perl will optimize the lookup for symbols money and bills. This is the faster and preferred way of addressing a symbol. The second method explicitly looks up the entries keyed by 'money' and 'bills' in the associative array used for symbols, %Investment::. This lookup is done dynamically and is not optimized by Perl. Therefore, the lookup is forced to check the associative array every time the statement is executed. As a result, the second method is not efficient and should be used only to demonstrate how the symbol table is implemented internally.

As another example, the statement

    *kamran = *husain;

causes variables, subroutines, and file handles that are named via the symbol kamran to also be addressed via the symbol husain. That is, all symbol entries in the current symbol table with the key kamran will now contain references to those symbols addressed by the key husain. To prevent such a global assignment, you can use explicit references.
For example, the following statement will let you address the contents of $husain via the variable $kamran:

    *kamran = \$husain;

However, any arrays such as @kamran and @husain will not be the same. Only what the reference specifies explicitly will be changed. To summarize, when you assign one typeglob to another, you affect all the entries in a symbol table regardless of the type of variable being referred to. When you assign a reference to a typeglob, you are affecting only one entry in the symbol table.

A Perl module file has the following format:

    package ModuleName;
    ...
    1;

The file has to be called ModuleName.pm. By convention, the name of a module file must end in the string .pm. The package statement is the first line of the file. The last line of the file must contain the 1; statement. This in effect returns a true value to the application program using the module. Without the 1; statement, the module will not load correctly.

The package statement tells the Perl interpreter to start with a new namespace domain. Basically, all your variables in a Perl script belong to a package called main. Every variable in the main package can be referred to as $main'variable.

Here's the syntax for such references:

    $packageName'variableName

The single quote (') is synonymous with the double colon (::) operator. I cover more uses of the :: operator in the next chapter. For the time being, you must remember that the following two statements are equivalent:

    $packageName'variableName;
    $packageName::variableName;

The double-colon syntax is considered standard in the Perl world. Therefore, to preserve readability, I use the double-colon syntax in the rest of this book unless it's absolutely necessary to make exceptions to prove a point.

The default use of a variable name defers to the current package active at the time of compilation. Thus, if you are in the package Finance and specify a variable $pv, the variable is actually equal to $Finance::pv.
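The difference between the two kinds of typeglob assignment discussed earlier can be seen in a short sketch. The variable names follow the text; the values stored in them are made up.

```perl
use strict;
use warnings;

our $husain = 'scalar';
our @husain = (1, 2, 3);
our ($kamran, @kamran);

*kamran = \$husain;              # alias only the scalar entry
$kamran = 'changed';
print "$husain\n";               # prints: changed
print scalar(@kamran), "\n";     # prints: 0 (the arrays are still separate)

*kamran = *husain;               # alias every entry under the name
print "@kamran\n";               # prints: 1 2 3
```

Assigning a reference touches one slot of the symbol table entry, whereas assigning a whole typeglob makes the scalar, array, hash, subroutine, and file handle of that name all shared.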
Using Perl Modules: use vs. require

You include Perl modules in your program by using the use or the require statement. Here's the way to use either of these statements:

    use ModuleName;
    require ModuleName;

Note that the .pm extension is not used in the code shown above. Also note that neither statement allows a file to be included more than once in a program. The returned value of true (1;) as the last statement is required to let Perl know that a required or used module loaded correctly and lets the Perl interpreter ignore any reloads. In general, it's better to use the use Module; statement than the require Module; statement in a Perl program to remain compatible with future versions of Perl.

For modules, you might want to consider continuing to use the require statement. Here's why: The use statement does a little bit more work than the require statement in that it alters the namespace of the module that includes another module. You want this extra update of the namespace to be done in a program. However, when writing code for a module, you may not want the namespace to be altered unless it's explicitly required. In this event, you will use the require statement.

The require statement searches the directories listed in the @INC array and records the full pathname of the loaded file in the %INC hash so that the functions and variables in the module's file are in a known location during execution time. Therefore, the functions in a module loaded with the require statement are reached via an explicit module reference at runtime.

The use statement does the same thing as the require statement because it also records the full pathnames of loaded modules in %INC. The code for the use function also goes a step further and calls an import function in the module being used to explicitly load the list of exported functions at compile time, thus saving the time required for an explicit resolution of a function name during execution.
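A brief sketch of the two statements side by side may help. It uses File::Basename and List::Util, two modules bundled with Perl 5, purely as stand-ins; the path passed to basename is only a sample string.

```perl
use strict;
use warnings;

require File::Basename;             # run time: loads the file, imports nothing
my $file = File::Basename::basename('/usr/lib/perl5/Letter.pm');

use List::Util qw(max);             # compile time: loads the file, then imports max()
my $largest = max(3, 9, 4);

print "$file $largest\n";           # prints: Letter.pm 9
print "List::Util was loaded from $INC{'List/Util.pm'}\n";
```

After require, the function has to be called by its full name. After use, the imported name max is part of the program's own namespace.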
Basically, the use statement is equivalent to

    require ModuleName;
    import ModuleName [list of imported functions];

The use of the use statement does change your program's namespace because the imported function names are inserted in the symbol table. The require statement does not alter your program's namespace. Therefore, the following statement

    use ModuleName ();

is equivalent to this statement:

    require ModuleName;

Functions are imported from a module via a call to a function called import. You can write your own import function in a module, or you can use the Exporter module and use its import function. In almost all cases, you will use the Exporter module to provide an import function instead of reinventing the wheel. (You'll learn more on this in the next section.) Should you decide not to use the Exporter module, you will have to write your own import function in each module that you write. It's much easier to simply use the Exporter module and let Perl do the work for you.

The Sample Letter.pm Module

The best way to illustrate the semantics of how a module is used in Perl is to write a simple module and show how to use it. Let's take the example of a local loan shark, Rudious Maximus, who is simply tired of typing the same "request for payment" letters. Being an avid fan of computers and Perl, Rudious takes the lazy programmer's approach and writes a Perl module to help him generate his memos and letters.

Now, instead of typing within fields in a memo template file, all he has to do is type a few lines to produce his nice, threatening note. Listing 4.1 shows you what he has to type.

Listing 4.1. Using the Letter module.

    1 #!/usr/bin/perl -w

The use Letter; statement is present to force the Perl interpreter to include the code for the module in the application program. The module should be located in the /usr/lib/perl5/ directory, or you can place it in any directory listed in the @INC array.
The @INC array is the list of directories that the Perl interpreter searches when attempting to load the code for the named module. The commented line (number 4) shows how to add the current working directory to the include path. The next four lines in the file generate the subject matter for the letter.

Here's the output from using the Letter module:

    To: Mr. Gambling Man

The Letter module file is shown in Listing 4.2. The name of the package is declared in the first line. Because this module's functions will be exported, I use the Exporter module. Therefore, the statement use Exporter; is required to inherit functionality from the Exporter module. Another required step is putting the word Exporter in the @ISA array to allow searching for Exporter.pm.
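As a rough sketch of the Exporter steps just described, the following single-file example defines a small module inline and then uses it. The module name Memo and its subroutines are hypothetical and are not taken from Letter.pm; normally the package would live in its own Memo.pm file, and the %INC line is there only so that this example can run as one file.

```perl
use strict;
use warnings;

BEGIN {
    package Memo;                       # hypothetical module, normally in Memo.pm
    require Exporter;
    our @ISA    = ('Exporter');         # inherit Exporter's import function
    our @EXPORT = ('salutation');       # names exported by default

    sub salutation { return "Dear $_[0]," }
    sub internal   { return 'not exported' }

    $INC{'Memo.pm'} = __FILE__;         # mark as loaded so "use Memo" skips the file search
}

use Memo;                               # require Memo, then Memo->import

print salutation('Mr. Gambling Man'), "\n";   # prints: Dear Mr. Gambling Man,
print Memo::internal(), "\n";                 # prints: not exported
```

Only salutation can be called by its short name. The subroutine internal is still reachable, but only through its fully qualified name.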
Let's now look at the code for Letter.pm in Listing 4.2.

Listing 4.2. The Letter.pm module.

    1 package Letter;

Lines containing the equal sign are used for documentation. You should document each module for your own reference; Perl modules do not need to be documented, but it's a good idea to write a few lines about what your code does. A few years from now, you may forget what a module is about. Good documentation is always a must if you want to remember what you did in the past!

I cover documentation styles used for Perl in Chapter 8, "Documenting Perl Scripts." For this sample module, the =head1 statement begins the documentation. Everything up to the =cut statement is ignored by the Perl interpreter.

Next, the module lists all the functions exported by this module in the @EXPORT array. The @EXPORT array defines all the function names that can be called by outside code without a package prefix. If you do not list a function in this @EXPORT array, it won't be imported into external code modules.

Following the @EXPORT array is the body of the code, one subroutine at a time. After all the subroutines are defined, the final statement 1; ends the module file. 1; must be the last executable line in the file.

Let's look at some of the functions defined in this module. The first function to look at is the simple Date function, lines 43 to 46, which prints the current UNIX date and time. There are no parameters to this function, and it doesn't return anything meaningful back to the caller.

Note the use of my before the $date variable in line 44. The my keyword is used to limit the scope of the variable to within the Date function's curly braces. Code between curly braces is referred to as a block. Variables declared within a block are limited in scope to within the curly braces. In lines 49 and 50, the local variables $name and $subject are visible to all functions.

You can also declare variables with the local qualifier.
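The difference between the my and local qualifiers can be sketched as follows. The subroutine names here are invented for the example and are not part of Letter.pm.

```perl
use strict;
use warnings;

our $name = 'global';

sub show_name { return $name }          # reads $main::name at call time

sub with_local {
    local $name = 'localized';          # temporary value, visible to called code
    return show_name();
}

sub with_my {
    my $name = 'lexical';               # visible only inside this block
    return show_name();
}

print with_local(), "\n";   # prints: localized
print with_my(), "\n";      # prints: global
print "$name\n";            # prints: global (the local value has been restored)
```

A my variable is invisible to any subroutine called from the block, whereas a local value stays in effect for everything called until the block exits.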
The use of local allows a variable to be in scope for the current block as well as for other blocks of code called from within this block. Thus, a local $x declared within one block is visible to all subsequent blocks called from within this block and can be referenced. In the following sample code, the ToTitled function's $name variable can be accessed but not the data in $iphone:

    1 sub Letter::ToTitled {

Subroutines and Passing Parameters

The sample code for Letter.pm showed how to extract one parameter at a time. The subroutine To() takes two parameters to set up the header for the memo.

Using functions within a module is not any different than using and defining Perl subroutines within the same code file. Parameters are passed by reference unless otherwise specified. Multiple arrays passed into a subroutine, if not explicitly passed as references using the backslash, are concatenated into one list. The @_ input array in a function is always an array of scalar values. Passing values by reference is the preferred way in Perl to pass a large amount of data into a subroutine. (See Chapter 3, "References.")

Another Sample Module: Finance

The Finance module, shown in Listing 4.4, is used to provide simple calculations for loan values. Using the Finance module is straightforward. All the functions are written with the same parameters, as shown in the formula for the functions.

Let's look at how the future value of an investment can be calculated. For example, if you invest some dollars, $pv, in a bond that offers a fixed percentage rate, $r, applied at known intervals for $n time periods, what is the value of the bond at the time of its expiration? In this case, you'll be using the following formula:

    $fv = $pv * (1+$r) ** $n ;

The function to get the future value is declared as FutureValue. Refer to Listing 4.3 to see how to use it.

Listing 4.3. Using the Finance module.

    1 #!/usr/bin/perl -w

Here is sample input and output of Listing 4.3.
    $ testme

The revelation in the output is the result of the comparison of values between $fv1 and $fv2. The $fv1 value is calculated with the application of interest once every year over the life of the bond. $fv2 is the value if the interest is applied every month at the equivalent monthly interest rate.

The Finance.pm package is shown in Listing 4.4 in its early development stages.

Listing 4.4. The Finance.pm package.

    1 package Finance;

Look at the declaration of the function FutureValue with ($$$). The three dollar signs together signify three scalar values being passed into the function. This prototype is present for validating the parameters passed into the function. If you were to pass more arguments than the prototype allows, you would get a message very similar to this one:

    Too many arguments for Finance::FutureValue at ./f4.pl line 15, near "$time)"

The use of prototypes when defining functions helps prevent you from sending in arguments other than what the function expects. Use @ or % to pass in an array of values. If you are passing by reference, use \@ or \% to show a scalar reference to an array or hash, respectively. If you do not use the backslash, the @ or % absorbs all remaining arguments, so all other types after it in the prototype are ignored. Other types of qualifiers include an ampersand for a reference to a function, an asterisk for a typeglob (such as a file handle), and a semicolon to indicate that all other parameters are optional.

Now, let's look at the lastMovingAverage function declaration, which specifies two integers in the front followed by an array. The way the arguments are used in the function is to assign a value to each of the two scalars, $count and $number, whereas everything else is sent to the array. Look at the function getMovingAverage() to see how two arrays are passed in order to get the moving average on a list of values.

The way to call the getMovingAverage function is shown in Listing 4.5.

Listing 4.5. Using the moving average function.
    1 #!/usr/bin/perl -w

Here's the output from Listing 4.5:

    Values to work with = { 12 22 23 24 21 23 24 23 23 21 29 27 26 28 }

The getMovingAverage() function takes two scalars and then two references to arrays as scalars. Within the function, the two scalar references to the arrays are dereferenced for use as numeric arrays. The returned set of values is inserted in the array passed in as the second reference. Had the input parameters not been specified with \@ for each referenced array, the $movingAve array reference would have been empty and would have caused errors at runtime. In other words, the following declaration is not correct:

    sub getMovingAve($$@@)

The resulting spew of error messages from a bad function prototype is as follows:

    Use of uninitialized value at Finance.pm line 128.

This is obviously not the correct output. Therefore, it's critical that you pass by reference when sending more than one array.

Global variables for use within the package can also be declared. Look at the following segment of code from the Finance.pm module to see what the default value of the Interest variable would be if nothing was specified in the input. (The current module requires the interest to be passed in, but you can change this.)

Here's a little snippet of code that can be added to the end of the program shown in Listing 4.5 to add the ability to set interest rates.

    20 local $defaultInterest = 5.0;

The local variable $defaultInterest is declared in line 20. The subroutine SetInterest to modify the rate is declared in lines 21 through 26. The $rate variable uses the values passed into the subroutine and simply assigns a positive value for it. You can always add more error checking if necessary.
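To make the earlier point about passing two arrays concrete, here is a minimal sketch of a subroutine prototyped with ($$\@\@). It is not the getMovingAverage() code from Finance.pm: the name moving_average, its argument order, and the sample values are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the chapter's getMovingAverage();
# the prototype demands two scalars followed by two real arrays.
sub moving_average ($$\@\@) {
    my ($count, $window, $values, $averages) = @_;   # last two arrive as array refs
    @$averages = ();
    for my $end ($window - 1 .. $count - 1) {
        my $sum = 0;
        $sum += $values->[$_] for $end - $window + 1 .. $end;
        push @$averages, $sum / $window;
    }
    return scalar @$averages;
}

my @values = (10, 20, 30, 40, 50);      # illustrative data only
my @averages;
my $n = moving_average(scalar @values, 3, @values, @averages);

print "@averages\n";                    # prints: 20 30 40
print "$n averages computed\n";         # prints: 3 averages computed
```

Because of the \@ entries in the prototype, the caller writes plain @values and @averages, and Perl passes a reference to each. With ($$@@) instead, both arrays would be flattened into a single list and the second array could never be told apart from the first.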
To access the defaultInterest variable's value, you could either define a subroutine that returns the value or refer to the value directly with a call to the following in your application program:

    $Finance::defaultInterest;

Returned Values from Subroutines in a Package

The variable holding the return value from the module function is declared as a my variable. The scope of this variable is within the curly braces of the function only. When the called subroutine returns, the reference to the my variable is returned. If the calling program uses this returned reference somewhere, the link counter on the variable is not zero; therefore, the storage area containing the returned values is not freed to the memory pool. Thus, the function that declares my $pv and then later returns the value of $pv returns a reference to the value stored at that location.

If the calling routine performs a call like this one:

    Finance::FVofAnnuity($monthly,$rate,$time);

there is no variable specified here into which Perl stores the returned reference; therefore, any returned value (or a list of values) is destroyed. Instead, the call with the returned value assigned to a local variable, such as this one:

    $fv = Finance::FVofAnnuity($monthly,$rate,$time);

maintains the variable with the value. Consider the example shown in Listing 4.6, which manipulates values returned by functions.

Listing 4.6. Sample usage of the my function.

    1 #!/usr/bin/perl -w

Here is sample input and output for this function:

    $ testme

Multiple Inheritance

Modules implement classes in a Perl program that uses the object-oriented features of Perl. Included in object-oriented features is the concept of inheritance. (You'll learn more on the object-oriented features of Perl in Chapter 5, "Object-Oriented Programming in Perl.") Inheritance means the process with which a module inherits the functions from its base classes. A module that is nested within another module inherits its parent modules' functions.
So the hierarchy of modules in Perl is expressed with the :: construct. (Strictly speaking, the :: separator only organizes namespaces and file locations; which classes a module searches for inherited functions is determined by its @ISA array.) Here's the basic syntax:

    SuperClass::NextSubClass:: ... ::ThisClass

The file for these is stored in ./SuperClass/NextSubClass/. Each double colon indicates a lower-level directory in which to look for the module. Each module, in turn, declares itself as a package with statements like the following:

    package SuperClass::NextSubClass;

For example, say that you really want to create a Money class with two subclasses, Stocks and Finance. Here's how to structure the hierarchy, assuming you are in the /usr/lib/perl5 directory:
The Perl script that gets the moving average for a series of numbers is presented in Listing 4.7.

Listing 4.7. Using inheriting modules.

    1 #!/usr/bin/perl -w

Lines 2 through 4 add the path to the Money subdirectory. The use statement in line 5 now addresses the Finance.pm file in the ./Money subdirectory. The functions within Finance.pm are now called with the prefix Money::Finance:: instead of Finance::. Therefore, each :: symbol indicates a new subdirectory when Perl is searching for modules to load.

The Money.pm file is not required. Even so, you should create a template for future use. Actually, the file would be required to hold any special initialization that the entire hierarchy of modules uses. The code for initialization is placed in the BEGIN block. The sample Money.pm file is shown in Listing 4.8.

Listing 4.8. The superclass module for Finance.pm.

    1 package Money;

To see the line of output from the printf statement in line 5, you have to insert the following command at the beginning of your Perl script:

    use Money;

To use the functions in the Stocks.pm module, you use this line:

    use Money::Stocks;

The Stocks.pm file appears in the Money subdirectory and is defined in the same format as the Finance.pm file, with the exceptions that use Stocks is used instead of use Finance and the set of functions to export is different.

The Perl Module Libraries

A number of modules are included in the Perl distribution. Check the /usr/lib/perl5/lib directory for a complete listing after you install Perl. There are two kinds of modules you should know about and look for in your Perl 5 release, Pragmatic and Standard modules.

Pragmatic modules, which are similar to pragmas among C compiler directives, tend to affect the compilation of your program. They are similar in operation to the preprocessor elements of a C program. Pragmas are locally scoped so that they can be turned off with the no command.
Thus, the command

    no integer;

turns off the integer pragma for the rest of the enclosing block. The pragma can be turned back on with the use statement.

Standard modules bundled with the Perl package include several functioning packages of code for you to use. Refer to Appendix B, "Perl Module Archives," for a complete list of these standard modules.

To find out all the .pm modules installed on your system, issue the following command. (If you get an error, check that the /usr/lib/perl5 directory is where Perl is installed on your system.)

    find /usr/lib/perl5 -name "*.pm" -print

Extension Modules

Extension modules are written in C (or a mixture of Perl and C) and are dynamically loaded into Perl if and when you need them. These types of modules for dynamic loading require support in the kernel. Solaris lets you use these modules. For a Linux machine, check the installation pages on how to upgrade to the ELF format binaries for your Linux kernel.

What Is CPAN?

The term CPAN (Comprehensive Perl Archive Network) refers to all the hosts containing copies of sets of data, documents, and Perl modules on the Net. To find out about the CPAN site nearest you, search on the keyword CPAN in search engines such as Yahoo!, AltaVista, or Magellan. A good place to start is the www.metronet.com site.

Summary

This chapter introduced you to Perl 5 modules and described what they have to offer. A more comprehensive list of modules can be found on the Internet at the Web sites http://www.metronet.com and http://www.perl.com.

A Perl package is a set of Perl code that looks like a library file. A Perl module is a package that is defined in a library file of the same name. A module is designed to be reusable. You can do some type checking with Perl function prototypes to see whether parameters are being passed correctly. A module has to export its functions with the @EXPORT array and therefore requires the Exporter module. Modules are searched for in the directories listed in the @INC array.
Obviously, there is a lot more to writing modules for Perl than what is shown in this chapter. The simple examples in this chapter show you how to get started with Perl modules. In the rest of the book I cover the modules and their features, so hang in there. I cover Perl objects, classes, and related concepts in Chapter 5.