Chapter 25: Perl Internal Files and Structures
This chapter introduces some of Perl's internal data structures, and the information presented here serves as a reference for the rest of the book. This chapter will be useful not only for programmers who want to add their own extensions to Perl but also for those who simply want to look at the Perl source code to see what's "under the hood."

Introduction

Perl is written primarily in C and has libraries to which you can link your own C/C++ code. To perform this linking, however, your programs have to know how Perl stores its own data structures as well as how to interpret Perl's data types. The information presented in this chapter will also show you where to look for files, structures, and so on. There will be times when you are writing extensions or modules that you'll need to look up specific data structure definitions. The functions defined here are called from your extension's C sources.

Version 5.002b was the latest Perl release at the time this book was written. The b stands for beta; therefore, some changes to the source tree are quite possible. What you see here is not only a snapshot in time of the source tree for 5.002b but also a basis for your own research.

The information in this chapter is about the functions you can call from C functions that interact with Perl variables. Your C code could be calling the Perl functions, or your Perl function could be calling your C code as part of an extension. The C functions have to be linked with the Perl libraries and also require the header files in your Perl distribution. The compiler that is guaranteed to work with Perl on almost all platforms is the GNU C compiler. If you have problems compiling with other commercial compilers, then get the GNU compiler from the Net. A good place to try is the oak.oakland.edu ftp site.

Exploring Perl Source Code
This section covers some of the header files in your Perl distribution.
Table 25.1 provides a brief description of what the ones covered
here contain. You can track the values or specific definitions
by starting from these header files.
The source files in the Perl distribution are as follows. They come with very sparse comments.
The name of each file gives a hint as to what the code in the file does. Run head *.c > text to get a list of the headers for the files. Now that you know a little about what source files to consult, you're ready to learn about the building blocks of Perl programs: the variables.

Perl Variable Types

Perl has three basic data types: scalars, arrays, and hashes. Perl enables you to have references to these data types as well as references to subroutines. References are stored in scalar values, which is what lets you have arrays of arrays, arrays of references, and so on. It's quite possible to build complicated data structures using the three basic types in Perl.

Variables in Perl programs can even have two types of values, depending on how they are interpreted. For instance, $i is an integer when used in a numeric operation and a string when used in a string operation. Another example is $!, which is the errno code when used as a number but a string when used within a print statement.

Naming Conventions

Because variables internal to Perl source code can have many types of values and definitions, the name must be descriptive enough to indicate what type it is. By convention, variable names in the Perl source code carry one of three tokens, one each for arrays, hashes, and scalar variables. A scalar variable can be further qualified to define the type of value it holds. The token prefixes for these Perl types are shown in the following list:
If you see SV in a function or variable name, the function is probably working on a scalar item. The convention is followed closely in the Perl source code, and you should be able to glean the type of most variables from their names as you scan through the code. Function names in the source code can begin with sv_ for scalar variables and related operations, av_ for array variables, and hv_ for hashes. Let's now cover these variable types and the functions to manipulate them.

Scalars and Scalar Functions

Scalar variables in the Perl source are those with SV in their names. The SV definition is really a typedef declaration of the sv structure in the header file called sv.h. Specific C types exist for the values a scalar can hold: IV for an integer or pointer, NV for a double, and PV for a string. An IV on a given system is the size of a pointer or an integer, whichever is larger. I32 and I16 are integer types for 32- and 16-bit numbers. Floating-point numbers in Perl are stored as doubles. Thus, an NV is a double that you can cast in a C program to whatever type you want.

Four types of routines exist to create an SV variable. All four return a pointer to a newly created variable. You call these routines from within an XS Perl extension file:
The way to read these function declarations is as follows. Take the newSViv(IV) declaration, for example. The new portion of the declaration asks Perl to create a new object. The SV indicates a scalar variable. The iv indicates a specific type to create: iv for integer, nv for double, pv for a string of a specified length, and sv for a copy of another scalar.

Three functions exist to get the value stored in an SV. The type of value returned depends on what type of value was set at the time of creation:
You can modify the value contained in an already existing SV by using the following functions:
Perl does not keep NUL-terminated strings like C does. In fact, Perl strings can have multiple NUL bytes in them. Perl tracks strings by a pointer and the length of the string. Strings can be modified in one of these ways:

    void sv_setpvn(SV* ptr, char* anyt, int len);
    void sv_setpv(SV* ptr, char* nullt);
    SvGROW(SV* ptr, STRLEN newlen);
    void sv_catpv(SV* ptr, char*);
    void sv_catpvn(SV* ptr, char*, int);
    void sv_catsv(SV* dst, SV* src);

Your C program using these functions will crash if you are not careful enough to check whether these variables exist. To check whether a scalar variable exists, you can call these functions:
A value of FALSE received from these functions means that the variable does not exist. You can only get two returned values, either TRUE or FALSE, from the functions that check whether a variable is a string, integer, or double. The SvTRUE(SV*) macro returns 0 if the value pointed at by SV is an integer zero or if SV does not exist. Two other global variables, sv_yes and sv_no, can be used instead of TRUE and FALSE, respectively.
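To make the earlier point about strings concrete, here is a minimal sketch in plain C of a string tracked by pointer and length rather than by a terminating NUL. It is only a model of the idea: the CountedStr type and the cs_grow, cs_setn, and cs_catn functions are hypothetical names invented for this illustration and are not part of the Perl sources, although they play roles comparable to SvGROW, sv_setpvn, and sv_catpvn.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a counted string; this is NOT Perl's sv structure. */
typedef struct {
    char  *buf;   /* the bytes, which may contain embedded NULs */
    size_t len;   /* number of bytes in use */
    size_t cap;   /* number of bytes allocated */
} CountedStr;

/* Make sure at least `need` bytes are allocated. Returns 1 on success. */
int cs_grow(CountedStr *s, size_t need)
{
    assert(s != NULL);
    if (need <= s->cap)
        return 1;
    char *p = realloc(s->buf, need);   /* realloc(NULL, n) acts like malloc */
    if (p == NULL)
        return 0;
    s->buf = p;
    s->cap = need;
    return 1;
}

/* Replace the contents with exactly `len` bytes taken from `src`. */
int cs_setn(CountedStr *s, const char *src, size_t len)
{
    assert(src != NULL);
    if (!cs_grow(s, len + 1))
        return 0;
    memcpy(s->buf, src, len);
    s->buf[len] = '\0';   /* trailing NUL is a convenience for C callers */
    s->len = len;         /* the length, not the NUL, defines the string */
    return 1;
}

/* Append exactly `len` bytes taken from `src`. */
int cs_catn(CountedStr *s, const char *src, size_t len)
{
    assert(src != NULL);
    if (!cs_grow(s, s->len + len + 1))
        return 0;
    memcpy(s->buf + s->len, src, len);
    s->len += len;
    s->buf[s->len] = '\0';
    return 1;
}
```

Starting from a zeroed CountedStr, storing the five bytes of "ab\0cd" leaves len at 5 even though strlen() on the buffer stops at 2. That difference is the reason the length-taking variants exist alongside the NUL-terminated ones.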
You can get a pointer to an existing scalar by specifying its variable name in the call to the function:

    SV* perl_get_sv("myScalar", FALSE);

The FALSE parameter requests the function to return NULL if the variable does not exist. If you specify a TRUE value as the second parameter, a new scalar variable is created for you and assigned the name myScalar in the current name space. In fact, you can use package names in the variable name. For example, a call that passes the package-qualified name VRML::desk and TRUE creates a variable called desk in the VRML package and returns it through a pointer declared like this:

    SV *desk;

Now let's look at collections of scalars: arrays.

Array Functions

The functions for handling array variables are similar in operation to those for scalar variables. To create an array called myarray, you would use this call:

    AV *myarray = (AV* ) newAV();

To get the array by specifying the name, you can also use the following function. perl_get_av() returns NULL if the variable does not exist:

    AV* perl_get_av(char *myarray, bool makeIt);

The makeIt variable can be set to TRUE if you want the array created, and FALSE if you are merely checking for its existence and do not want the array created if it does not exist.

To initialize an array at the time of creation, you can use the av_make() function. Here's the syntax for the av_make() function:

    AV *myarray = (AV *)av_make(I32 num, SV **data);

The num parameter is the size of the AV array, and data is a pointer to an array of pointers to scalars to add to this new array called myarray. Do you see how the call uses pointers to SV, rather than SVs? The added level of indirection permits Perl to store any type of SV in an array. So, you can store strings, integers, and doubles all in one array in Perl. The array passed into the av_make() function is copied into a new memory area; therefore, the original data array does not have to persist. Check the av.c source file in your Perl distribution for more details on the functions and their parameters.
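The two ideas in the av_make() discussion, an array of pointers that can hold mixed value types and a constructor that copies its input, can be sketched in plain C as follows. This is a model under stated assumptions, not Perl's implementation: Val, ValArray, va_make, and va_free are hypothetical names standing in for SV, AV, and av_make().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SV: one value tagged with its kind. */
typedef enum { VAL_INT, VAL_DOUBLE, VAL_STR } ValKind;

typedef struct {
    ValKind kind;
    union {
        long        i;
        double      d;
        const char *s;   /* not owned; copied shallowly in this sketch */
    } u;
} Val;

/* Hypothetical stand-in for an AV: an array of pointers to values. */
typedef struct {
    Val  **items;
    size_t count;
} ValArray;

/* Release the array and every value it owns. */
void va_free(ValArray *a)
{
    if (a == NULL)
        return;
    for (size_t n = 0; n < a->count; n++)
        free(a->items[n]);
    free(a->items);
    free(a);
}

/* Build a new array holding copies of the `num` values `data` points to. */
ValArray *va_make(size_t num, Val **data)
{
    assert(num == 0 || data != NULL);
    ValArray *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->count = 0;
    a->items = malloc((num ? num : 1) * sizeof *a->items);
    if (a->items == NULL) {
        free(a);
        return NULL;
    }
    for (size_t n = 0; n < num; n++) {
        Val *copy = malloc(sizeof *copy);
        if (copy == NULL) {
            va_free(a);      /* frees the copies made so far */
            return NULL;
        }
        *copy = *data[n];    /* copy the value, not just the pointer */
        a->items[a->count++] = copy;
    }
    return a;
}
```

Because va_make() copies each value, the caller can overwrite or discard its own data array immediately afterward without affecting the new array, which mirrors the statement that the original data array does not have to persist.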
Here is a quick list of the functions you would most likely use on AVs:

    void av_push(AV *ptr, SV *item);
    SV* av_pop(AV *ptr);
    SV* av_shift(AV *ptr);
    void av_unshift(AV *ptr, I32 num);
    I32 av_len(AV *ptr);
    SV** av_fetch(AV *ptr, I32 offset, I32 lval);
    SV** av_store(AV *ptr, I32 key, SV* item);
    void av_clear(AV *ptr);
    void av_undef(AV *ptr);
    void av_extend(AV *ptr, I32 size);

Hash Functions

Hash variables have HV in their names and are created in a manner similar to creating arrays. To create an HV type, you call this function:

    HV* newHV();

Here's how to get an existing hash by referring to it by name:

    HV* perl_get_hv("myHash", FALSE);

The function returns NULL if the variable does not exist. If the hash does not already exist and you want Perl to create the variable for you, use:

    HV* perl_get_hv("myHash", TRUE);

As with the AV type, you can perform the following functions on an HV type of variable:
Check the file hv.c in your Perl distribution for details about how each hash function is defined. Both of the previous functions return pointers to pointers. The return value from either function can be NULL, so check it before dereferencing it. The following functions are also defined in the source file:

    bool hv_exists(HV*, char* key, U32 klen);
    SV* hv_delete(HV*, char* key, U32 klen, I32 flags);
    void hv_clear(HV*);
    void hv_undef(HV*);

You can iterate through the hash table using indexes and pointers to hash table entries using the HE pointer type. To iterate through the hash (such as with the each command in Perl), you can use hv_iterinit(HV*) to set the starting point and then get the next item as an HE pointer from a call to the hv_iternext(HV*) function. To get the item being traversed, make a call to this function:

    SV* hv_iterval(HV* hashptr, HE* entry);

The next SV is available via a call to this function:

    SV* hv_iternextsv(HV* hptr, char** key, I32* retlen);

The key and retlen arguments are return values for the key and its length. See line 600 in hv.c.

Mortality of Variables

Values in Perl exist until explicitly freed. They are freed by the Perl garbage collector when the reference count to them is zero, by a call to the undef function, or if they were declared local or my and the scope no longer exists. In all other cases, variables declared in one scope persist even after execution has left the code block in which they were declared. For example, declaring and using $a in a function keeps $a in the main program even after returning from the subroutine. This is why it's necessary to create local variables in subroutines using the my keyword so that the Perl interpreter will automatically destroy these variables, which will no longer be used after the subroutine returns.
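The rule that a value is freed once its reference count reaches zero can be modeled in a few lines of plain C. The sketch below is an illustration only, not Perl's collector: Counted, counted_new, counted_inc, counted_dec, and the live_values counter are hypothetical names, and the counter exists solely so that the effect of freeing can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a reference-counted value; NOT Perl's sv structure. */
typedef struct {
    int refcnt;   /* number of owners still holding this value */
    int value;
} Counted;

/* Number of values currently allocated, kept only to make frees visible. */
int live_values = 0;

/* Create a value owned by exactly one reference. */
Counted *counted_new(int value)
{
    Counted *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->refcnt = 1;
    c->value = value;
    live_values++;
    return c;
}

/* Record one more owner. */
void counted_inc(Counted *c)
{
    assert(c != NULL);
    c->refcnt++;
}

/* Drop one owner; free the value when none remain. Returns 1 if freed. */
int counted_dec(Counted *c)
{
    assert(c != NULL && c->refcnt > 0);
    if (--c->refcnt > 0)
        return 0;
    free(c);
    live_values--;
    return 1;
}
```

A value created with counted_new() and then passed once to counted_inc() survives the first counted_dec() and is released by the second. A value whose count is never decremented is never freed, which is the sense in which such values persist after their code block has finished.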
Reference counts of variables in Perl can also be read and modified using the following functions:

    int SvREFCNT(SV* sv);
    void SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

Because the values declared within code blocks persist for a long time, they are referred to as immortal. Sometimes variables that you declare and create in code blocks persist even though you do not want them to. When writing code that declares and creates such variables, it's a good idea to create variables that you do not want to persist as mortal; that is, they die when code leaves the current scope. The functions that create a mortal variable are as follows:
To create AV and HV types, you have to cast the input parameters to and from these three functions as AV* and HV*.

Subroutines and Stacks

Perl subroutines use the stack to get and return values to the callers. Chapter 27, "Writing Extensions in C," covers how the stack is manipulated. This section describes the functions available for you to manipulate the stack.
Arguments on a stack to a Perl function are available via the ST(n) macro set. The topmost item on the stack is ST(0), and the mth one is ST(m-1). You may assign an argument to a C variable, like this:

    SV *arg1 = ST(1); /* Assign argument[1] to arg1 */

You can even increase the size of the argument stack in a function. (This is necessary if you are returning a list from a function call, for example. I cover this in more detail in Chapter 27.) To increase the length of the stack, make a call to the macro:

    EXTEND(sp, num);

sp is the stack pointer and num is the number of extra elements to add to the stack. You cannot decrease the size of the stack.

To add items to the stack, you have to specify the type of variable you're adding. Four functions are available for four of the most generic types to push:
If you want the stack to be adjusted automatically, make the calls to these macros:
These macros are a bit slower but simpler to use. In Chapter 27, you'll see how to use stacks in the section titled "The typemap File." Basically, a typemap file is used by the extensions compiler xsubpp for the rules to convert from Perl's internal data types (hash, array, and so on) to C's data types (int, char *, and so on). These rules are stored in the typemap file in your Perl distribution's ./lib/ExtUtils directory. The definitions of the structures in the typemap file are specified in the internal format for Perl.

What Is Magic?

As you go through the online docs and the source for Perl, you'll often see the word magic. The mysterious connotations of this word are further enhanced by the almost complete lack of documentation on what magic really is. In order to understand the phrases "then magic is applied to whatever" or "automagically [sic]" in the Perl documentation, you have to know what "magic" in Perl really means. Perhaps after reading this section, you will have a better feel for Perl internal structures and actions of the Perl interpreter.

Basically, a scalar value in Perl can have special features that make it "magical." When you apply magic to a variable, a linked list of methods is attached to that variable. A method is called for each type of magic assigned to that variable when a certain action takes place, such as retrieving or storing the contents of the variable.

A comparable scheme in Perl is the tie() function, described in Chapter 6, "Binding Variables to Objects." When tie-ing a variable to an action, you are defining actions to take when a scalar is accessed or when an array item is read from. In the case of magic actions of a scalar, you have a set of magic methods that are called when the Perl interpreter takes a similar action on (like getting a value from or putting a value into) a scalar variable.
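The general mechanism described above, a list of method tables attached to a value and consulted whenever the value is read or written, can be sketched in plain C. This is a simplified model and not Perl's magic implementation: Scalar, Hook, HookTable, scalar_attach, scalar_get, scalar_set, and the example hooks are all hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct scalar;   /* forward declaration so hooks can take a scalar pointer */

/* Hypothetical table of hooks for one kind of "magic". */
typedef struct {
    void (*on_get)(struct scalar *sv);   /* run when the value is read */
    void (*on_set)(struct scalar *sv);   /* run after the value is stored */
} HookTable;

/* One node in the linked list of hooks attached to a scalar. */
typedef struct hook {
    struct hook     *next;
    const HookTable *table;
} Hook;

/* Hypothetical stand-in for a scalar that can carry hooks. */
typedef struct scalar {
    int   value;
    Hook *hooks;       /* NULL means an ordinary, non-magical value */
    int   get_calls;   /* counters used by the example hooks below */
    int   set_calls;
} Scalar;

/* Attach a hook table by pushing node `h` onto the scalar's list. */
void scalar_attach(Scalar *sv, Hook *h, const HookTable *t)
{
    assert(sv != NULL && h != NULL && t != NULL);
    h->table = t;
    h->next = sv->hooks;
    sv->hooks = h;
}

/* Read the value, running every on_get hook first. */
int scalar_get(Scalar *sv)
{
    for (Hook *h = sv->hooks; h != NULL; h = h->next)
        if (h->table->on_get)
            h->table->on_get(sv);
    return sv->value;
}

/* Store a value, then run every on_set hook. */
void scalar_set(Scalar *sv, int v)
{
    sv->value = v;
    for (Hook *h = sv->hooks; h != NULL; h = h->next)
        if (h->table->on_set)
            h->table->on_set(sv);
}

/* Example hooks: count reads, and clamp negative stores to zero. */
void count_get(Scalar *sv) { sv->get_calls++; }
void clamp_set(Scalar *sv)
{
    if (sv->value < 0)
        sv->value = 0;
    sv->set_calls++;
}

const HookTable counting_hooks = { count_get, clamp_set };
```

With counting_hooks attached, storing -5 leaves the value at 0 because the set hook runs after the store, and each read bumps get_calls. A scalar with no hooks attached takes neither detour, so only values that carry hooks pay for the extra calls in this model.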
To check whether a variable has magic methods associated with it, you can get the flags for it using the SvFLAGS(sv) macro. The (sv) here is the name of the variable. The SvMAGIC(variable) macro returns the list of methods that are magically applied to the variable. The SvTYPE() of the variable is SVt_PVMG if it has a list of methods. A normal SV type is upgraded to a magical status by Perl if a method is requested for it. The structure to maintain the list is found in the file mg.h in the Perl source files: struct magic {
The mg_type value sets up
how the magic function is applied. The following items are used
in the magic table. You can see the values in use in the sv.c
file at about line 1950. Table 25.2, which has been constructed
from the switch statement,
tells you how methods have to be applied.
The magic virtual tables are defined in embed.h. The mg_virtual field in each magic entry is assigned to the address of the virtual table. Each entry in the magic virtual table has five items, each of which is defined in the following structure in the file mg.h:

    struct mgvtbl {

The svt_get() function is called when the data in SV is retrieved. The svt_set() function is called when the data in SV is stored. The svt_len() function is called when the length of the string is changed. The svt_clear() function is called when SV is cleared, and the svt_free() function is called when SV is destroyed.

All tables shown in the perl.h file are assigned mgvtbl structures. The values in each mgvtbl structure for each item in a table define a function to call when an action that affects entries in this table is taken by the Perl interpreter. Here is an excerpt from the file:

    EXT MGVTBL vtbl_sv =

The vtbl_sv table is set to call three methods: magic_get(), magic_set(), and magic_len() for the magic entries in sv. The zeros for vtbl_sig indicate that no magic methods are called.

The Global Variable (GV) Type

If you are still awake, you'll notice a reference to GV in the source file. GV stands for global variable, and the value stored in GV is any data type from scalar to a subroutine reference. GV entries are stored in a hash table, and the keys to each entry are the names of the symbols being stored. A hash table with GV entries is also referred to as a stash. Internally, a GV type is the same as an HV type. Keys in a stash are also package names, with the data item pointing to other GV tables containing the symbols within the package.

Where to Look for More Information

Most of the information in this chapter has been gleaned from source files or the online documents on the Internet. There is a somewhat old file called perlguts.html by Jeff Okamoto (e-mail okamoto@corp.hp.com) in the www.metronet.com archives that has the Perl API functions and information about the internals.
Note that the perlguts.html file was dated 1/27/1995, so it's probably not as up-to-date as you would like. Please refer to perlguts.html or the perlguts man page for a comprehensive listing of the Perl API.

If you want a listing of the functions in the Perl source code and the search strings, use the ctags *.c command on all the .c files in the Perl source directory. The result will be a very long file (800 lines), called tags, in the same directory. The head of this file is shown here:

    ELSIF perly.c /^"else : ELSIF '(' expr ')' block else",$/

If you are a vi hack, you can type :tag functionName to go to the line and file immediately from within a vi session. Ah, the old vi editor still has a useful function in this day and age. emacs users can issue the command etags *.c and get a comparable tags file for use with the M-x find-tag command in emacs.

Summary

This chapter is a reference-only chapter to prepare you for what lies ahead in the rest of the book. You'll probably be referring to this chapter quite a bit as you write extensions. There are three basic types of variables in Perl: SV for scalars, AV for arrays, and HV for hashes. Macros exist for getting data from one type to another. You'll need to know about these internal data types if you're going to be writing Perl extensions, dealing with platform-dependent issues, or (ugh) embedding C code in Perl and vice versa. The dry information in this chapter will serve you well in the rest of this book.