Chapter 26 -- Writing C Extensions in Perl

Introduction
Compiling a C Program That Calls Perl
- What Is ExtUtils::embed?
- Using the embed.pm Package
Adding a Perl Interpreter to Your C Program
Calling Perl Subroutines from Within a C Program
Working with the Perl Stack
The Flags to Use
Using SCALAR Context
Returning Lists from Subroutines
Using G_EVAL
Getting Special Variable Values
Using the ST Macros
Evaluating Perl Expressions
Pattern Matches and Substitutions from Your C Program
Summary

This chapter introduces a way to embed the Perl interpreter into a C program. After reading this chapter, you should be able to integrate the Perl interpreter or its library code with a C application. The information in this chapter relies heavily on the discussion of Perl's internal data types from Chapter 25, "Perl Internal Files and Structures."

Introduction

The Perl interpreter is written in C and can therefore be linked with C code. The combination process is relatively straightforward, but there are some caveats, which I discuss in this chapter. Also, even though you can combine C and Perl code, you might want to rethink how you use each language. Both C and Perl have their strong points. C code can be optimized to a greater degree than Perl code can. Perl is great for processing text files and strings, whereas manipulating strings in C is a bit clumsy at times.

Several alternatives exist for combining C and Perl code. You can use extensions, call complete programs, run a background process, and so on. For example, if you want to create a library of mathematical functions to use in your Perl programs, you can write an extension and use it from within your Perl code. If the functionality you need from a C program can be encapsulated in an executable program, you can run this program with a system() call. The system() call forces your Perl program to wait until the called program terminates. Alternatively, you can start the executable program as a background process from the command line or from within a Perl program using a fork() call.

If one of these methods will solve your problem, you may not have to take the more complicated route of embedding Perl in C code. Think through your design thoroughly before deciding which solution is best. If you feel that your design requires you to write a C program and also use the functionality of Perl from within your program, you are left with no alternative except to follow the more complicated route.

The first question to answer is whether your solution is targeted at the UNIX platform. Most of the tested situations to date for combining C code with Perl are on UNIX, so the combination is not yet viable on non-UNIX platforms. If you intend to port the combined code to an NT system, it simply won't compile, let alone work. Therefore, if you've concluded that you must embed the Perl interpreter in your C code, keep in mind that you're also limiting your solution to UNIX platforms. Embedding Perl in C code on Windows will not work at the moment, so you might want to rethink your design to see whether you can obtain comparable functionality using extension libraries.

The second question to answer is what exactly you are trying to achieve by combining Perl with C. Again, if all you want to do is use C code from within Perl, consider writing the C code as a Perl extension. In that case, rethink your design to see how to use only portions of the C code as parts of an extension library for a Perl program. The methods to do just this are covered in Chapter 27, "Writing Extensions in C."

Normally you want to write extensions in C because the compiled code is faster. For example, complex mathematical calculations are probably best written in C rather than in Perl, because compiled C code runs such tasks much faster than interpreted Perl code. Also, with extensions it's possible to pass pointers to variables that exist in the Perl program into the extension code. Given pointers to the Perl variables, code within extensions can modify the contents of those variables directly.

Of course, if you are attempting to run another complete C program from Perl, consider using the system(), backtick, or exec() calls. They are convenient ways of calling complete programs. With the open() call using pipes, it's easy to collect the standard output from the executed application. However, you are constrained to reading only the output from the child process you started. The child process cannot manipulate any Perl variables directly. The Perl code and the called code run as completely different applications in their own address spaces. This is in contrast to working with Perl's extensions that run C as part of the calling process. (Of course, you can take explicit measures to create a bi-directional pipe through which to communicate.)

The two methods outlined so far in this section solve the majority of problems that call for both C and Perl. In both methods, though, the calling program is a Perl application. What if you want to call Perl code from within a C application? You can write the Perl code as an executable script and then make a system() call from within the C code. However, you'll face the same constraints as when using the system() call from within Perl. There is no direct connection between the child process started with system() and the calling application, unless you take explicit measures to create a communication channel such as a socket or pipe.
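One such explicit measure, when the caller is a C program, is a one-way pipe opened with the POSIX popen() function. The following is a minimal sketch, not a definitive implementation: capture_output is a hypothetical helper name, and the code assumes a POSIX system where popen() and pclose() are available. It runs a command and collects whatever the child writes to its standard output.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for popen() and pclose() */
#include <stdio.h>

/*
 * Run a shell command and copy its standard output into buf
 * (at most size - 1 bytes, always terminated with '\0').
 * Returns the number of bytes captured, or -1 on failure.
 * Hypothetical helper for illustration; not part of Perl.
 */
static long capture_output(const char *cmd, char *buf, size_t size)
{
    FILE *child;
    size_t used = 0;

    if (buf == NULL || size == 0)
        return -1;
    buf[0] = '\0';

    child = popen(cmd, "r");        /* one-way pipe: child stdout to us */
    if (child == NULL)
        return -1;

    /* Read until the buffer is full or the child closes its output. */
    while (used < size - 1) {
        size_t n = fread(buf + used, 1, size - 1 - used, child);
        if (n == 0)
            break;
        used += n;
    }
    buf[used] = '\0';

    if (pclose(child) == -1)        /* wait for the child to finish */
        return -1;
    return (long)used;
}
```

A caller might pass a command such as "perl myscript.pl" (a hypothetical script name) and then parse the captured text. As with system(), the child still cannot see or change the C program's variables; only its printed output comes back.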

Sometimes you just have to call Perl code from within a C program. For example, the C language is not designed to process regular expressions. Parsing tokens from a string with the strtok() function may not be sufficient for your problem. There are ways to port some regular expression parsers like grep into C. (A good example is Allen Holub's port of grep in an article in Dr. Dobbs, October, 1984, and in The Dr. Dobbs Toolbox of C, Brady Books, 1986.) Perhaps all you need is simple regular expression parsing, in which case using a port of grep makes sense. The grep code you link will not have the overhead of the entire Perl interpreter.
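To make the limitation concrete, here is a small sketch of what strtok() can do. The helper name split_tokens is made up for this illustration. It splits a line on a fixed set of single-character delimiters, which is about as far as strtok() goes; it cannot express a pattern such as "one or more digits followed by a colon."

```c
#include <string.h>

/*
 * Split line in place on any character in delims and store up to max
 * token pointers in out. Returns the number of tokens stored.
 * Note that strtok() overwrites the delimiters in line with '\0'.
 * Hypothetical helper for illustration only.
 */
static int split_tokens(char *line, const char *delims, char **out, int max)
{
    int n = 0;
    char *tok = strtok(line, delims);

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, delims);   /* continue in the same string */
    }
    return n;
}
```

Called on a writable copy of the string "uid=501, gid=501" with the delimiter set "=, ", the helper yields the four tokens uid, 501, gid, and 501. Anything more selective than "split on these characters" needs hand-written C or a real pattern matcher.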

However, if all else fails and you do want the Perl interpreter in your C program, or at the very least want to call a Perl subroutine from within your C program, this chapter might provide the information you need. Here are the topics I will cover:

How to add a Perl interpreter to a C program
How to call a Perl subroutine from within a C program
How to use Perl pattern matches and string substitutions

Compiling a C Program That Calls Perl

First of all, make sure you have read access to the Perl 5 distribution including Perl header files and linking Perl libraries. Make sure that your Perl 5.002 source code distribution is complete and installed correctly. Copying the Perl executable program from another machine won't work because you need the whole source tree to work with your C program.

Essentially, the files you need are the EXTERN.h and perl.h header files. The Perl libraries should exist in a directory using the following format:

/usr/local/lib/perl5/your_architecture_here/CORE

Execute this statement for a hint about where to find CORE:

perl -e 'use Config; print "$Config{archlib}\n"'

On my Linux system, this program returned the following pathname (the path might be completely different for your machine):

/usr/lib/perl5/i486-linux/5.002

The general form of the command line for compiling a program is shown here, where LIBS and INCS stand for the library and header directories:

gcc -o filename filename.c -L$LIBS -I$INCS

The -L and -I flags define the locations of the library and header files. To list the libraries that you are linking with, run this command:

perl -e 'use Config; print $Config{libs} , "\n";'

An easier way to find out which libraries you are using is to use the ExtUtils::embed module. This module is available from any CPAN site and is in the public domain.

What Is `ExtUtils::embed`?

ExtUtils::embed is a set of utility functions used when embedding a Perl interpreter and extensions in your C/C++ applications. You do not have to use this library, but it does a lot of the basic grunt work (such as finding the correct libraries, include files, and definitions) for you. The author of the ExtUtils::embed package is Doug MacEachern (dougm@osf.org). You can address problems and comments to him directly.

Installing the package is easy. Simply untar the archive, which will create a directory called ExtUtils-embed-version_number. Change directory into this new directory and run these commands:

perl Makefile.PL
make
make test
make install

Using the `embed.pm` Package

Usually, you'll place a call to the embed.pm function in a makefile. The instructions on how to do this are in the embed.pm package itself. In practice, however, I just took the values returned from a few calls to the embed.pm functions and placed them directly in a makefile. This way there was a written record in the makefile of the explicit pathnames used to build the programs I happen to be using. The risk was that if the location of the Perl library changed in the future, my makefile would break. This change is unlikely to happen on my machine because I administer it. Your case might be different if you are on a multiuser system or are writing this makefile for the benefit of a group.

The one-line command to compile and link the ex1.c file is

$ gcc -o ex1 ex1.c `perl -MExtUtils::embed -e ccopts -e ldopts`

The backquotes make the shell run the Perl command and substitute its output into the gcc command line. In fact, this line can be placed in a shell script like this:

gcc -o $1 $1.c `perl -MExtUtils::embed -e ccopts -e ldopts`

Each execution of the makefile will cause the Perl script to recreate the include and link paths, which might slow down each execution of make. A make variable set to a constant value will probably let make process the makefile faster. I discuss this procedure in a moment. In any event, the values returned by the ExtUtils::embed functions define the link libraries and include files. You can paste these values directly into a makefile if the perl -e command does not work or if you prefer to use a make variable.

Listing 26.1 presents a sample makefile for the Linux machine.

Listing 26.1. A sample makefile.

# IncK and LIBK already carry their own -I and -L prefixes.
IncK = -D__USE_BSD_SIGNAL -Dbool=char -DHAS_BOOL \
       -I/usr/local/include -rdynamic \
       -I/usr/lib/perl5/i486-linux/5.002/CORE
LIBK = -L/usr/local/lib \
       /usr/lib/perl5/i486-linux/5.002/auto/DynaLoader/DynaLoader.a \
       -L/usr/lib/perl5/i486-linux/5.002/CORE \
       -lperl -lgdbm -ldbm -ldb -ldl -lm -lc -lbsd
K_LIBS = -lgdbm -ldbm -ldb -ldl -lm -lc -lbsd

ex2 : ex2.c
	$(CC) -fno-strict-prototype $(IncK) ex2.c -o ex2 $(LIBK)

Note the flags for the gcc compiler. The main problem with the gcc compiler is how the "safefree" function prototype is declared in the proto.h and handy.h files. There is a slight difference in the syntax of each declaration, but programmatically it makes no difference. To turn off the checking in gcc, simply use the -fno-strict-prototype flag at the command line for the gcc command.

The LIBK and IncK paths are set to values that are derived from running the Perl program shown in Listing 26.2. There is one possible problem you must be aware of when you run this program: if you get errors stating that it cannot find embed.pm in the @INC array, modify the @INC array to include the path where the file is located. The commented lines in Listing 26.2 are examples.

If all else fails, copy the embed.pm file to the directory you happen to be in. If you get an error stating that embed.pm could not be included or that it was empty, you have to modify the embed.pm file. Go to the line with the __END__ token and add the line 1; before it. This step is only necessary if the ExtUtils::embed file could not be included.

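Listing 26.2. A Perl program that prints the compile and link options.

#!/usr/bin/perl

#
# If embed.pm is not found, uncomment the next line so that the
# directory is added to @INC before the module is loaded:
# BEGIN { unshift(@INC, "/usr/local/lib/perl5/site_perl"); }
#

use ExtUtils::embed;

&ccopts;   # Create the path for headers
&ldopts;   # Create the path for libraries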

The call to the &ccopts function prints the compiler options, including the include paths given with the -I flag. The call to &ldopts prints the linker options, including the library paths given with the -L flag. Redirect the output to a file and use it to create the makefile as shown in Listing 26.1.

Adding a Perl Interpreter to Your C Program

A C program using a Perl interpreter is really creating and running a PerlInterpreter object. The PerlInterpreter object type is defined in the Perl library, and the C program simply makes a reference to this library object. Several sample files come with the embed.pm package and are used here as examples.

Listing 26.3 presents a quick example of how to embed a Perl interpreter in a C program.

Listing 26.3. The first example from the embed.pm module.

 1 /*
 2 ** Sample program used from embed.pm module package.
 3 */
 4 #include <stdio.h>
 5 #include <EXTERN.h>
 6 #include <perl.h>
 7
 8 static PerlInterpreter *my_perl;
 9
10 int main(int argc, char **argv, char **env)
11 {
12
13     my_perl = perl_alloc();
14     perl_construct(my_perl);
15     perl_parse(my_perl, NULL, argc, argv, env);
16     perl_run(my_perl);
17     perl_destruct(my_perl);
18     perl_free(my_perl);
19 }

Lines 4 to 6 are the include directives you need for this to work. The EXTERN.h and perl.h files are picked up from where the Perl distribution is installed.

At line 8, the C program declares a pointer to the PerlInterpreter object defined in the Perl libraries. The pointer is actually set in line 13, when the object is created.

At line 10, the main program interface is declared. All three arguments are required. Do not use either of the following forms, because both caused compiler errors, even with the -fno-strict-prototype flag set:

main(int argc, char **argv);
main(int argc, char *argv[], char *env[])

Lines 14 and 15 construct the PerlInterpreter object and parse any environment variables and command-line arguments. You can read and execute Perl statements from a file in a C program by placing the name of the file in argv[1] before the perl_parse call at line 15; it is perl_parse that reads the script, and perl_run then executes it. The perl_run function is called at line 16 in the sample code in Listing 26.3. The function can be called repeatedly in the C code before the calls are made to destruct and free the object (lines 17 and 18 in the sample code).

Now make this program, called ex2.c, and run it. In the sample run that follows, note how variables are defined and used interactively:

$ ex2
$a = 1;
$b = 3;
print $a," ",$b," ",$a+$b,"\n";
^D
1 3 4
$

To run a script file, simply redirect the contents of a file into the input of the interpreter. The file you are feeding into your C mini-interpreter does not have to have its execute bit set in its permissions. Here's a sample run.

$ cat test.pl
$a = 1;
$b = 1;
print "a + b = ", $a + $b, "\n";
$
$ ex2 < test.pl
a + b = 2
$

There you have it: a small Perl interpreter embedded in C code. Once you have this interpreter embedded in your C code, you can evaluate Perl statements by simply feeding them into the interpreter one line at a time.

There will be occasions, though, when you simply want to call a Perl subroutine directly from within the C code. I show you how to do this in the next section.

Calling Perl Subroutines from Within a C Program

In order to call a Perl subroutine by name, simply replace the call to perl_run() with a call to perl_call_argv(). An example is shown in the code in Listing 26.4.

Listing 26.4. Calling a subroutine in Perl directly.

 1 #include <stdio.h>
 2 #include <EXTERN.h>
 3 #include <perl.h>
 4
 5 static PerlInterpreter *my_perl;
 6
 7 int main(int argc, char **argv, char **env)
 8 {
 9     my_perl = perl_alloc();
10     perl_construct(my_perl);
11
12     perl_parse(my_perl, NULL, argc, argv, env);
13     /* The next line calls a function in the file named in
14      * argv[1] of the program !!*/
15     perl_call_argv("showUser", G_DISCARD | G_NOARGS, argv);
16     perl_destruct(my_perl);
17     perl_free(my_perl);
18 }

Look closely at line 15. This is really the only line that is different from the code shown in Listing 26.3. A Perl subroutine called showUser is being called here. The showUser subroutine takes no arguments, so you specify the G_NOARGS flag; it returns no values, so you specify the G_DISCARD flag. The argv vector carries the filename in argv[1]. To invoke this program, type the name of the file (showMe.pl, in this case) at the command line:

$ ex3 showMe.pl
Process 2689 : UID is 501 and GID is 501

The showMe.pl file named in argv[1] is shown in Listing 26.5.

Listing 26.5. The file containing the subroutine being called.

#!/usr/bin/perl
sub showUser {
    print "Process $$ : UID is $< and GID is $(\n";
}

Calling Perl functions with arguments and using the return values requires manipulation of the Perl stack. When you make the call, you push values onto the stack, and upon return from the function you get the returned value off the stack. Let's see how to work with the calling stack.

Note

A note to the Perl-savvy reader: You can get the values from Perl special variables directly. See the section titled "Getting Special Variable Values," later in this chapter.

Here is another Perl subroutine, which prints whatever parameters are passed to it:

sub PrintParameters {
    my(@args) = @_ ;
    for $i (@args) { print "$i\n" }
}

Here is the function to use when calling this Perl subroutine:

static char *words[] = {"My", "karma", "over", "my", "dogma", NULL} ;

static void justDoit()
{
    dSP;
    perl_call_argv("PrintParameters", G_DISCARD, words) ;
}

As you can see, it's easy to construct parameters and then use them directly in C statements to call Perl functions or programs.

Working with the Perl Stack

Perl has several C functions to use when calling Perl subroutines. Here are the most important ones to know:

perl_call_sv
perl_call_pv
perl_call_method
perl_call_argv

The perl_call_sv function is called by the other three functions in the list. These three functions simply fiddle with their arguments before calling the perl_call_sv function. All these functions take a flags parameter to pass options. We'll discuss the meaning of these flags shortly.

All these functions return an integer of type I32. The value returned is the number of items that the Perl subroutine returned and that are now on the Perl stack. It's a good idea to check this value when using unknown code. The calling C function uses the return value to determine how many returned values to pick up from the stack after the Perl subroutine returns.

Note

One important point to note here is that you'll see a lot of types of returned values and arguments passed into functions. These declarations come from the Perl sources and header files starting from perl.h. You really do not need to know how I32 is defined, but its name suggests it's a 32-bit integer. If you are really curious to see how it works or what the definitions are, check out the perl.h header file. Please refer to Chapter 25 for more information on internal Perl variables.

The `perl_call_sv` Function

The syntax for this function is

I32 perl_call_sv(SV* sv, I32 flags) ;

The perl_call_sv function takes two arguments. The first argument, sv, is a pointer to an SV structure. (The SV structure is also defined in the header file perl.h.) SV stands for scalar value. Because an SV can hold either a string or a reference, you can specify the subroutine to call either by name or by a reference to a subroutine.

The `perl_call_pv` Function

The function perl_call_pv is like the perl_call_sv call except that it expects its first parameter to be a string containing the name of the subroutine to call. The syntax to this function is

I32 perl_call_pv(char *subname, I32 flags) ;

For example, to specify the name of a function, you would make a call like this:

perl_call_pv("showUser",0);

The function name can be qualified to use a package name. To explicitly call a routine in a package, simply prepend the name with a PackageName:: string. For example, to call the function Time in the Datum package, you would make this call:

perl_call_pv("Datum::Time",0);

The `perl_call_method` Function

This function is used to call a method from a Perl class. The syntax for this call is

I32 perl_call_method(char *methodName, I32 flags) ;

methodName is set to the name of the method to be called. The class in which the called method is defined is passed on the stack and not in the argument list. The class can be specified either by name or by reference to an object.

The `perl_call_argv` Function

This function calls the Perl subroutine specified in the subname parameter. The values of the flags that are passed in when making this function call will be discussed shortly. The argv pointer is set up in the same manner as the ARGV array of a Perl program: each item in the argv array is a pointer to a null-terminated string, and the array itself ends with a NULL pointer. The syntax for this call is

I32 perl_call_argv(char *subname, I32 flags, register char **argv) ;

The Flags to Use

The flags parameter in all the functions discussed in the previous section is a bitmask that can consist of the following values:

G_DISCARD
G_NOARGS
G_SCALAR
G_ARRAY
G_EVAL

The following sections detail how each flag affects the behavior of the called function.

The `G_DISCARD` Flag

The G_DISCARD flag specifies that you do not want any values returned by the Perl subroutine. If the G_DISCARD flag is set, no values are returned. By default, a returned value or a set of values from the perl_call functions is placed on the stack, and placing these values on the stack takes time and processing. If you are not interested in these values, set the G_DISCARD flag to get rid of them. If this flag is not specified, there might be returned values on the stack that you are not aware of unless you explicitly check the returned value of the function call. Therefore, specify the flag explicitly if you don't care to even look for returned values.

The `G_NOARGS` Flag

Normally you'll be passing arguments into a Perl subroutine. The default procedure is to create the @_ array for the subroutine to work with. If you are not passing any parameters to the Perl subroutine, you can save processing cycles by not creating the @_ array. Setting the G_NOARGS flag stops the @_ array from being created.

Be sure that the function you are calling does not use arguments. If the called Perl subroutine does use parameters and you do not pass any, your program might return totally bogus results. In such cases, the subroutine sees whatever value @_ last had.

The `G_SCALAR` Flag

The flag specifies that only one scalar value will be expected back from this function. The called subroutine might attempt to return a list, in which case only the last item in the list will be returned. Setting this flag sets the context under which the called subroutine will run. G_SCALAR is the default context if no context is specified.

When this flag is specified, the value returned from the perl_call_* function will be either 0 or 1: 0 is returned if the G_DISCARD flag is set; otherwise, 1 is returned. If a 1 is returned, you must remove the returned value to restore the stack back to what it was before the call was made.

Tip

The called subroutine can call the wantarray function to determine if it was called in scalar or array context. If the call returns false, the called function is called in scalar context. If the call returns true, the called function is called in array context.

The `G_ARRAY` Flag

Set this flag when you want to return a list from the called Perl subroutine. This flag causes the Perl subroutine to be called in a list context. 0 is returned when the G_DISCARD flag is specified along with this flag. If the G_DISCARD flag is not specified, the number of items on the stack is returned. This is the number of items that the called subroutine returned, which are left on the stack for the calling routine to pick up. If G_ARRAY was specified without G_DISCARD and the returned value is 0, the subroutine returned an empty list; when G_EVAL is also set, a 0 can instead mean that an error occurred in the called routine.
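The rules for the returned count described in the preceding sections can be summarized in a few lines of plain C. This is only a model for reasoning about those rules, under stated assumptions: the DEMO_ constants are hypothetical stand-ins whose values are not the real values of the G_ flags in the Perl headers, expected_count is a made-up name, and the error cases that apply with G_EVAL are left out.

```c
/* Hypothetical stand-ins for the real flags; the values are NOT Perl's. */
enum {
    DEMO_SCALAR  = 0,        /* default context when nothing else is set */
    DEMO_ARRAY   = 1 << 0,
    DEMO_DISCARD = 1 << 1,
    DEMO_NOARGS  = 1 << 2
};

/*
 * Model of the count a perl_call_* function reports, following the
 * rules in the text. returned is how many values the Perl subroutine
 * produced. Flags are combined with the bitwise OR operator.
 */
static int expected_count(int flags, int returned)
{
    if (flags & DEMO_DISCARD)
        return 0;            /* results are thrown away */
    if (flags & DEMO_ARRAY)
        return returned;     /* list context: every item is kept */
    return 1;                /* scalar context: exactly one item */
}
```

The model makes one point easy to see: DEMO_NOARGS changes how the subroutine is entered but has no effect on the count, whereas the discard flag overrides the context flags entirely.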

The `G_EVAL` Flag

The called subroutine might crash while processing bogus input parameters. It's possible to trap an abnormal termination by specifying the G_EVAL flag. The net effect of this flag is to wrap the called subroutine in an eval block, from which it's possible to trap all but the most severe errors. Whenever control returns from the function, you have to check the contents of the $@ variable for any possible error messages. Remember that any anomaly can set the contents of $@, so check the value as soon as you return from a subroutine call.

The value returned from the perl_call_* function depends on what other flags have been specified and whether an error has occurred. For example, if an error occurs when G_SCALAR is specified, the value on top of the stack will be undef and the value of $@ will be set; just remember to pop the undef from the stack. If an error occurs when G_ARRAY is specified, the returned value is 0 and there is nothing to pop.

Using `SCALAR` Context

Here is an example of how to call a Perl function that takes three arguments and returns an integer. This is an example of using a function in a scalar context because it returns a scalar value. The code is shown in Listing 26.6.

Listing 26.6. A function used in a scalar context.

 1 /* How to call a Perl subroutine from C */
 2 #include <stdio.h>
 3 #include <EXTERN.h>
 4 #include <perl.h>
 5
 6 static int getSeconds(int s, int m, int h);
 7 static PerlInterpreter *my_perl;
 8
 9 int main(int argc, char **argv, char **env)
10 {
11     my_perl = perl_alloc();
12     perl_construct(my_perl);
13
14     perl_parse(my_perl, NULL, argc, argv, env);
15     getSeconds(10,30,4); /* TIME = 04:30:10 AM */
16     perl_destruct(my_perl);
17     perl_free(my_perl);
18 }
19
20 static int getSeconds(int s, int m, int h)
21 {
22     dSP ;                            /* declare local stack pointer */
23     int count ;                      /* keep return value */
24     ENTER ;                          /* start temporary area */
25     SAVETMPS;
26     PUSHMARK(sp) ;                   /* mark the start of the arguments */
27     XPUSHs(sv_2mortal(newSViv(s)));  /* leftmost argument */
28     XPUSHs(sv_2mortal(newSViv(m)));  /* go from left to right */
29     XPUSHs(sv_2mortal(newSViv(h)));  /* rightmost argument */
30     PUTBACK ;                        /* make stack pointer available */
31     count = perl_call_pv("seconds", G_SCALAR); /* call */
32     SPAGAIN ;                        /* reset stack pointer */
33     if (count != 1)                  /* check return value */
34         croak("Whoa Nelly! This is wrong\n") ;
35     printf ("The number of seconds so far = %d for %d:%d:%d\n", (int)POPi,h,m,s) ;
36     PUTBACK ;                        /* publish the stack pointer after the pop */
37     FREETMPS ;                       /* free up temporary variables (NOT count) */
38     LEAVE ;                          /* get out and clean up stack */
39 }

Lines 2 through 4 include the mandatory headers. At line 6 we declare the prototype for the function we are going to call. You can define the function here instead if you like. The call to this function, getSeconds, is made at line 15. The function itself is declared static to prevent unlikely confusion with any predefined functions in the Perl libraries.

At line 22, the dSP macro declares and initializes a local copy of the Perl stack pointer. At line 23, the count variable is declared as an ordinary C variable, not a Perl temporary. The ENTER and SAVETMPS macros at lines 24 and 25 start the temporary variable area. A marker to this stack location is pushed on at line 26.

The ENTER/SAVETMPS pair marks the start of the scope for all temporary variables that will be destroyed on the return from the C function. The FREETMPS/LEAVE pair is used to clean up and destroy the space allocated for these temporary variables.

Now the parameters to the Perl subroutine are pushed onto the stack one at a time from the leftmost parameter to the rightmost parameter. Remember this order because the Perl function expecting these parameters will be declared as this:

sub seconds {
    my($s, $m, $h) = @_ ;    # same order as the C code pushes them
    my $t;
    $t = $s + $m * 60 + $h * 3600;
    return $t
}
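As a cross-check of the arithmetic, the same calculation can be written as a plain C function. This sketch assumes the arguments arrive in the order in which Listing 26.6 pushes them (seconds, minutes, hours); the name seconds_since_midnight is hypothetical and is not part of the listing.

```c
/*
 * Seconds since midnight, mirroring the arithmetic in the Perl
 * subroutine. Arguments are in the order the C code pushes them:
 * seconds first, then minutes, then hours.
 * Hypothetical helper for checking results by hand.
 */
static long seconds_since_midnight(int s, int m, int h)
{
    return (long)s + (long)m * 60L + (long)h * 3600L;
}
```

For the call getSeconds(10,30,4) this gives 10 + 30 * 60 + 4 * 3600 = 16210, which is the value the Perl subroutine should hand back when it receives its arguments in that same order.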

The stack pointer is made into a globally available value with the PUTBACK macro in line 30. The actual call to the subroutine is made in line 31 with the G_SCALAR flag. On return from the subroutine, we have to reset the stack pointer with the SPAGAIN (stack pointer again) macro. The count must be 1, or else we have an error.

The returned value is retrieved with the POPi macro because the called function returned an integer value. To get other types of values, you can use one of the following macros:

POPs for an SV
POPp for a pointer (such as a pointer to a string)
POPn for a double
POPi for an integer
POPl for a long

The PUTBACK macro is used to reset the Perl stack back to a consistent state just before exiting the function. The POPi macro call only updated the local copy of the stack pointer. We have to set the global value on the stack, too. All parameters pushed onto the stack must be bracketed by the PUSHMARK and PUTBACK macros. These macros count the number of parameters being pushed and hence let Perl know how to size the @_ array. The PUSHMARK macro tells Perl to mark the stack pointer and must be specified even if you are using no parameters. The PUTBACK macro sets the global copy of the stack pointer to the value of the local copy of the stack pointer.
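The interplay between the mark, the local stack pointer, and the global stack pointer can be pictured with a toy model in plain C. The sketch below is not Perl's implementation: DemoStack and the demo_ functions are hypothetical names invented for this illustration, and the real Perl stack holds SV pointers rather than long values. It only shows why the callee cannot know the argument count until the local pointer has been published.

```c
#define DEMO_STACK_MAX 16

/* Toy stack: NOT Perl's real data structures, only the same idea. */
typedef struct {
    long items[DEMO_STACK_MAX];
    int  global_sp;   /* the shared ("global") stack pointer */
    int  mark;        /* position recorded by the mark step  */
} DemoStack;

/* Like PUSHMARK: remember where the argument list begins. */
static void demo_pushmark(DemoStack *st, int local_sp)
{
    st->mark = local_sp;
}

/* Like XPUSHs: store a value and advance only the LOCAL pointer. */
static int demo_push(DemoStack *st, int local_sp, long value)
{
    if (local_sp < DEMO_STACK_MAX)
        st->items[local_sp++] = value;
    return local_sp;
}

/* Like PUTBACK: publish the local pointer to the shared one. */
static void demo_putback(DemoStack *st, int local_sp)
{
    st->global_sp = local_sp;
}

/* What the callee can work out: how many arguments it was given. */
static int demo_arg_count(const DemoStack *st)
{
    return st->global_sp - st->mark;
}
```

Pushing three values after demo_pushmark leaves demo_arg_count at 0 until demo_putback is called, at which point it reports 3. In this model, that is the reason the argument pushes must sit between the mark step and the put-back step.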

Here's another example, which shows how to use a string returned from a function using the POPp macro. (See Listing 26.7.) Look at lines 23 through 25 to see how strings and integers are pushed onto the stack with the XPUSHs(sv_2mortal(newSVpv(a, 0))); and XPUSHs(sv_2mortal(newSViv(offset))); calls. The second argument to newSVpv is the length of the string; a length of 0 tells Perl to compute it with strlen(). The returned value from the actual call to the Perl function is retrieved with the POPp macro.

Listing 26.7. Using a returned string value.

 1 /* Using returned strings from functions */
 2 #include <stdio.h>
 3 #include <EXTERN.h>
 4 #include <perl.h>
 5
 6 static int MySubString(char *a, int offset, int len);
 7 static PerlInterpreter *my_perl;
 8
 9 int main(int argc, char **argv, char **env)
10 {
11     my_perl = perl_alloc();
12     perl_construct(my_perl);
13     perl_parse(my_perl, NULL, argc, argv, env);
14     MySubString("Kamran Was Here",7,3); /* return 'Was' */
15     perl_destruct(my_perl);
16     perl_free(my_perl);
17 }
18
19 static int MySubString(char *a, int offset, int len)
20 {
21     dSP ;
22     ENTER ; SAVETMPS ; PUSHMARK(sp) ; /* ENTER/SAVETMPS pair with FREETMPS/LEAVE */
23     XPUSHs(sv_2mortal(newSVpv(a, 0)));
24     XPUSHs(sv_2mortal(newSViv(offset)));
25     XPUSHs(sv_2mortal(newSViv(len)));
26     PUTBACK ;
27     perl_call_pv("Csubstr", G_SCALAR);
28     SPAGAIN ;
29     printf ("The substring is %s\n",(char *)POPp) ;
30     PUTBACK ;
31     FREETMPS ;
32     LEAVE ;
33 }

The function Csubstr calls the Perl substr() as shown here:

sub Csubstr {
    my ($s,$o,$l) = @_;
    return substr($s,$o,$l);
}

The value of the substr() function call is returned back from the subroutine call. It's this returned value that is used in the C program.

Returning Lists from Subroutines

Many Perl functions return lists as their results. C programs can retrieve these values as well. Here's a simple Perl function that returns the ratio of two numbers. (See Listing 26.8.) The C program to call this function is shown in Listing 26.9.

Listing 26.8. Ratio of numbers in a Perl function.

sub GetRatio
{
    my($a, $b) = @_ ;
    my ($c, $d);    # parentheses are needed to declare both variables
    if ($a == 0) { $c = 1; $d = 0; }
    elsif ($b == 0) { $c = 0; $d = 1; }
    else {
        $c = $a/$b;
        $d = $b/$a;
    }
    ($c,$d);
}

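Look at lines 34 and 35 in Listing 26.9. The returned values from the Perl function are picked off one at a time using the POPn macro to get double values from the stack. Because the stack is last-in, first-out, the values come off in the reverse of the order in which the subroutine returned them. The global stack pointer is readjusted before returning from the C function.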

Listing 26.9. Calling the GetRatio function.

 1 /* How to return lists back from Perl functions */
 2
 3 #include <stdio.h>
 4 #include <EXTERN.h>
 5 #include <perl.h>
 6
 7 static void getRatio(int a, int b);
 8 static PerlInterpreter *my_perl;
 9
10 int main(int argc, char **argv, char **env)
11 {
12     my_perl = perl_alloc();
13     perl_construct(my_perl);
14     perl_parse(my_perl, NULL, argc, argv, env);
15     getRatio(8,3);
16     perl_destruct(my_perl);
17     perl_free(my_perl);
18 }
19
20
21 static void getRatio(int a, int b)
22 {
23     dSP ;
24     int count ;
25     ENTER ;
26     SAVETMPS;
27     PUSHMARK(sp) ;
28     XPUSHs(sv_2mortal(newSViv(a)));
29     XPUSHs(sv_2mortal(newSViv(b)));
30     PUTBACK ;
31     count = perl_call_pv("GetRatio", G_ARRAY);
32     SPAGAIN ;
33     if (count != 2) croak("Whoa! \n") ;
34     printf ("%d / %d = %f\n", b, a, POPn) ; /* last value returned: b/a */
35     printf ("%d / %d = %f\n", a, b, POPn) ; /* first value returned: a/b */
36     PUTBACK ;
37     FREETMPS ;
38     LEAVE ;
39 }

Placing G_SCALAR instead of G_ARRAY in the code in Listing 26.9 would have forced a scalar value to be returned. Only the last item of the list would have been returned, and the value of count would be set to 1.

Note how this Perl subroutine takes precautions not to crash by first checking for a zero divisor. This time, let's not return a value. Instead, let's call the die() function if a bogus value is sent into the function. We'll trap the resulting error by calling the function with the G_EVAL flag set.

Using G_EVAL

The G_EVAL flag is useful when calling functions that may die(). It is OR-ed with any other flags passed to the call. Listing 26.10 presents a Perl function that calls the die function if one of the arguments sent into it is zero. Because we know that this function can die, we'll send in a value that causes it to die. The code to make this fatal call (to illustrate how G_EVAL is used) is shown in Listing 26.11.

Listing 26.10. A Perl function that can die.

sub GetRatioEval
{
    my($a, $b) = @_ ;
    my ($c, $d);
    die "Hey! A is 0 \n" if ($a == 0);
    die "Hey! B is 0 \n" if ($b == 0);
    $c = $a/$b;
    $d = $b/$a;
    ($c,$d);
}

Listing 26.11. A C program to use the G_EVAL flag.

 1 /* Call the suicidal function to use G_EVAL */
 2
 3 #include <stdio.h>
 4 #include <EXTERN.h>
 5 #include <perl.h>
 6
 7 static void getRatio(int a, int b);
 8 static PerlInterpreter *my_perl;
 9
10 int main(int argc, char **argv, char **env)
11 {
12     my_perl = perl_alloc();
13     perl_construct(my_perl);
14     perl_parse(my_perl, NULL, argc, argv, env);
15     getRatio(8,0);
16     perl_destruct(my_perl);
17     perl_free(my_perl);
18 }
19
20
21 static void getRatio(int a, int b)
22 {
23     dSP ;
24     int count ;
25     SV *svp; /* New line */
26     ENTER ;
27     SAVETMPS;
28     PUSHMARK(sp) ;
29     XPUSHs(sv_2mortal(newSViv(a)));
30     XPUSHs(sv_2mortal(newSViv(b)));
31     PUTBACK ;
32     count = perl_call_pv("GetRatioEval", G_ARRAY | G_EVAL);
33     SPAGAIN ;
34     svp = GvSV(gv_fetchpv("@", TRUE, SVt_PV));
35     if (SvTRUE(svp))
36     {
37         printf ("Die by division: %s\n", SvPV(svp, na)) ;
38         POPs ;
39     }
40     else
41     {
42         if (count != 2) croak("Whoa! \n") ;
43         printf ("%d / %d = %f\n", b, a, POPn) ; /* top of stack is $d = b/a */
44         printf ("%d / %d = %f\n", a, b, POPn) ; /* next item is $c = a/b */
45     }
46     PUTBACK ;
47     FREETMPS ;
48     LEAVE ;
49 }

In the code shown in Listing 26.11, the call to the Perl function will terminate in a die() function call. The error status of the call is fetched in line 34 and tested in line 35. The variable we are looking at is the $@ variable in Perl. The syntax for this call is

svp = GvSV(gv_fetchpv("@", TRUE, SVt_PV));

The value of the variable is checked in the following lines, and the stack is adjusted with a call to pop off the string. The string is retrieved with a call to the SvPV() function in line 37.

Getting Special Variable Values

In Listing 26.7 we recovered values of special variables, $< and $(, to get the UID and GID of the calling process via a Perl subroutine. The Perl subroutine was simply an example of how to call a routine. Now let's see how we can get values of special variables in Perl directly. The call to get these values is

svp = GvSV(gv_fetchpv(variableName, createFlag, SVt_PV));

Let's look at the function in Listing 26.12 to see how to get the UID and GID of a calling C program.

Listing 26.12. Function for getting the values of $< and $( directly.

static void getUserInfo(void)
{
    int tmp;
    SV *svp;
    svp = GvSV(gv_fetchpv("<", 0, SVt_PV));
    tmp = SvIV(svp);
    printf ("\n UID = %d",tmp);
    svp = GvSV(gv_fetchpv("(", 0, SVt_PV));
    tmp = SvIV(svp);
    printf ("\n GID = %d",tmp);
}

The gv_fetchpv() function looks up the variable named in its first parameter, and GvSV() returns the scalar stored under that name. The second parameter is a flag that tells Perl whether to create the variable if it does not already exist; it is 0 for these calls because the special variables are always present. The returned value from each GvSV call is a pointer to a scalar from which an integer is extracted with a call to SvIV(). No stack macros are needed here because no subroutine is being called.
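When checking the output of a function such as the one in Listing 26.12, it can help to compare it against the values the operating system reports directly. The following is a minimal sketch that assumes a POSIX system; the format_user_info name is hypothetical, and the sketch does not touch the Perl interpreter at all. Perl's $< is the real UID, and the first number in $( is the real GID, so these are the values SvIV() should produce.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
** Format the real UID and GID of the calling process into buf.
** Returns the number of characters written (as snprintf does),
** or a negative value on error.
*/
static int format_user_info(char *buf, size_t size)
{
    return snprintf(buf, size, "UID = %ld, GID = %ld",
                    (long) getuid(), (long) getgid());
}
```

Printing the buffer next to the output of the embedded-Perl version gives a quick sanity check that the interpreter is reporting the same process identity.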

Using the ST Macros

The POPi, POPp, and POPn macros are great for getting items off the stack one at a time. To get at individual items on the stack by position, though, you have to use the ST() macro. Basically, ST(n) returns the nth item of the returned list, counting from 0 at the first value returned. However, you have to adjust the stack pointer and set up the base index for the macro yourself.

Listing 26.13 illustrates how the stack will look using the ST macros with a different function. Line 39 is where the ST(i) macro is used. The stack length is adjusted in lines 35 and 36.

Listing 26.13. Using the ST macros.

 1 /* demonstration C program */
 2 /* Using the ST macros.*/
 3
 4 #include <stdio.h>
 5 #include <EXTERN.h>
 6 #include <perl.h>
 7
 8 static void Squares(int a);
 9 static PerlInterpreter *my_perl;
10
11 int main(int argc, char **argv, char **env)
12 {
13     my_perl = perl_alloc();
14     perl_construct(my_perl);
15     perl_parse(my_perl, NULL, argc, argv, env);
16     Squares(8);
17     perl_destruct(my_perl);
18     perl_free(my_perl);
19 }
20
21
22 static void Squares(int a)
23 {
24     dSP ;
25     I32 ax;
26     int i;
27     int count ;
28     ENTER ;
29     SAVETMPS;
30     PUSHMARK(sp) ;
31     XPUSHs(sv_2mortal(newSViv(a)));
32     PUTBACK ;
33     count = perl_call_pv("squares", G_ARRAY);
34     SPAGAIN ;
35     sp -= count ;
36     ax = (sp - stack_base) + 1 ; /* adjust the stack */
37     if (count != 2) croak("Whoa! \n") ;
38     for (i = 0; i < count; i++)
39         printf ("%d ", (int) SvIV(ST(i))) ;
40     PUTBACK ;
41     FREETMPS ;
42     LEAVE ;
43 }

The for loop recovers only as many values as are on the stack. If you do not want to adjust the stack, you can remove lines 35 and 36 from this code and replace the code in lines 38 and 39 with this:

for (i = 0; i < count; i++) printf ("%d ", POPi) ;

Keep in mind that POPi removes items from the top of the stack, so this version prints the values in reverse order. The choice of which approach to use is entirely up to you.
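The pointer arithmetic behind lines 35 and 36 of Listing 26.13 can be modeled in plain C. This sketch does not use the Perl headers: the ModelStackView type and the model_ax and model_st functions are hypothetical stand-ins for stack_base, sp, ax, and ST(), written under the assumption that ST(i) simply indexes the stack array at ax + i.

```c
#include <stddef.h>

/* Hypothetical view of a stack: base is the start of the whole stack,
   and sp points at the topmost (most recently pushed) item. */
typedef struct {
    const int *base;
    const int *sp;
} ModelStackView;

/* Mirrors lines 35 and 36: step sp back over the returned items, then
   compute ax, the index of the first returned item. */
static ptrdiff_t model_ax(ModelStackView *v, int count)
{
    v->sp -= count;
    return (v->sp - v->base) + 1;
}

/* Mirrors ST(i): index from the base of the stack, offset by ax. */
static int model_st(const ModelStackView *v, ptrdiff_t ax, int i)
{
    return v->base[ax + i];
}
```

With a stack holding an older item followed by two returned values, model_ax skips the older item, so model_st(..., 0) is the first returned value and model_st(..., 1) is the second. That is the opposite of the order in which the POP macros deliver them.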

Evaluating Perl Expressions

In addition to calling Perl subroutines, you can also use the eval function to evaluate a Perl statement directly. This lets you use a C program by itself without having to keep the Perl code in a separate file. By using C strings to hold your Perl programs, you can create entire applications using one C source file. Listing 26.14 presents a sample file.

Listing 26.14. Using expressions in C.

 1 #include <stdio.h>
 2 #include <EXTERN.h>
 3 #include <perl.h>
 4
 5 static PerlInterpreter *my_perl;
 6
 7
 8 /*
 9 ** This is a wrapper around the eval call
10 */
11 int evalExpression(char *evaluatedString)
12 {
13     char *argv[2];
14     argv[0] = evaluatedString;
15     argv[1] = NULL;
16     return perl_call_argv("_eval_", 0, argv);
17 }
18
19 int main (int argc, char **argv, char **env)
20 {
21
22     /*
23     ** Plain code to do some parsing.
24     */
25     char *codeToUse[] = { "", "-e", "sub _eval_ { eval $_[0] }" };
26     STRLEN length;
27
28     my_perl = perl_alloc();
29     perl_construct( my_perl );
30
31     /*
32     ** Fake out the call by creating your own argc, argv, and env
33     */
34     perl_parse(my_perl, NULL, 3, codeToUse, env);
35
36     evalExpression("$x = 3; $y = 2; $rho= sqrt($x * $x + $y * $y);");
37     printf("x = %d, y = %d and rho = %f \n",
38         (int) SvIV(perl_get_sv("x", FALSE)),
39         (int) SvIV(perl_get_sv("y", FALSE)),
40         SvNV(perl_get_sv("rho", FALSE)));
41     evalExpression("$wisdom \
42 = 'Able was I ere I saw Elba'; $wisdom = reverse($wisdom); ");
43     printf("wisdom = %s\n", SvPV(perl_get_sv("wisdom", FALSE), length));
44
45     evalExpression("$wisdom = 'I ran a mile today, and said \
46 Here Lady Take your purse'; $wisdom = reverse($wisdom); ");
47     printf(" %s\n", SvPV(perl_get_sv("wisdom", FALSE), length));
48
49     evalExpression("$joke = 'I was walking down the street when something \
50 caught my eye and dragged it twenty feet'; $joke = uc($joke); ");
51     printf("%s\n", SvPV(perl_get_sv("joke", FALSE), length));
52
53     perl_destruct(my_perl);
54     perl_free(my_perl);
55 }

Here's the output from running this program:

x = 3, y = 2 and rho = 3.605551
wisdom = ablE was I ere I saw elbA
 esrup ruoy ekaT ydaL ereH dias dna ,yadot elim a nar I
I WAS WALKING DOWN THE STREET WHEN SOMETHING CAUGHT MY EYE AND DRAGGED IT TWENTY FEET

Now let's look at some of the lines of relevance in the listing. Lines 11 through 17 define a function that will serve as the wrapper around the Perl eval() function. Basically, the function evalExpression takes one string as an argument, creates an argument vector, and then calls the _eval_ function with this newly created argument vector.

Then in line 25 we actually define the _eval_ function as if it were typed on the command line. The text is kept in the string array codeToUse. The perl_parse function is then called with codeToUse as its argument vector and 3 as argc. The environment is passed in verbatim.

At line 36 we test out how to use integers and floating-point numbers in a calculation. Note how the returned values are extracted by naming the variables without the $. The $ is automatically prepended to the variable name; therefore, you should not use it explicitly. If you prepend $ yourself, the value of the variable $$var, not $var, will be extracted.

Strings are probably where you'll benefit the most when using Perl functions from within C. Lines 41 through 51 show examples of how to call Perl functions to reverse and change the case of some strings. Note how a string can cross multiple lines when the line break is escaped with a backslash, as in line 45.
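There are two standard ways to spread a long piece of Perl code over several lines of C source, shown here as a minimal sketch. The variable names are hypothetical and the Perl text is shortened from the listing. With a backslash at the end of the line, the next source line continues the same literal, so any indentation on that line becomes part of the string. With adjacent string literals, the compiler joins the pieces and the indentation stays outside the string.

```c
/* Backslash-newline: the literal continues on the next source line, so that
   line must start in column 1 unless extra spaces in the string are wanted. */
static const char *perl_code_continued = "$wisdom = 'Able was I'; \
$wisdom = reverse($wisdom);";

/* Adjacent literals: the compiler concatenates them into one string, and
   the source can be indented freely. */
static const char *perl_code_adjacent =
    "$wisdom = 'Able was I'; "
    "$wisdom = reverse($wisdom);";
```

Both variables hold the same text. The second form is usually easier to keep tidy, especially when the break falls inside a quoted Perl string where stray whitespace would change the result.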

Actually, you can put in entire Perl functions instead of these calls and have the eval operator work on these functions as strings. This enables you to create very powerful applications using the power of each of the languages (C and Perl).

Pattern Matches and Substitutions from Your C Program

It's possible to use the Perl interpreter to do pattern matches and string substitutions from within C code. Pattern matches in Perl are easy when compared with the same process in C. This is one area where you can use Perl to make up for the lack of pattern matching and string substitution features in C. Here's a definition of two functions in C that use the Perl interpreter:

int match(char *inputString, char *matchString)

    This function takes an input string and attempts to match it with the pattern in matchString. The matchString is a string with a regular expression pattern in it; for example, "/Gumb[oy]/" or "/[Dd]onald/". Be careful when using double quotes within patterns because you'll have to escape them with a backslash ("\").

int substitute(char *inputString, char *substitution)

    This function takes an input string followed by a substitution operation such as "s/courage/stupidity/" or "s/ition/ate/g". The inputString will be modified if any substitution is made.

Listing 26.15 presents the code to define and use these functions.

Listing 26.15. Using pattern matching and substitution in C.

  1 /*
  2 ** Using the Perl interpreter to do pattern matching
  3 ** and string substitution.
  4 */
  5 #include <stdio.h>
  6 #include <EXTERN.h>
  7 #include <perl.h>
  8
  9 /* Define this to see how it all works. */
 10 #define KDEBUG 1
 11
 12 static PerlInterpreter *my_perl;
 13
 14 /*
 15 ** A wrapper around the Perl eval statement
 16 */
 17 int doExpression(char *string)
 18 {
 19     char *argv[2];
 20     argv[0] = string;
 21     argv[1] = NULL;
 22     return perl_call_argv("_eval_", 0, argv);
 23 }
 24
 25 /*
 26 ** A global to this program since I am too lazy to use mallocs
 27 */
 28 static char command[256];
 29
 30
 31 /*
 32 ** Do a pattern match
 33 */
 34 int match(char *string, char *pattern)
 35 {
 36     sprintf(command, "$string = '%s'; $return = $string =~ %s", string, pattern);
 37 #ifdef KDEBUG
 38     printf (" %s", command);
 39 #endif
 40     doExpression(command);
 41     return SvIV(perl_get_sv("return", FALSE));
 42 }
 43
 44 /*
 45 ** Do a string substitution
 46 */
 47 int substitute(char *string, char *pattern)
 48 {
 49     char *bfptr = command;
 50     STRLEN length;
 51     sprintf(command, "$string = '%s'; $ret = ($string =~ %s)",string,pattern);
 52
 53 #ifdef KDEBUG
 54     printf (" %s", command);
 55 #endif
 56
 57     doExpression(command);
 58     bfptr = (char *) SvPV(perl_get_sv("string", FALSE), length);
 59     strcpy(string,bfptr);
 60     return SvIV(perl_get_sv("ret", FALSE));
 61 }
 62
 63
 64 /*
 65 ** The main program itself.
 66 */
 67 int main (int argc, char **argv, char **env)
 68 {
 69     char *embedding[] = { "", "-e", "sub _eval_ { eval $_[0] }" };
 70     STRLEN length;
 71     char *text;
 72     int i,j;
 73
 74     my_perl = perl_alloc(); /* Allocate the interpreter */
 75     perl_construct( my_perl ); /* Call the constructor */
 76     /* Fake the call to the script to use */
 77     perl_parse(my_perl, NULL, 3, embedding, env);
 78
 79     /*
 80     ** Do the loop
 81     */
 82
 83     while (1) {
 84         doExpression("$reply = <STDIN>; chop $reply;");
 85         text = SvPV(perl_get_sv("reply", FALSE), length);
 86         printf("reply = %s\n", text);
 87
 88         if (match(text, "/[Qq]uit|[Ee]xit/")) /* Bail out? */
 89             break;
 90         if (match(text, "/[Cc]om/"))
 91         {
 92             printf("match: Text contains the word 'com'\n\n");
 93             substitute(text, "s/com/commercial/g");
 94             printf("\nAfter substitution = %s\n", text);
 95         }
 96         else
 97             printf("match: Text doesn't contain the word Com.\n\n");
 98
 99     } /* while loop ends */
100
101     /*
102     ** Clean up after yourself.
103     */
104     perl_destruct(my_perl);
105     perl_free(my_perl);
106 }

The program is derived from the code in Listing 26.14. Most of the constructs are the same with the exception of the global command buffer declared at line 28. You might want to consider using mallocs within functions for a more complicated application. The KDEBUG flag is set to 1 so as to show how the command strings are constructed. You can comment out the code in line 10 for a less verbose output.
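One way to replace the fixed 256-byte command buffer with allocated memory is sketched below. It is not part of the original program: the escape_single_quoted and build_match_command names are hypothetical, and the sketch assumes a C99 library so that snprintf() can be asked for the required length first. It also escapes backslashes and single quotes in the input, because an unescaped single quote would end the Perl string literal early.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Escape backslashes and single quotes so the text can sit inside a Perl
   single-quoted string. Returns malloc'ed memory; the caller frees it. */
static char *escape_single_quoted(const char *in)
{
    size_t n = strlen(in);
    char *out = malloc(2 * n + 1);   /* worst case: every character escaped */
    char *p = out;
    if (out == NULL) return NULL;
    for (; *in != '\0'; in++) {
        if (*in == '\\' || *in == '\'') *p++ = '\\';
        *p++ = *in;
    }
    *p = '\0';
    return out;
}

/* Build the same command string that match() builds, sized exactly.
   Returns malloc'ed memory (caller frees), or NULL on failure. */
static char *build_match_command(const char *text, const char *pattern)
{
    static const char fmt[] = "$string = '%s'; $return = $string =~ %s";
    char *escaped = escape_single_quoted(text);
    char *cmd = NULL;
    int len;
    if (escaped == NULL) return NULL;
    len = snprintf(NULL, 0, fmt, escaped, pattern);   /* measure only */
    if (len >= 0) {
        cmd = malloc((size_t) len + 1);
        if (cmd != NULL) snprintf(cmd, (size_t) len + 1, fmt, escaped, pattern);
    }
    free(escaped);
    return cmd;
}
```

A function like match() could pass the returned string to doExpression() and free it afterward. The pattern argument is passed through untouched, so it still has to be a well-formed Perl match or substitution expression.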

The match function is defined at line 34. The command string is constructed at line 36, and the command is executed at line 40. The value of the $return variable is returned. In scalar context the match operator yields 1 if the pattern matched and an empty string if it did not, so match() returns 1 on a match and 0 otherwise.

The substitute function is defined at line 47. A command string is constructed at line 51 using the global command buffer. At line 58, we retrieve the value of the $string variable after the substitution has been made, and at line 59 we copy this new string over the input string. Because the new string can be longer than the original, the caller must make sure the buffer is large enough to hold it. The value of $ret, which is the number of substitutions made, is returned at line 60.

The bulk of the work in the program is done in the while loop in lines 83 to 99. At line 84, we collect an input string from the user, and at line 85 we point to it with the char * pointer text. Then, in line 88, we match the incoming string to see whether we have to exit, and we break out of the loop if there is a match. At line 90, we search for another pattern and, if it is found, substitute a string in the input text at line 93.

Summary

This chapter showed ways of incorporating the Perl interpreter into your C code. Writing a makefile involves getting the correct paths to the libraries, and the ExtUtils::Embed module can help you get these paths. The C code that uses Perl functions must initialize and maintain a PerlInterpreter object. Calls from C into Perl have to maintain a stack on which values are sent into and retrieved from functions. Both scalar and list values can be returned from Perl functions, and the calling routine has to be aware of how to handle these return values. The Perl functions being called can either reside on disk or be embedded in C strings. When combining code from both languages, divide the work so that each language plays to its strong points. For example, use Perl for string manipulation and use C for complex operations involving calculations.
