CGI, SCGI and Certificates (was Re: [ANN] Gemini browser for iOS)

Sean Conner sean at conman.org
Thu Jun 11 20:45:58 BST 2020


It was thus said that the Great solderpunk once stated:
> On Tue, Jun 09, 2020 at 09:02:24PM -0400, Michael Lazar wrote:
>  
> > I believe this is using SCRIPT_NAME incorrectly per RFC 3875. The SCRIPT_NAME
> > should be the part of the URI path that comes before the PATH_INFO [1]. So in
> > your example:
> > 
> > GEMINI_URL=gemini://lucy.roswell.area51/cgi-bin/beta/foobar?one=1&two=2
> > SCRIPT_NAME=/cgi-bin/beta
> > PATH_INFO=/foobar
> 
> Is this how cgi-bins are traditionally handled?  

  Yes.

> If a URI path's prefix
> matches the configured cgi-bin path, the standard mapping from URI paths
> to the filesystem is interrupted, and the first component of the URI path
> *after* the cgi-bin prefix (here `beta`) is the only thing looked for on
> the disk, with everything else passed along to PATH_INFO?  If there is,
> for example, a /var/gemini/cgi-bin/beta/ directory on the disk, the
> server does not check for an executable named `foobar` in it?

  To answer that last question, no.

  To explain, let me describe my setup.  GLV-1.12556 allows one to use
multiple directories per virtual host for content.  I have the following
set up on my development box:

      {
        path      = "^(/cgi%-bin/)(.*)", -- [5]
        module    = "GLV-1.handlers.filesystem",
        directory = "/home/spc/projects/gemini/non-checkin/cgi-bin",
        -- ... there are some other directives, not important right now
      },

      {
        path      = ".*", -- [5]
        module    = "GLV-1.handlers.filesystem",
        directory = "/home/spc/projects/gemini/non-checkin/lucy.roswell.area51",
        -- ... more directives ...
      }

Note that depending upon how things are configured, CGI [1] can be in any
directory or restricted to a single directory [2].  With GLV-1.12556, any
file with the 'execute' bit will be treated as a CGI script [3][4].  I just
added a CGI to my main Gemini server:

	gemini://gemini.conman.org/test/a-script/foobar?one=1&two=2

The URL is broken up into:

	location =
	{
	  host = "gemini.conman.org",
	  port = 1965.000000,
	  path = "/test/a-script/foobar",
	  scheme = "gemini",
	  query = "one=1&two=2",
	}

The path is matched against each handler's path (in order, first match wins)
and the matching one is handed the request.  Per the configuration, this
match result will be:

	match =
	{
	  "/",
	  "test/a-script/foobar",
	}
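  As a rough illustration only (this is not GLV-1.12556's code), the
URL breakdown and first-match-wins dispatch can be sketched in Python.  The
names HANDLERS, two_group and find_handler are made up for the example, and
Python's `re` stands in for Lua patterns, so the Lua escape "%-" is written
as a plain "-".  The two handler entries mirror my development
configuration above.

```python
import re
from urllib.parse import urlsplit

# (pattern, directory) pairs, checked in order; the first match wins.
HANDLERS = [
    (r"^(/cgi-bin/)(.*)", "/home/spc/projects/gemini/non-checkin/cgi-bin"),
    (r".*", "/home/spc/projects/gemini/non-checkin/lucy.roswell.area51"),
]

def two_group(pattern):
    # Footnote [5]: the match-all pattern ".*" is redone as "^(/)(.*)".
    # Only that one case is handled in this sketch.
    return r"^(/)(.*)" if pattern == ".*" else pattern

def find_handler(url):
    """Return (directory, [url_space, filesystem_part]) or None."""
    path = urlsplit(url).path
    for pattern, directory in HANDLERS:
        m = re.match(two_group(pattern), path)
        if m:
            return directory, [m.group(1), m.group(2)]
    return None

directory, match = find_handler(
    "gemini://gemini.conman.org/test/a-script/foobar?one=1&two=2")
# match holds the two groups: the URL-space prefix and the rest.
```

With this sketch a path under /cgi-bin/ is claimed by the first handler,
and everything else falls through to the match-all handler.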

  The filesystem handler will break down the second match element (the first
is considered the "URL filesystem space"---remember, GLV-1.12556 supports
multiple directories per vhost) and check each segment (for permissions, CGI
script or SCGI script).  So the first check is for:

	<directory> .. "test"

  This is a directory, so it continues, walking down the path.   Next it
tries:

	<directory> .. "test/" .. "a-script"

  This is a file with the execute bit set, so it's run.  The rest of the
match is used to construct the PATH_INFO

	PATH_INFO="/foobar"

and PATH_TRANSLATED

	PATH_TRANSLATED=<directory> .. "/foobar"

This does not imply that such a directory exists.  If there is no more to
the path (say, the request was to "/test/a-script") then the PATH_INFO and
PATH_TRANSLATED would not be set.
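
  The walk described above might look something like the following
Python sketch.  It is an approximation under stated assumptions, not the
actual handler: resolve_cgi is a hypothetical name, the permission and SCGI
checks are left out, and the demo builds a throwaway directory tree instead
of using my real one.

```python
import os
import tempfile

def resolve_cgi(directory, rest):
    """Walk `rest` segment by segment under `directory`.

    Returns (script_path, env).  env carries PATH_INFO and
    PATH_TRANSLATED only when something follows the script in the path.
    Returns (None, {}) if no executable file is found along the way.
    """
    segments = [s for s in rest.split("/") if s]
    current = directory
    for i, segment in enumerate(segments):
        current = os.path.join(current, segment)
        if os.path.isdir(current):
            continue  # a directory: keep walking down the path
        if os.path.isfile(current) and os.access(current, os.X_OK):
            env = {}
            extra = segments[i + 1:]
            if extra:
                path_info = "/" + "/".join(extra)
                env["PATH_INFO"] = path_info
                # Built by concatenation; nothing checks that it exists.
                env["PATH_TRANSLATED"] = directory.rstrip("/") + path_info
            return current, env
        break  # missing, or a plain non-executable file
    return None, {}

# Demo tree: <root>/test/a-script with the execute bit set.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "test"))
script = os.path.join(root, "test", "a-script")
with open(script, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(script, 0o755)

found, env = resolve_cgi(root, "test/a-script/foobar")
```

A request for just "test/a-script" returns the same script with an empty
env, matching the case where PATH_INFO and PATH_TRANSLATED are not set.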

  A Gemini server doesn't have to do what I do.  It would certainly be in
line with Apache to require CGI scripts to have a particular extension, look
for that extension, and handle things that way without having to walk down
the filesystem checking each component.  So hypothetically speaking, given a
request like:

	gemini://example.net/foo/bar.cgi

the server can scan for ".cgi", find it, and know it's going to execute a CGI
script; there is nothing more of the URL path, so it does not set PATH_INFO
and PATH_TRANSLATED.  But for this:

	gemini://example.net/foo/bar.cgi/baz

it would find the .cgi extension, extract the path up through the extension
("/foo/bar.cgi") and, because there's more, set up PATH_INFO and
PATH_TRANSLATED.  I give a real-life example of using PATH_INFO and
PATH_TRANSLATED in another message on this list:

	https://lists.orbitalfox.eu/archives/gemini/2020/001485.html
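
  For the hypothetical extension-based approach, a minimal Python sketch
could split the URL path as below.  The function name split_on_extension and
the document_root parameter are assumptions made for the example; no
particular server is known to do it exactly this way.

```python
def split_on_extension(url_path, document_root, extension=".cgi"):
    """Split a URL path at the first segment ending in `extension`.

    Returns (script_name, env); script_name is None when the path does
    not name a script.  env is empty when nothing follows the script.
    """
    idx = url_path.find(extension + "/")
    if idx != -1:
        cut = idx + len(extension)
        script_name, path_info = url_path[:cut], url_path[cut:]
        env = {
            "PATH_INFO": path_info,
            "PATH_TRANSLATED": document_root.rstrip("/") + path_info,
        }
        return script_name, env
    if url_path.endswith(extension):
        return url_path, {}  # nothing after the script: leave both unset
    return None, {}

plain = split_on_extension("/foo/bar.cgi", "/var/gemini")
extra = split_on_extension("/foo/bar.cgi/baz", "/var/gemini")
```

Here "/var/gemini" is just a placeholder document root.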

> Semi-related: when the server forks off the CGI process, is it
> conventional to set that process' working directory to the CGI bin?

  It would be conventional to set the working directory to the main
directory for the host.  In my case, given that a host can have multiple
directories, I set the working directory to the handler's directory setting. 
That value is also set in GEMINI_DOCUMENT_ROOT.
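
  A small Python sketch of that last point, assuming a hypothetical
run_cgi helper: the child process is started with its working directory set
to the handler's directory, and the same value is exported as
GEMINI_DOCUMENT_ROOT.  For brevity the sketch inherits the parent's
environment, which a real server would likely not do.

```python
import os
import subprocess
import sys
import tempfile

def run_cgi(argv, handler_directory, extra_env=None):
    """Run a script with cwd and GEMINI_DOCUMENT_ROOT set to the handler's directory."""
    env = dict(os.environ)  # simplification: inherit the parent's environment
    env["GEMINI_DOCUMENT_ROOT"] = handler_directory
    env.update(extra_env or {})
    return subprocess.run(argv, cwd=handler_directory, env=env,
                          capture_output=True, text=True, check=False)

# Demo: a stand-in "script" that reports what it sees.
demo_dir = tempfile.mkdtemp()
report = "import os; print(os.getcwd()); print(os.environ['GEMINI_DOCUMENT_ROOT'])"
result = run_cgi([sys.executable, "-c", report], demo_dir)
cwd_seen, root_seen = result.stdout.splitlines()
```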

  -spc

[1]	And SCGI, which I support as well.

[2]	That's why I have 'cgi-bin'---to test that configuration.

[3]	I didn't bother with extensions for this.  I felt that checking for
	the 'execute' bit was more elegant than just an extension.  Also, if
	CGI has been disabled (server wide, host or directory---the
	configuration is very fine grained) then I return an error to the
	client.

[4]	There's another method for SCGI.

[5]	This is a Lua-style regex ('%' is the escape character, so '%-'
	matches a literal '-').  The patterns in () are groupings and the
	filesystem handler wants two groups---the first is the leading
	portion in URL space that doesn't map to a file system, the second
	is the portion that does map to a filesystem.  The original syntax
	for this only required one match and I kept that---in that case, the
	match is redone slightly so that the leading '/' from the URL
	portion is the first match, then the rest.  So the '.*' pattern
	(which is basically "match all") becomes the pattern "^(/)(.*)". 
	This is an implementation detail of GLV-1.12556, but I thought I
	should mention it.

