An outsider's view of the `gemini://` protocol

Ciprian Dorin Craciun ciprian.craciun at gmail.com
Thu Feb 27 23:16:30 GMT 2020


Hello all!

[Disclaimer:  I'm not an active `gopher://` user, although long ago I
did implement my own Gopher server in Erlang and another one in Go;
however I do keep an eye on the Gopher mailing list, mostly because
I'm nostalgic for a "simpler" web...]

Today I've stumbled upon the `gemini://` protocol specification
(v0.10) and FAQ, and after reading them both, I thought that perhaps
an "outsider's" point of view could be useful.




First of all, I understand that `gemini://` wants to "sit" between
`gopher://` and `http://`;  however, from what I can tell, it more
closely resembles HTTP/0.9
(https://www.w3.org/Protocols/HTTP/AsImplemented.html);  i.e. it adds
only virtual hosting and a response MIME type on top of HTTP/0.9 or
Gopher (plus TLS, but that's transport related).

Although I do agree that the HTTP/1.1 semantics (a large part of
which are nowadays carried over into HTTP/2 and HTTP/3) have become
extremely complex (from chunked encoding, to caching, to server-side
push via `Link` headers, etc.), there are some features that I think
are useful, especially given some of the stated goals of `gemini://`
(like, for example, slow links):

* caching -- given that most content is going to be static, caching
should be quite useful;  however it doesn't seem to have been a
concern in either the spec, the FAQ, or the mailing list archive;
I'm not advocating for the whole set of HTTP caching headers, but
perhaps for a simple SHA of the body so that clients can just skip
downloading it (although this would imply a more elaborate protocol,
with separate "headers" and "body" phases);

* compression -- needless to say, `text/*` MIME types compress very
well, thus saving both bandwidth and cache storage;  (granted, one
could use compression at the TLS layer, although I think that was
dropped due to security issues?);

* `Content-Length` -- I've seen this mentioned in the FAQ and the
mailing lists;  I think the days of "unreliable" protocols have
passed;  (i.e. we should make sure that the intended document was
properly delivered, in its entirety and unaltered;)

* status codes -- although both Gemini and HTTP use numeric status
codes, I do believe these are an artifact of ancient times, and we
could just replace them with proper symbols (perhaps hierarchical in
nature, like `redirect:temporary` or `failure:temporary:slow-down`);

* keep-alive -- although in Gopher and Gemini the served documents
seem to be self-contained, and connections will usually be idle while
the user ponders what to read next, in the case of crawlers, having
to re-establish a new connection (especially a TLS one) for each
request would eat a lot of resources and incur significant delays;
(not to mention that repeated TCP connection establishment to the
same port or target IP might be misinterpreted as an attack by
various security appliances or cloud providers;)
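To make the caching point above concrete, here is a minimal sketch of
what a client could do, assuming a hypothetical response phase in
which the server advertises a SHA-256 digest of the body before
sending it (this is not part of the actual Gemini spec):

```python
import hashlib

def should_download(cached_body, advertised_sha256):
    """Return True when the client must fetch the body again.

    Assumes a hypothetical header phase where the server sends the
    SHA-256 of the body up front; NOT part of the Gemini spec.
    """
    if cached_body is None:
        return True
    return hashlib.sha256(cached_body).hexdigest() != advertised_sha256

body = b"# Hello Gemini\nSome static content.\n"
digest = hashlib.sha256(body).hexdigest()

assert should_download(None, digest)       # nothing cached yet
assert not should_download(body, digest)   # cache hit -> skip download
assert should_download(b"stale", digest)   # cache stale -> re-fetch
```

A digest comparison like this is all a client would need to skip
re-downloading unchanged static documents.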




Now on the transport side, and somewhat related to the previous
point, I think TLS transient certificates are overkill...  If one
wants to implement "sessions", one could introduce
client-side-generated "cookies" which are functionally equivalent to
these transient certificates:  instead of creating a transient
certificate, the client generates a unique token and sends that to
the server instead.  The server has no more control over the value of
that cookie than it has over the transient certificate.
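Such a client-generated cookie could be as simple as the following
sketch (the request framing here is purely hypothetical, not part of
the Gemini spec):

```python
import secrets

# A client-side session "cookie": generated locally with a CSPRNG,
# never chosen by the server -- functionally equivalent to a transient
# client certificate, minus the key-generation cost.
session_token = secrets.token_urlsafe(32)

# The client would attach the token to each request of the session;
# the tab-separated framing below is an invented placeholder.
request = "gemini://example.org/app\t" + session_token + "\r\n"
```

Generating such a token costs essentially nothing compared to
generating and signing an X.509 certificate.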

Moreover, the way sessions are signaled between the server and the
client, piggy-backed on top of status codes, seems more like an
afterthought than part of an orthogonal design.  Perhaps these
sessions should be "moved" to a higher level (i.e. after transport
and before the actual transaction, just as in the OSI stack).

Also, these transient certificates are sold as "privacy enablers" or
"tracking preventers", which is far from the truth.  The server
(based on IP, ASN, or other information) can easily map various
transient certificates as "possibly" belonging to the same person.
Thus, just by allowing these, one opens up the possibility of
tracking (even if only for a given session).  Moreover, securely
generating these transient certificates requires some CPU power.




On second thought, why TLS at all?  Why not something based on NaCl /
`libsodium` constructs, or even the Noise Protocol
(http://www.noiseprotocol.org/)?  For example, I tried to build the
Asuka Rust-based client, and it pulled in ~104 dependencies and took
a few minutes to compile;  this doesn't seem too lightweight...
Granted, a lot of those dependencies might have come from other
direct dependencies, and Rust in general takes a long time to
compile, but it does give a hint...

Why not just re-use PGP to sign / encrypt requests and replies?
Given that Gopher communities tend to be quite small, and composed
mostly of "techie" people, this goes hand-in-hand with the
"web-of-trust" that PGP enables, and it can provide something that
TLS can't at this moment: actual "attribution" of servers to human
beings, and trust delegation.  For example, for a server one could
generate a pair of keys, and other people could sign those keys as a
way to denote their "trust" in that server (and thus in the hosted
content).  Why not take this a step further and allow each served
document to be signed, thus extending this "attribution" not only to
the servers but also to the actual content?  This way a server could
provide a mirror / cached version of a certain document while still
proving it is the original one.

In fact, with such a PGP approach, one would no longer authenticate
the server, but instead authenticate the actual document received;
the server thus becomes a simple "conduit" through which the user
downloads the content, enabling one to proxy or mirror other servers
and still keep the cryptographic "proof of origin" intact.
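The "conduit" idea can be illustrated with plain content hashing as a
stand-in for a real PGP detached signature (a signature would
additionally prove authorship; a hash only proves integrity against a
known content address).  The `mirrors` dictionary below stands in for
real network I/O:

```python
import hashlib

def fetch_from_any_mirror(mirrors, expected_sha256):
    """Fetch a document from whichever mirror serves the authentic
    bytes; the mirrors themselves are untrusted conduits.
    (Sketch only: `mirrors` maps URL -> body instead of doing I/O.)
    """
    for url, body in mirrors.items():
        if hashlib.sha256(body).hexdigest() == expected_sha256:
            return url, body
    raise ValueError("no mirror served the authentic document")

original = b"the original document\n"
address = hashlib.sha256(original).hexdigest()
mirrors = {
    "gemini://mirror-a.example/doc": b"tampered content\n",
    "gemini://mirror-b.example/doc": original,
}
url, body = fetch_from_any_mirror(mirrors, address)
assert url == "gemini://mirror-b.example/doc"
```

The client ends up trusting the document, not the machine that
happened to deliver it.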




Now, getting back to the `gemini://` protocol, another odd thing I
found is the "query" feature.  Gemini explicitly supports only `GET`
requests, and the `text/gemini` format doesn't support forms, yet it
still tries to implement a "single input-box form"...  Granted, it's
a nice hack, but it's not "elegant"...  (Again, as with sessions, it
seems more like an afterthought, even though this is the way Gopher
does it...)

Perhaps a simple "form" solution would be better?  Or perhaps these
"queries" should be eliminated entirely for the time being?  Or
perhaps a new form of URL could be introduced, for example:
`gemini-query:?url=gemini://server/path&prompt=Please+enter+something`
which could be served either in-line (as was possible in Gopher) and
/ or as a redirect (thus eliminating another status code family).
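For what it's worth, such a URL (again, purely hypothetical -- it is
not part of any spec) would be trivial for a client to interpret with
standard URL machinery:

```python
from urllib.parse import urlparse, parse_qs

# Parse the hypothetical `gemini-query:` URL form proposed above.
raw = ("gemini-query:?url=gemini://server/path"
       "&prompt=Please+enter+something")
parsed = urlparse(raw)
assert parsed.scheme == "gemini-query"

params = parse_qs(parsed.query)
target = params["url"][0]      # where to send the user's input
prompt = params["prompt"][0]   # what to ask the user
```

A client would display `prompt`, collect the input, and issue a
normal request to `target` with that input attached.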




Regarding the `text/gemini` format -- and taking into account
various emails in the archive about reflowing, etc. -- I wonder if it
is actually needed.  Why can't CommonMark be adopted as the HTML
equivalent, and a more up-to-date Gopher map variant as an
alternative for menus?  There are already countless safe CommonMark
parsers out there (for example, in Rust there is one implemented by
Google), and the format is well understood and accepted by a large
community (especially the static site generator community).

Regarding an up-to-date Gopher map alternative, I think this is an
important piece of the Gopher ecosystem that is missing from today's
world:  a machine-parsable standard format for indexing documents.  I
very fondly remember the "directory" sites of yesteryear (like DMOZ
or its countless clones) that strove to categorize the internet not
by "machine learning" but by human curation.




In fact (and here I stop speaking about Gemini as it is right now,
and instead try to summarize what I believe a proper alternative to
the "web" would be), if one puts together:
* a simple Gemini-like protocol;
* the Gopher-like map alternative (thus indexing);
* the PGP-signed documents;
* more structured links between these documents;
* perhaps support for versioning;
* and perhaps support for content-based addressing (as opposed to
server-based addressing) (i.e. persistent URLs);

, we get closer to the initial "spirit" of the "web" (i.e. the 90's
era WWW), namely:
* a "body" of "documents" that aren't tied to a particular server,
and that link to one another;
* that have minimal metadata (especially author and date) and
perhaps revisions;
* and a way to categorize and organize these into a proper (perhaps
hierarchical) structure;

(Perhaps the closest to this ideal would be a Wikipedia style web...)




All in all, I find the `gemini://` project quite interesting, and
I'll keep a close eye on it.  I'm also glad to see that the Gopher
world hasn't died, but has instead spawned a modern alternative.

Also, although all of my comments above are somewhat negative in
tone, please take them in a constructive manner, and please note that
I do appreciate other aspects of the Gemini proposal (from the
simplification of the protocol, and allowing the proxying of other
kinds of URLs as a first-class citizen, to the fact that the
`text/gemini` format specifies that the client is free to wrap the
text as it sees fit).

Good work guys, and I hope you'll find this useful,
Ciprian.

