libmagic

John Cowan cowan at ccil.org
Sat Nov 7 00:28:16 GMT 2020


This post is to suggest that servers currently using file extensions to
determine MIME-types switch to libmagic.  This C library analyzes the
content of a file (by name, by file descriptor, or by looking at a buffer
containing the content) and can provide a MIME-type and an encoding.  This
library is behind the `file` command on Linux, FreeBSD, and NetBSD (but not
OpenBSD), and there are interfaces for at least Python, Rust, and Go, plus
a version for Windows.  If you are testing (or serving!) on a Mac, use
Homebrew or Guix.  Googling for "libmagic" and some keyword will probably
find more.
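For readers who want to try the buffer interface without installing one of the language bindings, here is a minimal sketch that calls libmagic's C functions (`magic_open`, `magic_load`, `magic_buffer`, `magic_close`) from Python through the standard `ctypes` module. It assumes the libmagic shared library is installed and can be found by `ctypes.util.find_library`. The flag values are copied from `<magic.h>`, and the helper name `mime_from_buffer` is hypothetical. libmagic also has `magic_file` and `magic_descriptor` for the by-name and by-descriptor cases; only the buffer case is shown.

```python
import ctypes
import ctypes.util
from typing import Optional

# Flag values from libmagic's <magic.h>: MIME type plus charset.
MAGIC_MIME_TYPE = 0x0000010
MAGIC_MIME_ENCODING = 0x0000400
MAGIC_MIME = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING


def _load_libmagic() -> Optional[ctypes.CDLL]:
    """Locate the libmagic shared library; return None if it is not installed."""
    name = ctypes.util.find_library("magic")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # Declare the C signatures so ctypes passes pointers and sizes correctly.
    lib.magic_open.restype = ctypes.c_void_p
    lib.magic_open.argtypes = [ctypes.c_int]
    lib.magic_load.restype = ctypes.c_int
    lib.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.magic_buffer.restype = ctypes.c_char_p
    lib.magic_buffer.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_size_t]
    lib.magic_close.restype = None
    lib.magic_close.argtypes = [ctypes.c_void_p]
    return lib


def mime_from_buffer(data: bytes) -> Optional[str]:
    """Return e.g. 'text/plain; charset=us-ascii', or None if libmagic is unavailable."""
    lib = _load_libmagic()
    if lib is None:
        return None
    cookie = lib.magic_open(MAGIC_MIME)
    if not cookie:
        return None
    try:
        # Passing NULL (None) loads the default compiled magic database.
        if lib.magic_load(cookie, None) != 0:
            return None
        result = lib.magic_buffer(cookie, data, len(data))
        return result.decode("utf-8") if result is not None else None
    finally:
        lib.magic_close(cookie)
```

A real server would open the cookie once at startup and reuse it rather than reloading the database for every request.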

Obviously, using libmagic is slower than just comparing a file extension to
a list of known extensions.  But Gemini servers are not, in general,
high-volume, and libmagic has the advantage of being maintained by an
outside group that is quite good about accepting information about new
file formats.  This means that Gemini servers can serve content in most
formats without a problem.

Unfortunately, at the moment `file --mime` reports either "text/plain;
charset=utf-8" or "text/plain; charset=us-ascii" for a text/gemini file
instead of "text/gemini".
So an interesting question is: how can a text/gemini file best be
identified by its content? It doesn't have to be infallible, because it can
be backed up by checking the extension.
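As one possible answer (not a settled convention), a server could treat a file as text/gemini if it is valid UTF-8, contains no NUL bytes, and has at least one link line (a line beginning with `=>`) outside a preformatted block, on the assumption that Markdown and plain text rarely start lines that way. The sketch below implements that heuristic and falls back to the extension when it fails. The function names and the extension set `GEMINI_EXTENSIONS` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical list of extensions used as the fallback check.
GEMINI_EXTENSIONS = {".gmi", ".gemini"}


def looks_like_gemtext(data: bytes) -> bool:
    """Heuristic: UTF-8 text with at least one '=>' link line
    outside a ``` preformatted block."""
    if b"\x00" in data:  # NUL bytes suggest binary content
        return False
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    preformatted = False
    for line in text.splitlines():
        if line.startswith("```"):
            # Preformat toggle lines switch verbatim mode on and off.
            preformatted = not preformatted
        elif not preformatted and line.startswith("=>"):
            return True
    return False


def guess_gemini_mime(path: str, data: bytes) -> Optional[str]:
    """Content check first, extension as the backup; None means 'not gemtext'."""
    if looks_like_gemtext(data):
        return "text/gemini"
    if PurePosixPath(path).suffix.lower() in GEMINI_EXTENSIONS:
        return "text/gemini"
    return None
```

A gemtext file with no links at all fails the content test and relies entirely on its extension, which is the kind of fallibility the extension backup is meant to cover. A server would probably run this only when libmagic has already reported text/plain, so other formats keep libmagic's answer.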


John Cowan          http://vrici.lojban.org/~cowan        cowan at ccil.org
The native charset of SMS messages supports English, French, mainland
Scandinavian languages, German, Italian, Spanish with no accents, and
GREEK SHOUTING.  Everything else has to be Unicode, which means you get
only 70 16-bit characters in a text instead of 160 7-bit characters.

More information about the Gemini mailing list