Good practices regarding MIME type

Philip Linde linde.philip at gmail.com
Fri Dec 11 09:41:37 GMT 2020


On Thu, 10 Dec 2020 22:12:41 +0100
Solène Rapenne <solene at perso.pw> wrote:

> Hi,
> 
> I wrote a gemini server in C and I currently use a hardcoded list of
> file extension <-> MIME type associations.
> This isn't great because it relies on the file extension, which can be
> wrong, and a file without an extension just gets a default.
> 
> I chose to set a default text/gemini in case the extension is unknown or 
> if the file has no extension.
> 
> What are the good practices to determine a file MIME type?

No method will work completely reliably short of implementing full
parsers for the different file types. From a complete server I'd
expect to be able, as the operator, to set the file type of a given
served resource myself without relying on an extension-type mapping.
For example, to be able to say that every file under /text/ is
text/plain.

John Cowan suggests libmagic and the file utility. AFAIK utilities and
libraries like these apply matching rules to a few bytes of a file to
determine the file type, with a limited degree of accuracy.

I suggest a procedure like this to determine the file type:

1. Check if there is a configuration rule for this particular file that
   determines its file type. If so, use that.
2. If not, check if the server's extension-type mapping configuration
   contains the file extension. If so, use that.
3. If not, check the system-level MIME type database for a type
   assigned to the extension. If there is one, use that.
4. If not, you can now optionally use some heuristic approach to
   determine the file type. This can be via a library like libmagic,
   or a simpler approach as suggested by makeworld to determine
   whether you can fall back to text/plain or not. If the heuristic
   gives an answer, use that.
5. If not, assume application/octet-stream.
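
The steps above can be sketched roughly as follows. This is only an
illustration: the names (path_rules, ext_rules, mime_for) and the table
contents are made up for the example, step 3 is left out because the
location and format of the system database vary, and the "no NUL byte
means text" check is a stand-in heuristic of my own, not necessarily
what makeworld proposed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule tables; a real server would load these from its config. */
struct path_rule { const char *prefix; const char *mime; };
struct ext_rule  { const char *ext;    const char *mime; };

/* Step 1: explicit per-path rules set by the operator. */
static const struct path_rule path_rules[] = {
    { "/text/", "text/plain" },
};

/* Step 2: the server's own extension-type mapping. */
static const struct ext_rule ext_rules[] = {
    { "gmi", "text/gemini" },
    { "txt", "text/plain" },
    { "png", "image/png" },
};

/* Return the extension of the last path component, or NULL if it has none. */
static const char *file_ext(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');

    /* A leading dot (".hidden") or a trailing dot is not an extension. */
    if (dot == NULL || dot == base || dot[1] == '\0')
        return NULL;
    return dot + 1;
}

/* Step 4 stand-in: a non-empty sample without NUL bytes is "probably text". */
static int looks_like_text(const unsigned char *buf, size_t len)
{
    return len > 0 && memchr(buf, 0, len) == NULL;
}

/* head/head_len: the first bytes of the file, used only by the heuristic. */
const char *mime_for(const char *path, const unsigned char *head, size_t head_len)
{
    const char *ext;
    size_t i;

    for (i = 0; i < sizeof path_rules / sizeof path_rules[0]; i++)
        if (strncmp(path, path_rules[i].prefix,
                    strlen(path_rules[i].prefix)) == 0)
            return path_rules[i].mime;

    ext = file_ext(path);
    if (ext != NULL)
        for (i = 0; i < sizeof ext_rules / sizeof ext_rules[0]; i++)
            if (strcmp(ext, ext_rules[i].ext) == 0) /* case-sensitive */
                return ext_rules[i].mime;

    /* Step 3 (system MIME type database) would be consulted here. */

    if (looks_like_text(head, head_len))
        return "text/plain";

    return "application/octet-stream"; /* step 5 */
}
```

With these tables, mime_for("/text/notes.png", NULL, 0) gives text/plain
because the path rule wins over the extension, which is the point of
putting step 1 first.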

You could cache the results in memory and drop the cache on e.g. SIGHUP.
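
A minimal sketch of such a cache, assuming a small fixed-size table and
a single-threaded server loop. All names here (cache_get, cache_put,
cache_install_sighup) are hypothetical. The signal handler only sets a
flag; the cache is actually emptied on the next lookup or insert, which
keeps the handler async-signal-safe. signal() is used to stay within
standard C; on POSIX systems sigaction() is the more usual choice.

```c
#include <signal.h>
#include <string.h>

#define CACHE_SLOTS    64
#define CACHE_PATH_LEN 256

struct cache_entry {
    char path[CACHE_PATH_LEN];
    const char *mime; /* must outlive the cache, e.g. a string literal */
};

static struct cache_entry cache[CACHE_SLOTS];
static size_t cache_used;
static volatile sig_atomic_t drop_requested;

/* The handler only records the request; no other work is done here. */
static void on_sighup(int signo)
{
    (void)signo;
    drop_requested = 1;
}

/* Returns 0 on success, -1 if the handler could not be installed. */
int cache_install_sighup(void)
{
    return signal(SIGHUP, on_sighup) == SIG_ERR ? -1 : 0;
}

/* Empty the cache if a SIGHUP arrived since the last call. */
static void cache_maybe_drop(void)
{
    if (drop_requested) {
        drop_requested = 0;
        cache_used = 0;
    }
}

/* Returns the cached type for path, or NULL on a miss. */
const char *cache_get(const char *path)
{
    size_t i;

    cache_maybe_drop();
    for (i = 0; i < cache_used; i++)
        if (strcmp(cache[i].path, path) == 0)
            return cache[i].mime;
    return NULL;
}

/* Remember a result; silently skipped if the table is full or path too long. */
void cache_put(const char *path, const char *mime)
{
    cache_maybe_drop();
    if (cache_used == CACHE_SLOTS || strlen(path) >= CACHE_PATH_LEN)
        return;
    strcpy(cache[cache_used].path, path);
    cache[cache_used].mime = mime;
    cache_used++;
}
```

The linear scan is fine for a handful of entries; a busier server would
probably want a hash table, but the drop-on-flag idea stays the same.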

As for extension-less files, if you don't want extensions visible to
the client, you can still use extensions on the server side, which the
server optionally strips off.
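
One way this could look, as a sketch only: the client requests "about",
and the server picks the on-disk file "about.gmi". The function name
resolve_hidden_ext is made up, and the directory listing is passed in
as a plain array so the example does not depend on any particular
filesystem API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Map an extension-less request name to a file name from a directory
 * listing. An exact match wins; otherwise the first entry of the form
 * "<request>.<ext>" (with a single extension) is returned.
 * Returns NULL if nothing matches.
 */
const char *resolve_hidden_ext(const char *request,
                               const char *const names[], size_t n)
{
    size_t len = strlen(request);
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(names[i], request) == 0)
            return names[i];

    for (i = 0; i < n; i++) {
        const char *name = names[i];

        if (strncmp(name, request, len) == 0 &&
            name[len] == '.' && name[len + 1] != '\0' &&
            strchr(name + len + 1, '.') == NULL)
            return name;
    }
    return NULL;
}
```

If both about.gmi and about.txt exist, this returns whichever comes
first in the listing, so the operator would need to avoid such
collisions or the server would need an explicit preference order.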

Overall I think it's fair to expect some level of effort from the
server operator in making sure that the static files have sensible
extensions. Any smartness beyond extension mapping is at best a bonus
AFAIC, at worst a potentially nasty surprise.

-- 
Philip