Good practices regarding MIME type
Philip Linde
linde.philip at gmail.com
Fri Dec 11 09:41:37 GMT 2020
On Thu, 10 Dec 2020 22:12:41 +0100
Solène Rapenne <solene at perso.pw> wrote:
> Hi,
>
> I wrote a Gemini server in C and I currently use a hardcoded list of
> file extension <-> MIME type associations.
> This isn't great because it relies on file extensions, which can be wrong,
> and a file without an extension falls back to a default.
>
> I chose text/gemini as the default in case the extension is unknown or
> the file has no extension.
>
> What are good practices for determining a file's MIME type?
There is no way that will work completely reliably without implementing
full parsers for the different file types. For a complete server, I'd
expect to be able to set the file type of a given served resource
myself, without relying on an extension-type mapping: for example, to be
able to say that every file under /text/ is text/plain.
John Cowan suggests libmagic and the file utility. AFAIK, utilities and
libraries like these apply matching rules to a small number of bytes in a
file to determine its type, with a limited degree of accuracy.
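To illustrate what matching on a few bytes means, here is a minimal sketch in C that compares a file's leading bytes against a tiny, hypothetical signature table. It is not libmagic: libmagic's rule database is far larger and can match at offsets other than the start of the file. The names sniff_mime and magic_rules are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a byte signature expected at offset 0 and its type. */
struct magic_rule {
    const unsigned char *bytes;
    size_t len;
    const char *mime;
};

/* A handful of well-known signatures; real databases hold thousands. */
static const struct magic_rule magic_rules[] = {
    { (const unsigned char *)"\x89PNG\r\n\x1a\n", 8, "image/png" },
    { (const unsigned char *)"GIF87a", 6, "image/gif" },
    { (const unsigned char *)"GIF89a", 6, "image/gif" },
    { (const unsigned char *)"\xff\xd8\xff", 3, "image/jpeg" },
    { (const unsigned char *)"%PDF-", 5, "application/pdf" },
};

/* Return the MIME type whose signature prefixes buf, or NULL if none match. */
const char *sniff_mime(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < sizeof magic_rules / sizeof magic_rules[0]; i++) {
        const struct magic_rule *r = &magic_rules[i];

        if (n >= r->len && memcmp(buf, r->bytes, r->len) == 0)
            return r->mime;
    }
    return NULL;
}
```

A NULL result just means "no rule matched", which is exactly where the accuracy limit shows: plain text, for instance, has no signature at all.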
I suggest a procedure like this to determine the file type:
1. Check if there is a configuration rule for this particular file to
determine its file type. If so, use that.
2. If there is not, check if the server extension-type mapping
configuration contains the file extension. If so, use that.
3. If there is not, check whether the system-level MIME type database
   assigns a type to the extension. If so, use that.
4. If there is not, you can now optionally use some heuristic approach
to determine the file type. This can be via a library like
libmagic, or a simpler approach as suggested by makeworld to
determine whether you can defer to text/plain or not. If so, use
that.
5. If not, assume application/octet-stream and use that.
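Steps 1 to 5 can be sketched in C roughly as follows, under several assumptions: the per-path rules and the extension map are hard-coded stand-ins for real configuration (struct rule, path_rules, ext_rules, resolve_mime and the helpers are all hypothetical names), the system database is read in the mime.types format (a type followed by its extensions on each line, # for comments), and the step 4 heuristic simply treats a file with no NUL bytes in its first 512 bytes as text/plain. That heuristic is a stand-in, not necessarily what makeworld proposed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a config entry: a key and its MIME type. */
struct rule { const char *key; const char *mime; };

/* Step 1 (hypothetical config): path-prefix overrides, e.g. all of /text/. */
static const struct rule path_rules[] = { { "/text/", "text/plain" } };

/* Step 2 (hypothetical config): the server's own extension map. */
static const struct rule ext_rules[] = {
    { "gmi", "text/gemini" }, { "gemini", "text/gemini" }, { "txt", "text/plain" },
};

#define NELEMS(a) (sizeof (a) / sizeof (a)[0])

/* Extension after the last '.' of the final path component, or NULL.
 * A leading dot (".profile") does not count as an extension. */
static const char *extension_of(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');
    return (dot && dot != base) ? dot + 1 : NULL;
}

/* Step 3: scan a mime.types-style file ("type/subtype ext ext ...").
 * Matching is case-sensitive; lines longer than the buffer are not handled. */
static const char *lookup_system_db(const char *db_path, const char *ext,
                                    char *out, size_t outlen)
{
    static const char ws[] = " \t\r\n";
    size_t extlen = strlen(ext);
    char line[1024];
    FILE *f = fopen(db_path, "r");

    if (f == NULL)
        return NULL;
    while (fgets(line, sizeof line, f) != NULL) {
        char *p = line + strspn(line, ws);
        size_t tlen = strcspn(p, ws);

        if (*p == '#' || tlen == 0)
            continue;                          /* comment or blank line */
        for (char *q = p + tlen;;) {
            size_t elen;

            q += strspn(q, ws);
            if (*q == '\0')
                break;
            elen = strcspn(q, ws);
            if (elen == extlen && strncmp(q, ext, elen) == 0) {
                snprintf(out, outlen, "%.*s", (int)tlen, p);  /* copy the type */
                fclose(f);
                return out;
            }
            q += elen;
        }
    }
    fclose(f);
    return NULL;
}

/* Step 4 (stand-in heuristic): no NUL bytes in the first 512 bytes => text. */
static int looks_like_text(const char *fs_path)
{
    unsigned char buf[512];
    size_t n;
    FILE *f = fopen(fs_path, "rb");

    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return memchr(buf, 0, n) == NULL;
}

/* Resolve a MIME type for a request (url_path) served from fs_path.
 * out is scratch space for step 3; the result is never NULL. */
const char *resolve_mime(const char *url_path, const char *fs_path,
                         const char *system_db, char *out, size_t outlen)
{
    const char *ext = extension_of(fs_path);
    size_t i;

    for (i = 0; i < NELEMS(path_rules); i++)           /* 1. per-path rule */
        if (strncmp(url_path, path_rules[i].key, strlen(path_rules[i].key)) == 0)
            return path_rules[i].mime;
    if (ext != NULL) {
        for (i = 0; i < NELEMS(ext_rules); i++)        /* 2. server map */
            if (strcmp(ext, ext_rules[i].key) == 0)
                return ext_rules[i].mime;
        if (lookup_system_db(system_db, ext, out, outlen) != NULL)
            return out;                                /* 3. system DB */
    }
    if (looks_like_text(fs_path))                      /* 4. heuristic */
        return "text/plain";
    return "application/octet-stream";                 /* 5. fallback */
}
```

The extension is taken from the on-disk path rather than the requested URL, so the lookup keeps working if the server hides extensions from clients. The system database is often /etc/mime.types, though its location varies between systems.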
You could cache the results in memory and drop the cache on, e.g., SIGHUP.
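A minimal sketch of such a cache, assuming a fixed-size, direct-mapped table keyed by path (cache_get, cache_put, cache_install_sighup and the slot sizes are hypothetical). The SIGHUP handler only sets a flag, because clearing the table inside a signal handler would not be async-signal-safe; the next lookup notices the flag and drops every entry.

```c
#include <signal.h>
#include <string.h>

#define CACHE_SLOTS 256

struct cache_entry { char path[256]; char mime[64]; int used; };

static struct cache_entry cache[CACHE_SLOTS];
static volatile sig_atomic_t cache_dirty = 0;

/* Async-signal-safe: the handler only records that a reload was requested. */
static void on_sighup(int sig) { (void)sig; cache_dirty = 1; }

void cache_install_sighup(void) { signal(SIGHUP, on_sighup); }

/* djb2 string hash, used to pick a slot. */
static unsigned long hash_path(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Clear the flag before wiping, so a SIGHUP arriving mid-wipe is not lost. */
static void drop_if_dirty(void)
{
    if (cache_dirty) {
        cache_dirty = 0;
        memset(cache, 0, sizeof cache);
    }
}

/* Return the cached MIME type for path, or NULL on a miss. */
const char *cache_get(const char *path)
{
    drop_if_dirty();
    struct cache_entry *e = &cache[hash_path(path) % CACHE_SLOTS];
    return (e->used && strcmp(e->path, path) == 0) ? e->mime : NULL;
}

/* Store a result; a colliding entry is overwritten. Values too long for
 * the fixed buffers are simply not cached. */
void cache_put(const char *path, const char *mime)
{
    drop_if_dirty();
    if (strlen(path) >= sizeof cache[0].path || strlen(mime) >= sizeof cache[0].mime)
        return;
    struct cache_entry *e = &cache[hash_path(path) % CACHE_SLOTS];
    strcpy(e->path, path);
    strcpy(e->mime, mime);
    e->used = 1;
}
```

A server would call cache_get before running the full lookup and cache_put after it. In a real server you would likely install the handler with sigaction rather than signal, for more predictable semantics.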
As for extension-less files: if you don't want extensions visible to
the client, you can still use extensions on the server side and have the
server optionally strip them off in the paths it exposes.
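A minimal sketch of the server side of this, assuming the server tries a hypothetical list of hidden extensions (none, then .gmi, then .txt) when mapping a request path to a file. find_backing_file and hidden_exts are made-up names, and the ".." check is a crude placeholder for proper path normalisation.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical server-side extensions that may be hidden from clients,
 * tried in order; "" means "the path exactly as requested". */
static const char *const hidden_exts[] = { "", ".gmi", ".txt" };

/* Map a request path to a regular file under root, trying each hidden
 * extension in turn. On success writes the file name to out and returns 1;
 * returns 0 if no candidate exists. */
int find_backing_file(const char *root, const char *url_path,
                      char *out, size_t outlen)
{
    struct stat st;
    size_t i;

    /* Crude traversal guard; a real server should normalise the path. */
    if (strstr(url_path, "..") != NULL)
        return 0;
    for (i = 0; i < sizeof hidden_exts / sizeof hidden_exts[0]; i++) {
        int n = snprintf(out, outlen, "%s%s%s", root, url_path, hidden_exts[i]);

        if (n < 0 || (size_t)n >= outlen)
            continue;                   /* truncated name: don't probe it */
        if (stat(out, &st) == 0 && S_ISREG(st.st_mode))
            return 1;
    }
    return 0;
}
```

With this, a client can request /notes/foo while the file on disk stays notes/foo.gmi, so the extension mapping still has something to work with.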
Overall, I think it's fair to expect some level of effort from the
server operator in making sure that static files have sensible
extensions. Any smartness beyond extension mapping is at best a bonus
AFAIC, and at worst a potentially nasty surprise.
--
Philip
More information about the Gemini mailing list