libmagic

Philip Linde linde.philip at gmail.com
Sat Nov 7 14:44:36 GMT 2020


On Fri, 6 Nov 2020 20:21:50 -0500
John Cowan <cowan at ccil.org> wrote:

> Any ideas for identifying text/gemini by regex?

There are none that aren't likely to produce either false positives or
false negatives. Consider this valid, plausibly realistic text/gemini
document (indented):

  # Hello
  
  This is the first paragraph.
  
  This is the second.

  * List item 1
  * List item 2
  
  ## Subsection 1
  
  This is a subsection.

  ```
  This is some pre-formatted text
  ```

This is also a valid text/markdown and text/plain document. I'd say
that the most easily identifiable trait of text/gemini is its link
lines (lines starting with "=>"), but plenty of documents contain no
links, and, though it is less likely, documents that would produce
false positives or false negatives can still be constructed.
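A minimal sketch of that link-arrow heuristic, in Python. The names
`looks_like_gemini` and `LINK_LINE` are illustrative assumptions, not
taken from any existing tool. It flags a document as text/gemini if any
line starts with "=>", and it shows both failure modes: a link-free
gemtext document is missed, and a plain-text note that happens to start
a line with "=>" is misclassified.

```python
import re

# Hypothetical heuristic: a document "looks like" text/gemini if any line
# begins with the "=>" link marker. Illustrative only, not a real detector.
LINK_LINE = re.compile(r"^=>", re.MULTILINE)


def looks_like_gemini(text: str) -> bool:
    """Return True if at least one line starts with '=>'."""
    return LINK_LINE.search(text) is not None


# Valid gemtext with no link lines (a shortened form of the example above).
no_links = "# Hello\n\nThis is the first paragraph.\n\n* List item 1\n* List item 2\n"

# Gemtext with a link line.
with_link = "# Hello\n\n=> gemini://example.org/ Example link\n"

# Plain text that happens to start a line with "=>".
plain_note = "TODO\n=> ask on the mailing list\n"

print(looks_like_gemini(no_links))    # False: a false negative
print(looks_like_gemini(with_link))   # True
print(looks_like_gemini(plain_note))  # True: a false positive
```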

IMO, solutions like libmagic should be used as a last resort. Let the
server admin associate file types with extensions, for example via a
system-provided MIME database, or in a cascading fashion: start with
manual associations in the server configuration file and fall back to
the MIME database if those don't match. If no such association exists,
by all means use libmagic and hope for the best.
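The cascade described above could look roughly like this Python sketch.
The function name `resolve_mime_type`, the `config_map` dictionary
standing in for the server configuration, and the optional `sniff`
callback standing in for libmagic are all hypothetical. The MIME
database step uses the standard library's `mimetypes` module, which on
Unix-like systems also reads files such as /etc/mime.types.

```python
import mimetypes
import os
from typing import Callable, Mapping, Optional


def resolve_mime_type(
    path: str,
    config_map: Mapping[str, str],
    sniff: Optional[Callable[[str], Optional[str]]] = None,
    default: str = "application/octet-stream",
) -> str:
    """Resolve a MIME type: config first, then system MIME DB, then sniffing."""
    # 1. Manual associations from the (hypothetical) server configuration.
    #    Keys are assumed to be lower-case extensions including the dot.
    ext = os.path.splitext(path)[1].lower()
    if ext in config_map:
        return config_map[ext]

    # 2. The system-provided MIME database, via the stdlib mimetypes module.
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is not None:
        return guessed

    # 3. Last resort: content sniffing (e.g. a libmagic binding), if supplied.
    if sniff is not None:
        sniffed = sniff(path)
        if sniffed:
            return sniffed

    return default


cfg = {".gmi": "text/gemini"}
print(resolve_mime_type("index.gmi", cfg))       # text/gemini
print(resolve_mime_type("notes.txt", cfg))       # text/plain
print(resolve_mime_type("blob.nosuchext", cfg))  # application/octet-stream
```

Keeping content sniffing behind an optional callback reflects the "last
resort" ordering: a libmagic binding (for example python-magic's
`magic.from_file(path, mime=True)`) is only consulted when neither the
configuration nor the MIME database knows the extension.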

-- 
Philip


More information about the Gemini mailing list