[spec] Limit valid encodings of text/gemini to UTF-8

Peter Vernigorov pitr.vern at gmail.com
Wed Dec 30 02:31:06 GMT 2020


On Wed, Dec 30, 2020 at 00:04 Petite Abeille <petite.abeille at gmail.com>
wrote:

>
>
> > On Dec 29, 2020, at 22:24, Peter Vernigorov <pitr.vern at gmail.com> wrote:
> >
> > Looking at latest stats on
> > gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi it looks like
> > UTF-8 (this includes unspecified charsets which per spec default to
> > UTF-8) is used by 81% of pages, US-ASCII accounts for 17%.
>
> The actual numbers are as follows:
>
> • Unspecified: 39628
> • us-ascii: 9995
> • utf-8: 7090
> (56,713 total)
>
> It's not clear if this pertains to the 36,477 text/gemini documents only,
> or the entire dataset (57,164 URLs vs. 56,713 encodings; 451 MIA).
>

Could you clarify which part is unclear to you here? The 56,713 responses
with a charset entry are, by design, a strict /superset/ of the 36k
text/gemini documents :)
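
For anyone who wants to sanity-check the figures quoted above, here is a
rough sketch in Python. It uses only the three counts listed in this
thread; the stats page may list further charset values, so the shares it
prints need not match the 81% and 17% figures exactly. The variable names
and the share() helper are my own, purely for illustration.

```python
# Charset counts as quoted in this thread (only the three listed values).
counts = {"unspecified": 39628, "us-ascii": 9995, "utf-8": 7090}
total_urls = 57164         # URLs in the dataset, as quoted
text_gemini_docs = 36477   # text/gemini documents, as quoted

total = sum(counts.values())

# Unspecified charsets default to UTF-8 per the spec, so count them as UTF-8.
utf8_effective = counts["unspecified"] + counts["utf-8"]


def share(n, whole):
    """Hypothetical helper: percentage of `whole`, rounded to one decimal."""
    return round(100 * n / whole, 1)


print(total)                              # 56713
print(total_urls - total)                 # 451 URLs without a charset entry
print(share(utf8_effective, total))       # effective UTF-8 share, in percent
print(share(counts["us-ascii"], total))   # US-ASCII share, in percent

# More unspecified charsets than text/gemini documents: the counts cannot
# be describing text/gemini documents alone.
print(counts["unspecified"] > text_gemini_docs)  # True
```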


> Looking at the numbers I guess it covers the entire data set as there are
> more 'Unspecified' than 'text/gemini' to start with.
>
> I'm not sure what these numbers mean at all, but they are not describing
> text/gemini.
>
> Not sure why we would draw any conclusion from them with regard to
> text/gemini.
>

While it’s true that the thread subject mentions text/gemini, the
oft-quoted part of the spec is in section “3.3 Response bodies” and talks
about any text/* response. The only mention of charset in section 5 (which
describes text/gemini) is a reference to 3.3. Also, the stats for either
the entire dataset or only text/gemini show the same picture: utf-8 and
us-ascii account for ~99% of all charset values.
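
To make the 3.3 rule concrete, here is a minimal sketch of how a client
might pick the charset for a response, assuming the rule as I read it: any
text/* type without an explicit charset parameter is treated as UTF-8. The
function name effective_charset is made up for this example, and the
parsing is deliberately naive (it does not handle quoted parameter values
that contain semicolons).

```python
from typing import Optional


def effective_charset(meta: str) -> Optional[str]:
    """Return the charset to decode a body with, or None for non-text types.

    `meta` is the MIME type string from a success response header, for
    example "text/gemini; charset=us-ascii". Illustrative helper only.
    """
    parts = [p.strip() for p in meta.split(";")]
    mime_type = parts[0].lower()
    if not mime_type.startswith("text/"):
        return None  # the charset default only applies to text/* types
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return "utf-8"  # no explicit charset on a text/* type


print(effective_charset("text/gemini"))                    # utf-8
print(effective_charset("text/gemini; charset=us-ascii"))  # us-ascii
print(effective_charset("image/png"))                      # None
```

Under this reading, an "unspecified" entry in the stats and an explicit
utf-8 one end up being decoded the same way.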