Our Old Friend, the Data URL


Thinking a bit deeper about what data URLs could do for Gemini. They are not suitable for all content, of course, but in some situations there are interesting benefits.

There's a valid case to be made that allowing data URLs steps over the intended constraints of Gemini.

Data URLs essentially turn Gemtext into a container format that can carry arbitrary other data types inside it. This also takes away the user's ability to fully choose which links' content is fetched, giving a bit more power to the server, which is effectively able to "preload" the content of some links, forcing it on the client whether it wants the data or not.
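To make the "container" idea concrete, here's a sketch that builds such a gemtext link line; the payload and link label are made up for illustration:

```python
import base64

# Illustrative only: embed a small payload directly in a gemtext link line.
payload = b"hello, gemini"
url = "data:text/plain;base64," + base64.b64encode(payload).decode("ascii")
line = f"=> {url} An inline attachment"
print(line)
# => data:text/plain;base64,aGVsbG8sIGdlbWluaQ== An inline attachment
```

The client never makes a request for this "link": the content arrives whether it was wanted or not.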

A compromise would be to add a requirement in the specification that clients must disregard links that are longer than N bytes, where N is a reasonably low number like 4096 or even 1024 bytes.

This would allow some of the nice use cases of data URLs (like small metadata attachments), but would protect against inappropriate image attachments, for example.
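As a sketch, a client could apply that rule while parsing gemtext; the helper name and the 1024-byte choice here are illustrative assumptions, not from any spec:

```python
# Illustrative sketch of the proposed rule: disregard gemtext link lines
# whose URL exceeds a byte limit.

MAX_URL_BYTES = 1024

def keep_link(line: str) -> bool:
    """Return False for link lines whose URL is too long to honor."""
    if not line.startswith("=>"):
        return True  # not a link line; keep as-is
    rest = line[2:].strip()
    # The URL is the first whitespace-delimited token after "=>".
    url = rest.split(maxsplit=1)[0] if rest else ""
    return len(url.encode("utf-8")) <= MAX_URL_BYTES

print(keep_link("=> gemini://example.org/ A normal link"))  # True
print(keep_link("=> data:image/png;base64," + "A" * 5000))  # False
```

A small metadata attachment fits under the limit, while a base64-encoded image does not.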

I've updated the post with this as a downside ("Breaking constraints").

@jk Great post! I'm new to Gemini and I'm absolutely in love with it.

@jk I think URLs in gemtext links should respect the maximum length of 1024 bytes that spec already defines for URLs used in requests. Without that we could end up with "unfollowable" gemini://... links.

Of course, we could define a different max URL length depending on the scheme, but that seems unnecessarily complicated.

@antolius When it comes to Gemini URLs, the length limit is enforced by spec-compliant servers, which refuse to process requests with overly long URLs.

I think the core issue is whether it's justified to limit the length of all URLs regardless of scheme. That kind of a decision could impose unwanted functional limitations on linking to external resources.

@jk Yeah, that's kind of what I was saying: gemini scheme URLs are already effectively limited to 1024 bytes.
So now we could:
1. Define another limit for data scheme URLs, at which point we'll have different per-scheme limits. This is the one option that seems messy to me.
2. Reuse the same 1024-byte limit for data and potentially other schemes, keeping the spec relatively simple. We could phrase this as: clients may ignore any link whose URL is longer than 1024 bytes.
3. Leave things as they are, i.e. unrestricted.

@jk Is that still possible within the protocol? I mean, a client can either skip rendering those links or drop the connection entirely, but there is no way to keep loading the document while leaving out the bloat.

@yottatsa Indeed there isn’t, except for a client to pause the download at N total bytes received, asking the user if they want to continue. If accepted, the client can just resume reading from the socket. This works for long “normal” pages, too.

The protocol doesn’t require a client to finish reading the response ASAP.
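A minimal sketch of that pause-and-resume reading, assuming a blocking socket that has already sent the request; the names, chunk size, and prompt are illustrative:

```python
PAUSE_AT = 64 * 1024  # ask the user after this many bytes (illustrative)

def read_with_pause(sock, pause_at=PAUSE_AT, ask=input):
    """Read a response, pausing to ask the user every `pause_at` bytes."""
    data = bytearray()
    next_prompt = pause_at
    while True:
        chunk = sock.recv(4096)
        if not chunk:            # server closed: response is complete
            return bytes(data)
        data.extend(chunk)
        if len(data) >= next_prompt:
            # Nothing requires us to keep reading quickly, so we can leave
            # the socket open while waiting for the user's answer.
            if ask(f"{len(data)} bytes received; continue? [y/n] ") != "y":
                sock.close()     # user declined: truncate here
                return bytes(data)
            next_prompt += pause_at
```

If the user accepts, the loop simply resumes reading from the same socket, which also works for long "normal" pages.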

@jk That's a neat solution, especially if servers use poll rather than a thread-per-request model. It could also be used to intentionally DoS a server, though.

@yottatsa True, servers would have to be prepared to terminate connections that pause for too long. Having implemented my own server, I found it necessary to add this kind of timeout anyway, given the nature of the traffic out there.

Misbehaving clients could also be intentionally reading 1 byte per second, so servers can't be naive about how they manage their socket I/O.
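A hedged sketch of such a server-side guard, using a per-recv timeout; the names and the 30-second default are illustrative. Note that a per-call timeout alone catches stalls but not slow trickles, which would also need an overall deadline:

```python
import socket

IDLE_TIMEOUT = 30.0  # seconds a connection may stall before we drop it

def read_request(conn: socket.socket, idle_timeout: float = IDLE_TIMEOUT) -> bytes:
    """Read one request line, dropping clients that stall mid-read."""
    conn.settimeout(idle_timeout)  # deadline applies to each recv call
    data = b""
    try:
        while not data.endswith(b"\r\n"):
            chunk = conn.recv(1026 - len(data))  # 1024-byte URL + CRLF max
            if not chunk:
                break
            data += chunk
        return data
    except socket.timeout:
        conn.close()  # client paused for too long; reclaim the socket
        return b""
```

A one-byte-per-second client defeats the per-call timeout, which is why an additional total-time budget per connection is worth having.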

@jk From my point of view, nothing speaks against the use of data URLs in general. Enriching content with images is great when they add real value. And the same rule applies as on the web: the smaller the images, the better.

@jk This looks cool :) I tested it out in Lagrange but found a bug: the caption for your image of the rocket system reads 0.0 Mo. Is it really so lightweight that it weighs less than 0.1 Mo?

@Arco It sure is: the original PNG is 4315 _bytes_. 🙂

But you're right, the caption should switch to KB for such small images.

@jk Ah yes indeed it's very small :) But it would be cool if the unit adapts automatically!
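An auto-adapting caption could pick the unit like this (Lagrange itself is written in C; this hypothetical Python helper just illustrates the idea):

```python
def format_size(n):
    """Pick a unit so small files don't show up as '0.0 MB'."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if n < 1024 or unit == "GB":
            return f"{n} {unit}" if unit == "bytes" else f"{n:.1f} {unit}"
        n /= 1024

print(format_size(4315))  # 4.2 KB -- the rocket image from this thread
```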

@jk Another downside I see with bloat: if document sizes grow uncontrollably, it inevitably puts a strain on the network and servers, as the protocol has no caching.

@jk I don't think I want them but I surely appreciated how measured your post was 😀

skyjake's personal Mastodon instance