I feel like whenever I see the ampersand on this website, it’s followed by “amp;”. I’ve noticed it in other places on the internet too. Why does this happen? Is it some programming thing?
Just for a test: &
Let’s see if I can give a trimmed-down explanation of what “character escapement” is, because others have covered that `&` in web-land is an escapement character.
The simplest type of escapement is probably quotes.
var myString = "This is a string";
This little line of pseudo-code is roughly what you would write (depending on language) to make a program write the text `This is a string` into some location in memory, that is, a sequence of numbers that are the standard numbers for representing those letters.
Here, the double-quote character is serving a special purpose, to designate that the characters within the set of quotes represent not instructions for things that the program should do, but instead just bits of data that the program should load.
Now consider: what if the character data that you want to load into memory has an actual double-quote character within it? How does the compiler (the program that turns your code into its own program) know the difference between a double-quote character that’s supposed to serve the special purpose, and a double-quote character that’s just supposed to be a piece of data like the other characters? The answer is escapement.
var myString = "This is a \"string\"";
Here, the backslash character serves its own special purpose of escaping other characters. When the compiler is reading this code, it knows that whatever character follows the backslash is supposed to be interpreted specially: in this case, the double-quote should not be interpreted as the end of the string, as usual, but as just a character to be put within the string. The backslash doesn’t end up in memory with the other characters, but it tells the compiler how to interpret things.
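To make that concrete, here is a minimal sketch in Python (my choice for illustration; the pseudo-code above isn't tied to any one language, and Python uses an interpreter rather than a compiler, but the idea is the same). The variable name is just the one from the example above.

```python
# The backslash tells the parser that the next double-quote is data,
# not the end of the string.
my_string = "This is a \"string\""

# The backslashes are gone by the time the text is in memory;
# only the quote characters themselves are stored.
print(my_string)
print("\\" in my_string)
```

The first print shows the text with its quotes, and the second prints `False` because no backslash was stored.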
In web-land, ampersand is an escape character. If you want to embed plain text to be displayed on the screen, within HTML, you need to “escape” special characters that normally have a non-text purpose, in order to get those characters to display as text. Ampersand is the escape character in HTML, and by extension, it also has its OWN escape sequence, which is `&amp;`.
The reason you see `&amp;` in places across Lemmy is likely just due to a bug of some kind. Somewhere between when the user enters this text and when it later gets displayed, there’s code that escapes the text one more time than necessary.
It’s because some part of the post is being sanitized to reduce the possibility of a security flaw by someone managing to type in something that could be executed by the server or your web browser in an unexpected way.
https://github.com/LemmyNet/lemmy/blob/main/RELEASES.md#major-changes-1
In terms of security, Lemmy now performs HTML sanitization on all messages which are submitted through the API or received via federation. Together with the tightened content-security-policy from 0.18.2, cross-site scripting attacks are now much more difficult.
The `&` symbol is, however, incorrectly parsed by the sanitizer, which will eventually be patched by the devs.
The reason is that a programmer at some point decided that `&` should indicate the start of a special symbol in HTML. In programming parlance this is a means of “escaping” characters which are reserved.
For example, in HTML, things look something like this:
<p>Hello, World!</p>
The p between the less-than and greater-than symbols means “paragraph”, and the closing version with the slash means “the paragraph is done”.
However, there’s a problem. What if you wanted to actually type out `<p>` to the end-user and have it not be treated as HTML? You use the ampersand syntax to write `<` by using `&lt;` and `>` by using `&gt;`.
<p>&lt;p&gt;</p>
Yet another problem: if we use `&` as a special character in HTML, we also need a way to display it. The answer is `&amp;`.
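A quick way to see these escapes in action is Python's standard `html` module. This is only an illustrative sketch; the variable names are mine, and a browser does the unescaping step for you when it displays a page.

```python
import html

# Escaping turns HTML's special characters into entities,
# so they are displayed as text instead of being treated as markup.
escaped_tag = html.escape("<p>")
escaped_amp = html.escape("&")

# Unescaping is roughly what the browser does when it displays the text.
round_trip = html.unescape(escaped_tag)

print(escaped_tag)
print(escaped_amp)
print(round_trip)
```

The three lines printed are `&lt;p&gt;`, `&amp;`, and `<p>`.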
There’s not enough symbols on my keyboard and I want to type ¢ © ÷, so let’s invent a system so we can write them and other symbols!
- let’s say & means start of code
- and say ; means end of code
- Between the start and end is the code
Now let’s make some codes
- ¢ can be &cent;
- © can be &copy;
- ÷ can be &divide;
I want to tell other people how to use our new code, but if I tell them to “just write &divide;” it’ll turn my message into “just write ÷” !! So how can we fix this?
What if we make & its own code?
- &amp; -> &
- &amp;divide; -> &divide; ???
Yes! That’ll work :)
This is how &amp; came to be, and it’s specifically used in HTML as a way to write those symbols above (and to escape a few other symbols, for reasons similar to why we escaped &).
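The codes from this little story are real HTML entities, so they can be tried out with Python's standard `html` module. This is just a sketch for illustration; the variable names are made up.

```python
import html

# Decode the three codes from the story into their symbols.
symbols = html.unescape("&cent; &copy; &divide;")

# Writing &amp; first means only the ampersand is decoded,
# so the reader sees the code itself rather than the symbol.
instructions = html.unescape("just write &amp;divide;")

print(symbols)
print(instructions)
```

The first line printed is `¢ © ÷`, and the second is `just write &divide;`, which is exactly the message we wanted to send.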
As for why & shows up as &amp;, there are 2 main places I can see this happening:
- The editor you use to write it automatically converts & -> &amp;, but the user had already typed in &amp; (making it &amp;amp;). I think this is most likely. I’m guessing the titles of posts get the conversion automatically, but the post body and comments do not because they use a raw markdown editor.
- In some contexts the & specifically doesn’t get converted? Like how you can write \`&\` to get `&` as opposed to seeing &amp;.
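The "escaped one time too many" explanation from this thread can be sketched in a few lines of Python. To be clear, this is not Lemmy's actual code, just an illustration of the general bug using the standard `html` module; the sample text and variable names are hypothetical.

```python
import html

typed = "Tom & Jerry"  # hypothetical text a user types

# Escaping once is correct: the browser undoes it when displaying.
escaped_once = html.escape(typed)
shown_correct = html.unescape(escaped_once)

# Escaping a second time is the bug: the browser only undoes one layer,
# so the reader is left looking at a literal "&amp;".
escaped_twice = html.escape(escaped_once)
shown_buggy = html.unescape(escaped_twice)

print(shown_correct)
print(shown_buggy)
```

The first line printed is `Tom & Jerry`, and the second is `Tom &amp; Jerry`, which matches what the original question describes.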