What is URL encoding and decoding

URL encoding

URL encoding (also called percent-encoding) is a method that allows special or otherwise invalid characters in a URL to be interpreted correctly by browsers and servers. Each such character is replaced by a percent sign followed by the hexadecimal value of its ASCII code. It is therefore advisable to use only ASCII characters in URLs. The generic syntax of URLs is defined in RFC 3986.

Background

Data transmission on the Internet has been based on the American Standard Code for Information Interchange, or ASCII for short, since the beginning. The available characters are modeled on an English typewriter keyboard and include the Latin alphabet in upper and lower case, Arabic numerals, and some punctuation marks. ASCII characters were originally assigned a 7-bit pattern, which allows 2^7 = 128 characters. Today each character is usually stored in an 8-bit byte, which allows 2^8 = 256 possible combinations.

URL encoding builds on ASCII and provides a way to represent spaces and other special characters in URLs. Such characters occur especially in automatically generated URLs, for example when product or article titles are converted into URLs.

An encoded character in a URL always starts with a percent sign (%), followed by two hexadecimal digits.
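
The mapping from a character to its encoded form can be sketched by hand. The helper name percent_encode_char is hypothetical and only covers single ASCII characters; it is meant as an illustration of the idea, not as a replacement for a library function.

```python
def percent_encode_char(ch: str) -> str:
    """Return the %XX form of one ASCII character (hypothetical helper)."""
    code = ord(ch)  # ASCII code of the character, e.g. 32 for a space
    if code > 0x7F:
        raise ValueError("this sketch only handles ASCII characters")
    return f"%{code:02X}"  # percent sign plus two uppercase hex digits


print(percent_encode_char(" "))  # %20
print(percent_encode_char("#"))  # %23
```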

Example

A space is usually interpreted as the end of a URL. A space in the middle of a URL (e.g. www.example.de/new page.html) therefore leads to an error because the browser cannot resolve the URL. A user who calls up such a URL may receive, for example, a 404 error code. URL encoding replaces the space with the hexadecimal value of its ASCII code, 20, preceded by a percent sign: %20. The example URL becomes www.example.de/new%20page.html.
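
As a minimal sketch, the same replacement can be done with Python's standard library. The choice of Python and of urllib.parse is an assumption made for illustration; the path is the one from the example above.

```python
from urllib.parse import quote, unquote

path = "/new page.html"

encoded = quote(path)       # "/" is kept by default, the space becomes %20
decoded = unquote(encoded)  # decoding reverses the replacement

print(encoded)  # /new%20page.html
print(decoded)  # /new page.html
```

Decoding with unquote returns the original text, which is what a server does before it looks up the requested resource.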


Invalid characters

The following characters run the risk of not being interpreted correctly. It is recommended to encode them in any case:

" < > # % { } \ | ^ [ ] ` and spaces

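The encoded forms of the characters listed above can be printed with a short sketch. It assumes Python's urllib.parse.quote with safe="" so that no character is exempt from encoding; the variable names are illustrative.

```python
from urllib.parse import quote

# The characters from the list above, including a trailing space.
unsafe = '"<>#%{}\\|^[]` '

for ch in unsafe:
    # safe="" means no character is left unencoded on purpose.
    print(repr(ch), "->", quote(ch, safe=""))

encoded_all = quote(unsafe, safe="")
print(encoded_all)
```
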
Reserved characters

The following characters are reserved and have a specific meaning in the structure of a URL. They should be encoded only when they are meant as literal data, not when they serve their reserved purpose. They include:

! # $ & ' ( ) * + , / : ; = ? @ [ ]

For example, the # in a URL marks a jump label (fragment) within a page. The ? introduces the query string, the & symbol separates the individual parameters within it, and the equals sign (=) assigns a value to a parameter.
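
A sketch of why a reserved character has to be encoded when it appears as data rather than as a delimiter. It assumes Python's urllib.parse; the host, the path /search, the fragment "results" and the parameter names q and page are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The value of "q" contains a literal "&" that must not act as a separator.
params = {"q": "cats & dogs", "page": "2"}

query = urlencode(params)  # encodes the values, joins pairs with "&" and "="
url = "https://www.example.de/search?" + query + "#results"

parts = urlsplit(url)
decoded = parse_qs(parts.query)

print(query)           # q=cats+%26+dogs&page=2
print(parts.fragment)  # results
print(decoded)         # {'q': ['cats & dogs'], 'page': ['2']}
```

Note that urlencode writes spaces as "+", which is the convention for form data in query strings; the literal & in the value becomes %26, so it cannot be mistaken for a parameter separator.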

Unreserved characters

These characters are not reserved and have no predefined meaning for the URL. The unreserved characters include:

Letters [A-Z, a-z], digits [0-9] and - _ . ~
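
That unreserved characters pass through an encoder unchanged can be checked with a small sketch. It assumes Python 3.7 or later, where urllib.parse.quote follows RFC 3986 and leaves the tilde untouched; the file name is invented.

```python
import string
from urllib.parse import quote

unreserved = string.ascii_letters + string.digits + "-_.~"

# Even with safe="", unreserved characters are never percent-encoded.
unchanged = quote(unreserved, safe="") == unreserved
sample = quote("my-file_v1.0~draft", safe="")

print(unchanged)  # True
print(sample)     # my-file_v1.0~draft
```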

Encoding tools

There are many tools on the web that can quickly and easily convert an invalid URL to a valid one. For smaller websites, URLs can still be encoded manually. For larger web projects, however, webmasters and SEOs should make sure in advance that URLs are encoded in such a way that browsers and servers can interpret them easily.

Importance for SEO

URL encoding is important so that users and servers can correctly interpret and retrieve URLs. Incorrect URLs can lead to a large number of error codes, and every error code can in turn be interpreted by search engines as a sign of insufficient website maintenance. Users also send negative signals to Google and other search engines if they quickly leave error pages, which are caused, for example, by URLs that are cut off at a space. When encoding URLs, it is important to ensure that parameter characters and other reserved characters that serve as delimiters are not encoded. In addition, converting to search-engine-friendly (SEF) URLs can lead to double encoding and thus to problems when retrieving URLs. To avoid problems with URL encoding, UTF-8 should be used as the character encoding.
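
Both pitfalls can be illustrated with a short sketch, again assuming Python's urllib.parse, which uses UTF-8 by default. The file name and the word "café" are arbitrary examples.

```python
from urllib.parse import quote, unquote

# Double encoding: the "%" of an already encoded URL is encoded again.
once = quote("new page.html")
twice = quote(once)
print(once)            # new%20page.html
print(twice)           # new%2520page.html
print(unquote(twice))  # new%20page.html (one decoding step is not enough)

# Non-ASCII characters are first converted to UTF-8 bytes, then encoded.
utf8_encoded = quote("café")
print(utf8_encoded)           # caf%C3%A9
print(unquote(utf8_encoded))  # café
```

A URL that has been encoded twice no longer decodes to the intended path in a single step, which is one way such URLs end up pointing to pages that do not exist.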
