How post values are encoded

XMLHttpRequest POST, forms, and encoding

The material on this page is outdated, so it is hidden from the site's table of contents.

More recent information on this topic is available at https://learn.javascript.ru/xmlhttprequest.

During a normal form submission, the browser percent-encodes the field values: characters outside a small safe set (English letters, digits, and a few punctuation marks such as * ' ( )) are replaced with their numeric UTF-8 code, written with a % sign.

In JavaScript, the encodeURIComponent function produces this encoding "manually".
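A minimal sketch of what encodeURIComponent produces. The sample strings are illustrative and not taken from the original page.

```javascript
// ASCII text: the space and the ampersand are escaped, letters are left alone.
const encodedAscii = encodeURIComponent("a b&c");

// Cyrillic text: every letter takes two bytes in UTF-8, so it becomes two %XX groups.
const encodedCyrillic = encodeURIComponent("Привет");

console.log(encodedAscii);    // a%20b%26c
console.log(encodedCyrillic); // %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```

decodeURIComponent reverses the transformation.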

This encoding is used mainly for the GET method, that is, for passing parameters in the query string. By the standard, a query string cannot contain arbitrary Unicode characters, so they are encoded as described above.

GET request

Some frameworks add a special header to tell the server that a request is AJAX, for example X-Requested-With: XMLHttpRequest.

POST with urlencoded

In standard HTTP forms, three encodings are available for the POST method, set through the enctype attribute: application/x-www-form-urlencoded, multipart/form-data, and text/plain.

Depending on enctype, the browser encodes the data accordingly before sending it to the server.

With XMLHttpRequest we are, strictly speaking, not obliged to use any of these. What matters is that the server understands our request. But it is usually simplest to pick one of the standard ones.

As an example, a request can be sent in the application/x-www-form-urlencoded encoding.
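One possible way to write such a request is sketched below. The helper names toUrlencoded and postUrlencoded and the /submit URL are assumptions made for illustration. The XMLHttpRequest object is passed in as a parameter so the logic can also be exercised outside a browser.

```javascript
// Build an application/x-www-form-urlencoded body from a plain object.
function toUrlencoded(fields) {
  return Object.entries(fields)
    .map(([name, value]) => encodeURIComponent(name) + "=" + encodeURIComponent(value))
    .join("&");
}

// Send the fields as a urlencoded POST.
// `xhr` is expected to be a fresh XMLHttpRequest; `url` is a hypothetical endpoint.
function postUrlencoded(xhr, url, fields) {
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.send(toUrlencoded(fields));
}

// In a browser:
// postUrlencoded(new XMLHttpRequest(), "/submit", { name: "Виктор", surname: "Цой" });
```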

UTF-8 is always used, regardless of the language and encoding of the page.

If the server happens to expect data in another encoding, for example windows-1251, the data will need to be re-encoded.

The multipart/form-data encoding

Because it replaces characters with %code, urlencoded can greatly inflate the total amount of data sent. For that reason, files are sent with a different encoding: multipart/form-data.

In this encoding, fields are sent one after another, separated by a delimiter line.

To use it, specify it in the form's enctype attribute; the method must be POST.

With this encoding, the fields of the submitted form are sent one after another and the values are not encoded. To make it clear which value is which, the fields are separated by a randomly generated string called the "boundary", for example RaNdOmDeLiMiTeR.
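To make the layout concrete, here is a small sketch that assembles such a body by hand. The function name buildMultipartBody and the field values are illustrative assumptions. It handles only text fields and does not escape quotes in field names.

```javascript
// Each field is introduced by "--" + boundary; the whole body is closed
// by "--" + boundary + "--". Lines are terminated with CRLF.
function buildMultipartBody(fields, boundary) {
  let body = "";
  for (const [name, value] of Object.entries(fields)) {
    body += "--" + boundary + "\r\n";
    body += 'Content-Disposition: form-data; name="' + name + '"\r\n';
    body += "\r\n"; // an empty line separates the part headers from the value
    body += value + "\r\n";
  }
  return body + "--" + boundary + "--\r\n";
}

console.log(buildMultipartBody({ name: "Viktor", surname: "Tsoi" }, "RaNdOmDeLiMiTeR"));
```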

This approach is used primarily for sending files, since re-encoding megabytes through urlencoded would put a substantial load on the browser, and the size of the data would grow considerably as well.

That said, nothing prevents using this encoding for all POST requests. For GET, only urlencoded is available.

POST with multipart/form-data

A POST request in multipart/form-data encoding can also be made through XMLHttpRequest.

It is enough to specify the encoding and the boundary in the Content-Type header and then build a request body that meets the requirements of the encoding.

The same request as before can be written in multipart/form-data encoding.

The request body has the form described above: fields separated by the delimiter.
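A hedged sketch of such a request follows. postMultipart is a hypothetical helper, the boundary format is an arbitrary choice, and the XMLHttpRequest object is passed in so the code can be tried without a browser.

```javascript
// Send text fields as multipart/form-data through an XMLHttpRequest-like object.
function postMultipart(xhr, url, fields) {
  // The boundary must not occur inside any value; a random suffix makes a clash unlikely.
  const boundary = "----boundary" + Math.random().toString(36).slice(2);

  const parts = Object.entries(fields).map(([name, value]) =>
    "--" + boundary + "\r\n" +
    'Content-Disposition: form-data; name="' + name + '"\r\n\r\n' +
    value + "\r\n");

  xhr.open("POST", url, true);
  // The server learns the boundary from the Content-Type header.
  xhr.setRequestHeader("Content-Type", "multipart/form-data; boundary=" + boundary);
  xhr.send(parts.join("") + "--" + boundary + "--\r\n");
}

// In a browser:
// postMultipart(new XMLHttpRequest(), "/submit", { name: "Виктор", surname: "Цой" });
```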

It is also possible to create a request that the server will treat as a file upload.

To add a file, use the same code as above, changing the headers that precede the file field: the Content-Disposition header gains a filename parameter, and a Content-Type header for the file is added.
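As an illustration, a file part could be assembled as below. buildFilePart and the sample names are assumptions. The sketch only covers text content; binary data would need a Blob or typed array rather than string concatenation.

```javascript
// Build one multipart part that the server should treat as an uploaded file.
function buildFilePart(boundary, fieldName, fileName, mimeType, content) {
  return "--" + boundary + "\r\n" +
    // `filename` marks the part as a file upload
    'Content-Disposition: form-data; name="' + fieldName + '"; filename="' + fileName + '"\r\n' +
    // the part carries its own Content-Type
    "Content-Type: " + mimeType + "\r\n" +
    "\r\n" +
    content + "\r\n";
}

console.log(buildFilePart("RaNdOmDeLiMiTeR", "myfile", "notes.txt", "text/plain", "hello"));
```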

FormData

Modern browsers, except IE9 and below (for which a polyfill exists), support the built-in FormData object, which encodes forms for sending to the server.

This is very convenient.
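A short sketch of FormData in use. The field names are illustrative. In a browser the object is usually created from a form element with new FormData(form); here it is filled by hand so the snippet also runs in a recent Node.js where FormData is a global.

```javascript
const formData = new FormData();
formData.append("name", "Viktor");
formData.append("surname", "Tsoi");

// The object can be inspected like any iterable of [name, value] pairs.
const formEntries = [...formData.entries()];
console.log(formEntries);

// In a browser, sending is a single call; the multipart body and the
// Content-Type header with the boundary are generated automatically:
// xhr.open("POST", "/submit"); xhr.send(formData);
```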

Other encodings

XMLHttpRequest itself does not restrict the encoding or format of the data being sent.

For this reason, JSON is often used for data exchange.
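A minimal sketch of a JSON request, assuming a hypothetical postJson helper and a server that accepts application/json. As in the earlier sketches, the XMLHttpRequest object is passed in.

```javascript
// Serialize `data` as JSON and POST it through an XMLHttpRequest-like object.
function postJson(xhr, url, data) {
  xhr.open("POST", url, true);
  xhr.setRequestHeader("Content-Type", "application/json; charset=utf-8");
  xhr.send(JSON.stringify(data));
}

// In a browser:
// postJson(new XMLHttpRequest(), "/submit", { name: "Виктор", surname: "Цой" });
```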

Summary

Other HTTP methods, for example PUT and DELETE, can also be used with XMLHttpRequest. All the principles described above apply to them as well.

Understanding HTML Form Encoding: URL Encoded and Multipart Forms

The other day I was trying to write a REST endpoint in Go that uploads the contents of a form submitted in a browser to another REST endpoint.

While doing that I ended up learning some fundamentals of how HTML forms work, so I thought it would be good to share what I learned. 🙂

Now, let us look at each form type with an example to understand them better.

URL Encoded Form

As the name suggests, the data submitted using this type of form is URL encoded. Take a form with username and password text fields that is submitted with a POST request.

Because the form is submitted to the server using a POST request, it has a body. But how is the body formatted? It is URL encoded: a long string of (name, value) pairs is created. Each pair is separated from the next by an & (ampersand) sign, and within each pair the name is separated from the value by an = (equals) sign, for example key1=value1&key2=value2.

For the above form, it would be,
username=sidthesloth&password=slothsecret
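The same body can be produced in JavaScript with URLSearchParams. This is a sketch for illustration and not part of the original form.

```javascript
// Build the URL encoded body for the two form fields.
const loginBody = new URLSearchParams({
  username: "sidthesloth",
  password: "slothsecret",
}).toString();

console.log(loginBody); // username=sidthesloth&password=slothsecret
```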

Don’t the URL encoded body and the query parameters passed in the action URL look awfully similar? It’s because they are similar. They share the same format discussed above.

Try creating an HTML file with such a form and see how it is submitted in the dev tools.

Note: Don't get confused by the term Form Data in the dev tools. It's just how Google Chrome represents form fields.

Now try to submit the form and see how the form fields are transferred in the dev tools.

Clearly, you can see that the spaces are replaced by either ‘%20’ or ‘+’. This is done for both the query parameters and the form body.
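The two spellings of a space can be reproduced in JavaScript. In this sketch (the sample value is made up), URLSearchParams uses the form convention and encodeURIComponent uses the generic percent-encoding. A form decoder accepts both.

```javascript
const displayName = "sid the sloth";

// Form-style encoding: spaces become "+".
const formStyle = new URLSearchParams({ username: displayName }).toString();

// Generic percent-encoding: spaces become "%20".
const uriStyle = "username=" + encodeURIComponent(displayName);

console.log(formStyle); // username=sid+the+sloth
console.log(uriStyle);  // username=sid%20the%20sloth

// Both decode to the same value when parsed as form data.
const decodedFromForm = new URLSearchParams(formStyle).get("username");
const decodedFromUri = new URLSearchParams(uriStyle).get("username");
```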

Read this to understand when + and %20 can be used. This encompasses the URL encoding process.

Multipart Forms

Multipart forms are generally used in contexts where the user needs to upload files to the server. However, we'll focus on simple text-field forms, as that is enough to understand how they work.

Let's go ahead and submit one and see how it appears in the dev tools.

There are two things to notice here: the Content-Type header and the payload of the form request. Let's go through them one by one.

Content-Type Header

Request Body

The request payload contains the form fields themselves. Each (name, value) pair is converted into a MIME message part of the following format,

--<boundary value>
Content-Disposition: form-data; name="<field name>"

<field value>

The above format is repeated for each (name, value) pair.

--<boundary value>
Content-Disposition: form-data; name="<field name>"

<field value>
--<boundary value>
Content-Disposition: form-data; name="<field name>"

<field value>
--<boundary value>--

Now, we see how the boundary value is used.

In the case of an application/x-www-form-urlencoded form, the & ampersand kind of acts as a delimiter between each (name, value) pair, enabling the server to understand when and where a parameter value starts and ends.

--XXX
Content-Disposition: form-data; name="username"

sidthesloth
--XXX
Content-Disposition: form-data; name="password"

slothsecret
--XXX--

The hyphens themselves are not part of the boundary value but rather needed as part of the request format. The Content-Type header for the above request would be,

Content-Type: multipart/form-data; boundary=XXX

This allows the server to understand when and where each field starts and ends.
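To show how a receiver can use the boundary, here is a deliberately simplified parser sketch. parseMultipart is a hypothetical helper that handles only text fields and an unquoted boundary; real servers should rely on a tested multipart library.

```javascript
// Split a multipart/form-data body into a { name: value } object.
function parseMultipart(contentType, body) {
  const boundaryMatch = /boundary=(.+)$/.exec(contentType);
  if (!boundaryMatch) throw new Error("no boundary in Content-Type");

  // In the body every boundary is prefixed with two hyphens.
  const delimiter = "--" + boundaryMatch[1];
  const fields = {};

  for (const part of body.split(delimiter)) {
    const nameMatch = /name="([^"]*)"/.exec(part);
    if (!nameMatch) continue; // skips the empty preamble and the closing "--"

    // An empty line separates the part headers from the value.
    const headerEnd = part.indexOf("\r\n\r\n");
    fields[nameMatch[1]] = part.slice(headerEnd + 4).replace(/\r\n$/, "");
  }
  return fields;
}

const sampleBody =
  '--XXX\r\nContent-Disposition: form-data; name="username"\r\n\r\nsidthesloth\r\n' +
  '--XXX\r\nContent-Disposition: form-data; name="password"\r\n\r\nslothsecret\r\n' +
  '--XXX--\r\n';

console.log(parseMultipart("multipart/form-data; boundary=XXX", sampleBody));
```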

Text/plain Forms

These forms are pretty much the same as the URL encoded forms, except that the form fields are not URL encoded when sent to the server. These are not used widely in general, but they have been introduced as a part of the HTML 5 specification.

Avoid using them, as they are meant for human understanding and not for machines.

As quoted from the spec,

Payloads using the text/plain format are intended to be human readable. They are not reliably interpretable by computer, as the format is ambiguous (for example, there is no way to distinguish a literal newline in a value from the newline at the end of the value).
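The ambiguity is easy to demonstrate. In this sketch, encodeTextPlain is a hypothetical helper that mimics the text/plain layout (name, equals sign, value, line break), and the field names are made up. Two different sets of fields produce exactly the same payload.

```javascript
// Serialize [name, value] pairs the way a text/plain form body is laid out.
function encodeTextPlain(pairs) {
  return pairs.map(([name, value]) => name + "=" + value + "\r\n").join("");
}

// One field whose value contains a line break...
const oneField = encodeTextPlain([["comment", "hi\r\nadmin=true"]]);

// ...and two separate fields.
const twoFields = encodeTextPlain([["comment", "hi"], ["admin", "true"]]);

console.log(oneField === twoFields); // true: the receiver cannot tell them apart
```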

Hope, I was clear in explaining what I learnt..See you in the next one guys..Peace.. 🙂

Encoding data for POST requests

Right now, when you go to copilot.github.com you’re greeted with this example:

I’m going to dig into the right way, but also take a stroll around some related, lesser-known APIs:

URLSearchParams

URLSearchParams handles encoding and decoding application/x-www-form-urlencoded data. It’s pretty handy, because, well…

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms.

…so yeah, it’s a bad idea to try and encode/decode it yourself. Here’s how it works:
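A brief sketch of the basic API, with made-up parameter names:

```javascript
// Building: values are escaped automatically.
const searchParams = new URLSearchParams();
searchParams.set("foo", "bar");
searchParams.set("hello", "world & co");
const serialized = searchParams.toString();
console.log(serialized); // foo=bar&hello=world+%26+co

// Parsing: values are unescaped automatically.
const parsedParams = new URLSearchParams(serialized);
console.log(parsedParams.get("hello")); // world & co
```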

The constructor also accepts an array of name/value pairs, or an iterator that yields name/value pairs:
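For instance, an array of pairs and a Map (whose iterator yields name/value pairs) both work. The values here are illustrative.

```javascript
// From an array of [name, value] pairs.
const fromPairs = new URLSearchParams([
  ["foo", "bar"],
  ["hello", "world"],
]);

// From any iterable that yields [name, value] pairs, such as a Map.
const fromMap = new URLSearchParams(new Map([
  ["foo", "bar"],
  ["hello", "world"],
]));

console.log(fromPairs.toString()); // foo=bar&hello=world
console.log(fromMap.toString());   // foo=bar&hello=world
```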

Reading URLSearchParams

Which means you can easily convert it into an array of name/value pairs:

But, be aware that converting to an object is sometimes a lossy conversion:
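A small sketch of both conversions, using a query string with a repeated name to show where the loss happens:

```javascript
const repeated = new URLSearchParams("a=1&a=2&b=3");

// Array of pairs: nothing is lost.
const pairList = [...repeated];
console.log(pairList); // [["a","1"],["a","2"],["b","3"]]

// Plain object: a later value overwrites an earlier one with the same name.
const plainObject = Object.fromEntries(repeated);
console.log(plainObject); // { a: "2", b: "3" }
```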

url.searchParams

URL objects have a searchParams property which is really handy:

Unfortunately, location.searchParams is undefined. This is because the definition for window.location is complicated by how certain properties of it work across origins. For instance setting otherWindow.location.href works across origins, but getting it isn’t allowed. Anyway, to work around it:
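One workaround, sketched here with a hypothetical queryParamsOf helper, is to build a URL object from the address string. In a browser the argument would be location.href; an example address is used so the snippet runs anywhere.

```javascript
// Return the query parameters of any absolute URL string.
function queryParamsOf(href) {
  return new URL(href).searchParams;
}

// In a browser: const params = queryParamsOf(location.href);
const pageParams = queryParamsOf("https://example.com/search?q=form+encoding&page=2");
console.log(pageParams.get("q"));    // form encoding
console.log(pageParams.get("page")); // 2

// searchParams is live: changing it updates the URL it belongs to.
const pageUrl = new URL("https://example.com/search?q=forms");
pageUrl.searchParams.set("page", "3");
console.log(pageUrl.href); // https://example.com/search?q=forms&page=3
```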

URLSearchParams as a Fetch body

Ok, now we’re getting to the point. The code in the example at the start of the article is broken as it isn’t escaping the input:

To make things easier, URLSearchParams can be used directly as a Request or Response body, so the ‘correct’ version of the code from the start of the article is:
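The following sketch illustrates the difference. The endpoint URL and the helper name buildFormPost are assumptions and not the code from the example being criticised. A Request object is built without being sent, so the header can be inspected in a browser or in a recent Node.js.

```javascript
const userText = "cats & dogs";

// Broken: string concatenation leaves "&" unescaped, so the value is cut short.
const brokenBody = "text=" + userText;
const brokenValue = new URLSearchParams(brokenBody).get("text"); // "cats " only

// Safer: URLSearchParams escapes the value.
const safeBody = new URLSearchParams({ text: userText });
console.log(safeBody.toString()); // text=cats+%26+dogs

// URLSearchParams can be used directly as the body, and the matching
// Content-Type header is filled in automatically.
function buildFormPost(url, fields) {
  return new Request(url, { method: "POST", body: new URLSearchParams(fields) });
}

const formPost = buildFormPost("https://example.com/api", { text: userText });
console.log(formPost.headers.get("content-type")); // starts with application/x-www-form-urlencoded

// To send it: fetch(formPost)
```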

FormData

You can populate FormData state directly:

This gives you the data that would be submitted by the form. I often find this much easier than getting the data from each element individually.

FormData as a Fetch body

…which logs something like:
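A sketch of how this can be observed, assuming a runtime with global FormData and Request (browsers, recent Node.js). The URL and field are illustrative, and the boundary string differs between runtimes.

```javascript
const uploadData = new FormData();
uploadData.append("username", "sidthesloth");

// Nothing is sent; the Request is only built so it can be inspected.
const uploadRequest = new Request("https://example.com/upload", {
  method: "POST",
  body: uploadData,
});

// The header is generated automatically, including the boundary.
const uploadType = uploadRequest.headers.get("content-type");
console.log(uploadType); // multipart/form-data; boundary=<generated by the runtime>

// Reading the body is asynchronous; it prints the multipart payload.
uploadRequest.text().then((text) => console.log(text));
```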

Converting to URLSearchParams

Since the URLSearchParams constructor accepts an iterator that yields name/value pairs, and FormData's iterator does exactly that, you can convert from one to the other:
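A minimal sketch with made-up text fields. File entries would not survive this conversion meaningfully, since URLSearchParams only holds strings.

```javascript
const loginData = new FormData();
loginData.append("username", "sidthesloth");
loginData.append("password", "slothsecret");

// FormData iterates as [name, value] pairs, which the constructor accepts.
const loginParams = new URLSearchParams(loginData);
console.log(loginParams.toString()); // username=sidthesloth&password=slothsecret
```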

Reading Fetch bodies as FormData

You can also read a Request or Response object as FormData:

Other Fetch bodies

There are a few other formats that can be fetch bodies:

Blobs

Blob objects (and therefore File, since it inherits from Blob) can be fetch bodies:

Strings

Buffers

This doesn’t set the Content-Type header automatically, so you need to do that yourself.
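A sketch of the difference, assuming a runtime with a global Request. The URL is illustrative, and the requests are built but never sent.

```javascript
// Raw bytes, here the UTF-8 encoding of a JSON string.
const payloadBytes = new TextEncoder().encode(JSON.stringify({ hello: "world" }));

// With a buffer body, no Content-Type header is added for you.
const untypedRequest = new Request("https://example.com/api", {
  method: "POST",
  body: payloadBytes,
});
console.log(untypedRequest.headers.get("content-type")); // null

// So the header is set explicitly.
const typedRequest = new Request("https://example.com/api", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: payloadBytes,
});
console.log(typedRequest.headers.get("content-type")); // application/json
```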

Streams

And finally, fetch bodies can be streams! For Response objects, this allows all kinds of fun with a service worker, and more recently they can be used with requests too.

So yeah, don’t try to handle multipart/form-data or application/x-www-form-urlencoded yourself, let FormData and URLSearchParams do the hard work!

I’m not against things like GitHub Copilot either. Just treat the output like an answer on StackOverflow, and review it before committing it.

Bonus round: Converting FormData to JSON

Nicholas Mendez tweeted me to ask how FormData could be serialised as JSON without data loss.

Forms can contain fields like this:

…where multiple values can be selected, or you can have multiple inputs with the same name:

The result is a FormData object that has multiple entries with the same name, like this:

There are a few ways to avoid data loss and still end up with something JSON-stringifyable. Firstly, there’s the array of name/value pairs:
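For example, with made-up field names and a repeated "colour" entry:

```javascript
const surveyData = new FormData();
surveyData.append("colour", "red");
surveyData.append("colour", "green");
surveyData.append("name", "Sam");

// Spreading FormData yields [name, value] pairs, duplicates included.
const surveyJson = JSON.stringify([...surveyData]);
console.log(surveyJson); // [["colour","red"],["colour","green"],["name","Sam"]]
```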

But if you want an object rather than an array, you can do this:
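One way to do that is sketched below; formDataToObject is a hypothetical helper name. Every value becomes an array, and only string values serialize meaningfully (File entries would not).

```javascript
// Collect every value of every field name into an array.
function formDataToObject(formData) {
  const result = {};
  // A Set removes the repeated names that keys() yields for multi-value fields.
  for (const name of new Set(formData.keys())) {
    result[name] = formData.getAll(name);
  }
  return result;
}

const choices = new FormData();
choices.append("colour", "red");
choices.append("colour", "green");
choices.append("name", "Sam");

console.log(JSON.stringify(formDataToObject(choices)));
// {"colour":["red","green"],"name":["Sam"]}
```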

…which gives you:

I like that every value is an array, even if it only has one item. That prevents a lot of code branching on the server, and simplifies validation. Although, you might prefer the PHP/Perl convention where a field name that ends with [] signifies "this should produce an array":

And to convert it:
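A sketch of such a conversion, with a hypothetical helper name and made-up fields. Names ending in [] become arrays (with the suffix removed), and all other names keep their first value.

```javascript
// Convert FormData to an object, treating "name[]" fields as arrays.
function formDataToObjectWithBrackets(formData) {
  const result = {};
  for (const name of new Set(formData.keys())) {
    if (name.endsWith("[]")) {
      result[name.slice(0, -2)] = formData.getAll(name);
    } else {
      result[name] = formData.get(name);
    }
  }
  return result;
}

const bracketData = new FormData();
bracketData.append("colour[]", "red");
bracketData.append("colour[]", "green");
bracketData.append("name", "Sam");

console.log(JSON.stringify(formDataToObjectWithBrackets(bracketData)));
// {"colour":["red","green"],"name":"Sam"}
```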

…which gives you:

Hello, I’m Jake and that is my tired face. I’m a developer advocate for Google Chrome.

Should I URL-encode POST data?

I’m POSTing data to an external API (using PHP, if it’s relevant).

Should I URL-encode the POST variables that I pass?

Or do I only need to URL-encode GET data?

UPDATE: This is my PHP, in case it is relevant:

General Answer

The general answer to your question is that it depends. And you get to decide by specifying what your "Content-Type" is in the HTTP headers.

A value of "application/x-www-form-urlencoded" means that your POST body will need to be URL encoded just like a GET parameter string. A value of "multipart/form-data" means that you'll be using content delimiters and NOT URL encoding the content.

Specific Answer

For an answer specific to the PHP libraries you’re using (CURL), you should read the documentation here.

Here’s the relevant information:

CURLOPT_POST

TRUE to do a regular HTTP POST. This POST is the normal application/x-www-form-urlencoded kind, most commonly used by HTML forms.

CURLOPT_POSTFIELDS

The full data to post in a HTTP "POST" operation. To post a file, prepend a filename with @ and use the full path. The filetype can be explicitly specified by following the filename with the type in the format ';type=mimetype'. This parameter can either be passed as a urlencoded string like 'para1=val1&para2=val2&...' or as an array with the field name as key and field data as value. If value is an array, the Content-Type header will be set to multipart/form-data. As of PHP 5.2.0, value must be an array if files are passed to this option with the @ prefix.

application/x-www-form-urlencoded and charset="utf-8"?

In particular, when using accept-charset="utf-8" in a form tag, I would expect some indication that utf-8 is being used in the headers, but I'm not seeing any.

Here is my simple test in Chrome. The form page is:

And the headers for the generated request are:

What’s the convention for specifying how the form parameter values are encoded?

There is no charset parameter defined for this media type.

The application/x-www-form-urlencoded standard implies UTF-8 and percent-encoding.

Note that step 2 of the above link says: "Otherwise, let the selected character encoding be UTF-8." (see: http://www.w3.org/TR/html5/forms.html#application/x-www-form-urlencoded-encoding-algorithm)

I also believe the following indicates that it's a best practice for user agents to use UTF-8.

Here’s what it says: B.2.1 Non-ASCII characters in URI attribute values

Although URIs do not contain non-ASCII values (see [URI], section 2.1) authors sometimes specify them in attribute values expecting URIs (i.e., defined with %URI; in the DTD). For instance, the following href value is illegal:

We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

This procedure results in a syntactically legal URI (as defined in [RFC1738], section 2.2 or [RFC2141], section 2) that is independent of the character encoding to which the HTML document carrying the URI may have been transcoded.
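As far as I recall, the convention that passage recommends is to represent each non-ASCII character as UTF-8 bytes and then escape each byte as %HH. The sketch below illustrates that idea; escapeNonAscii and the sample path are assumptions for illustration, not text from the specification.

```javascript
const utf8Encoder = new TextEncoder();

// Replace every non-ASCII character with the %HH escapes of its UTF-8 bytes.
function escapeNonAscii(text) {
  let result = "";
  for (const char of text) {
    if (char.codePointAt(0) < 0x80) {
      result += char; // ASCII is left untouched
      continue;
    }
    for (const byte of utf8Encoder.encode(char)) {
      result += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return result;
}

console.log(escapeNonAscii("/images/früh.gif")); // /images/fr%C3%BCh.gif
```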

Note. Some older user agents trivially process URIs in HTML using the bytes of the character encoding in which the document was received. Some older HTML documents rely on this practice and break when transcoded. User agents that want to handle these older documents should, on receiving a URI containing characters outside the legal set, first use the conversion based on UTF-8. Only if the resulting URI does not resolve should they try constructing a URI based on the bytes of the character encoding in which the document was received.

Note. The same conversion based on UTF-8 should be applied to values of the name attribute for the A element.
