Discussion:
[issue13866] {urllib, urllib.parse}.urlencode should not use quote_plus
Stephen Day
2012-01-25 22:12:02 UTC
>>> from urllib import urlencode
>>> urlencode({'a': 'some param'})
'a=some+param'

It would be desirable to merely encode spaces using percent encoding instead:

>>> urlencode({'a': 'some param'})
'a=some%20param'

But there is no way to get this behavior in the standard library.

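It would probably be best to change this so it defaults to using the regular quote function, but allows callers who need the legacy quote_plus behavior to pass that in as a function parameter. Alternatively, the quoting function could be accepted as a keyword parameter so that the legacy behavior is retained unless the caller asks otherwise:

>>> urlencode({'a': 'some param'})
'a=some+param'
>>> from urllib import quote
>>> urlencode({'a': 'some param'}, quote=quote)
'a=some%20param'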

----------
components: Library (Lib)
messages: 151980
nosy: Stephen.Day
priority: normal
severity: normal
status: open
title: {urllib,urllib.parse}.urlencode should not use quote_plus
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13866>
_______________________________________
Terry J. Reedy
2012-01-27 23:34:02 UTC
Changes by Terry J. Reedy <tjreedy at udel.edu>:


----------
nosy: +orsenthil
stage: -> test needed
type: -> enhancement
versions: -Python 2.7, Python 3.1, Python 3.2, Python 3.4

_______________________________________
Éric Araujo
2012-02-03 13:49:25 UTC
Changes by Éric Araujo <merwok at netwok.org>:


----------
nosy: +eric.araujo

_______________________________________
Senthil Kumaran
2012-02-13 07:23:49 UTC
Senthil Kumaran <senthil at uthcode.com> added the comment:

Stephen - urlencode is responsible for producing the application/x-www-form-urlencoded format, usually used in HTML forms on the web.
As per the spec, space characters are replaced by `+':

http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

What you are looking for are probably the quote and quote_plus helper functions.

When I had the same doubt (a long while back), I referred to Java's URLEncoder class to see how it behaved and then looked at the HTML specs. It was fairly standard behavior across different libraries. Closing this as invalid.
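For readers on Python 3, where these helpers live in `urllib.parse`, here is a minimal sketch contrasting the two functions Senthil mentions with the default output of `urlencode` (the sample value is arbitrary):

```python
from urllib.parse import quote, quote_plus, urlencode

value = "some param"

# quote() percent-encodes the space as %20
print(quote(value))             # some%20param
# quote_plus() applies the form-encoding rule: space becomes '+'
print(quote_plus(value))        # some+param
# urlencode() quotes each key and value with quote_plus by default
print(urlencode({"a": value}))  # a=some+param
```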

----------
resolution: -> invalid
stage: test needed -> committed/rejected
status: open -> closed

_______________________________________
Stephen Day
2012-02-13 20:46:46 UTC
Stephen Day <stevvooe at gmail.com> added the comment:

I apologize for reopening this bug, but I find your interpretation to be inaccurate. While technically valid, the combination of the documentation, the function name and the main use cases yields pathological invocations of urlencode. My bug report is to help mitigate these problems.
>>> from urllib import urlencode
>>> from urlparse import urlunparse
>>> urlunparse(('http', 'example.com', '/', None, urlencode({'a': 'some string'}), None))
'http://example.com/?a=some+string'

Any sane person would naturally gravitate to a function called "urlencode" to url encode a mapping type. If the urllib.urlencode function is indeed intended for form-encoding, as I agree is hinted in the documentation, it should indicate that its result is 'application/x-www-form-urlencoded' or it should be called "formencode".
>>> quote({'a': 1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1248, in quote
    if not s.rstrip(safe):
AttributeError: 'dict' object has no attribute 'rstrip'

In addition, Java's URLEncoder implementation is hardly a good example of standards-compliant URL manipulation. Python is not Java. The Python community needs to make its own, independent, mature language decisions. In general, the use of '+' to encode spaces in content, even if it is compliant with an arbitrary standard, is pathological, especially when used in URLs. Even though Python's quote_plus function works symmetrically on its own, when pluses are used in a multi-language environment it can become impossible to tell whether a plus is a literal '+' or an encoded space. In addition, the usage of '%20' for spaces will work in almost all cases.
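To make the ambiguity concrete, here is a small Python 3 sketch (the sample string is arbitrary): the same encoded text yields different results depending on whether the receiver treats '+' as a literal plus or as an encoded space.

```python
from urllib.parse import unquote, unquote_plus

# One encoded string, read by two decoders with different conventions
encoded = "1+1%20%3D%202"

print(unquote(encoded))       # 1+1 = 2   ('+' kept as a literal plus)
print(unquote_plus(encoded))  # 1 1 = 2   ('+' treated as an encoded space)
```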

RFC3986, Section 2 [1] describes the use of percent-encoding as a solution to representing reserved characters. In practice, percent-encoding is used on the value component of 'key=value' productions and this works in nearly all cases. The referenced standard [2], while relevant to the "implied" use case, is not applicable to url assembly.
>>> '&'.join(['='.join((quote(k), quote(v))) for k,v in {'a': '1', 'b': 'with spaces'}.iteritems()])
'a=1&b=with%20spaces'
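The one-liner above is Python 2 (`iteritems`, `urllib.quote`). A rough Python 3 equivalent might look like the following; passing `safe=""` so that '/' is also escaped is a choice made here, not something the original used, and the output order relies on dicts preserving insertion order (Python 3.7+):

```python
from urllib.parse import quote

params = {"a": "1", "b": "with spaces"}

# items() replaces Python 2's iteritems(); safe="" escapes '/' too,
# which quote() would otherwise leave alone by default
query = "&".join(
    "=".join((quote(k, safe=""), quote(v, safe=""))) for k, v in params.items()
)
print(query)  # a=1&b=with%20spaces
```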

In most cases, people will just use urlencode, which uses pluses for spaces, yielding pathological, noncompliant urls.

In deference to this bug closure, there are a few options:

1. Close this issue and keep polluting the world's urls with pluses for spaces.

2. Make urlencode target path/query parameter encoding and then create a new function, formencode, for use in encoding form data, breaking backwards compatibility.

3. Simply add a keyword argument to urlencode to allow the caller to specify the encoding function and separator, retaining compatibility and satisfying all of the above use cases.

Naturally, 3 seems to be a very reasonable solution to this bug.

[1] http://tools.ietf.org/html/rfc3986#section-2 explicitly covers
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

----------
resolution: invalid ->
status: closed -> open

_______________________________________
Ezio Melotti
2012-02-18 10:12:44 UTC
Changes by Ezio Melotti <ezio.melotti at gmail.com>:


----------
nosy: +ezio.melotti

_______________________________________
Michele Orrù
2012-02-21 08:53:43 UTC
Changes by Michele Orrù <maker.py at gmail.com>:


----------
nosy: +maker

_______________________________________
Senthil Kumaran
2012-02-23 19:07:23 UTC
Senthil Kumaran <senthil at uthcode.com> added the comment:

A couple of points to help summarize and to help come to a conclusion.

In the initial message, Stephen pointed out, "it would be desirable to merely encode spaces using percent encoding".

It seems to me that a space would be encoded as %20 only where custom handling of the query string is done (or where it's an IRI instead of a URI; details below). For HTTP requests, in both GET and POST, encoding a space in a URI as + is the correct thing to do.

The query part of the URL always needs to follow the application/x-www-form-urlencoded format, so even when urlencode is used to construct query parameters, it should encode a space as +.

The argument that all characters should be hex-encoded (and thereby that a space should be %20) seems to apply if it is an IRI. See the interesting discussion at this link:
http://stackoverflow.com/questions/5366007/why-does-the-encodings-of-a-url-and-the-query-string-part-differ/5433216#5433216

With only this point as a consideration, I think adding a parameter that selects quote or quote_plus may be a worthwhile option to consider (Stephen's point #3).

But I have to add that the existing behavior of replacing space with "+" in "URL"s is not breaking anything and in fact is following the rules properly.

----------

_______________________________________
Stephen Day
2012-02-23 20:01:02 UTC
encodeURI('+ ')
"+%20"

Now we have a string that will not decode symmetrically. In other words, we cannot tell whether this string should decode to '  ' (two spaces) or '+ '. And while use of encodeURI is discouraged, application developers still use it in places, introducing these kinds of errors.

Conversely, we can see that the behavior of encodeURIComponent is unambiguous:

encodeURIComponent('+ ')
"%2B%20"

And while these are analogues to quote and quote_plus (JavaScript has no analogue to urlencode), it's easy to see that disambiguating the encoding of urlencode's output would be desirable.

There is a similar situation with PHP library functions.
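For comparison, Python's own pair behaves like this on the same input (a sketch using Python 3's `urllib.parse`): `quote` escapes both characters, much like encodeURIComponent, while `quote_plus` produces the form encoding. One difference worth noting is that `quote` leaves '/' unescaped by default, whereas encodeURIComponent escapes it.

```python
from urllib.parse import quote, quote_plus

s = "+ "

# Both '+' and ' ' are percent-encoded, so decoding is unambiguous
print(quote(s))       # %2B%20
# Form encoding: '+' is escaped, the space becomes '+'
print(quote_plus(s))  # %2B+
```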

Furthermore, it is agreed that urlencode does follow the rules, but the rules, as they are, introduce an asymmetrical, pathological encoding. Most services accept '%20' as space in lieu of '+' when data is encoded as 'application/x-www-form-urlencoded' anyway.
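Python's own query-string parser is at least one receiver that accepts both spellings. A quick check with `urllib.parse.parse_qs` (this demonstrates only Python's behavior, not that of other services):

```python
from urllib.parse import parse_qs

# The form-data parser decodes '+' and '%20' to the same space
print(parse_qs("a=some+param"))    # {'a': ['some param']}
print(parse_qs("a=some%20param"))  # {'a': ['some param']}
```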

In conclusion, I know it may seem a little silly to spend time filing this bug and providing relevant cases, but I'd like to cite professional experience in this matter: I have seen "pluses-for-spaces" introduce errors time and time again.

----------

_______________________________________
Chris Rebert
2012-02-23 21:22:49 UTC
Changes by Chris Rebert <pybugs at rebertia.com>:


----------
nosy: +cvrebert

_______________________________________
jin
2012-07-11 21:21:05 UTC
jin <jin at mediatomb.cc> added the comment:

I just ran into exactly the same problem and was quite disappointed to see that urlencode does not provide an option to use percent encoding.

My use case: I'm preparing some metadata on the server side that is stored as a URL-encoded string; the processing is done in Python.

The metadata is then decoded by a JavaScript web UI.

So I end up with urllib.urlencode({'key': 'val with space'}), which produces "key=val+with+space", and that of course stays that way after being processed with JavaScript's decodeURI().
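A Python 3 sketch of the same round trip, using `unquote` as a stand-in for decodeURI (an assumption: both leave '+' untouched). The second call uses the `quote_via` parameter that this issue eventually added in Python 3.5:

```python
from urllib.parse import quote, unquote, urlencode

data = {"key": "val with space"}

# Default form encoding: unquote(), like decodeURI(), leaves the '+' in place
print(unquote(urlencode(data)))                   # key=val+with+space
# Python 3.5+: quote_via=quote emits %20, which unquote() turns back into spaces
print(unquote(urlencode(data, quote_via=quote)))  # key=val with space
```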

So basically I seem to be forced to implement my own urlencode function... What I like most about Python is that it always seems to have exactly what one needs; unfortunately, not in this specific case.

IMHO Stephen's suggestion #3 makes a lot of sense. While '+' may be correct for forms, it's simply not useful in a number of other situations, and I was really surprised that there's no standard function that URL-encodes with percent encoding.

----------
nosy: +jin

_______________________________________
samwyse
2012-07-13 22:42:40 UTC
Changes by samwyse <samwyse at gmail.com>:


----------
nosy: +samwyse

_______________________________________
samwyse
2012-07-14 10:52:16 UTC
samwyse <samwyse at gmail.com> added the comment:

Since no one else seems willing to do it, here's a patch that adds a 'quote_via' keyword parameter to the urlencode function.
>>> import urllib.parse
>>> query={"foo": "+ "}
>>> urllib.parse.urlencode(query)
'foo=%2B+'
>>> urllib.parse.urlencode(query, quote_via=urllib.parse.quote)
'foo=%2B%20'

----------
keywords: +patch
Added file: http://bugs.python.org/file26378/urllib_parse.diff

_______________________________________
Ronan Amicel
2013-01-07 22:35:16 UTC
Changes by Ronan Amicel <ronan.amicel at gmail.com>:


----------
nosy: +ronnix

_______________________________________
Jeff Edwards
2014-01-26 18:42:18 UTC
Jeff Edwards added the comment:

It's interesting how long this issue has been around. It seems to be because the form-urlencoded spec is specified as url-percent-encoding EXCEPT for ' ' -> '+', which does seem to be unintuitive.
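The "percent-encoding except for space" relationship can be checked directly. The helper name `form_quote` below is hypothetical; the sketch relies on the fact that after `quote()` every '%' starts an escape, so '%20' can only stand for an encoded space:

```python
from urllib.parse import quote, quote_plus


def form_quote(s):
    """Hypothetical helper: form-encode s as quote() plus the space rule."""
    # Percent-encode everything; safe="" means '/' is escaped as well
    encoded = quote(s, safe="")
    # Apply the single exception: an encoded space (%20) becomes '+'
    return encoded.replace("%20", "+")


# Spot-check against the standard library's quote_plus()
for sample in ["some param", "a/b c", "+ ", "100% sure"]:
    assert form_quote(sample) == quote_plus(sample)
```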

To note, there are a few known cases where the exception leads to either confusion or outright breakage, such as AWS Signature V4 authentication, which requires an HMAC of the 'canonical' query string; that string expects the parameters to be sorted and URL-encoded with ' ' -> '%20'. While I do not believe that should be the sole reason to force a change, it does add to the utility of the currently submitted patch as written.
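A simplified sketch of just the encoding-and-sorting step mentioned above. `canonical_query` is a hypothetical helper, and AWS's own documentation, not this snippet, defines the full SigV4 canonicalization rules:

```python
from urllib.parse import quote

# RFC 3986 unreserved punctuation; listing '~' keeps it unescaped even on
# Python versions before 3.7, where quote() still escaped it
UNRESERVED = "-_.~"


def canonical_query(params):
    """Hypothetical helper: percent-encode (space -> %20), sort, and join."""
    pairs = sorted(
        (quote(k, safe=UNRESERVED), quote(v, safe=UNRESERVED))
        for k, v in params.items()
    )
    return "&".join(f"{k}={v}" for k, v in pairs)


print(canonical_query({"b": "with spaces", "a": "1"}))  # a=1&b=with%20spaces
```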

----------
nosy: +Jeff.Edwards

_______________________________________
Facundo Batista
2014-05-20 12:59:49 UTC
Changes by Facundo Batista <facundo at taniquetil.com.ar>:


----------
stage: resolved -> patch review
versions: +Python 3.4 -Python 3.3

_______________________________________
Arnon Yaari
2015-04-14 18:58:52 UTC
Arnon Yaari added the comment:

Updated patch to the correct format, added a test and some more documentation.

----------
nosy: +wiggin15
versions: +Python 3.5 -Python 3.4
Added file: http://bugs.python.org/file39006/issue13866.patch

_______________________________________
Martin Panter
2015-04-15 11:17:46 UTC
Martin Panter added the comment:

To be consistent, I think the documentation should mark up the parameters with asterisks: *quote_via*. Also, you lost the markup for :func:`quote_plus`.

The test cases should probably use self.assertEqual(). The “assert” statement is not appropriate for testing because it can be optimized away.
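A sketch of what such a test might look like (the class and method names are hypothetical, not taken from the patch). Unlike a bare `assert`, which is stripped when Python runs with `-O`, `assertEqual` always executes and reports both values on failure:

```python
import unittest
from urllib.parse import quote, urlencode


class UrlencodeQuoteViaTest(unittest.TestCase):
    """Hypothetical tests for the quote_via parameter of urlencode()."""

    def test_default_encodes_space_as_plus(self):
        # Default quote_via is quote_plus: '+' -> %2B, ' ' -> '+'
        self.assertEqual(urlencode({"foo": "+ "}), "foo=%2B+")

    def test_quote_via_quote_encodes_space_as_percent20(self):
        # With quote_via=quote, the space becomes %20 instead
        self.assertEqual(urlencode({"foo": "+ "}, quote_via=quote), "foo=%2B%20")
```

Saved as a module, the case can be run with `python -m unittest`.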

You also need to clarify in the documentation and tests how the “safe” parameter interacts with the choice of quote function. Are slashes encoded or not by default with quote_via=quote?
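As it ended up in Python 3.5, `urlencode` passes its own `safe` argument (default `''`) through to `quote_via`, so `quote`'s usual default of `safe='/'` does not apply. A sketch of the difference (the sample query is arbitrary):

```python
from urllib.parse import quote, urlencode

query = {"path": "/a b"}

# urlencode's safe="" is forwarded to quote(), so '/' is escaped
print(urlencode(query, quote_via=quote))            # path=%2Fa%20b
# Pass safe explicitly to keep slashes unescaped
print(urlencode(query, safe="/", quote_via=quote))  # path=/a%20b
```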

----------
nosy: +vadmium

_______________________________________
Arnon Yaari
2015-04-15 14:32:34 UTC
Changes by Arnon Yaari:


Removed file: http://bugs.python.org/file39006/issue13866.patch

_______________________________________
Arnon Yaari
2015-04-15 14:33:13 UTC
Arnon Yaari added the comment:

Fixed Martin's comments.

----------
Added file: http://bugs.python.org/file39036/issue13866.diff

_______________________________________
Martin Panter
2015-04-16 03:51:25 UTC
Martin Panter added the comment:

New patch looks good.

----------

_______________________________________
R. David Murray
2015-04-16 21:44:55 UTC
R. David Murray added the comment:

Martin, if you think the patch is complete and ready to commit, please change the stage to commit review. I'm trying to encourage core devs to look at the patches in commit review state and commit them :)

----------
nosy: +r.david.murray

_______________________________________
Martin Panter
2015-04-17 04:18:21 UTC
Martin Panter added the comment:

Yep I think this is ready. I’ll keep your advice in mind for other patches as well :)

----------
stage: patch review -> commit review

_______________________________________
Roundup Robot
2015-05-18 00:45:03 UTC
Roundup Robot added the comment:

New changeset c7d82a7a9dea by R David Murray in branch 'default':
Issue #13866: add *quote_via* argument to urlencode.
https://hg.python.org/cpython/rev/c7d82a7a9dea

----------
nosy: +python-dev

_______________________________________
R. David Murray
2015-05-18 00:45:36 UTC
R. David Murray added the comment:

Thanks everyone.

----------
resolution: -> fixed
stage: commit review -> resolved
status: open -> closed

_______________________________________
Berker Peksag
2015-05-18 14:34:49 UTC
Berker Peksag added the comment:

Just a suggestion: urlencode already has 5 parameters. We can make quote_via a keyword-only parameter.

----------
nosy: +berker.peksag

_______________________________________
R. David Murray
2015-05-18 14:51:57 UTC
R. David Murray added the comment:

I don't see any particular motivation to make it keyword only.

----------

_______________________________________
Martin Panter
2015-05-19 00:12:53 UTC
Martin Panter added the comment:

Forcing the “quote_via” keyword wouldn’t help that much. I suggest leaving it as it is.

urlencode(query, True, "/", "ascii", "strict", quote)
urlencode(query, True, "/", "ascii", "strict", quote_via=quote)

On the other hand, forcing a keyword for the “doseq=True” flag would encourage easier-to-read code, but that ship has already bolted :)
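Even without enforcement, callers can already pass both flags by keyword. A short sketch (Python 3.5+, arbitrary sample data):

```python
from urllib.parse import quote, urlencode

query = {"tag": ["a b", "c"]}

# Keyword arguments make each flag self-describing at the call site;
# doseq=True expands the list into repeated "tag=" pairs
print(urlencode(query, doseq=True, quote_via=quote))  # tag=a%20b&tag=c
```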

----------

_______________________________________