Discussion:
[issue22341] Python 3 crc32 documentation clarifications
Martin Panter
2014-09-05 11:55:29 UTC
New submission from Martin Panter:

This is regarding the Python 3 documentation for binascii.crc32(), <https://docs.python.org/dev/library/binascii.html#binascii.crc32>. It repeatedly recommends correcting the sign by doing crc32() & 0xFFFFFFFF, but it is not immediately clear why. Only after reading the Python 2 documentation does one realise that the value is always unsigned for Python 3, so you only really need the workaround if you want to support earlier Pythons.
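
As a rough illustration of the point above (a sketch only, assuming Python 3; the variable names are made up for the example), the result already lies in the unsigned 32-bit range, so applying the mask leaves it unchanged:

```python
import binascii

crc = binascii.crc32(b"hello world")

# On Python 3 the result is already within 0 .. 2**32 - 1 ...
in_unsigned_range = 0 <= crc <= 0xFFFFFFFF

# ... so masking with 0xFFFFFFFF does not change the value.
mask_is_noop = (crc & 0xFFFFFFFF) == crc

print(in_unsigned_range, mask_is_noop)
```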

I also suggest documenting the initial CRC input value, which is zero. Suggested wording:

binascii.crc32(data[, crc])
Compute CRC-32, the 32-bit checksum of “data”, starting with the given “crc”. The default initial value is zero. The algorithm is consistent with the ZIP file checksum. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Use as follows:

import binascii

print(binascii.crc32(b"hello world"))
# Or, in two pieces:
crc = binascii.crc32(b"hello", 0)
crc = binascii.crc32(b" world", crc)
print('crc32 = {:#010x}'.format(crc))

I would simply drop the notice box with the workaround, because I gather that the Python 3 documentation generally omits Python 2 details. (There are no “new in version 2.4 tags” for instance.) Otherwise, clarify if “packed binary format” is a reference to the “struct” module, or something else.

Similar fixes are probably appropriate for zlib.crc32() and zlib.adler32().
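
For reference, a small sketch of the default starting values in the zlib functions (assuming Python 3; the variable names are illustrative only). The CRC-32 functions start from zero, while Adler-32 starts from one:

```python
import zlib

data = b"hello world"

# With empty input, each function simply returns its default starting value.
crc_start = zlib.crc32(b"")
adler_start = zlib.adler32(b"")

# Passing the default starting value explicitly matches omitting it.
same_crc = zlib.crc32(data, 0) == zlib.crc32(data)
same_adler = zlib.adler32(data, 1) == zlib.adler32(data)

print(crc_start, adler_start, same_crc, same_adler)
```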

Also, what is the relationship between binascii.crc32() and zlib.crc32()? I vaguely remember reading that “zlib” is not always available, so I tend to use “binascii” instead. Is there any advantage in using the “zlib” version? The “hashlib” documentation points to “zlib” without mentioning “binascii” at all.
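
A quick check, offered only as a sketch (assuming Python 3 with both modules available), suggests the two functions compute the same checksum and can continue each other's running value. It says nothing about availability or performance differences between them:

```python
import binascii
import zlib

data = b"hello world"

# Both modules should produce the same CRC-32 for the same input.
same_result = binascii.crc32(data) == zlib.crc32(data)

# A running checksum started with one function can be continued by the other.
partial = binascii.crc32(b"hello")
mixed = zlib.crc32(b" world", partial)

print(same_result, mixed == zlib.crc32(data))
```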

----------
assignee: docs at python
components: Documentation
messages: 226419
nosy: docs at python, vadmium
priority: normal
severity: normal
status: open
title: Python 3 crc32 documentation clarifications
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22341>
_______________________________________
Martin Panter
2014-12-19 05:14:30 UTC
Martin Panter added the comment:

Here is a patch that fixes the binascii, zlib.crc32() and adler32() documentation as I suggested.

I’m still interested in why there are two ways to do a CRC-32, each as non-obvious as the other.

----------
keywords: +patch
Added file: http://bugs.python.org/file37504/crc-sign.patch

Serhiy Storchaka
2015-02-28 12:06:47 UTC
Serhiy Storchaka added the comment:

crc & 0xffffffff is still used in gzip, zipfile and tarfile, and some comments there mention the signedness of 32-bit checksums.

----------
nosy: +serhiy.storchaka

Martin Panter
2015-03-09 02:28:51 UTC
Martin Panter added the comment:

Posting a new patch that also removes the masking from the gzip, zipfile, tarfile, and test_zlib modules. I removed the comment about signedness in tarfile; let me know if you saw any others.

----------
versions: +Python 3.5 -Python 3.4
Added file: http://bugs.python.org/file38398/crc-sign.v2.patch

Serhiy Storchaka
2015-03-20 19:35:11 UTC
Serhiy Storchaka added the comment:

These notes were added by Gregory in r68535. Ask him if they are still needed.

----------
nosy: +gregory.p.smith

Serhiy Storchaka
2015-03-20 19:36:55 UTC
Serhiy Storchaka added the comment:

See also issue4903.

----------

Gregory P. Smith
2015-03-20 21:07:32 UTC
Gregory P. Smith added the comment:

I do not object to the removal of the & 0xffffffff from the stdlib code if these functions have actually been fixed to always return unsigned now. (double check the behavior, and if good, just do it!)

But I think the docs should still mention that the & 0xffffffff is a good practice if code needs to be compatible with Python versions prior to X.Y (list the last release before the behavior was corrected). Possibly within a .. versionchanged: section.

People often use the latest docs when writing code in any version of Python as we continually improve docs and are pretty good about noting old behaviors and when behaviors changed.

----------

Martin Panter
2015-03-20 22:52:10 UTC
Martin Panter added the comment:

Hi Gregory, I think the three functions have been fixed since Python 3.0. It looks like you changed them in revisions 6a7fa8421902 (zlib module) and 132cf3a79126 (crc32 functions). Looking at the 3.0 branch, all three functions use PyLong_FromUnsignedLong(), but in the 2.7 branch, they use PyInt_FromLong(). This is tested in test_zlib.ChecksumTestCase.test_crc32_adler32_unsigned(). See also my changes to test_penguins() in the patch here for further testing. There is no explicit testing of binascii.crc32() unsignedness, but it is implicitly tested by test_same_as_binascii_crc32(), because the expected CRC is beyond the positive 32 bit signed limit.
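
To illustrate why the mask helps on older versions, here is a sketch that assumes the Python 2 functions returned the same 32 bits interpreted as a signed integer. The helper to_signed_32() is hypothetical and exists only to simulate that interpretation on Python 3:

```python
import zlib

def to_signed_32(value):
    """Hypothetical helper: reinterpret an unsigned 32-bit value as signed."""
    return value - (1 << 32) if value & 0x80000000 else value

# An arbitrary value above the positive 32-bit signed limit.
unsigned = 0xFFFFFFFF
signed = to_signed_32(unsigned)

# Masking maps the signed interpretation back to the unsigned value.
recovered = signed & 0xFFFFFFFF

# The same round trip holds for an actual checksum.
crc = zlib.crc32(b"hello world")
roundtrip_ok = (to_signed_32(crc) & 0xFFFFFFFF) == crc

print(signed, recovered == unsigned, roundtrip_ok)
```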

I am happy to put back the suggestion of masking for backwards compatibility if you really think it is necessary. But since it only involves Python < 3.0 compatibility, I thought it would be okay to remove it; see my original post. The documentation is often pretty good about noting when Python 3 behaviours changed, but you usually have to look elsewhere to see the differences from Python 2.

----------

Martin Panter
2015-03-30 12:19:38 UTC
Martin Panter added the comment:

Patch v3:

* Reverted to original crc32(b"hello") example call with the implicit initial CRC
* Added “Changed in version 3.0” notices, restoring a brief version of the suggestion to use the bit mask, along with an explanation.

Python 2 compatibility information is generally unprecedented in the Python 3 documentation, but hopefully this version makes more sense to people not already familiar with the odd Python 2 behaviour.

----------
Added file: http://bugs.python.org/file38737/crc-sign.v3.patch

Martin Panter
2015-03-30 12:40:55 UTC
Martin Panter added the comment:

V4 fixes a merge conflict with recent gzip changes.

----------
Added file: http://bugs.python.org/file38739/crc-sign.v4.patch

