Michael Hudson
2009-07-30 00:32:16 UTC
New submission from Michael Hudson <mwh at users.sourceforge.net>:
If you call email.utils.make_msgid a number of times within the same
second, the uniqueness of the results depends on random.randint(100000)
returning different values each time.
A little mathematics proves that you don't have to call make_msgid
*that* often to get the same message id twice: if you call it 'n' times,
the probability of a collision is approximately "1 -
math.exp(-n*(n-1)/200000.0)", and for n == 100, that's about 5%. For n
== 1000, it's over 99%.
... msgids = [make_msgid() for i in range(n)]
... return len(msgids) - len(set(msgids))
...
I think probably having a counter in addition to the randomness would be
a good fix for the problem, though I guess then you have to worry about
thread safety.
----------
components: Library (Lib)
messages: 91073
nosy: mwh
severity: normal
status: open
title: calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids
type: behavior
versions: 3rd party, Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6598>
_______________________________________
If you call email.utils.make_msgid a number of times within the same
second, the uniqueness of the results depends on random.randint(100000)
returning different values each time.
A little mathematics proves that you don't have to call make_msgid
*that* often to get the same message id twice: if you call it 'n' times,
the probability of a collision is approximately "1 -
math.exp(-n*(n-1)/200000.0)", and for n == 100, that's about 5%. For n
== 1000, it's over 99%.
... msgids = [make_msgid() for i in range(n)]
... return len(msgids) - len(set(msgids))
...
sum((collisions(100)>0) for i in range(1000))
49sum((collisions(1000)>0) for i in range(1000))
991I think probably having a counter in addition to the randomness would be
a good fix for the problem, though I guess then you have to worry about
thread safety.
----------
components: Library (Lib)
messages: 91073
nosy: mwh
severity: normal
status: open
title: calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids
type: behavior
versions: 3rd party, Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6598>
_______________________________________