Discussion:
[issue15207] mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings
Dave Chambers
2012-06-27 16:03:29 UTC
New submission from Dave Chambers <dlchambers at aol.com>:

The current mimetypes.read_windows_registry() enumerates the values under HKCR\MIME\Database\Content Type.
However, this is the key for mimetype-to-extension lookups, NOT for extension-to-mimetype lookups.
As a result, when more than one MIME type is mapped to a particular extension, the last-found entry is used.
For example, both "image/png" and "image/x-png" map to the ".png" file extension.
Unfortunately, what happens is this code finds "image/png", then later finds "image/x-png" and this steals the ".png" extension.
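
The overwrite can be reproduced without touching a registry. This is a minimal sketch, assuming the two entries are enumerated in the order described above; the `content_type_entries` list and the `build_types_map` helper are hypothetical stand-ins, not the module's actual code:

```python
# Hypothetical stand-in for what enumerating
# HKCR\MIME\Database\Content Type yields: (MIME type, extension) pairs.
content_type_entries = [
    ("image/png", ".png"),
    ("image/x-png", ".png"),
]


def build_types_map(entries):
    """Invert MIME type -> extension pairs into an extension -> MIME type map."""
    types_map = {}
    for mimetype, ext in entries:
        # A later entry silently replaces an earlier one for the same extension.
        types_map[ext] = mimetype
    return types_map


types_map = build_types_map(content_type_entries)
print(types_map[".png"])  # image/x-png
```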


The solution is to use the correct regkey, which is the HKCR root.
This is the correct location for extension-to-mimetype lookups.
What we should do is enum the HKCR root, find all subkeys that start with a dot (i.e. file extensions), then inspect those for a 'Content Type' value.
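
A rough sketch of that enumeration follows. It is not the attached patch: the function names are made up for illustration, it uses the Python 3 module name `winreg` (`_winreg` on Python 2), and it simply returns an empty map when run on a non-Windows system.

```python
try:
    import winreg  # Windows only; named _winreg on Python 2
except ImportError:
    winreg = None


def iter_subkey_names(key):
    """Yield the names of all subkeys of an open registry key."""
    index = 0
    while True:
        try:
            yield winreg.EnumKey(key, index)
        except OSError:
            # EnumKey raises OSError once the index runs past the last subkey.
            break
        index += 1


def read_extension_types():
    """Return an extension -> MIME type map read from the HKCR root.

    Hypothetical helper for illustration; returns {} when winreg is unavailable.
    """
    types = {}
    if winreg is None:
        return types
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "") as hkcr:
        for name in iter_subkey_names(hkcr):
            if not name.startswith("."):
                continue  # not a file extension key
            try:
                with winreg.OpenKey(hkcr, name) as subkey:
                    mimetype, datatype = winreg.QueryValueEx(subkey, "Content Type")
            except OSError:
                continue  # no 'Content Type' value under this extension
            if datatype == winreg.REG_SZ:
                types[name] = mimetype
    return types
```

Because each extension key holds at most one 'Content Type' value, no entry can overwrite another for the same extension.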


The attached ZIP contains:
mimetype_flaw_demo.py - this demonstrates the error (due to wrong regkey) and my fix (uses the correct regkey)
mimetypes_fixed.py - My suggested fix to the standard mimetypes.py module.

----------
components: Windows
files: mimetype_flaw_demo.zip
messages: 164167
nosy: dlchambers
priority: normal
severity: normal
status: open
title: mimetypes.read_windows_registry() uses the wrong regkey, creates wrong mappings
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file26180/mimetype_flaw_demo.zip

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15207>
_______________________________________
R. David Murray
2012-06-27 17:39:18 UTC
R. David Murray <rdmurray at bitdance.com> added the comment:

Thanks for working on this. Could you please post the fix as a patch file? If you don't have mercurial, you can generate the diff on windows using the python diff module (scripts/diff.py -u <yourfile> <origfile>). Actually, I'm not sure exactly where diff is in the windows install, but I know it is there.

Do you know if image/x-png and image/png are included in the registry on all windows versions? If so we could use that key for a unit test.

----------
components: +email
nosy: +barry, r.david.murray
stage: -> needs patch
versions: +Python 3.2, Python 3.3

Dave Chambers
2012-06-27 17:54:13 UTC
Dave Chambers <dlchambers at aol.com> added the comment:

My first diff file... I hope I did it right :)

----------
keywords: +patch
Added file: http://bugs.python.org/file26181/mimetypes.py.diff

Dave Chambers
2012-06-27 18:06:04 UTC
Dave Chambers <dlchambers at aol.com> added the comment:

I added a diff file to the bug.
Dunno if that's the same as a patch file, or how to create a patchfile if it's not.
R. David Murray wrote:
> Do you know if image/x-png and image/png are included in the registry on all
> windows versions?
I think your question is reversed, in the same way that the code was reversed.
You're not looking for image/png and/or image/x-png. You're looking for .png in order to retrieve its mimetype (aka Content Type).
While nothing is 100% certain on Windows :), I'm quite confident that every copy will have an HKCR\.png regkey, and that regkey will have a Content Type value, and that value's setting will be the appropriate mimetype, which I'd expect to be image/png.

I was kinda surprised to find this bug as it's so obvious
I started chasing it because Chrome kept complaining that pngs were being served as image/x-png (by CherryPy).
There are other bugs (eg: 15199, 10551) that my patch should fix.

-Dave

----------

R. David Murray
2012-06-27 19:30:56 UTC
R. David Murray <rdmurray at bitdance.com> added the comment:

Well, I had no involvement in the windows registry reading stuff, and it is relatively new. And, as issue 10551 indicates, a bit controversial. (issue 15199 is a different bug, in python's own internal table).

Can you run that diff again and use the '-u' flag? The -u (unified) format is the one we are used to working with. The one you posted still lets us read the changes, though, which is very helpful.

----------
nosy: +brian.curtin, tim.golden

Dave Chambers
2012-06-27 19:54:01 UTC
Changes by Dave Chambers <dlchambers at aol.com>:


Added file: http://bugs.python.org/file26185/mimetypes.py.diff.u

Atsuo Ishimoto
2012-07-25 03:06:39 UTC
Atsuo Ishimoto <ishimoto at gembook.org> added the comment:

This patch looks good to me.

I generated a patch for current trunk, with some cosmetic changes.

----------
nosy: +ishimoto
Added file: http://bugs.python.org/file26507/issue15207.patch

Yap Sok Ann
2012-08-22 01:17:34 UTC
Yap Sok Ann added the comment:

On Python 2.7, I need to add this to the original diff by Dave, in the same try-except block:

mimetype = mimetype.encode(default_encoding) # omit in 3.x!

----------
nosy: +sayap

R. David Murray
2012-08-23 01:47:32 UTC
R. David Murray added the comment:

Unfortunately I don't feel qualified to review the patch itself since I'm not a windows user and don't currently even have a windows box to test on. Hopefully one of the windows devs will take a look; the patch looks to be fairly straightforward to evaluate if one understands _winreg.

----------
nosy: +terry.reedy
stage: needs patch -> commit review

Ben Hoyt
2012-12-06 07:57:41 UTC
Ben Hoyt added the comment:

Ah, thanks for making this an issue of its own! As I commented over at Issue10551, it's a serious problem, and makes mimetypes.guess_type() unusable out of the box on Windows.

Yes, the fix in Issue4969 uses "MIME\Database\Content Type", which is a mime type -> file extension mapping, *not the other way around*.

So while this patch is definitely an improvement (for the most part it doesn't produce wrong values!), I'm not sure it's the way to go, for a few reasons:

1) Many of the most important keys aren't in the Windows registry (in HKEY_CLASSES_ROOT, where this patch looks). This includes .png, .jpg, and .gif. All of these important types fall back to the hard-coded "types_map" in mimetypes.py anyway.

2) Some that do exist are wrong in the registry (or at the least, different from the built-in "types_map"). This includes .zip, which is "application/x-zip-compressed" (at least in my registry) but should be "application/zip".

3) It's slowish, as it has to load about 6000 registry keys (and more as you install more stuff on your system), but only about 200 of those have the "Content Type" value. On my machine (Windows 7, 64 bit CPython) this adds over 100ms to the startup time even on subsequent runs when cached -- and I think 100ms is pretty significant. Issue4969's version takes about 25ms, and reverting this completely would of course take 0ms.

4) Users and other programs can (and sometimes do!) change the Content Type keys in the registry -- whereas one wants mime type mappings to be consistent across systems. This last point is debatable for various reasons, and I think the above three points should carry the day, but I include it here for completeness. ;-)

For these reasons, I think we should revert the fix for Issue4969 and leave Windows users to get the default types_map as before, which is at least consistent -- and for mimetypes.guess_type(), you want consistency.

----------
nosy: +benhoyt

Dave Chambers
2012-12-07 13:01:40 UTC
Dave Chambers added the comment:

Disappointing that "faster but broken" is preferable to "slower but fixed"

----------

R. David Murray
2012-12-07 13:33:44 UTC
R. David Murray added the comment:

I will note that on unix the user is also free to update the machine's mime types registry (that's more than half the point of the mimetypes module). Usually this is only done by installed software...as I believe is the case on Windows as well.

That said, there should be a way to explicitly bypass this loading of local data for a program that wishes to use only the Python supplied types. And indeed, this is possible: just pass an empty list of filenames to init. This bypasses the windows registry lookup. (Note that this could be better documented...it is not made explicit that an empty list is different from not specifying a list or specifying it as None, but it is).
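
The call shape looks like the following. This is only a sketch of the usage described above; whether the registry and system files are actually skipped is an assumption taken from this comment, and it may depend on the Python version in use.

```python
import mimetypes

# Pass an explicit empty list, which is distinct from None or no argument.
mimetypes.init(files=[])

mimetype, encoding = mimetypes.guess_type("picture.png")
print(mimetype)
```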

----------

R. David Murray
2012-12-07 13:37:00 UTC
R. David Murray added the comment:

That said, the fact that windows is just *wrong* about some mimetypes is definitely an issue. We could call it a platform bug, but that would be a disservice to the user community.

----------

Dave Chambers
2012-12-07 13:46:06 UTC
Dave Chambers added the comment:

Seems to me that some hybrid would be a good solution: Hardcode the known types (which solves the "windows is just wrong" case) then as a default look in the registry for those that aren't hardcoded.
Therefore the hit of additional time would only be for lesser-known types.
In any case, it's pretty bad that python allows the wrong mimetype for PNG, even if it is a Windows registry issue.

----------

R. David Murray
2012-12-07 14:51:13 UTC
R. David Murray added the comment:

To be consistent with the overall philosophy of the mimetypes module, it should be instead a list of "windows fixes" which are applied if the broken mimetype is found in the windows registry.
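
One way such a list could look is sketched below. The `WINDOWS_FIXES` table and `apply_windows_fixes` helper are hypothetical names; the two entries are the registry values reported as wrong earlier in this thread.

```python
# Hypothetical correction table: registry value -> preferred MIME type.
# Both entries are values reported in this thread.
WINDOWS_FIXES = {
    "image/x-png": "image/png",
    "application/x-zip-compressed": "application/zip",
}


def apply_windows_fixes(registry_types):
    """Return a copy of an extension -> MIME type map with known-bad values replaced."""
    return {
        ext: WINDOWS_FIXES.get(mimetype, mimetype)
        for ext, mimetype in registry_types.items()
    }


raw = {
    ".png": "image/x-png",
    ".zip": "application/x-zip-compressed",
    ".pdf": "application/pdf",
}
fixed = apply_windows_fixes(raw)
print(fixed[".png"], fixed[".zip"], fixed[".pdf"])
```

Values that are not in the table pass through unchanged, so the registry still contributes types that Python does not know about.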

If you want to avoid the overhead, pass an empty list to init. A note about the overhead and fixes should be added to the docs.

----------

Ben Hoyt
2012-12-10 03:18:31 UTC
Ben Hoyt added the comment:

Either way -- this needs to be reverted or fixed. It's a nasty gotcha for folks writing Python web services at the moment. I'm still for reverting, per my reasons above.

Dave Chambers, I'm not for "faster but broken" but for "faster and fixed" -- from what I've shown above, it's the Windows registry that's broken, so removing read_windows_registry() entirely would fix this (and as a bonus, be faster and simplify the code :-).

Per your suggestion http://bugs.python.org/issue15207#msg177092 -- I don't understand how mimetypes.py would know the types "that aren't hardcoded".

R. David Murray, I don't understand the advantage of trying to maintain a list of "Windows fixes". What if this list was wrong, or there was a Windows update which broke more mime types? Why can't we just avoid the complication and go back to the hardcoded types for Windows?

----------

Dave Chambers
2012-12-10 03:31:56 UTC
Ben Hoyt wrote:
> removing read_windows_registry()
If you're suggesting hardcoding *ALL* the mimetypes for *ALL* OSes, I think that's probably the best overall solution.
No variability, as fast as can be.
The downside is that there would occasionally be an unrecognized type, thus there'd need to be diligence to keep the hardcoded list up to date, but overall I think Ben Hoyt's suggestion is best.

----------

Ben Hoyt
2012-12-10 03:35:24 UTC
Ben Hoyt added the comment:

Actually, I was suggesting using the hardcoded types for Windows only (i.e., only removing read_windows_registry). Several bugs have been opened on problems with the Windows registry mimetypes, but as far as I know this isn't an issue on Linux -- in other words, if Linux/other systems ain't broke, no need to fix them.

----------

R. David Murray
2012-12-10 12:19:47 UTC
R. David Murray added the comment:

I'm personally OK with the option of removing the registry support (or making it optional-by-default), but I'm not going to make that call; we need a windows dev opinion.

Maintaining the list of windows exceptions shouldn't be much worse than maintaining the list of mime types. I can't imagine that Microsoft changes it all that often, given that you say they haven't bothered to update the zip type yet.

----------

Dave Chambers
2012-12-10 13:09:14 UTC
Dave Chambers added the comment:

(I'm a windows dev type)
I would say that there are 2 issues with relying on the registry:
1) Default values (ie. set by Windows upon OS install) are broken and MS never fixes them.
2) The values can be changed at any time, by any app. Thus the values are unreliable.

If I were to code it from scratch today, I'd create a three-pronged approach:
a) Hardcode a list of known types (fast & reliable).
b) Have a default case where unknown types are pulled from the registry. Whatever value is retrieved is likely better than returning e.g. "application/octet-stream".
c) When we find it neither in the hardcoded list nor in the registry, return a default value (e.g. "application/octet-stream")
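
The three steps above can be sketched as follows. All names here (`HARDCODED_TYPES`, `guess_mimetype`, `no_registry`) are hypothetical, and the registry step is passed in as a plain callable so the example runs on any platform.

```python
# Hypothetical table of well-known types, checked first.
HARDCODED_TYPES = {
    ".png": "image/png",
    ".zip": "application/zip",
}
DEFAULT_TYPE = "application/octet-stream"


def no_registry(ext):
    """Placeholder lookup that knows nothing; swap in a real registry query on Windows."""
    return None


def guess_mimetype(ext, registry_lookup=no_registry):
    """Resolve an extension: hardcoded list, then registry, then default."""
    ext = ext.lower()
    # (a) hardcoded list
    if ext in HARDCODED_TYPES:
        return HARDCODED_TYPES[ext]
    # (b) registry lookup, falling through to (c) the default value
    return registry_lookup(ext) or DEFAULT_TYPE


print(guess_mimetype(".PNG"))                                # image/png
print(guess_mimetype(".xyz"))                                # application/octet-stream
print(guess_mimetype(".xyz", lambda ext: "chemical/x-xyz"))  # chemical/x-xyz
```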

For what it's worth, my workaround will be to have my app delete the
HKCR\MIME\Database\Content Type\image/x-png regkey, thus forcing the original braindead mimetypes.py code to use HKCR\MIME\Database\Content Type\image/png

And, for what it's worth, my patch is actually faster than the current mimetypes.py code because I'm not doing reverse lookups. Thus any argument about a difference in speed is moot. Arguments about the speed of pulling mimetypes from registry are valid.

Another registry based approach would be to build a dictionary of mimetypes on demand. In this scenario, at startup, the dictionary would be empty. When python needs the mimetype for ".png", on the 1st request it would cause a "slow" registry lookup for only that type but on all subsequent requests for the type it would use the "fast" value from the dictionary.
Given that an app will probably use only a handful of mimetypes but will use that same handful over and over, such a solution would have the benefits of (a) not using hardcoded values (thus no ongoing maintenance), (b) performing slow stuff only on demand, (c) optimizing repeat calls, and (d) consuming zero startup time.
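
A minimal sketch of that on-demand dictionary, under the assumption that a single extension can be looked up directly under HKCR. The names `lookup_registry` and `mimetype_for` are invented for illustration, and on a non-Windows system the registry lookup simply returns None.

```python
try:
    import winreg  # Windows only; named _winreg on Python 2
except ImportError:
    winreg = None

_cache = {}  # starts empty, so nothing is read at startup


def lookup_registry(ext):
    """Read the 'Content Type' value for one extension key, or return None."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, ext) as key:
            value, datatype = winreg.QueryValueEx(key, "Content Type")
    except OSError:
        return None  # key or value is missing
    return value if datatype == winreg.REG_SZ else None


def mimetype_for(ext, lookup=lookup_registry):
    """Return the MIME type for ext, doing the slow lookup only on first request."""
    ext = ext.lower()
    if ext not in _cache:
        _cache[ext] = lookup(ext)
    return _cache[ext]
```

Misses are cached too, so an extension with no registry entry is also queried only once.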

I'll code this up & run some timing tests if anyone thinks it's worthwhile.

BTW, who makes the final determination as to if/when any such changes would be incorporated?

----------

R. David Murray
2012-12-10 13:35:12 UTC
R. David Murray added the comment:

I would say Brian Curtin, Tim Golden, and/or Martin von Löwis, as they are the currently active committers with significant Windows expertise. Other committers may have opinions as well. If you don't get an answer here in a reasonable amount of time, please post a discussion of the issue to python-dev (it may end up there anyway).

----------

Terry J. Reedy
2012-12-10 20:09:48 UTC
Terry J. Reedy added the comment:

Gabriel and Antoine: As I understand it, the claim in this issue is that the patch in #4969 (G. wrote, A. committed) is unsatisfactory. I think it would help if either of you commented.

----------
nosy: +gagenellina, pitrou

Antoine Pitrou
2012-12-10 20:21:17 UTC
Antoine Pitrou added the comment:

I'll leave it to a Windows expert.

----------
versions: -Python 2.7, Python 3.3

Antoine Pitrou
2012-12-10 20:21:26 UTC
Changes by Antoine Pitrou <pitrou at free.fr>:


----------
stage: commit review -> patch review
versions: +Python 2.7, Python 3.3, Python 3.4

Tim Golden
2012-12-10 20:22:55 UTC
Tim Golden added the comment:

Sorry; late to the party. I'll try to take a look at the patches.
Basically I'm sympathetic to the problem (which seems quite
straightforwardly buggish) but I want to take a look around the issue first.

TJG

----------

Ben Hoyt
2013-01-31 03:36:42 UTC
Ben Hoyt added the comment:

Any update on this, Tim or other Windows developers?

----------

Brian Curtin
2013-01-31 03:45:51 UTC
Brian Curtin added the comment:

I can't comment on what the change should be or how it should be done as I don't do anything with mimetypes, but nothing about how the patch was written jumps out at me for being incorrect (except I would not include ishimoto's name changes).

If there's a consensus that this is the appropriate change to be made, the patch still needs tests.

----------

Tim Golden
2013-04-17 12:05:19 UTC
Tim Golden added the comment:

Attached is a q&d script to produce the list of extension -> mimetype maps for a version of the mimetypes module.
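
The attached mt.py is not reproduced here. As a rough idea of what such a listing script might look like, here is a hypothetical sketch that prints the module's strict types map in the same "ext => type" shape; the `format_types_map` name is an assumption, not taken from the attachment.

```python
import mimetypes


def format_types_map():
    """Return one 'ext => type' line per entry in the module's types map."""
    mimetypes.init()
    return [
        "%s => %s" % (ext, mimetype)
        for ext, mimetype in sorted(mimetypes.types_map.items())
    ]


for line in format_types_map():
    print(line)
```

Running it against different versions of the module, or with and without the registry, gives listings that can be diffed line by line.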

----------
Added file: http://bugs.python.org/file29900/mt.py

Tim Golden
2013-04-17 12:07:14 UTC
Tim Golden added the comment:

Three outputs produced by mt.py: tip as-is; tip without registry; tip
with new approach to registry. The results for 2.7 are near-enough
identical. Likewise the results for an elevated prompt.

----------
Added file: http://bugs.python.org/file29901/mt-tip.txt
Added file: http://bugs.python.org/file29902/mt-tip-newregistry.txt
Added file: http://bugs.python.org/file29903/mt-tip-noregistry.txt

-------------- next part --------------
.jpg => image/jpg
.mid => audio/midi
.midi => audio/midi
.pct => image/pict
.pic => image/pict
.pict => image/pict
.rtf => application/rtf
.xul => text/xul
.3g2 => video/3gpp2
.3gp => video/3gpp
.AMR => audio/AMR
.a => application/octet-stream
.aac => audio/x-aac
.ac3 => audio/x-ac3
.acrobatsecuritysettings => application/vnd.adobe.acrobat-security-settings
.adts => audio/vnd.dlna.adts
.ai => application/postscript
.aif => audio/x-aiff
.aifc => audio/x-aiff
.aiff => audio/x-aiff
.amc => application/x-mpeg
.application => application/x-ms-application
.asx => video/x-ms-asf-plugin
.au => audio/basic
.avi => video/x-msvideo
.bat => text/plain
.bcpio => application/x-bcpio
.bin => application/octet-stream
.bmp => image/bmp
.c => text/plain
.c2r => text/vnd-ms.click2record+xml
.caf => audio/x-caf
.cat => vnd.ms-pki.seccat
.cdf => application/x-netcdf
.cer => x-x509-ca-cert
.contact => text/x-ms-contact
.cpio => application/x-cpio
.crl => pkix-crl
.csh => application/x-csh
.css => text/css
.dir => application/x-director
.dll => application/octet-stream
.doc => application/msword
.dot => application/msword
.dvi => application/x-dvi
.dwfx => model/vnd.dwfx+xps
.easmx => model/vnd.easmx+xps
.edrwx => model/vnd.edrwx+xps
.eml => message/rfc822
.eprtx => model/vnd.eprtx+xps
.eps => application/postscript
.etx => text/x-setext
.exe => application/octet-stream
.fdf => application/vnd.fdf
.fif => application/fractals
.flc => video/flc
.gif => image/gif
.gsm => audio/x-gsm
.gtar => application/x-gtar
.gz => application/x-gzip
.h => text/plain
.hdf => application/x-hdf
.hqx => application/mac-binhex40
.hta => application/hta
.htc => text/x-component
.htm => text/html
.html => text/html
.ico => image/x-icon
.ics => text/calendar
.ief => image/ief
.iqy => text/x-ms-iqy
.jnlp => application/x-java-jnlp-file
.jp2 => image/x-jpeg2000-image
.jpe => image/jpeg
.jpeg => image/jpeg
.jpg => image/pjpeg
.js => application/javascript
.jtx => application/x-jtx+xps
.ksh => text/plain
.latex => application/x-latex
.m1v => video/mpeg
.m3u => audio/x-mpegurl
.m3u8 => application/vnd.apple.mpegurl
.m4a => audio/x-m4a
.m4b => audio/x-m4b
.m4p => audio/x-m4p
.m4v => video/x-m4v
.man => application/x-troff-man
.mdi => image/vnd.ms-modi
.me => application/x-troff-me
.mht => message/rfc822
.mhtml => message/rfc822
.mid => midi/mid
.mif => application/x-mif
.mov => video/quicktime
.movie => video/x-sgi-movie
.mp2 => audio/mpeg
.mp3 => audio/x-mpg
.mp4 => video/mp4
.mpa => video/mpeg
.mpe => video/mpeg
.mpeg => video/x-mpeg2a
.mpf => application/vnd.ms-mediapackage
.mpg => video/mpeg
.ms => application/x-troff-ms
.nc => application/x-netcdf
.nix => application/x-mix-transfer
.nws => message/rfc822
.o => application/octet-stream
.obj => application/octet-stream
.oda => application/oda
.odc => text/x-ms-odc
.osdx => application/opensearchdescription+xml
.p10 => pkcs10
.p12 => x-pkcs12
.p7b => x-pkcs7-certificates
.p7c => application/pkcs7-mime
.p7m => pkcs7-mime
.p7r => x-pkcs7-certreqresp
.p7s => pkcs7-signature
.pbm => image/x-portable-bitmap
.pdf => application/pdf
.pdfxml => application/vnd.adobe.pdfxml
.pdx => application/vnd.adobe.pdx
.pfx => application/x-pkcs12
.pgm => image/x-portable-graymap
.pict => image/x-pict
.pko => vnd.ms-pki.pko
.pl => text/plain
.pls => audio/x-scpls
.png => image/x-png
.pnm => image/x-portable-anymap
.pntg => image/x-macpaint
.pot => application/vnd.ms-powerpoint
.ppa => application/vnd.ms-powerpoint
.ppm => image/x-portable-pixmap
.pps => application/vnd.ms-powerpoint
.ppt => application/x-mspowerpoint
.ps => application/postscript
.pwz => application/vnd.ms-powerpoint
.py => text/x-python
.pyc => application/x-python-code
.pyo => application/x-python-code
.qcp => audio/vnd.qcelp
.qt => video/quicktime
.qtif => image/x-quicktime
.qtl => application/x-quicktimeplayer
.ra => audio/x-pn-realaudio
.ram => application/x-pn-realaudio
.ras => image/x-cmu-raster
.rdf => application/xml
.rels => application/vnd.ms-package.relationships+xml
.rgb => image/x-rgb
.roff => application/x-troff
.rqy => text/x-ms-rqy
.rtsp => application/x-rtsp
.rtx => text/richtext
.sdp => application/x-sdp
.sdv => video/sd-video
.sgi => image/x-sgi
.sgm => text/x-sgml
.sgml => text/x-sgml
.sh => application/x-sh
.shar => application/x-shar
.sit => application/x-stuffit
.slupkg-ms => application/x-ms-license
.snd => audio/basic
.so => application/octet-stream
.spl => application/futuresplash
.src => application/x-wais-source
.sst => vnd.ms-pki.certstore
.stl => vnd.ms-pki.stl
.sv4cpio => application/x-sv4cpio
.sv4crc => application/x-sv4crc
.svg => image/svg+xml
.swf => application/x-shockwave-flash
.t => application/x-troff
.tar => application/x-tar
.targa => image/x-targa
.tcl => application/x-tcl
.tex => application/x-tex
.texi => application/x-texinfo
.texinfo => application/x-texinfo
.tgz => application/x-compressed
.tif => image/tiff
.tiff => image/tiff
.tr => application/x-troff
.tsv => text/tab-separated-values
.tts => video/vnd.dlna.mpeg-tts
.txt => text/plain
.ustar => application/x-ustar
.vcf => text/x-vcard
.vdx => application/vnd.ms-visio.viewer
.vsd => application/vnd.visio
.vsi => application/ms-vsi
.vsto => application/x-ms-vsto
.wal => interface/x-winamp3-skin
.wav => audio/x-wav
.wax => audio/x-ms-wax
.wiz => application/msword
.wlz => interface/x-winamp-lang
.wm => video/x-ms-wm
.wma => audio/x-ms-wma
.wmd => application/x-ms-wmd
.wmv => video/x-ms-wmv
.wmx => video/x-ms-wmx
.wmz => application/x-ms-wmz
.wpl => application/vnd.ms-wpl
.wsc => text/scriptlet
.wsdl => application/xml
.wsz => interface/x-winamp-skin
.wvx => video/x-ms-wvx
.xaml => application/xaml+xml
.xbap => application/x-ms-xbap
.xbm => image/x-xbitmap
.xdp => application/vnd.adobe.xdp+xml
.xfd => application/vnd.adobe.xfd+xml
.xfdf => application/vnd.adobe.xfdf
.xlb => application/vnd.ms-excel
.xls => application/x-msexcel
.xml => text/xml
.xpdl => application/xml
.xpm => image/x-xpixmap
.xps => application/vnd.ms-xpsdocument
.xsl => application/xml
.xwd => image/x-xwindowdump
.z => application/x-compress
.zip => application/x-zip-compressed
-------------- next part --------------
.jpg => image/jpg
.mid => audio/midi
.midi => audio/midi
.pct => image/pict
.pic => image/pict
.pict => image/pict
.rtf => application/rtf
.xul => text/xul
.3g2 => video/3gpp2
.3gp => video/3gpp
.3gp2 => video/3gpp2
.3gpp => video/3gpp
.AAC => audio/aac
.ADT => audio/vnd.dlna.adts
.ADTS => audio/aac
.AddIn => text/xml
.M2T => video/vnd.dlna.mpeg-tts
.M2TS => video/vnd.dlna.mpeg-tts
.M2V => video/mpeg
.MOD => video/mpeg
.MTS => video/vnd.dlna.mpeg-tts
.SDP => application/sdp
.SSISDeploymentManifest => text/xml
.TS => video/vnd.dlna.mpeg-tts
.TTS => video/vnd.dlna.mpeg-tts
.WMD => application/x-ms-wmd
.XOML => text/plain
.a => application/octet-stream
.ac3 => audio/ac3
.acrobatsecuritysettings => application/vnd.adobe.acrobat-security-settings
.ai => application/postscript
.aif => audio/aiff
.aifc => audio/aiff
.aiff => audio/aiff
.air => application/vnd.adobe.air-application-installer-package+zip
.amc => application/x-mpeg
.application => application/x-ms-application
.asa => application/xml
.asax => application/xml
.ascx => application/xml
.asf => video/x-ms-asf
.ashx => application/xml
.asm => text/plain
.asmx => application/xml
.asp => text/x-asp
.aspx => application/xml
.asx => video/x-ms-asf
.au => audio/basic
.avi => video/avi
.bat => text/plain
.bcpio => application/x-bcpio
.bflang2 => application/x-bluefish-language2
.bfproject => application/x-bluefish-project
.bin => application/octet-stream
.bmp => image/bmp
.c => text/plain
.caf => audio/x-caf
.cat => application/vnd.ms-pki.seccat
.cc => text/plain
.cd => text/plain
.cdda => audio/aiff
.cdf => application/x-netcdf
.cer => application/x-x509-ca-cert
.cod => text/plain
.config => application/xml
.contact => text/x-ms-contact
.coverage => application/xml
.cpio => application/x-cpio
.cpp => text/plain
.crl => application/pkix-crl
.crt => application/x-x509-ca-cert
.cs => text/plain
.csdproj => text/plain
.csh => application/x-csh
.csproj => text/plain
.css => text/css
.csv => application/vnd.ms-excel
.cur => text/plain
.cxx => text/plain
.d => text/x-dsrc
.datasource => application/xml
.dbproj => text/plain
.dcr => application/x-director
.def => text/plain
.der => application/x-x509-ca-cert
.dib => image/bmp
.dif => video/x-dv
.dir => application/x-director
.dll => application/x-msdownload
.doc => application/msword
.docm => application/vnd.ms-word.document.macroEnabled.12
.docx => application/vnd.openxmlformats-officedocument.wordprocessingml.document
.dot => application/msword
.dsp => text/plain
.dsw => text/plain
.dtd => application/xml-dtd
.dtsConfig => text/xml
.dv => video/x-dv
.dvi => application/x-dvi
.dwfx => model/vnd.dwfx+xps
.dxr => application/x-director
.easmx => model/vnd.easmx+xps
.edrwx => model/vnd.edrwx+xps
.eml => message/rfc822
.eprtx => model/vnd.eprtx+xps
.eps => application/postscript
.etl => application/etl
.etx => text/x-setext
.exe => application/x-msdownload
.fdf => application/vnd.fdf
.fif => application/fractals
.filters => Application/xml
.gif => image/gif
.gitattributes => text/plain
.gitignore => text/plain
.gitmodules => text/plain
.group => text/x-ms-group
.gsm => audio/x-gsm
.gtar => application/x-gtar
.gz => application/x-gzip
.h => text/plain
.hdf => application/x-hdf
.hpp => text/plain
.hqx => application/mac-binhex40
.hta => application/hta
.htc => text/x-component
.htm => text/html
.html => text/html
.hxa => application/xml
.hxc => application/xml
.hxd => application/octet-stream
.hxe => application/xml
.hxf => application/xml
.hxh => application/octet-stream
.hxi => application/octet-stream
.hxk => application/xml
.hxq => application/octet-stream
.hxr => application/octet-stream
.hxs => application/octet-stream
.hxt => application/xml
.hxv => application/xml
.hxw => application/octet-stream
.hxx => text/plain
.i => text/plain
.ico => image/x-icon
.ics => text/calendar
.idl => text/plain
.ief => image/ief
.inc => text/plain
.inl => text/plain
.ipproj => text/plain
.iqy => text/x-ms-iqy
.java => text/x-java
.jfif => image/jpeg
.jnlp => application/x-java-jnlp-file
.jpe => image/jpeg
.jpeg => image/jpeg
.jpg => image/jpeg
.js => application/javascript
.jsp => application/x-jsp
.jtx => application/x-jtx+xps
.ksh => text/plain
.latex => application/x-latex
.library-ms => application/windows-library+xml
.lst => text/plain
.m1v => video/mpeg
.m3u => audio/mpegurl
.m3u8 => application/vnd.apple.mpegurl
.m4a => audio/x-m4a
.m4b => audio/x-m4b
.m4p => audio/x-m4p
.m4v => video/x-m4v
.mac => image/x-macpaint
.mak => text/plain
.man => application/x-troff-man
.map => text/plain
.master => application/xml
.mdi => image/vnd.ms-modi
.mdp => text/plain
.me => application/x-troff-me
.mfp => application/x-shockwave-flash
.mht => message/rfc822
.mhtml => message/rfc822
.mid => audio/mid
.midi => audio/mid
.mif => application/x-mif
.mk => text/plain
.mov => video/quicktime
.movie => video/x-sgi-movie
.mp2 => audio/mpeg
.mp2v => video/mpeg
.mp3 => audio/mpeg
.mp4 => video/mp4
.mp4v => video/mp4
.mpa => video/mpeg
.mpe => video/mpeg
.mpeg => video/mpeg
.mpf => application/vnd.ms-mediapackage
.mpg => video/mpeg
.mpv2 => video/mpeg
.mqv => video/quicktime
.ms => application/x-troff-ms
.mw => text/x-mediawiki
.nc => application/x-netcdf
.nsh => text/x-nsh
.nsi => text/x-nsi
.nws => message/rfc822
.o => application/octet-stream
.obj => application/octet-stream
.oda => application/oda
.odc => text/x-ms-odc
.odh => text/plain
.odl => text/plain
.orderedtest => application/xml
.osdx => application/opensearchdescription+xml
.p10 => application/pkcs10
.p12 => application/x-pkcs12
.p7b => application/x-pkcs7-certificates
.p7c => application/pkcs7-mime
.p7m => application/pkcs7-mime
.p7r => application/x-pkcs7-certreqresp
.p7s => application/pkcs7-signature
.pbm => image/x-portable-bitmap
.pct => image/pict
.pdf => application/pdf
.pdfxml => application/vnd.adobe.pdfxml
.pdx => application/vnd.adobe.pdx
.pfx => application/x-pkcs12
.pgm => image/x-portable-graymap
.php => application/x-php
.php3 => application/x-php
.pic => image/pict
.pict => image/pict
.pkgdef => text/plain
.pkgundef => text/plain
.pko => application/vnd.ms-pki.pko
.pl => application/x-perl
.png => image/png
.pnm => image/x-portable-anymap
.pnt => image/x-macpaint
.pntg => image/x-macpaint
.po => text/x-gettext-translation
.pot => application/vnd.ms-powerpoint
.potm => application/vnd.ms-powerpoint.template.macroEnabled.12
.potx => application/vnd.openxmlformats-officedocument.presentationml.template
.ppa => application/vnd.ms-powerpoint
.ppm => image/x-portable-pixmap
.pps => application/vnd.ms-powerpoint
.ppsm => application/vnd.ms-powerpoint.slideshow.macroEnabled.12
.ppsx => application/vnd.openxmlformats-officedocument.presentationml.slideshow
.ppt => application/vnd.ms-powerpoint
.pptm => application/vnd.ms-powerpoint.presentation.macroEnabled.12
.pptx => application/vnd.openxmlformats-officedocument.presentationml.presentation
.ps => application/postscript
.psc1 => application/PowerShell
.pwz => application/vnd.ms-powerpoint
.py => text/x-python
.pyc => application/x-python-code
.pyo => application/x-python-code
.pyproj => text/plain
.pyw => text/plain
.qht => text/x-html-insertion
.qhtm => text/x-html-insertion
.qt => video/quicktime
.qti => image/x-quicktime
.qtif => image/x-quicktime
.qtl => application/x-quicktimeplayer
.ra => audio/x-pn-realaudio
.ram => application/x-pn-realaudio
.ras => image/x-cmu-raster
.rat => application/rat-file
.rb => text/x-ruby
.rc => text/plain
.rc2 => text/plain
.rct => text/plain
.rdf => application/xml
.rdlc => application/xml
.resx => application/xml
.rgb => image/x-rgb
.rgs => text/plain
.rmi => audio/mid
.roff => application/x-troff
.rpt => application/x-rpt
.rqy => text/x-ms-rqy
.rtf => application/msword
.rtx => text/richtext
.ruleset => application/xml
.s => text/plain
.scp => application/sqlcompare
.sct => text/scriptlet
.sd2 => audio/x-sd2
.searchConnector-ms => application/windows-search-connector+xml
.settings => application/xml
.sgm => text/x-sgml
.sgml => text/x-sgml
.sh => text/x-shellscript
.shar => application/x-shar
.shtml => text/html
.sit => application/x-stuffit
.sitemap => application/xml
.skin => application/xml
.slk => application/vnd.ms-excel
.sln => text/plain
.slupkg-ms => application/x-ms-license
.snd => audio/basic
.snippet => application/xml
.so => application/octet-stream
.sol => text/plain
.sor => text/plain
.spc => application/x-pkcs7-certificates
.spl => application/futuresplash
.src => application/x-wais-source
.srf => text/plain
.sst => application/vnd.ms-pki.certstore
.stl => application/vnd.ms-pki.stl
.sv4cpio => application/x-sv4cpio
.sv4crc => application/x-sv4crc
.svc => application/xml
.svg => image/svg+xml
.swf => application/x-shockwave-flash
.t => application/x-troff
.tar => application/x-tar
.tcl => application/x-tcl
.testrunconfig => application/xml
.testsettings => application/xml
.tex => application/x-tex
.texi => application/x-texinfo
.texinfo => application/x-texinfo
.tgz => application/x-compressed
.tif => image/tiff
.tiff => image/tiff
.tlh => text/plain
.tli => text/plain
.tpl => application/x-smarty
.tr => application/x-troff
.trx => application/xml
.tsv => text/tab-separated-values
.txt => text/plain
.user => text/plain
.ustar => application/x-ustar
.vb => text/plain
.vbdproj => text/plain
.vbproj => text/plain
.vbs => application/x-vbscript
.vcf => text/x-vcard
.vcproj => Application/xml
.vcxproj => Application/xml
.vddproj => text/plain
.vdp => text/plain
.vdproj => text/plain
.vdx => application/vnd.visio
.vscontent => application/xml
.vsct => text/xml
.vsd => application/vnd.visio
.vsi => application/ms-vsi
.vsix => application/vsix
.vsixlangpack => text/xml
.vsixmanifest => text/xml
.vsl => application/vnd.visio
.vsmdi => application/xml
.vspscc => text/plain
.vss => application/vnd.visio
.vsscc => text/plain
.vssettings => text/xml
.vssscc => text/plain
.vst => application/vnd.visio
.vstemplate => text/xml
.vsto => application/x-ms-vsto
.vsu => application/vnd.visio
.vsw => application/vnd.visio
.vsx => application/vnd.visio
.vtx => application/vnd.visio
.wal => interface/x-winamp3-skin
.wav => audio/wav
.wax => audio/x-ms-wax
.wbk => application/msword
.wdp => image/vnd.ms-photo
.wdseml => message/rfc822
.wiq => application/xml
.wiz => application/msword
.wlz => interface/x-winamp-lang
.wm => video/x-ms-wm
.wma => audio/x-ms-wma
.wmv => video/x-ms-wmv
.wmx => video/x-ms-wmx
.wmz => application/x-ms-wmz
.wpl => application/vnd.ms-wpl
.wsc => text/scriptlet
.wsdl => application/xml
.wsz => interface/x-winamp-skin
.wvx => video/x-ms-wvx
.xaml => application/xaml+xml
.xbap => application/x-ms-xbap
.xbm => image/x-xbitmap
.xdp => application/vnd.adobe.xdp+xml
.xdr => application/xml
.xfdf => application/vnd.adobe.xfdf
.xht => application/xhtml+xml
.xhtml => application/xhtml+xml
.xla => application/vnd.ms-excel
.xlam => application/vnd.ms-excel.addin.macroEnabled.12
.xlb => application/vnd.ms-excel
.xlc => application/vnd.ms-excel
.xld => application/vnd.ms-excel
.xlk => application/vnd.ms-excel
.xll => application/vnd.ms-excel
.xlm => application/vnd.ms-excel
.xls => application/vnd.ms-excel
.xlsb => application/vnd.ms-excel.sheet.binary.macroEnabled.12
.xlsm => application/vnd.ms-excel.sheet.macroEnabled.12
.xlsx => application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xlt => application/vnd.ms-excel
.xltm => application/vnd.ms-excel.template.macroEnabled.12
.xltx => application/vnd.openxmlformats-officedocument.spreadsheetml.template
.xlv => application/vnd.ms-excel
.xlw => application/vnd.ms-excel
.xml => text/xml
.xmta => application/xml
.xpdl => application/xml
.xpm => image/x-xpixmap
.xps => application/vnd.ms-xpsdocument
.xrm-ms => text/xml
.xsc => application/xml
.xsd => application/xml
.xsl => application/xslt+xml
.xslt => application/xml
.xss => application/xml
.xwd => image/x-xwindowdump
.z => application/x-compress
.zip => application/x-zip-compressed
-------------- next part --------------
.jpg => image/jpg
.mid => audio/midi
.midi => audio/midi
.pct => image/pict
.pic => image/pict
.pict => image/pict
.rtf => application/rtf
.xul => text/xul
.a => application/octet-stream
.ai => application/postscript
.aif => audio/x-aiff
.aifc => audio/x-aiff
.aiff => audio/x-aiff
.au => audio/basic
.avi => video/x-msvideo
.bat => text/plain
.bcpio => application/x-bcpio
.bin => application/octet-stream
.bmp => image/x-ms-bmp
.c => text/plain
.cdf => application/x-netcdf
.cpio => application/x-cpio
.csh => application/x-csh
.css => text/css
.dll => application/octet-stream
.doc => application/msword
.dot => application/msword
.dvi => application/x-dvi
.eml => message/rfc822
.eps => application/postscript
.etx => text/x-setext
.exe => application/octet-stream
.gif => image/gif
.gtar => application/x-gtar
.h => text/plain
.hdf => application/x-hdf
.htm => text/html
.html => text/html
.ico => image/vnd.microsoft.icon
.ief => image/ief
.jpe => image/jpeg
.jpeg => image/jpeg
.jpg => image/jpeg
.js => application/javascript
.ksh => text/plain
.latex => application/x-latex
.m1v => video/mpeg
.m3u => application/vnd.apple.mpegurl
.m3u8 => application/vnd.apple.mpegurl
.man => application/x-troff-man
.me => application/x-troff-me
.mht => message/rfc822
.mhtml => message/rfc822
.mif => application/x-mif
.mov => video/quicktime
.movie => video/x-sgi-movie
.mp2 => audio/mpeg
.mp3 => audio/mpeg
.mp4 => video/mp4
.mpa => video/mpeg
.mpe => video/mpeg
.mpeg => video/mpeg
.mpg => video/mpeg
.ms => application/x-troff-ms
.nc => application/x-netcdf
.nws => message/rfc822
.o => application/octet-stream
.obj => application/octet-stream
.oda => application/oda
.p12 => application/x-pkcs12
.p7c => application/pkcs7-mime
.pbm => image/x-portable-bitmap
.pdf => application/pdf
.pfx => application/x-pkcs12
.pgm => image/x-portable-graymap
.pl => text/plain
.png => image/png
.pnm => image/x-portable-anymap
.pot => application/vnd.ms-powerpoint
.ppa => application/vnd.ms-powerpoint
.ppm => image/x-portable-pixmap
.pps => application/vnd.ms-powerpoint
.ppt => application/vnd.ms-powerpoint
.ps => application/postscript
.pwz => application/vnd.ms-powerpoint
.py => text/x-python
.pyc => application/x-python-code
.pyo => application/x-python-code
.qt => video/quicktime
.ra => audio/x-pn-realaudio
.ram => application/x-pn-realaudio
.ras => image/x-cmu-raster
.rdf => application/xml
.rgb => image/x-rgb
.roff => application/x-troff
.rtx => text/richtext
.sgm => text/x-sgml
.sgml => text/x-sgml
.sh => application/x-sh
.shar => application/x-shar
.snd => audio/basic
.so => application/octet-stream
.src => application/x-wais-source
.sv4cpio => application/x-sv4cpio
.sv4crc => application/x-sv4crc
.svg => image/svg+xml
.swf => application/x-shockwave-flash
.t => application/x-troff
.tar => application/x-tar
.tcl => application/x-tcl
.tex => application/x-tex
.texi => application/x-texinfo
.texinfo => application/x-texinfo
.tif => image/tiff
.tiff => image/tiff
.tr => application/x-troff
.tsv => text/tab-separated-values
.txt => text/plain
.ustar => application/x-ustar
.vcf => text/x-vcard
.wav => audio/x-wav
.wiz => application/msword
.wsdl => application/xml
.xbm => image/x-xbitmap
.xlb => application/vnd.ms-excel
.xls => application/vnd.ms-excel
.xml => text/xml
.xpdl => application/xml
.xpm => image/x-xpixmap
.xsl => application/xml
.xwd => image/x-xwindowdump
.zip => application/zip
Tim Golden
2013-04-17 12:23:36 UTC
Tim Golden added the comment:

There seems to be a consensus that the current behaviour is undesirable,
indeed "broken" for any meaningful use.

The critical argument against the current Registry approach is that it
returns unexpected (or outright incorrect) mimetypes for very standard
extensions.

The arguments against reading the Registry at all are:

* That it requires some extra level of privilege to read the appropriate
keys.

* That there is a startup cost to reading the Registry

* That it can be and is updated by arbitrary programs (typically during
installation) and therefore its values cannot be relied upon.


We have 3.5 proposals on the table:

1) Don't read the registry at all, ie revert issue4969 (this is what Ben
Hoyt is advocating) [noregistry]

2) Read the registry *before* reading the standard types (this is not
strongly advocated by anyone).

3) Read the registry but in a different way, mapping from extension to
mimetype rather than vice versa. (This is Dave Chambers' patch from
issue15207). [newregistry]

3a) Lookup as per (3) but only on demand. This eliminates any startup cost.
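
One way to read (3a), sketched in Python: ask for a single extension only when it is first needed and remember the answer. The `query` callable is a hypothetical stand-in for a per-extension registry lookup, and letting its answer win over the built-in table is an assumption, not something settled in this discussion.

```python
class OnDemandTypes:
    """Consult a slow source per extension, and only on first use."""

    def __init__(self, builtin, query):
        self._builtin = dict(builtin)
        self._query = query   # hypothetical: ext -> mimetype, or None if unknown
        self._cache = {}

    def guess(self, ext):
        ext = ext.lower()
        if ext not in self._cache:
            found = self._query(ext)
            if found is None:
                found = self._builtin.get(ext)   # fall back to the built-in table
            self._cache[ext] = found
        return self._cache[ext]


queried = []

def fake_query(ext):
    # Stand-in for reading one extension key; records each call.
    queried.append(ext)
    return {".png": "image/png"}.get(ext)

db = OnDemandTypes({".txt": "text/plain", ".png": "image/x-png"}, fake_query)
```

Nothing is queried when `db` is built, so the startup cost moves to the first lookup of each extension.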

I've produced three output files from a simple dump of the mimetypes database. For the purposes of taking this forward, we're really comparing the noregistry and the newregistry variants.

One key issue is what to do when the same key occurs in both sets but with a different value. (Examples include .avi -> video/x-msvideo vs video/avi; and .zip -> application/zip vs application/x-zip-compressed).
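
A rough sketch of how such conflicts can be listed from two dumps in the ".ext => type" format used by the attached files. The helper names `parse_dump` and `conflicts` are made up for illustration; the sample values are the ones quoted in this thread.

```python
def parse_dump(text):
    """Parse lines of the form '.ext => type/subtype' into a dict."""
    mapping = {}
    for line in text.splitlines():
        ext, sep, mimetype = line.partition("=>")
        if not sep:
            continue  # skip blank lines and separators
        mapping[ext.strip()] = mimetype.strip()
    return mapping


def conflicts(a, b):
    """Extensions present in both mappings but with different values."""
    return {ext: (a[ext], b[ext])
            for ext in sorted(a.keys() & b.keys())
            if a[ext] != b[ext]}


noregistry = parse_dump("""\
.avi => video/x-msvideo
.zip => application/zip
.png => image/png
""")
newregistry = parse_dump("""\
.avi => video/avi
.zip => application/x-zip-compressed
.png => image/png
.vsix => application/vsix
""")

for ext, (old, new) in conflicts(noregistry, newregistry).items():
    print(ext, old, new)
```

Extensions that appear in only one dump, such as .vsix here, are additions rather than conflicts and are left out.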

And the other key issue is whether the overheads (security, speed) of using the registry outweigh its usefulness in any case.

Could I ask those who would remove the registry use altogether to comment on the newregistry output (generating your own if it helps) to see whether it changes your views at all.

Either approach -- no-registry or new-registry -- is feasible and the code churn is about equal. I'm unsure about compatibility issues: it's hard to see anyone relying on the incorrect mimetypes; but it's certainly possible to see someone relying on the additional (correct) mimetypes.

----------

Marc-Andre Lemburg
2013-04-17 12:45:30 UTC
Marc-Andre Lemburg added the comment:

I think it's important to stick to established standards for
MIME types and to make sure that Python returns the same values
on all platforms using the default settings.

Apache comes with a mime.types file which includes both the
official IANA types and many common, but unregistered types:

http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/conf/mime.types?view=markup

This can be used as reference (much like we use the X.org locale
database as reference for the locale module).

If an application wants to use the Windows registry settings instead,
that's fine, but it should not be the default if there's a difference
in output compared to the hard-coded list in mimetypes.

Note that this would probably require a redesign of the
internals of the mimetypes module. It currently provides only a
small subset as default mapping and then reads the full set from
one of the mime.types files it can find on the system.
Such a redesign would only be possible for Python 3.4, not
Python 2.7.
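
For what it's worth, an application can already load entries in the Apache mime.types format into a private `MimeTypes` instance with `readfp()`. The sketch below assumes a recent Python 3; the two entries are illustrative, not copied from the Apache file.

```python
import io
import mimetypes

# Apache mime.types format: "type ext [ext ...]", with '#' starting a comment.
REFERENCE = """\
# illustrative entries only
image/bmp bmp
audio/x-mpegurl m3u
"""

db = mimetypes.MimeTypes()          # private instance, module-level tables untouched
db.readfp(io.StringIO(REFERENCE))   # entries read here replace earlier ones

print(db.guess_type("cover.bmp"))
print(db.guess_type("playlist.m3u"))
```

Because the entries go into `db` only, `mimetypes.guess_type()` at module level keeps its platform defaults.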

----------
nosy: +lemburg

Dave Chambers
2013-04-17 14:29:28 UTC
Dave Chambers added the comment:

Enough with the bikeshedding... it's been 10 months... fix the bug.

----------

Brian Curtin
2013-04-17 14:43:27 UTC
Brian Curtin added the comment:

Just an FYI, but if it takes 10 more months to get it right, we'll do that.

----------

Ben Hoyt
2013-04-18 00:27:55 UTC
Ben Hoyt added the comment:

Okay, I'm looking at the diff between mt-tip-noregistry.txt and mt-tip-newregistry.txt, and I've attached a file showing the lines that are *different* between the two, as well as the Apache mime.types value for that file extension.

In most cases, noregistry gives the right mime type, and newregistry is wrong. In a few cases, though, the registry value is right (i.e., it matches Apache's mime.types). I think that's a totally separate issue, and we should probably open a bug to update a few of the hard-coded mappings in mimetypes.py.

The cases where noregistry is right (according to Apache):

* .aif
* .aifc
* .aiff
* .avi
* .sh
* .wav
* .xsl
* .zip

The cases where noregistry is wrong (according to Apache):

* .bmp is hard-coded as "image/x-ms-bmp", but it should be "image/bmp"
* .dll and .exe are hard-coded as "application/octet-stream", but should be "application/x-msdownload"
* .ico is hard-coded as "image/vnd.microsoft.icon" but should be "image/x-icon"
* .m3u is hard-coded as "application/vnd.apple.mpegurl" but should be "audio/x-mpegurl"

None of these are standardized IANA mime types, and they're not particularly common for web servers to be serving, so it probably doesn't matter too much that the current hard-coded values are wrong. Also, I'm guessing web browsers will interpret the older type image/x-ms-bmp as image/bmp anyway. So maybe we should open another issue to fix the hard-coded types in mimetypes.py shown above, but again, that's another issue.

The other thing here is all the *new types* that the registry adds, such as ".acrobatsecuritysettings". I don't see that these add much value -- just a bunch of types that depend on the programs you have installed. And in my mind at least, the behaviour of mimetypes.guess_type() should not change based on what programs you have installed.

In short, "noregistry" gives more accurate results in most cases that matter, and I still strongly feel we should revert to that. (The only alternative, in my opinion, is to switch to Dave Chambers' version of read_windows_registry(), but not call it by default.)
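
As a sketch of what "not call it by default" could look like from application code: build a private `MimeTypes` instance and call `read_windows_registry()` only when asked. The `build_db` helper and its `use_registry` flag are made up for illustration. Whether a fresh instance starts out free of registry values depends on the Python version, so treat that part as an assumption.

```python
import mimetypes

def build_db(use_registry=False):
    """Return a private MimeTypes instance, reading the registry only on request."""
    db = mimetypes.MimeTypes()
    if use_registry:
        # Expected to return without doing anything where winreg is unavailable.
        db.read_windows_registry()
    return db

default_db = build_db()
print(default_db.guess_type("foo.txt"))
```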

----------
Added file: http://bugs.python.org/file29913/different.txt

Tim Golden
2013-08-10 16:13:12 UTC
Tim Golden added the comment:

I attach a patch against 3.3; this is substantially Dave Chambers' original patch with a trivial test added and a doc change. This means that HKCR is scanned to determine extensions and these will override anything in the mimetypes db. The doc change highlights the possibility of overriding this by passing files=[].
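
The scan described above, sketched against a plain dict so it runs anywhere. `fake_hkcr` is a made-up stand-in for the registry root (subkey name mapped to its values) and `types_from_extension_keys` is a hypothetical helper; this is not the patch itself.

```python
def types_from_extension_keys(hkcr):
    """Collect {extension: mimetype} from keys that look like file extensions."""
    found = {}
    for subkeyname, values in hkcr.items():
        if not subkeyname.startswith("."):
            continue                      # only file-extension keys are of interest
        mimetype = values.get("Content Type")
        if not isinstance(mimetype, str):
            continue                      # no 'Content Type' value, or not a string
        found[subkeyname] = mimetype
    return found


fake_hkcr = {
    ".png": {"Content Type": "image/png", "PerceivedType": "image"},
    ".jpg": {"Content Type": "image/jpeg"},
    ".xyz": {},                           # extension key without a Content Type
    "MIME": {"Content Type": "n/a"},      # not an extension key
}

builtin = {".txt": "text/plain", ".jpg": "image/jpg"}
merged = {**builtin, **types_from_extension_keys(fake_hkcr)}   # scanned values win
```

Here `merged` keeps `.txt` from the built-in table, while `.jpg` takes the scanned value, which mirrors the override order described above.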

I can't see an easy solution for this which will suit everyone but I've sat on it waay too long already. The module startup time is increased but, for bugfix releases, I can't see any other solution which won't break compatibility somewhere.

I'm taking the simplest view which says that: .jpg => image/pjpeg is broken but that the winreg code has been in place for too long to simply back it out altogether.
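
To illustrate how a mapping like .jpg => image/pjpeg arises: inverting a content-type table into an extension table lets the last entry for an extension win. The list below is a hypothetical excerpt and the enumeration order is assumed.

```python
# Each content type names one extension, as in a type-to-extension database.
content_type_db = [
    ("image/jpeg", ".jpg"),
    ("image/pjpeg", ".jpg"),
    ("image/png", ".png"),
    ("image/x-png", ".png"),
]

def invert_last_wins(pairs):
    ext_to_type = {}
    for mimetype, ext in pairs:
        ext_to_type[ext] = mimetype   # a later entry silently replaces an earlier one
    return ext_to_type

def collisions(pairs):
    """Extensions claimed by more than one content type."""
    seen = {}
    for mimetype, ext in pairs:
        seen.setdefault(ext, []).append(mimetype)
    return {ext: types for ext, types in seen.items() if len(types) > 1}

print(invert_last_wins(content_type_db))
print(collisions(content_type_db))
```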

I'll commit appropriate versions of this within the next day to 2.7, 3.3 and 3.4 unless anyone objects. Please understand: this *is* a compromise; but I don't think there's a perfect solution for this, short of the rethink which mimetypes needs per MAL's suggestion or otherwise.

----------
Added file: http://bugs.python.org/file31217/issue15207.33.patch

Ben Hoyt
2013-08-11 19:43:03 UTC
Ben Hoyt added the comment:

Thanks, Tim! Works for me! A couple of code review comments:

1) On 2.7, guess_type(s)[0] is a byte string as usual if the type doesn't exist in the registry, but it's a unicode string if it came from the registry. Seems like it should be a byte string in all cases (a mime type can only contain ASCII characters). I would say .encode('ascii') and if it raises UnicodeError, ignore that key.

2) Would 'subkeyname[0] == "."' be better as subkeyname.startswith(".")? More idiomatic, and won't bomb out if subkeyname is zero length (though that probably can't happen). Relatedly, "not subkeyname.startswith()" with an early-continue would avoid an indent and is what the rest of the code does.

3) I believe the default_encoding variable is not needed anymore. That was used in the old registry code.

4) Super-minor: "raises EnvironmentError" would be the Pythonic way to say "throws EnvironmentError".

5) Would it be worth adding a test for 'foo.png' as well, as that was another super-common type that was wrong?
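
A small illustration of point 2, using made-up key names: `startswith` copes with an empty name, whereas indexing does not, and the early `continue` keeps the loop body flat.

```python
def extension_keys(names):
    for name in names:
        if not name.startswith("."):
            continue          # early-continue: skip anything that is not an extension
        yield name

names = ["", ".png", "MIME", ".jpg"]
result = list(extension_keys(names))

try:
    ""[0] == "."              # the indexing form fails on a zero-length name
except IndexError:
    indexing_failed = True
else:
    indexing_failed = False
```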

----------

Tim Golden
2013-08-12 21:05:35 UTC
Tim Golden added the comment:

Thanks for the review, Ben. Updated patches attached.

1 & 3) default_encoding -- Your two points appear to contradict each
other slightly. What's in the updated patches is: 3.x has no encoding
(because everything's unicode end-to-end); 2.7 attempts to apply the
default encoding -- which is probably ascii -- to the extension and the
mimetype and continues on error. I'm not 100% sure about this because it
seems possible if unlikely to have a non-ascii extension / mimetype, but
this seems like the best compromise (and is no worse than what was there
before). Does that seem to fit the bill?

2) subkeyname[0] -- done

4) "throws EnvironmentError" -- done

5) test for .png -- done
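
The encode-and-skip behaviour described under 1 & 3 can be sketched as follows. This runs on Python 3, where `str.encode` plays the role that `unicode.encode` plays on 2.7; `ascii_pairs` is a hypothetical helper and the sample data is made up.

```python
def ascii_pairs(pairs, encoding="ascii"):
    """Yield (extension, mimetype) as byte strings, skipping entries that cannot be encoded."""
    for ext, mimetype in pairs:
        try:
            encoded = (ext.encode(encoding), mimetype.encode(encoding))
        except UnicodeEncodeError:
            continue          # ignore the entry rather than fail the whole scan
        yield encoded

raw = [(".png", "image/png"), (".caf\u00e9", "text/plain"), (".jpg", "image/jpeg")]
kept = list(ascii_pairs(raw))
```

The entry with the non-ASCII extension is dropped and the other two come back as byte strings.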

----------
Added file: http://bugs.python.org/file31259/issue15207.27.2.patch
Added file: http://bugs.python.org/file31260/issue15207.33.2.patch

-------------- next part --------------
diff --git a/Doc/library/mimetypes.rst b/Doc/library/mimetypes.rst
--- a/Doc/library/mimetypes.rst
+++ b/Doc/library/mimetypes.rst
@@ -85,6 +85,9 @@
    :const:`knownfiles` takes precedence over those named before it. Calling
    :func:`init` repeatedly is allowed.

+   Specifying an empty list for *files* will prevent the system defaults from
+   being applied: only the well-known values will be present from a built-in list.
+
    .. versionchanged:: 2.7
       Previously, Windows registry settings were ignored.

diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -254,23 +254,26 @@
                 i += 1

         default_encoding = sys.getdefaultencoding()
-        with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT,
-                             r'MIME\Database\Content Type') as mimedb:
-            for ctype in enum_types(mimedb):
+        with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT, '') as hkcr:
+            for subkeyname in enum_types(hkcr):
                 try:
-                    with _winreg.OpenKey(mimedb, ctype) as key:
-                        suffix, datatype = _winreg.QueryValueEx(key,
-                                                                'Extension')
+                    with _winreg.OpenKey(hkcr, subkeyname) as subkey:
+                        # Only check file extensions
+                        if not subkeyname.startswith("."):
+                            continue
+                        # raises EnvironmentError if no 'Content Type' value
+                        mimetype, datatype = _winreg.QueryValueEx(
+                            subkey, 'Content Type')
+                        if datatype != _winreg.REG_SZ:
+                            continue
+                        try:
+                            mimetype = mimetype.encode(default_encoding)
+                            subkeyname = subkeyname.encode(default_encoding)
+                        except UnicodeEncodeError:
+                            continue
+                        self.add_type(mimetype, subkeyname, strict)
                 except EnvironmentError:
                     continue
-                if datatype != _winreg.REG_SZ:
-                    continue
-                try:
-                    suffix = suffix.encode(default_encoding) # omit in 3.x!
-                except UnicodeEncodeError:
-                    continue
-                self.add_type(ctype, suffix, strict)
-

 def guess_type(url, strict=True):
     """Guess the type of a file based on its URL.
diff --git a/Lib/test/test_mimetypes.py b/Lib/test/test_mimetypes.py
--- a/Lib/test/test_mimetypes.py
+++ b/Lib/test/test_mimetypes.py
@@ -85,6 +85,8 @@
         # Use file types that should *always* exist:
         eq = self.assertEqual
         eq(self.db.guess_type("foo.txt"), ("text/plain", None))
+        eq(self.db.guess_type("image.jpg"), ("image/jpeg", None))
+        eq(self.db.guess_type("image.png"), ("image/png", None))

 def test_main():
     test_support.run_unittest(MimeTypesTestCase,
-------------- next part --------------
diff --git a/Doc/library/mimetypes.rst b/Doc/library/mimetypes.rst
--- a/Doc/library/mimetypes.rst
+++ b/Doc/library/mimetypes.rst
@@ -85,6 +85,9 @@
    :const:`knownfiles` takes precedence over those named before it. Calling
    :func:`init` repeatedly is allowed.

+   Specifying an empty list for *files* will prevent the system defaults from
+   being applied: only the well-known values will be present from a built-in list.
+
    .. versionchanged:: 3.2
       Previously, Windows registry settings were ignored.

diff --git a/Lib/mimetypes.py b/Lib/mimetypes.py
--- a/Lib/mimetypes.py
+++ b/Lib/mimetypes.py
@@ -249,19 +249,21 @@
                     yield ctype
                 i += 1

-        with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT,
-                             r'MIME\Database\Content Type') as mimedb:
-            for ctype in enum_types(mimedb):
+        with _winreg.OpenKey(_winreg.HKEY_CLASSES_ROOT, '') as hkcr:
+            for subkeyname in enum_types(hkcr):
                 try:
-                    with _winreg.OpenKey(mimedb, ctype) as key:
-                        suffix, datatype = _winreg.QueryValueEx(key,
-                                                                'Extension')
+                    with _winreg.OpenKey(hkcr, subkeyname) as subkey:
+                        # Only check file extensions
+                        if not subkeyname.startswith("."):
+                            continue
+                        # raises EnvironmentError if no 'Content Type' value
+                        mimetype, datatype = _winreg.QueryValueEx(
+                            subkey, 'Content Type')
+                        if datatype != _winreg.REG_SZ:
+                            continue
+                        self.add_type(mimetype, subkeyname, strict)
                 except EnvironmentError:
                     continue
-                if datatype != _winreg.REG_SZ:
-                    continue
-                self.add_type(ctype, suffix, strict)
-

 def guess_type(url, strict=True):
     """Guess the type of a file based on its URL.
diff --git a/Lib/test/test_mimetypes.py b/Lib/test/test_mimetypes.py
--- a/Lib/test/test_mimetypes.py
+++ b/Lib/test/test_mimetypes.py
@@ -98,7 +98,8 @@
         # Use file types that should *always* exist:
         eq = self.assertEqual
         eq(self.db.guess_type("foo.txt"), ("text/plain", None))
-
+        eq(self.db.guess_type("image.jpg"), ("image/jpeg", None))
+        eq(self.db.guess_type("image.png"), ("image/png", None))

 def test_main():
     support.run_unittest(MimeTypesTestCase,
Ben Hoyt
2013-08-12 22:23:39 UTC
Ben Hoyt added the comment:

All looks great. I like what you've done with default_encoding now. Thanks, Tim (and Dave for the original report).

----------

Roundup Robot
2013-10-22 19:05:10 UTC
Roundup Robot added the comment:

New changeset 95b88273683c by Tim Golden in branch '3.3':
Issue #15207: Fix mimetypes to read from correct area in Windows registry (Original patch by Dave Chambers)
http://hg.python.org/cpython/rev/95b88273683c

New changeset 12bf7fc1ba76 by Tim Golden in branch 'default':
Issue #15207: Fix mimetypes to read from correct area in Windows registry (Original patch by Dave Chambers)
http://hg.python.org/cpython/rev/12bf7fc1ba76

----------
nosy: +python-dev

Roundup Robot
2013-10-22 19:45:54 UTC
Roundup Robot added the comment:

New changeset e8cead08c556 by Tim Golden in branch '2.7':
Issue #15207: Fix mimetypes to read from correct area in Windows registry (Original patch by Dave Chambers)
http://hg.python.org/cpython/rev/e8cead08c556

----------

Tim Golden
2013-10-22 19:53:54 UTC
Tim Golden added the comment:

*cough* Somehow that didn't actually get pushed. Rebased against 2.7, 3.3 & 3.4 and pushed.

----------
assignee: -> tim.golden
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed
versions: -Python 3.2

Christoph Zwerschke
2014-06-05 09:59:12 UTC
Christoph Zwerschke added the comment:

After this patch, some of the values in mimetypes.types_map now appear as unicode instead of str in Python 2.7.7 under Windows. For compatibility and consistency reasons, I think this should be fixed so that all values are returned as str again under Python 2.7.

See https://groups.google.com/forum/#!topic/pylons-devel/bq8XiKlGgv0 for a real world issue which I think is caused by this bugfix.

----------
nosy: +cito

Mark Lawrence
2014-07-30 15:33:15 UTC
Mark Lawrence added the comment:

@Christoph please raise a new issue regarding the problem you describe in msg219788.

----------
nosy: +BreamoreBoy

Mark Lawrence
2014-07-30 15:41:11 UTC
Mark Lawrence added the comment:

@Christoph sorry, #21652 has already been raised to address the problem of mixed str and unicode objects.

----------

Brian Curtin
2014-07-30 15:41:54 UTC
Changes by Brian Curtin <brian at python.org>:


----------
nosy: -brian.curtin
