[issue25584] a recursive glob pattern fails to list files in the current directory
Xavier de Gaye
2015-11-08 15:54:35 UTC
New submission from Xavier de Gaye:

On Arch Linux, the package manager backs up some files in /etc with a .pacnew extension during an upgrade. On my system there are 20 such files: 9 .pacnew files located directly in /etc and 11 .pacnew files in subdirectories of /etc. The following commands are run from /etc:

$ shopt -s globstar
$ ls **/*.pacnew | wc -w
20
$ ls *.pacnew | wc -w
9

With python:

$ python
Python 3.6.0a0 (default:72cca30f4707, Nov 2 2015, 14:17:31)
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import glob
>>> len(glob.glob('./**/*.pacnew', recursive=True))
20
>>> len(glob.glob('*.pacnew'))
9
>>> len(glob.glob('**/*.pacnew', recursive=True))
11

The '**/*.pacnew' pattern does not list the files in /etc, only those located in the subdirectories of /etc.
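
The same check can be run without touching /etc. The following is a minimal sketch, assuming a throwaway directory tree with one .pacnew file at the top level and one in a subdirectory; the helper name list_pacnew and the file names are made up for illustration. On current Python releases the recursive pattern returns both files, whereas the session above shows the interpreter in this report returning only the files in subdirectories.

```python
import glob
import os
import tempfile


def list_pacnew(root):
    """Return (top-level, recursive) *.pacnew matches relative to root.

    Hypothetical helper: it changes the working directory so that the
    patterns are relative, as in the report, and restores it afterwards.
    """
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        top = sorted(glob.glob('*.pacnew'))
        rec = sorted(glob.glob('**/*.pacnew', recursive=True))
    finally:
        os.chdir(old_cwd)
    return top, rec


with tempfile.TemporaryDirectory() as root:
    # Illustrative layout: one file at the top level, one in a subdirectory.
    os.mkdir(os.path.join(root, 'sub'))
    for name in ('a.pacnew', os.path.join('sub', 'b.pacnew')):
        open(os.path.join(root, name), 'w').close()
    top, rec = list_pacnew(root)

print(top)
print(rec)
```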

----------
components: Library (Lib)
messages: 254344
nosy: serhiy.storchaka, xdegaye
priority: normal
severity: normal
status: open
title: a recursive glob pattern fails to list files in the current directory
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <***@bugs.python.org>
<http://bugs.python.org/issue25584>
_______________________________________
R. David Murray
2015-11-08 17:46:12 UTC
R. David Murray added the comment:

I believe this behavior matches the documentation:

"If the pattern is followed by an os.sep, only directories and subdirectories match."

('the pattern' being '**')

I wonder if '***.pacnew' would work.

----------
nosy: +pitrou, r.david.murray

Serhiy Storchaka
2015-11-08 18:30:59 UTC
Serhiy Storchaka added the comment:

I no longer remember whether this was a deliberate design decision or just an implementation detail. In any case, it is not documented.
No, the quoted documentation sentence is not related. It means that './**/' will list only directories, not regular files.
R. David Murray wrote:
> I wonder if '***.pacnew' would work.
No, '**' works only as a whole path component.
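
To illustrate the last point, here is a small sketch, assuming a temporary tree with made-up file names (the helper relative() is also hypothetical). '***.pacnew' is treated as an ordinary wildcard pattern inside a single directory, while '**' as a separate component descends into subdirectories.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Illustrative layout: one file at the top level, one in a subdirectory.
    os.mkdir(os.path.join(root, 'sub'))
    for name in ('a.pacnew', os.path.join('sub', 'b.pacnew')):
        open(os.path.join(root, name), 'w').close()

    def relative(matches):
        # Strip the temporary prefix so the results are easy to compare.
        return sorted(os.path.relpath(m, root) for m in matches)

    # '***.pacnew' is not a recursive pattern: it stays in one directory.
    fused = relative(
        glob.glob(os.path.join(root, '***.pacnew'), recursive=True))
    # '**' as its own path component also searches subdirectories.
    split = relative(
        glob.glob(os.path.join(root, '**', '*.pacnew'), recursive=True))

print(fused)
print(split)
```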

----------
assignee: -> serhiy.storchaka

R. David Murray
2015-11-09 04:39:57 UTC
R. David Murray added the comment:

Ah, I see, 'pattern' there means the whole pattern. That certainly isn't clear.

----------

Serhiy Storchaka
2015-11-09 07:14:45 UTC
Serhiy Storchaka added the comment:

It was likely an implementation artifact. The current implementation is simpler and fits the existing glob design better. The problem was that '**/a' should list 'a' and 'd/a', but '**/' should list only 'd/', and not ''.

Here is a patch that makes '**' also match zero directories. The old tests pass, and new tests are added to cover this case.
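
The intended results can be sketched as follows, assuming a temporary tree that contains a file 'a', a directory 'd' and a file 'd/a'; the helper name glob_in is made up for illustration. This shows the behaviour on a current Python release, not the patch itself.

```python
import glob
import os
import tempfile


def glob_in(root, pattern):
    """Run a recursive glob with root as the current directory.

    Hypothetical helper used only for this illustration.
    """
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        return sorted(glob.glob(pattern, recursive=True))
    finally:
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'd'))
    for name in ('a', os.path.join('d', 'a')):
        open(os.path.join(root, name), 'w').close()
    # '**' matches zero directories here, so the top-level 'a' is included.
    files = glob_in(root, '**/a')
    # A trailing separator restricts matches to directories; '' is not listed.
    dirs = glob_in(root, '**/')

print(files)
print(dirs)
```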

----------
keywords: +patch
stage: -> patch review
versions: +Python 3.5
Added file: http://bugs.python.org/file40986/rglob_zero_dirs.patch

Serhiy Storchaka
2015-11-09 08:45:36 UTC
Changes by Serhiy Storchaka <***@gmail.com>:


Added file: http://bugs.python.org/file40987/rglob_zero_dirs_2.patch

Xavier de Gaye
2015-11-09 13:03:51 UTC
Xavier de Gaye added the comment:

FWIW the patch looks good to me.

I find the code in glob.py difficult to read, as it happily joins regular filenames together with os.path.join() or attempts to list the files contained in a regular file (sic). The attached diff makes the code more correct and easier to understand. It is meant to be applied on top of Serhiy's patch.

----------
Added file: http://bugs.python.org/file40988/rglob_isdir.diff

Xavier de Gaye
2015-11-09 18:02:23 UTC
Xavier de Gaye added the comment:

glob('invalid_dir/**', recursive=True) triggers the assert that was added by my patch in _rlistdir().

This new patch fixes this: when there is no magic character in the dirname part of a split(), and dirname is not an existing directory, then there is nothing to yield and the processing of pathname must stop (and thus in this case, no call is made to glob2() when basename is '**').

----------
Added file: http://bugs.python.org/file40989/rglob_isdir_2.diff

Roundup Robot
2015-11-09 21:19:30 UTC
Roundup Robot added the comment:

New changeset 4532c4f37429 by Serhiy Storchaka in branch '3.5':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/4532c4f37429

New changeset 175cd763de57 by Serhiy Storchaka in branch 'default':
Issue #25584: Fixed recursive glob() with patterns starting with '**'.
https://hg.python.org/cpython/rev/175cd763de57

New changeset fefc10de2775 by Serhiy Storchaka in branch '3.5':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/fefc10de2775

New changeset 128e61cb3de2 by Serhiy Storchaka in branch 'default':
Issue #25584: Added "escape" to the __all__ list in the glob module.
https://hg.python.org/cpython/rev/128e61cb3de2
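
For reference, the effect of the second pair of changesets can be checked with a short sketch; the sample file name is made up for illustration.

```python
import glob

# 'escape' is listed in __all__, so "from glob import *" exposes it as well.
exported = 'escape' in glob.__all__

# escape() wraps the magic characters '*', '?' and '[' in brackets so that
# glob() matches them literally.
escaped = glob.escape('notes[1]*.txt')

print(exported, escaped)
```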

----------
nosy: +python-dev

Serhiy Storchaka
2015-11-09 21:58:34 UTC
Serhiy Storchaka added the comment:

Please open a new issue for the glob() optimization, Xavier.

----------
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed

Xavier de Gaye
2015-11-10 11:21:41 UTC
Xavier de Gaye added the comment:

New issue 25596 entered: regular files handled as directories in the glob module.

Thanks for fixing this, Serhiy.

----------
