Support bagging to a destination other than the source#92
Support bagging to a destination other than the source#92runderwood wants to merge 5 commits intoLibraryOfCongress:masterfrom
Conversation
…iple sources remain broken, awaiting clarification.
|
Looks good to me. @acdha is on vacation, but I am sure once he gets back he will take a look at it. |
|
Ping. |
|
|
||
| def make_bag(bag_dir, bag_info=None, processes=1, checksums=None, checksum=None, encoding='utf-8'): | ||
| def make_bag(bag_dir, | ||
| bag_info=None, processes=1, checksums=None, checksum=None, |
There was a problem hiding this comment.
I'd prefer not to have formatting changes in this PR or to follow PEP-8 if we do. In this case, the existing code has been using the aligned indent style.
| bag_dir = os.path.abspath(bag_dir) | ||
| LOGGER.info(_("Creating bag for directory %s"), bag_dir) | ||
| if dest_dir: | ||
| LOGGER.info(_("Creating bag for directory %s"), dest_dir) |
There was a problem hiding this comment.
I would prefer a single logging call rather than two near-duplicate lines. In this case, I think it might be easier to follow if we set dest_dir using bag_dir when called with dest_dir=None so most of the code could assume that dest_dir is what you use when assembling paths
| # FIXME: we should do the permissions checks before changing directories | ||
| old_dir = os.path.abspath(os.path.curdir) | ||
| if dest_dir is not None: | ||
| dest_dir = os.path.abspath(dest_dir) |
There was a problem hiding this comment.
I'd prefer to normalize dest_dir earlier in the function
| old_dir = os.path.abspath(os.path.curdir) | ||
| if dest_dir is not None: | ||
| dest_dir = os.path.abspath(dest_dir) | ||
| if os.path.exists(dest_dir): |
There was a problem hiding this comment.
This block of code is somewhat complicated. As a minor point, OSError or EnvironmentError is probably more appropriate than RuntimeError but I'm leaning towards simplifying the entire block to:
if not os.path.isdir(dest_dir):
os.makedirs(dest_dir)That will raise OSError if the path already exists but has a component which is not a directory.
| data_subdir = os.path.relpath( | ||
| bag_dir, os.path.join(bag_dir, os.pardir) | ||
| ) | ||
| if os.path.abspath(os.path.join(bag_dir, data_subdir)) == bag_dir: |
There was a problem hiding this comment.
I think this could be os.path.samefile(os.path.join(bag_dir, data_subdir), bag_dir)
| if not os.path.exists(data_dir): | ||
| os.mkdir(data_dir) | ||
| elif os.path.exists(os.path.join(data_dir, data_subdir)): | ||
| shutil.rmtree(os.path.join(data_dir, data_subdir)) |
There was a problem hiding this comment.
This seems risky at first read – shouldn't we just throw an error if the data directory already exists?
| os.chmod('data', os.stat(cwd).st_mode) | ||
| os.chmod(data_dir, os.stat(bag_dir).st_mode) | ||
|
|
||
| if dest_dir: |
There was a problem hiding this comment.
I'm wondering whether this should be one call with src_dir either being bag_dir or None for clarity
| txt = """BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n""" | ||
| with open_text_file('bagit.txt', 'w') as bagit_file: | ||
| bagit_file.write(txt) | ||
| if dest_dir: |
There was a problem hiding this comment.
This entire section is where I was thinking it might be cleaner to always have dest_dir defined since I'm not happy about so many if statements in a row
|
@acdha Not sure if this branch is worth keeping around. I'm going to be out of pocket for at least a few months and don't have time to tighten it up before. |
See #35.