Conversation
…e islandora model in the model map.
ctgraham
left a comment
There was a problem hiding this comment.
Initial readthrough with some questions and suggestions.
…as being the first value from the return column.
| @@ -1,4 +1,5 @@ | |||
| #!/usr/bin/python3 | |||
| #!/usr/bin/python3.12 | |||
There was a problem hiding this comment.
Explicitly depending on 3.12 should be fine as long as the machine running the script has 3.12 (in the future, some machines may have versions >3.12). It might help to add a warning like if sys.version_info() < (3, 12): logger.warn("unsupported python version")
There was a problem hiding this comment.
One way to not use the 3.6 default, but also not use an explicitly defined version would be to use #!/usr/bin/env python and then run the script within a virtual environment generated by python 3.12
There was a problem hiding this comment.
The choosing of what version of python being run has been an ongoing issue for a while. There are various ways this can be achieved. None seem to be the best way. RHEL 8 ships with 3.6, RHEL 9 ships with 3.9, and the Ansible version used to build our systems requires 3.12. It is recommended by RedHat to not remove the shipped version as it impacts their code. But 3.6 is too old for some of the python modules we need to use. How best to not have to worry about the required version? Your test of sys.version_info() will probably help but only if the OS can expose the correct version to the script at runtime even if 3.12 is installed. RHEL has the 'alternatives' command but again changing the default version of python may impact OS required scripts is not recommended. This is why I hardcoded the 3.12 version. Additionally, I don't believe logger.warn will work as the logger hasn't been setup at the time of the check which I would do very early in the script.
| print(f"Invalid content in YAML file: {path}.") | ||
| raise | ||
| except Exception as e: | ||
| print(f"An unexpected error occurred while reading the YAML file: {str(e)}") |
There was a problem hiding this comment.
Log yaml error so it can be seen in log file if script is executed from non cli environment
There was a problem hiding this comment.
@ojas-uls-dev at the time of executing read_yaml_file the log hasn't been created yet for the function to write to the log. This is a catch 22 if we move the creation of the log file before the read_yaml_file as the config has the default log file settings. I'm not sure what the best approach is to resolve this.
| parent (str) The parent directory of the object. | ||
| df (pd.DataFrame) The Pandas DataFrame we will be updating. | ||
| # Validate columns | ||
| if match_column not in df.columns: |
There was a problem hiding this comment.
is match_column intended to be global/inherited from caller? if not, it should be passed as arg.
match_value and return_column also seem to be inherited
There was a problem hiding this comment.
@ojas-uls-dev, the match_column and return_column are both passed into the get_value_from_df function as arguements. They are not per say global.
| headers["Authorization"] = f"Bearer {auth_token}" | ||
|
|
||
| try: | ||
| response = requests.get(url, headers=headers, params=params) |
There was a problem hiding this comment.
response.raise_for_status() can also be used to handle 403s and 404s in same try except block
There was a problem hiding this comment.
@ojas-uls-dev, are you suggesting halting the program if a 403/404 is encountered? And yes, we can add stanzas to handle 403/404 as well. Just wasn't sure you were indicating a failure should trigger a stop or just handle the 403/404 differently.
| return(df) | ||
|
|
||
|
|
||
| def get_directory(directory: str): |
There was a problem hiding this comment.
os.walk does something very similar to this function
There was a problem hiding this comment.
@ojas-uls-dev, it does look like it might do the same as this function except the function sorts the result alphabetically. os.walk doesn't look like it does that, but maybe there is away to sort the os.walk result the same way. This would be something that could be looked into.
This PR addresses issue #1.
This is a full rewrite of the original scan-batch-dir script to allow it to be more modular so that changes needed in the future should be easier to implement. This also added the ability to use PDF files as newspaper issues.