3. Dev Info
This chapter includes the information you will need to make contributions to
the dvmdostem project.
Important
In order to keep the base repository size from growing out of control, the
testing and sample data are tracked using Git LFS. If you need to use the
testing data (e.g. to run the tests, or to build the documentation from
source), then you should install Git LFS and run git lfs pull to make sure
that the actual data is downloaded.
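For example, a typical sequence (assuming Git LFS itself is already installed
on your system, e.g. via your package manager) might look like:
$ git lfs install   # one-time setup of the LFS hooks for your user
$ git lfs pull      # download the actual testing and sample data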
3.1. Languages, Software Structure
The core model (dvmdostem) is written in C++, while most of the supporting
tooling is written in Python. There is also an assortment of bash scripts, R
code, and IPython Notebooks floating around for various tasks.
The core code is compiled with a basic Makefile. This documentation is written in reStructuredText and built with Sphinx, which is also driven by a Makefile.
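For example, a minimal build of the core model might look like this (a sketch,
assuming your compilers and library dependencies are already available, e.g.
inside the dvmdostem-dev Docker container):
$ cd /work   # or wherever your clone of the repository lives
$ make       # compiles the dvmdostem binary using the Makefile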
3.2. Coding Conventions
Here are some basic conventions for this project:
Indent with spaces; use 2 spaces for the tab width.
Aim for lines to be <80 chars long.
For Python, write docstrings in numpydoc format. For C++, write comments in Doxygen format.
Favor verbose, descriptive variable names.
For documentation (*.rst and *.md files, docstrings, etc.), please hard wrap
lines at 80 characters before committing. Many text editors have settings or
extensions that can help with this tedium. With VSCode, try the Rewrap
extension. For Sublime, try selecting the text and using "Edit -> Wrap".
3.3. Documentation
There are several places where you will find information about dvmdostem, each
with a different type of info:
README file(s) - overviews of a repository or project. Used as a rough introduction, with installation instructions and links to other resources.
This document (which is the formatted output of the .rst source files). The document is split into several chapters:
  "Model Overview" - narrative, scientific description of the dvmdostem model.
  "Running" - info and examples for hands-on use of the model.
  "Dev Info" - (this chapter) which contains all the other programming, usage and workflow information for hands-on work with the model.
  Several other chapters, available in the table of contents.
  It is likely that you are reading this document where it has been published online; if you build the documentation locally, then by default the output ends up in docs_src/sphinx/build/html.
Doxygen output. Similar to Sphinx, Doxygen is a documentation processing tool that scans source files and can create a variety of output formats. Doxygen for this project is configured to analyze the C++ source files and generate an interactive HTML page with detailed call graphs and text parsed from the C++ files.
The --help flag for dvmdostem and many of the scripts in the scripts/ directory. This is generally specific usage information for the given tool.
Comments in the code, which are helpful for implementation details.
Github wiki (hopefully deprecated soon). This may end up being a home for assorted tutorials and examples.
In a perfect world, most of the comments in the code will eventually be formatted in such a way that they can be picked up by tools such as Doxygen (for C++) and Sphinx (for Python). Because that is a work in progress, it is still helpful to browse the code directly, especially for implementation details.
We used the Github Wiki for several years, but we are trying to move away from it towards a more robust, fully featured platform that integrates better with CI/CD tooling. Github Wiki might be a good place to keep certain tutorial-like information.
To build the Sphinx documentation (this document) locally, do the following:
$ cd docs_src/sphinx
$ make clean && make html
Warning
Note that we need to set the PYTHONPATH in order for qcal.py to be imported
and documented during this build process. We are not sure of the best way or
place to set this yet, so an example is shown here:
$ make clean
$ PYTHONPATH="/work:/work/calibration:$PYTHONPATH" make html
The resulting files are in the docs_src/sphinx/build/html directory and can be
viewed locally with a web browser.
To build the Doxygen documentation locally, do the following:
$ cd docs_src/doxygen
$ doxygen
The resulting files are in the docs_src/doxygen/doxygen_build directory and
can be viewed locally with a web browser.
3.3.1. Preview -> Editing -> Contributing
Previewing
Here are the steps to preview documentation changes (perhaps made by someone else) in your local environment. Assuming you have a development environment, a cloned copy of the repo, and a “clean” working state:
Check out the branch you are interested in previewing. For example, someone else has pushed to the upstream/<BRANCH-NAME> branch and you'd like to see what they have written or how it all looks: $ git remote update && git checkout <BRANCH-NAME>
Clean the existing docs and build them: $ cd docs_src/sphinx && make clean && make html
Preview the results in your browser (file:///path/to/your/repo/docs_src/sphinx/build/html).
Note
It is generally easiest to run the documentation build using the
dvmdostem-dev Docker container so that the build environment (Sphinx version,
etc.) matches the environment used to publish.
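For example, a sketch of running the build from the host, inside the container
(assuming the Docker container stack is already up):
$ docker compose exec dvmdostem-dev bash -c "cd /work/docs_src/sphinx && make clean && make html"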
Editing
The writing and editing process for the documentation ends up looking essentially like the general coding or programming process:
set up a development environment of your choice
clone the repository to your development environment
check out a new or existing topic branch to work on
edit the source files (docs_src/sphinx/*.rst)
process the .rst files: cd docs_src/sphinx && make clean && make html
preview the results in your browser (file:///path/to/your/repo/docs_src/sphinx/build/html)
commit your changes
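Put together, a single editing cycle might look something like this (a sketch;
the branch name and commit message are hypothetical):
$ git checkout -b docs-fix-typos
$ cd docs_src/sphinx && make clean && make html
$ git add <the .rst files you edited>
$ git commit -m "Fix typos in dev info chapter"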
For more details about the coding process see the Workflow section.
Contributing
If you would like to contribute your edits, use a Pull Request.
To make a Pull Request, you must push your commits to Github (either to your
fork or to the uaf-arctic-eco-modeling/dvm-dos-tem repository), depending on
your choice of workflow and your status as a collaborator.
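For example, a sketch of pushing a topic branch to your fork before opening
the Pull Request on Github (the remote name origin and the branch name here
are hypothetical):
$ git push origin my-topic-branch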
3.3.2. Publishing
Publishing (updating the live website at github.io) is reserved for the
maintainers, tcarman2 and rarutter.
In the current implementation with Sphinx (used to format this document), we
have a docs_src folder within which is a subdirectory for each documentation
tool (presently Doxygen and Sphinx). Each tool is set up to put its outputs in
its own directory. To publish outputs, the contents are copied to the docs/
directory in the root of the repo and then pushed to the gh-pages branch of
the upstream repo. Pushing to the gh-pages branch leverages the free
publishing available from Github and is a simple way to make the documentation
publicly available. See the publish_github_pages.sh script for more details.
Automated publishing (e.g. for each release) is still a work in progress.
Currently the Sphinx documentation is designed to be published to Github Pages
and the Doxygen documentation is only intended for local use.
3.3.3. Note about images
Including images in documentation presents similar challenges for raw, rendered, and word processing systems. One choice is whether to embed the image directly or provide a link to it. And another choice has to do with how to version control the image and make it easy to update in the future.
The simplest solution is to not worry about it and commit the .png or .jpg
files directly to the repo. This certainly works, but imagine a scenario where
you need to update the image, say to fix a typo. If you were the original
creator, then you open the drawing file (e.g. Photoshop, Visio, Open Office
Draw; whatever you used to create the image), edit the image, export it, move
it into the documentation structure, overwriting the original, and commit the
result to version control. This assumes that you have the original image. If
you don't (either because you lost it, or perhaps you were not the original
creator), then you must completely redraw the image from scratch, which is
ridiculous in many cases.
One way to solve this is to commit the original image file to version control
(e.g. the .ps or .dwg file) alongside the exported image that will be included
in the documentation. This is essentially the same dilemma as with the raw →
generated text documentation. However, drawing files typically don't read well
with file diffs, so it is hard to tell what changed with the images, making it
important to have good commit messages and to keep the exported files as well.
And keeping all these binary files uses quite a bit more space than plain text
files, so it is easy for the size of the repository to get out of control.
A novel solution that we discovered for this problem is to use linked Google Drawing documents roughly as follows:
Make a Google Drawing and save it (with a name)
Click the Share button
Edit the preferences so that the drawing is viewable to anybody with the link
Under File menu select “Publish to Web”
Select “Embed”
Copy the embed link
Paste the link into the appropriate place in your document
For each type of document there might be a different way to render the link, and this may not be possible in all languages/environments. In the Github wiki, which uses Markdown, including something like this will allow the image to render directly from Google Docs when someone loads the page:
<!-- From Tobey Carman's google drawing "dvmdostem-general-idea-science"-->
<img src="https://docs.google.com/drawings/d/17AWgyjGv3fWRLhEPX7ayJKSZt3AXcBILXN2S-FGQHeY/pub?w=960&h=720">
If the original Google Drawing is updated, then the drawing seen in the wiki will be updated too. Take caution with the permissions granted for editing on the original drawing!
Warning
When you are editing an image that is embedded, the edits are automatically live on the published website! This is fine for quick edits such as fixing a typo, but for anything more substantial, it is recommended that you make a duplicate of the Google Drawing, edit the duplicate, and then copy it back over the original. This will keep your edits from showing up on the live site until you are done with them!
Warning
Source drawings for this document should probably be stored in the Shared Google Drive so that they are not tied to an individual’s account.
In Google Docs, there is a way to insert a Google Drawing from a menu: Insert > Drawing > From Drive.
With Sphinx, use the raw directive with the html output type (.. raw:: html).
The Sphinx documentation warns against abusing the raw directive, so this
might not be a good long-term solution, but it could be useful for creating a
bunch of the drawings while they are in draft stages.
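For example, a minimal sketch of embedding a published Google Drawing in an
.rst file, reusing the example drawing URL from above:
.. raw:: html

   <img src="https://docs.google.com/drawings/d/17AWgyjGv3fWRLhEPX7ayJKSZt3AXcBILXN2S-FGQHeY/pub?w=960&h=720">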
We have not tested this approach with a system such as Doxygen but assume it should work. This solution is not perfect; downsides include:
The drawing is not strictly version controlled along with other content (Google Drawings offers some version control, but this would not be linked to the dvmdostem git repository).
The end user must have web connectivity to see the drawings.
3.4. Version Management
The primary reasons for using a version management system for dvmdostem are:
To maintain a meaningful history of the codebase so that the provenance of the code is not in question.
To facilitate the addition or modification of code by many developers.
To maintain the ability to revert to or recover specific points in the history of the codebase. This may be for the purpose of duplicating prior work, or to recover a lost behavior of the software, or both.
There are two (related) parts to fulfilling the above goals:
Making the commits (file diffs) easy to read and understand.
Having a strategy or pattern for bringing different lines of development together.
If the file diffs are unreadable or the lines of development are not brought together in an organized fashion, then the project history is harder to trust, which brings into question the provenance of the code and makes it harder for people to contribute.
3.4.1. Version Control and Hosting
This project is using Git for version control and Github for hosting. The primary fork of the code (referred to as “upstream”) is currently hosted under the uaf-arctic-eco-modeling organization, so the primary (upstream) repository address is: https://github.com/uaf-arctic-eco-modeling/dvm-dos-tem.
Note
The Source Control Management (SCM) or Version Control software is named git. git is really a general tool for managing a certain type of data structure (a Directed Acyclic Graph, or DAG, for the curious). As such, there are many ways it can be used correctly and it is up to each group to find a pattern that works for the project.
Github is a website that uses git and provides web hosting as well as other features such as access management, wikis, issue tracking, and support for automated workflows and actions.
The dvmdostem code is open source; the repository is publicly available and
can be cloned by any interested party. However, write access to the upstream
repository is only granted to trusted collaborators. We gladly accept
contributions to the code via pull request from anyone, but the pull request
will have to be merged by a collaborator with write access to the upstream
repo. See the branching and workflow sections below for more details.
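For example, any interested party can get a local copy with something like:
$ git clone https://github.com/uaf-arctic-eco-modeling/dvm-dos-tem.git
$ cd dvm-dos-tem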
3.4.2. Branching Model
A generalized view of our branching model can be seen in the diagram:
The image shows one long-running branch (red commits; master), three topic
branches (green commits; issue-47, modify-dvm, and bugfix-4) and three
"experiment branches" (gray commits; exp-iem-0, exp-akyrb-0, exp-QCF-SA).
Two of the topic branches have been merged (blue arrows). One of the topic
branches (modify-dvm) will be merged in the future (dotted blue arrow). The
dark red commits on the master branch have been tagged to make an official
release of the code. The gray commits are for "experiment branches" which are
used to track a specific model run or set of model runs. Often the changes on
these branches are only to config and parameter files, but some experiments
might require code changes as well.
This diagram does not explicitly show interaction between multiple developers; assume that each commit in the drawing could be made by any of the trusted collaborators with push access to the upstream repository.
As a basic safety feature we have placed a restriction on the master branch of
the upstream repository such that only the administrators (tcarman2@alaska.edu
and rarutter@alaska.edu) are allowed push access. This restriction makes it
unlikely that a trusted collaborator can accidentally push something that breaks
the master branch. The best way for trusted collaborators to get code into the
upstream/master
is to open a pull request from their topic branch (e.g.
upstream/topic-foo-bar
) into upstream/master
using the Github web
interface for pull requests. All interested parties then have an opportunity to
review the code, comment on Github, and push new commits to the topic branch (if
necessary). Only the administrators can merge the pull request.
As a general practice we try to have most work done in topic branches and merged into master using Github pull requests. For some small changes (usually for details that were inadvertently excluded from a recent pull request) we will make commits directly on the master branch without using the topic branch/pull request process. Using the topic branch/pull request process helps to organize work and will provide a convenient place to run Github Actions, for example an action to run the test suite before green-lighting a pull request for merging.
Recently (2022 and the several years prior) we have been using a single
long-running branch (master
) and have been able to manage all contributions
by periodically merging topic branches. If the need arises we can switch back to
using an additional long-running branch. This would allow different levels of
stability as described in the Git Book Branching Workflows
section.
In the event that you need work from upstream/master in order to continue
the work on your topic branch, you can periodically merge upstream/master
into your topic branch. However, please only use this when absolutely
necessary, as it can make the history harder to read and the pull requests
harder to review. See this Note for a description of one potential problem
with merges.
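If you do need to do this, a sketch of the sequence (using the remote and
branch names described above) is:
(topic-foo-bar)$ git fetch upstream
(topic-foo-bar)$ git merge upstream/master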
Note
One problem with casually using merges in a workflow, as opposed to using rebase, is that the default merge messages can:
Clutter the history.
Be very confusing if you end up changing a branch name at a later date.
For instance, if you have a long-running branch with a large feature you are working on and you need to get updates from upstream, and you choose to merge into your “long-running-branch”:
$ git checkout long-running-branch
(long-running-branch)$ git pull upstream master
Then you will get a merge message by default that starts with something like this:
Merge branch 'master' from github.com:uaf-arctic-eco-modeling/dvm-dos-tem into 'long-running-branch'
All well and good, but later, once your work has evolved, you may decide to change the name of long-running-branch to something more relevant:
(long-running-branch)$ git checkout -b more-descriptive-name
(more-descriptive-name)$ git branch -D long-running-branch
While renaming the branch is not a problem in and of itself, the merge commit title will contain “…into ‘long-running-branch’”. The long-running-branch no longer exists! So the merge commit message will be confusing to anyone who was not involved with long-running-branch or has forgotten about it. Without good commit messages, it is harder to understand the history, and without a good understanding of the history it is easy to lose control of the project. So please learn to use rebase and merge appropriately!
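For example, a sketch of pulling updates with rebase instead of merge (using
the same hypothetical long-running-branch):
(long-running-branch)$ git pull --rebase upstream master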
3.4.3. Workflow
We are primarily using the “Centralized Workflow” described in the Git Book Distributed Workflows. We have a number of trusted developers at collaborating institutions and we grant them write (push) access to the upstream repository. With this model, each developer can push directly from their local repository to the upstream repository - developers do not need to maintain their personal forks on Github (but are free to do so if they wish).
If you are not one of our trusted collaborators and have contributions to make,
then you will need to follow the Git Book “Integration Manager Workflow”. You
will simply fork the upstream repository on Github, clone it to your computer,
and push changes back to your fork. You can then make a pull request from your
fork into upstream/master.
When two or more developers want or need to work contemporaneously on a topic
branch, it is up to the developers to communicate and make sure that they do not
step on each other’s toes. In practice this simply amounts to communicating with
other folks via email, the Arctic Eco Modeling Slack, or Github Issues and
remembering to run git pull --rebase
. Using --rebase
prevents
unnecessary merge commits that can make the history confusing and harder to
trust.
Note
A big part of maintaining a low friction workflow revolves around
understanding what types of files or information should not be included in
version control and figuring out how to exclude these files. The general
idea is that you don’t want to keep generated files (e.g.: *.o
, or
Doxygen output), but you do want to track code that can generate certain
outputs. If you need the outputs, then you run the generating code to
produce it. The general rule is don’t track files that you can generate,
track the code to generate them.
Note
Another common sticking point is figuring out how to track host specific settings, such as specific environment variables, build settings, or the project settings files generated by many IDEs. You may need to devise your own way to track these settings locally on an individual developer or workstation level without pushing them to the central shared repository.
Note
Learn to use git-stash; it is very handy for setting aside work before
pulling or rebasing from upstream so as to prevent unnecessary merge
commits!
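A sketch of that pattern:
$ git stash                           # set aside local, uncommitted work
$ git pull --rebase upstream master
$ git stash pop                       # re-apply your work on top of the updates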
Note
See the following helpful discussions:
3.4.4. Releases and Version Numbering
Beginning in 2021, we started using the “Releases” feature of Github to package
and distribute specific versions of dvmdostem. We would like to make this a
fully or nearly fully automated process, but for the time being it is rather
manual.
As described in the HOWTO_RELEASE.md document in the repo, the project uses
a three part version number: vMAJOR.MINOR.PATCH. We use the following rules
for incrementing the version number:
The PATCH number (farthest right) will be incremented for changes that do not affect the general scientific concepts in the software.
The MINOR number (middle) will be updated when changes have been made to science concepts, major implementation changes are made to scientific aspects of the code, calibration numbers are updated, or large new features are added.
The MAJOR (left) number will be updated for major milestones. This will likely be points where the model is run for “production” or major testing and validation steps are completed and documented.
This project is not using traditional Semantic Versioning, however we have borrowed some concepts.
Until the project reaches v1.0.0, we will not make any guarantees about
backwards compatibility. Once the project reaches v1.0.0, we may decide to
handle the rules for incrementing version numbers differently.
Releases are currently made on an as-needed basis by tcarman2@alaska.edu or rarutter@alaska.edu.
The steps are described in the HOWTO_RELEASE.md document, and the resulting
releases are visible here: https://github.com/uaf-arctic-eco-modeling/dvm-dos-tem/releases
3.4.5. Keeping your repo up to date with upstream
See the “Command Cheat Sheet”.
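A minimal sketch of the common case (not a substitute for the cheat sheet;
assumes your clone has an upstream remote pointing at
uaf-arctic-eco-modeling/dvm-dos-tem):
$ git checkout master
$ git fetch upstream
$ git rebase upstream/master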
Note
A common developer issue is that you may have installed custom libraries that
are not available yet inside the dvmdostem Docker image. When you shut down
your Docker containers, any custom libraries you have installed will be lost.
When you start your containers again, you will have to re-install these
libraries. This can be somewhat tedious. One solution for this is to keep a
custom requirements file and ask pip to install packages from it when you
start up your Docker containers. For example, if you need the Python packages
BeautifulSoup and PyDemux (don't ask why), you might make a file in your
repository, requirements_custom.txt, with the following lines:
BeautifulSoup==4.8.1
PyDemux==1.0
And then when you start up your Docker container, you can run the following to install your custom packages:
develop@263004fd19aa:/work$ pip install -r requirements_custom.txt
Your requirements_custom.txt should not be tracked with Git. If you have
further customizations beyond this, there is likely a way to inject your
specific environment needs into the Docker container using custom .bashrc
files, the docker compose .env file, or some combination thereof.
Note
A common issue that comes up when you have multiple branches that you are working on is that you check out a different branch, try to run something in your Docker container, and it fails because a library is not installed. For example:
docker compose exec dvmdostem-dev bokeh serve scripts/bk_timeslider.py --port 7001
2023-02-09 23:16:41,834 Starting Bokeh server version 2.4.2 (running on Tornado 6.2)
2023-02-09 23:16:41,835 User authentication hooks NOT provided (default user enabled)
2023-02-09 23:16:41,838 Bokeh app running at: http://localhost:7001/bk_timeslider
2023-02-09 23:16:41,838 Starting Bokeh server with process id: 5351
2023-02-09 23:16:48,986 Error running application handler <bokeh.application.handlers.script.ScriptHandler object at 0x7fdd8517b910>: No module named 'xarray'
File 'bk_timeslider.py', line 7, in <module>:
import xarray as xr
Traceback (most recent call last):
File "/home/develop/.pyenv/versions/3.8.6/lib/python3.8/site-packages/bokeh/application/handlers/code_runner.py", line 231, in run
exec(self._code, module.__dict__)
File "/work/scripts/bk_timeslider.py", line 7, in <module>
import xarray as xr
ModuleNotFoundError: No module named 'xarray'
This happens when one of the branches introduces a library requirement that is not yet in the upstream codebase. Ideally the library has been added to the requirements file, but this is an easy step to forget. If the library is in the requirements file, then all you usually need to do is ask pip to install everything again:
develop@a2d3e3cb5a55:/work$ pip install --upgrade -r requirements_general_dev.txt
If the offending library is not yet in the requirements file, then it is usually a good idea to add it and make a commit first.
3.5. Testing and Deployment
There is currently (Sept 2022) a very limited set of tests and their execution
is not automated. It is a goal to increase the test coverage and automate the
test execution in the near future. We are hoping to set up a CI/CD pipeline
using Github Actions that can automatically test and deploy the dvmdostem
model and supporting tooling.
Testing is currently implemented for some of the Python scripts in the
scripts/ directory using the Python doctest module. The style and structure of
the tests reflect the challenges we have had getting testing integrated into
this project. The doctest module has a nice feature that allows tests to be
written in a literate fashion with much explanatory text. This allows us to
hit several goals with one set of testing material:
explanations and examples of code/script usage;
testing across a wide range of encapsulation; for example some of the tests are very granular unit tests of single functions in the script files, while others test comprehensive behavior of entire modules and command line interfaces;
basic regression testing.
There are two primary places that the doctests will show up:
In the docstring of a given Python script or function.
In a standalone markdown (.md) or reStructuredText (.rst) file with specially formatted test code.
The tests that are in the docstrings of a given file or function should be very narrow in their scope and should only check the functionality of that specific function, independent from everything else, whereas tests in a standalone file can be much broader and more flexible in their design - i.e. module level tests.
At present we have had much more luck writing the broader tests (which also
serve as examples of usage) in standalone files named with the following
pattern: scripts/tests/doctests/doctests_*[.md | .rst]. The files are markdown
or reStructuredText formatted with embedded code that is executed by the
doctest module. The execution context and other doctest particulars are
described here:
https://docs.python.org/3/library/doctest.html#what-s-the-execution-context
To run the tests that are in the docstrings of a function or file:
$ PYTHONPATH="/work/scripts" python -m doctest scripts/util/param.py # <-- script name!
To run the tests that are in an independent file:
$ PYTHONPATH="/work/scripts" python -m doctest scripts/tests/doctests/doctests_param_util.md # <-- test file name!
In either case, if all the tests execute successfully, then the command exits
silently. If there are errors, the doctest package tries to point you towards
the tests that fail.
Note that in both cases, the PYTHONPATH variable is set so that the module
imports work properly in the scripts and tests. Many of the tests currently
use the demo data, config files, and parameter files in the main repo. The
paths for these in the tests are assumed to be relative to the repo root, so
you will likely have the best luck running the tests from the repo root.
Setting PYTHONPATH allows the test files to import the scripts and tools in
the scripts folder.
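If you want to see each test as it runs rather than silence on success, the
doctest module's verbose flag can help. For example:
$ PYTHONPATH="/work/scripts" python -m doctest -v scripts/tests/doctests/doctests_param_util.md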
In order to run all the tests, this loop should work:
for i in $(ls scripts/tests/doctests/);
do
PYTHONPATH="/work/scripts" python -m doctest scripts/tests/doctests/$i;
done
3.6. Setting up a dev environment
There are many paths to setting up a development environment and the specific path you choose will depend on your experience and needs. Over the years we have tried all of the following:
Local installation.
Hand managed Virtual Box VM.
Vagrant managed VM.
Docker container stack.
The current (2022) preference is generally for the Docker container stack, although on some systems a local installation is still preferable.
3.6.1. Setting up with Vagrant
WRITE THIS…
3.6.2. Setting up with Docker
WRITE THIS…
Install Docker Desktop.
Make sure you have docker and docker compose available on the command line.
Find a place on your computer for:
  Your dvmdostem repo
  Your catalog of inputs
  Your catalog of "workflows"
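To verify that docker and docker compose are available from your terminal,
something like this should work:
$ docker --version
$ docker compose version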
3.6.3. Setting up with Ubuntu
WRITE THIS…
3.7. Debugging strategies
For problems with running dvmdostem itself, the first thing to do is generally
to run with a higher log level. This is available as a command line flag with
both long and short forms (--log-level, -l).
You will immediately notice that with the more verbose levels the amount of
output printed to your console will be overwhelming and will likely saturate
your scrollback buffer, making it impossible to read messages from the
beginning of the run, which is where you usually want to look to diagnose
initialization errors. One trick to overcome this is to redirect the standard
output (stdout, 1) and standard error (stderr, 2) streams to a file which you
can search through post-hoc using less or a text editor. For example:
$ dvmdostem --log-level debug > MY_OUTPUT.txt 2>&1
Nothing will be output to your console and you should have a file that you
can search through when the run is done. See the tee
command if you want to
see the output on your console as well as save it to a file.
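For example, a sketch using tee so that output is shown on your console and
saved to a file at the same time:
$ dvmdostem --log-level debug 2>&1 | tee MY_OUTPUT.txt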