Friday, December 1, 2017

Kriging paper

Finally we submitted the Kriging paper. Interpolation of hydrological quantities is a necessity in hydrological modeling. Since the beginning of the last century, various techniques have been implemented to obtain it, including Kriging and other types of interpolation.

We prefer Kriging. This paper documents the implementation of Kriging inside the JGrass-NewAGE system.


The paper can be seen here. You can also find all the material of the paper on the OSF platform. We tried to share everything, from code to data and even the simulations we performed. Therefore, in principle, any reader could try our software and reproduce our results.
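For readers who want a quick feel for what ordinary Kriging does, here is a minimal sketch that uses the PyKrige Python library (an independent implementation, not the JGrass-NewAGE component described in the paper; the station coordinates and values are made up):

    import numpy as np
    from pykrige.ok import OrdinaryKriging

    # Made-up station coordinates (km) and observed values (e.g. rainfall in mm)
    x = np.array([0.5, 1.2, 2.8, 3.9, 4.4])
    y = np.array([0.7, 3.1, 1.5, 3.6, 0.9])
    z = np.array([12.0, 8.5, 10.2, 7.9, 11.4])

    # Fit an ordinary Kriging model with an exponential variogram
    ok = OrdinaryKriging(x, y, z, variogram_model="exponential")

    # Interpolate on a regular grid; ss is the Kriging variance
    gridx = np.arange(0.0, 5.0, 0.1)
    gridy = np.arange(0.0, 4.0, 0.1)
    zgrid, ss = ok.execute("grid", gridx, gridy)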

Thursday, November 30, 2017

Journal Papers using GEOtop

This is the growing list of papers built upon GEOtop in its various versions.

Thursday, November 23, 2017

Monday's discussion on evapotranspiration - Part I - The vapor budget

Last Monday at lunch, my students and I discussed evapotranspiration. I have already talked about it in various comments here. The starting point, however, was the impression, coming from one of my students, that the topic of transpiration is still in its infancy. I agree with him, and I offered my synthesis.

  1. In hydrology we use Dalton’s law (here, slide 21) or the equations derived from it, named Penman-Monteith and Priestley-Taylor (forgetting all the empirical formulas); a common form of Dalton’s law is recalled right after this list.
  2. Dalton’s law lumps together the diffusive vapor flux, the vapor storage and the turbulent transport.
  3. We should instead have a water vapor budget equation, written for some control volume, where each process is in its right place.
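For reference, a common textbook form of Dalton’s law (my notation, which may differ from that of the slides linked above) is:
$$
E = f(\overline{u}) \left( e_s(T_s) - e_a \right)
$$
where $E$ is the evaporation rate, $f(\overline{u})$ is an empirical wind function of the mean wind speed, $e_s(T_s)$ is the saturation vapor pressure at the temperature of the emitting surface and $e_a$ is the actual vapor pressure of the overlying air. One formula lumps together the storage, diffusion and transport processes listed above.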

It is not difficult to recognize that the control volume is limited, in our case, by the open atmosphere and by the surfaces that emit vapor as a consequence of their thermodynamics. It is also not difficult to recognize that such limiting surfaces can be very complicated, such as those of a canopy, for instance, which also varies in shape and form with time. Out of these surfaces comes a vapor flux which depends on the thermodynamics and physiology existing below them, on the number and distribution of the openings through which the vapor is emitted, on the water availability (and dynamics) in the storage below the surface and, last but not least, on the vapor content in the control volume which, as in Dalton’s law, commands the driving force.

It is not easy, actually, to account well for all of these factors. We can, maybe, for a single leaf. It is more complicated for the canopy of a single tree. It is even more difficult for a forest. Unless some goddess acts to simplify the vapor budget, over the billions of details, and reduces it all to some tractable statistics (we can call this statistics the Holy Grail of evapotranspiration, or of the whole of Hydrology itself).

Assume we can deal with it, and we have the fluxes right. Then the vapor budget seems crystalline simple to obtain: the variation of vapor in the control volume is given by the incoming vapor flux, minus the outgoing vapor flux, minus, in case, the vapor condensation. The good old mass conservation. Unfortunately, even the output flux is not that easy to estimate, because the transport agent is atmospheric turbulence, which is affected by the non-linearities of the Navier-Stokes equations (NSeq) and by their interactions with the complex boundary represented by the terrain/vegetation surfaces. All of this involves a myriad of spatio-temporal scales and degrees of freedom which are not easy to simplify.
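In symbols (my own notation), the budget just stated reads:
$$
\frac{dW}{dt} = F_{in} - F_{out} - C
$$
where $W$ is the mass of water vapor in the control volume, $F_{in}$ and $F_{out}$ are the vapor fluxes entering and leaving through its boundary, and $C$ is the net condensation rate inside it. The hard part is not this bookkeeping, but estimating $F_{out}$.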

Therefore, the literature treats evaporation as a flux, forgets the real mechanics of the fluxes, and simplifies turbulence according to similarity theory, essentially due to Prandtl’s work at the beginning of the last century, with the additions of Monin-Obukhov theory. However, in real cases, the hypotheses of similarity theory are easily broken and the velocity distributions are rarely those expected. All of this makes the transport theory, applied in a pedestrian way (as we do), largely unreliable. See the References below and, here, a quite informed lecture to get a deeper view.
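To fix ideas, the similarity theory just mentioned predicts, for the mean horizontal wind above the surface, the classical stability-corrected logarithmic profile (standard notation, not taken from the lecture linked above):
$$
\overline{u}(z) = \frac{u_*}{\kappa} \left[ \ln \frac{z-d}{z_0} - \Psi_m\left(\frac{z-d}{L}\right) \right]
$$
where $u_*$ is the friction velocity, $\kappa$ the von Kármán constant, $d$ the displacement height, $z_0$ the roughness length, $L$ the Obukhov length and $\Psi_m$ the stability correction function. It is precisely the hypotheses behind profiles like this one that are easily broken over complex surfaces.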

Summarizing, the transport is complicated because turbulence interacting with complex surfaces is complicated (it would probably be better to say "complex"). Numerically, it is a problem whose solution is still open (a full branch of science, indeed), and we do not know how to model the rustling of leaves ("and the icy cool of the far, far north, with rustling cedars and pines").
In fact, the models we use for what we usually call potential evapotranspiration are an extreme simplification of this wish list.

Finally, the above picture forgets the role of air and soil temperature (thermodynamics). We were thinking, in fact, only of the mass budget and the momentum budget (the latter is what the NSeq are), but there is no doubt that evaporation and transpiration are also commanded, and in many ways, by the energy budget. Turbulence itself is modified by temperature gradients, and so is the water vapor tension, which concurs to establish the quantity of water vapor ready to be transported at the vapor-emitting surfaces. The good news is that energy is conserved as well, but this conservation includes the phase of the matter transported (the so-called latent heat). So, necessarily, in relevant hydrological cases, we have to solve, besides the mass and momentum conservation, the energy conservation itself.


Looking for simplified versions of the mass, momentum and energy budgets would require a major rethinking of all the derivations and a new impulse toward proper measurements, which, however, some authors have already started (see, for instance, Schymanski, Or and coworkers, here).

References

Tuesday, November 14, 2017

Open Science

Nothing really original in this post. I just recollect what is already said in the FosterOpenScience web pages. Their definition is:

"Open Science represents a new approach to the scientific process based on cooperative work and new ways of diffusing knowledge by using digital technologies and new collaborative tools (European Commission, 2016b:33). The OECD defines Open Science as: “to make the primary outputs of publicly funded research results – publications and the research data – publicly accessible in digital format with no or minimal restriction” (OECD, 2015:7), but it is more than that. Open Science is about extending the principles of openness to the whole research cycle (see figure 1), fostering sharing and collaboration as early as possible thus entailing a systemic change to the way science and research is done." The wikipedia page is also useful.

This approach gets along with that of doing reproducible research, which I have already talked about several times. I do not have very much to add to what they wrote, but I also want you to note that "there are in fact multiple approaches to the term and definition of Open Science, that Fecher and Friesike (2014) have synthesized and structured by proposing five Open Science schools of thought".

In our work, a basic assumption is that openness also requires the appropriate tools, and we are working hard to produce them, and to use those of others that make a scientific workflow open.

Friday, November 10, 2017

About Benettin et al. 2017, equation (1)

Gianluca (Botter), in his review of Marialaura (Bancheri)’s Ph.D. thesis, brought to my attention the paper Benettin et al. 2017. A great paper indeed, where a couple of ideas are clearly explained:

  • SAS functions can be derived from the knowledge of the travel and residence time probabilities;
  • a virtual experiment shows that the traditional pdfs (travel time pdfs) can be seen as the ensemble of the actual time-varying travel time distributions.

The paper is obviously relevant also for the hydrological content it explains, but it is not the latter point that I want to argue about a little here. I want just to discuss the way they present their first equation.

SAS stands for StorAge Selection functions, and they are defined, for instance in Botter et al. 2011 (with a little difference in notation), as:
$$
\omega_x(t,t_{in}) = \frac{p_x(t-t_{in}|t)}{p_S(t-t_{in}|t)} \ \ \ (1)
$$
as the ratio between the travel time probability related to output $x$ (for instance discharge or evapotranspiration) and the residence time probability.
In the above equation (1):
  • $\omega_x$ is the symbol that identifies the SAS
  • $t$ is the clock time
  • $t_{in}$ is the injection time, i.e. the time when water entered the control volume
  • $p_x(t-t_{in}|t)$, with $x \in \{Q, ET, S\}$, is the probability that a molecule of water that entered the system at time $t_{in}$ is still inside the control volume ($x = S$), or is revealed as discharge ($x = Q$) or as evapotranspiration ($x = ET$)

Equation (1) in Benettin et al. is therefore written as
$$
\frac{\partial S_T(T,t)}{\partial t} + \frac{\partial S_T(T,t)}{\partial T} = J(t) - Q(t) \Omega_Q(S_T(T,t),t)-ET(t) \Omega_{ET}(S_T(T,t),t) \ \ \ \ (2)
$$
Where:

  • $T$ is the residence time (they call it water age, but this could be a little misleading because, by their own theory, the age of the water could be different in storage, discharge and evapotranspiration)
  • $S_T$ is the age-ranked storage, i.e. “the cumulative volumes of water in storage as ranked by their age” (I presume the word “cumulative” implies some integration; after thinking a while and looking around, also at the paper by van der Velde et al. 2012, I presume the integration is over all the travel times up to $T$, which, because the variable of integration in my notation is $t_{in}$, means that $t_{in} \in [t-T, t]$)
  • $J(t)$ is the precipitation rate at time $t$
  • $Q(t)$ is the discharge rate at time $t$
  • $\Omega_x$ are the integrated SAS functions, which are more extensively derived below.

In fact, this (2) should be just an integrated version (integrated over $t_{in}$) of equation (9) of Rigon et al., 2016:
$$
\frac{ds(t,t_{in})}{dt} = j(t,t_{in}) - q(t,t_{in}) -et(t,t_{in})
\ \ \ \ (3)
$$
where:
  • $s(t,t_{in})$ is the water stored in the control volume at time $t$ that was injected at time $t_{in}$
  • $j(t,t_{in})$ is the water input, which has age $T = t - t_{in}$
  • $q(t,t_{in})$ is the discharge that exits the control volume at time $t$ and entered it at time $t_{in}$
  • $et(t,t_{in})$ is the evapotranspiration that exits the control volume at time $t$ and entered it at time $t_{in}$
In terms of the SAS and the formulation of the problem given in Rigon et al. (2016), the $\Omega$s can be defined as follows:
\begin{equation}
\Omega_x(T,t) \equiv \Omega_x(S_T(T,t),t) := \int_{t-T}^t \omega_x(t,t_{in})\, p_S(t-t_{in}|t)\, dt_{in} = \int_0^{P_S(T|t)} \omega_x(P_S,t)\, dP_S
\end{equation}
Where the equality ":=" on the l.h.s. is a definition, so the $\Omega$s ($\Omega_Q$ and $\Omega_{ET}$) are this type of object. The identity $\equiv$ stresses that the dependence on $t_{in}$ is mediated by a dependence on the cumulative storage $S_T$, where $T$ is the travel time. As soon as $T \to \infty$, $\Omega_x \to 1$ (which is what is written in equation (2) of Benettin’s paper). This is easily understood because, by definition, $\omega_x(t,t_{in})\, p_S(t-t_{in}|t) \equiv p_x(t-t_{in}|t)$ are probabilities (as deduced from (1)).
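A toy numerical check of this limit (a sketch of mine, with made-up exponential pdfs standing in for $p_S$ and $p_Q$):

    import numpy as np

    # Ages T = t - t_in on a regular grid; the pdf shapes below are toy
    # assumptions, chosen only to illustrate equation (1) and Omega_x -> 1
    T = np.linspace(0.0, 200.0, 4001)
    dT = T[1] - T[0]

    tau_S, tau_Q = 20.0, 10.0                # assumed mean ages (storage, discharge)
    p_S = np.exp(-T / tau_S) / tau_S         # toy residence time pdf p_S(T|t)
    p_Q = np.exp(-T / tau_Q) / tau_Q         # toy travel time pdf p_Q(T|t)

    omega_Q = p_Q / p_S                      # SAS function, from equation (1)

    # Cumulative SAS: integrate omega_Q * p_S (= p_Q) over ages up to T
    Omega_Q = np.cumsum(omega_Q * p_S) * dT

    print(Omega_Q[-1])                       # ~1.0: Omega_Q -> 1 as T grows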
An intermediate passage to derive (2) from (3) requires making explicit the dependence of the age-ranked functions on the probabilities. From the definitions given in Rigon et al., 2016, it is
$$
\frac{d S(t)\, p_S(t-t_{in}|t)}{dt} = J(t)\, \delta (t-t_{in}) - Q(t)\, p_Q(t-t_{in}|t) - ET(t)\, p_{ET}(t-t_{in}|t)
$$
which is equation (14) of Rigon et al. (2016).
Now the integration over $t_{in} \in [t-T, t]$ can be performed to obtain:
$$
S_T(T,t) := \int_{t-T}^{t} s(t,t_{in})\, dt_{in}
$$
and, trivially,
$$
J(t) = J(t) \int_{t-T}^{t} \delta(t-t_{in})\, dt_{in}
$$
while for the $\Omega$s I have already said enough above.
The final step is to make the change of variables that eliminates $t_{in}$ in favor of $T := t - t_{in}$. This actually implies one last transformation. In fact:
$$
\frac{dS_T(t,T(t_{in},t))}{dt} = \frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{\partial T}\frac{\partial T}{\partial t} = \frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{\partial T}
$$
since $\partial T/\partial t = 1$. Assembling all the results, equation (2) is obtained.

Note:
Benettin et al., 2017 redefine the probability $p_S$ as the “normalized rank storage … which is confined in [0,1]”, which seems weird with respect to the Authors’ own literature. In previous papers this $p_S$ was called the backward probability and written as $\overleftarrow{p}_S(T,t)$. Now, probably, they doubt we are talking about a probability. In any case, please read it again: “normalized rank storage … which is confined in [0,1]”. Does it not sound unnatural to say it is not a probability? Especially when you repeatedly estimate averages with it and come out with “mean travel times”? Operationally, it IS a probability. Ontologically, the discussion about whether there is really random sampling, or rather some kind of convoluted determinism in the formation of travel times, can be interesting, but it leads to a dead end. On the same premises we should ban the word probability from the stochastic theory of water flow which, since Dagan, has been enormously fruitful.

This long circumlocution looks to me like the symbol below

or TAFKAP, which was used by The Artist Formerly Known As Prince when he had problems with his record company.
In any case, Authors should pay attention to this never-ending tendency to redefine the problem, because it can look like what Fisher (attribution by Box, 1976) called mathematistry. This is fortunately not the case for the paper we are talking about. But then, why not stick with the established notation?

The Authorea version of this blog post can be found here.

References 

Tuesday, October 31, 2017

Meledrio, or a simple reflection on Hydrological modelling - Part VI - A little about calibration

The normal calibration strategy is to split the data we want to reproduce into two sets:

  • one for the calibration phase
  • one for the "validation" phase
Let's assume that we have an automatic calibrator. It usually:
  • generates a set of model parameters,
  • estimates the discharges with the rainfall-runoff hydrological model and any given set of parameters,
  • compares what is computed with what is measured, by using a goodness-of-fit indicator,
  • keeps the set of parameters that gives the best performance,
  • repeats the operation a huge number of times (using some heuristics to search for the best set overall).

This set of parameters is the one used for "forecasting" and

  • is now used against the validation set to check its performance.
However, my experience (with my students, who usually perform it) is that the best parameter set in the calibration procedure is not usually the best in the validation procedure. So I suggest, at least as a trial and for further investigation, to:

  • separate the initial data set into 3 parts (one for a first calibration, one for selection, and one for validation);
  • select the 1% (or x%, where x is left to your decision) best performing parameter sets of the calibration phase (the behavioural set); then further sieve, among these, the 1% best performing in the selection phase (one over 10^4 of the original sets);
  • use this 1 per ten thousand in the validation phase.
The hypothesis to test is that this three-step way of calibrating usually returns better performances in validation than the original two-step one; a minimal sketch of the procedure is given below.
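In Python, the three-step sieve could look like the following minimal sketch; the model callable, the data arrays, and the use of the Nash-Sutcliffe efficiency as goodness-of-fit indicator are placeholder assumptions of mine:

    import numpy as np

    def nse(sim, obs):
        # Nash-Sutcliffe efficiency: 1 is perfect, lower is worse
        return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

    def three_step_calibration(model, forcing, obs, param_sets, frac=0.01):
        # 1. split the record into calibration / selection / validation thirds
        n = len(obs)
        cal = slice(0, n // 3)
        sel = slice(n // 3, 2 * n // 3)
        val = slice(2 * n // 3, n)

        # 2. keep the best `frac` of the parameter sets in the calibration
        #    phase: the "behavioural" set
        scores_cal = [nse(model(p, forcing)[cal], obs[cal]) for p in param_sets]
        k = max(1, int(frac * len(param_sets)))
        behavioural = [param_sets[i] for i in np.argsort(scores_cal)[-k:]]

        # 3. sieve again: best `frac` of the behavioural set on the selection phase
        scores_sel = [nse(model(p, forcing)[sel], obs[sel]) for p in behavioural]
        k2 = max(1, int(frac * len(behavioural)))
        sieved = [behavioural[i] for i in np.argsort(scores_sel)[-k2:]]

        # 4. check the performance of the surviving set(s) on the validation phase
        scores_val = [nse(model(p, forcing)[val], obs[val]) for p in sieved]
        return sieved, scores_val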

Sunday, October 29, 2017

Open Science Framework - OSF

I recently discovered OSF, the Open Science Framework. My students told me that many tools of this type exist: on-line tools that leverage the cloud to store material and help groups manage their workflow. However, OSF seems particularly well suited to groups of scientists, since it links to various science-oriented services, like Mendeley, Figshare, Github and others. An OSF “project” can contain writings, figures, code, and data. All of this can be uploaded for free to their servers, or be maintained in one of your cloud storages, like Dropbox or GoogleDrive.

To start, you can take one hour of your time to follow one of their YouTube videos, like the one below.

Their web page also contains some useful guides that cover the rest (do not hesitate to click on the icons: they contain useful material!). The first one you can start with is the one about the wiki, a customizable initial page that appears in any project or sub-project. There are some characteristics that I want to emphasize here. Starting a new project is easy, and once you have learned how to do it, you have almost learned all of it. Any project can have subprojects, called “components”. Each component behaves like a project by itself, so, when dealing with it, you do not have to learn anything really new. Any (sub)project can be private (the default) or public, separately, and therefore your global workflow can contain both private and public stuff.

Many people are working with OSF. For instance, Titus Brown’s Living in an Ivory Basement blog has some detailed reviews of it. They also coded a command-line client for downloading files from OSF, which can be further useful; a small example of its use is sketched below.
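For instance, assuming the Python API documented for the osfclient package (the project id below is a made-up placeholder), listing and downloading the files of a public project could look like this:

    import os
    from osfclient import OSF

    # Connect anonymously (enough for public projects) and open a project;
    # 'abc12' is a placeholder for a real five-character OSF project id
    osf = OSF()
    project = osf.project('abc12')

    # Walk the default OSF storage and download every file it contains
    storage = project.storage('osfstorage')
    for file in storage.files:
        print(file.path)
        local_name = file.path.lstrip('/')
        os.makedirs(os.path.dirname(local_name) or '.', exist_ok=True)
        with open(local_name, 'wb') as fp:
            file.write_to(fp)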