dinsdag 31 oktober 2017

Improving transparancy of scenario analysis in GAMS



I have recently gotten back to work with GAMS after having spent most of the last 6 years working with R, and I quickly got struck in the typical trap of trying to do things in GAMS that I had gotten used to do with R – just as most of my early R code was basically a translation from GAMS. (Now, let me get one thing straight from the outset: I think discussions about which of the two is the superior tool are meaningless, and I am not going to spend any time on this type of discussions. Both have been designed to deal with specific issues, and they are well suited for the uses they have been built for.)

However, after some years of complete immersion in R, I did get use to take some of its features for granted, and found it frustrating I couldn’t implement them in the same way in GAMS.

Take for instance the issue of comparing different scenarios. Of course, GAMS allows you to “loop” through different scenarios. For instance, your input data can be organised in Excel files where different ranges correspond to different scenarios. This works fine in a relatively simple setting where the nature of the work implies that the number of alternative scenarios is limited and known in advance. 

However, this is not always the case. Take for instance the PLANET model , which has been developed by the Belgian Federal Planning Bureau for long term forecasts of transport demand in Belgium. The model has been developed initially around 2008. In the last ten years, the policy context has evolved, the range of transport technologies has expanded, new transport solutions have emerged, the research questions themselves have changed… Ideally, one would like to have a core model that is relative stable even when the scenarios under investigations change. If the scenarios are dug deeply in the code of GAMS (or in external files), this increases the risk of errors as time goes by. Moreover, these errors are unlikely to yield any error messages, and one may well end up publishing erroneous results. 

Similarly, at the output side, one would like to have spreadsheets or CSV files, with a name that clearly indicates to which scenario they correspond – which, as a side benefit, would make subsequent analysis with R more convenient.

Fortunately, there is a relatively simple work-around: when you call GAMS from the command line, you can use control variables to navigate through scenarios, and produce appropriately named output files for each scenario. An additional advantage of this method is that you can also call R scripts (or Python, depending on your preferences) from the command line. This allows you to move any pre- or post-processing of your data and results from your spreadsheets to the command line, increasing the transparency and the reproducibility of your work. 

For instance, the following code is a stylized version of a script I have recently implemented in PowerShell – obviously, similar code can easily be written in bash if you work on a Linux system:


gams definitions s=defin
$Scenarios = ("BAU","SC1")
foreach ($Sc in $Scenarios ){
gams step2 r=defin,s=save2 --Yr=2012 --YrPrev=2011 --YrNxt=2013 --scen=$Sc
gams step3 r=save2,s=save3 --Yr=2012 --YrPrev=2011 --YrNxt=2013 --scen=$Sc
gams step4 r=save3,s=save4
$Years = (2013..2016)
foreach ($Yr in $Years ) {
gams step2 r=save4,s=save2 --Yr=$Yr --YrPrev=$($Yr-1) --scen=$Sc
gams step3 r=save2,s=save4 --Yr=$Yr --YrNxt=$($Yr+1--scen=$Sc
}
gams step5 r=save4 --scen=$Sc
}

In this example, definitions, step2, step3, step4 and step5 are gms files. 

The model is thus composed of:
·         An initialisation step definitions, where all relevant definitions are given that are stable across scenarios and through time.
·         For each scenario, step2 and step3 are first runs for a base year, followed by a non-repeated run of step4.
·         Next, the model loops through the years for step2 and step3, followed by,
·         step5, where all the results for this scenario are saved as Excel files

Although the code is mostly self-explanatory, some details are noteworthy:

  • With $include, GAMS offers the possibility to call one gms script from another gms script. In the approach proposed here, all gms scripts are called from PowerShell (or bash).
  • The ‘save’ and ‘restart’ options need always to be put before the other control variables.
  • In the loop over the years, information is accumulated in the model. Therefore, the ‘save’ file for the last gms script in each iteration should be the ‘restart’ for the first script. For each scenario, however, we start again from the ‘save’ file of the initialisation step, and there is no need to ‘save’ the results from the final step. 

The crucial change, however, is that control parameters now fulfill a double role:

  • On the one hand, they are used to load input data files and to create output files whose names refer to the scenario and/or the year. For instance, for scenario “BAU” the following code will create an output file ResultsBAU.xlsx, where the parameter of interest MyPar will be written to the sheets of the same name, starting from cell A1:
execute_unload "Results%scen%.gdx";
execute 'gdxxrw Results%scen%.gdx Output=C:\MyModel\output\Results%scen%.xlsx par=MyPar rng= MyPar!A1';

  • On the other hand, these control parameters can also be used as labels. Thus, if the command line ‘feeds’ Yr=2012 to the GAMS script, then GAMS will correctly ‘translate’ LHS(set1;”label1”,set2) = RHS(set1,”label1”,set2, ,"%Yr%") as LHS(set1;”label1”,set2) = RHS(set1,”label1”,set2, ,"2012").
  • One subtle complication concerns operations on "%Yr%" within the GAMS script. For instance, to build on the previous example, "%yearp% + 1" will not be ‘translated’ in “2013” inside the GAMS script. If you wish the script to use label referring to the next year, you need to add this as a control variable to the command line. In the example above, we have used --YrNxt=$($Yr+1). In the script, PAR1(set&,"label1","%YrNxt%") will then be translated correctly to PAR1(set&,"label1","2013") if $Yr=2012.