## Technical Article - Optimal Parameter Estimation in NZEB Renovation Projects

**Authors**:

- **I.T. Christou**, NetCompany-Intrasoft, Luxembourg, Luxembourg
- **M. Founti**, National Technical University of Athens, Athens, Greece
- **M. Logothetis**, NetCompany-Intrasoft, Luxembourg, Luxembourg
- **A. Grigoropoulos**, NetCompany-Intrasoft, Luxembourg, Luxembourg
- **C. Stavrogiannis**, NetCompany-Intrasoft, Luxembourg, Luxembourg
- **I. Atsonios**, National Technical University of Athens, Athens, Greece

*Note: opinions in the articles are of the authors only and do not necessarily reflect the opinion of the EU.*

**Introduction**

The EU-funded project PLURAL aims to develop a number of key technologies for the **deep renovation of buildings** in EU member states, with the main goal of achieving **near-zero energy buildings**, better known as NZEB. To this end, PLURAL develops three major categories of Plug-and-Use kits (‘PnU kits’). Each kit comes in a wide variety of optional configurations, and each configuration results in different performance characteristics depending on the building in which the kit is installed; evaluating a configuration for a particular building therefore requires a new, potentially expensive simulation of that building.

In addition to the three main technology pillars, another important aspect of the project is the development of a **Decision Support Tool** (MODEST) **that allows users to sort through the collected simulation data as well as real-life building monitoring data** and see the configuration choices that have led to the best values of the KPIs the user has selected as sorting criteria. The **MODEST platform** is a cloud-based platform comprising a distributed message bus infrastructure built on top of Apache Kafka and a state-of-the-art NoSQL time-series database (InfluxDB) that stores building monitoring data as well as weather data retrieved from an external third-party platform through its query APIs.

While the main functionalities of the tool, as exposed in a web-based **Graphical User Interface (GUI)**, are sufficient to provide the user with all the data the system manages, certain queries cannot be answered with the standard query tools available in the infrastructure. For example, no standard Structured Query Language (SQL) query can readily be formulated to answer the question of which minimal sets of parameter configurations always lead to NZEB status. Yet this information is important for users who seek to understand which variables play the most important roles in driving the energy requirements of a building to **NZEB**, or even **Zero Energy Building (ZEB)** status, when using a specific technology.

**Computing all settings that lead to NZEB**

Figure 1 depicts the main information screens of the MODEST tool (https://modest.plural.rid-intrasoft.eu; user credentials are required to log in).

*Figure 1. MODEST Tool Screens: top left, the tool home screen; top right, the sensor monitoring screen; bottom left, a graphical representation of time-series sensor monitoring data; bottom right, the export-to-CSV functionality of the tool.*

Figure 2 shows the selection process of alternative PnU configurations based on the desired KPIs. The important parameters used to determine a number of Key Performance Indicators (KPIs) of the renovation outcome are shown in a table together with the computed KPI values; each row gives the KPIs obtained for one particular set of parameter values of the PnU kit.

*Figure 2. MODEST Tool Main Search Screenshot*

While the GUI (illustrated in Figures 1 and 2) is intuitive and reminiscent of a spreadsheet, the pilot users in our project were quick to point out a better way of ‘combing through’ the data: instead of having to look through different sortings of the same data to eventually locate a row of interest, the user could, at the press of a button, be shown all minimal combinations of parameter settings that lead to a particular desired outcome, e.g. reaching NZEB status. The same computation works equally well for ZEB (Zero Energy Building, as opposed to Near-Zero Energy Building) status, provided that ZEB status information is available.

The above-mentioned functionality is illustrated here with a contrived example. Consider the following *model* Table 1 of parameters and their values, with a final column indicating whether NZEB status has been reached.

*Table 1: Model Example of PnU Parameters with Calculated NZEB KPI Result*

From this table, one could easily conclude that, for any PnU kit, whenever the insulation material is ‘Mineral Wool’, the NZEB status is true. However, other combinations also appear to result in a near-zero energy building. For example, in the above dataset, whenever the insulation thickness equals 160 mm, the building reaches NZEB with 100% confidence; this result has 2/5 = 40% support, the same as the previous result, which also holds with 40% support and 100% confidence. Other combinations, such as ‘Insulation Material’ = ‘Mineral Wool’ AND ‘PV?’ = ‘Yes’, also always lead to NZEB status being true, but they are redundant: whenever such a rule holds, the smaller base rule ‘Insulation Material’ = ‘Mineral Wool’ already holds.
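To make the support and confidence figures above concrete, the following Python sketch computes both measures for a rule of the form ‘conditions ⇒ NZEB = true’, together with a minimality test that flags the ‘Mineral Wool AND PV’ rule as redundant. Table 1 itself is not reproduced here, so the five records below are hypothetical, chosen only so that the two single-condition rules discussed in the text again have 40% support and 100% confidence. Support is taken, as is common, to be the fraction of all records matching both antecedent and consequent; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical records invented for illustration (NOT the article's Table 1).
ROWS = [
    {"Insulation Material": "Mineral Wool", "Insulation Thickness": 160, "PV?": "Yes", "NZEB": True},
    {"Insulation Material": "Mineral Wool", "Insulation Thickness": 100, "PV?": "No",  "NZEB": True},
    {"Insulation Material": "EPS",          "Insulation Thickness": 160, "PV?": "Yes", "NZEB": True},
    {"Insulation Material": "EPS",          "Insulation Thickness": 100, "PV?": "Yes", "NZEB": False},
    {"Insulation Material": "EPS",          "Insulation Thickness": 100, "PV?": "No",  "NZEB": False},
]

def fires(row, antecedent):
    """True if `row` satisfies every `attribute == value` condition."""
    return all(row[attr] == value for attr, value in antecedent.items())

def support_confidence(rows, antecedent, target="NZEB"):
    """(support, confidence) of the rule `antecedent => target is True`."""
    covered = [r for r in rows if fires(r, antecedent)]
    hits = sum(1 for r in covered if r[target])  # antecedent AND consequent
    support = Fraction(hits, len(rows))
    confidence = Fraction(hits, len(covered)) if covered else Fraction(0)
    return support, confidence

def is_minimal(rows, antecedent, target="NZEB"):
    """True if no proper, non-empty subset of the conditions also has 100% confidence."""
    items = list(antecedent.items())
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            if support_confidence(rows, dict(subset), target)[1] == 1:
                return False
    return True

print(support_confidence(ROWS, {"Insulation Material": "Mineral Wool"}))  # (Fraction(2, 5), Fraction(1, 1))
print(support_confidence(ROWS, {"Insulation Thickness": 160}))            # (Fraction(2, 5), Fraction(1, 1))
print(is_minimal(ROWS, {"Insulation Material": "Mineral Wool", "PV?": "Yes"}))  # False
```

Exact fractions are used so that a confidence of exactly 100% can be tested with `== 1` without floating-point rounding concerns.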

The problem, therefore, is to derive all minimal combinations of parameters and parameter values for which the NZEB status is always (100% of the time) true. One way to do this would be a brute-force algorithm that exhaustively searches all parameters and their values to find the combinations for which NZEB is always true and then, provided that these antecedent conditions have sufficient support and confidence, shows these parameter-value settings to the user so that they can focus on a particular set of parameters. Such a computation would require $O(N_1^2 \times \dots \times N_k^2)$ steps, where $N_i$, $i = 1, \dots, k$, is the number of distinct values that the $i$-th parameter takes in the dataset; for the dataset shown in Table 1, for example, $N_1 = N_2 = N_3 = N_4 = N_5 = 2$. In a dataset with 10 parameters, each of which takes 10 different values, the total number of computations is of the order of $(10^2)^{10} = 10^{20}$, i.e. one hundred billion billion steps. Even on a latest-generation Intel Core CPU capable of approximately 1 TFLOPS (one trillion floating-point operations per second), and assuming each step costs a single floating-point operation, the program would have to run non-stop for $10^{20} / 10^{12} = 10^8$ seconds, or approximately 3.17 years, before producing the answer, and this is for a model-sized problem.
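The cost estimate above is simple arithmetic; the short sketch below reproduces it under the same optimistic assumption of one floating-point operation per search step, using a 365-day year. The helper names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s (leap years ignored)

def brute_force_steps(distinct_counts):
    """Search-space size prod(N_i^2), where N_i is the number of distinct
    values of parameter i in the dataset."""
    return math.prod(n * n for n in distinct_counts)

def runtime_years(steps, ops_per_second=1e12):
    """Wall-clock years, assuming one step costs one FP operation."""
    return steps / ops_per_second / SECONDS_PER_YEAR

print(brute_force_steps([2] * 5))     # Table 1: five parameters, two values each -> 1024
steps = brute_force_steps([10] * 10)  # ten parameters, ten values each
print(f"{steps:.1e} steps, ~{runtime_years(steps):.2f} years at 1 TFLOPS")
# 1.0e+20 steps, ~3.17 years at 1 TFLOPS
```

The small Table 1 case needs only 1024 steps; it is the multiplicative growth with the number of parameters that makes the exhaustive approach impractical.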

However, it is also possible to use an algorithm called QARMA (Christou, Amolochitis & Tan, 2018; Christou, 2019) to extract all such minimal (and non-dominated) Quantitative Association Rules from the dataset with the consequent clause ‘NZEB=true’, and to use the rules’ antecedents as the combinations of parameter settings that lead to NZEB being true. The expression ‘non-dominated’ means that no rule in the result set dominates any other rule in the result set. Without going into the mathematical definitions (see Christou, Amolochitis & Tan (2018) for a detailed treatment), a rule $r_1$ is said to dominate another rule $r_2$ if $r_1$ has the same or better support and confidence on the dataset than $r_2$, both rules indicate the same NZEB status, and, in addition, whenever $r_2$ ‘fires’, $r_1$ also fires; $r_2$ is then ‘redundant’, in that it provides no information that the dominating rule does not already provide. The example with the ‘Insulation Material’ and ‘PV?’ variables above should give a good indication of the notion of dominance between rules. To make things clear, however, we provide another example, this time based on the numeric nature of the ‘Insulation Thickness’ variable. Consider the rule $r_1$ that states

‘Insulation Thickness’ ≥ 160 then NZEB=YES.

The rule holds with support 40% and confidence 100%. Now, consider another rule $r_2$ that states

‘Insulation Thickness’ = 160 then NZEB=YES

The rule $r_2$ has the same support and confidence values as $r_1$ but is dominated by it, because whenever $r_2$ fires, $r_1$ also fires and has the same consequent as $r_2$. Therefore, $r_1$ dominates $r_2$, and $r_2$ is not included in the final result set.
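The dominance test can be sketched in a few lines of Python. This is an illustrative encoding, not QARMA's implementation: each antecedent condition is assumed to be either a closed numeric interval `(lo, hi)` or a `frozenset` of admissible categorical values, and ‘$r_1$ fires whenever $r_2$ fires’ is checked conservatively through containment of the conditions.

```python
import math
from dataclasses import dataclass

# Illustrative encoding (an assumption, not QARMA's internal representation):
# a condition is a closed numeric interval (lo, hi) or a frozenset of
# admissible categorical values.

def condition_contains(outer, inner):
    """True if every value satisfying `inner` also satisfies `outer`."""
    if isinstance(outer, frozenset):
        return isinstance(inner, frozenset) and inner <= outer
    if isinstance(inner, frozenset):
        return False
    (lo_out, hi_out), (lo_in, hi_in) = outer, inner
    return lo_out <= lo_in and hi_in <= hi_out

@dataclass
class Rule:
    antecedent: dict   # attribute name -> condition
    consequent: bool   # predicted NZEB status
    support: float
    confidence: float

def fires_whenever(r2, r1):
    """Conservative test that r1 fires whenever r2 fires: each condition of r1
    must contain r2's condition on the same attribute."""
    return all(attr in r2.antecedent and condition_contains(cond, r2.antecedent[attr])
               for attr, cond in r1.antecedent.items())

def dominates(r1, r2):
    """r1 dominates r2, following the informal definition in the text."""
    return (r1.consequent == r2.consequent
            and r1.support >= r2.support
            and r1.confidence >= r2.confidence
            and fires_whenever(r2, r1))

r1 = Rule({"Insulation Thickness": (160, math.inf)}, True, 0.40, 1.0)  # >= 160
r2 = Rule({"Insulation Thickness": (160, 160)}, True, 0.40, 1.0)       # == 160
print(dominates(r1, r2), dominates(r2, r1))  # True False
```

The support and confidence values are the ones quoted in the text. Because the containment test is conservative, it may fail to recognise some dominance relations (for instance when $r_1$ places an unbounded condition on an attribute that $r_2$ does not mention), but it never reports a false one.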

QARMA is a parallel/distributed, breadth-first algorithm that uses appropriate shortcuts in its search process while still guaranteeing that all valid rules are discovered. In our model example above, when QARMA is run with the minimum required support set to 40% and the minimum confidence set to 100%, the system correctly identifies the following two rules:

- ‘Insulation Material’ = ‘Mineral Wool’ then NZEB=YES
- ‘Insulation Thickness’ ≥ 160 then NZEB=YES

These rules in turn yield the two correct parameter combinations: (‘Insulation Material’ = ‘Mineral Wool’) and (‘Insulation Thickness’ ≥ 160).
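For intuition about how a breadth-first search can find minimal rules without enumerating every combination, the sketch below implements a generic, Apriori-style level-wise search. It is *not* QARMA: it handles only categorical ‘attribute = value’ conditions, runs on a single machine, and uses two standard shortcuts, namely discarding a candidate whose support falls below the threshold (adding conditions can only lower support) and never extending a candidate that already reaches 100% confidence (any extension would be dominated). The records are hypothetical, invented for illustration, and are not Table 1.

```python
# Hypothetical records invented for illustration (NOT the article's Table 1).
ROWS = [
    {"Insulation Material": "Mineral Wool", "Insulation Thickness": 160, "PV?": "Yes", "NZEB": True},
    {"Insulation Material": "Mineral Wool", "Insulation Thickness": 100, "PV?": "No",  "NZEB": True},
    {"Insulation Material": "EPS",          "Insulation Thickness": 160, "PV?": "Yes", "NZEB": True},
    {"Insulation Material": "EPS",          "Insulation Thickness": 100, "PV?": "Yes", "NZEB": False},
    {"Insulation Material": "EPS",          "Insulation Thickness": 100, "PV?": "No",  "NZEB": False},
]

def minimal_rules(rows, target="NZEB", min_support=0.4):
    """Level-wise (breadth-first) search for all minimal conjunctions of
    `attribute == value` conditions under which `target` is always True.
    Assumes min_support > 0."""
    n = len(rows)
    attrs = [a for a in rows[0] if a != target]
    items = {(a, row[a]) for row in rows for a in attrs}
    found = []                                     # minimal rules found so far
    level = {frozenset([item]) for item in items}  # single-condition candidates
    while level:
        next_level = set()
        for conds in level:
            if any(rule <= conds for rule in found):
                continue                           # contains a smaller rule: dominated
            covered = [r for r in rows if all(r[a] == v for a, v in conds)]
            hits = sum(1 for r in covered if r[target])
            if hits / n < min_support:
                continue                           # adding conditions can only lower support
            if hits == len(covered):
                found.append(conds)                # 100% confidence and minimal
                continue                           # extensions would be dominated
            used = {a for a, _ in conds}
            for item in items:
                if item[0] not in used:
                    next_level.add(conds | {item})
        level = next_level
    # Sort conditions within and across rules for a deterministic result.
    return sorted(sorted(rule) for rule in found)

print(minimal_rules(ROWS))
# [[('Insulation Material', 'Mineral Wool')], [('Insulation Thickness', 160)]]
```

Because every attribute is treated as categorical, the second rule found here is the equality ‘Insulation Thickness’ = 160 rather than the interval rule ≥ 160 that quantitative conditions can express.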

The discussion so far has treated the parameters as categorical variables that can only take certain discrete values; however, as is clear from the context, certain variables are numeric in nature, and conditions on them can be expressed as intervals. Such is the case in our model example with the variable ‘Insulation Thickness’, measured in millimetres. QARMA is capable of computing all the rules that hold in the dataset regardless of whether the variables are categorical or numeric. Moreover, the algorithm is uniquely suited to the task at hand, as almost all other algorithms for discovering quantitative association rules from data (e.g. Karel (2006), Salleb-Aouissi, Vrain & Nortet (2007), Minaei-Bidgoli, Barmaki & Nasiri (2013)) are heuristic in nature: they do not guarantee (nor do they aim at) the extraction of all rules that hold in the dataset, and would therefore present incomplete information to the end user.
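One way to see why numeric variables make exhaustive search expensive: on a given dataset, an interval condition that selects at least one record selects exactly the records whose values fall in some contiguous run of the $N$ sorted distinct observed values, so it suffices to consider intervals whose endpoints are observed values. There are $N(N+1)/2 = O(N^2)$ of them, consistent with the squared per-parameter factor in the brute-force estimate. The hypothetical helper below enumerates them.

```python
def candidate_intervals(values):
    """All closed intervals [lo, hi] with endpoints among the observed values.

    Any interval condition that selects at least one record selects the same
    records as one of these, so N distinct values give N * (N + 1) // 2 candidates.
    """
    distinct = sorted(set(values))
    return [(lo, hi) for i, lo in enumerate(distinct) for hi in distinct[i:]]

thickness = [160, 100, 160, 100, 100]        # hypothetical column, in mm
print(candidate_intervals(thickness))        # [(100, 100), (100, 160), (160, 160)]
print(len(candidate_intervals(range(10))))   # 55 candidates for N = 10
```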

**Conclusions**

Overall, a data mining algorithm for rule extraction from multi-dimensional datasets has been applied to lighten the cognitive load imposed by the user interface of the PLURAL project's IT infrastructure, improving the overall user experience for project users. The algorithm itself scales up and out very efficiently and requires far fewer computations than a brute-force algorithm or the heuristic algorithms mentioned above.

**References**

- I.T. Christou, E. Amolochitis, Z.-H. Tan (2018), “A parallel/distributed algorithmic framework for mining all quantitative association rules”, arXiv preprint arXiv:1804.06764, https://arxiv.org/ftp/arxiv/papers/1804/1804.06764.pdf
- I.T. Christou (2019), “Avoiding the hay for the needle in the stack: Online rule pruning in rare events detection”, IEEE Intl. Symp. on Wireless Communication Systems, Special Session on Energy IoT, Oulu, Finland, 27/8/2019, pp. 661-665.
- F. Karel (2006), “Quantitative and ordinal association rules mining (QAR mining)”, In: Proc. 10th Intl. Conf. on Knowledge-based Intelligent Information and Engineering Systems (KES ’06), pp. 195-202.
- A. Salleb-Aouissi, C. Vrain, C. Nortet (2007), “QuantMiner: a genetic algorithm for mining quantitative association rules”, In: Proc. International Joint Conference on Artificial Intelligence (IJCAI ’07).
- B. Minaei-Bidgoli, R. Barmaki, M. Nasiri (2013), “Mining numerical association rules via multi-objective genetic algorithms”, Information Sciences, vol. 233, pp. 15-24.