Assessing the quality of the requirements specification by applying GQM approach and using NLP tools

Software requirements are difficult to measure in terms of quality without reviews and the subjective opinions of stakeholders. Automated quality assessment of specifications saves project resources and prevents latent defects in software. Requirements quality can be evaluated against a wide variety of attributes, but their meaning remains vague unless they are mapped to specific measurement metrics. Applying the goal-question-metric (GQM) approach in the quality model helps to choose the most important quality attributes and to map them to metrics that can be collected and calculated automatically. The text of software requirements written in natural language can be analyzed by NLP tools to identify weak single words and phrases that make statements ambiguous. Metrics for such quality attributes as ambiguity, singularity, subjectivity, completeness, and readability are proposed in this work. The quality model was implemented in a prototype by adopting natural language processing techniques for requirements written in the Russian language with the support of an external API.


Introduction
Low-quality requirements may have expensive consequences during the software development lifecycle. Issues in requirements, such as ambiguity and incompleteness, may lead to time and cost overruns in the project, especially if iterations are long and feedback comes too late: the faster a problem is found, the cheaper it is to fix. However, it is not easy to detect automatically whether a requirements specification lacks clarity. Some of these issues require specific domain knowledge to be uncovered. For example, it is very difficult to detect with automatic approaches whether a requirements specification is missing necessary features. There is a variety of requirements management techniques, tools, and practices in the software development field, but they should be tailored to the chosen development methods. Requirements engineering assumes that requirements must meet a number of criteria. Descriptions of such criteria can be found in both the scientific and the methodological literature and are laid down in standards. For instance, the IEEE standard [1] for requirements engineering defines quality attributes for a single requirement: necessary, appropriate, feasible, verifiable, correct, conforming, complete, consistent, comprehensible, etc. Several language criteria are also defined for the text of requirements. Unbounded and ambiguous terms should be avoided. Requirements should state 'what' is needed, not 'how'. The standard does not prescribe exact techniques for gathering and validating these requirements quality metrics (this is beyond its scope), but such techniques are the subject of various research efforts. Software requirements in industry are most often written in natural language, which has no formal semantics. This is the main reason why issues in requirements are so hard to detect.
The approach presented in this paper addresses the problem of fast feedback and of obtaining knowledge about a specification's semantics, with concrete symptoms of quality defects in a requirement artefact. Natural language processing (NLP) tools and systems have been applied to the analysis of requirements texts since the 1980s [2]. More and more NLP systems and tools for requirements evaluation have been developed in recent years. NLP is the most appropriate technique for analysing human-written text and collecting useful data about it. Goal-question-metric (GQM) involves defining achievable quality goals, posing questions about how to achieve these goals, and providing metrics to ascertain the progress in attaining them. This research work makes use of GQM by setting goals such as unambiguity, completeness, and readability that the requirements must meet, questions on how to derive these quality attributes, and metrics for determining whether the requirements match the defined goals.
The ultimate goal of this work was to encapsulate the best of these techniques and methods for measuring requirement quality in a single model and to provide a prototype of a tool, with Russian language support, for automated validation of real-world requirements against it. This paper presents a number of quality indicators for natural language requirements in textual form. The identified quality indicators should, first of all, point out concrete defects and provide suggestions for improvement. The proposed tool prototype combines all these indicators and computes quality measures in a fully automated way. This paper describes the workflow (see, e.g., Fig. 1) of a model that helps to obtain low-level quality indicators based on metrics of textual requirements, such as text length, number of ambiguous terms, imperative verbal forms, etc.
Fig. 1. Example of requirement analysis workflow for automated tool conQAT [3]

Related studies
Many publications discuss different problems in requirements specifications. The first problem is automation of the assessment process. Researchers mostly focus on approaches for automated detection of specific defects in requirements specifications without any additional interaction with the user. The main features of such tools include detection of similarity in requirements [4] and ambiguity [5], and detection of missing information, linguistic flaws, and passive sentences [6]. Körner and Brumm presented RESI, a tool that scans documents for linguistic flaws and reports them to the user (see Section II-C). It can be used to detect defects in requirements specifications, but the high number of false positives prohibits the actual use of this tool in a real case [7]. Fabbrini et al. presented the QuARS (Quality Analyzer for Requirement Specifications) tool, which checks a requirements specification by comparing it with predefined word lists [5]. The lists give indicators of problems, and if the number of indicators in a phrase exceeds a given threshold, the requirement is considered ambiguous. Verma et al. presented their RAT (Requirements Analysis Tool) [8], a word processor that is able to analyze natural language requirements based on a user-defined glossary and constrained language. RAT highlights problematic requirements directly in the requirements specification, but this process requires some training by real users. Goldin and Berry [9] implemented a tool called AbstFinder to identify abstractions in natural language text used for requirements elicitation. Lee and Bryant [10] developed an automated system to assist engineers in building a formal representation from informal requirements. The overwhelming majority of authors assume that ambiguity carries a high risk of misunderstanding among different readers. Several studies dealing with ambiguity identification have aimed to help improve the quality of requirements documents.
Some tools have been developed specifically to detect, measure, and reduce possible structural ambiguities in text. In their paper, Yang H., Willis A., De Roeck A., and Nuseibeh B. describe an automated approach for characterizing and detecting ambiguities that was implemented in the NAI (Nocuous Ambiguity Identification) tool prototype [11]. The tool uses a machine learning algorithm to determine whether an ambiguous sentence is nocuous or innocuous, based on a set of heuristics that draw on human judgments collected as training data. The tool focuses on coordination ambiguity. Kamsties et al. [12] described a pattern-driven inspection technique to detect ambiguities in requirements. Mich and Garigliano [13] investigated the use of a set of ambiguity measures for syntactic and semantic ambiguity, implemented in the LOLITA tool using NLP algorithms. Kiyavitskaya et al. [14] proposed a two-step approach in which lexical and syntactic analysis was performed to identify ambiguity. An automated tool was implemented to measure what might potentially be ambiguous in each sentence.
Another important problem related to specification quality assessment is the choice of correct criteria for overall evaluation. Davis et al. evaluated 24 criteria and metrics for determining the overall quality of a requirements specification [15]. Some of the criteria may affect and even contradict each other. Therefore, the authors concluded that a perfect requirements specification does not exist. Another approach was proposed by Wilson et al., who counted the occurrences of certain expressions in a document to evaluate its quality [16], with indicators that include completeness and consistency. This group of researchers developed a tool that focuses on a broader understanding of requirements quality instead of a single aspect. The resulting ARM tool is based on the IEEE 830 standard and aims at developing metrics for requirements quality.
Ambriola and Gervasi [17] developed a web-based NLP tool called Circe, which was designed to facilitate the gathering, elicitation, selection, and validation of requirements. Unlike related works that attempt to evaluate requirements automatically by a single quality criterion, the current paper describes an approach that identifies several quality attributes with corresponding metrics and combines them into one overall evaluation in an automated way. Moreover, all the above-mentioned tools support only the English language and do not support requirements written in Russian.

GQM approach
Goal-Question-Metric is a method based on a system of questions and simple answers about the evaluation of properties [18]. The approach consists of three main steps: specifying goals, identifying relevant attributes, and providing measurements. In the current case, the GQM framework helped to define appropriate metrics and to estimate the quality of requirements. A goal should be defined for an object, with a purpose, from a perspective, in an environment. The overall goal of the current project is to measure the quality of requirements, and it can be formulated with the following template: Analyze requirement quality for the purpose of improving with respect to quality attributes from the viewpoint of project managers in the context of product development.
In addition, several sub-goals were identified, which should be fulfilled to achieve the main goal.
For instance:
Sub-goal: Analyze requirement unambiguity for the purpose of improving with respect to quality attributes from the viewpoint of project managers in the context of product development.
Question: How many vague words and weak phrases make the requirement ambiguous?
Metric: Number of ambiguous words in 1 requirement divided by an average number of words in 1 requirement.
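The goal, question, and metric hierarchy in this example can be pictured as a small data structure. The following is only an illustrative sketch: the class names, the `VAGUE_WORDS` set, and the `evaluate` helper are hypothetical and are not taken from the prototype. As a simplification, the ratio here divides by the word count of the requirement itself rather than by the average number of words per requirement.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    name: str
    compute: Callable[[str], float]  # maps requirement text to a number


@dataclass
class Question:
    text: str
    metrics: List[Metric] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    questions: List[Question] = field(default_factory=list)


# Hypothetical mini-dictionary of vague words (illustration only).
VAGUE_WORDS = {"fast", "easy", "appropriate", "minimal", "significant"}


def ambiguous_word_ratio(requirement: str) -> float:
    """Share of the requirement's words that appear in VAGUE_WORDS."""
    words = [w.strip(".,;:!?()\"'").lower() for w in requirement.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in VAGUE_WORDS for w in words) / len(words)


unambiguity_goal = Goal(
    purpose="Analyze requirement unambiguity for the purpose of improving",
    questions=[
        Question(
            text="How many vague words and weak phrases make the requirement ambiguous?",
            metrics=[Metric("ambiguous word ratio", ambiguous_word_ratio)],
        )
    ],
)


def evaluate(goal: Goal, requirement: str) -> Dict[str, float]:
    """Compute every metric attached to the goal's questions."""
    return {
        metric.name: metric.compute(requirement)
        for question in goal.questions
        for metric in question.metrics
    }


result = evaluate(unambiguity_goal, "The system shall be fast and easy to use.")
```

Keeping metrics as callables attached to questions mirrors the top-down refinement of GQM: a new sub-goal only needs new questions and metric functions, while the evaluation loop stays unchanged.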
In this approach, identifying the questions and metrics makes it possible to clarify the goals, to achieve transparency, and to state how and why the goals are supposed to be achieved. The clarification becomes more concrete when moving from the top level to the bottom and helps to avoid abstract, unrealistic goals [19]. The GQM approach is supported by several specific methodological phases (Fig. 2) [19].

Fig. 2. Phases of Goal/Question/Metric approach
• Definition - goals, questions, metrics, and hypotheses are defined and documented. The main attributes, formulas, measurement approaches, and exact metrics are defined.
• Data Collection - searching for and counting ambiguous words and other quality indicators in the source text.
• Interpretation - the collected data is processed into quality measurement results that provide answers to the defined questions in order to reach the goal [19].
The GQM approach in the current case consists of 5 main steps (Fig. 3) [19].
• Business goal setting. The main purpose in this case is the automated evaluation of the quality of software requirements by a range of attributes.
• Generating questions. Breaking down goals into components and defining them in a quantifiable way, e.g., «To what extent should the sentence structure be complied with so that the requirement has the quality property of completeness?»
• Specifying measures. Identifying the metrics that should be collected to answer the questions, e.g., «Percentage of matching requirement sentence structure template».
• Defining the mechanism of data collection. Measures are collected by semantic and syntactic text analysis based on matching words against predefined dictionaries.
• Gathering and analyzing the collected data. The calculated metrics should be interpreted as a quality estimation for each requirement and for the overall Software Requirements Specification (SRS).
Timoshchuk E.V. Assessing the quality of the requirements specification by applying GQM approach and using NLP tools. Trudy ISP RAN/Proc. ISP RAS, vol. 32, issue 2, 2020. pp. 15-28
Measuring quality in software development has its own difficulties. In order to understand the effects of actions taken during software development and to see how improvements can be made in future processes, the measurement process should be given a definite purpose. The purpose may be:
1. Understanding the product requirements. Correct measurements make it possible to see a graphical or mathematical representation of the requirement elicitation process, for example the time spent on describing every feature.
2. Controlling the product requirements. With a graphical representation of the SRS document, the relations between different requirements and user actions can be identified, which in turn makes it possible to control the impact on the development process as a whole.
3. Improving the product requirements. This can be achieved once control of the development processes is gained. Improvements can then be applied to processes, variables, and their relationships.
Metrics for the requirements should make it possible to determine their quality for the current development process and to represent the collected data graphically.

Quality attributes
Many authors have already defined the key interdependent quality attributes in their methodologies (Fig. 4) [20].
Fig. 4. Dependencies between qualitative attributes [6]
• Validity - the clients should be able to validate (confirm) the requirement according to their needs.
• Verifiability - the engineer must be able to verify that the system-to-be meets the specified requirement.
• Modifiability - requirements must be easy to modify during maintenance.
• Completeness - all of the client's needs must be covered.
• Consistency - there should be no contradictions among requirements.
• Understandability - the requirements are correctly understood without difficulty.
• Unambiguity - there exists only one interpretation of the requirement (no ambiguous words in the requirement sentence).
• Traceability - there exists an explicit relationship of each requirement with design, implementation, and testing artefacts.
• Singularity - each requirement is clearly determined and identified, without mixing it with other requirements.
The above-mentioned attributes follow from the types of unbounded or ambiguous terms that should be avoided according to the standard:
• superlatives ('best', 'most');
• subjective language ('user friendly', 'easy to use', 'cost-effective');
• vague pronouns ('it', 'this', 'that');
• ambiguous terms such as adverbs and adjectives ('almost always', 'significant', 'minimal') and ambiguous logical statements ('or', 'and/or');
• open-ended, non-verifiable terms (such as 'provide support', 'but not limited to', 'as a minimum');
• comparative phrases ('better than', 'higher quality');
• loopholes ('if possible', 'as appropriate', 'as applicable');
• terms that imply totality ('all', 'always', 'never', and 'every').
Unambiguity. It requires that only one semantic interpretation of the requirement exists. To evaluate the ambiguity of each requirement, we propose to use dictionaries containing sets of words that indicate ambiguity in the requirement [21] [22]. There are several types of them (Table 1).
• Connective words dictionary - ("and", "or", "but", "however", "otherwise", "even", "although", etc.). The use of such words is not a problem in itself, but their too frequent use decreases the quality of requirements, especially in terms of singularity and unambiguity.
• Negative adverbs dictionary - the negative particle is the word "not", used to indicate negation, denial, refusal, or prohibition. Repeated use of such words makes a sentence difficult to understand and reduces the unambiguity of the requirement.
• Anaphoric expressions dictionary - expressions whose interpretation depends on other expressions previously encountered in the text, for example: "which", "he", "she", "it", "they", "where", "this", "that", etc. Requirements containing anaphora usually lack clarity and unambiguity.
• Undefined terms dictionary - in addition to connective words, the quality of requirements is significantly affected by the use of vague terms that lead to ambiguity.
The metric for assessing ambiguity is calculated from two quantities: the number of words in the requirement and the number of ambiguous words in the requirement.
Singularity. A requirement statement must express exactly one requirement that does not overlap with others. The presence of several modal words indicates that the requirement contains several meanings and that the statement does not have the characteristic of singularity. These words may include could, may, might, can, should, will, shall, must, would, etc. The number of connective words (mentioned above) may also indicate the presence of several requirements within one. The metric for assessing singularity is calculated from three quantities: the number of words in the requirement, the number of modal verbs (when it is not zero), and the number of connective words in the requirement.
Subjectivity. This attribute indicates the presence of perspectives, feelings, or opinions entering the decision-making process. The leading causes of subjectivity in requirements can be:
• dangerous plural with ambiguous reference,
• combination of "and" and "or" that leads to unclear associativity,
• unclear inclusion,
• passive voice,
• imprecise and inside behavior,
• negative or too broad reason.
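The dictionary-based counting that underlies the unambiguity and singularity metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionaries contain only the English sample words quoted in the text (the prototype uses larger, Russian dictionaries), the names `DICTIONARIES`, `count_indicators`, and `indicator_ratio` are hypothetical, and the plain ratio of indicator words to all words is an assumed simplification rather than the exact formula of the quality model.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# English sample words quoted in the text; illustration only.
DICTIONARIES = {
    "connective": {"and", "or", "but", "however", "otherwise", "even", "although"},
    "negative": {"not"},
    "anaphoric": {"which", "he", "she", "it", "they", "where", "this", "that"},
    "modal": {"could", "may", "might", "can", "should", "will", "shall", "must", "would"},
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and extract alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_indicators(requirement: str) -> Tuple[int, Counter]:
    """Return the total word count and the number of hits per dictionary."""
    tokens = tokenize(requirement)
    counts: Counter = Counter()
    for token in tokens:
        for category, words in DICTIONARIES.items():
            if token in words:
                counts[category] += 1
    return len(tokens), counts


def indicator_ratio(requirement: str, categories: Iterable[str]) -> float:
    """Assumed simplification: indicator words divided by all words."""
    total, counts = count_indicators(requirement)
    if total == 0:
        return 0.0
    return sum(counts[c] for c in categories) / total


requirement = "The system shall log errors and it must not crash."
total_words, hits = count_indicators(requirement)
ambiguity_share = indicator_ratio(requirement, ["connective", "negative", "anaphoric"])
```

In this example the two modal verbs ("shall", "must") joined by a connective ("and") are exactly the kind of signal that the singularity attribute treats as a hint of two requirements merged into one statement.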
Readability. This attribute indicates how easily the requirement text can be read and understood. It can be based on the number of syllables per word and the number of words per sentence, and it can be calculated by the Flesch-Kincaid Grade Level [23], the Coleman-Liau Grade Level [24], or the SMOG Grade [25]. The second one, the Coleman-Liau index (CLI), was chosen:
CLI = 0.0588 L − 0.296 S − 15.8
where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. If the CLI is around 10, the text is easy to read, but if CLI > 15, the text is too difficult to understand. The CLI was then mapped to a percentage scale on which a CLI above 17.5 corresponds to a readability of 0%.
Completeness. It requires that the requirement contain all necessary elements, including constraints and conditions, to enable the requirement to be implemented [7]. Table 2 shows an example of a structure template for the following requirement: «In the Combat Zone, an HQ Switch, which is identical to a trunk node switch, shall be given two independent links to at least two other nodes in the network». The completeness metric is calculated from two quantities: the number of elements in the structural template and the number of template elements that were identified in the requirement sentence.
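The readability and completeness calculations can be illustrated with a short sketch. The Coleman-Liau coefficients are those given above. Two parts are assumptions made only for this illustration: `readability_percent` uses a linear mapping clamped to the range 0 to 100, chosen solely because it yields 0% at a CLI of 17.5 and above, and `completeness_percent` reads the completeness metric as the percentage of template elements found in the sentence. The regex-based word and sentence counting is also a simplification.

```python
import re


def coleman_liau_index(text: str) -> float:
    """CLI = 0.0588 * L - 0.296 * S - 15.8 (L, S are per-100-word averages)."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters, any alphabet
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    letters = sum(len(word) for word in words)
    letters_per_100_words = letters / len(words) * 100
    sentences_per_100_words = len(sentences) / len(words) * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


def readability_percent(cli: float, worst_cli: float = 17.5) -> float:
    """ASSUMED linear mapping: CLI <= 0 gives 100%, CLI >= worst_cli gives 0%."""
    return max(0.0, min(100.0, 100.0 * (1.0 - cli / worst_cli)))


def completeness_percent(found_elements: int, template_elements: int) -> float:
    """ASSUMED reading: share of template elements identified in the sentence."""
    if template_elements <= 0:
        raise ValueError("template must contain at least one element")
    return 100.0 * found_elements / template_elements


cli = coleman_liau_index("The cat sat. The dog ran.")
```

For the two short sentences in the example, L = 300 and S ≈ 33.3, so the index is negative: very short words and sentences fall below the lowest grade level, which the assumed mapping clamps to 100% readability.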

Natural language Processing
Natural Language Processing is a field of computer science and computational linguistics that aims to analyze linguistic input data using computational methods and techniques [26]. Natural language is very complicated and is subject to syntactic and semantic rules. NLP also studies the conceptual dimension that refers to the intended «pragmatic» actions. The syntactic rules describe the major pattern of a sentence in terms of elements such as nouns, adjectives, and verbs [20]. The semantic rules refer to the meaning of each word in the sentence and to the relations between words when they are combined, which is called «compositional semantics» [27]. Natural language texts are usually analyzed in a sequential process. This process starts with lexical and structural elements: the text is parsed in search of the most suitable syntax tree. After that, more complex techniques are applied to interpret the semantic content and understand its meaning. Of course, such analysis does not make it possible to understand the content fully or to obtain the meaning of a sentence without any discrepancies.

Fig. 5. NLP workflow for requirement analysis
Several techniques were used in the current model workflow: splitting a sentence into a syntax tree, part-of-speech tagging, morphological analysis, and calculating word distribution and co-occurrence using predefined dictionaries (Fig. 5). An input text document with requirements goes through several text preprocessing steps: sentence splitting, POS tagging, and phrase-based shallow parsing.
Sentence splitting. First, the text is split into a set of sentences by a sentence boundary detector.
POS tagging. Then, for each requirement sentence, a parser that relies on individual words and associated phrase information is used to obtain word lemmas and POS tags such as noun, verb, adjective, adverb, etc. In the current model the Stanford NLP library [28] was used for this. Given a sentence in natural language text, POS tagging determines the role and function of each single word in the sentence. The output is usually a so-called tag for each word, e.g. whether a word is an adjective, a particle, or a possessive pronoun. A detailed description of the technical details of POS tagging is beyond the scope of this paper, but can be found, for example, in [29]. POS tagging helped to determine so-called substituting pronouns. These are pronouns that do not repeat the original noun and thus need a human's interpretation of their dependency.
A syntax tree shows the main structure of the sentence (Fig. 6), where the tree's leaves are the words of the sentence and the inner nodes express the sentence's composition. In the example, «the channel selection» forms a nominal phrase (NP), as indicated by the common parent node NP. The additional information «of the headphones» is added as a prepositional phrase (PP). The nominal phrase and the prepositional phrase form a new nominal phrase, which is the object of the verb «changes».

Fig. 6. Syntax tree for NLP workflow
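The first preprocessing steps can be approximated with a few lines of standard-library Python. This sketch is a deliberately naive stand-in and does not use the Stanford NLP library: sentences are split on terminal punctuation, and substituting pronouns are found by lookup in a small hypothetical list (`SUBSTITUTING_PRONOUNS`) instead of by POS tags. A real tagger is needed to tell, for example, the pronoun "that" from the conjunction "that".

```python
import re
from typing import List, Tuple

# Hypothetical sample list; a POS tagger would identify pronouns by tag instead.
SUBSTITUTING_PRONOUNS = {"it", "this", "that", "they", "which"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence boundary detection: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def find_substituting_pronouns(sentence: str) -> List[Tuple[int, str]]:
    """Return (token index, pronoun) pairs for pronouns from the sample list."""
    tokens = re.findall(r"[^\W\d_]+", sentence.lower())
    return [(i, tok) for i, tok in enumerate(tokens) if tok in SUBSTITUTING_PRONOUNS]


text = (
    "The user selects a channel. "
    "It changes the channel selection of the headphones."
)
sentences = split_sentences(text)
flags = [find_substituting_pronouns(sentence) for sentence in sentences]
```

The second sentence is flagged because "It" does not repeat the original noun, so a reader has to decide whether it refers to the user or to the channel.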
Morphological Analysis. Based on POS tagging, a more detailed analysis of the text was performed to determine the inflection of words. This step includes identifying a verb's tense or an adjective's degree of comparison. The main outcome of this step is an analysis of the usage of adverbs and adjectives in their comparative or superlative form.
Dictionaries. To describe the different quality attributes, several dictionaries with ambiguous phrases and words were used, based on quality standards and case study experience. A normalisation technique called lemmatisation, which reproduces the original form of a word, was applied to the dictionary words. This technique is very similar to stemming (e.g., the Porter algorithm [30]), but it is based on the POS tag and the word's morphological form instead of heuristics.
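The difference between lemmatisation and heuristic stemming, and its use for spotting comparative and superlative forms, can be shown with a toy example. Everything in this sketch is hypothetical: the `LEMMA_TABLE` and `DEGREE` tables hold a handful of hand-written English entries, whereas a real morphological analyser derives lemmas and degrees of comparison for the whole vocabulary.

```python
from typing import List, Tuple

# Toy lookup keyed by (word form, POS tag); illustration only.
LEMMA_TABLE = {
    ("better", "ADJ"): "good",
    ("best", "ADJ"): "good",
    ("faster", "ADJ"): "fast",
    ("fastest", "ADJ"): "fast",
    ("users", "NOUN"): "user",
    ("provides", "VERB"): "provide",
}

# Degree of comparison for the graded forms above.
DEGREE = {
    "better": "comparative",
    "best": "superlative",
    "faster": "comparative",
    "fastest": "superlative",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the dictionary form of a word, using its POS tag as context."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


def graded_adjectives(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """List (word, lemma, degree) for comparative or superlative adjectives."""
    found = []
    for word, pos in tagged:
        degree = DEGREE.get(word.lower())
        if pos == "ADJ" and degree is not None:
            found.append((word, lemmatize(word, pos), degree))
    return found


tagged_sentence = [
    ("The", "DET"), ("system", "NOUN"), ("provides", "VERB"),
    ("the", "DET"), ("fastest", "ADJ"), ("response", "NOUN"),
]
graded = graded_adjectives(tagged_sentence)
```

A suffix-stripping stemmer has no rule that maps "better" to "good", whereas a lemma lookup conditioned on the POS tag does; this is why dictionary matching on lemmas catches graded forms that heuristic stemming would miss.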

Prototype
To fully support the extraction of metrics for all the above-mentioned quality attributes, the prototype should have several features (Fig. 7). The prototype is a software tool whose main goal is to measure requirements quality. Requirements can be of any type expressed in text form: functional, non-functional, use cases. The prototype is able to perform several functions:
• integrate with a project management system to gather textual requirements from it (via its API);
• perform syntactic and semantic analysis of these requirements (supporting the Russian language [21] [22]).
The core of the prototype is the Requirement Quality Model, which contains a consistent set of requirements quality metrics and is expressed in algorithms for measuring these metrics and for drawing conclusions (average quality of a requirement or of a set of requirements). The prototype provides a requirements engineer with a graphical user interface or a command-line interface to obtain the results of requirements measurement. For NLP, custom analogues of the Python libraries WordNet [31] and spaCy [32] with Russian language support were used. The analysis of ambiguity was implemented on the basis of the open-source microservice OpenReqEU.
To get the results of the requirements analysis, the requirements engineer can use either the graphical or the command-line interface. Some of the available interface functions are:
• list all the requirements;
• show the quality metrics of a specific requirement;
• show the quality metrics for all the requirements analysed.
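The three interface functions listed above, together with the averaging step of the quality model, could look roughly like this. The sketch is hypothetical throughout: the function names, the requirement identifiers, and the metric values in `METRICS` are invented for illustration and are not results or code from the prototype, and the unweighted mean is only one possible way to aggregate the metrics.

```python
from statistics import mean
from typing import Dict, List

MetricTable = Dict[str, Dict[str, float]]

# Invented sample values in percent; NOT measurements from the paper.
METRICS: MetricTable = {
    "REQ-1": {"unambiguity": 90.0, "singularity": 80.0, "readability": 70.0},
    "REQ-2": {"unambiguity": 60.0, "singularity": 100.0, "readability": 50.0},
}


def list_requirements(metrics: MetricTable) -> List[str]:
    """List the identifiers of all analysed requirements."""
    return sorted(metrics)


def show_metrics(metrics: MetricTable, requirement_id: str) -> Dict[str, float]:
    """Return the quality metrics of one requirement."""
    return dict(metrics[requirement_id])


def requirement_quality(metrics: MetricTable, requirement_id: str) -> float:
    """Assumed aggregation: unweighted mean of the requirement's metrics."""
    return mean(metrics[requirement_id].values())


def overall_quality(metrics: MetricTable) -> float:
    """Assumed aggregation: mean of the per-requirement quality values."""
    return mean(requirement_quality(metrics, rid) for rid in metrics)


ids = list_requirements(METRICS)
req1 = requirement_quality(METRICS, "REQ-1")
overall = overall_quality(METRICS)
```

Equal weights keep the example simple; a project could weight the attributes differently according to the goals chosen in the GQM definition phase.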
The dictionaries of ambiguous words were translated into Russian. The Docker container for the OpenReqEU microservice was rebuilt and used as an external API for the further process of quality assessment. After the service finished its work, all ambiguous words in the requirements were highlighted according to their category.
Fig. 7. Prototype scope in ArchiMate [33] notation
At the next step, one more external API was used to evaluate readability indexes: the readability.io service. The text of every requirement was uploaded, and the resulting number was received from the website. Visual Paradigm Diagram was used for the graphical representation of the evaluated quality results. A spider graph and a stacked histogram were chosen as the most appropriate visualisations of the collected data. All the metrics calculated in the prototype were automatically synchronized with the Visual Paradigm service and published as a web dashboard.

Results
After the proposed solution was implemented, it was tested on a sample requirements text. The resulting distribution of weak words is shown in Fig. 8. These weak words were highlighted in the GUI and classified by type of ambiguity (Fig. 9).

Fig. 9. Highlighted ambiguous words in every requirement
The final evaluation of overall quality was then made (Fig. 10).

Conclusion and discussion
Natural language still prevails in the majority of requirement documents. Software engineers need ways to cope with the ambiguity inherent in natural language requirements. In order to minimize its side effects at the early stages of the software development lifecycle, it is important to develop a scalable automated solution to detect potentially nocuous ambiguities in natural language requirement specifications.
The usage of quality metrics in a software development lifecycle requires considering three important aspects. Firstly, obtaining all the above-mentioned measurements by hand would be misleading, so automated tools are required. Secondly, an automated prototype implementation should not provoke rejection by requirements engineers: the tool is created to help improve the requirements elicitation process, not to punish or to identify failures. Finally, decisions about which attributes and metrics to apply should be made wisely and gradually: «Not everything that can be counted counts, and not everything that counts can be counted» (Albert Einstein). Although quantitative measurement is one of the foundations of modern empirical science, measurements should be used with caution and wisdom. Assessing the quality of requirements demands human judgment. This judgment can be assisted, but not replaced, by objective measurements. An automated tool that provides low-level quality indicators can give valuable hints for improving the high-level quality features of requirements. In this paper an automated approach for characterizing and identifying potentially nocuous ambiguities was described. Given a natural language requirements document, the ambiguous instances contained in its sentences are first extracted. The identified ambiguities can be a cause of misunderstanding among different readers. The implementation can be used by requirements analysts and will allow them to experiment with iterative identification of potential ambiguities in requirement documents.