The Procedure for Determining the Degree of Similarity between Objects in Linguistic Examination
UDC 81-13, BBK 81.1
The article describes a research procedure for determining the degree of similarity between the objects of a linguistic examination. The procedure combines qualitative and quantitative methods with computer tools. The main objective of the study is to describe ways of obtaining reliable, verifiable results. For the analysis, three sets are distinguished: the initial object, the object compared with it, and the object-construct, which contains the characteristics common to the two compared sets. The objects may be interrogation protocols, texts checked for plagiarism, trademark names, newspaper publications and others. Qualitative procedures single out the characteristics that describe the objects, while quantitative procedures normalize the parameters identified in them. The characteristics are extracted by expert judgment, relying on the classifications accepted in linguistics. Texts with identical content are analyzed by selecting identical fragments and estimating their volume. The similarity of trademarks is analyzed on the basis of phonetic, graphic, semantic and associative parameters. Possible authorship of texts is analyzed using lexical, morphological and syntactic data. Similarity coefficients are calculated from the numerical indicators obtained. Characteristics that require a yes/no answer are coded 1 for a positive answer and 0 for a negative one. Characteristics that have a numerical expression are compared using correlation coefficients and are treated as identical when the value is 0.7 or higher. The numerical procedures generally rely on computer services. The resulting data are expressed as the Jaccard, Sørensen, Kulczynski and Ochiai coefficients.
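The quantitative step can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names and sample data are hypothetical, the correlation is assumed to be Pearson's (the text says only "correlation coefficients"), and the Kulczynski measure is assumed to be the second Kulczynski coefficient. Here a is the number of characteristics present in both objects (the object-construct), b the number present only in the initial object, and c the number present only in the compared object.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (assumed measure)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


def numeric_match(x, y, threshold=0.7):
    """Treat a numeric characteristic as identical when r >= 0.7, as stated in the text."""
    return pearson(x, y) >= threshold


def similarity_coefficients(initial, compared):
    """Compute the four coefficients from two 0/1 characteristic vectors."""
    pairs = list(zip(initial, compared))
    a = sum(1 for p, q in pairs if p == 1 and q == 1)  # shared characteristics
    b = sum(1 for p, q in pairs if p == 1 and q == 0)  # only in the initial object
    c = sum(1 for p, q in pairs if p == 0 and q == 1)  # only in the compared object
    return {
        "jaccard": a / (a + b + c),
        "sorensen": 2 * a / (2 * a + b + c),
        # Second Kulczynski coefficient (an assumption about the variant used)
        "kulczynski": 0.5 * (a / (a + b) + a / (a + c)),
        "ochiai": a / sqrt((a + b) * (a + c)),
    }


# Hypothetical yes/no codings of six characteristics for two objects
initial = [1, 1, 1, 0, 1, 0]
compared = [1, 1, 0, 1, 1, 0]
result = similarity_coefficients(initial, compared)
print(result)
```

With these sample vectors a = 3, b = 1 and c = 1, so the Jaccard coefficient is 3/5 = 0.6 and the other three coefficients are 0.75. The sketch does not guard against division by zero, which arises when an object has no positive characteristics.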
