A Novel Mathematical Approach to Define the Genes/SNPs Conferring Risk or Protection in Sporadic Amyotrophic Lateral Sclerosis Based on Auto Contractive Map Neural Networks and Graph Theory.  
MedLine Citation:

PMID: 22934166 Owner: NLM Status: PubMednotMEDLINE 
Abstract/OtherAbstract:

Background. Complex diseases like amyotrophic lateral sclerosis (ALS) involve phenotypic and genetic heterogeneity. Therefore, multiple genetic traits may show differential association with the disease. The Auto Contractive Map (AutoCM), an Artificial Neural Network (ANN) architecture, "spatializes" the correlation among variables by constructing a suitable embedding space where a visually transparent and cognitively natural notion such as "closeness" among variables accurately reflects their associations. Results. In this pilot case-control study, single nucleotide polymorphisms (SNPs) in several genes were evaluated with a novel data mining approach based on an AutoCM. We divided the ALS dataset into two datasets, Cases and Controls, and applied the AutoCM algorithm to each one independently. Six genetic variants were identified which contributed differently to the complexity of the system: three of these genes/SNPs, APOA4, NOS3, and LPL, represent protective factors, since their contribution to the whole complexity was as high as 0.17. On the other hand, ADRB3, LIPC, and MMP3, whose hub relevance contribution was as high as 0.13, seem to represent susceptibility factors. Conclusion. The biological information available on these six polymorphisms is consistent with possible pathogenetic pathways related to ALS. 
Authors:

Massimo Buscema; Silvana Penco; Enzo Grossi 
Publication Detail:

Type: Journal Article Date: 20120809 
Journal Detail:

Title: Neurology Research International Volume: 2012 ISSN: 2090-1860 ISO Abbreviation: Neurol Res Int Publication Date: 2012 
Date Detail:

Created Date: 20120830 Completed Date: 20120830 Revised Date: 20130312 
Medline Journal Info:

Nlm Unique ID: 101543314 Medline TA: Neurol Res Int Country: United States 
Other Details:

Languages: eng Pagination: 478560 Citation Subset:  
Affiliation:

Semeion Research Center, Via Sersale 117, 00128 Rome, Italy. 
Full Text  
Journal Information Journal ID (nlm-ta): Neurol Res Int Journal ID (iso-abbrev): Neurol Res Int Journal ID (publisher-id): NRI ISSN: 2090-1852 ISSN: 2090-1860 Publisher: Hindawi Publishing Corporation 
Article Information Copyright © 2012 Massimo Buscema et al. Open access. Received: 23 January 2012 Accepted: 5 April 2012 Print publication date: 2012 Electronic publication date: 9 August 2012 Volume: 2012 Elocation ID: 478560 ID: 3425858 PubMed Id: 22934166 DOI: 10.1155/2012/478560 
A Novel Mathematical Approach to Define the Genes/SNPs Conferring Risk or Protection in Sporadic Amyotrophic Lateral Sclerosis Based on Auto Contractive Map Neural Networks and Graph Theory  
Massimo Buscema^{1}  
Silvana Penco^{2}*  
Enzo Grossi^{3}  
^{1}Semeion Research Center, Via Sersale 117, 00128 Rome, Italy 

^{2}Medical Genetics, Department of Laboratory Medicine, Niguarda Ca' Granda Hospital, Piazza Ospedale Maggiore 3, 20100 Milan, Italy 

^{3}Medical Department, Bracco SpA, Via E. Folli 50, 20134 Milan, Italy 

Correspondence: *Silvana Penco: silvana.penco@ospedaleniguarda.it
Academic Editor: Changiz Geula 
Investigating the pattern of correlations among large numbers of variables in large databases is a difficult task that is demanding in both computational time and capacity. The statistically oriented literature has developed a variety of methods with different power and usability, all of which, however, share a few basic problems. The most outstanding are the nature of the a priori assumptions that have to be made on the data-generating process, the near impossibility of computing all the joint probabilities among the vast number of possible couples and n-tuples that are in principle necessary to reconstruct the underlying process' probability law, and the difficulty of organizing the output in an easily grasped, ready-to-access format for the nontechnical analyst. The consequence of the first two weaknesses is that, when analyzing poorly understood problems characterized by heterogeneous sets of potentially relevant variables, traditional methods can become very unreliable, if not unusable. The consequence of the last one is that, even in the cases where traditional methods manage to provide a sensible output, its statement and implications can be so intricate as to become practically useless or, even worse, easily misunderstood.
In this paper, we introduce a new methodology based on an Artificial Neural Network (ANN) architecture, the Auto Contractive Map (AutoCM) [^{1}], which allows for basic improvements both in robustness of use in badly specified and/or computationally demanding problems and in output usability and intelligibility. In particular, AutoCMs “spatialize” the correlation among variables by constructing a suitable embedding space where a visually transparent and cognitively natural notion such as “closeness” among variables accurately reflects their associations. Through suitable optimization techniques that will be introduced and discussed in detail in what follows, “closeness” can be converted into a compelling graph-theoretic representation that picks all and only the relevant correlations and organizes them into a coherent picture. Such a representation is not constructed through some form of cumbersome aggregation of two-by-two associations between couples of variables but rather by building a complex global picture of the whole pattern of variation. Moreover, it fully exploits the topological meaningfulness of graph-theoretic representations, in that actual paths connecting nodes (variables) in the representation carry a definite meaning in terms of logical interdependence in explaining the data set's variability. We are aware that these techniques are novel and therefore not yet entirely understood in all of their properties and implications. However, we are convinced that their actual performance in the context of well-defined, well-understood problems provides an encouraging test to proceed in this direction. We applied this new approach to Amyotrophic Lateral Sclerosis (ALS), a fatal neurodegenerative condition causing progressive motor neuron loss, leading to death within a few years of onset. There is no effective treatment for this devastating disease, although riluzole is reported to have a mild effect in slowing its progression [^{2}, ^{3}]. 
This failure is likely to be related to the poor knowledge of the pathogenetic mechanisms of ALS, as well as to its heterogeneity. Genetic factors are known to play an important role; at least nine genetically transmitted forms of ALS are known, and several genes possibly influencing the occurrence and the phenotypic expression of ALS have been identified [^{4}, ^{5}]. However, knowledge of the genes that may play a role and of the mechanisms by which they may cause the phenotype is still very incomplete. The identification of genes that underlie the sporadic and the genetic forms of ALS will be highly relevant since it will identify novel metabolic pathways involved in neurodegeneration.
Previously we approached the genetics of sporadic ALS (SALS) with artificial neural networks to identify a possible genetic background predisposing to the sporadic form. A dataset containing genetic data from 54 SALS cases and 208 controls was analyzed with three different analytical approaches: Linear Discriminant Analysis, Standard Artificial Neural Networks, and Advanced Intelligent Systems; with the latter approach the predictive accuracy in discriminating between cases and controls reached an average of 96% (range 94.4 to 97.6). In addition, we identified seven genetic variants essential to differentiate cases from controls [^{6}].
The results obtained point to the need for systems able to handle the complexity of the disease, rather than reductionistic approaches that cannot detect multiple genes of smaller effect predisposing to the disease.
We report here the application of a newly developed analytical approach to the SALS dataset, based on the AutoCM system and Maximally Regular Graph theory.
The idea was to test the power of this new algorithm in a medical context such as SALS, to shed light on the puzzle of the disease.
We used a previously described dataset [^{6}]. Briefly, genotypes were derived from 60 biallelic polymorphisms within 35 genes selected from pathways of lipid and homocysteine metabolism, regulation of blood pressure, coagulation, inflammation, cellular adhesion, and matrix integrity. All subjects tested were of Caucasian origin; 54 were SALS patients and 208 were controls. The SALS patients consisted of 28 males (56.4 years; 46.9–65.8) and 26 females (62.9 years; 57.8–67.9); mean age at onset of disease was 59.62 years (range 53.7–65.5 years); clinical onset was spinal in 61.1% (33/54) and bulbar in 38.9% (21/54) of cases; mean disease duration at the time of observation was 3.2 years (range 1–10 years).
Control subjects were 144 males and 67 females, aged 21 to 75 years (average 38.94).
We begin our analysis with a relatively concise but technically detailed presentation of the ANN architecture that provides the basis for all of the subsequent analysis: the Auto Contractive Map (AutoCM) [^{7}, ^{8}]. The AutoCM is characterized by a three-layer architecture: an input layer, where the signal is captured from the environment; a hidden layer, where the signal is modulated inside the AutoCM; and an output layer, through which the AutoCM feeds back upon the environment on the basis of the stimuli previously received and processed.
Each layer contains an equal number of N units, so that the whole AutoCM is made of 3N units. The connections between the input and the hidden layers are mono-dedicated, whereas the ones between the hidden and the output layers are fully saturated, that is, at maximum gradient. Therefore, given N units, there are N mono-dedicated connections and N^{2} connections between the hidden and the output layers, so that the total number of connections is N_{c} = N(N + 1) (see Figure 1).
All of the connections of AutoCM may be initialized either by assigning the same constant value to each, or by assigning values at random. The best practice is to initialize all the connections with the same positive value, close to zero.
The learning algorithm of AutoCM may be summarized in a sequence of four characteristic steps:
 1. signal transfer from the input into the hidden layer;
 2. adaptation of the values of the connections between the input and the hidden layers;
 3. signal transfer from the hidden into the output layer;
 4. adaptation of the value of the connections between the hidden and the output layers.
Notice that steps 2 and 3 may take place in parallel.
We write as m^{[s]} the units of the input layer (sensors), scaled between 0 and 1; as m^{[h]} the units of the hidden layer; and as m^{[t]} the units of the output layer (system target). We moreover define v, the vector of mono-dedicated connections; w, the matrix of the connections between the hidden and the output layers; and n, the discrete time that spans the evolution of the AutoCM weights, or, put another way, the number of cycles of processing, counting from zero and stepping up one unit at each completed round of computation: n ∈ T.
In order to specify the steps 1–4 that define the AutoCM algorithm, we have to define the corresponding signal forward-transfer equations and the learning equations, as follows.

(a) Signal transfer from the input to the hidden layer:
(1) $$m_{i(n)}^{[h]} = m_{i}^{[s]}\left(1 - \frac{v_{i(n)}}{C}\right),$$
where C is a positive real number not lower than 1, which we will refer to as the contraction parameter (see below for comments), and where the (n) subscript has been omitted from the notation of the input layer units, as these remain constant at every cycle of processing. It is useful to set C=N2, where N is the number of variables considered.
(b) Adaptation of the connections v_{i(n)} through the variation Δv_{i(n)}, which amounts to trapping the energy difference generated according to (1):
(2) $$\Delta v_{i(n)} = \left(m_{i}^{[s]} - m_{i(n)}^{[h]}\right)\cdot\left(1 - \frac{v_{i(n)}}{C}\right), \qquad v_{i(n+1)} = v_{i(n)} + \alpha\cdot\Delta v_{i(n)}.$$
(c) Signal transfer from the hidden to the output layer:
(3) $$\mathrm{Net}_{i(n)} = \sum_{j=1}^{N} m_{j(n)}^{[h]}\cdot\left(1 - \frac{w_{i,j(n)}}{C}\right),$$
(4) $$m_{i(n)}^{[t]} = m_{i(n)}^{[h]}\left(1 - \frac{\mathrm{Net}_{i(n)}}{C}\right).$$
(d) Adaptation of the connections w_{i,j(n)} through the variation Δw_{i,j(n)}, which amounts, accordingly, to trapping the energy difference according to (4):
(5) $$\Delta w_{i,j(n)} = \left(m_{i(n)}^{[h]} - m_{i(n)}^{[t]}\right)\cdot\left(1 - \frac{w_{i,j(n)}}{C}\right)\cdot m_{j(n)}^{[h]}, \qquad w_{i,j(n+1)} = w_{i,j(n)} + \alpha\cdot\Delta w_{i,j(n)}.$$
First of all, we need to specify that α is the learning coefficient of AutoCM. This coefficient has to be chosen taking into consideration three different conditions:
 the AutoCM weights are updated at every cycle;
 the order of selection of the records at each epoch is random (an epoch is the number of cycles needed to update every record of the dataset);
 after every cycle the AutoCM is closer to its convergence point, T, and the amount of updating between n = 0 and n = T decreases to zero.
For this reason it is necessary to set the learning coefficient so that AutoCM can update its weights within a reasonable number of epochs, without being influenced by the random order of the records at each cycle.
Consequently, we suggest choosing the learning coefficient taking into account the contractive factor, C, the number of variables, N, and the number of records, M, of the assigned dataset:
(6) $$\alpha = \frac{N}{M\cdot C}.$$
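The update rules (1)–(6) can be sketched in a few lines of NumPy. This is only an illustrative reading of the equations, not the authors' implementation: the function name `train_autocm`, the initial weight `w0`, the number of epochs, and the value of C used in the usage lines are all assumptions made for the example, and the contraction parameter is passed explicitly rather than derived from N.

```python
import numpy as np

def train_autocm(data, C, epochs=100, w0=1e-3, seed=0):
    """Sketch of AutoCM training on `data` (M records x N variables in [0, 1]).

    C is the contraction parameter; w0 is the common small positive
    initial value for all connections (an assumed default).
    """
    rng = np.random.default_rng(seed)
    M, N = data.shape
    alpha = N / (M * C)                  # learning coefficient, eq. (6)
    v = np.full(N, w0)                   # mono-dedicated input->hidden weights
    w = np.full((N, N), w0)              # hidden->output weight matrix
    for _ in range(epochs):
        for r in rng.permutation(M):     # random record order at each epoch
            m_s = data[r]
            m_h = m_s * (1.0 - v / C)                # eq. (1)
            dv = (m_s - m_h) * (1.0 - v / C)         # eq. (2), variation
            net = (1.0 - w / C) @ m_h                # eq. (3)
            m_t = m_h * (1.0 - net / C)              # eq. (4)
            # eq. (5): dw[i, j] = (m_h[i] - m_t[i]) * (1 - w[i, j]/C) * m_h[j]
            dw = (m_h - m_t)[:, None] * (1.0 - w / C) * m_h[None, :]
            v = v + alpha * dv                       # eq. (2), update
            w = w + alpha * dw                       # eq. (5), update
    return v, w

# Toy usage: 6 binary records over 4 variables; C = 4.0 is illustrative only.
toy = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0],
                [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
v_toy, w_toy = train_autocm(toy, C=4.0, epochs=50)
```

Both variations are computed from the signals obtained with the weights of the current cycle, which mirrors the remark that steps 2 and 3 may run in parallel.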
There are a few important peculiarities of AutoCMs [^{9}–^{12}] with respect to more familiar classes of ANNs that need special attention and call for careful reflection.
 (i) AutoCMs are able to learn even when starting from initializations where all connections are set at the same value; that is, they do not suffer from the problem of symmetric connections.
 (ii) During the training process, AutoCMs always assign positive values to connections. In other words, AutoCMs do not allow for inhibitory relations among nodes, but only for different strengths of excitatory connections.
 (iii) AutoCMs can learn even in difficult conditions, namely, when the connections of the main diagonal of the second layer connection matrix are removed. In the context of this kind of learning process, AutoCMs seem to reconstruct the relationship occurring between each couple of variables. Consequently, from an experimental point of view, it seems that the ranking of its connection matrix translates into the ranking of the joint probability of occurrence of each couple of variables.
 (iv) Once the learning process has occurred, any input vector belonging to the training set will generate a null output vector. So, the energy minimization of the training vectors is represented by a function through which the trained connections completely absorb the input training vectors. Thus, AutoCM seems to learn how to transform itself into a “dark body”.

 (v) At the end of the training phase (Δw_{i,j} = 0), all the components of the weights vector v reach the same value:
(7) $$\lim_{n\to\infty} v_{i(n)} = C.$$
The matrix w, then, represents the AutoCM knowledge about the whole dataset.
One can use the information embedded in the w matrix to compute in a natural way the joint probability of occurrence among variables:
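(8) $$p_{i,j} = \frac{w_{i,j}}{\sum_{j=1}^{N} w_{i,j}};$$
(9) $$P\left(m_{j}^{[s]}\right) = \sum_{i}^{N} p_{i,j} = 1.$$
(10) $$P\left(m_{i}^{[t]}\, m_{j}^{[s]}\right) = p_{i,j}.$$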

Alternatively, the matrix w may be transformed into a non-Euclidean distance metric (semimetric), when we train the AutoCM with the main diagonal of the w matrix fixed at value N. Now, if we consider N as a limit value for all the weights of the w matrix, we can write
(11) $$d_{i,j} = N - w_{i,j}.$$
The new matrix d is again a square symmetric matrix, where the main diagonal entries are null (i.e., they represent the zero distance of each variable from itself), and where the off-diagonal entries represent “distances” between each couple of variables.
Equation (11) transforms the square weight matrix of AutoCM into a square matrix of distances among nodes. Each distance between a pair of nodes may therefore be regarded as the weighted edge between that pair of nodes in a suitable graph-theoretic representation, so that the matrix d itself may be analyzed through the graph theory toolbox.
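Equations (8) and (11) are element-wise transforms of the trained matrix w, and a short NumPy sketch may make them concrete. The function names and the `limit` argument are hypothetical; forcing the diagonal of d to zero stands in for training with the main diagonal of w fixed at the limit value, which is an assumption of this example rather than something the code reproduces.

```python
import numpy as np

def joint_probabilities(w):
    """Equation (8): divide each row of w by its row sum."""
    w = np.asarray(w, dtype=float)
    return w / w.sum(axis=1, keepdims=True)

def distance_matrix(w, limit):
    """Equation (11): d[i, j] = limit - w[i, j].

    `limit` plays the role of N in the text. The diagonal is set to zero
    here to mimic a w matrix whose diagonal was fixed at `limit`
    during training (an assumption of this sketch).
    """
    d = limit - np.asarray(w, dtype=float)   # new array, w is not modified
    np.fill_diagonal(d, 0.0)
    return d

# Toy usage with a hypothetical 3 x 3 weight matrix and limit 3.
w_toy = np.array([[3.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 3.0]])
p_toy = joint_probabilities(w_toy)
d_toy = distance_matrix(w_toy, limit=3.0)
```

Strongly connected variables (large w) end up close together (small d), which is the sense in which the weights are "spatialized".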
A graph is a mathematical abstraction that is useful for solving many kinds of problems. Fundamentally, a graph consists of a set of vertices, and a set of edges, where an edge is an object that connects two vertices in the graph. More precisely, a graph is a pair (V, E), where V is a finite set and E is a binary relation on V, to which it is possible to associate scalar values (in this case, the distances d_{i,j}).
At this point, it is useful to introduce the concept of Minimum Spanning Tree (M.S.T.) [^{9}–^{12}].
The Minimum Spanning Tree problem is defined as follows: find an acyclic subset T of E that connects all of the vertices in the graph and whose total weight (namely, the total distance) is minimized, where the total weight is given by:
(12) $$d(T) = \sum_{i=0}^{N-1}\sum_{j=i+1}^{N} d_{i,j}, \quad \forall\, d_{i,j}.$$
(13) $$\mathrm{Mst} = \mathrm{Min}\{d(T_{k})\}.$$
From a conceptual point of view, the MST represents the energy minimization state of a structure. In fact, if we consider the atomic elements of a structure as vertices of a graph and the strength among them as the weight of each edge linking a pair of vertices, the MST represents the minimum of energy needed so that all the elements of the structure preserve their mutual coherence. In a closed system, all the components tend to minimize the overall energy. So the MST, in specific situations, can represent the most probable state toward which the system tends.
To determine the MST of an undirected graph, each edge of the graph has to be weighted. Equation (11) shows a way to weight each edge whose nodes are the variables of a dataset, and where the weights of a trained AutoCM provide the (weight) metrics.
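As an illustration of the step from the distance matrix d to its MST, the sketch below uses Prim's algorithm on a dense symmetric matrix. The text does not say which MST algorithm was used, so Prim's is simply a standard choice made for this example, and the function name is hypothetical.

```python
import numpy as np

def minimum_spanning_tree(d):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a list of (i, j, d_ij) edges; their total weight is the
    minimum of d(T) over all spanning trees, as in equation (13).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                    # grow the tree from node 0
    best = d[0].copy()                   # cheapest link from the tree to each node
    parent = np.zeros(n, dtype=int)      # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))   # closest node not yet in the tree
        edges.append((int(parent[j]), j, float(d[parent[j], j])))
        in_tree[j] = True
        closer = (d[j] < best) & ~in_tree
        best[closer] = d[j][closer]
        parent[closer] = j
    return edges

# Toy usage on a hypothetical 4-variable distance matrix.
d_toy = np.array([[0, 1, 4, 3],
                  [1, 0, 2, 5],
                  [4, 2, 0, 6],
                  [3, 5, 6, 0]], dtype=float)
mst_toy = minimum_spanning_tree(d_toy)
```

The connections that this procedure leaves out are the "skipped" arcs that the Maximally Regular Graph later reconsiders.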
Obviously, it is possible to use any kind of Auto-Associative ANN or any kind of Linear Auto-Associator to generate a weight matrix among the variables of an assigned dataset. But it is hard to train a two-layer Auto-Associative Back Propagation ANN with the main diagonal weights fixed (to avoid autocorrelation problems). In most cases, the Root Mean Square Error (RMSE) stops decreasing after a few epochs, especially when the orthogonality of the records is relatively high, a circumstance that is frequent when it is necessary to weight the distance among the records of the assigned dataset. In this case, it is necessary to train the transposed matrix of the dataset. Moreover, if a linear Auto-Associator is used for this purpose, all of the nonlinear associations among variables will be lost.
Therefore, AutoCM seems to be the best choice to date to compute a complete and nonlinear matrix of weights among variables or among records of any assigned dataset.
Now we introduce a new indicator: the degree of protection of each node in any undirected graph.
This indicator defines the rank of centrality of each node within the graph, when an iterative pruning algorithm is applied. The pruning algorithm was devised and applied for the first time as a global indicator of graph complexity by Giulia Massini at Semeion Research Center in 2006 [^{13}] (see Algorithm 1).
The higher the rank of a node, the bigger the centrality of its position within the graph. The latest nodes to be pruned are also the kernel nodes of the graph. In the present paper, this algorithm is generalized to measure the global complexity of any kind of graph.
The pruning algorithm can also be used to quantify the complexity of any graph. If we take μ as the mean number of nodes left without any link at each iteration as the pruning algorithm runs, we can define the hubness index, H_{0}, of a graph with N nodes. In order to properly define this quantity, we need to introduce a few preliminary concepts. We define a cycle or iteration of the pruning algorithm as a given round of application of the algorithm. To each cycle there corresponds a gradient, which can differ from cycle to cycle. Insofar as two subsequent cycles yield the same gradient, they belong to the same pruning class. As the gradient changes from one cycle to the next, the previous class ends and a new one begins. We are now in a position to define hubness as follows:
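(14) $$H_{0} = \frac{\mu\cdot\varphi - 1}{A}; \quad 0 < H_{0} < 2;$$
(15) $$\mu = \frac{1}{M}\sum_{i}^{M} Nd_{i} = \frac{N}{M};$$
(16) $$\varphi = \frac{1}{P}\sum_{j}^{P} STG_{j}.$$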
Equation (15) measures the mean gradient of the graph.
Equation (16) measures the dynamics of the gradient changes during the pruning process.
Equation (14) is a complex ratio between the mean gradient and the dynamics of this gradient, on the one side, and the global graph connectivity, on the other.
Using H_{0} as a global indicator, it is possible to define to what extent a graph is hub oriented.
The H indicator of (14), (15), and (16) represents the global hubness of a graph. When H = 0, the tree is a one-dimensional line and its complexity is minimal. When H = 1, the tree presents only one hub, and its complexity is the maximum that a tree can attain. The complexity of a graph, in fact, is connected to its entropy. The quantity of information in a graph is linked to the graph diameter and to the connectivity of the vertices: given the number of vertices, the shorter the diameter, the bigger the entropy. Starting from the classical notion of entropy we can thus write
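(17) $$E = -K\cdot\sum_{i}^{N} p_{i}\cdot\ln(p_{i}).$$
(18) $$E(G) = -\frac{A}{M}\cdot\sum_{i}^{N}\frac{C_{i}}{A}\cdot\ln\left(\frac{C_{i}}{A}\right); \quad 0 < E(G) < \infty,$$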
The quantity C_{i}/A measures the probability that a generic node C_{j}, where j ≠ i, has to be directly linked to the node C_{i}. This means that the entropy of a graph, E(G), will increase when the number of vertices with a large number of links increases. Accordingly, the probability to arrange the links of N vertices, using a random process, into a linear chain is the lowest. Therefore, the higher the number of pruning cycles, M, needed for a graph, the smaller is graph entropy. Equation (18) shows clearly that a “hub tree” has more entropy than a “chain tree”. Consequently, when the H index of a tree increases, its redundancy increases as well.
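The comparison between a "hub tree" and a "chain tree" in equation (18) can be checked numerically with a small sketch. Several assumptions are made here and should be read as such: Algorithm 1 is not reproduced in this text, so the pruning rule used below (at each cycle, delete every link touching a node that currently has the minimum number of links) is only a plausible reading of it; C_{i} is taken to be the number of links of node i, and A the number of arcs. All function names are hypothetical.

```python
import math

def pruning_cycles(n_nodes, edges):
    """Return (M, rank): number of pruning cycles and the cycle at which
    each node is left without links, under the ASSUMED pruning rule."""
    links = {frozenset(e) for e in edges}
    rank = {}
    cycle = 0
    while links:
        cycle += 1
        degree = {}
        for e in links:
            for node in e:
                degree[node] = degree.get(node, 0) + 1
        min_deg = min(degree.values())
        weakest = {node for node, k in degree.items() if k == min_deg}
        links = {e for e in links if not (e & weakest)}   # delete their links
        still_linked = {node for e in links for node in e}
        for node in range(n_nodes):
            if node not in still_linked and node not in rank:
                rank[node] = cycle
    return cycle, rank

def graph_entropy(n_nodes, edges):
    """Equation (18) with C_i = number of links of node i (assumed)."""
    A = len(edges)
    if A == 0:
        return 0.0
    M, _ = pruning_cycles(n_nodes, edges)
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    total = sum((c / A) * math.log(c / A) for c in degree if c > 0)
    return -(A / M) * total

# Five-node star (one hub) versus five-node chain.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
e_star = graph_entropy(5, star)
e_chain = graph_entropy(5, chain)
```

Under these assumptions the star is pruned in a single cycle and the chain in two, and the star's entropy comes out larger, in line with the statement that a hub tree has more entropy than a chain tree.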
Considering how the structure of a given graph is changed by a pruning process, it becomes natural to ask what happens to graphs, and in particular to MSTs, as one or more of their nodes are deleted. How will the graph have to be organized to continue to reflect as well as possible the underlying structure of relationships once one or more nodes are taken away? How will the other nodes rearrange their links on the basis of the underlying metric and constraints, to connect to each other once again?
Define an H index for each one of N different MSTs, generated from the original distance matrix by deleting one different vertex at each step:
(19) $$H_{i} = \frac{\mu_{i}\cdot\varphi_{i} - 1}{A - 1}; \quad 0 < H_{i} < 2; \qquad \mu_{i} = \frac{1}{M}\sum_{j}^{M} Nd_{j} = \frac{N}{M}; \qquad \varphi_{i} = \frac{1}{P}\sum_{k}^{P} STG_{k}.$$
Each H_{i} represents the tree complexity of the same, original distance matrix when the i^{th} vertex is deleted. Consequently, the difference between the complexity of the whole MST (i.e., H_{0}) and the complexity of any of the MSTs that are obtained by deleting one of the graph vertices (H_{i}), is the measure of the contribution of that specific (i) vertex of the graph to the original graph's global complexity:
(20) $$\delta H_{i} = H_{0} - H_{i}.$$
The MST represents what we could call the “nervous system” of any dataset. In fact, summing up all of the connection strengths among all the variables, we get the total energy of that system. The MST selects only the connections that minimize this energy, that is, the only ones that are really necessary to keep the system coherent. Consequently, all the links included in the MST are fundamental, but not every “fundamental” link of the dataset needs to be in the MST. This limit is intrinsic to the nature of the MST itself: every link that gives rise to a cycle in the graph (viz., that destroys the graph's “treeness”) is eliminated, whatever its strength and meaningfulness. To fix this shortcoming and to better capture the intrinsic complexity of a dataset, it is necessary to add more links to the MST, according to two criteria:
 the new links have to be relevant from a quantitative point of view;
 the new links have to be able to generate new cyclic regular microstructures, from a qualitative point of view.
Consequently, the MST treegraph is transformed into an undirected graph with cycles. Because of the cycles, the new graph is a dynamic system, involving in its structure the time dimension. This is the reason why this new graph should provide information not only about the structure but also about the functions of the variables of the dataset.
To build the new graph, we need to proceed as follows:
 assume the MST structure as the starting point of the new graph;
 consider the sorted list of the connections skipped during the derivation of the MST;
 estimate the H function of the new graph each time that you add a new connection to the MST basic structure, to monitor the variation of the complexity of the new graph at every step.
We will call Maximally Regular Graph (MRG) the graph whose H function attains the highest value among all the graphs generated by adding back to the original MST, one by one, the missing connections previously skipped during the computation of the MST itself. Starting from (14), the MRG may be characterized as follows:
(21)
$H_{i} = f(G(A_{i}, N))$ (generic function on a graph with A_{i} arcs and N nodes at the i-th test);
$H_{i} = \frac{\mu_{i}\cdot\varphi_{i} - 1}{A_{i}}$ (calculation of the H function, where H_{0} represents the MST complexity);
$H^{*} = \mathrm{Max}\{H_{i}\}$ (graph with the highest H = MRG);
$R^{*} = \arg\max\{H_{i}\}$ (number of links added by the MRG);
$i \in [0, 1, 2, \ldots, R]$ (index of the H function);
$N - 1 < A_{i} < \frac{N\cdot(N-1)}{2}$ (interval of the number of graph arcs);
$R \in \left[0, 1, \ldots, \frac{(N-1)\cdot(N-2)}{2}\right]$ (number of arcs skipped during the MST generation).
The R number is a key variable during the computation of the MRG. R could in fact also be null, when the computation of the MST calls for no connections to be skipped. In this case, there is no MRG for that dataset.
R, moreover, makes sure that the last, and consequently the weakest, connection added to generate the MRG is always more relevant than the weakest connection of the MST. The MRG, finally, generates, starting from the MST, the graph presenting the highest number of regular microstructures that make use of the most important connections of the dataset. The higher the value of the H function at the connections selected to generate the MRG, the more meaningful the microstructures of the MRG.
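The search described above (start from the MST, add the skipped connections back one at a time, keep the graph with the highest H) can be sketched as a simple loop. This is a sketch under stated assumptions: `h_function` is a hypothetical callable standing in for the H function of (21), which depends on the pruning algorithm not reproduced in this text, and the skipped connections are assumed to be sorted by increasing distance, that is, strongest first.

```python
def maximally_regular_graph(mst_edges, skipped_edges, h_function):
    """Return (graph, H*, R*) for the graph with the highest H value.

    mst_edges, skipped_edges: lists of (i, j, distance) triples.
    h_function: caller-supplied callable taking a list of (i, j) links
    and returning a number (placeholder for the H function).
    """
    ordered = sorted(skipped_edges, key=lambda e: e[2])   # assumed ordering
    graph = [(i, j) for i, j, _ in mst_edges]
    best_h = h_function(graph)           # H_0: complexity of the MST alone
    best_r = 0
    best_graph = list(graph)
    for r, (i, j, _) in enumerate(ordered, start=1):
        graph.append((i, j))             # add back one skipped connection
        h = h_function(graph)
        if h > best_h:                   # keep the first graph reaching the maximum
            best_h, best_r, best_graph = h, r, list(graph)
    return best_graph, best_h, best_r

# Toy usage with a made-up scoring function that peaks at four links.
mst_toy = [(0, 1, 1.0), (1, 2, 2.0), (0, 3, 3.0)]
skipped_toy = [(2, 3, 6.0), (0, 2, 4.0), (1, 3, 5.0)]
toy_h = lambda links: -(len(links) - 4) ** 2
mrg_graph, h_star, r_star = maximally_regular_graph(mst_toy, skipped_toy, toy_h)
```

When the returned R* is zero, no added connection improved on the MST, and with an empty list of skipped connections the loop never runs, matching the case R = 0 in which no MRG exists.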
The MRG calculation is also useful to define the MST compactness: the fewer the arcs skipped during the MST generation, the more representative the MST is; in other terms:
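(22) $$\text{compactness (Mst)}\quad \phi = 1.0 - \frac{R}{P}; \qquad P = \frac{(N-1)\cdot(N-2)}{2}.$$
Two normalized quantities enter the MRG relevance defined in (25):
 (i) the MRG hubness, H*, of (21):
(23) $$H^{\bullet} = \frac{H^{*} - H_{0}}{H^{*}};$$
 (ii) and the number of new links added by the MRG generation:
(24) $$R^{\bullet} = \frac{R^{*}}{R}.$$
(25) $$\text{relevance (MRG)}\quad \varphi = (H^{\bullet} + R^{\bullet}) - (H^{\bullet}\cdot R^{\bullet}).$$
(26) $$\text{information (MRG)}\quad \psi = \phi\cdot\varphi.$$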
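Equations (22)–(26) are plain arithmetic, so a short worked sketch may help in reading them. The function names and the numbers in the usage lines are invented for illustration only and are not taken from the study.

```python
def mst_compactness(n_nodes, r_skipped):
    """Equation (22): 1 - R / P, with P = (N - 1)(N - 2) / 2 (needs N >= 3)."""
    p = (n_nodes - 1) * (n_nodes - 2) / 2
    return 1.0 - r_skipped / p

def mrg_relevance(h_star, h_0, r_star, r_skipped):
    """Equations (23)-(25); requires h_star != 0 and r_skipped > 0."""
    h_rel = (h_star - h_0) / h_star      # eq. (23)
    r_rel = r_star / r_skipped           # eq. (24)
    return (h_rel + r_rel) - (h_rel * r_rel)   # eq. (25)

def mrg_information(compactness, relevance):
    """Equation (26): product of compactness and relevance."""
    return compactness * relevance

# Made-up example: N = 5 nodes, R = 3 skipped arcs, R* = 1 arc added back,
# H_0 = 0.1 for the MST and H* = 0.2 for the MRG.
phi = mst_compactness(5, 3)
rel = mrg_relevance(h_star=0.2, h_0=0.1, r_star=1, r_skipped=2)
psi = mrg_information(phi, rel)
```

The relevance in (25) has the form a + b − ab, so it stays between 0 and 1 whenever both normalized quantities do.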
We divided the ALS dataset into the Cases dataset (58 records) and the Control dataset (207 records), and applied the AutoCM algorithm to each one independently. The AutoCM algorithm generates two weighted MSTs, and the Delta H function points out the key variables of the two datasets (see Figure 2(a) and Figure 2(b)). The two MSTs differ both topologically and locally (different connections among variables), and the Delta H function shows a very interesting situation (see Table 1).
 3 variables (APOA4_glu360his, NOS3_A_922_G, LPL_ser447term) seem to be the reason for the low complexity of the cases MST: when each one of them is removed, the MST increases its complexity, taking the same H value as the global MST of the control dataset (H Cases = 0.171429 versus H Control = 0.17193);
 3 variables (ADRB3_trp64arg, LIPC_C_480_T, MMP3_5A_6A) seem to be the reason for the high complexity of the control MST: when each one of them is removed, the MST decreases its complexity, taking the same H value as the global MST of the cases dataset (H Cases = 0.137427 versus H Control = 0.136905).
If these considerations have a biological basis, the AutoCM algorithm and the Delta H function procedure prove capable of capturing information hidden in medical datasets.
As a second step of this analysis, we calculated the MRG of the two datasets (see Figures 3(a) and 3(b)). In this case too, the MRG shows low complexity of hubness and links in the cases dataset and very high complexity in the control dataset. This seems to confirm that, in an ideal health condition, living organisms manifest a high ratio of complex regularity and redundancy of structures and functions.
Healthy physiologic function is characterized by a complex interaction of multiple control mechanisms that enable an individual to adapt to the exigencies and unpredictable changes of everyday life. The disease process appears to be marked by a progressive impairment in these mechanisms, resulting in a loss of dynamic range in physiologic function and, consequently, a reduced capacity to adapt to stress. The emerging concept is that loss of redundancy, entropy, and complexity is a hallmark of disease, and in particular of chronic diseases.
Defining and quantifying the complexity of interactions among variables is a very difficult task from a mathematical point of view. Complex network theory, by establishing criteria to define hubs in a given network of variables, provides a framework on which to build parameters corresponding to an increase or loss of complexity in relation to the presence or absence of a particular variable in a set of variables.
In this paper we have applied a novel methodology to establish which of the polymorphisms potentially involved in SALS occurrence play a fundamental role in protecting against the disease or in increasing vulnerability to it, by increasing or reducing the hubness of a graph encoding the many-to-many dynamic relations among genotypes.
Six genetic variants were identified which contributed differently to the complexity of the system: apolipoprotein A-IV (APOA4) glu360his (rs5110), nitric oxide synthase 3 (NOS3) −922A/G (rs1800779), lipoprotein lipase (LPL) ser447term (rs328), adrenergic, beta-3 receptor (ADRB3) trp64arg (rs4994), hepatic lipase (LIPC) −480C/T (rs1800588), and matrix metallopeptidase 3 (MMP3) −1171 5A/6A (rs3025058). Three of these genes/SNPs, APOA4 glu360his, NOS3 −922A/G, and LPL ser447term, represent protective factors, since their contribution to the whole complexity was as high as 0.17 (see Table 1). On the other hand, ADRB3 trp64arg, LIPC −480C/T, and MMP3 −1171 5A/6A, whose hub relevance was as high as 0.13, seem to represent susceptibility factors (see Table 1).
Among the genes/SNPs conferring risk of or protection from the disease, we noted that four are involved in lipid pathways (APOA4, LPL, LIPC, and ADRB3), while two (NOS3 and MMP3) are also involved in oxidative stress, angiogenesis, and the cellular cytoskeleton.
The protective genes/SNPs identified here include the gene for apo A-IV, which maps on chromosome 11q2 and codes for a glycoprotein whose primary translation product is a 396-residue preprotein that is secreted after proteolytic processing. Although its precise function is not known, apo A-IV is a potent activator of lecithin-cholesterol acyltransferase in vitro and displays antioxidant and antiatherogenic properties in vitro; the antiatherogenic properties of apo A-IV suggest that this protein may act as an anti-inflammatory agent [^{14}]. The second protective gene/SNP, also involved in lipid pathways, is LPL ser447term; the gene maps on chromosome 8p22 and encodes a lipoprotein lipase, which is expressed in heart, muscle, and adipose tissue. LPL has the dual functions of triglyceride hydrolase and ligand/bridging factor for receptor-mediated lipoprotein uptake. Several DNA variants at the LPL gene locus have been found to be associated with plasma lipid levels; in particular, the ser447term variant has the potential to elevate plasma high-density lipoprotein (HDL) levels [^{15}]. The role of HDL in ALS is still controversial: hyperlipidemia was shown to be a significant prognostic factor for survival of patients with ALS, linked to a better outcome [^{16}]. However, recent findings in Italian ALS patients did not support this observation, even though some evidence emerged that respiratory impairment, but not a worse clinical status or a lower body mass index, was related to a decrease in blood lipids and in the LDL/HDL ratio [^{17}].
The last protective factor, the NOS3 -922A/G variant, belongs to a gene localized to chromosome 7q36 that codes for the cytosolic enzyme of endothelial cells, a key actor in the modulation of vascular tone through the production of nitric oxide (NO), a vasodilator agent. Constitutive NO release from the microvascular endothelium seems to be responsible for preventing leukocyte margination under physiological conditions by modulating oxidative metabolism in endothelial cells. In this mechanism NO acts as an antioxidant agent that prevents the formation of iron-mediated hydroperoxide. Accumulating evidence indicates that ALS is associated with oxidative damage induced by free radicals. Enhancement of oxidative damage markers and signs of an increased compensatory response to oxidative stress were found in patients with SALS [^{18}], and since different antioxidant systems seem to be involved in ALS compared to other neurodegenerative diseases, oxidative stress may be a cause rather than a consequence of neuronal death [^{19}].
Considering now the vulnerability factors, the LIPC -480C/T variant belongs to a gene located on chromosome 15q21–23 that codes for a glycoprotein involved in the metabolism of several lipoproteins. The C/T substitution at −480 of the promoter region of the gene has been shown to be significantly associated with lower lipase activity [^{20}], and it is also involved in anti-inflammatory and antioxidant activity [^{21}]. Again, the lipid pathway is involved.
The ADRB3 gene has been localized to chromosome 8p12-8p11.1 and codes for a member of the adrenergic receptor group of G-protein-coupled receptors; the receptor is located mainly in adipose tissue and is involved in the regulation of lipolysis and thermogenesis. Some β3 agonists have demonstrated antistress effects in animal studies, suggesting that the receptor also has a role in the CNS [^{22}]. In addition, the trp64arg polymorphism seems to be associated with an increased BMI [^{23}] and, more recently, with elite endurance performance [^{24}]. This is quite interesting, since the literature has supported the concept that soccer, with its consequent head trauma, and ALS are interrelated, with high levels of athleticism/physical activity perhaps playing an additive part in the pathogenesis of the disease [^{25}]. Even though the role of exposure to physical exercise, together with trauma, in ALS has been debated, in a recent pilot study comprising 61 patients and 112 controls the authors showed that ALS is related to physical exercise but not to traumatic events [^{26}].
Regarding the last risk factor, MMP3 -1171 5A/6A, this variant belongs to a gene that maps on chromosome 11q22.3 and codes for a protein of the matrix metalloproteinase (MMP) family. MMPs, a family of zinc-dependent endoproteinases, are effector molecules in the breakdown of the blood-brain and blood-nerve barriers, and they promote neural tissue invasion by leukocytes in inflammatory diseases of the central and peripheral nervous systems. Moreover, MMPs play an important role in synaptic remodeling, neuronal regeneration, and remyelination [^{27}]. MMPs have been suggested to play an important role in ALS pathology, and several studies are still ongoing, both in animal models and in humans, to find evidence of that link.
We know that motor neuron death in ALS is the culmination of multiple aberrant biological processes that also involve non-neuronal cells such as microglia and astrocytes; what emerges from our data is that lipid homeostasis, oxidative stress, and cellular remodelling are strictly related to ALS. We have commented above on the role of the specific variants identified here in the cellular/molecular pathways. A recent finding has been reported on how lipid molecules can induce the cytotoxic aggregation of Cu/Zn superoxide dismutase, the major gene linked to the familial and sporadic forms of the disease, under physiological conditions, suggesting that this might provide a possible mechanism for the pathogenesis of ALS [^{28}]. Recently, lower serum lipid levels have been shown to be related to respiratory impairment in patients with ALS [^{29}]. In addition, in the amyotrophic lateral sclerosis-parkinsonism dementia complex (ALS-PDC), which is common in the western Pacific area and has repeatedly been linked to the use of seeds of various species of cycad, the in vitro effects of cholesterol β-D-glucoside, cholesterol, and cycad phytosterol glucosides on respiration and reactive oxygen species generation in brain mitochondria have been demonstrated [^{30}]. Indeed, dysfunctions of cholesterol homeostasis may lead to human brain diseases such as Alzheimer's disease [^{31}] and Huntington's disease [^{32}].
In a first study on ALS [^{6}], we presented an evolutionary method for selecting the most predictive variables able to distinguish ALS patients from controls. In that work the question was: which are the independent variables whose a priori probability distribution best separates cases from controls? A set of seven variables proved to do this job suitably (an average accuracy of 96% in blind testing).
In this work we pose a different question to the scientific community: which genetic polymorphisms (variables) protect ALS patients and control subjects, and which make them more vulnerable?
These two questions do not necessarily intersect: small differences in the functioning of one organ can produce big differences in symptoms because of its interactions with other organs. Therefore, some polymorphisms can act as the more evident symptoms of a disease without being its main cause. In the same way, the seven variables of the previous work can be optimal predictors of ALS without being the main cause of the ALS syndromes: they are useful for recognizing ALS, but they do not necessarily explain it.
The most predictive features of a disease are not necessarily the features that best explain its dynamics. For example, in the case of alcohol addiction, the main reason for becoming an alcoholic could be a sociopsychological condition, but the most predictive feature for establishing whether someone is an alcoholic can be the functional state of his or her liver.
In the present work, using a completely new adaptive algorithm, we have tried to understand which genetic polymorphisms better explain the deep difference between Cases and Controls; in other words, how the polymorphisms are arranged in different networks, with different links and connection strengths, in the two subsamples.
We applied here a revolutionary methodology able to deal with complex diseases such as sporadic ALS. This new approach allowed us to identify genes/SNPs conferring susceptibility to or protection from the disease. We were not able to discriminate which allele of the six identified variants is actually involved, owing to the way the database was constructed. From the dataset analyzed here we extracted biological information consistent with possible pathogenetic pathways related to ALS. Our data clearly demonstrate the power of this new approach, and it would be of great interest to test it on other, more complex ALS databases to obtain more information.
M. Buscema performed the statistical analysis and developed the intelligent systems. S. Penco participated in the design of the study, coordinated and drafted the paper. E. Grossi participated in the design of the study, in the statistical analysis, coordinated and drafted the paper. All authors read and approved the final paper.
References
1.  Buscema M. A novel adapting mapping method for emergent properties discovery in data bases: experience in medical field. In: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '07); October 2007; Montreal, Canada.
2.  Tandan R, Bradley WG. Amyotrophic lateral sclerosis: part I. Clinical features, pathology, and ethical issues in management. Annals of Neurology. 1985;18(3):271-280. Scopus: 2-s2.0-0021809173. PMID: 4051456.
3.  Miller RG, Mitchell JD, Lyon M, Moore DH. Riluzole for amyotrophic lateral sclerosis (ALS)/motor neuron disease (MND). Cochrane Database of Systematic Reviews. 2007;(1):CD001447. Scopus: 2-s2.0-34247887576.
4.  Beleza-Meireles A, Al-Chalabi A. Genetic studies of amyotrophic lateral sclerosis: controversies and perspectives. Amyotrophic Lateral Sclerosis. 2009;10(1):1-14. Scopus: 2-s2.0-61549122708. PMID: 19110986.
5.  Perry JJP, Shin DS, Tainer JA. Amyotrophic lateral sclerosis. Advances in Experimental Medicine and Biology. 2010;685:9-20. Scopus: 2-s2.0-77955889112. PMID: 20687491.
6.  Penco S, Buscema M, Patrosso M, Marocchi A, Grossi E. New application of intelligent agents in sporadic amyotrophic lateral sclerosis identifies unexpected specific genetic background. BMC Bioinformatics. 2008;9:254. Scopus: 2-s2.0-46949107361.
7.  Buscema M. Squashing theory and contractive map network. Semeion Technical Paper. 2007;32.
8.  Buscema M, Helgason C, Grossi E. Auto contractive maps, H function and maximally regular graph: theory and applications. In: Proceedings of the IEEE Special Session on Artificial Adaptive Systems in Medicine: applications in the real world (NAFIPS '08); May 2008; New York, NY, USA.
9.  Buscema M, Grossi E. Artificial Adaptive Systems in Medicine. Oak Park, Ill, USA: Bentham Science Publishers eBooks; 2009.
10.  Kruskal B. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society. 1956;7(1):48-50.
11.  Karger DR, Klein PN, Tarjan RE. Randomized linear-time algorithm to find minimum spanning trees. Journal of the ACM. 1995;42(2):321-328. Scopus: 2-s2.0-0029276335.
12.  Barabási AL. Network medicine - from obesity to the ‘Diseasome’. New England Journal of Medicine. 2007;357(4):404-407. Scopus: 2-s2.0-34547146985. PMID: 17652657.
13.  Massini G. Tree Visualizer, Semeion Software #40, ver. 8.0, 2006–2011, Rome.
14.  Vowinkel T, Mori M, Krieglstein CF, et al. Apolipoprotein A-IV inhibits experimental colitis. Journal of Clinical Investigation. 2004;114(2):260-269. Scopus: 2-s2.0-4043165724. PMID: 15254593.
15.  Lu Y, Dollé MET, Imholz S, et al. Multiple genetic variants along candidate pathways influence plasma high-density lipoprotein cholesterol concentrations. Journal of Lipid Research. 2008;49(12):2582-2589. Scopus: 2-s2.0-59649099369. PMID: 18660489.
16.  Dupuis L, Corcia P, Fergani A, et al. Dyslipidemia is a protective factor in amyotrophic lateral sclerosis. Neurology. 2008;70(13):1004-1009. Scopus: 2-s2.0-41349122697. PMID: 18199832.
17.  Chiò A, Calvo A, Ilardi A, et al. Lower serum lipid levels are related to respiratory impairment in patients with ALS. Neurology. 2009;73(20):1681-1685. Scopus: 2-s2.0-73449097789. PMID: 19917991.
18.  Siciliano G, D’Avino C, Del Corona A, et al. Impaired oxidative metabolism and lipid peroxidation in exercising muscle from ALS patients. Amyotrophic Lateral Sclerosis and Other Motor Neuron Disorders. 2002;3(2):57-62. Scopus: 2-s2.0-0036388399. PMID: 12215226.
19.  Baillet A, Chanteperdrix V, Trocmé C, Casez P, Garrel C, Besson G. The role of oxidative stress in amyotrophic lateral sclerosis and Parkinson’s disease. Neurochemical Research. 2010;35(10):1530-1537. Scopus: 2-s2.0-77957562993. PMID: 20535556.
20.  Fan YM, Raitakari OT, Kähönen M, et al. Hepatic lipase promoter C-480T polymorphism is associated with serum lipids levels, but not subclinical atherosclerosis: the Cardiovascular Risk in Young Finns Study. Clinical Genetics. 2009;76(1):46-53. Scopus: 2-s2.0-67650901254. PMID: 19558527.
21.  McCaskie PA, Cadby G, Hung J, et al. The C-480T hepatic lipase polymorphism is associated with HDL-C but not with risk of coronary heart disease. Clinical Genetics. 2006;70(2):114-121. Scopus: 2-s2.0-33745845544. PMID: 16879193.
22.  Stemmelin J, Cohen C, Yalcin I, Keane P, Griebel G. Implication of β3-adrenoceptors in the antidepressant-like effects of amibegron using Adrb3 knockout mice in the chronic mild stress. Behavioural Brain Research. 2010;206(2):310-312. PMID: 19744528.
23.  Kadowaki H, Yasuda K, Iwamoto K, et al. A mutation in the β3-adrenergic receptor gene is associated with obesity and hyperinsulinemia in Japanese subjects. Biochemical and Biophysical Research Communications. 1995;215(2):555-560. PMID: 7487991.
24.  Santiago C, Ruiz JR, Buxens A, et al. Trp64Arg polymorphism in ADRB3 gene is associated with elite endurance performance. British Journal of Sports Medicine. 2011;45(2):147-149. Scopus: 2-s2.0-79551502591. PMID: 19553224.
25.  Piazza O, Sirén AL, Ehrenreich H. Soccer, neurotrauma and amyotrophic lateral sclerosis: is there a connection? Current Medical Research and Opinion. 2004;20(4):505-508. Scopus: 2-s2.0-1942502762. PMID: 15119987.
26.  Beghi E, Logroscino G, Chiò A, et al. Amyotrophic lateral sclerosis, physical exercise, trauma and sports: results of a population-based pilot case-control study. Amyotrophic Lateral Sclerosis. 2010;11(3):289-292. Scopus: 2-s2.0-77952167588. PMID: 20433412.
27.  Renaud S, Leppert D. Matrix metalloproteinases in neuromuscular disease. Muscle and Nerve. 2007;36(1):1-13. Scopus: 2-s2.0-34347351239. PMID: 17410592.
28.  Choi I, In Yang Y, Song HD, et al. Lipid molecules induce the cytotoxic aggregation of Cu/Zn superoxide dismutase with structurally disordered regions. Biochimica et Biophysica Acta. 2011;1812(1):41-48. PMID: 20837142.
29.  Chiò A, Calvo A, Ilardi A, et al. Lower serum lipid levels are related to respiratory impairment in patients with ALS. Neurology. 2009;73(20):1681-1685. Scopus: 2-s2.0-73449097789. PMID: 19917991.
30.  Panov A, Kubalik N, Brooks BR, Shaw CA. In vitro effects of cholesterol β-D-glucoside, cholesterol and cycad phytosterol glucosides on respiration and reactive oxygen species generation in brain mitochondria. Journal of Membrane Biology. 2010;237(2-3):71-77. Scopus: 2-s2.0-78649960521. PMID: 20938651.
31.  Adibhatla RM, Hatcher JF. Altered lipid metabolism in brain injury and disorders. Sub-Cellular Biochemistry. 2008;49:241-268. Scopus: 2-s2.0-58149308844. PMID: 18751914.
32.  Valenza M, Cattaneo E. Cholesterol dysfunction in neurodegenerative diseases: is Huntington’s disease in the list? Progress in Neurobiology. 2006;80(4):165-176. Scopus: 2-s2.0-33751399202. PMID: 17067733.