/* Filename: DistanceMatrix.sas Purpose: SAS code to compute distance matrices. Choice of Euclidean, Manhattan (~Brainerd-Robinson) or Nei's genetic distance. Requires IML. Last Update: FDN 4.8.02 */ proc iml; ************************************************************************** Function: EUCDIST Description:Compute Euclidean distance between two row vectors. Arguments: X1,X2: the two row vectors. pct: set to 1 if you want X1 and X2 coverted to relative frequencies first. Returns: Euclidean distance between X1 and X2. ; start eucdist(x1,x2,pct); if pct=1 then do; x1=x1/sum(x1); x2=x2/sum(x2); end; dist= sqrt((x1-x2)*(x1-x2)`); return(dist); finish eucdist; ****************************************************************************; ************************************************************************** Function: NEIDIST Description:Compute Nei's (1972) distance between two row vectorsof counts. Arguments: X1,X2: the two row vectors. Returns: Nei's (genetic) distance between X1 and X2. ; start neidist(x1,x2); *assumes input as counts; x1=x1/sum(x1); x2=x2/sum(x2); dist=1-((x1*x2`)/sqrt(x1*x1`)#(x2*x2`)); *you can change "-log" to "1-"; return(dist); finish neidist; **************************************************************************; ************************************************************************** Function: MANHATDIST Description:Computes the Manhattan distancee between two row vectors. Arguments: X1,X2: the two row vectors. pct: set to 1 if you want X1 and X2 converted to relative frequencies first. Returns: Manhattan distance between X1 and X2. ; start manhatdist(x1,x2,pct); if pct=1 then do; x1=x1/sum(x1); x2=x2/sum(x2); end; dist= sum(abs(x1-x2)); return(dist); finish manhatdist; **************************************************************************; ************************************************************************** Function: DISTMAT Description: Compute a distance matrix for the rows of an input dataset. Arguments: XMAT: the input data set ROWNAME: a character variable that names the rows DISTYPE: 1 get you euclidean, 2 for manhattan, 3 for nei Returns: Dmat: the distance matrix FromMat: A matrix of row labels for the distances ToMat Another matrix of row labels; START distmat(xmat,rowname,disttype,DMAT,FROMMAT,TOMAT); n=nrow(xmat); dmat=j(n,n,0); FromMat= cshape(' ',n,n,8); ToMat=FromMat; do i=1 to n by 1; do j=1 to n by 1; if disttype=1 then dmat[i,j]= eucdist(xmat[i,],xmat[j,],1) ; if disttype=2 then dmat[i,j]= manhatdist(xmat[i,],xmat[j,],1); if disttype=3 then dmat[i,j]= neidist(xmat[i,],xmat[j,]) ; FromMat[i,j]=rowname[i,]; ToMat[i,j]=rowname[j,]; end; end; FINISH distmat ; ******************************************************************************************************; ****************************************************************************************************** Function: MATTOCOl Description:Puts the lower triangle of a symmetric matrix in a col vector Arguments: Dmat -- the symmetric matrix Returns: A column vector; START MATTOCOL (dmat); n=nrow(dmat); distvec= dmat[{1},{2}:n]`; do i={2} to (n-{1}); distvec=distvec//( dmat[i,(i+{1}):n])`; end; return (distvec); FINISH MATTOCOL; *******************************************************************************************************; /* ok -- let's do the business */ use braun1; read all into xmat [rowname=group colname=motifs] where (tclass='mm10_14'); *read the data into IML -- note the where clause; call distmat(xmat,group,1,DMat,FromMat,ToMat); *get the distance matrix; distvec= mattocol(dmat); *unwind the distances into a col vector; fromvec= mattocol(fromMat); *get some labels; toVec= matTocol(ToMat); fromtovec= concat(fromvec,tovec); *concatenate them into a single colvec; create distvec from distvec [colname='distance' rowname=fromtovec] ; append from distvec [rowname=fromtovec] ; close distvec; proc print; run;