PCO: MS-DOS program for principal co-ordinate analysis

PCO ver. 2.0

Last update 20 May, 2005

-------------------------------------------
I'm sorry, but PCO ver.1.0 contained a mistake in its calculation process. If you have downloaded and installed PCO ver.1.0 on your computer, please replace it with the current version (i.e., PCO ver.2.0).
-------------------------------------------

PCO ver.2.0: MS-DOS program for principal coordinate analysis

Description

A small MS-DOS program for conducting principal co-ordinate analysis as proposed by Gower (1966).
The program is written in C, based on Tanaka and Tarumi (1995).

Download

pco.zip (47kb)

This file contains the following files:

pco.exe ... Executable file of PCO
pco.c, matrix.c ... source files (C language)
infile.dat ... sample input file (tab delimited text format)
outfile.csv ... sample output file (CSV format)

Installation

Extract all files contained in "pco.zip" and copy them into any folder you like.

Example folder name:

c:\Apps\pco

Execution

Open an MS-DOS command prompt window.
Type
cd [the folder (directory) in which you installed the pco files]
Type
pco [input file name] [output file name]
That's all! Please enjoy PCO analysis!

(Screenshot of the MS-DOS command prompt window)

Input file for PCO

The input file is a tab-delimited text file with the data structure described below.
You can enter the data in MS-Excel and save it as a tab-delimited text file.
-----------------------------

[the number of samples]
[similarity (s) between the 1st and 1st samples] [s between the 1st and 2nd samples] ... [s between the 1st and n-th samples]
[s between the 2nd and 1st samples] ... [s between the 2nd and n-th samples]
...
[s between the n-th and 1st samples] ... [s between the n-th and n-th samples]

----------------------------
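As a rough illustration of the layout above, the following C sketch reads such a file. The function name read_similarity and its error handling are assumptions made for this example; it is not the reading code used in pco.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the layout described above: the number of samples n first,
 * followed by n*n similarity values separated by tabs, spaces, or
 * newlines. Returns a malloc'ed row-major n*n array (the caller frees it)
 * and stores n in *n_out. Returns NULL on any error.
 * read_similarity() is a hypothetical helper, not a function from pco.c. */
double *read_similarity(FILE *fp, int *n_out)
{
    int n;
    size_t count, k;
    double *e;

    assert(fp != NULL && n_out != NULL);
    if (fscanf(fp, "%d", &n) != 1 || n <= 0)
        return NULL;

    count = (size_t)n * (size_t)n;
    e = malloc(count * sizeof *e);
    if (e == NULL)
        return NULL;

    for (k = 0; k < count; k++) {
        if (fscanf(fp, "%lf", &e[k]) != 1) {  /* too few or non-numeric values */
            free(e);
            return NULL;
        }
    }
    *n_out = n;
    return e;
}
```

The element for the i-th row and j-th column (counting from 0) is then e[i * n + j].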
You can obtain a similarity matrix from your distance matrix in several ways. For example, you can calculate the similarity between the i-th and j-th samples (the (i, j)-th element of the similarity matrix) as follows.

    e_ij = -d_ij^2 / 2

    e_ii = 0
where d_ij^2 is the squared distance between the i-th and j-th samples (the (i, j)-th element of the squared distance matrix) (Tanaka and Tarumi 1995, p. 188).
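The conversion above can be sketched in C as follows. This is a minimal example, assuming the distance matrix is stored row by row in a flat array; the function name distance_to_similarity is hypothetical and does not come from pco.c.

```c
#include <assert.h>
#include <stddef.h>

/* Converts an n x n distance matrix d (row-major) into a similarity
 * matrix e using e_ij = -d_ij^2 / 2, with every diagonal element set to 0.
 * distance_to_similarity() is a hypothetical helper name. */
void distance_to_similarity(const double *d, double *e, int n)
{
    int i, j;

    assert(d != NULL && e != NULL && n > 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            size_t k = (size_t)i * (size_t)n + (size_t)j;
            e[k] = (i == j) ? 0.0 : -(d[k] * d[k]) / 2.0;
        }
    }
}
```

For instance, a distance of 2 between two samples gives a similarity of -2.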

If you can obtain a similarity matrix directly, please use it as-is in the PCO analysis (for example, Nei's genetic similarity matrix).
-------Caution-------
Please input a "Similarity" (not "Distance") matrix!
--------------------

(Screenshot of data input in Excel)

Output file of PCO

The output file is formatted as a CSV (comma-separated values) file.
Its contents are as follows:

---------------------------------------
[SIMILARITY MATRIX]
,0.000000,-1.000000,-5.000000,-17.000000,-20.000000,-25.000000,-13.000000,-9.000000
,-1.000000,0.000000,-4.000000,-16.000000,-25.000000,-32.000000,-20.000000,-16.000000
,-5.000000,-4.000000,0.000000,-4.000000,-13.000000,-20.000000,-16.000000,-20.000000
,-17.000000,-16.000000,-4.000000,0.000000,-9.000000,-16.000000,-20.000000,-32.000000
,-20.000000,-25.000000,-13.000000,-9.000000,0.000000,-1.000000,-5.000000,-17.000000
,-25.000000,-32.000000,-20.000000,-16.000000,-1.000000,0.000000,-4.000000,-16.000000
,-13.000000,-20.000000,-16.000000,-20.000000,-5.000000,-4.000000,0.000000,-4.000000
,-9.000000,-16.000000,-20.000000,-32.000000,-17.000000,-16.000000,-4.000000,0.000000

[DOUBLE CENTERING MATRIX]
,10.000000,12.000000,4.000000,-4.000000,-10.000000,-12.000000,-4.000000,4.000000
,12.000000,16.000000,8.000000,0.000000,-12.000000,-16.000000,-8.000000,0.000000
,4.000000,8.000000,8.000000,8.000000,-4.000000,-8.000000,-8.000000,-8.000000
,-4.000000,0.000000,8.000000,16.000000,4.000000,0.000000,-8.000000,-16.000000
,-10.000000,-12.000000,-4.000000,4.000000,10.000000,12.000000,4.000000,-4.000000
,-12.000000,-16.000000,-8.000000,0.000000,12.000000,16.000000,8.000000,0.000000
,-4.000000,-8.000000,-8.000000,-8.000000,4.000000,8.000000,8.000000,8.000000
,4.000000,0.000000,-8.000000,-16.000000,-4.000000,0.000000,8.000000,16.000000

[EIGEN VALUE],PCO1,PCO2,PCO3,PCO4,PCO5,PCO6,PCO7,PCO8
,58.246211,41.753789,0.000000,0.000000,0.000000,-0.000000,-0.000000,-0.000000
[CONTRIBUTION],PCO1,PCO2,PCO3,PCO4,PCO5,PCO6,PCO7,PCO8
,0.582462,0.417538,0.000000,0.000000,0.000000,-0.000000,-0.000000,-0.000000
[CUMULATIVE CONTRIBUTION],PCO1,PCO2,PCO3,PCO4,PCO5,PCO6,PCO7,PCO8
,0.582462,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000

[EIGEN VECTOR],PCO1,PCO2,PCO3,PCO4,PCO5,PCO6,PCO7,PCO8
,-0.374131,0.210324,-0.000000,0.000000,0.000000,0.000000,0.000000,0.903211
,-0.520188,0.075635,-0.023405,0.509192,0.000000,-0.057395,-0.637367,-0.233087
,-0.292113,-0.269379,-0.830017,-0.331105,0.000000,0.198909,-0.024201,-0.058272
,-0.064038,-0.614392,0.268143,-0.062099,0.707107,0.107478,-0.132401,0.116543
,0.374131,-0.210324,-0.320687,0.753554,0.000000,0.263075,0.185208,0.203951
,0.520188,-0.075635,-0.239004,-0.120162,-0.000000,-0.535427,-0.557773,0.233087
,0.292113,0.269379,0.082212,-0.202726,0.000000,0.760360,-0.461200,0.058272
,0.064038,0.614392,-0.268143,0.062099,0.707107,-0.107478,0.132401,-0.116543

[PCO SCORE],PCO1,PCO2,PCO3,PCO4,PCO5,PCO6,PCO7,PCO8
,-2.855339,1.359057,-0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
,-3.970030,0.488733,-0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
,-2.229382,-1.740649,-0.000000,-0.000000,0.000000,0.000000,0.000000,0.000000
,-0.488733,-3.970030,0.000000,-0.000000,0.000000,0.000000,0.000000,0.000000
,2.855339,-1.359057,-0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
,3.970030,-0.488733,-0.000000,-0.000000,-0.000000,0.000000,0.000000,0.000000
,2.229382,1.740649,0.000000,-0.000000,0.000000,0.000000,0.000000,0.000000
,0.488733,3.970030,-0.000000,0.000000,0.000000,0.000000,0.000000,0.000000

------------------------------------------

[SIMILARITY MATRIX] is the similarity matrix entered by the user.
[DOUBLE CENTERING MATRIX] is the double-centered matrix A. The (i, j)-th element of A is calculated as

a_ij = e_ij - e_i. - e_.j + e_..

where e_ij is the (i, j)-th element of the similarity matrix, and e_i., e_.j, and e_.. are the averages of the elements of the i-th row, the j-th column, and the whole similarity matrix, respectively.
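The double centering step can be sketched in C as below. This is only an illustration of the formula, assuming a row-major flat array; the function name double_center is hypothetical and the code is not taken from pco.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Computes a_ij = e_ij - e_i. - e_.j + e_.. for an n x n matrix e
 * (row-major), where e_i., e_.j and e_.. are the row, column and
 * overall means. Returns 0 on success, -1 if memory allocation fails.
 * double_center() is a hypothetical helper name. */
int double_center(const double *e, double *a, int n)
{
    int i, j;
    double total = 0.0;
    double *row, *col;

    assert(e != NULL && a != NULL && n > 0);
    row = calloc((size_t)n, sizeof *row);
    col = calloc((size_t)n, sizeof *col);
    if (row == NULL || col == NULL) {
        free(row);
        free(col);
        return -1;
    }

    /* Accumulate row sums, column sums and the overall sum. */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double v = e[(size_t)i * n + j];
            row[i] += v;
            col[j] += v;
            total += v;
        }
    }
    /* Turn the sums into means. */
    for (i = 0; i < n; i++) {
        row[i] /= n;
        col[i] /= n;
    }
    total /= (double)n * (double)n;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[(size_t)i * n + j] = e[(size_t)i * n + j] - row[i] - col[j] + total;

    free(row);
    free(col);
    return 0;
}
```

Applied to the sample [SIMILARITY MATRIX] shown earlier, the first row mean is -11.25 and the overall mean is -12.5, so a_11 = 0 + 11.25 + 11.25 - 12.5 = 10, which agrees with the sample [DOUBLE CENTERING MATRIX].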

[EIGEN VALUE] are the eigenvalues of matrix A.

[CONTRIBUTION] and [CUMULATIVE CONTRIBUTION] are the contributions and cumulative contributions of the eigenvectors of matrix A, respectively.

[EIGEN VECTOR] are the eigenvectors of matrix A.
[PCO SCORE] are the principal co-ordinate scores obtained from your similarity matrix!
You can visualize the location of each sample on a principal co-ordinate plane using this matrix. The (i, j)-th element of this matrix is the j-th co-ordinate value of the i-th sample. For example, the 1st, 2nd, and 3rd samples are located at (-2.86, 1.36), (-3.97, 0.49), and (-2.23, -1.74), respectively, on the plane of the 1st and 2nd principal co-ordinates. The cumulative contribution reaches 1.0 at the 2nd co-ordinate, indicating that all the information contained in the similarity matrix is explained by the 1st and 2nd principal co-ordinates.

References

Gower, J.C. (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325-338.

Tanaka and Tarumi (1995) Handbook of statistical analysis for Windows (in Japanese). Kyoritu-shuppan, Tokyo.