


References

Multiple flexible structure alignment using partial order graphs. Yuzhen Ye and Adam Godzik. Bioinformatics, 2005 21(10):2362-2369. (color figures)
POSA: a user-driven, interactive multiple protein structure alignment server. Z Li, P Natarajan, Y Ye, T Hrabe, A Godzik. Nucl. Acids Res. (2014) doi: 10.1093/nar/gku394

POSA terms

POSA: Partial Order Structure Alignment
POA: Partial Order Alignment
POG: Partial Order Graph
MPStrA: Multiple Protein Structure Alignment

POSA alignment output

POSA was developed to generate partial order alignments of protein structures, meaning that the output alignments are partial order graphs. NO alignments in the typical column-row format will be provided.
The POSA server provides two ways to view the partial order alignments: (1) browse the alignment through our interactive webpage, or (2) download the alignment files.

How to read POA text file

POSA generates partial order alignments, which means we cannot provide typical column-row alignments in any format. Instead, we provide two kinds of alignment output that may be useful for some users.
Here we use the POSA result of 3 calmodulin-like proteins to demonstrate. This page provides links to the two alignment outputs shown below.
(1) Partial Order Graph (POG) along with amino acids
This file (show me the demonstration file) records the list of proteins between <PRO>...</PRO>: d1ncxa_ (index = 0), d2sasa_ (index = 1) and d1jfja_ (index = 2).

Between <NODE> and </NODE>, each line represents a node in the POG and records the residues aligned at this node. For example:
"1 0 A.2.S" records node 1 (node indexes start from 0), which holds residue "S" at position 2 of chain "A" in protein 0 (i.e., d1ncxa_). The position is defined as "resSeq number" + "insertion code", read from columns 23-27 of the ATOM section. Protein indexes also start from 0; protein names and indexes are listed between the <PRO> and </PRO> tags. "A.2.S" is the residue identifier that specifies which residue of protein 0 is meant.
"20 0 A.21.E 1 A.9.K 2 A.1.M" records node 20, which represents an aligned position common to all 3 input structures (i.e., protein 0 residue A.21.E, protein 1 residue A.9.K and protein 2 residue A.1.M).
Between <EDGE> and </EDGE>, each line records an edge connecting two nodes; e.g., "0 0 1" records edge 0 connecting node 0 and node 1.
Between <CORE> and </CORE>: parameters about the common core for the input structures. For example:
<CORE>0 3 132 2.80 1 2.32</CORE>
Length or size of the POG is enclosed in the <LEN> tag. For example:
<LEN>201 208</LEN>
Users may prepare alignments they desire by extracting alignment information from this file.
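The NODE and EDGE records described above can be read with a few lines of Python. This is a minimal sketch, not the server's own parser. It assumes each record sits on its own line between the tags and that fields are separated by whitespace. The function names and the small sample text are made up for illustration, and the sample reuses only the example lines quoted on this page.

```python
import re

def tag_body(text, tag):
    """Return the text between <tag> and </tag>, or '' if the tag is absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""

def parse_residue_id(res_id):
    """Split 'A.21.E' into (chain, position, residue).

    The position stays a string because it may carry an insertion code.
    """
    chain, position, residue = res_id.rsplit(".", 2)
    return chain, position, residue

def parse_node_line(line):
    """'20 0 A.21.E 1 A.9.K' -> (20, {0: ('A','21','E'), 1: ('A','9','K')})."""
    fields = line.split()
    residues = {}
    # After the node index, fields alternate: protein index, residue identifier.
    for protein, res_id in zip(fields[1::2], fields[2::2]):
        residues[int(protein)] = parse_residue_id(res_id)
    return int(fields[0]), residues

def parse_edge_line(line):
    """'0 0 1' -> (edge index, source node, target node)."""
    edge, source, target = (int(value) for value in line.split())
    return edge, source, target

# Assumed file layout, built from the example records quoted above.
sample = """<NODE>
1 0 A.2.S
20 0 A.21.E 1 A.9.K 2 A.1.M
</NODE>
<EDGE>
0 0 1
</EDGE>"""

nodes = dict(parse_node_line(l) for l in tag_body(sample, "NODE").splitlines())
edges = [parse_edge_line(l) for l in tag_body(sample, "EDGE").splitlines()]
```

Here `nodes[20]` maps each protein index to its aligned residue, so nodes whose dictionary has one entry per input protein are the positions common to all structures.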

(2) Alignment of amino acids

This file (show me the demonstration file) explicitly lists the residues of aligned regions that are at least 5 residues long and shared by all input structures (positions aligned in only a subset of the input structures are not listed). The lengths of the spanning regions in between are shown in {}.

The first line in the file shows that there are 6 such aligned regions, labeled as S1 - S6 (same as in the simplified POG graph).
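The second line "d1ncxa_ {20}EFKAAFDM{0}FDADGGGDIST{1}.." shows that, in d1ncxa_, a spanning region of 20 residues is followed by the aligned region of residues 21-28 (EFKAAFDM), then by a spanning region of length 0, then by residues 29-39 (FDADGGGDIST), and so on.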
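The residue ranges quoted above follow from adding up the {} lengths and the region lengths. The sketch below shows that arithmetic. It assumes the line consists of a protein name, whitespace, and alternating {n} counts and residue strings; `parse_aligned_regions` is a hypothetical helper name. The ranges are running counts along the chain, which may differ from the PDB resSeq numbers when the numbering has gaps.

```python
import re

def parse_aligned_regions(line):
    """Return (protein name, [(first, last, residues), ...]) for one line."""
    name, body = line.split(None, 1)
    regions = []
    consumed = 0  # residues counted so far along the chain
    for gap, block in re.findall(r"\{(\d+)\}([A-Za-z]*)", body):
        consumed += int(gap)  # skip the spanning region
        if block:
            regions.append((consumed + 1, consumed + len(block), block))
            consumed += len(block)
    return name, regions

# Only the part of the line quoted on this page; the rest is elided there.
name, regions = parse_aligned_regions("d1ncxa_ {20}EFKAAFDM{0}FDADGGGDIST{1}")
for first, last, residues in regions:
    print(name, first, last, residues)
```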

In addition, the qualified positions of the residues that are aligned in all input structures are recorded between the <ALN> and </ALN> tags. For example,
"0 0 A.21.E 1 A.9.K 2 A.1.M" records the 1st aligned residues from proteins 0, 1 and 2. The first number "0" is the index of the aligned position, which starts from 0.
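Since every ALN record covers all input structures, the records can be turned into one row per protein, which is the closest thing to a column-row view of the common core. The following is only a sketch: it assumes the records are listed in increasing index order and that the last dot-separated field of each identifier is a one-letter residue code. `aln_rows` is a hypothetical helper name.

```python
def aln_rows(aln_lines):
    """Collect the aligned residues of each protein into one string."""
    rows = {}
    for line in aln_lines:
        fields = line.split()
        # fields[0] is the aligned-position index; the rest alternate
        # between a protein index and a residue identifier such as A.21.E.
        for protein, res_id in zip(fields[1::2], fields[2::2]):
            residue = res_id.rsplit(".", 2)[-1]
            rows.setdefault(int(protein), []).append(residue)
    return {protein: "".join(residues) for protein, residues in rows.items()}

# The single record quoted above; a real file holds one record per position.
rows = aln_rows(["0 0 A.21.E 1 A.9.K 2 A.1.M"])
```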

PDB inputs in a tar/zip file

The POSA server accepts structures (in PDB format) packed in either a tar file (gzipped or not) or a zip file. Be sure that (a) the PDB files are clean (e.g., use a filtered PDB file that contains only the specific chain you want to use, instead of the original PDB file with multiple chains); and (b) all files are placed in the current directory, i.e., at the top level of the archive (not in a subdirectory, and not a mix of both).
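One way to prepare such a filtered file is to keep only the coordinate records of the chain of interest. The sketch below relies on the fixed-column PDB layout, where the chain Id is in column 22. Keeping HETATM records and ending the output with END are choices made here, not requirements stated by POSA. `atom_line` and `extract_chain` are hypothetical helpers, and the sample records are truncated after the residue number for brevity.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build columns 1-26 of an ATOM record (illustrative data only)."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"

def extract_chain(pdb_lines, chain_id):
    """Keep ATOM/HETATM records whose chain Id (column 22) matches."""
    kept = [
        line for line in pdb_lines
        if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain_id
    ]
    kept.append("END")
    return kept

sample = [
    "HEADER    HYPOTHETICAL EXAMPLE",
    atom_line(1, " N", "SER", "A", 2),
    atom_line(2, " CA", "SER", "A", 2),
    atom_line(3, " N", "MET", "B", 1),
]
chain_a = extract_chain(sample, "A")
```

To process a real file, read its lines with `open(path).read().splitlines()` and write the returned list back out joined by newlines.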

For example, if you have three structures, AAAA.pdb, BBBB.pdb and CCCC.pdb, in the current directory, you can archive them in any of the following three ways that the POSA server recognizes.

> tar cvf pdb.tar AAAA.pdb BBBB.pdb CCCC.pdb
> tar zcvf pdb.tar.gz AAAA.pdb BBBB.pdb CCCC.pdb
> zip pdb.zip AAAA.pdb BBBB.pdb CCCC.pdb
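If the files live in different directories, the same archives can be built from Python while forcing every member to the top level of the archive. This is a sketch using only the standard `tarfile` and `zipfile` modules. `pack_flat` is a hypothetical helper name, and the demonstration writes placeholder files, not real structures.

```python
import os
import tarfile
import tempfile
import zipfile

def pack_flat(paths, archive_path):
    """Pack files into a .zip, .tar.gz/.tgz or .tar with no directories."""
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in paths:
                zf.write(path, arcname=os.path.basename(path))
    else:
        gzipped = archive_path.endswith((".tar.gz", ".tgz"))
        with tarfile.open(archive_path, "w:gz" if gzipped else "w") as tf:
            for path in paths:
                tf.add(path, arcname=os.path.basename(path))

# Demonstration with placeholder files stored in a nested directory.
with tempfile.TemporaryDirectory() as workdir:
    nested = os.path.join(workdir, "nested")
    os.makedirs(nested)
    sources = []
    for name in ("AAAA.pdb", "BBBB.pdb", "CCCC.pdb"):
        path = os.path.join(nested, name)
        with open(path, "w") as handle:
            handle.write("END\n")  # placeholder content, not a structure
        sources.append(path)

    pack_flat(sources, os.path.join(workdir, "pdb.tar.gz"))
    with tarfile.open(os.path.join(workdir, "pdb.tar.gz")) as tf:
        tar_names = tf.getnames()

    pack_flat(sources, os.path.join(workdir, "pdb.zip"))
    with zipfile.ZipFile(os.path.join(workdir, "pdb.zip")) as zf:
        zip_names = zf.namelist()
```

Passing `arcname=os.path.basename(path)` is what drops the directory part, so the archive members match rule (b) above.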

PDB inputs with user-defined constraints

To help users enter constraints on the input protein structures, the POSA server interface provides a convenient and flexible protein structure definition table.

There are 5 columns in the table. The first 2 columns, PDB and Chain, are required; they specify the amino acid chain to be aligned from each input protein structure (in PDB format). The input protein structure can be provided by a PDB code or SCOP code entered in the PDB column, or by uploading a local coordinate file in PDB format. User-defined constraints are provided in the Segments, Reference and OtherChains columns.

Please use the example below to learn how to specify input structures with constraints and how they will be interpreted.
Please use this format for your input, and use 'na' as a placeholder for undefined values. Use a dot (.) for an unspecified chain Id, in which case the 1st chain will be used. If the chain Id is a space, please enter 2 dots (..).

#PDB    Chain   Segments        Reference    OtherChains
1a9n    B       NA      No      Q
1urn    A       11-16,40-46,55-60,76-86 Yes     P
2err    A       NA      No      B
1yty    A       NA      No      C
2kg1    A       NA      No      B
If you copy and paste the text above into the text area provided by the second input method, or manually enter the values above into the input table of the first input method on the POSA homepage, the result will be as shown in: Job Home; Visualization of protein structure comparison in Jmol

#PDB    Chain   Segments                   Reference   OtherChains
1a9n    B       NA                         No          Q
1urn    A       11-16,40-46,55-60,76-86    Yes         P
2err    A       NA                         No          B
1yty    A       NA                         No          C
2kg1    A       NA                         No          B


Column        Description

PDB           Initial protein structures in PDB format, or given by PDB or SCOP codes. Required.
              A PDB file named column1.pdb should be given, or it will be downloaded from the PDB site or the SCOP site.

Chain         Chain Id from column1.pdb that will be used for structure alignment. Required.

Segments      Segments defined by "resSeq-resSeq", separated by commas, where "resSeq" is the residue sequence number field (columns 23-26) in the ATOM line.
              Optional. Default="NA".
              It defines which segments of the chain in column 2 will be used for structure alignment.
              When absent, the whole chain will be used.

Reference     Yes/No. Optional. Default="No".
              States whether this structure is used as the reference, which means that all other structures will be superimposed onto this structure.
              Only one reference is allowed (if more than one is given, the first reference structure will be used). When a reference is given, the rigid pairwise structure alignment method will be used. Otherwise, the POSA multiple structure alignment method will be used.

OtherChains   Chain Ids separated by commas.
              Optional. Default="NA".
              These chains are not used when calculating the structure alignment, but are added in post-processing, using the same transformation matrix as the structure alignment, so that users can view them.

NOTE          To input structures using the second input method, please follow these format rules:
              Comment lines start with "#".
              Columns use spaces or tabs (default) as delimiters.
              There must be no space inside a cell.
              If a cell is empty, please use "na" or "NA" as a placeholder.
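The format rules above are simple enough to check locally before submitting. The sketch below reads the text form of the table and applies the rules as this page describes them; it is not the server's own parser. The dictionary keys and the function names are made up for illustration, and padding missing trailing cells with "NA" is an extra assumption.

```python
import re

def parse_segments(cell):
    """'11-16,40-46' -> [(11, 16), (40, 46)]; 'NA' -> [] (whole chain)."""
    if cell.upper() == "NA":
        return []
    segments = []
    for part in cell.split(","):
        match = re.fullmatch(r"(-?\d+)-(-?\d+)", part)
        if match is None:
            raise ValueError(f"bad segment: {part!r}")
        segments.append((int(match.group(1)), int(match.group(2))))
    return segments

def parse_chain(cell):
    """'.' means first chain (None here); '..' means a blank chain Id."""
    return {".": None, "..": " "}.get(cell, cell)

def parse_structure_table(text):
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        cells = line.split()                  # spaces or tabs as delimiters
        cells += ["NA"] * (5 - len(cells))    # assumption: pad missing cells
        pdb, chain, segments, reference, others = cells[:5]
        rows.append({
            "pdb": pdb,
            "chain": parse_chain(chain),
            "segments": parse_segments(segments),
            "reference": reference.lower() == "yes",
            "other_chains": [] if others.upper() == "NA" else others.split(","),
        })
    # Only one reference is allowed: keep the first, clear any later ones.
    seen = False
    for row in rows:
        if row["reference"] and seen:
            row["reference"] = False
        seen = seen or row["reference"]
    return rows

table = """#PDB    Chain   Segments        Reference    OtherChains
1a9n    B       NA      No      Q
1urn    A       11-16,40-46,55-60,76-86 Yes     P
2err    A       NA      No      B
1yty    A       NA      No      C
2kg1    A       NA      No      B
"""
rows = parse_structure_table(table)
# Per the Reference column description: a reference selects the rigid
# pairwise method, otherwise the POSA multiple alignment method is used.
method = "rigid pairwise" if any(r["reference"] for r in rows) else "POSA multiple"
```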

POSA structure display page

There are 2 frames (example). The top frame shows the sequence information for the superimposed protein structures. The bottom frame shows the three-dimensional superposition of the proteins. The sequence information in the top frame can be the protein sequence (example), the protein structure alignment in text format (example), or the Simplified Partial Order Graph. The bottom frame contains one big structure view and several small structure views, each showing a single protein structure. The big view depicts all superimposed structures. Individual structures in the big view can be switched on and off. Moreover, any chains, segments or ligands can be switched on and off for better structure comparison. Structures in the smaller views rotate synchronously when the "Synchronize" box (bottom of frame) is checked.

The PDB file names and chain Ids in the structure windows and in the top frame are named and colored correspondingly. Each PDB file name consists of the "user's initial PDB file name or PDB id", plus the "chain Id used for structure superposition", plus the "segment ids if the user selected some segments rather than the full chain" (optional), plus "one or more chain Ids that the user wants to include in the structure display" (these chains are not used for structure superposition, but are added later, after the structure comparison is done on the user-selected chains).

Example: the protein structures of 3 antibodies (PGT135, VRC01 and VRC23) were compared. Antibody "PGT135" with PDB id 4jm2 has chains A,B,C,D,E. Antibody "VRC01" with PDB id 3ngb has chains A,D,G,I,B,E,H,J,C,F,K,L. Antibody "VRC23" with PDB id 4j6r has chains G,H,L. Although each antibody structure has more than one chain, structure superposition is only allowed on a single chain (in full or in part).
Note: if you prefer to use 2 or more chains from a PDB file for structure superposition, it is recommended to edit the PDB file and rename the chain Ids of the selected chains to one single unique chain Id.
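The renaming described in the note can be done with a short script that rewrites the chain Id column (column 22) of the coordinate records. This is a sketch under the assumption that rewriting ATOM, HETATM, ANISOU and TER records is enough. `atom_line` and `merge_chains` are hypothetical helpers, and the sample records are illustrative and truncated after the residue number. Residue numbers from different chains may overlap after merging, and this page does not say how POSA treats that, so you may need to renumber residues as well.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Build columns 1-26 of an ATOM record (illustrative data only)."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}"

def merge_chains(pdb_lines, chains, new_id):
    """Rename the chain Id (column 22) of the selected chains to new_id."""
    records = ("ATOM", "HETATM", "ANISOU", "TER")
    merged = []
    for line in pdb_lines:
        if line.startswith(records) and len(line) > 21 and line[21] in chains:
            line = line[:21] + new_id + line[22:]
        merged.append(line)
    return merged

sample = [
    atom_line(1, " CA", "GLY", "H", 1),
    atom_line(2, " CA", "ASP", "L", 1),
    atom_line(3, " CA", "VAL", "G", 44),
]
merged = merge_chains(sample, {"H", "L"}, "X")
```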

Here the structure comparison was executed on the single antigen chain from each antibody (chain E from PGT135, chain G from VRC01 and chain G from VRC23). After the structure comparison is done on the selected chains (or segments), one can choose to include any other chains, or all chains, from each antibody in the final superimposed structures for the purpose of display. These chains are added to the final display by transforming them with the transformation matrix used in the structure superposition. In this example, one of the heavy chains (4jm2:A; 3ngb:H; 4j6r:H) and one of the light chains (4jm2:B; 3ngb:L; 4j6r:L) were added to the final display.
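Adding a chain "by structure transformation" means applying the same rotation and translation to its atoms that was applied to the superimposed chain. The sketch below shows this for plain coordinate tuples. It assumes the common convention x' = R·x + t with a 3x3 rotation matrix R and a translation vector t. The format in which POSA stores its matrix is not described on this page, and the numbers below are made up for illustration.

```python
def apply_transform(coords, rotation, translation):
    """Apply x' = R.x + t to each (x, y, z) coordinate."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return moved

# Illustrative transform: 90 degree rotation about z, then a shift.
rotation = [
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
]
translation = (1.0, 2.0, 3.0)

other_chain = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # atoms of an extra chain
moved = apply_transform(other_chain, rotation, translation)
```

Because the extra chains get exactly the transform of the aligned chain from the same PDB file, their position relative to that chain is preserved in the final display.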

How to visualize superimposed 3D structures

Multiple superimposed structures are visualized interactively on our website. POSA currently supports only Jmol for displaying protein structures. To run Jmol, please make sure that Java is installed on your system and that the Java Applet Plug-in and JavaScript are enabled in your web browser.