The characterization of HIV-1 k-mer space - a prerequisite to algorithmic vaccine design

Introduction
In Archer et al. [1] we present an algorithm that generates artificial sequence constructs for quantifying the extent of HIV-1 k-mer space, a prerequisite for algorithmic vaccine design strategies for HIV-1. The algorithm selects the high-frequency k-mers in the sequence alignment that are to be covered and pieces them together in optimized permutations, keeping each k-mer at its original position within the gene. An implementation of the algorithm is available on this website as an executable jar file.
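To illustrate what "k-mers that maintain their original position" and "coverage" can mean in practice, here is a minimal Java sketch that counts position-specific k-mers in an alignment and reports the cumulative coverage provided by a list of constructs. It is not the CoverageTool source. The class and method names (KmerCoverage, countKmers, cumulativeCoverage) are hypothetical, and the coverage definition used here (the fraction of gap-free sequence k-mers that a construct reproduces at the same start column) is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of position-specific k-mer counting and cumulative construct coverage.
 * ASSUMPTIONS (not CoverageTool's code): a k-mer is identified by its start
 * column plus its residues, windows containing a gap '-' are skipped, and
 * coverage is the fraction of counted k-mers that a construct reproduces at
 * the same start column.
 */
public class KmerCoverage {

    /** Counts every gap-free k-mer in the alignment, keyed as "start:residues". */
    public static Map<String, Integer> countKmers(List<String> alignment, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String seq : alignment) {
            for (int i = 0; i + k <= seq.length(); i++) {
                String kmer = seq.substring(i, i + k);
                if (kmer.indexOf('-') >= 0) {
                    continue; // window spans an alignment gap
                }
                counts.merge(i + ":" + kmer, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Coverage after constructs 1..c, for each c; each distinct k-mer is credited only once. */
    public static double[] cumulativeCoverage(List<String> alignment, List<String> constructs, int k) {
        Map<String, Integer> counts = countKmers(alignment, k);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        Set<String> credited = new HashSet<>();
        int covered = 0;
        double[] result = new double[constructs.size()];
        for (int c = 0; c < constructs.size(); c++) {
            String construct = constructs.get(c);
            for (int i = 0; i + k <= construct.length(); i++) {
                String key = i + ":" + construct.substring(i, i + k);
                if (credited.add(key)) {
                    covered += counts.getOrDefault(key, 0);
                }
            }
            result[c] = total == 0 ? 0.0 : (double) covered / total;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> alignment = List.of("MGARASVLSG", "MGARASILSG", "MGARASVLSG");
        List<String> constructs = List.of("MGARASVLSG", "MGARASILSG");
        System.out.println(Arrays.toString(cumulativeCoverage(alignment, constructs, 9)));
    }
}
```

With k = 9, the three toy sequences contribute six k-mers. The first construct reproduces four of them and the second adds the remaining two, so the cumulative coverage rises from 4/6 to 1.0.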

Screenshots
Figure 1: User interface.
Figure 2: Sample data: aligned amino acid residues in FASTA format (p17 region of HIV-1, subtype B).
Figure 3: Generated constructs (top) and associated fragment frequencies (bottom).
Figure 4: Coverage provided by randomly sampled sequences.
Figure 5: Per-site residue frequencies.

The Software
Download the jar file here. Launch the jar from the command line using java -Xmx1000M -jar CoverageTool_v0.1.jar, where -Xmx1000M allows the Java virtual machine up to 1000 MB of heap memory. Double-clicking the jar also works, but for large datasets the default memory allocation may be insufficient. If the jar does not run, the most likely cause is that the latest Java Runtime Environment (JRE) is not installed; it can be downloaded here.

Input Data
A sample alignment of 250 p17 sequences from HIV-1 group M (subtype B) is available within the software via the SAMPLE ALIGNMENT button. User-defined input can be loaded via the File menu and must consist of aligned amino acid sequences in FASTA format.
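As a rough illustration of what "aligned amino acid sequences in FASTA format" implies, the following Java sketch reads FASTA records (allowing sequences wrapped over several lines) and checks that they form an alignment, i.e. that every sequence has the same length. The class name AlignedFastaReader, the accepted alphabet (the 20 standard residues, X, '*' and the gap '-') and the error handling are assumptions for illustration, not CoverageTool's own validation rules.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a reader for aligned amino acid sequences in FASTA format.
 * ASSUMPTIONS: the accepted alphabet and the error messages are illustrative,
 * not CoverageTool's rules.
 */
public class AlignedFastaReader {

    private static final String ALLOWED = "ACDEFGHIKLMNPQRSTVWYX*-";

    /** Reads name -> sequence pairs in file order and checks they form an alignment. */
    public static Map<String, String> read(Reader in) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String name = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                store(records, name, seq);
                name = line.substring(1).trim();
                seq.setLength(0);
            } else if (name == null) {
                throw new IOException("sequence data found before the first '>' header");
            } else {
                seq.append(line.toUpperCase()); // sequences may be wrapped over several lines
            }
        }
        store(records, name, seq);
        checkAligned(records);
        return records;
    }

    private static void store(Map<String, String> records, String name, StringBuilder seq) throws IOException {
        if (name == null) {
            return; // no header read yet
        }
        if (records.putIfAbsent(name, seq.toString()) != null) {
            throw new IOException("duplicate sequence name: " + name);
        }
    }

    /** Every sequence must use the allowed alphabet and have the same length. */
    private static void checkAligned(Map<String, String> records) throws IOException {
        int expected = -1;
        for (Map.Entry<String, String> e : records.entrySet()) {
            String s = e.getValue();
            for (char c : s.toCharArray()) {
                if (ALLOWED.indexOf(c) < 0) {
                    throw new IOException("unexpected character '" + c + "' in " + e.getKey());
                }
            }
            if (expected < 0) {
                expected = s.length();
            } else if (s.length() != expected) {
                throw new IOException(e.getKey() + " has length " + s.length() + ", expected " + expected);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">seq1\nMGARASVLSG\n>seq2\nMGARAS-LSG\n";
        Map<String, String> alignment = read(new StringReader(fasta));
        System.out.println(alignment.size() + " sequences of length " + alignment.get("seq1").length());
    }
}
```

Running main prints "2 sequences of length 10". Unequal sequence lengths are rejected, since residues from different sequences can only be compared column by column once the sequences are aligned.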

Parameters
Parameters can be set in the pop-up dialog accessible via the Parameter menu.

  • No. Of Constructs: The number of constructs to generate. The reported coverage is cumulative, with the constructs listed in descending order of their contribution.

  • Low Frequency Threshold (%): To replace low-frequency residues in the input alignment with the corresponding consensus residue, set the threshold frequency here; residues whose frequency falls below it are replaced. The default is 0%, i.e. low-frequency residues are not replaced.

  • Epitope Fragment Length: The length of the fragments used to generate the constructs. In [1] a length of nine is used, as this is the optimum length for MHC class I presentation.

  • Sample Size (Random Analysis Only): The number of random tests to perform, i.e. the largest number of randomly selected sequences evaluated. With the default of 10, the program picks 1 random sequence 500 times and reports the mean coverage, then repeats this for 2, 3, 4 and so on up to 10 random sequences. If this is set to 100, the number of sequences runs from 1 to 100, again with 500 random samples at each step.

  • Residues As: Amino acid residue frequencies can be output as a count, a percentage, or a probability.
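The random analysis controlled by Sample Size can be sketched in Java as follows: for each n from 1 up to the sample size, draw n sequences at random 500 times and average the coverage they provide. This is a hypothetical illustration, not CoverageTool's implementation. The names (RandomCoverage, meanCoverage) are invented here, and three details are assumptions: sequences are drawn without replacement, k-mers spanning a gap are skipped, and coverage is the fraction of position-specific k-mers in the alignment that also occur at the same start column in at least one chosen sequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/**
 * Sketch of the random analysis: mean coverage provided by n randomly chosen
 * input sequences, for n = 1..sampleSize, with 500 draws per n.
 * ASSUMPTIONS (not CoverageTool's code): draws are without replacement,
 * gap-containing k-mers are skipped, and coverage is the fraction of
 * position-specific k-mers reproduced by at least one chosen sequence.
 */
public class RandomCoverage {

    static final int DRAWS_PER_SIZE = 500;

    /** Counts gap-free k-mers keyed as "start:residues". */
    static Map<String, Integer> countKmers(List<String> alignment, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String seq : alignment) {
            for (int i = 0; i + k <= seq.length(); i++) {
                String kmer = seq.substring(i, i + k);
                if (kmer.indexOf('-') < 0) {
                    counts.merge(i + ":" + kmer, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Fraction of all counted k-mers reproduced by the chosen sequences. */
    static double coverage(Map<String, Integer> counts, int total, List<String> chosen, int k) {
        Set<String> credited = new HashSet<>();
        int covered = 0;
        for (String seq : chosen) {
            for (int i = 0; i + k <= seq.length(); i++) {
                String key = i + ":" + seq.substring(i, i + k);
                if (credited.add(key)) {
                    covered += counts.getOrDefault(key, 0);
                }
            }
        }
        return total == 0 ? 0.0 : (double) covered / total;
    }

    /** Entry n - 1 holds the mean coverage over 500 random draws of n sequences. */
    public static double[] meanCoverage(List<String> alignment, int k, int sampleSize, long seed) {
        Map<String, Integer> counts = countKmers(alignment, k);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        List<String> pool = new ArrayList<>(alignment);
        Random rng = new Random(seed);
        int maxN = Math.min(sampleSize, pool.size()); // cannot draw more sequences than exist
        double[] means = new double[maxN];
        for (int n = 1; n <= maxN; n++) {
            double sum = 0.0;
            for (int d = 0; d < DRAWS_PER_SIZE; d++) {
                Collections.shuffle(pool, rng); // the first n entries form a random subset
                sum += coverage(counts, total, pool.subList(0, n), k);
            }
            means[n - 1] = sum / DRAWS_PER_SIZE;
        }
        return means;
    }

    public static void main(String[] args) {
        List<String> alignment = List.of("MGARASVLSG", "MGARASILSG", "MGARASVLSG", "MGARTSVLSG");
        double[] means = meanCoverage(alignment, 9, 10, 42L);
        for (int n = 1; n <= means.length; n++) {
            System.out.printf("%d random sequence(s): mean coverage %.3f%n", n, means[n - 1]);
        }
    }
}
```

Shuffling the pool and taking its first n entries is a simple way to draw n distinct sequences, and a fixed seed makes runs reproducible. The resulting curve of mean coverage against n is the kind of result summarized in Figure 4.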

References
1. Archer, J., Williams, S.G., et al. The characterization of HIV-1 k-mer space – a prerequisite to vaccine design. Submitted.

Citing
For now, please cite this URL.