
Previous Status

 

NT-Based Software Development and Training for Genome Analysis Status Report  -  Q2 1998

To date we have received and set up 1 server and 2 workstation machines. The remaining 2 servers and 8 workstations were received last week and are being set up. Software packages needed on the new machines to support the instructional project have been identified and are being installed. Two members of our database group attended the NT user training, and two members attended the NT administrator training.

Research

Hugh Gauch is using one of the INTEL computers for analyzing QTL experiments of several professors and graduate students. He is also developing software to handle QTL by environment interactions. This work involves simulations and permutation tests with heavy computational demands, so the Pentium II computer has been ideal.
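For readers unfamiliar with permutation tests, here is a minimal Python sketch of the general idea. It is not the software described above. Trait values are shuffled between two hypothetical marker classes to estimate how often a difference in means at least as large as the observed one arises by chance. The function name, the trait values, and the number of permutations are all illustrative assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign trait values to the two classes at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical trait values for lines carrying marker allele A or B.
allele_a = [10.1, 11.3, 9.8, 10.9, 11.0, 10.4]
allele_b = [12.2, 12.9, 11.8, 13.1, 12.5, 12.0]
p_value = permutation_p_value(allele_a, allele_b)
print("estimated p-value:", p_value)
```

Each additional permutation costs another pass over the data, which is why such tests become computationally demanding when repeated across many markers and simulated datasets.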

We are the primary alpha test site for the NT port of the client/server version of the ACEDB genome database software. We have reported a number of bugs, including that the NT-ported server can connect only to a client that is run on the local domain by the same user who is running the server. Just one more little hurdle for Mr. Gates to negotiate before he buries Unix.

Development of a multi-user, NT version of the GeneFlow software program will begin this Fall. A multi-user version of the Sybase SQL Anywhere database engine will be purchased for this purpose.

Instruction

The full set of machines will be used in two new graduate-level 4-week course modules that will debut in Fall 1998. Preparation for these new courses has included the commitment of a new classroom from the Department of Plant Breeding, $32,000 from the College of Agriculture and Life Sciences for the renovation/adaptation of the classroom, and funds for the development of the new curriculum, also from the College. Renovation is now largely complete (dropped ceiling, ventilation, carpeting, paint) and the network installation and furniture are expected this month.

Meetings with faculty and staff representing a number of different disciplines were held in June to create detailed course outlines. Excellent feedback on both course content and course mechanics was obtained. A draft day-by-day outline for the first course is given below. The second course outline is being finalized, and a list of bibliographic and electronic references is being compiled.

Course descriptions

PL BR 607- Electronic Information Resources & Bioinformatics
Instructors E. Paul, S. McCouch

This course will focus on how to access information in public databases such as GenBank, GRIN, and SWISS-PROT, and on tasks such as BLAST searching, sequence alignment, primer design, and phylogeny analysis. The biological background of issues will be presented in lectures, and extensive on-line exercises will provide students with experience in accessing and analyzing diverse information in the computer environment.

PL BR 608 - Comparative Genomics
Instructors E. Paul, S. McCouch, M. Sorrells

This course will emphasize how to access and integrate different kinds and sources of data, using computer databases and a variety of querying mechanisms. Students will learn to integrate information derived from analysis of phenotypes, biochemical and metabolic pathways, DNA sequences and genetic and physical maps using Plant Genome Databases and a variety of software packages.

Draft Outline for PL BR 607

Session 1
Topic - Introduction, mechanics of class, machine instructions
Online activities - check out general molecular biology resources/lists on the web, bibliographic resources on the web, BIOSIS, compare search engines. JQTL, Science online, area with RGNs
Assignment - download a reference with abstract (if possible) from AGRICOLA, BIOSIS, NAL newsletter, and an online journal

Session 2
Topic - Germplasm data
Online activities - Look at GRIN database, how to access observation data. Look at CIMMYT database (from CD), see if IRRI has anything online. Databases for various strains and organisms (AGR, MAFF for ordering clones, American Type Culture Collection, RiceGenes for ordering clones)
(http://biotech.chem.indiana.edu/lib/orgstrain.html)
Assignment - get GRIN record of favorite species, visit several sites for ordering genetic resources, compare presentation and functionality

Session 3
Topic - LAB
Online activities - give them a description of a mutant phenotype; they need to find out what's known about that phenotype (mapped, gene action, who works on it, cloned, trisomic stock, can order it). Expect them to look at newsletters, AGRICOLA, and GRIN to search observations. Exercise 2 - say you're interested in flowering and want to get all the flowering mutants in Arabidopsis; have them filter the list and come up with 10 stocks to order.

Session 4
Topic - DNA sequence data, what is it, where is it
Online activities - Check out GenBank, go through the various fields, history of the sequence database. Note email servers. Review sequence submission.
Assignment - given sequence x and top-level web pages, find the sequence in EMBL, DDBJ, and GenBank (maybe GSDB), then write a short (<1 page) critique comparing these sequence sites

Session 5
Topic - BLAST searching, what it is, what the parameters are for, when should you adjust them, how to interpret output, program background.
Note email server. cDNA vs. genomic
Online activities - Have them try submitting the same sequence using old BLAST, GAP BLAST, the various BLASTx programs; discuss results and changes in results, and what the score means
Assignment - given sequence x, perform BLAST search(es), adjusting parameters (window, threshold); discuss results and effects of adjusting parameters.
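One way to see why a word-size parameter matters is a toy version of the seeding step: a nucleotide BLAST search begins by locating short exact word matches between the query and database sequences, then tries to extend them. The Python sketch below covers only that first step on two made-up sequences. The function name and the sequences are illustrative assumptions, and real BLAST adds extension, scoring, and statistics on top of this.

```python
def word_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where an exact word of
    length word_size occurs in both sequences."""
    # Index every word of the subject by its start positions.
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


# Made-up sequences, for illustration only.
query = "ACGTACGTGA"
subject = "TTACGTACGAGA"

for size in (4, 7, 8):
    print(size, word_seeds(query, subject, size))
```

Shorter words give more seeds (more sensitivity, more work, more chance hits); longer words give fewer seeds and can miss a real but imperfect match entirely.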

Session 6
Topic - LAB
Online activities - given some short/long sequence, BLAST against the full db and species-specific dbs, contrast results. Do old and new BLAST, note differences, adjust parameters, try the various BLAST programs (x, n, t)

Session 7
Topic - Multiple sequence alignment.
Online activities - with gaps vs. without; principles/theory; discuss parameters of CLUSTALW; compare 2-way and multiple-way alignment (BLAST is always 2-way); multiple alignment allows phylogeny and design of consensus primers, heterologous cloning, allele diversity
Assignment - take x cDNA sequences (enzyme/isozyme) and align them.
Adjust parameters and see what happens
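CLUSTALW builds multiple alignments progressively from pairwise comparisons, so the effect of gap parameters can be previewed on a single pair. The Python sketch below is a plain Needleman-Wunsch global alignment score with a linear gap penalty. This is a simplification: CLUSTALW itself uses separate gap-opening and gap-extension penalties. The function name, the scoring values, and the two sequences are illustrative assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap each.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


# Made-up sequences: the second is the first shifted by one base.
seq_a = "ACGTACGT"
seq_b = "CGTACGTA"

cheap_gaps = global_alignment_score(seq_a, seq_b, gap=-2)
costly_gaps = global_alignment_score(seq_a, seq_b, gap=-10)
print("gap=-2:", cheap_gaps, " gap=-10:", costly_gaps)
```

With a mild gap penalty the best alignment opens two gaps to line up the shifted bases; with a severe penalty the ungapped alignment, with every position mismatched, scores better. That contrast is the "with gaps vs. without" comparison in miniature.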

Session 8
Topic - Alignment review
Online activities - open (electronic PCR), consensus primer design
Assignment - TBD

Session 9
Topic - LAB
Assignment - Given a seq, get blast hits. take top x hits and do multiple seq alignment. use the dros case (get good examples from jeff)

Session 10
Topic - Protein data.
Look at primary sites for protein and enzyme information (SWISS-PROT, ENZYME, PIR, dbEST). BLAST searches using amino acid sequences.
Online activities - Examine sample records from these databases. Submit a BLAST search against a protein database. when to do DNA vs AA, protein sorting (PSORT), Mendel, how it was derived
Assignment - given an amino acid sequence (a short segment of a known protein), do a BLAST search, look at the matching hit, and see what you can tell about it, if anything.

Session 11
Topic - Protein analysis - protein domains, structures
Online activities - (Need to get RasMol installed on student machines) Look at ProDom (domain database), ProSite (patterns and sites), PDB (structure*). Look at enzyme structure db (http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html)
Assignment - continuing with above AA seq, find others with similar domains, what can you tell about those.

NT-Based Software Development and Training for Genome Analysis Status Report - Q1 1998

To date we have received and set up 1 server and 2 workstation machines. Two members of our database group attended the NT user training, and two members attended the NT administrator training. The machines are currently being used by staff researchers for the development of statistical software for the simulation and analysis of agricultural datasets.

The full set of machines will be used in two new graduate-level courses that will debut in fall 1998. The preparation for these new courses has included the commitment of a new classroom from the Department of Plant Breeding, $32,000 from the College of Agriculture and Life Sciences for the renovation/adaptation of the classroom, and funds for the development of the new curriculum, also from the College.

Faculty, post-docs and graduate students have been polled to identify topics they feel should be addressed in these first-round bioinformatics courses. This information is currently being synthesized into course outlines for 2 intensive, one-month-long modules aimed at graduate students in the biological sciences. In these courses students will use the computers to explore information and perform analyses that they may typically be called upon to do in their professional careers. These initial modules will focus more heavily on using existing tools; future modules may focus on teaching students to develop new tools. The course descriptions for the new modules are included at the end of this report. The upcoming quarter will see the development of detailed course outlines and the identification of software packages that need to be installed on the machines. An NT version of the ACEDB software program is under development, and will be installed and tested when available. The NT version of the GeneFlow software package will also be installed and tested.

Course descriptions:

Course #1
This course will focus on how to access information in public databases such as GenBank, GRIN, and SWISS-PROT, and on tasks such as BLAST searching, sequence alignment, primer design, and phylogeny analysis. The biological background of issues will be presented in lectures, and extensive on-line exercises will provide students with experience in accessing and analyzing diverse information in the computer environment.

Course #2
This course will emphasize how to access and integrate different kinds and sources of data, using computer databases and a variety of querying mechanisms. Students will learn to integrate information derived from analysis of phenotypes, biochemical and metabolic pathways, DNA sequences and genetic and physical maps using Plant Genome Databases and a variety of software packages.

 

 


Last modified on: 10/12/99