The Data Administrator is the person who loads data into the SIFTER database. There are two kinds of data to be loaded into the database: maps and results. Loading a result requires that you specify its map, so you'll make your life easier by loading maps first and then results.
The process for loading either type of data is very similar. In each case there is a data file (map or result) and a configuration file which specifies the attributes for this data. Your biggest task in most cases will be to properly define the attributes and then create valid configuration files.
SIFTER provides basic Perl scripts to add a map or result and the associated configuration file into the SIFTER database. You will most likely want to construct other scripts to automate loading your maps and the many results your project has.
There is a full set of scripts as well as maps and results in the perl/samples directory. Feel free to look these over and borrow these scripts for your own needs. The scripts will likely not exactly meet your needs, but they should give you some ideas.
There are two separate sets of attributes, one for maps and another for analysis results. For each of these there are two types of attributes: primary and secondary.
Adding, modifying or deleting attributes is an administrative function. See the administration pages for more details on this.
A configuration file is an ASCII text file of keyword=value lines, like this:
# SIFTER Configuration file
# Generated Tue Nov 6 00:51:05 2001
uniqname=1005025865.2837
analyst=tony
chromosome=22
vars=_marker pos z0 z1 _ig2 _ig3 lod
datafile=chr22.out.4
date=2001-11-06
ismultipoint=0
mapname=chr11-2001.10.29
population=F1
project=FUSION
statistic=lod
subtype=possible triangle weighted
type=Linkage
units=cM
event=Nov2001Mtg
As you can see, a configuration file may contain comments and empty lines. Every other line must have the form keyword=value and begin in column one. The keyword must be a primary or secondary attribute which is already defined to SIFTER.
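Before loading, you may find it useful to check the shape of a configuration file yourself. The following is a minimal sketch, not part of SIFTER: the function name check_sifter_cfg is hypothetical, and it assumes that comments start with # in column one, that whitespace-only lines count as empty, and that keywords consist of letters, digits and underscores. It checks only the line format; it cannot tell whether a keyword is actually defined to SIFTER.

```shell
# check_sifter_cfg FILE
# Print every line that is neither a comment, an empty line, nor a
# keyword=value line starting in column one. Returns non-zero if any
# such line is found. (Hypothetical helper, not shipped with SIFTER.)
check_sifter_cfg() {
    awk '
        /^#/ { next }                          # comment line
        /^[ \t]*$/ { next }                    # empty (or blank) line
        /^[A-Za-z_][A-Za-z0-9_]*=/ { next }    # keyword=value in column one
        {
            printf "%s:%d: not a keyword=value line: %s\n", FILENAME, NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, check_sifter_cfg result5.cfg prints nothing and succeeds for a well-formed file, and lists the offending lines otherwise.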
When an attribute is defined to SIFTER, you must specify its datatype. If the attribute is of datatype 'enum', then the value must be found in the enumerated list associated with the attribute. For instance, chromosome may take only certain values (two digit numbers from 1 to 22 as well as 'X' and 'Y').
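If you generate configuration files from scripts, you can catch bad 'enum' values before SIFTER rejects them. This is only a sketch: the function name value_in_enum is hypothetical, and the list of allowed values must be copied from the attribute definition in your own realm, since the script has no access to SIFTER's enumerated lists.

```shell
# value_in_enum VALUE ALLOWED...
# Succeeds if VALUE is exactly equal to one of the ALLOWED values.
# (Hypothetical helper; the allowed list is supplied by the caller.)
value_in_enum() {
    wanted=$1
    shift
    for allowed in "$@"; do
        if [ "$wanted" = "$allowed" ]; then
            return 0
        fi
    done
    return 1
}
```

A call such as value_in_enum "$chrom" 21 22 X Y succeeds only for those four values; the list shown here is illustrative and deliberately incomplete.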
Each primary attribute also serves as a SIFTER configuration file keyword (with a few exceptions noted below). What follows is a complete list of the default map and result primary attributes. Some are required in your configuration file, others are not.
To add a map to the SIFTER database, create a configuration file with the proper set of attributes and values and invoke the addmap.pl command like this:
addmap.pl -realm MYPROJ map3.cfg map3.data
You might want to create a separate configuration file for each map. On the other hand, the configuration file for maps is generally pretty simple, as there are few attributes. Rather than keep a separate configuration file, you may prefer to create one dynamically, add the map and then delete the file. An example of exactly this can be found in the script perl/samples/addmap2sifter.sh.
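The create, add, delete cycle can be sketched as a small shell function. This is not a copy of perl/samples/addmap2sifter.sh; the function name add_map_tmpcfg and the ADDMAP override variable are hypothetical, and the sketch assumes addmap.pl does not care what the configuration file is called. The keyword=value pairs are passed in by the caller, so use the map attributes defined in your realm.

```shell
# Command used to load a map. Override ADDMAP to try this out
# without SIFTER installed (hypothetical convenience variable).
ADDMAP=${ADDMAP:-addmap.pl}

# add_map_tmpcfg REALM DATAFILE KEYWORD=VALUE...
# Write the given keyword=value pairs to a temporary configuration file,
# load the map with addmap.pl, then delete the configuration file.
add_map_tmpcfg() {
    realm=$1
    datafile=$2
    shift 2
    cfg=$(mktemp "${TMPDIR:-/tmp}/sifter-map.XXXXXX") || return 1
    {
        echo "# SIFTER Configuration file"
        echo "# Generated $(date)"
        printf '%s\n' "$@"          # one keyword=value pair per line
    } > "$cfg"
    "$ADDMAP" -realm "$realm" "$cfg" "$datafile"
    status=$?
    rm -f "$cfg"                    # remove the file whether or not the load worked
    return $status
}
```

A call might look like add_map_tmpcfg MYPROJ map3.data chromosome=22 units=cM, where the two keywords are placeholders rather than a confirmed list of map attributes.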
To add a result to the SIFTER database, create a configuration file with the proper set of attributes and values and invoke the addanalysis.pl command like this:
addanalysis.pl -realm MYPROJ result5.cfg
You will likely want to create a separate configuration file for each result, as there are quite a few attributes you may want to specify. In some cases, you may be able to determine all the result attributes and create a configuration file dynamically, as shown in the perl/samples directory for maps. Note that the addanalysis.pl command takes only the configuration file name and expects the result file to be identified in the configuration file (see the datafile keyword in the example above), whereas addmap.pl takes the data file as a second argument.
If your analyses are created in an automated fashion using shell scripts, you'll find it very convenient to create the SIFTER configuration file when the analysis results are created. It is likely your configuration files for results will be far more complex than for maps, and you will find it useful to create a static configuration file for each analysis.
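As a rough illustration, an analysis script could write the configuration file right after it writes the result. The function name write_result_cfg is hypothetical. The keywords and fixed values are copied from the sample configuration file earlier on this page and are only a subset of it; which keywords are required depends on the attributes defined in your realm, so treat the list as a starting point.

```shell
# write_result_cfg CFGFILE DATAFILE CHROMOSOME MAPNAME
# Write a static configuration file next to a freshly created result.
# Keywords and fixed values come from the sample configuration file in
# this document; adjust them to your own attribute definitions.
write_result_cfg() {
    cat > "$1" <<EOF
# SIFTER Configuration file
# Generated $(date)
analyst=${USER:-unknown}
chromosome=$3
datafile=$2
date=$(date +%Y-%m-%d)
mapname=$4
project=FUSION
statistic=lod
type=Linkage
units=cM
EOF
}
```

For example, write_result_cfg chr22.out.4.cfg chr22.out.4 22 chr11-2001.10.29 produces a file that can then be passed to addanalysis.pl.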
In the perl/samples/Results directory you can find a static configuration file for each analysis. Each directory contains several results and a configuration file for each. In the demo, each result has a configuration file and the perl/samples/adddemoresults.sh script finds each configuration file and loads it using addanalysis.pl. You will likely want to do something similar.
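A loop of that kind might look like the sketch below. It is not the contents of perl/samples/adddemoresults.sh; the function name load_all_results and the ADDANALYSIS override variable are hypothetical, and the sketch assumes your configuration files end in .cfg. How addanalysis.pl resolves a relative datafile path is not covered here, so you may need to run it from the directory that holds the results.

```shell
# Command used to load a result. Override ADDANALYSIS to try this out
# without SIFTER installed (hypothetical convenience variable).
ADDANALYSIS=${ADDANALYSIS:-addanalysis.pl}

# load_all_results REALM TOPDIR
# Load every *.cfg file found under TOPDIR with addanalysis.pl.
# Returns non-zero if any load failed, after trying all of them.
load_all_results() {
    find "$2" -type f -name '*.cfg' | sort | {
        failed=0
        while IFS= read -r cfg; do
            # stdin is redirected so the command cannot eat the file list
            if ! "$ADDANALYSIS" -realm "$1" "$cfg" < /dev/null; then
                echo "failed to load $cfg" >&2
                failed=1
            fi
        done
        [ "$failed" -eq 0 ]
    }
}
```

Calling load_all_results MYPROJ perl/samples/Results would then attempt every configuration file under that directory and report the ones that failed.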
SIFTER supports the common formats for genetic map files as listed below. In each case there is Perl documentation available using perldoc which describes these in more detail. The command is provided below.
These formats should cover most common cases. In the Simple format mentioned above, we assume you have some program which can convert your map information into a simple columnar format. If none of these formats work, you may define your own format; see perldoc perl/modules/Sifter/AddMap.pm for complete details.
SIFTER supports only one format for results files. The data is expected to be in simple columns. The format is described in Perl documentation; see perldoc perl/modules/Sifter/Analysis/Simple.pm for complete details.
This format should cover most common cases. If this format does not work, you may define your own format; see perldoc perl/modules/Sifter/AddAnalysis.pm for complete details.
Version=$Id: dataload.html,v 1.5 2002/09/13 16:53:28 tpg Exp $