Documentation of the EVN data archive at JIVE


The EVN data archive at JIVE has a total capacity of 33 TB divided over 3 RAID filesystems with a capacity of 11 TB each.

The archive consists of a list of experiments, where each experiment has a number of subdirectories, containing fitsfiles, standard plots and pipeline results.

The input, maintenance and output of the archive are controlled by a series of scripts written in Perl and PHP.

The main parts of the archive are:

  1. Input: Support scientists use the script "archive" to send their products into the archive buffer on the archive machine, jop83.
    The preconditions for the archive script to run properly are set by the experiment administration tool: jexp_gui.pl.
  2. Maintenance: The archive itself, in the form of a daemon, fetches the products from the archive buffer and
    copies them into the archive. In addition, some checks are done at midnight.
  3. Output: The archive contents are published on the WWW, which has read-only access to the archive.
    Data that is accessible only to its owner (the P.I.) is protected by access restrictions.

The scripts can be grouped by function:

  1. Scripts used for or by vlbi stations to give feedback

    The feedback pages are protected with a name and password, which can be changed before each session (used at dop288).

    evlbi.org/feedback/accessfbe.pl - Set name and password.

    Use parameters: - def        Set default name and password for session
                    - set        Set name and password for session
                    - unset      Unset name and password for session
                    - s session  e.g. oct09.
                    - n name     name
                    - p pass     password 
    
    Examples:
        set  :   Set   session. e.g.: accessfbe.pl -set -s session -n name -p pass
        unset:   Unset session. e.g.: accessfbe.pl -unset -s session
        def  :   Set   session. e.g.: accessfbe.pl -def -s session
    

    www/evlbi.org/feedback/jivefbe.php

    Here the stations can enter their feedback during the session.

  2. Scripts used by support scientists:

            archive.pl             Send data into the archive buffer on jop83
                                         (directory: /jop83_0/jops/feed
                                                                  /fits
                                                                  /pipe
                                                                  /standplots)
                                            
            admin.php              specify authentication details for pipeline data.
            
  3. Scripts used by user archive on jop83:

            archdaemon.pl         If data is waiting in archive buffer to be copied,
            send2archive.pl       then copy data from archive buffer into archive.
            createpng.pl          If standard plots, also create plots in png format.
            convertPipe.pl        If pipe, control position of protected files.  
            cpfeedback.pl         If feed, copy feedback pages per experiment to archive.
            setaccess.pl          If auth, set name and password authentication for P.I.
    
            initNewExp.php        Update archive using jexp table.
            updateFitsDb.php      Update databasetable fitsPars for fitsfinder.php.
            archproducts.pl       Collect filenames for Drupal in table.
            backup2trantor.pl     Synchronize with trantor in Westerbork.
            dailyUpdate.pl        Update archive tables etc.
    
            00 22 * * *  /jop83_1/archive/scripts/util/dailyUpdate.pl by crontab at midnight.
            
  4. Scripts used by user archive on jop83 (in crontab command), see: scripts/util/crontab.archive

            
            # daily updates
            00 22 * * *   /jop83_1/archive/scripts/util/dailyUpdate.pl > /dev/null
    
            # archdaemon executes commands from archive.pl
            00,30 * * * *   /jop83_1/archive/scripts/util/checkDaemon.pl -d /jop83_1/archive/scripts/util/archdaemon.pl > /dev/null
    
            # weblog
            00 04 * * * /jop83_1/archive/scripts/weblog/weblog2db.pl > /dev/null
            15 04 * * * /jop83_1/archive/scripts/weblog/addIpInfo.pl > /dev/null
            
  5. Scripts used by user jops on jop83 (in crontab command), see: scripts/util/crontab.jops

            10 00 * * * /jop83_1/archive/scripts/util/cleanupArchiveBuffer.pl -stnd -e all -a 0.5
            11 00 * * * /jop83_1/archive/scripts/util/cleanupArchiveBuffer.pl -pipe -e all -a 0.5
            12 00 * * * /jop83_1/archive/scripts/util/cleanupArchiveBuffer.pl -fits -e all -a 0.5
    
            cleanupArchiveBuffer.pl  Clear archive buffer.
    
            # Backup to LTO
            55 01 * * * /jop15_1/archive/scripts/util/backup/archtime.pl > /dev/null
            01 02 * * * /jop83_1/archive/scripts/util/backup/backupArchive.pl > /dev/null
    
            # Update disk database with experiment status.
            5 06 * * * /export/jive/jops/bin/disks/update/setDatabaseCommand.pl -c /export/jive/jops/bin/disks/update/experUpdate.pl
    
            # Get station logfiles, gpsfiles from vlbeer and NRAO.
            # Get eop information from maia.usno.navy.mil
    
            30 06 * * * /export/jive/jops/bin/ftplog/updateftp.pl -all > /dev/null
    
            
  6. Scripts used for publishing the archive on WWW.

            listarch.php            Show webpage with experiments
            fitsfinder.php          Find sources in fitsfiles.
            
  7. Scripts for incidental use in case of manual check or intervention.

            fitsverify.php          Check fitsfile.
            convertPipeBack.pl      activate original experiment.html 
            redoCreationOfPng.pl    remove and recreate png plots.
            moveExp.pl              Move experiment to different disk when current disk is almost full.
            newLink.pl              Create links of experiments on current disk to disk jop83_1 after a restore.
            initNewExp.php          Manually change disk pointer for archiving, e.g.: jop83_0, jop83_1 or jop83_2.
    (to be changed at the top of the script: $destination = "jop83_2"; // When disk is full change to jop83_0 or jop83_1.)
  8. Use of Authentication settings

    1. When a new experiment is added to the list, a default protection is set automatically for the fitsfiles; the fitsfiles on the webpage are not linked to their position.
    2. Experiment names starting with "N" or "F" do not get a default protection.
    3. Access must be assigned to the P.I. via the -auth option of archive.pl; this links the fitsfiles on the webpage to their position.
    4. If authentication for the P.I. is set, a selection of pipeline plots can also be restricted to P.I. access via the web program admin.php.
    5. If for some reason an experiment should be released to the world, the authentication can be removed with the -remauth option of archive.pl.
    6. Normally the experiment is released automatically when the publication date has been reached.
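
The default-protection rules in this list can be summarised in a short sketch. The function names and the way dates are passed in are assumptions made for illustration only; the actual logic lives in initNewExp.php and setaccess.pl and may differ in detail.

```python
from datetime import date


def gets_default_protection(exp_name: str) -> bool:
    """Rule 2: names starting with "N" or "F" get no default protection."""
    return not exp_name.startswith(("N", "F"))


def is_protected(exp_name: str, publ_date: date, today: date) -> bool:
    """Rules 1 and 6: protected by default until the publication date is reached.

    Assumption: explicit P.I. authentication (-auth / -remauth) is not modelled here.
    """
    if not gets_default_protection(exp_name):
        return False
    return today < publ_date
```

For example, is_protected("EP037A", date(2014, 1, 1), date(2013, 11, 21)) is True, while any experiment whose name starts with "N" is never protected by default.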
  9. Status of the archive and backup procedures

    1. backup2trantor.pl
    2. backup status
    3. backup manual
    4. recovery from disaster
  10. Implementations of archive in the Drupal content management system.

    Drupal shows the content of the archive using the original
    archive files with small modifications in the content and filenames.
    
    drupal_name                    archive_name            Description
    ---------------------------------------------------------------------------------------------
    Archive info                   info.php                Show info about the archive
    Browse catalogue               listarch.php            Show catalogue of current experiments.
    Calibrated UVfits files        fitsplots.php           Show calibrated fitsfiles in pipeline
    catfile                        catfile.php             Show a textfile
    Fitsfiles                      showFits.php            Show the fitsfiles
    Pipeline                       toPipe.php              Show the pipeline pages
    Select experiment              portal.php              Show initial archive page
    standard plots                 getstnd.php             Show standard plots
    Station feedback               getfeed.php             Show station feedback
    Station logfiles               showLogs.php            Show station logfiles
    Web statistics                 archuse.php             Show web statistics about fitsfiles
    
    The drupal files are saved in /jop83_1/archive/scripts/drupal/... (also under cvs).
    
  11. Access in scripts

         
    The way of access in scripts is described in /jop15_1/archive/scripts/doc/access.txt
    


archive.pl - Send data into the archive buffer on jop83

  archive - help show filename convention for fitsfiles
          - fits    -e expname_yymmdd fitsfile1 fitsfile2 ... exp.README
          - stnd    -e expname_yymmdd plotfile1 plotfile2 ... exp.piletter
          - pipe    -e expname_yymmdd [*]
          - feed    -e all *.hdr
          - auth    -e expname_yymmdd -n name -p password
          - remauth -e expname_yymmdd 

  where:

  expname  = experimentname.
  yymmdd   = observationdate (for unique expname)
  plotfile = zipped postscriptfile
  fits     = send fitsfiles to archive
  stnd     = send standardplots to archive
  pipe     = send pipeline files to archive from current dir.
  feed     = send station feedback pages to archive
  auth     = set authentication for P.I on experiment.
  remauth  = remove authentication from experiment.

  Notes:
  - Change into the directory containing your files, then start the archive script as user jops.
  - The archive script uses sftp: it creates a batch file, fills it
    with copy commands and then executes it.
  - The archive script also uses a mysql table, daemon, to prepare commands for user archive on jop83.
    It tells user archive on jop83 which data type of which experiment
    has to be copied from the data buffer into the archive.
  - User archive on jop83 polls table daemon by means of archdaemon.pl
    to see if files should be copied into the archive.
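
The batch-file approach mentioned in the notes can be sketched as follows. This is an illustration only, not the code of archive.pl: the function name is hypothetical, and the buffer root is taken from the directory listing earlier in this document. The commands cd, put and bye are standard sftp batch commands.

```python
from pathlib import PurePosixPath

# Buffer root as listed in this document; the subdirectory depends on the data type.
BUFFER_ROOT = PurePosixPath("/jop83_0/jops")


def build_sftp_batch(data_type: str, filenames: list[str]) -> str:
    """Return the text of an sftp batch file that uploads the given files.

    Assumption: filenames contain no spaces, so no quoting is applied.
    """
    lines = [f"cd {BUFFER_ROOT / data_type}"]
    lines += [f"put {name}" for name in filenames]
    lines.append("bye")
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and passed to sftp with its -b option; the account and host to connect to are site-specific and not shown here.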

updateJexp.pl - Update table jexp in mysql database JIVE on jop83

   Description:
   - Administration fields are copied from ~jops/Admin/Jexp/exp.jex files
     to mysql table jexp on jop83. 
     This table is a resource for the webinterface pages.
   - The items are: expcode, piname, wavel, obsdate, distribed,
                    status, completed, publarchiv, support, 
                    obstype, schedarray, exprdescr, schedsrc.
   - All the expfiles are checked for update, if an experiment
     entry does not yet exist in the table, it is created.
   - Timefields are converted from ddmmyy to yymmdd.
   - This script is scheduled when something is changed in the
     ~jops/Expadmin/Jexp/*.jex files by program: jexp_gui.pl
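
The date conversion from ddmmyy to yymmdd amounts to reordering three two-digit fields. A minimal sketch, assuming the input is exactly six digits without separators (the function name is hypothetical):

```python
def ddmmyy_to_yymmdd(value: str) -> str:
    """Reorder a six-digit ddmmyy string into yymmdd."""
    if len(value) != 6 or not value.isdigit():
        raise ValueError(f"expected six digits (ddmmyy), got {value!r}")
    day, month, year = value[0:2], value[2:4], value[4:6]
    return year + month + day
```

With this ordering, directory names such as EP037A_000920 sort chronologically per experiment name.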

archdaemon.pl - Check if data is waiting in archive buffer to be copied.

  
   Description
     archdaemon.pl is a daemon which runs continuously on the
     archive machine, polling database table daemon every 5 seconds to see if a command
     is waiting.

     - If send2archive is scheduled, then files are copied
       from data buffer: /jop83_0/jops/stnd/*
                                      /pipe/*
                                      /fits/*
                                      /feed/*
       to the archive : /jop83_1/archive/exp/*
     - If setAccess.pl is scheduled, the authentication is set or unset.
     - If convertPipe.pl is scheduled, pipeline files are
       moved to subdir prot according to the settings in tables archExp and
       archSrc. In addition, pointers in the .html files are corrected.
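
The polling pattern described above can be sketched as follows. This is not the code of archdaemon.pl: sqlite3 stands in for mysql, and the column names (id, command, done) are assumptions, since the layout of table daemon is not documented here.

```python
import sqlite3
import time


def poll_once(conn: sqlite3.Connection, run) -> int:
    """Run every waiting command once and mark it as done; return the count."""
    rows = conn.execute(
        "SELECT id, command FROM daemon WHERE done = 0 ORDER BY id"
    ).fetchall()
    for row_id, command in rows:
        run(command)  # e.g. start send2archive.pl, setaccess.pl or convertPipe.pl
        conn.execute("UPDATE daemon SET done = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)


def poll_forever(conn: sqlite3.Connection, run, interval: float = 5.0) -> None:
    """Poll the table at a fixed interval (5 seconds, as stated above)."""
    while True:
        poll_once(conn, run)
        time.sleep(interval)
```

Marking a row as done after the command has run means a command is retried if the daemon dies halfway; whether the real daemon behaves this way is not stated in this document.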
 

send2archive.pl - Copy data from archive buffer into archive

   send2archive.pl -stnd | -fits | -pipe | -feed -e exp | all

   where: -stnd = standardplots
          -fits = fitsfiles
          -pipe = pipeline files
          -feed = station feedback pages
          -e    = exp or all.
                
   Description:
   - send2archive.pl copies files
     from the data buffer      : /jop83_0/jops/stnd/*
                                              /pipe/*
                                              /fits/*
                                              /feed/*
     to the archive destination: /jop83_1/archive/exp/*

   - If a file already exists it will be overwritten by the 
     newer file

   - If standardplots were copied, script createpng.pl is called
     to create png files from the postscript files as well. In addition,
     a gzipped tarfile is created from all the postscript files.

   - If pipeline files were copied, script convertPipe.pl is called.
     It converts the experiment.html file in a way that all the
     filereferences are checked and point to the correct directory.
     Protected pipeline files are moved to subdir prot.
  
   - After fitsfiles are copied, a checksum file will be created,
     using unix utility: md5sum 
     
   - If feedbackfiles were copied, script cpfeedback.pl copies each
     experiment.hdr file to its destination.
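
The checksum step can be illustrated with a small sketch that writes lines in the same "digest, two spaces, filename" layout that md5sum prints. The function names and the name of the output file are hypothetical; send2archive.pl itself calls the md5sum utility.

```python
import hashlib
from pathlib import Path


def md5_line(path: Path) -> str:
    """Return one md5sum-style line for a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path.name}"


def write_checksum_file(fits_files: list[Path], out_path: Path) -> Path:
    """Write one checksum line per fitsfile to out_path."""
    lines = [md5_line(path) for path in fits_files]
    out_path.write_text("\n".join(lines) + "\n")
    return out_path
```

A file written this way can be checked later with "md5sum -c" from the directory that holds the fitsfiles.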

createpng.pl - Create also plots in png format.

     createpng.pl -e Experiment [-d directory] [-r]
     where:
     -d = directory (Default directory = current directory).
     -r = Rotate +90 degr. if name contains "-ampphase" or "-weight"

   Description: - Create png files from postscript files.
                - Create also a zipped tarfile of all postscript files. 

redoCreationOfPng.pl - Remove and recreate plots in png format.

   Description:
     Remove the .png files and ps.tar.gz from experiment/standplots.
     Call createpng.pl again. 

convertPipe.pl - Control position of protected pipeline files.

     Description:   
     - It converts the experiment.html file in a way that all the
       files are pointed to the correct directory.
     - Protected pipelinefiles are moved to subdir prot.
     - The original experiment.html is saved as experiment.html.orig
     - The left border of the page is adjusted.

convertPipeBack.pl - Activate the original exp.html

     Description:
     - Remove experiment.html
     - Rename experiment.html.orig => experiment.html
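
The two steps above amount to replacing the converted page by its saved original. A minimal sketch, assuming the page is named after the experiment and lives directly in the pipeline directory (both the function name and that layout are assumptions):

```python
from pathlib import Path


def restore_original_html(pipe_dir: Path, experiment: str) -> Path:
    """Remove experiment.html and rename experiment.html.orig back to it."""
    current = Path(pipe_dir) / f"{experiment}.html"
    original = Path(pipe_dir) / f"{experiment}.html.orig"
    if not original.exists():
        # Nothing to restore; leave the current page untouched.
        raise FileNotFoundError(original)
    if current.exists():
        current.unlink()
    original.rename(current)
    return current
```

Checking for the .orig file first avoids deleting the only remaining copy of the page.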

cpfeedback.pl - Copy feedback pages per experiment to archive.

     cpfeedback.pl -e experiment
     where experiment = all or exp

   Description:   
     cpfeedback.pl copies each experiment.hdr file to its destination.

fitsverify.php - Verify fitsfile.

     Program fitsverify of NASA was implemented in webtool fitsverify.php.
     It has a selection option to select each separate fitsfile in the
     archive.

     fitsverify 4.1 (CFITSIO V2.470)
       fitsverify - test if the input file(s) conform to the FITS format.

       Usage:  fitsverify filename ...   or   fitsverify @filelist.txt

       where 'filename' is a filename template (with optional wildcards), and
             'filelist.txt' is an ASCII text file with a list of
              FITS file names, one per line.

      Optional flags:
          -l  list all header keywords
          -q  quiet; print one-line pass/fail summary per file
          -e  only test for error conditions; don't issue warnings

      Help:   fitsverify -h


setaccess.pl - Set authentication for access via webpages.


    Use parameters: - def      Set default name and password for experiment
                               Only fits files are protected. (pipefiles via admin.php)
                    - set      Set name and password for experiment
                    - unset    Unset name and password for experiment
                    - e exp    exp_yymmdd.
                    - n name   name of Pi.
                    - p pass   password of pi

    Examples:
     set  :   Set   exp. e.g.: setaccess.pl -set -e exp -n name -p pass
     unset:   Unset exp. e.g.: setaccess.pl -unset -e exp
     def  :   Set   exp. e.g.: setaccess.pl -def -e exp

    Description:

      The script controls access for the fitsfiles and pipeline files.
        - def   .htaccess and .htpassw are created in fits directory
                with default name and password.
 
        - set   .htaccess and .htpassw are created in fits directory
                and in pipe/prot directory with specified name and
                password.
                In database table archExp name and password are also
                filled in, which makes the experiment entry visible
                in admin.php where access of pipeline files can be specified.                

        - unset .htaccess and .htpassw are removed from fits directory
                and protected pipelinefiles are moved back to pipe
                dir. Pointers in html files are set back to pipe dir.      
                Settings in database tables archExp and archSrc are removed.

initNewExp.php - Update archive using jexp table.

    Description:
    - Create new expdirs if necessary and create initial protection,
      (i.e. indexfiles and default .htaccess and passwd files.)
    - Update authentication tables: archExp and archSrc.
    - Delete protection of experiment with "setAccess -unset -e exp" if publ.date reached.
    - Create entries for new fitsfiles in database table fitsPars, which is
      used by fitsfinder.php.

cleanupArchiveBuffer.pl - Remove old data from archive buffer.

    cleanupArchiveBuffer.pl -stnd | -fits | -pipe -e exp | -e all -a maxAge

    where: -stnd  = standardplots
           -fits  = fitsfiles
           -pipe  = pipeline files
           -feed  = feedback files
           -e exp = exp or all.
           -a age = max. age in days

archproducts.pl - Collect archive files for Drupal.

    Drupal needs to know which files are in the archive.
    /jop83_1/archive/scripts/util/archproducts.pl collects these files in a
    database table: archproducts, which is updated every night in dailyUpdate.pl.
    In addition, the script is called from send2archive.pl whenever data has been sent
    to the archive.

    Update is done from directories: feedback, standplots, pipe and fits.

    archproducts.pl -h                show this help.
                    -e experiment     collect info for experiment. (fast)
                    -create           recreate the table archproducts.
    Without parameters:               update table. (slow)

backup2trantor.pl - Synchronize with trantor in Westerbork.

#  Synchronize VLBI results on trantor with jop83.
#
   use strict;
   use Date::Parse;
   use Date::Format qw(time2str);
   use Getopt::Long;

   my $jop83Dir   = "/jop83_1/archive/exp";
   my $trantorDir = "trantor:/jive/archive/exp";
#  my $options    = "-vrptL --dry-run";
   my $options    = "-vrptL";

#
#    -v, --verbose               increase verbosity
#    -r, --recursive             recurse into directories
#    -p, --perms                 preserve permissions
#    -t, --times                 preserve modification times
#    -L, --copy-links            transform symlink into referent file/dir
 
   doRsync("rsync $options $jop83Dir/* $trantorDir");

updateJexp.pl - Update table jexp.

This script is scheduled by jexp_gui.pl each time an experiment.jex file
is saved. In addition, it is scheduled by crontab once a day.

updateJexp.pl copies the content of the experiment.jex files to database table jexp.
Table jexp is a resource for the archive.

+--------------------+------------------+------+-----+---------+----------------+
| Field              | Type             | Null | Key | Default | Extra          |
+--------------------+------------------+------+-----+---------+----------------+
| id                 | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| expCode            | varchar(30)      | YES  |     | NULL    |                |
| shared_exp_logfile | varchar(80)      | YES  |     | NULL    |                |
| piName             | varchar(30)      | YES  |     | NULL    |                |
| wavel              | varchar(30)      | YES  |     | NULL    |                |
| obsDate            | date             | YES  |     | NULL    |                |
| completed          | date             | YES  |     | NULL    |                |
| released           | date             | YES  |     | NULL    |                |
| distributed        | date             | YES  |     | NULL    |                |
| publarchiv         | date             | YES  |     | NULL    |                |
| status             | varchar(30)      | YES  |     | NULL    |                |
| support            | varchar(30)      | YES  |     | NULL    |                |
| obsType            | varchar(30)      | YES  |     | NULL    |                |
| schedArray         | varchar(100)     | YES  |     | NULL    |                |
| exprDescr          | blob             | YES  |     | NULL    |                |
| schedSrc           | blob             | YES  |     | NULL    |                |
+--------------------+------------------+------+-----+---------+----------------+

/jop83_1/archive/scripts/user/updateJexp.pl -h

Update table jexp from Jexpdirs.
updateJexp.pl               : All experiments will be updated
updateJexp.pl -e Experiment : Only experiment will be updated.

moveExp.pl - Move experiment to different disk.

Move experiment to different disk when current disk is almost full.

/jop83_1/archive/scripts/util> moveExp.pl
 
Use parameters: -s sourcedisk (e.g.: jop83_2)
                -d destinationdisk (e.g.: jop83_3)
                -e experiment (e.g.: EP037A_000920)

The experiment will be copied to the new disk and deleted from the old disk.
The old link at jop83_1 will be removed and a new link from the new disk at
jop83_1 will be created.
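
The copy, delete and relink sequence can be sketched as follows. This is an illustration under assumptions, not the code of moveExp.pl: the function name is hypothetical, the disks are passed as directories that hold the experiment directories directly, and link_dir stands for the directory on jop83_1 that holds the links.

```python
import shutil
from pathlib import Path


def move_experiment(experiment: str, source_disk, dest_disk, link_dir) -> Path:
    """Copy an experiment to a new disk, delete the old copy and renew its link."""
    source = Path(source_disk) / experiment
    destination = Path(dest_disk) / experiment
    link = Path(link_dir) / experiment

    shutil.copytree(source, destination)  # copy to the new disk first
    shutil.rmtree(source)                 # then delete from the old disk

    if link.is_symlink():                 # remove the old link, even if dangling
        link.unlink()
    link.symlink_to(destination, target_is_directory=True)
    return link
```

Copying before deleting means a failed copy raises an error while the old data is still in place.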

newLink.pl - Create links of experiments to disk jop83_1 after a restore.


Create links of experiments on current disk to disk jop83_1 after experiments were restored.  

Change into the directory of the experiments you want to link to jop83_1, then
run /jop83_1/archive/scripts/util/newLink.pl

Links will now be created for all experiments on this disk that did not
already have a link at jop83_1.

dailyUpdate.pl - Update archive tables etc.

    This script is scheduled by crontab once a day.

    my $baseDir = "/jop83_1/archive/scripts";
 
    # Update table jexp with parameters from /export/jops/Expadmin/Jexp/*.jex
    system("$baseDir/user/updateJexp.pl");

    # Update experiment directories and check permissions.
    system("$baseDir/util/initNewExp.php");

    # Update table archproducts (collect all existing filenames for drupal)
    system("$baseDir/util/archproducts.pl");

    # Collect length and archivetimes for backup to LTO tapes (backupArchive.pl)
    system("$baseDir/util/backup/archtime.pl");

    # Backup data to backupsystem Trantor.
    system("$baseDir/util/backup/backup2trantor.pl");

    # Create plot that shows amount of data in archive.
    system("$baseDir/util/createGrowplot.php");

    # Update database for fitsfinder.php.
    system("$baseDir/avo/updateFitsDb.php");

listarch.php - Show archive to WWW.

    Registers use of the archive in database table: archuser.
    
    Description:
    - Show all experiments in the archive.
    - show per experiment the archive products.
    - show feedback pages.
    - show station logfiles from EVN archive in Bologna.
    - show standard plots.
    - show pipeline plots.
    - show fitsfiles.

archuse.php - Show when and by whom the archive was used.

    Description:
    - Show the registered fields of archive use.
    - Offer selection options for the data.
    - A number of search engines are omitted from the list.

admin.php - Control access to fitsfiles and pipeline data.

    Description:
    In mysql tables archExp and archSrc access information is kept about
    groups of filenames that contain a parameter mentioned in the list below.
    
    admin.php controls the contents of these tables:    
    - Access can be set separately for each experiment and for each source.  
    - Columns (parameter names) can be manually added to or deleted from a table.

    List of parameters which can be set:

      Fitsfiles       pipeline experiment plots        pipeline source plots
      ---------       -------------------------        ---------------------
      fits            BANDPASS                         CLPHS
                      CPOL                             ICLN
                      FRING_DELAY                      IMAP
                      FRING_PHAS                       IMAPN
                      FRING_RATE                       IMAPU
                      GAIN                             UVCOV 
                      POSSM_AUTOCORR                   UVPLT
                      POSSM_CAL                        VPLOT_MODEL
                      POSSM_UNCAL                      CALIB_AMP2
                      SENS                             CALIB_AMP4
                      SNR                              CALIB_PHAS1
                      TSYS                             CALIB_PHAS3
                      VPLOT_CAL
                      VPLOT_UNCAL

     
     When settings are sent to the database tables, also a command:
     "convertPipe.pl -e experiment" is executed, which means that
     files that need protection are moved from pipe to pipe/prot.
     The links in the .html files are adjusted to pipe/prot.
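
The selection of files that need protection can be illustrated with a sketch that splits pipeline filenames by the parameter names they contain. The function name and the example filenames are hypothetical. Plain substring matching is used here as an assumption; note that it would also match IMAPN and IMAPU when only IMAP is protected, so the real script may well match more strictly.

```python
def split_by_protection(filenames: list[str], protected_params: set[str]):
    """Return (protected, public) filename lists based on parameter names."""
    protected, public = [], []
    for name in filenames:
        if any(param in name for param in protected_params):
            protected.append(name)  # would be moved to pipe/prot
        else:
            public.append(name)     # stays in pipe
    return protected, public
```

The protected list corresponds to the files moved to pipe/prot; links to them in the .html files would then be rewritten accordingly.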

fitsfinder.php - Find sources in fitsfiles according to specifications.

    Description:
 
    fitsfinder.php is a webbased userinterface which finds and displays source
    entries from mysql table: fitsPars. 
    Table fitsPars was filled with data by script: updateFitsDb.php.

    fitsfinder.php offers extended selection and sort options to produce the desired
    listing.

    It also contains a plotoption which displays annotated source positions in
    a plotwindow.

updateFitsDb.php - Update databasetable fitsPars for fitsfinder.php.

    Description of updateFitsDb.php:

    - Update database table fitsPars with fitsinfo.
    - Check for each experiment: Are fitsfiles available?
    - Check for each fitsfile:
        Already entries found in databasetable for this fitsfile?
        Does the timestamp of entry match the timestamp of file?
        If not: delete entry.

    - Read fitsfile using NASA's cfitsio programs:

        Program      Fitstable       Parameters
        ---------    --------------  ---------------------------------------------------
        listhead     UV_DATA         NO_STKD, STK_1, NO_BAND, NO_CHAN, REF_FREQ, CHAN_BW
        tabfreq      FREQUENCY       a list of freqparameters.
        tablist      SOURCE          Source, RaEpo, DecEpo, Equinox
        tablist      ARRAY_GEOMETRY  2-letter antenna names
        tabmap       UV_DATA         arrayId,  freqId,    sourceId, intTime, totalTime,
                                     startDay, startTime, endDay,   endTime.

    - Create entry in the database with a unique combination of: arrayId, freqId and sourceId
        
      So for a fitsfile there can be more than one entry with the same sourcename 
      but with a different frequency and/or a different array.
   
    - Read only IDI fits format. (UVF format cannot yet be read by cfitsio programs).
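
The rule of one database entry per unique combination of arrayId, freqId and sourceId can be sketched as follows. The row layout (a dict per UV_DATA record) and the function name are assumptions for illustration; the real script works on the output of the cfitsio programs listed above.

```python
def unique_entries(rows: list[dict]) -> list[dict]:
    """Keep the first row for each (arrayId, freqId, sourceId) combination."""
    seen: dict[tuple, dict] = {}
    for row in rows:
        key = (row["arrayId"], row["freqId"], row["sourceId"])
        seen.setdefault(key, row)  # later rows with the same key are ignored
    return list(seen.values())
```

Two rows for the same source but with a different freqId therefore yield two entries, matching the remark above that one sourcename can occur more than once per fitsfile.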


Last update: 21-11-2013, Email to kramer@jive.nl.