dbGaP File Preparation Quick Information
- Submitting the initial set of dbGaP files for your study is a central component of the early stages of TOPMed activities, as many later steps depend on its completion.
- The initial submission of dbGaP files via the dbGaP Submission Portal must include the Study Config file (a TOPMed-specific example will be provided) as well as the following three data files and their data dictionaries:
  - SampleAttributes
  - Subject (consent file)
  - SubjectSampleMapping
- The most up-to-date example templates of these files can be downloaded from dbGaP.
- All shipped DNA samples (e.g. all NWD_IDs in your sample manifests) should be included in this initial submission of dbGaP files, irrespective of whether the samples ultimately pass sample or sequencing QC. Doing so allows the Michigan IRC to stream BAM files to the NCBI SRA without delay. After QC activities are complete, the ACC can advise on how to update your initial submission to exclude failed samples and to reflect the resolution of any sample identity issues.
- Whenever you submit files via the dbGaP Submission Portal, add a second copy of all files to the Exchange Area at the same time or share your files with the ACC using Liquid Files. (Ask for a Liquid Files invitation from your contact at the ACC.)
- The ACC will perform some QC checks of your files and let you know if anything needs to be updated and re-uploaded.
- Once the dbGaP curator accepts your file submissions, they will let you know that your sample ID files have been loaded and the system is ready to receive SRA submissions. Please email the ACC and IRC (topmed-admin@westat.com, tblackw@umich.edu) when you reach this stage, as your dbGaP registration is then fully complete!
- Tom Blackwell (Michigan IRC) will handle these SRA submissions (BAM file uploads).
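The requirement above that every shipped sample appear in the initial submission can be checked before upload. The following is a minimal Python sketch, not an official dbGaP or TOPMed tool. It assumes the manifest NWD_IDs and the rows of a dbGaP data file (for example SubjectSampleMapping) have already been read into memory, e.g. with csv.DictReader. The function name missing_sample_ids and the example IDs are hypothetical.

```python
def missing_sample_ids(manifest_ids, file_rows, id_column="SAMPLE_ID"):
    """Return manifest IDs that are absent from the file rows, sorted.

    manifest_ids: iterable of NWD_IDs taken from the sample manifests.
    file_rows:    list of dicts, one per row of a dbGaP data file.
    id_column:    column holding the sample ID (SAMPLE_ID in the dbGaP file set).
    """
    present = {row[id_column].strip() for row in file_rows if row.get(id_column)}
    return sorted(set(manifest_ids) - present)


# Hypothetical example data: three shipped samples, two listed in the file.
manifest = ["NWD000001", "NWD000002", "NWD000003"]
mapping_rows = [
    {"SUBJECT_ID": "S1", "SAMPLE_ID": "NWD000001"},
    {"SUBJECT_ID": "S2", "SAMPLE_ID": "NWD000002"},
]

print(missing_sample_ids(manifest, mapping_rows))  # ['NWD000003']
```

An empty result means every shipped sample is listed. The same check can be repeated against the SampleAttributes rows.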
Variable-level detail
- NWD_ID (NHLBI WGS DNA-level identifier) is the SAMPLE_ID in the dbGaP file set.
- SUBJECT_ID is a de-identified subject-level identifier. If your study has previously posted data to dbGaP, please use the same SUBJECT_ID used in those postings. Otherwise, please discuss with the ACC before uploading files via the Submission Portal.
- Relationship between sample and subject IDs: multiple DNA aliquots (samples) may come from a single individual (subject), such as in the case of replicates.
- If non-required variables are blank for every row in your file, you will need to omit those variables before uploading. The dbGaP Study Submission Guide lists required variables for each file.
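The variable-level rules above can be sketched as two small pre-upload checks. This is an illustrative Python sketch under stated assumptions, not part of any dbGaP tooling. The function names are hypothetical, and the set of required variables must be taken from the dbGaP Study Submission Guide (the set used below is only a placeholder). The second check assumes that each SAMPLE_ID should map to exactly one SUBJECT_ID, while one subject may have several samples.

```python
from collections import defaultdict


def drop_blank_optional_columns(rows, required):
    """Remove columns that are blank in every row, unless they are required."""
    if not rows:
        return rows
    columns = list(rows[0].keys())
    keep = [
        col for col in columns
        if col in required or any((row.get(col) or "").strip() for row in rows)
    ]
    return [{col: row.get(col, "") for col in keep} for row in rows]


def samples_with_multiple_subjects(mapping_rows):
    """Return SAMPLE_IDs that are linked to more than one SUBJECT_ID."""
    subjects = defaultdict(set)
    for row in mapping_rows:
        subjects[row["SAMPLE_ID"]].add(row["SUBJECT_ID"])
    return sorted(s for s, ids in subjects.items() if len(ids) > 1)


# Hypothetical rows: two aliquots (replicates) from one subject, plus an
# optional column (NOTE) that is blank everywhere and should be omitted.
rows = [
    {"SUBJECT_ID": "S1", "SAMPLE_ID": "NWD000001", "NOTE": ""},
    {"SUBJECT_ID": "S1", "SAMPLE_ID": "NWD000002", "NOTE": " "},
]
required = {"SUBJECT_ID", "SAMPLE_ID"}  # placeholder; see the Submission Guide

cleaned = drop_blank_optional_columns(rows, required)
conflicts = samples_with_multiple_subjects(rows)
```

Here cleaned no longer contains the NOTE column, and conflicts is empty because replicates from the same subject are allowed.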
Question: What is the Submission Portal and how do I use it?
Answer: Once the dbGaP registration paperwork is complete, the PI receives an automated email invitation from dbGaP to upload files via the “Submission Portal” web link. A submitter may be able to sign in to the portal using credentials such as eRA Commons or by clicking “See more 3rd party sign in options” (your university or institution credentials may already work as a third-party sign-in). Otherwise, look for an automated email addressed to the PI with the subject line “Submission Portal dbGaP Study invitation (PI)” and follow the instructions to assign submitter(s) for the study.
If neither approach works, contact dbGaP help: dbgap-sp-help@ncbi.nlm.nih.gov.
Question: Where do I go for help?
Answer: You may find the answer to your question in the dbGaP Study Submission Guide. Direct dbGaP and Submission Portal technical support questions to dbGaP help: dbgap-sp-help@ncbi.nlm.nih.gov.