Pooled analysis of radiation hybrids to identify genes for cell growth and paclitaxel action
Dataset posted on 18.12.2019 by Arshad H. Khan, Andy Lin, Richard T. Wang, Joshua S. Bloom, Kenneth Lange, Desmond Smith
Data from bulk segregant analysis of radiation hybrids to identify genes for cell growth and paclitaxel action.
The LaTeX documents "master.tex", "main.tex" and "supp.tex" are located in the "Words" directory and provide the key to navigating the scripts and data. These documents can be found by search. The PDF output file of the LaTeX documents is "master.pdf".
The scripts necessary to analyze the data are in the "Data_Figs" directory and can also be found by search. The vast majority of the data files needed to run the scripts are in the directory "RH_pools_workspace_1". A few miscellaneous data files are in the "Data_Figs" directory.
FINDING RELEVANT SCRIPTS
To find the scripts behind a given result, search the LaTeX source for the corresponding text. For example, the second paragraph in the "Results" section of "master.pdf" begins with the phrase "We created six independent RH pools..." and provides some basic statistics on the RH pools. Searching for this phrase in "main.tex" reveals the names of two R scripts, "clone_sem_1.R" and "graph_Human_retent_2.R", above the paragraph. These scripts produce the corresponding results.
Inspection of the script "clone_sem_1.R" shows that it uses the data file "clone.txt", while the script "graph_Human_retent_2.R" uses the data files "RH_pool_human_total_align.txt", "RH_pool_hamster_total_align.txt", "RH_human_gseq.txt", "RH_hamster_gseq.txt", "gencode_gtf_ensembl_ucsc_v31.txt", "clone.txt" and "cell_label_info.txt". All these data files can be found in the directory "RH_pools_workspace_1".
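The lookup described above can also be done programmatically. The following is a minimal Python sketch, not part of the dataset. It assumes that script names appear as plain text ending in ".R" within a few lines above the paragraph in "main.tex"; the function names, the regular expression and the "lookback" window are illustrative assumptions.

```python
import re
from pathlib import Path

# Assumption: script names are plain tokens ending in ".R", e.g. "clone_sem_1.R".
SCRIPT_NAME = re.compile(r"[A-Za-z0-9_.\-]+\.R\b")


def scripts_above_phrase(tex_text, phrase, lookback=10):
    """Return R script names found in the `lookback` lines before `phrase`."""
    lines = tex_text.splitlines()
    for i, line in enumerate(lines):
        if phrase in line:
            names = []
            # Scan only the lines immediately above the first match.
            for earlier in lines[max(0, i - lookback):i]:
                for name in SCRIPT_NAME.findall(earlier):
                    if name not in names:
                        names.append(name)
            return names
    return []  # phrase not found


def scripts_above_phrase_in_file(tex_path, phrase, lookback=10):
    """Same search, reading the LaTeX source from disk."""
    text = Path(tex_path).read_text(encoding="utf-8", errors="replace")
    return scripts_above_phrase(text, phrase, lookback)
```

A call such as scripts_above_phrase_in_file("Words/main.tex", "We created six independent RH pools") would then list the candidate scripts; the relative path is an assumption about where the archive was unpacked. Because the sketch matches within a single line, a short phrase fragment is more robust if the LaTeX source wraps the sentence across lines.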
FINDING RELEVANT PARTS OF SCRIPTS
Statistics in the LaTeX documents have been quoted to unrealistic levels of precision, but are rounded in the PDF output file, "master.pdf". However, the redundant digits can be useful. For example, in the last paragraph on page S12 of the "Supporting Information" (section "Overlap of interaction loci with growth and paclitaxel loci"), we are informed that "There were 15 genes that overlapped between the interaction loci and the 859 unique growth loci (odds ratio = 22.6, P = 8.3 × 10^−15, Fisher’s Exact Test)..."
Reference to "supp.tex" reveals that the pertinent script is "g_d_comb_fish_2.R" and that the exact P value is "8.281682e-15". Searching "g_d_comb_fish_2.R" for "8.281682e-15" takes the reader to the relevant part of the script.
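The same search can be scripted. The sketch below is illustrative only and is not part of the dataset; the function names are hypothetical. It reports the 1-based line numbers at which a literal string occurs, and the final line shows that the full-precision value rounds to the figure quoted in the PDF.

```python
from pathlib import Path


def find_literal(text, needle):
    """Return (line_number, line) pairs for lines containing `needle` (1-based)."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


def find_literal_in_file(path, needle):
    """Same search, reading the script from disk."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_literal(text, needle)


# The value quoted in "supp.tex", rounded to two significant figures.
print(f"{8.281682e-15:.1e}")  # 8.3e-15
```

For instance, find_literal_in_file("Data_Figs/g_d_comb_fish_2.R", "8.281682e-15") would return the matching lines, assuming the script sits at that relative path and contains the value in exactly that textual form.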
A cost-effective platform for high-precision genome analysis of mammalian cells
National Human Genome Research Institute