{"id":964,"date":"2021-11-23T20:47:58","date_gmt":"2021-11-23T20:47:58","guid":{"rendered":"https:\/\/hgblog1.gi.ucsc.edu\/blog\/?p=964"},"modified":"2022-04-22T16:37:49","modified_gmt":"2022-04-22T16:37:49","slug":"genark-hubs-part-1","status":"publish","type":"post","link":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/2021\/11\/23\/genark-hubs-part-1\/","title":{"rendered":"GenArk Hubs Part 1 &#8211; Accessing the data"},"content":{"rendered":"\n<p>This blog post is the <a href=\"https:\/\/hgblog1.gi.ucsc.edu\/blog\/?s=GenArk\">first of three<\/a> to discuss the Genome Archive (GenArk) assembly hubs. This first post discusses accessing the GenArk hubs, the second post gives examples of using the data, and the third post describes the technical infrastructure behind the hubs.&nbsp;<\/p>\n\n\n\n<p>Let\u2019s start with a real-world story: imagine you are a researcher working on zebrafish, but you are using an alternative strain with unique polymorphic properties. You want to do CRISPR on your particular zebrafish and already have a FASTA file for the genome assembled into chromosomes, but you have no annotations and no way to visualize the data yet.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/_kINgt_omMEmeaM3e2zzuUfUAnm8ojB7yVM1caUCbhschfie66kG0nt2WIZETmzDowbBvRNE1z-kkm-qu4QkX48NldZZqYebVq478zEnxXrId0Qp75f8Dcc85ubX7lXTO11zQzqD\" alt=\"\"\/><\/figure>\n\n\n\n<p>One option to visualize your FASTA would be to independently create a UCSC Assembly Track Hub to work on your zebrafish. Alternatively, now that UCSC has developed the Genome Archive (GenArk) system, once you submit your assembly to NCBI\u2019s assembly database, you can contact us directly and request that we generate the browser for you behind the scenes. 
This is what happened for one lab that submitted its TD5 zebrafish assembly to NCBI, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/assembly\/GCA_018400075.1\">https:\/\/www.ncbi.nlm.nih.gov\/assembly\/GCA_018400075.1<\/a>, and the result after contacting us was a new assembly hub that can be easily loaded at UCSC with the following link: <a href=\"https:\/\/genome.ucsc.edu\/h\/GCA_018400075.1\">https:\/\/genome.ucsc.edu\/h\/GCA_018400075.1<\/a>.&nbsp; In this case, the team at UCSC even helped generate liftOver alignment files between the UCSC zebrafish assembly and this new TD5 zebrafish GenArk Public Hub addition, which helps with lifting annotations over to the new browser.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/Cr-BvNJ-RDmcFvgd5m0APPvbMDDnOzUqg8uoiaMHQoUR0IOV4tXacRKDDQXr6UpHE1fX6ERzK0cSyq0vl3JGDqYeOaNPPD342D7jFfwoyDpNhXOBlzC-uxn4gR-3gLtXobzeDmO9\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">So what are the GenArk Assembly hubs?<\/h2>\n\n\n\n<p>GenArk hubs are a collection of data files, hosted externally from the main UCSC data website, that enable browsing of new genomes. GenArk genomes have <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/assembly\">NCBI GenBank assembly accessions<\/a> starting with either GCA or GCF, and the browsers allow visualizing and attaching laboratory-generated data. New software also enables UCSC to dynamically turn on query servers to search GenArk hubs with DNA sequences or test PCR primer pairs. 
GenArk hubs are part of the <a href=\"https:\/\/genome.ucsc.edu\/cgi-bin\/hgHubConnect?hubSearchTerms=GCF\">UCSC Public Hubs list<\/a> where UCSC can update the data files with pipelines.&nbsp;<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">ACCESSING THE DATA<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">How do I access GenArk Assembly Hubs?<\/h2>\n\n\n\n<p>There are multiple ways to access the GenArk hubs, including searching the UCSC Genome Browser\u2019s main gateway page, building direct links to NCBI GCA\/GCF assembly accessions, searching the UCSC Public Hubs page, and navigating directly to individual taxonomic group pages.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Browser Gateway Page<\/h3>\n\n\n\n<p>The easiest way to find GenArk hubs is to search the species name on the Browser <a href=\"http:\/\/genome.ucsc.edu\/cgi-bin\/hgGateway\">Gateway<\/a>.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/HvDC1aNi-wrr5a8N19WZ45hWMl03M0K97nZgxeuOk2I2618ymaqyrOD5aaTEQGCrNZBmRip1kANuawygM_oXY6--9F5LtTtdRkTjSLER40Xv8Fdg5fRjUvcmTl_AE2fGHoky11Kv\" width=\"624\" height=\"216\"> On the Gateway page in the top left box you can search a term such as \u201cdog\u201d and find all the genomes both hosted in our internal databases and in external Public Hubs that have dog in the name. 
In this image, a search for &#8220;dog&#8221; returns a top \u201cDog\u201d match (UCSC database) as well as results for several species in Assembly Track Hubs that match on the term \u201cdog\u201d, with the Labrador dog breed selected from the GenArk Mammal Assemblies Hub (GCF_014441545.1).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Direct GCA\/GCF Accession Links<\/h3>\n\n\n\n<p>If you know the GCF\/GCA identifier for an assembly, you can also search that term on the Gateway page or build a short link to directly load the hub.&nbsp; Links to UCSC with a hub (\u201c\/h\/\u201d) address, such as <a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_000698965.1\">https:\/\/genome.ucsc.edu\/h\/GCF_000698965.1<\/a>, will attempt to find and attach a hub matching the final GCF value, which originates from the NCBI accession; in this case, it is an African ostrich assembly. If you don\u2019t find a match, read more below about contacting us.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Public Hubs Page<\/h3>\n\n\n\n<p>Another place to find GenArk hubs is on the Public Hubs page, where you can enter various terms, like \u201costrich\u201d: <a href=\"https:\/\/genome.ucsc.edu\/cgi-bin\/hgHubConnect?hubSearchTerms=ostrich\">https:\/\/genome.ucsc.edu\/cgi-bin\/hgHubConnect?hubSearchTerms=ostrich<\/a>.&nbsp; You can expand the \u201cSearch details\u201d to examine matching results. 
To load a desired hub, use a right-click to display an \u201cOpen this assembly\u201d pop-up, or an option to configure individual track settings.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/KON2zKRkRlaLc_qjFRsnEEpsGOLKp_ZKUhHa7fhgOLqXWj_fMOpyplLqVOmF8_XTyXI1yOO9s6MR9lWvUpNuH3Skjve6Cv4ACTAcyKNCQLIOhW8LSXGryQo-cXtpnwqcmEJIlSRu\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Genomes Menu<\/h3>\n\n\n\n<p>Another option to gain an overview of all the GenArk hubs is to click the \u201cGenome Archive GenArk\u201d link available under the \u201cGenomes\u201d menu.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/eGIRH-ntgilCfkbjloF-xwawbeBIWoPwyNTgH-WSPU-Tsnp1LOkSWUlp0YkppzMTJckYF1rqU1Qti1amQWf17vTjETSPZgnPY_zjhIX4WDzqWI2HxWylwAwA25D7FACEh_9qxOjv\" alt=\"\"\/><\/figure>\n\n\n\n<p>This Genomes menu link will open the GenArk homepage. The GenArk homepage offers a variety of links, including the line, \u201cPlease note: text file listing of 1,600 NCBI\/VGP genome assembly hubs.\u201d Clicking that link will open a single text file that lists all available hubs, allowing a quick overview: <a href=\"https:\/\/hgdownload.soe.ucsc.edu\/hubs\/UCSC_GI.assemblyHubList.txt\">https:\/\/hgdownload.soe.ucsc.edu\/hubs\/UCSC_GI.assemblyHubList.txt<\/a>&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Individual Taxonomic Pages<\/h3>\n\n\n\n<p>The GenArk homepage also has links to hub pages for specific taxonomic groupings, such as mammals, fishes, or fungi. 
For instance, a \u201cbirds\u201d link, <a href=\"https:\/\/hgdownload.soe.ucsc.edu\/hubs\/birds\/index.html\">https:\/\/hgdownload.soe.ucsc.edu\/hubs\/birds\/index.html<\/a>, brings you to a webpage with links to launch browsers, along with links to other details for each assembly.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/qcIEDuZrsqL7YtztrnJ_FUC-doqPF_DGspZxR1jZYfs-zEJ8dq97NS6ID5Ilk7R7qQX-Od7e_FDRAdkAZbIkB5pXQM54bsk-N1o56_3yJfRI0shgPdqd8mZAjq6zqngG24lntXSu\" alt=\"\"\/><\/figure>\n\n\n\n<p>These taxonomic group pages, such as the birds page shown in this image, have links to launch the browser (2nd column: common name and view in browser) and links to the source files (4th column: NCBI assembly).&nbsp;<\/p>\n\n\n\n<p>Access to these taxonomic group pages is also available from the Public Hubs page. <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/yJXTfv97C5kLHhYofptbO3ENjGK9vx7DwoOynGG7bRLbZ30DVx3KijjGk14V3gRjsXU5RCOSwS8E8GFel7f_NixuVn-_LMHleLB8yUb0gFheZyfvV5wfxkWUEkfVBga148_udAfL\" width=\"624\" height=\"72\"><\/p>\n\n\n\n<p>In the Description column on the <a href=\"https:\/\/genome.ucsc.edu\/cgi-bin\/hgHubConnect?hubSearchTerms=GCF\">Public Hubs page<\/a>, you can click a link (such as Bird genome assemblies) to reach the related taxonomic grouping page. Note also that on the Public Hubs page you can click a&nbsp; [+] plus button to expand the list of Assemblies and click one of the GCA\/GCF accession links to directly load an assembly.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What if I don\u2019t find my Assembly in the GenArk collection?<\/h2>\n\n\n\n<p>If the assembly of interest is not found, please visit our <a href=\"http:\/\/genome.ucsc.edu\/assemblyRequest.html\">assembly request page<\/a>. Search that page for your assembly; if there is a &#8220;view&#8221; link, you can launch the existing genome browser. 
Otherwise, click the &#8220;request&#8221; button to fill out a form to add your genome of interest.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"805\" height=\"328\" src=\"https:\/\/hgblog1.gi.ucsc.edu\/blog\/wp-content\/uploads\/2022\/04\/image.png\" alt=\"\" class=\"wp-image-1014\" srcset=\"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2022\/04\/image.png 805w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2022\/04\/image-300x122.png 300w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2022\/04\/image-768x313.png 768w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2022\/04\/image-624x254.png 624w\" sizes=\"auto, (max-width: 805px) 100vw, 805px\" \/><\/figure>\n\n\n\n<p>The assembly request page requires that the assembly already have a GCA\/GCF identifier. You can also always email us at our public mailing-list <a href=\"mailto:genome@soe.ucsc.edu\">genome@soe.ucsc.edu<\/a> to request we add the assembly to the GenArk collection. This archived mailing-list is searchable from links on our contacts page, <a href=\"http:\/\/genome.ucsc.edu\/contacts.html\">http:\/\/genome.ucsc.edu\/contacts.html<\/a>. Alternatively, if you don\u2019t want your request to be public, you can email our private internal mailing-list at <a rel=\"noreferrer noopener\" href=\"mailto:genome-www@soe.ucsc.edu\" target=\"_blank\">genome-www@soe.ucsc.edu<\/a>.\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What if my assembly doesn\u2019t have a GCA\/GCF NCBI accession?<\/h2>\n\n\n\n<p>If NCBI does not have a GCA\/GCF accession for your assembly, then our scripts will not be able to pull the data and generate the GenArk hub. You will need to deposit the assembly at NCBI and notify us once the assembly has become available. 
You can find directions at NCBI for how to submit new genomes: <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/assembly\/docs\/submission\/\">https:\/\/www.ncbi.nlm.nih.gov\/assembly\/docs\/submission\/<\/a>&nbsp;<\/p>\n\n\n\n<p>The next blog post in this series will provide examples of using the GenArk hubs, such as the BLAT and PCR tools that are available, or how you can send DNA from any Assembly Hub to External Tools for processing.&nbsp; The final post examines the infrastructure behind the hubs.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>This entry was written by Brian Lee. If after reading this blog post you have any public questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a <a href=\"https:\/\/groups.google.com\/a\/soe.ucsc.edu\/forum\/#!forum\/genome\">publicly accessible forum<\/a>. If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post is the first of three to discuss the Genome Archive (GenArk) assembly hubs. 
This first post discusses accessing the GenArk hubs.<\/p>\n","protected":false},"author":2,"featured_media":967,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[20,6,19],"class_list":["post-964","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-assembly-hub","tag-browser","tag-genark"],"jetpack_featured_media_url":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2021\/11\/gatewaySearchDog.png","_links":{"self":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/964","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/comments?post=964"}],"version-history":[{"count":7,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/964\/revisions"}],"predecessor-version":[{"id":1015,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/964\/revisions\/1015"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/media\/967"}],"wp:attachment":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/media?parent=964"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/categories?post=964"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/tags?post=964"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}