{"id":969,"date":"2021-12-06T18:56:15","date_gmt":"2021-12-06T18:56:15","guid":{"rendered":"https:\/\/hgblog1.gi.ucsc.edu\/blog\/?p=969"},"modified":"2021-12-14T01:08:08","modified_gmt":"2021-12-14T01:08:08","slug":"genark-hubs-part-2","status":"publish","type":"post","link":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/2021\/12\/06\/genark-hubs-part-2\/","title":{"rendered":"GenArk Hubs Part 2 &#8211; Using the data"},"content":{"rendered":"\n<p>This blog post is the <a href=\"https:\/\/hgblog1.gi.ucsc.edu\/blog\/?s=GenArk\">second of three<\/a> to discuss the Genome Archive (GenArk) assembly hubs. This second post discusses examples of using the GenArk hubs\u2019 data, with the <a href=\"https:\/\/hgblog1.gi.ucsc.edu\/blog\/2021\/11\/23\/genark-hubs-part-1\/\">first post<\/a> about accessing the data, and the third shares technical infrastructure behind the hubs.&nbsp;&nbsp;<\/p>\n\n\n\n<p>Before launching into using the new GenArk hubs, let\u2019s go quickly over how the first blog post examined the multiple ways to access the GenArk hubs. The easiest way to find a GenArk hub is by searching the UCSC Genome Browser\u2019s main <a href=\"https:\/\/genome.ucsc.edu\/cgi-bin\/hgGateway\">Gateway page<\/a> with a name like \u201chummingbird\u201d and clicking on the GCA\/GCF identifier to attach it.&nbsp; Another is to build direct links to NCBI GCA\/GCF assembly accessions when you know them to instantly arrive at the main Browser view, such as <a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_005190385.1\">https:\/\/genome.ucsc.edu\/h\/GCF_005190385.1<\/a> for narwhal. 
Yet another is searching the UCSC Public Hubs page or going to the main <a href=\"https:\/\/hgdownload.soe.ucsc.edu\/hubs\/\">GenArk homepage<\/a> where you can in turn navigate directly to individual taxonomic group pages, such as for <a href=\"https:\/\/hgdownload.soe.ucsc.edu\/hubs\/birds\/index.html\">birds<\/a>.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">USING THE DATA&nbsp;<\/h1>\n\n\n\n<h2 class=\"wp-block-heading\">What can you do with a GenArk Assembly hub?<\/h2>\n\n\n\n<p>The new GenArk hubs come with the ability to perform BLAT DNA queries and PCR primer searches, as well as send the genome\u2019s DNA to external tools.<\/p>\n\n\n\n<p>As an example, let\u2019s say you are curious if we have a specific bat genome. The first step would be to go to the Gateway page and search \u201cbat\u201d and discover multiple hits.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/QTMvf8mw2qW-OUXSmdReJ03sTYRNKMWCou0ThCRcjPvs6DBCym_mU7dT0unx_pCoeOWmT0uX3_6ubalolcC-PEC2I0Y3LeBZonKL2XLz1KpnlleCEuT853frC2o1rHOWAfNs2KCl\" alt=\"\"\/><\/figure>\n\n\n\n<p>Looking at search results you see your desired specific \u201clittle brown bat\u201d assembly and click on it so that hub is now selected, where under \u201cFind Position\u201d on the right there would now be \u201cMammal assemblies Hub Assembly\u201d attached with \u201clittle brown bat\u201d displayed and a specific GCF_000147115.1 NCBI accession. Clicking the \u201cGo\u201d button would bring you to the main Browser display. The same result happens from clicking this short direct GenArk \/h\/ hub link: <a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_000147115.1\">https:\/\/genome.ucsc.edu\/h\/GCF_000147115.1<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">BLAT DNA Search<\/h3>\n\n\n\n<p>With this bat genome displaying if you had a short DNA sequence you wanted to search, you could paste it right in the top search box. 
For instance, after clicking the above link, try pasting CATTAGGCAAATATATGCATATAAGTTCTTTGTTTAATCTCT into the search box on the main browser display and clicking \u201cgo\u201d.&nbsp; The result, shown after a few seconds, will be sequence matches across the little brown bat genome. You can also go to the top Tools menu, select \u201cBlat\u201d, and paste the DNA sequence there, which is required when searching longer sequences. The Blat Tool page also allows you to search alternative sequences.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">BLAT Protein Search<\/h3>\n\n\n\n<p>On the Tools &gt; Blat page you can put in protein sequences to search. This is especially interesting if you want to find the location of a known protein from another species in your genome of interest. For example, if you are again on the little brown bat genome and you go to the Blat page, try to blat this portion of the human SOD1 protein: LSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGN<\/p>\n\n\n\n<p>You will find a match (again note that for protein searches you must go to the Tools &gt; Blat page). When viewing the results you can either click a \u201cbrowser\u201d link to see the matching spots on the genome, or click a \u201cdetails\u201d link to see the side-by-side alignment, as in the image below.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/SOTUqeyAkKe6dhc2kNsVoItSr_5Kti49Hdb_2EN-xN_WeJxwSMRWGLvWnKPUHDe8m5DMLYnYOe8BSWJjNWPiW1je2Xygv0pUN8Qj1csT0DgJjm_tnapcqM1-0msaqJ8lmcCWdibV\" alt=\"\"\/><\/figure>\n\n\n\n<p>Besides DNA and protein searches, BLAT also allows translated RNA and translated DNA searches. The results from BLAT searches can also be saved as custom tracks. This allows you to download and save these annotations, or save them in Sessions, making the results more permanent and shareable. 
See this other blog post about sharing sessions for more information: <a href=\"https:\/\/bit.ly\/UCSC_blog_sharing\">https:\/\/bit.ly\/UCSC_blog_sharing<\/a>&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">PCR Primer Search<\/h3>\n\n\n\n<p>GenArk also provides PCR primer searches, by going to the Tools menu and selecting PCR. With the same \u201clittle brown bat\u201d genome loaded, for instance, go to the Tools menu and select \u201cIn-Silico PCR\u201d to arrive on the PCR page. Then enter these two primers, forward primer: AGTCATGGTCTCAGGAACCG and reverse primer:&nbsp; GTTACTAGGGCTCAGACCTC&nbsp; (there is no need in this example to click any other settings).&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/VzoI-Or7kpCY6tg0OT8nTKOu2VPn90umqGAS2dBtSl9mx_3UXgLu95St1fzm6MOrD3x3eIU3Fa2Yi2jdQq6QO06LQrpJYYKnedBrYTB32TsjfReWTe__ABY7wWOk4lZDqu6rQhlq\" alt=\"\"\/><\/figure>\n\n\n\n<p>Then click \u201csubmit\u201d to search the \u201clittle brown bat\u201d genome for matches. <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/ELsUGvQsFcTNXwPE7IiHMldMwptPdUVjmpNv1XH-ecVYv8T0k9wU_VMsRw9wkxv5diEnyFprZV7Nd8NLbVr5Gruwl4wqfZuXr6JT2SJQtKfJAtAWiALzKlF1XyY0glLyeb_3QY6J\" width=\"624\" height=\"315\"><\/p>\n\n\n\n<p>The results will be two hits, in part because this assembly has 11,654 scaffolds with some identical sequences (to see all the scaffolds click the \u201cview sequences\u201d link on the Gateway page described later).&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Send DNA to External Tools<\/h3>\n\n\n\n<p>Another way to use GenArk hubs is to send the current DNA in the viewing window to external sites. 
By going to the View menu you can select the \u201cIn External Tools\u201d option and export the current DNA for processing outside of the UCSC Genome Browser.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/HJZKCM8l3lLaXX7McPKkV1vecBYPdDyybBqqEZEYz25YNqcsw7MSl-Pg8aTmJ3OS_ybms0rImxJ2TnTz3qZ2xdSZ5V-vITWWIRAJyse_79qPCRBtWK61HnGwvXGlWf97fC-SsKK2\" width=\"624\" height=\"240\">In this image a 7,477 bp region is in view and will be sent to external sites; selecting \u201cIn External Tools\u201d under the View menu results in a pop-up of various options.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/-g-s-Ur3B8y1nZkX2heLTMhI5-pj7s0U02RQHu77J6gSxY1pjsNNM9Idabf5UT4uBszqNW0Fx7o4v_rP-od5WexFAr02AVnrX_3pNzyeQp_3v_3LJ10Bj7LIZaU279Vj_AME9hvq\" alt=\"\"\/><\/figure>\n\n\n\n<p>In this case all of the options are presented as available for this 7,477 bp span, except for RNAfold, which requires the viewer to zoom in to less than 5 kbp before sending the DNA to that external tool.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Send DNA to External Tools -Primer Design: Primer-BLAST<\/h4>\n\n\n\n<p>If you were interested in PCR primer design in this region you could use the Primer3Plus or Primer-BLAST links. The Primer-BLAST link starts a job at NCBI, and after some time NCBI returns optimal PCR primers for this stretch of DNA. 
Here are example results sending the 7,477 bp&nbsp; span of the NW_005878708v1 little brown bat scaffold to NCBI.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/xcoFKy60ucyKJcu4uG0zmdHZixhUhwE0BS4k0E7Fhc6mZ9rmiovFBe0d1h1ETeEJ9_IWDoI06nsf9ut_GfQq-NU3WK-A9VXP-LvX106R9XPGTVXMR1JcBNzrTzicDUT0cPXRuRL1\" alt=\"\"\/><\/figure>\n\n\n\n<p>With these results, one can return to the UCSC PCR Tool to test each result in order to discover if these primers will have potential off-target results beyond the desired chromosome.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Send DNA to External Tools -Primer Design: Primer3Plus<\/h4>\n\n\n\n<p>Another PCR Primer design option in the \u201cIn External Tools\u201d menu is Primer3Plus. Here are example results sending the same 7,477 bp span of the NW_005878708v1 little brown bat scaffold to Primer3Plus.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/Y7AHBAthT6w3Tq9gYumGOkJG3VA9JhdWfnUJSruY2ebAVfC9zzzL2ZqK9KD31nhMGdiUjN-QD3uH3Q5IA0-moFHo_Ew2NWuf82D_uYK7mpmZeJ12GPuc-G8MLsW0Ur4ale-AtI5F\" alt=\"\"\/><\/figure>\n\n\n\n<p>Primer3Plus has the added benefits of a \u201cReturn to Genome Browser\u201d button (top left) that if clicked will dynamically generate a custom track of the results to be seen back on the UCSC Genome Browser.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/ffxJQV4o62MQyM1mZX12LaoiVWInz-brmIYJaQZxCoP_fgOQMYcfq7AUTftCqcWyiDyatymdOWPLLgTzCAuPNcOvOXeAXDrsF-UYEUMBeA7nJVltkw1g0iu2DG9eqkIBVsACmYRH\" alt=\"\"\/><\/figure>\n\n\n\n<p>Above the Primer3Plus custom track identifies the input region that was sent (top grey bar), and then the individual left and right matching primer pair locations. 
At UCSC the primers can then be tested again with the UCSC PCR Tool; the Primer3Plus suggested \u201cPrimer 5\u201d is highlighted in the above image.&nbsp;<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Send DNA to External Tools for oligo-analysis&nbsp;<\/h4>\n\n\n\n<p>Another tool you can export DNA of interest to is Regulatory Sequence Analysis Tools (<a href=\"http:\/\/rsat.sb-roscoff.fr\/index.php\">RSAT<\/a>) Metazoa for motif discovery. For instance, when looking at a GenArk assembly for Zebu Cattle, <a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_000247795.1\">https:\/\/genome.ucsc.edu\/h\/GCF_000247795.1<\/a>, one could use the View menu and the In External Tools option to select the RSAT link. RSAT provides a way to analyze the DNA sequence for transcription factor binding sites and over-represented oligo-nucleotides. Because RSAT asks for an organism, in this example Bos taurus was selected as a relative of zebu cattle so that the examination of the region could proceed. The DNA sent in this example was from a region near the start of a gene predicted by Augustus. 
One of the RSAT results was a predicted motif, <strong><em>aaacttatagata<\/em><\/strong>, just upstream of the transcription start site for the predicted gene.<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/oInvqTDI4jl9O_DUozoKYrO384T2WsrC80Wk04LO01QEtHISGGNFn6LCaMRY2zJ391Y4852g_qW4wZYBOjxfR5m3aKN48NT4suezei-Vr8csq2iPCd1mijcs_Caje5I1Gro8JB4B\" width=\"624\" height=\"100\"><\/p>\n\n\n\n<p>Going back to the UCSC Genome Browser, clicking into the Short Match track (under the top Mapping section), and pasting in the motif sequence, <strong><em>aaacttatagata<\/em><\/strong>, displays where these matches occur in the GenArk hub.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/JCeqRh-vBBqLVqao6ZKJkNtj0LHcUJRN57zedZ03xL3iCQLPgCNEcDondvbu-kRS5l8Not3VDHbCadtz5eGsEG616E8Y7LZZFnUkHW9gjlMkStSSj7H120KusW0IzuxTmBcqsAOj\" alt=\"\"\/><\/figure>\n\n\n\n<p>By visualizing the motif, the Short Match track shows the potential transcription factor binding sites predicted by RSAT.&nbsp; This Browser view of the Zebu Cattle GenArk assembly hub can be viewed with this <a href=\"https:\/\/genome.ucsc.edu\/cgi-bin\/hgPublicSessions?search=Zebu\">Public Session link<\/a>.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Can I add custom tracks to a GenArk Assembly hub?<\/h2>\n\n\n\n<p>Yes, users can add tracks to their data by going to the My Data menu and then selecting Custom Tracks to paste in information. Simple text-based tracks can be loaded, and more complicated binary-indexed files such as BAMs, VCFs, or bigBeds can be loaded as well.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How do I name my sequences for my custom tracks?<\/h2>\n\n\n\n<p>Another special feature of GenArk hubs is that they are loaded with a special chromAlias file allowing for multiple alias names. 
When building custom tracks the scaffold names for sequences need to match the names in the assembly, but many options exist. For instance, with the Zebu Cattle genome, <a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_000247795.1\">https:\/\/genome.ucsc.edu\/h\/GCF_000247795.1<\/a>, if you type \u201cv s\u201d to view sequences, or click the top \u201cGenomes\u201d name and then the \u201cview sequences\u201d button, you will end up on a page where all the scaffolds of a genome are displayed.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/CZNMVduFvgfMoZDnhrejJ-pdSXAn1QoO6ofKL6sh5QQ9VLA-EX4qNA-GRMX7GaOcs3gPjTnqxTQm5pNbp9o1C2zhDEaxplUqB_l8Pe9eM2_QbJ8ERie-NlKhNBO4JFdxVjhrEwv1\" alt=\"\"\/><\/figure>\n\n\n\n<p>Scrolling down on the resulting page you will see a link titled \u201c<a href=\"https:\/\/hgdownload.soe.ucsc.edu\/hubs\/GCF\/000\/247\/795\/GCF_000247795.1\/GCF_000247795.1.chromAlias.txt\">GCF_000247795.1.chromAlias.txt<\/a>\u201d which will have contents like this:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># sequenceName    alias names    assembly: GCF_000247795.1_Bos_indicus_1.0\nchr1    1    CM003021.1    NC_032650.1\nchr10    10    CM003030.1    NC_032659.1\nchr11    11    CM003031.1    NC_032660.1\n...<\/code><\/pre>\n\n\n\n<p>This chromAlias.txt file shows that \u201cchr1\u201d, \u201c1\u201d, \u201cCM003021.1\u201d, or \u201cNC_032650.1\u201d can each be used to create custom tracks on chromosome one for this assembly (i.e., BED custom tracks \u201cchr1 300 500\u201d = \u201c1 300 500\u201d = \u201cCM003021.1 300 500\u201d = \u201cNC_032650.1 300 500\u201d).&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Can I add a Track Hub to a GenArk Assembly Hub?<\/h2>\n\n\n\n<p>Yes, after loading a hub you can go to the My Data menu and paste in the location of a hub to display on any of the GenArk assembly hubs. 
The one special detail is that the genome line in your Track Hub\u2019s genomes.txt only needs to have the GCA\/GCF number, such as \u201cgenome GCF_001984765.1\u201d. See this example <a href=\"https:\/\/data.cyverse.org\/dav-anon\/iplant\/home\/brianlee\/examples\/hub.txt\">hub.txt<\/a> file for an idea of how a hub could be loaded on a GenArk hub.&nbsp; Here is a link that will load that hub on a GenArk hub for American beaver: <\/p>\n\n\n\n<p><a href=\"https:\/\/genome.ucsc.edu\/h\/GCF_001984765.1?position=NW_017869957v1:1,285,000-1,793,000&amp;hubUrl=https:\/\/data.cyverse.org\/dav-anon\/iplant\/home\/brianlee\/examples\/hub.txt\">https:\/\/genome.ucsc.edu\/<strong>h\/GCF_001984765.1?position=<\/strong>NW_017869957v1:1,285,000-1,793,000&amp;<strong>hubUrl<\/strong>=https:\/\/data.cyverse.org\/dav-anon\/iplant\/home\/brianlee\/examples\/hub.txt<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Can I share data on a GenArk assembly hub?<\/h2>\n\n\n\n<p>Yes, you can make a session and share the URL with others. Even better, publish your session to the Public Session page to make it more discoverable. See this previous blog post about sharing data for more information:&nbsp; <a href=\"https:\/\/bit.ly\/UCSC_blog_sharing\">https:\/\/bit.ly\/UCSC_blog_sharing<\/a><\/p>\n\n\n\n<p>The next blog post in this series will provide some technical details about the GenArk hub architecture. The <a href=\"https:\/\/hgblog1.gi.ucsc.edu\/blog\/2021\/11\/23\/genark-hubs-part-1\/\">first post<\/a> focused on how to discover and access the hubs.\u00a0<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>This entry was written by Brian Lee. If after reading this blog post you have any public questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a <a href=\"https:\/\/groups.google.com\/a\/soe.ucsc.edu\/forum\/#!forum\/genome\">publicly accessible forum<\/a>. 
If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post is the second of three to discuss the Genome Archive (GenArk) assembly hubs. This second post discusses examples of using the GenArk hubs\u2019 data with BLAT, PCR, and External Tools. <\/p>\n","protected":false},"author":2,"featured_media":971,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[25,22,6,23,19,21,24],"class_list":["post-969","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-assembly-hubs","tag-blat","tag-browser","tag-external-tools","tag-genark","tag-pcr","tag-primer-design"],"jetpack_featured_media_url":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2021\/11\/BlogPost2.png","_links":{"self":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/969","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/comments?post=969"}],"version-history":[{"count":11,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/969\/revisions"}],"predecessor-version":[{"id":989,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/969\/revisions\/989"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/media\/971"}],"wp:attachment":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/media?parent=969"}],"wp:term":[{"taxonomy":"category","embeddable":true,"h
ref":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/categories?post=969"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/tags?post=969"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}