{"id":197,"date":"2015-06-30T15:32:45","date_gmt":"2015-06-30T22:32:45","guid":{"rendered":"http:\/\/genome.ucsc.edu\/blog\/?p=197"},"modified":"2021-10-30T22:17:10","modified_gmt":"2021-10-30T22:17:10","slug":"new-default-gene-set-on-grch38-gencode-basic-genes","status":"publish","type":"post","link":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/2015\/06\/30\/new-default-gene-set-on-grch38-gencode-basic-genes\/","title":{"rendered":"New default gene set on GRCh38: GENCODE Basic genes"},"content":{"rendered":"<div id=\"attachment_198\" style=\"width: 370px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/genome.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-198\" class=\"  wp-image-198\" src=\"http:\/\/genome.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM-300x100.png\" alt=\"Screen Shot 2015-06-29 at 3.32.45 PM\" width=\"360\" height=\"120\" srcset=\"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM-300x100.png 300w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM-1024x343.png 1024w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM-624x209.png 624w, https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-content\/uploads\/2015\/06\/Screen-Shot-2015-06-29-at-3.32.45-PM.png 1572w\" sizes=\"auto, (max-width: 360px) 100vw, 360px\" \/><\/a><p id=\"caption-attachment-198\" class=\"wp-caption-text\">Genome Browser screen shot of the GRCh38 (hg38) human assembly showing the GENCODE Basic track opened in the PTEN region on chromosome 10.<\/p><\/div>\n<p style=\"text-align: left;\"><span class=\"im\">As of Monday, June 29, 2015, the UCSC Genome Browser uses&nbsp;the GENCODE v22 comprehensive gene set as its default gene set on the human genome 
assembly GRCh38 (hg38), replacing the previous default&nbsp;set of genes created here at UCSC using code written by Jim Kent.&nbsp;<\/span>This track, which is labeled as &#8220;GENCODE Basic&#8221; in the Genes and Gene&nbsp;Predictions track group, replaces the UCSC Genes track as the default gene set.&nbsp; We&#8217;re making this change in&nbsp;<span class=\"im\">recognition of the value of reducing the number of competing gene sets&nbsp;<\/span>used by the bioinformatics community.&nbsp; With this change, we will be&nbsp;using the same set of genes as Ensembl, reducing the potential for confusion, especially in clinical settings.<br \/>\n<span class=\"im\"><br \/>\nWe&#8217;ve kept the same familiar UCSC Genes schema for the new gene set,&nbsp;using nearly all the same table names and fields that appeared in&nbsp;earlier versions of UCSC Genes. We hope this will make the&nbsp;transition to the new GENCODE models easier. Every transcript in the&nbsp;new set has both a UCSC ID and a GENCODE transcript ID. There are a&nbsp;couple of new tables: knownCds, which has the coding frame numbers for&nbsp;each gene, and knownToMrna, which captures the association to GenBank&nbsp;mRNAs. A couple of tables are no longer present: knownGeneTxMrna and knownGeneTxPep.<\/span><\/p>\n<p>By default, we display only the transcripts tagged as \u201cbasic\u201d by the&nbsp;GENCODE Consortium. However, all the transcripts in the GENCODE&nbsp;comprehensive set are present in the tables. You can view them in the&nbsp;browser by selecting &#8220;show comprehensive set&#8221; in the &#8220;Show&#8221; section&nbsp;of the track\u2019s description page. 
On that same page, you can also&nbsp;<span class=\"im\">configure the browser to label the genes with the GENCODE transcript&nbsp;<\/span>IDs by selecting the &#8220;GENCODE Transcript ID&#8221; label option.<br \/>\n<span class=\"im\"><br \/>\nThe new gene set has 195,178 total transcripts, compared with 104,178&nbsp;in the previous UCSC Genes version. The total number of canonical&nbsp;genes, now defined using the GENCODE gene loci (ENSG* identifiers), has increased&nbsp;from 48,424 to 49,534.<\/span><\/p>\n<p>Comparing the previous gene set with the new version:<\/p>\n<ul>\n<li>9,459 transcripts are identical.<\/li>\n<li>22,088 transcripts were not carried forward to the new version.<\/li>\n<li>43,681 transcripts have consistent splicing, but changes in the UTR.<\/li>\n<li>28,950 transcripts overlap with those in the previous set, but have<br \/>\nat least one different splice.<\/li>\n<\/ul>\n<p>We plan to continue using the previous UCSC computational pipeline to&nbsp;generate the default gene set on the mouse assembly, GRCm38 (mm10),&nbsp;for the foreseeable future. We will also periodically update the old&nbsp;UCSC-computed gene set on the human GRCh38 assembly as an ancillary&nbsp;track (&#8220;Old UCSC Genes&#8221;) without the rich set of link-outs we maintain for the default gene set.<\/p>\n<hr>\n<p>If, after reading this blog post, you have any public questions, please email <a href=\"mailto:genome@soe.ucsc.edu\" target=\"_blank\" rel=\"noopener\">genome@soe.ucsc.edu<\/a>. All messages sent to that address are archived on a <a href=\"https:\/\/groups.google.com\/a\/soe.ucsc.edu\/forum\/#!forum\/genome\">publicly accessible forum<\/a>. 
If your question includes sensitive data, you may send it instead to&nbsp;<a href=\"mailto:genome-www@soe.ucsc.edu\" target=\"_blank\" rel=\"noopener\">genome-www@soe.ucsc.edu<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As of Monday, June 29, 2015, the UCSC Genome Browser uses&nbsp;the GENCODE v22 comprehensive gene set as its default gene set on the human genome assembly GRCh38 (hg38), replacing the previous default&nbsp;set of genes created here at UCSC using code written by Jim Kent.&nbsp;This track, which is labeled as &#8220;GENCODE Basic&#8221; in the Genes [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-197","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/comments?post=197"}],"version-history":[{"count":12,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/197\/revisions"}],"predecessor-version":[{"id":948,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/posts\/197\/revisions\/948"}],"wp:attachment":[{"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/media?parent=197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/categories?post=197"},{"taxonomy":"post
_tag","embeddable":true,"href":"https:\/\/genome-blog.gi.ucsc.edu\/blog\/wp-json\/wp\/v2\/tags?post=197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}