If you think dogs can’t count, try putting three dog biscuits in your pocket and then giving Fido only two of them.
~Phil Pastoret
“Counting is easy. Right?”
I say this with my hand out, my thumb and 4 fingers spread out. With my other hand’s pointer finger, I simply count each digit, “one, two, three, four, five.” Easy.
But what happens when you start counting at 0 instead of 1? You can see that you have 5 digits (4 fingers and a thumb), but how do you calculate the size of your range?
With your hand in mind as an example, let’s look at counting conventions as they relate to bioinformatics and the UCSC Genome Browser genomic coordinate systems.
The UCSC Genome Browser uses two different systems:
Table 1. UCSC Genome Browser coordinate systems summary
0-start, half-open (0-based) | 1-start, fully-closed (1-based) |
“BED” format (Browser Extensible Data): chr1 127140000 127140001. Note: spaces, no punctuation. When using BED format, browser & utilities assume coords are 0-start, half-open. | “Position” format: chr1:127140001-127140001. Note: punctuation used, no spaces. When using “position” format, browser & utilities assume coords are 1-start, fully-closed. |
Stored in UCSC Genome Browser tables | Positioned in UCSC Genome Browser web interface |
To convert to 1-start, fully-closed: add 1 to start, end = same | To convert to 0-start, half-open: subtract 1 from start, end = same |
Section 1: Interval types
0-start vs. 1-start : Does counting start at 0 or 1?
Synonyms:
Sometimes referred to as “0-based” vs. “1-based” or “0-relative” vs. “1-relative.”
Interval Types
For a counted range, is the specified interval fully-open, fully-closed, or a hybrid-interval (e.g., half-open)?
OK, time to flash back to math class!
You might recall that specifying an interval type as open, closed (or a combination, e.g., “half-open”) refers to whether or not the endpoints of the interval are included in the set. For further explanation, see the interval math terminology wiki article. Figure 1 below describes various interval types.
Figure 1. Description of interval types.
Section 2: Interval types in the UCSC Genome Browser
UCSC Genome Browser web interface = “1-start, fully-closed”
A common counting convention is a system that we all used when we first learned to count the fingers on our hands; this is referred to as the “one-based, fully-closed” system (Figure 2, below). Note that an extra step is needed to calculate the range total (5).
The “1-start, fully-closed” system is what you SEE when using the UCSC Genome Browser web interface. However, all positional data that are stored in database tables use a different system.
Figure 2. 1-start, fully-closed interval. Most common counting convention. Used within the UCSC Genome Browser web interface (but not used in UCSC Genome Browser databases/tables). We calculate that we have 5 digits because 5 (pinky finger, range end) – 1 (the thumb, range start) = 4. We then need to add one to calculate the correct range: 4 + 1 = 5.
UCSC Genome Browser tables = “0-start, half-open”
While the commonly-used “one-start, fully-closed” system is more intuitive, it is not always the most efficient method for performing calculations in bioinformatic systems, because an additional step is required to calculate the size of the base-pair (bp) range.
To increase efficiency, the UCSC Genome Browser uses a “hybrid-interval” coordinate system for storing coordinates in databases/tables that is referred to as “0-start, half-open” (see Figure 3, below).
Although coordinates in the web browser are converted to the more human-readable “1-start, fully-closed” system, coordinates are stored in database tables as “0-start, half-open.” You may have heard various terms to express this 0-start system:
Synonyms for “0-start, half-open”
- 0-based, half-open
- 0-based start, 1-based end
  - Note: This is not technically accurate, but conceptually helpful. A “1-based end” refers to the end of the range being included, as in the common “1-based, fully-closed” system.
- 0-start, hybrid-interval (interval type is: start-included, end-excluded)
Figure 3. The UCSC Genome Browser coordinate system for databases/tables (not the web interface) is “0-start, half-open” where the start is included (closed-interval), and the end is excluded (open-interval). We calculate that we have 5 digits because 5 (range end after pinky finger) – 0 (the thumb, range start) = 5.
Another example which compares 0-start and 1-start systems is seen below, in Figure 4. This figure describes the differences in defining and calculating the range for a specified sequence highlighted in yellow, “T, C, G, A.”
Figure 4. Calculation of genomic range for comparing “1-start, fully-closed” vs. “0-start, half-open” counting systems.
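To make the arithmetic concrete, here is a minimal Python sketch of the two size calculations. The sequence and the helper function names are hypothetical and chosen only for illustration; the sequence is not the one shown in Figure 4. Conveniently, Python slices are themselves 0-start, half-open, so the second system maps directly onto slicing.

```python
# Hypothetical sequence for illustration only (not the sequence in Figure 4).
sequence = "ACTCGATT"


def size_one_start_closed(start, end):
    """Size of a 1-start, fully-closed range: both endpoints are counted."""
    return end - start + 1


def size_zero_start_half_open(start, end):
    """Size of a 0-start, half-open range: the end is excluded."""
    return end - start


# "TCGA" occupies the 3rd through 6th bases of the hypothetical sequence.
one_start = (3, 6)   # 1-start, fully-closed
zero_start = (2, 6)  # 0-start, half-open

print(size_one_start_closed(*one_start))       # 4 (needs the extra "+ 1" step)
print(size_zero_start_half_open(*zero_start))  # 4 (plain subtraction)

# Python slicing uses the same 0-start, half-open convention.
print(sequence[zero_start[0]:zero_start[1]])   # TCGA
```

Both systems describe the same four bases; only the start coordinate and the size formula differ.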
Section 3: Formatting
Coordinate formatting indicates interval type
The UCSC Genome Browser and many of its related command-line utilities distinguish two types of formatted coordinates and make assumptions about each type.
The “Position” format (referring to the “1-start, fully-closed” system as coordinates are “positioned” in the browser)
- Written as: chr1:127140001-127140001
- No spaces.
- Includes punctuation: a colon after the chromosome, and a dash between the start and end coordinates.
- When in this format, the assumption is that the coordinate is 1-start, fully-closed.
The “BED” format (referring to the “0-start, half-open” system)
- Written as: chr1 127140000 127140001
- Spaces between chromosome, start coordinate, and end coordinate.
- No punctuation.
- When in this format, the assumption is that the coordinates are 0-start, half-open.
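The formatting rules above can be sketched as a small parser that infers the coordinate system from the punctuation alone. This is only an illustration of the idea: `parse_coordinates` and the regular expression are assumptions made for this example, not part of any UCSC utility, and the sketch does not handle every input the browser accepts.

```python
import re

# Assumed pattern for "position" format: chrom, colon, start, dash, end, no spaces.
POSITION_RE = re.compile(r"^(\S+):(\d+)-(\d+)$")


def parse_coordinates(text):
    """Return (chrom, start, end, system) for a "position" or "BED" string.

    Hypothetical helper: the coordinate system is inferred from formatting only.
    """
    text = text.strip()
    match = POSITION_RE.match(text)
    if match:
        chrom, start, end = match.groups()
        return chrom, int(start), int(end), "1-start, fully-closed"
    fields = text.split()  # BED fields are separated by whitespace
    if len(fields) >= 3:
        chrom, start, end = fields[:3]
        return chrom, int(start), int(end), "0-start, half-open"
    raise ValueError(f"unrecognized coordinate format: {text!r}")


print(parse_coordinates("chr1:127140001-127140001"))
# ('chr1', 127140001, 127140001, '1-start, fully-closed')
print(parse_coordinates("chr1 127140000 127140001"))
# ('chr1', 127140000, 127140001, '0-start, half-open')
```

Note that the numbers are returned unchanged; the parser only reports which system the formatting implies.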
Section 4: Examples
SNP example
What we SEE in the Genome Browser interface itself is the “1-start, fully-closed” system. However, these data are not STORED in the UCSC Genome Browser databases and tables in the same way. The UCSC Genome Browser databases store coordinates in the “0-start, half-open” coordinate system.
Table 2. SNP coordinates in web browser (1-start) vs table (0-start)
rs782519173 (hg38) | Start | End |
Positioned in web browser: 1-start, fully-closed | 133255708 | 133255708 |
Stored in table: 0-start, half-open | 133255707 | 133255708 |
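As a quick check of Table 2, the conversion between the two systems can be written as a pair of one-line functions. The function names are hypothetical; the coordinates are the rs782519173 values from the table.

```python
def bed_to_position(start, end):
    """0-start, half-open -> 1-start, fully-closed: add 1 to start, end stays the same."""
    return start + 1, end


def position_to_bed(start, end):
    """1-start, fully-closed -> 0-start, half-open: subtract 1 from start, end stays the same."""
    return start - 1, end


# rs782519173 (hg38) as stored in the table (0-start, half-open), from Table 2.
table_start, table_end = 133255707, 133255708

browser_start, browser_end = bed_to_position(table_start, table_end)
print(browser_start, browser_end)                    # 133255708 133255708

# Converting back recovers the stored coordinates.
print(position_to_bed(browser_start, browser_end))   # (133255707, 133255708)
```

Only the start coordinate ever changes, which is why a single-base SNP has identical start and end in the browser but differs by one in the table.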
LiftOver examples and coordinate formatting
Let’s take a look at the two types of coordinate formatting (“BED” and “position”) when using the UCSC Genome Browser web-based and command-line utility liftOver tools.
1) Web-based LiftOver example
Below is an example from the UCSC Genome Browser’s web-based LiftOver tool (Home > Tools > LiftOver). Depending on how input coordinates are formatted, web-based LiftOver will assume the associated coordinate system and output the results in the same format.
Table 3. UCSC Genome Browser web-based LiftOver and “position” coordinate formatting
Input: | Assembly = panTro3 chr1:127140001-127140001 |
Output: | Lifts to this position in hg19: chr1:110255313-110255313 |
Notes: | If your input is entered with the “position” formatted coords (1-start, fully-closed), the browser will also output the same “position” format. (Note positional format includes “:” and “-” and no spaces.) |
Table 4. UCSC Genome Browser web-based LiftOver and “BED” coordinate formatting
Input: | Assembly = panTro3 chr1 127140000 127140001 |
Output: | Lifts to this position in hg19: chr1 110255312 110255313 |
Notes: | If your input is entered with the “BED” formatted coords (0-start, half-open), the browser will also output the same “BED” format. (Note BED format contains no punctuation and includes spaces.) |
* Note that the web-based output file extension is misleading when “position” formatted input is used (as in Table 3): although the file is titled “*.bed”, the output is in 1-start, fully-closed “position” format, not “0-start, half-open” BED format.
2) Command-line liftOver utility example
When using the command-line liftOver utility, understanding coordinate formatting is just as important. As with the web-based tool, the coordinate formatting indicates either the “0-start, half-open” or the “1-start, fully-closed” convention. For example, if you have a list of 1-start “position” formatted coordinates, you must tell the command-line liftOver utility that the input is in “position” format.
To view the liftOver utility usage statement and options, enter “liftOver” on your command-line (with no other arguments, and without the quotes).
Table 5. UCSC Genome Browser command-line liftOver and “position” coordinate formatting
Input: (panTro3.txt) |
chr1:127140001-127140001 |
Command: | liftOver -positions panTro3.txt liftOver/panTro3ToHg19.over.chain.gz mapped unMapped |
Output: | chr1:110255313-110255313 via “mapped” file for hg19 |
Notes: | Must specify “-positions” for 1-start “position” format in command-line liftOver. |
Table 6. UCSC Genome Browser command-line liftOver and “BED” coordinate formatting
Input: (panTro3.bed) |
chr1 127140000 127140001 |
Command: | liftOver panTro3.bed liftOver/panTro3ToHg19.over.chain.gz mapped unMapped |
Output: | chr1 110255312 110255313 via “mapped” file for hg19 |
Notes: | No special argument needed; 0-start “BED” formatted coordinates are the default. |
Wiggle Files
The wiggle (WIG) format is used for dense, continuous data that is displayed as a graph in the browser. Wiggle files of variableStep or fixedStep data use “1-start, fully-closed” coordinates, while bedGraph data use “0-start, half-open” coordinates (Table 7). Like all other UCSC Genome Browser data, both are positioned in the browser as “1-start, fully-closed.”
Note: Many other formats outside of the UCSC Genome Browser use 1-start coordinate systems, such as GTF/GFF.
Table 7. UCSC Genome Browser wiggle files & coordinate systems
File Type | Coordinate system in file | Coordinate system as positioned in UCSC Genome Browser |
bedGraph -> bigWig | 0-start, half-open | 1-start, fully-closed |
wiggle variableStep -> bigWig | 1-start, fully-closed | 1-start, fully-closed |
wiggle fixedStep -> bigWig | 1-start, fully-closed | 1-start, fully-closed |
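Because the file types in Table 7 use different coordinate systems, moving data between them means shifting the start coordinate. The sketch below shows one way this might look when turning fixedStep parameters into bedGraph-style rows. The function `fixed_step_to_bedgraph` and the example values are hypothetical; `start`, `step`, and `span` stand for the fields of a fixedStep declaration line.

```python
def fixed_step_to_bedgraph(chrom, start, step, values, span=1):
    """Yield (chrom, start0, end, value) rows from fixedStep wiggle parameters.

    Hypothetical helper. `start` is the 1-start coordinate from the fixedStep
    declaration line; the rows produced are 0-start, half-open like bedGraph.
    """
    for i, value in enumerate(values):
        first_base = start + i * step  # 1-start position of the first covered base
        start0 = first_base - 1        # shift the start to 0-start
        end = start0 + span            # half-open end, so end - start0 == span
        yield chrom, start0, end, value


# Made-up data: two values starting at base 101, 10 bases apart, each covering 5 bases.
rows = list(fixed_step_to_bedgraph("chr1", start=101, step=10, values=[1.5, 2.0], span=5))
for row in rows:
    print(*row, sep="\t")
```

The first value covers bases 101 through 105 in 1-start, fully-closed terms, which becomes 100 to 105 in the 0-start, half-open row.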
Section 5: Resources
- Sequence Coordinates: 0- vs 1-base, Bob Milius, PhD [pdf]
- Cheat Sheet For One-Based Vs Zero-Based Coordinate Systems [Biostars Forum]
- Database/browser start coordinates differ by 1 base [UCSC Genome Browser, FAQ]
- Genome wiki: Coordinate Transforms [UCSC Genome Browser Wiki: “genomewiki”]
- UCSC Genome Browser: wiggle format help page
- Kent WJ, Zweig AS, Barber G, Hinrichs AS, Karolchik D. BigWig and BigBed: enabling browsing of large distributed data sets. Bioinformatics. 2010 Sep 1;26(17):2204-7. Epub 2010 Jul 17.
If after reading this blog post you have any public questions, please email genome@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genome-www@soe.ucsc.edu.