Supplementary Information for "Initial Sequencing and Analysis of the Human Genome"
Nature, V412, 565

 

International Human Genome Sequencing Consortium.

  1. Additional author credits
  2. Full author list
  3. Methods, including supplementary information for Table 24
  4. Errata, including an additional literature citation
  5. Additional acknowledgements
     1. for unpublished sequence data
     2. for published sequence data

1. Additional author credits

Six additional investigators should have been included as authors of this paper. They are Pieter de Jong, Joseph J. Catanese, and Kazutoyo Osoegawa (Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York 14263 USA; current address: Children's Hospital Oakland Research Institute, 747 52nd Street, Oakland, CA 94609) and Hiroaki Shizuya, Sangdun Choi, and Yu-Juin Chen (Division of Biology, California Institute of Technology, Pasadena, California 91125 USA). These investigators and their laboratories constructed the high-quality BAC libraries that were the major sources of large-insert clones used in the sequencing of the human genome, as indicated in Table 1. These libraries were central to the project and the work had not previously been published. We apologize to our valued colleagues for this omission.

 

2. Full author list

Genome Sequencing Centers. The centers are listed in order of total genomic sequence contributed.

Whitehead Institute for Biomedical Research, Center for Genome Research, Nine Cambridge Center, Cambridge, MA 02142, USA: Emmanuel Adekoya, Mostafa Ait-Zahra, Nicole Allen, Mechele Anderson, Scott Anderson, Faina Anufriev, Jeff Armbruster, Kifle Ayele, Jodi Baker, Jennifer Baldwin, Nicole Barna, Vertilda Bastien, Serafim Batzoglou, Reem Beckerly, Felicienne Beda, John Bernard, Bruce Birren, Brendan Blumensteil, Leonid Boguslavsky, Boris Boukghalter, Adam Brown, Greg Burkett, Jody Camarata, Amy Campopiano, Herman Carneiro, Zhuan Chen, Yama Choephal, Mary Colangelo, Sonya Collins, Alville Collymore, Patrick Cooke, Christopher Davis, Tenzin Dawoe, Kurt DeArellano, Keri Devon, Ken Dewar, J. Sebastian Diaz, Sheila Dodge, Elizabeth Donelan, Kunsang Dorjee, Michael Doyle, Antionise Dube, Alan Dupes, Matt Endrizzi, Abderrahim Farina, Susan Faro, Diallo Ferguson, Pat Ferriera, Heather Fischer, William FitzHugh, Ken Flaherty, Karen Foley, Roel Funke, Diane Gage, James Galagan, Stephanie Gardyna, Diane Gilbert, Samir Ginde, Antonio Gomes, Mary Goyette, Joseph Graham, Leslie Graham, Edward Grandbois, Nerline Grand-Pierre, George Grant, Dave Gregoire, Roth Guerrero, Birhane Hagos, Katrina Harris, David Hart, Beah Hatcher, Andrew Heaford, Lloyd Horton, Catherine Hosage-Norman, John Howland, Bill Hulme, Ilian Iliev, Robin Johnson, Charlein Jones, Marie Joseph, Mathew Judd, Lisa Kann, Aysen Karatas, Damian Kelley, Merrilee Kelly, Dawa Lama, Jenny Lamazares, Eric S. Lander, Thomas Landers, Addie Lane, Keri LaRocque, Heidi LeBlanc, Jean-Pierre Leger, Jessica Lehoczky, Rosie LeVine, Doreen Lewis, Tammy Lewis, Charlien Lieu, Lauren Linton, Grace Liu, Xiaohong Liu, Kim Locke, Yeshi Lokyitsang, Pen Macdonald, Rogelio Martinez, Kebede Maru, Megan McCarthy, Paul McEwan, Tina McGhee, Brian McGing, Aisling McGurk, Kevin McKernan, Jacque McLaughlin, Robert McPheeters, James Meldrim, Louis Meneus, Jill Mesirov, Tanya Mihova, Cher Miranda, Val Mlenga, Michelle Modeski, Geoff Montello, William Morris, Jenn Morrow, Leon Mulrain, Thomas Murphy, Josef Mychaleckyj, Jerome Naylor, Christian Newes, Tsering Ngodup, Cindy Nguyen, Thu Nguyen, Chou Dolma Norbu, Nyima Norbu, Chad Nusbaum, Tara O'Connor, Paula O'Donnell, Yousef Okaf, Dominic O'Neil, Jon O'Shea, Sahal Osman, Matt Paresi, Boris Pavlin, K.M. Peterson, Pema Phunkang, Nadia Pierre, Victor Pollara, Christina Raymond, Melanie Rieback, Beckie Riley, Cecil Rise, Peter Rogov, Joe Roman, Magaly Roman, Mark Rosetti, Deborah Rothman, Alice Roy, Karen Roycroft, Ralph Santos, Steven Schauer, Rebecca Schupbach, Steven Seaman, Andrew Sheridan, Cherylyn Smith, Carrie Sougnez, Thomas Speece, Brian Spencer, Nicole Stange-Thomann, Nikola Stojanovic, Casey Stone, Nathaniel Strauss, Aravind Subramanian, Jessica Talamas, Pierre Tchuinga, Mark Temelko, Pema Tenzin, Senait Tesfaye, Joumathe Theodore, Andrea Tirrell, Imani Torruella-Miller, Tee Trac, Mary Travers, Niki Travis, James Trigilio, Elsa Tsao, Helen Vassiliev, Rose Veil, Andy Vo, Alan Wagner, Jamie Walsh, Tsering Wangdi, Jamey Wierzbowski, Bennet Wilson, Xaioyun Wu, Dudley Wyman, Wen Juan Ye, Shane Yeager, Rahel Retta Yeshitela, Geneva Young, Joanne Zainoun, Andrew Zimmer and Michael C. Zody

The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1RQ, United Kingdom: Zahra Abdellah, Alireza Ahmadi, Shahana Ahmed, Matthew Aimable, Rachael Ainscough, Jeff Almeida, Andrew Ambler, Karen Ambrose, Kerrie Ambrose, Daniel Andrews, Neil Andrews, Hazel Arbery, Beth Archer, Gareth Ash, Kevin Ashcroft, Jennifer Ashurst, Robert Ashwell, Deborah Atkin, Andrea Atkinson, John Attwood, Keith Aubin, Terry Avis, Anne Babbage, Joanne Bacon, Claire Bagguley, Jonathan Bailey, Andrew Baker, Simon Bardill, Darren Barker, Karen Barlow, Laurent Baron, Anika Barrett, Rebecca Bartlett, David Basham, Victoria Basham, Alex Bateman, Karen Bates, Caroline Baynes, Lisa Beard, Susan Beard, David Beare, Alastair Beasley, Oliver Beasley, Stephan Beck, Emma Bell, Damian Bellerby, Tristram Bellerby, Richard Bemrose, James Bennett, David Bentley, Mary Berks, Michael Berks, Graeme Bethel, Christine Bird, Ewan Birney, Helen Bissell, Suzanne Blackburne-Maze, Sarah Blakey, Ralph Bonnett, Richard Border, Nicola Brady, Jason Bray, Sarah Bray-Allen, Anne Bridgeman, Jonathan Brook, Shane Brooking, Andrew Brown, Clive Brown, Jacqui Brown, Margaret Brown, Mary Brown, Richard Bruskiewich, Jackie Bryant, David Buck, Veronica Buckle, Claire Budd, Jill Burberry, Deborah Burford, Joanne Burgess, Wayne Burrill, Christine Burrows, John Burton, Phil Butcher, Adam Butler, Murray Cairns, Bruno Canning, Carol Carder, Paul Carder, Nigel Carter, Tamara Cavanna, Ka Chan, Joanna Chapman, Rachel Charles, Tom Chothia, Connie Chui, Michele Clamp, Anthea Clark, Graham Clark, Kevin Clark, Sarah Clark, Sue Clark, Betty Clarke, Eddie Clarke, Kay Clarke, Chris Clee, Sheila Clegg, Karen Clifford, Julia Coates, Victoria Cobley, Alison Coffey, Penelope Coggill, Lotte Cole, Rachael Collier, Simon Collings, John Collins, Philip Collins, Richard Connor, Jennie Conquer, Donald Conroy, Doug Constance, Leanna Cook, Jonathan Cooper, Rachel Cooper, Robert Cooper, Teresa Copsey, Nicole Corby, Linda Cornell, 
Ruth Cornell, Amanda Cottage, Alan Coulson, Gez Coville, Anthony Cox, Tony Cox, Robert Coxhill, Matthew Craig, Tom Crane, Matt Crawley, Victor Crew, James Cuff, Karl Culley, Auli Cummings, Kirsti Cummings, Paul Cummings, Adam Curran, Valery Curwen, Jeffrey Cutts, Rachael Daniels, Lucy Davidson, Jonathon Davies, Joy Davies, Nicholas Davies, Robert Davies, John Davis, Elisabeth Dawson, Rebecca Deadman, Peter Dean, Simon Dear, Frances Dearden, Marcos Delgado, Panos Deloukas, Janet Dennis, Pawandeep Dhami, Catherine Dibling, Ruth Dobbs, Richard Dobson, Catherine Dockree, Daniel Doddington, Steven Dodsworth, Norman Doggett, Andrew Dunham, Ian Dunham, Anne Dunn, Matthew Dunn, Richard Durbin, Jillian Durham, Ruth Dwyer, Mark Earthrowl, Timothy Eastham, Carol Edwards, Karen Edwards, Andrew Ellington, Matthew Ellwood, Becky Emberson, Helen Errington, Gareth Evans, John Evans, Katie Evans, Richard Evans, Theresa Feltwell, Stephen Fennell, Robert Finn, Tina Flack, Kerry Fleming, Jonathan Flint, Mark Flint, Yvonne Floyd, Simon Footman, John Fowler, Deborah Frame, Matthew Francis, Stephen Francis, John Frankland, Audrey Fraser, David Fraser, Lisa French, Daniel Frost, Jackie Frost, Lorna Frost, Carole Frost, Liam Fuller, Kathryn Fullerton, Alison Gardner, Patrick Garner, Jane Garnett, Leigh Gatland, Lindsay Gatland, Jilur Ghori, Ben Gibbs, Diane Gibson, James Gilbert, Lisa Gilby, Christopher Gillson, Matthew Gorton, Darren Grafham, Michael Grant, Susan Grant, Iain Gray, Lisa Green, James Greenhalgh, Joe Greenhill, Philippa Gregg, Simon Gregory, Coline Griffiths, Ed Griffiths, Mark Griffiths, Ian Guthrie, Rhian Gwilliam, Rebekah Hall, Karen Halls, Gretta Hall-Tamlyn, John Hamlett, Sian Hammond, Julie Hancock, Adam Harding, Joanne Harley, David Harper, Georgina Harper, Grant Harradence, Charlene-Lou Harrison, Ruth Harrison, Daniel Hassan, Natalie Hawkins, Kellie Hawley, Kerry Hayes, Paul Heath, Rosemary Heathcott, Cathy Hembry, Tim Herd, Stephen Hewitt, Douglas Higgs, Guy
Hillyard, Russell Hinkins, Sara-jane Ho, David Hodgson, Michael Hoffs, Jane Holden, Janet Holdgate, Ele Holloway, Ian Holmes, Sarah Holmes, Simon Holroyd, Alison Hooper, Lucy Hopewell, Ben Hopkins, Gary Hornett, Geoff Hornsby, Tony Hornsby, Sharon Horsley, Roger Horton, Philip Howard, Philip Howden, Gareth Howell, Timothy Hubbard, Elizabeth Huckle, Jaime Hughes, Jennifer Hughes, Louisa Hull, Holger Hummerich, Sean Humphray, Matthew Humphries, Adrienne Hunt, Paul Hunt, Sarah Hunt, David Hyde, Michael Ince, Judith Isherwood, Janet Izatt, Monica Izmajlowicz, Niclas Jareborg, Bijay Jassal, Grant Jeffery, Kim Jeffery, Colin Jeffrey, Kerstin Jekosch, Lee Jenkins, Tina Johansen, Cheryl Johnson, Christopher Johnson, David Johnson, Keith Jolley, Abigail Jones, Claire Jones, Juliet Jones, Matthew Jones, Michael Jones, Steven Jones, Shirin Joseph, Ann Joy, Linsey Joy, Victoria Joy, Gillian Joyce, Mark Jubb, Kanchi Karunaratne, Michael Kay, Danielle Kaye, Lyndal Kearney, Simon Kelley, Joanna Kershaw, Ross Kettleborough, Cathy Kidd, Peter Kierstan, Andrew Kimberley, Andrew King, Simon Kingsley, Gillian Klingle, Andrew Knights, Anders Krogh, Philip Laidlaw, Michael Laing, Gavin Laird, Christine Lambart, Ralph Lamble, Cordelia Langford, Timun Lau, Stephanie Lawlor, Sampsa Leather, Minna Lehvaslaiho, Steven Leonard, Daniel Leongamornlert, Margaret Leversha, Julia Lightning, Sarah Lindsay, Matthew Line, Sally Linsdell, Peter Little, Christine Lloyd, David Lloyd, Victoria Lock, William Lock, Anne Lodziak, Ian Longden, Howard Loraine, Rachel Lord, Jamie Lovell, Georgina Lye, Neil Marriott, Anna Marrone, Paul Marsden, Victoria Marsh, Matthew Martin, Sancha Martin, Gareth Maslen, Debbie Mason, Lucy Matthews, Paul Matthews, Madalynne Maynard, Owen McCann, Joseph McClay, Craig McCollum, Louise McConnachie, Bill McDonald, Louise McDonald, Jennifer McDowall, Carole McKeown, Stuart McLaren, Kirsten McLay, James McLean, John McMurdo, Amanda McMurray, Des McMurray, Natalie McWilliams, Nalini 
Mehta, Noel Menuge, Simon Mercer, Asab Miah, Gos Micklem, Simon Miles, Sarah Milne, Dippica Mistry, Shailesh Mistry, Jake Mitchell, Jeff Mitchell, Maryam Mohammadi, Christophe Molina, Paul Mooney, Madeline Moore, Andrea Moreland, Beverley Mortimore, Richard Mott, Jim Mullikin, Brian Munday, Elaine Munday, Andy Mungall, Clare Murnane, Kerry Murrell, Alison Myers, David Negus, David Niblett, Jonathan Nicholson, Tim Nickerson, Sukhjit Nijjar, Zemin Ning, James Nisbet, Christopher Odell, Daniel O'Donovan, Francess Ogbighele, Tom Oinn, Hayley Oliver, Karen Oliver, Helena Orbell, Anthony Osborn, Joan Osborne, Emma Overton-Larty, Christopher Parkin, Kim Parkin, Ginny Parry-Brown, Dina Patel, Ritesh Patel, Alexandra Pearce, Danita Pearson, Anna Peck, Richard Peck, John Peden, Chantal Percy, Andrew Perito, Isabelle Perrault, Anna Peters, Roger Pettett, Ben Phillimore, Kim Phillips, Samantha Phillips, Darren Platt, Emma Playford, Bob Plumb, Matthew Pocock, Keith Porter, Christopher Potter, Simon Potter, Don Powell, Radhika Prathalingham, Michael Quail, Chris Quince, Matloob Qureshi, Helen Ramsay, Yvonne Ramsey, Sally Ranby, Richard Rance, Vikki Rand, Joanne Ratford, Lewis Ratford, Daniel Read, Donald Redhead, Christine Rees, Mary Reid, Astrid Reinhardt, Alex Rice, Catherine Rice, Peter Rice, Suzzanne Richard, Susan Richardson, Kerry Ridler, Lyn Riethoven, Melanie Robinson, Rebecca Rochford, Jane Rogers, Lisa Rogers, Hugh Ross, Mark Ross, Angela Rule, James Rule, Ben Russell, Jayne Rutter, Kamal Safdar, Natalie Salter, Javier Santoyo-Lopez, David Saunders, Carol Scott, Deborah Scott, Ian Scott, Fiona Seager, Margaret Searle, Paul Searle, Harminder Sehra, Jason Shardelow, Greg Sharp, Teresa Shaw, Charles Shaw-Smith, Jennifer Shearing, Karen Sheppard, Richard Sheppard, Elizabeth Sheridan, Ratna Shownkeen, Richard Silk, Matthew Sims, Sarah Sims, Shanthi Sivadasan, Carl Skuce, Luc Smink, Andrew Smith, Laura Smith, Lorraine Smith, Michelle Smith, Russell Smith, Stephanie Smith, 
Hannah Sneath, Cari Soderlund, Victor Solovyev, Erik Sonnhammer, Elizabeth Sotheran, Lee Spraggon, Janet Squares, Suzanna Squares, Michael Stables, James Stalker, Steve Stamford, Melanie Stammers, Helen Steingruber, Yvonne Stephens, Charles Steward, Aengus Stewart, Michael Stewart, Mo Stock, Lisa Stoppard, Philip Storey, Carol Strachan, Greg Strachan, Claire Stribling, John Sturdy, John Sulston, Chris Swainson, Mark Swann, Neil Sycamore, Matthew Tagney, Steven Tan, Elizabeth Tarling, Amy Taylor, Gillian Taylor, Kate Taylor, Ruth Taylor, Sam Taylor, Susan Taylor, Louise Tee, Julieanne Tester, Andrew Theaker, Craig Thomas, Daniel Thomas, Karen Thomas, Ruth Thomas, Roselin Thommai, Andrea Thorpe, Karen Thorpe, Glen Threadgold, Emma Tinsley, Alan Tracey, Jonathan Travers, Anthony Tromans, Ben Tubby, Cristina Tufarelli, Kathryn Turney, Darren Upson, Mark Vaudin, Ramya Viknaraja, Wendy Vine, Paul Voak, Sarah Walker, Melanie Wall, Justine Wallis, Michelle Wallis, Graham Warren, Georgina Warry, Andy Watson, Anthony Webb, Jeannette Webb, Alan Wells, Sarah Wells, Robert Welton, Paul West, Tony West, Angela Wheatley, Carl Wheatley, Gideon Wheeler, Hayley Whitaker, Adam White, Amelia White, Brian White, Johnathon White, Simon White, Matthew Whiteley, Adam Whittaker, Pamela Whittaker, Sara Widaa, Anna Wild, Jane Wilkinson, Paul Wilkinson, David Willey, Andy Williams, Bill Williams, Leanne Williams, Sophie Williams, Helen Williamson, Tamsin Wilmer, Laurens Wilming, Brian Wilson, Gareth Wilson, Margaret Wilson, Nyree Wilson, Siobhan Wilson, Wendy Wilson, Philip Window, Jenny Winster, James Witt, Fred Wobus, Emma Wood, Joe Wood, Sharon Woodeson, Rebecca Woodhouse, Richard Wooster, Matthew Wray, Paul Wray, Charmain Wright, Kathrine Wright, Julia Wyatt, Jane Xie, Louise Young, Sheila Young, Ruth Younger and Shenru Zhao

Washington University Genome Sequencing Center, Box 8501, 4444 Forest Park Avenue, St. Louis, MO 63108, USA: Sabiha Abbas, Amanda Abbott, Jane Abu-Threideh, Ranjeet Ahluwalia, Ella Alexander, Muhammad Alhawagri, Johar Ali, Jason Allen, Mark Ames, Stephanie Andrews, Susanna Angell, Paul Antonacci, Lucinda Antonacci-Fulton, Bessie Antoniou, Jon Armstrong, Clint Arnett, Vanessa Atkins, Kevin Austin, Cindi Bailey, Damon Baisden, Brad Barbazuk, Myrtle Barrett, Lilla Bartko, Chris Bauer, Henry Bauer, Dana Baum, Catherine Beck, Michael C Becker, Joseph Bedell, Kirk Behymer, Sean Behymer, Edward Belter, Gary Bemis, Dan Bentley, Amy Berghoff, Kelly Bernard, Zachary Bevins, Lauren Bielicki, Thomas Biewald, Linda Blackwood, Russell Blaine, Donald Blair, Mary Blanchard, Mary Blandford, Darin Blasiar, Jennifer Bolandis, Stephen Bolla, Traci Bollinger, Jeffrey Bong, Judith Boren-Prydydasz, Sherell Bourne, Kyle Bova, Elizabeth Boyer, Kourtney Bradford, Stephanie Brennan, Michelle Broy, Delali Buatsi, Christina Budnicki, Meghan Burkett, Jennifer Burkhart, Carrie Buss, Jessica Butler, Drucilla Caldwell, Rose Caldwell, Marco Cardenas, Kelly Carpenter, Jason Carter, Tim Carter, Todd Carter, Darren Casimere, Angela Chapman, Brandi Chiapelli, Asif T. Chinwalla, Stephanie L. Chissoe, William Christy, Matthew Cissell, Brenda Clark, Mari Jo Clark, Kathleen Clarke, Sandra W. Clifton, Jim Cloud, Brian Coblitz, Molly Cofman, Megan Connell, Joshua Conyers, Lisa L. Cook, Mark Cook, Matthew Cooper, Veronica Coppedge, Matthew Cordes, Holland Cordum, Marc Cotton, Laura Courtney, William Courtney, Krista Creason, JyeMon Crockett, Kevin Crouse, Taquillia Crum, Michael Dante, Ruth Davenport, Michelle David, Sharon Davidson, Teresa Davidson, Shanoa Davis, Andrew Delehaunty, Kim D. 
Delehaunty, Sandy Dempsey, Anu Desai, Jasna Despot, Monica Dickes, Kelly Dickinson, Nicole Dietrich, George Dignan, Richard Dixon, Amy Doebber, Nicholas Doerr, Mark Donoho, Margaret Dotson, Jennifer Doucette, Kristy Drone, Feiyu Du, Hui Du, Zijin Du, Chad Dubbelde, Grant Duckels, Sean Eddy, Scott Edinger, Jennifer Edwards, Tonya Ehlmann, James Eldred, Amy Elkin, Glendoria Elliott, Efrem Exum, Amanda Falk, Kimberly Farrow, Anthony Favello, Jacquelyn Fedele, Ginger Fewell, David Ficenec, Tanya Fiedler, Lisa Flagg, Alison Fleming, Nat Florence, Jason Fries, William Fronick, Johanna Fryman, Dan Fuhrmann, Lucinda A. Fulton, Robert S. Fulton, Diane Gaige, Tony Gaige, Joseph Garrett, Stacie Gattung, Cynthia Geisel, Steve Geisel, Alicia Gibson, Edward Gibson, Candi Giddings, Barbara Gillam, Yekaterina Gincherman, Warren R. Gish, Evening Glaser, Danielle Glossip, Jennifer Godfrey, Deepa Goela, Norma Goins, Judith Gotway, Ernest Goyea-Gbadebo, Laura Granderson, Tina Graves, Serena Gregory, Satbir Grewal, Justin Griffin, Heather Grover, Gary Gualberto, Christopher Gund, William Haakenson, Krista Haglund, Priscilla Hale, Shane Hale, Terri Hall, Zeyad Hamdan, Chalet Hannah, Richard Harkins, Gwen Harmon, Mark Harper, Anthony Harris, Michelle Harrison, Rob Hart, Kevin Haub, James Hawkins, Clay Hawryszko, Chuck Heidbrink, Kandis Hendrix, John Henkhaus, Karensa Henley, Carleena Henry, Nathaniel Hershberger, Joshua Heyen, Matthew Hickenbotham, Patrick Hill, Travis Hillen, LaDeana W. Hillier, Kurt Hinds, Jennifer Hodges, Erik Hoefgen, Leonard Holbrook, Holly Hollingsworth, Paul Holloway, Michael Holman, Andrea Holmes, Melisa Hotic, Shunfang Hou, Sean Houshmandi, Cristi Howell, Denise Hoyt, Carla Hubbard, Latonya Isaiah, Amber Isak, Ann Jacobs, Sara Jaeger, Cami Jeliti, Emily Jentes, Arthur Johnson, Douglas L. 
Johnson, Brenda Jones, Kimberly Jones, Rodney Jones, Corinne Joshu, Kelie Kang, Paula Kassos, Kimberly Keen, Jennifer Kellen, Sara Kennedy, Norma Keppler, Melissa Ketterman, Kyung Kim, Susan Kitchell, Darla Klebe, Bill Klinke, John Kloss, Laurie Knight, Michael Koch, Jeremy Kock, Sara Kohlberg, Ian Korf, Davorka Kovcic, Jeffry Kraemer, Jason B. Kramer, Pawel Krasucki, Piotr Krasucki, Rebecca Krauss, Colin Kremitzki, Scott Kruchowski, Tamara Kucaba, Michelle Lacy, Thomas Lakanen, Elizabeth Lamar, Kelly Lane, Yvonne Langston, John P. Latreille, Daniel Layman, Thomas Le, Thuy-Tien Pham Le, Tri-Tin Le, John J Ledwith, Nahmjee Lee, Lynn Lehnert, Sarah Lennox, Shawn Leonard, Kimberly Lesley, Leana Levin, Andrew Levy, Shannon Lewis, Lili Li, Todd Littlejohn, Nichole Long, Paul Lowery, Sandra Luxen, Terrie Lynch, Jason Maas, Jill MacDonald, Len Maggi, Maggie Maher, Pamela Marchetto, Elaine R. Mardis, Christopher Markovic, Catherine Marquis-Homeyer, Marco A. Marra, Gabor Marth, John C. Martin, Joseph Martin, Scott Martinka, Rachel Maupin, Kristi Maxeiner, Ryan McAdow, Maria Mcarther, Cynthia McCabe, Quentin McCray, Bradley McDill, Ken McDonald, Ramonna McDonald, Treasa McDonald, Dana McDonough, Rebecca McGrane, Shirley McKinney, Michael McLellan, Rebecca McMahon, John D. McPherson, Yvonne McQuerrey, Kelly Mead, Brian Meininger, Brian Merry, Rick Meyer, Chandra Meyers, Kevin Miller, Nancy Miller, Walt Miller, Tracy L. Miner, Brian Minges, Patrick J. 
Minx, Sheela Mishra, Deborah Moeller, Lisa Mohd Nor, Kenneth Moire, Bradley Moore, Todd Moore, Richard Morales, Nancy Mudd, Garrett Mullen, Molly Mullen, Elizabeth Mulvaney, Jennifer Murray, Matthew Myers, Amy Nash, William Nash, Joanne Nelson, Christine Nguyen, Nham Nhan, Candace Nicol, Laura Niemann, Laurie Nothaker, Tonia Nwagbo, Ben Oberkfell, Darren O'Brien, David O'Brien, Temitope Odunfa-Jones, Maja Kisic Okuka, Michael O'Malley, Suzanne Owens, Philip Ozersky, Sarah Page, Dimitrios Panussis, Kimberley Pape, Christina Parker, Adele Pauley, Edward Paulson, Julie Peak, Charlene Pearman, Dale Peluso, Kymberlie H. Pepin, Denise Peterson, Janine Pettiford, Brent Pfeiffer, Amy Phillips, Guy Pierce, Carol Pikula, Amy Podhrasky, Craig Pohl, Tracy Ponce, Sarah Puro, Christi Ralph, Jennifer Randall, James Randolph, Jerry Reed, Amy Reily, David Reiniesch, Linda Reitz, John Reskusich, Carrie Rhine, Lorrie Rice, Mark Richards, Jamie Richey, Joanne Rieff, Julie Riley, Ellen Ritchey, Judy Robertson, Kerry Robinson, Susan Rock, Tracy Rohlfing, Christine Rose, Ellen Ryan, Jennifer Ryan, Joseph Ryan, Sarah Ryno, Laura Sammons, Brent Sandberg, Thomas Sandbothe, Nathan Sander, Lisa Sapetti, Samuel Sasso, Mark Schaller, Carrie Schaus, Debra Scheer, Paul Scheet, Emilie Scherger, Luke Schneider, Brian Schultz, Kelsi Scott, Sacha Scott, Doug Scronce, Ryan Seim, Mandeep Sekhon, Shawn Shafer, Neha Shah, Sharhonda Shahid, Karina Shapiro, Proteon Shelby, Kimberly Shih, Michael Slaughter, Joanne Small, Aimee Smith, Angela Smith, Elyse Smith, Jana Smith, Nikki Smith, Reene Smith, Beth Smoker, Jacquelyn Snider, Lisa Spalding, John Spieth, Paula Steele, Laurita Stellyes, Nathan Stitziel, Tamberlyn Stoneking, Cynthia Strong, Joe Strong, Catrina Strowmatt, Eric Stuebe, Jessica Stumpf, Regina Suk, Hui Sun, Carrie Sutterer, Gary Swift, Sameer Talcherkar, Patra Thipkhosithkun, Johannah Thompson, Aye Mon Tin-Wollam, Chad Tomlinson, Mark Tonn, Lee Trani, Evanne Trevaskis, Susan Tucci, Bradley 
Twyman, Karen Underwood, Melanie Ureta, Phillip Valencia, Andrew Van Brunt, Christa Veath, Joelle Veizer, Caryn Wagner-McPherson, Jason Waligorski, Christopher Walker, Rebecca Walker, Timothy Wall, John Wallis, Pamela Wamsley, Robert H. Waterston, Phenicia Wedgeworth, Andrew Weihe, Michael C. Wendl, Nancy Wheeler, Shirley White, Nichole Whitworth, Donald Williams, Amy Williamson, Richard K. Wilson, Kellie Winchester, Mark Winkelmann, Jeffrey Woessner, Patricia Wohldmann, Jacob Wolff, Cliff Wollam, Kimberly Woods, J. Patrick Woolley, Ronald Worthington, Xiaoyun Wu, Kristine Wylie, Todd Wylie, Mark Yandell, Shiaw-Pyng Yang, Raymond Yeh, Martin Yoakum, Senait Zerazion, Xiao Zheng, Hui Jun Zhu and Michael Zidanic

US DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA: Anne Abrajano, Andrea Aerts, Dana Alcivare, Michael Altherr, Gina Amico-Keller, Janice Andora, C.H. Andredesz, Tim Andriese, Lennie Arcaina, Teresita Arcaina, Ruby Archuleta, Andre Arellano, Nancy Armstrong, Linda Ashworth, Christina Attix, Anita Avery, Aaron Avila, Julie Avila, Hummy Badri, Michele Bakis, Joe Balch, Michael Banda, Keith Beall, Don Beaton, John Bercovitz, Ann Bergmann, Tony Beugelsdijk, Tory Bobo, John Boehm, Marnel Bondoc, M.P. Bonner, Eric Bowen, Wade Brannon, Elbert Branscomb, Amy Brower, Nancy Brown, Rita Brown, David Bruce, Robert Bruce, Eric Brunkhorst, Jennifer Bryant, Judy Buckingham, Karolyn Burkhart-Schultz, B. Bursell, Mira Bussod, Connie Campbell, Evelyn Campbell, Mary Campbell, Chenier Caoile, Heliodoro Cardenas, Mario Cepeda, Patrick Chain, Sandra Chaparro, Leslie Chasteen, Xian Chen, Jan-Fang Cheng, S.G. Chin, Corey Chinn, Mari Christensen, Alex Chung, Robert Cifelli, Lynne Clark, Jackie Cofield, Judith Cohn, Rick Colayco, Alex Copeland, Rebecca Cordray, Earl Cornell, Lisa Corsetti, Terrence Critchlow, Paul Critz, Linda Danganan, Willow Dean, Larry Deaven, Kerry Deere, Paramvir Dehal, Zuoming Deng, John Chris Detter, Sara Detter, J.M. Dias, Victoria Dias, Mark Dickson, Richard DiGennaro, Karen Dilts, M. Dimitrijevic-Bussod, Kami Dixon, Long Do, Norman Doggett, Suzanne Duarte, Christopher Elkin, Anne Marie Erler, Joe Fawcett, James Fey, Marie Fink, Kathe Fischer, Laurice Fischer, J. Patrick Fitch, Dave Flowers, Peg Folta, Dea Fotopulos, Matt Fourcade, Ken Frankel, Marvin Frazier, Jane Fridlyand, Stuart Gammon, Anca Georgescu, Amy Geotina, Isaias Gil, Tijana Glavina, Kristen Golvineaux, Sheryl Goodman, Lynn Goodwin, Laurie Gordon, Kristine Gould, Bruce Gray, Lance Green, Jeff Griffith, Jane Grimwood, Matt Groza, Hannibal Guarin, Kate Gunning, Chi Ha, Catherine Halsey, Sha Hammond, Cliff Han, Trevor Hawkins, Nina Henderson, Wendell Hom, Roya Hosseini, Zhenping Huang, Hillary Hughes-Hull, David Humphries, Matt Hupman, Jacqulene Hurshman, Kent Hutchings, Doug Hyatt, Joe Jaklevic, Karren Jamaca, Teresa Janecki, Jamie Jett, Phil Jewett, Lingxia Jiang, Jian Jin, Myma Jones, Eugine Jung, Kristen Kadner, Hitesh Kapur, Lisa Kegg, SuSu Khine, Joomyeoung Kim, Heather Kimball-Rojeski, William Kimmerly, Cynthia Ko, Art Kobayashi, William Kolbe, Kristina Kommander, Marie Krawczyk, Brent Kronmiller, V. Anne Krysiak, Carol Kuhn, Jane Lamerdin, Miriam Land, Frank Larimer, Bernadette Lato, Joon Ho Lee, Michael Lee, Karl Lehmann, Tina Leyba, Kenneth Lindo, Karla Lindquist, Albert Linkowski, Kathy Litton, S.Y. Liu, Crystal Llewellyn-Silva, Rebecca Lobb, Jessica Logan, John Longmire, Jose Luis Lopez, Yunian Lou, Stephen Lowry, X. Lu, Susan Lucas, Migdad Machrus, Madison Macht, Ramki Madabhushi, Ryan Mahnke, Mary Maltbie, Marissa Mariano, Lisa Marie Marieiro, Christopher Martin, Joel Martin, Michele Martinez, Paula McCready, Phil McGurn, Kim McMurry, Catherine Medina, Kristen Meier, Linda Meincke, Jon Menke, Julianne Meyne, Trini Miguel, Christie Miller, Tammy Milligan, Sheri Miner, Virginia Montgomery, Daniel Moy, Mark Mundt, Chris Munk, Richard Mural, Rick Myers, Mohandas Narla, David Nelson, Jennifer Neunkirch, April Newman, Hoa Nguyen, Lisa Nguyen, Quan Nguyen, Matt Nolan, Pier Oddone, Jason Olivas, Anne Olsen, David Ow, Morey Parang, Beverly Parson-Quintana, Bipin Patel, Shripa Patel, Yi Peng, Ze Peng, Karl Petermann, Bill Petitt, Joyce Pfeiffer, Hoan Phan, Sam Pitluck, Lee Pittson, Ingrid Plajzer-Frick, Martin Pollard, Patricia Poundstone, Eunice Prakash, Paul Predki, Jennifer Primus, Lyle Probst, Emily Prusso, Glenda Quan, Lucia Ramirez, Michele Ramirez, David Randolph, Irmengaard Rapier, Warren Regala, Charles Reiter, X. Ren, Paul Richardson, Darrell Ricke, Donna Robinson, Juan Rodriguez, George Sakaldasis, Christina Sanders, Richard Sarmiento, Elizabeth Saunders, Denise Schmoyer, Jeremy Schmutz, Damian Scott, Duncan Scott, Manesh Shah, Jin Shang, Maria Shin, Jeff Shreve, Julie Simoni, John Sims, Linda Sindelar, Evan Skowronski, Tom Slezak, Joel Smith, Jay Snoddy, Gregory Stanley, Stephanie Stilwagen, Lisa Stubbs, Janet Stultz, Sandhya Subramanian, Rob Sutherland, Kristina Tacey, Tracy Takenaka, Tootie Tatum, Astrid Terry, Judy Tesmer, James Thiel, Paulette Thomas, Linda Thompson, Sue Thompson, Wendy Thompson, Grace Tong, David Torney, Mary Tran, Margie Trankiem, Stephan Trong, Ming Yu Tsai, Heidi Turner, James Turner, Jeanne Turturice, Edward Uberbacher, Chun Un, Quyen Ung, Ryan Van Luchene, Michele Vargas, Steffan Vartanian, N.P. Velasco, Olivia Velasquez, Carolyn Vertuca, V.S. Viswanathan, Jeanette Wagner, Mark Wagner, Wei Wan, Mei Wang, Edward Wehri, Richard Weidenbach, Sarah Wenning, Sara Wentz, Catherine White, Jennifer White, Scott White, Al Williams, David Wilson, Brenda Winleblech-Kelly, J.R. Wollard, Lawreen Woo, John Woolley, Tracy Wright, Melissa Wycoff-Montegro, Joan Yang, Mimi Yeh, Charles Yu, Brian Yumae and D.W. Zimmerman

Baylor College of Medicine Human Genome Sequencing Center, Department of Molecular and Human Genetics, One Baylor Plaza, Houston, TX 77030, USA: Charles Adams, Shauna M. Addison, Babajide Adio-Oduola, Andy Arenson, Michael Bailey, Tarsha Banks, Joseph Barbaria, Jessica Benton, Amy Bishop, George Blair, Kerstin Blankenburg, Benedict Bodota, David Bonnin, John B. Bouck, Sean L. Bowie, Magaly Brieva, Anissa Brooks, Eric Brown, M. Jennifer Brown, Nathaniel P. Bryant, Christian J. Buhay, Cristina Bunac, Carrie Burkett, Kelvin L. Burrell, Javonnie Burrows, Nick Cole Byrd, Christina C. Carlock, Tamika F. Carron, Kelvin Carter, Michael Carter, Joe Chacko, Dean Chavez, Guan Chen, Jennifer Chen, Rui Chen, Imrana Chowdhry, Constantine Christopoulos, Chelette D. Cleveland, Caroline Cox, Marcus D. Coyle, Amanda Crawford, Julia Dalton, Stephanie R. Dathorne, Mary Louise Davila, Clay Davis, Latarsha Davy-Carroll, Kimberly R. Delaney, Oliver Delgado, Denise DeShazo, Yan Ding, Hyen H. Dinh, Nau Domah-Rashid, Eliana Dones, Karen J. Douthwaite, Heather Draper, Shannon Dugan-Rocha, James Durbin, Christopher D. Earnhardt, Carlana C. Edwards, Christian Elhaj, Michael Escotto, Thomas Falls, Christina Fernandez, Nicole Flagg, Priscilla Foster, J. Patrick Frantz, Abdul Gabisi, Rajpaul Ganesh, Dawn K. Garcia, Toni T. Garner, Natalie N. Garza, Richard A. Gibbs, Rachel Gill, Barb Gillam, James H. Gorrell, Lora Leigh Gorrell, Whitney Guevarra, Preethi Gunaratne, Glenn F. Haller, Vincent A. Hanak, Barbara Harris, Charles D. Harris, Keith Harris, Maxine Hart, Paul Havlak, Alicia Hawes, Jometra Hawkins, Judith Hernandez, Anne Hodgson, Marilyn Hogues, Christie Holloway, Farah J. Homsi, Hailey P. Hosak, Jaime Hosak, Sharon Hou, Stephanie R. Howards, James Huber, Jennifer Hume, LaRonda R. Jackson, Brandi Jacobson, Yu Jia, Rudy Johnson, Angela Jolivet, Matthew Jolk, Margaret H. Jones, Sary Joudah, Steve Kaminsky, Steven Kaminsky, Elinor K. Karlsson, James P. 
Kelly, Susan Kelly, LaQuisha King, Natasha Kondejewski, Yanfei Kong, Jallah M. Korvah, Christie Kovar, Jasmina Kratovic, Raju S. Kucherlapati, Natosha Landry, Mannie Laredo, Belita Leal, Katrina Lee, Kwok Lee, Lakeshia C. Lewis, Lora Lewis, Jane Li, Zhangwan Li, Olivier Lichtarge, Charlene A. Lieu, Jing Liu, Wen Liu, La Quinta Logan, Orlando Logan, Hermela Loulseged, Ryan J. Lozado, Jing Lu, Alice Lucier, Raymond Lucier, Ruth Ann Luna, Renita Madu, Ryan Martin, Ashley D. Martindale, Carlos Martinez, Elizabeth Massey, Samantha Mawhiney, Michael McLeod, Gangwu Mei, Michael L. Metzker, George Miner, Teresa Mitchell, Shirley A. Mize, Wei Mo, Khatera Mohabba, Kate T. Montgomery2, Margaret B. Morgan, Sidney Morris, Margaret M. Moser, Donna M. Muzny, Sally Nash, Susan L. Naylor1, Dearl Neal, Angela Nelson, David L. Nelson, Anh Tuyet Nguyen, Natalie Nguyen, Robert Nguyen, Suzanne Nguyen, Elizabeth Nickerson, Angela Njoku, Stanley Nwokenkwo, Maryanne Oguh, Geoffrey O. Okwuonu, Gayatri Oswal, Rodolfo J. Oviedo, Araceli C. Pace, Bridgette Parrish, Seth Paxton, Brett A. Payton, Lesette M. Perez, Adams Pickens, Eltrick L. Primus, Ling Ling Pu, Miyo Quiles, Danell Reiter, Catherine M. Rives, Alberto Rojas, Ricky Rojas, Ibukunola Rojubokan, Matthew Rolfe, Arletha Russaw, Sogi Samuel, Hugo Saravia, Glenford G. Savery, Joseph Say, Steven E. Scherer, Esha Shah, Hongmei Shao, Helen Shen, Nasrin Latif Shooshtari, Margarita Simon, Ida Sisson, Erica J. Sodergren, Anastasia Sparks, Autumn Stamps, Haley T. Stanley, Aimee D. Stevenson, Carolyn Stimpson, Heather S. Stone, Amanda F. Svatek, Paul Tabor, Kavitha S. Tamerisa, Hangli Tang, Jennifer Tansey, Tineace Taylor, Janice M. Thomas, Nicole Thomas, Shereen Thomas, Kamran Usmani, Lydia Vasquez, Kristina Vassallo, Virginia Vera, Deborah K. Villalon, Donna Villasana, Rhonda Vinson, Quyen Vo, Michael Wahbah, Randy E. Wall, Suzhen Wang, Stephanie Ward-Moore, Ramiah Warren, Chloe N. Washington, Surah Watlington, Mary M Watrous, George M. 
Weinstock, Gabrielle A. Williams, Renada S. Williams, Angela Williamson, Steven H. Wooden, Kimberely C. Worley, Julia Wren, Glenda Wrensford, Cai Wu, Philip L. Wu, Lu Xiuhua, Zhenwu Yang, Heather R. Ybarra, Wei Yu, Jian Ling Zhou, Xiaojun Zhou and Sara E. Zorrilla

RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku Yokohama-city, Kanagawa 230-0045, Japan: Tomoyuki Aizu, Rie Arai, Yui Asahi, Fumiwo Ejima, Mitsuru Fujioka, Asao Fujiyama, Kyoko Fukano, Rintaro Fukawa, Qun Gu, Masahira Hattori, Matsumi Hirose, Minami Horishima, Kazuo Ishii, Hinako Ishizaki, Emi Isozaki, Noriko Ito, Takehiko Itoh, Chiharu Kawagoe, Kayo Kobayashi, Yoshikazu Kobayashi, Noriko Kodaka, Mai Kondo, Yuka Matsumura, Yuko Mitani, Hiroko Morita, Ayuko Motoyama, Shunsuke Nagao, Saori Nakagawa, Konomi Nakamura, Chikako Nakano, Aki Nishida, Yuko Odama, Nobuhiro Omori, Yoko Ono, Kenshiro Oshima, Yumie Oyama, Ritsuko Ozawa, Hong-seog Park, Ryoko Sakai, Yoshiyuki Sakaki, Hiroko Seki, Hidetsugu Shimizu, Jiuqin Sun, Takashi Tahara, Toshihisa Takagi, Sumiyo Takiguchi, Maho Tanaka, Ryoko Tanaka, Todd Taylor, Yoriko Terada, Miwako Tochigi, Naoko Tomioka, Yasushi Totoki, Atsushi Toyoda, Yumi Tsukamoto, Shiho Tsukuni, Rina Tsuzuki, Nozomi Uyama, Hiromi Wada, Hidemi Watanabe, Tetsushi Yada, Kaoru Yakushiji, Noriko Yamamoto, Yasue Yamashita, Shuji Yokoyama, Miho Yonezawa and Satoru Yoshida

Genoscope and CNRS UMR-8030, 2 Rue Gaston Cremieux, CP 5706, 91057 Evry Cedex, France: Francois Artiguenave, Nathalie Barbe, Marielle Besnard, Didier Boscus, Stephanie Briez, Philippe Brottier, Thomas Bruls, Laurence Cattolico, Nathalie Cha, Corinne Da Silva, Ivan Dubois, Michel Gouyvenoux, Gabor Gyapay, Roland Heilig, Stephanie Leclerc, Michael Levy, Ghislaine Magdelenat, Eric Pelletier, Jean-Louis Petit, Catherine Robert, William Saurin, Benoit Vacherie, Virginie Vico, Jean Weissenbach and Patrick Wincker

GTC Sequencing Center, Genome Therapeutics Corporation, 100 Beaver Street, Waltham, MA 02453-8443, USA: Michele Bakis, Romina Bashirzadeh, John Battles, Michael Bodnaruk, Gary Breton, Jim Brown, Carole Butler, Patrick Cahill, Anne Caron, Patricia Daggett, Thomas Dorman, Lynn Doucette-Stamm, JoAnn Dubois, Natasha Edwards, Johnny Ezedi, Shaun Flynn, Laura Freeman, Rene Gibson, David Gleeson, Gary Gryan, Becky Herman, Joseph Hitti, Tay Ho, Keri Holtham, Khanh Huynh, Christopher Hynds, Michael Johnson, Paul Joseph, Rachel Kadel-Garcia, Veena Kamath, Arnold Kana, Kristian Keane, Katrina Kopcewiez, Andrew Lach, Anna Lee, Hong Mei Lee, Randy Little, Wendy Lumm, Deepika Madan, Rodolfo Magararu, Jen-I Mao, Luba Mitnik-Gankin, Maribel Munoz, Minh Nguyen, William Nielson, Shashi Prabhakar, Jonathan Prescott-Roy, Dayong Qiu, Bruce Reinemann, Sean Robinson, Mike Roche, Dawn Rossetti, Marc Rubenfield, Olga Russakovskaya, Johnathan Segal, Douglas R. Smith, Phillip Snell, Mathew Stroika, G. Andre Turenne, Jennifer Walsh, Ying Wang, Keith Weinstock, Gerald Wheaton, Michael Wierbonies, Laipeng Wong, Qinxue Xu, Huiren Yang, Effie Zafiropoulos and Eileen Zhang

Department of Genome Analysis, Institute of Molecular Biotechnology, Beutenbergstrasse 11, D-07745 Jena, Germany: Cornelia Baumgart, Ines Baumgart, Karin Blechschmidt, Elisabeth Boehm, Christin Brunnckow, Nicole Creutzburg, Monika Dette, Bernd Drescher, Petra Eißmann, Susanne Fabisch, Beate Fischer, Silke Foerste, Petra Galgoczy, Sabine Gallert, Gernot Glöckner, Yvonne Görlich, Claudia Grosser, Jana Hamann, Ivonne Heintze, Niels Jahn, Erika Kantowski, Heike Klabunde, Sindy Kluge, Dorothee Lagemann, Sabine Landmann, Rüdiger Lehmann, Denise Lenk, Hella Ludewig, Elke Meier, Uwe Menzel, Evelyn Michaelis, Kati Möckel, Katja Mortag, Oliver Müller, Gabriele Nordsiek, Gerald Nyakatura, Birgit Pawelka, Uta Petz, Uwe Pick, Matthias Platzer, Carola Pohlmann, Andreas Polley, Bettina Raguschke, Norman Rahnis, Kathrin Reichwald, André Rosenthal, Silke Rosenthal, Sandra Rothe, Andreas Rump, Ruben Schattevoy, Annika Schauer, Markus Schilhabel, Mike Schilling, Liane Schlenkert, Marie-Luise Schmid, Jana Schoemburg, Andreas Schudy, Regina Schulz, Stefan Taudien, Bärbel Tautkus, Margit Teuchtler, Beate Voigt, Jacqueline Weber, Gaiping Wen , Claudia Wenderoth, Daniela Werler, Thomas Wiehe , Nadine Zeise, Renate Zenker and Wolfgang D. Zimmermann

Beijing Genomics Institute/Human Genome Center, Institute of Genetics, Chinese Academy of Sciences, Beijing 100101, China: Jingyue Bao, Qiyu Bao, Weidong Bao, Shihua Bi, Xuemen Bian, Lars Bolund,, Tianjing Cai, Ting Cao, Yuzhu Cao, Baoxian Chen, Chong Chen, Jianlong Chen, Jie Chen, Junbao Chen, Tong Chen4,, Yiyu Chen, Zhu Chen7, Zhihua Cheng7, Hongjuan Cui, Jinhui Cui, Peng Cui, Li Dai, Hao Ding, Hui Dong7, Wei Dong, Xiaojia Dong6, Yutao Du, Hongyuan Fan, Jianqiu Fang, Haiyan Feng, Jie Feng, Xiaoli Feng, Gang Fu7, Jimei Gao, Quan Gao6, Yang Gao, Jianing Geng, Guanghui Gong, Jinying Gong6, Jun Gu, Wenyi Gu7, Xiaocheng Gu, Qiaoning Guan, Qi Gui, Daorong Guo, Fengying He6, Jiaying He, Lin He, Jie Hu, Songnian Hu, Fang Huang, Guyang Huang4,7, Jia Jia7, Nan Jia, Lu Jiang, Yetao Jin, Yongsan Jin, Ning Kang, Ning Kang6, Mary-Clare King4,, Yi Kong, Meng Lei, Changfeng Li, Chenji Li, Eryao Li, Gang Li, Jiayang Li, Jihong Li, Jingxiang Li, Li Li, Lili Li, Ming Li7, Nan Li, Ran Li, Shengbin Li, Shuangding Li, Shuangli Li, Songgang Li, Tao Li, Wei Li, Wenjie Li, Yan Li, Yanni Li, Zhijie Li, Jinsong Liao, Wei Lin, Wei Ling7, Boyong Liu, Haili Liu, Kai Liu, Ning Liu4,8, Siqi Liu, Wei Liu, Xinshe Liu, Yanhua Liu, Ying Liu, Yu Liu, Zhanwei Liu, Tao Lu6, Yongxiang Lu, Gang Lv9, Cheng Ma, Jiao Ma, Qingmei Ma6, Shanshan Meng, Feng Mu, Yuxin Niu, Jiaofeng Pan, Qiuhui Qi, Xiaohua Qi, Xufang Qian7, Zengmin Qian, Boqin Qiang6, Zhenyong Qiao7, Shuangxi Ren7, Li Rong6, Yufen Shao, Fengye Shen7, Yan Shen6, Hongfang Shi, Michael Smith4,, Liping Song, Shuping Song, Jiajia Sun, Min Sun, Tao Sun, Yongqiao Sun, Yu Sun, Yue Sun, Wei Tan, Xinyu Tan6, Xiangjun Tang, Ran Tao, Yan Tian, Yuqing Tian5, Jingli Tong, Yuefeng Tu7, Ma Wan7, Dong Wang, Feng Wang, Guangxin Wang, Guihai Wang, Hongjuan Wang7, Hongwei Wang6, Huifeng Wang, Jian Wang, Juan Wang, Jun Wang4,9, Li Wang, Lijie Wang, Lijuan Wang, Liqun Wang7, Wenjun Wang, Xiaolei Wang, Xiaoning Wang, Xuegang Wang, Yan Wang, Ying Wang, Yuanyuan Wang, 
Chungen Wu7, Dongying Wu, Qingfa Wu, Xiaojing Wu, Yingying Xi6, Fei Xie, Ruqin Xu, Shuhua Xu7, Wei Xu, Yuning Xu6, Zhenyu Xuan12, Rui Xue, Yali Xue, Chunxia Yan, Fei Yan8, Guangmei Yan4,, Huanming Yang4,8, Shudong Yang, Xiaonan Yang, Zhijian Yao6, Haifeng Yin7, Bing Yu, Jun Yu, Kaiwen Yuan, Yixin Zeng, Dong Zhai, Bo Zhang, Fengmei Zhang, Guangyu Zhang, Guohua Zhang, Haiqing Zhang, Hongbo Zhang, Lanzhi Zhang, Li Zhang, Meihua Zhang, Meng Zhang, Ming Zhang7, Ruhua Zhang, Wei Zhang7, Xianglin Zhang7, Xiaoliang Zhang, Xiuqing Zhang, Yan Zhang5, Yilin Zhang, Ying Zhang, Yuansen Zhang, Yuzhi Zhang, Hongmei Zhao, Lijian Zhao, Zhijing Zhao, Zhicheng Zhen6, Ming Zhong7, Haixia Zhou, Nannan Zhou, Xinfeng Zhou6, Yan Zhou7, Yi Zhou6, Bingying Zhu7, Bofeng Zhu, Genfeng Zhu7, Ning Zhu6, Yongge Zhu and Zhen Zhu

Multimegabase Sequencing Center; The Institute for Systems Biology, 4225 Roosevelt Way, NE Suite 200, Seattle, WA 98105, USA: Nissa Abbasi, Mary Ellen Ahearn, Lida Baradarani, Dale Baskin, Brian Birditt, Scott Bloom, Cecilie Boysen, Roger Bumgarner, Rachel Dickhoff, Monica Dors, Peter Fleetwood, Cynthia Friedman, Grace Harrison, Leroy Hood, Rose James, Amardeep Kaur, Stephen Lasky, Inyoul Lee, Carol Loretz, Anup Madan, Anuradha Madan, Gregory G. Mahairas, Ryan Nesbitt, Shizhen Qin, Amber Ratcliffe, Lee Rowen, Jason Seto, Tristan Shaffer, Arian Smit, Todd Smith, Steven Swartzell, Barbara J. Trask and Kai Wang

Stanford Genome Technology Center, 855 California Avenue, Stanford, CA 94304, USA: Pia Abola, Scott Argus, V. Babb, Dan Bruno, E. Chung, Lane Conn, Martin Costa, Ronald W. Davis, Joel Elledge, J. Fan, David Faulkner, Nancy A. Federspiel, Pam Foreman, Slava Glukhov, Nancy Hansen, Zelig Herman, Richard Hyman, Sue Kalman, Omar Kurdi, Jennifer Mao, Rekha Marathe, Michael J. Proctor, Amanda Morehouse, Peter Oefner, Curtis Palm, David Ramirez, M. Rexan, Mitche Dela Rosa, Mary Smith, D. Vollrath, Julie Wilhelmy, Thomas Willis and Susan Yu

Stanford Human Genome Center and Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5120, USA: Eva Bajorek, Chenier Caoile, Jason Carriere, David R. Cox, Mark Dickson, Kami Dixon, Laurice Fischer, David Flowers, Dea Fotopulos, Carmen Garcia, Darren Gold, Jane Grimwood, Lauren Haydu, Caleb Holtzer, Ingrid Keseler, Kathy Litton, Jessica Logan, Jose Lopez, Cathy Medina, Richard M. Myers, Loan Nguyen, Lucia Ramirez, Alex Rodriquez, Stephanie Rogers, Angelica Salazar, Jeremy Schmutz, Jin Shang, Nancy Stone, Ming Tsai, Olivia Valesquez, Steffan Vartanian, Deborah Vitale, Jeremy Wheeler and Joan Yang

University of Washington Genome Center, 225 Fluke Hall on Mason Road, Seattle, WA 98195, USA: Kerry Bubb, Riza Daza, Cindy Desmarais, Sven Duenwald, Kim Erickson, Thomas Gilbert, Michael Hite, Robert Hubley, Will Huges, Shawn Iodanoto, Don Jewett, Chris Junker, Arnie Kas, Rajinder Kaul, Myphoung Le, Regina Lim, Lloyd Lytle, Charles Magness, Z. Magnesss, Mathew Maza, Erin McClelland, Maynard Olson, Doug Passey, Xuan-Quynh Pham, Karen Avery Phelps, Ruolan Qiu, Stephan Ramsey, Chris Raymond, Bethany Richards, Zohreh Sadhegi, Channakhone Saenphimmachak, Elizabeth Sims, Arian Smit, Mari Stone, Tony Thomas, Gane Ka-Shu Wang, Zaining Wu, Jun Yu and Yang Zhou

Department of Molecular Biology, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo 160-8582, Japan: Norie Aoki, Michi Asahina, Shuichi Asakawa, Kazuhiko Kawasaki, Jun Kudoh, Shinsei Minoshima, Susumu Mitsuyama, Takashi Sasaki, Kazunori Shibuya, Atsushi Shimizu, Nobuyoshi Shimizu, Ai Shintani and Yuko Yoshizaki

University of Texas Southwestern Medical Center at Dallas, 6000 Harry Hines Blvd., Dallas, TX 75235-8591, USA: Pablo Aguayo, Sharla Arenare, Drew Armstrong, Maria Athanasiou, Mujeeb Basit, Daina Black, Jessica Brandon, Jill Buettner, Corey Butler, Corey Butler, Paul Card, Sharmaine Chamblis, Chris Davis, Joel Dunn, Cynthia English, Shannon Ethridge, Glen A. Evans, Nina Federova, Amber Fribish, Harold Garner, Monica Garza, Margaret Gordon, Connie Gorman, O'Dell Grant, Lisa Hahner, Susie Hayes, John Joslin, Steven Lam, Thuan Le, Todd Lester, Ed Lewis, Kok Ngai Loo, Meiyu Loo, Tony Major, Tony Major, James McFarland, Minh Nguyen, Sherri Osborne-Lawrence, Igor Rakoshchik, Jeff Schageman, Roger Schultz, Stephen Stimson, Minh Tran, Flora Varghese, Nikki Wagner, Kendra Waller, Travis Ward, John Wharton, John Whitaker, Jacquelyn Newton Willcot and John Zanoni

University of Oklahoma's Advanced Center for Genome Technology, Dept. of Chemistry and Biochemistry, University of Oklahoma, 620 Parrington Oval, Rm 311, Norman, Oklahoma 73019, USA: Mueed Ahmad, Angelica Bodenteich, Feng Chen, Lingzhi Chu, Judy Crabtree, Stephane Deschamps, Anh Do, Trang Do, Joan Dolance, Angela Dorman, Clarence Ducummon, Andrew Duty, Mounir Elharam, Whitney Elkins, Fang Fang, Ying Fu, Glenda Hall, Karen Hartman, Kevin Hill, Ping Hu, Xiaohong Hu, Axin Hua, Emily Huang, Honggui Jia, Xiuhong Jiang, Steve Kenton, Akbar Khan, Doris Kupfer, Hongshing Lai, Lisa Lane, Hio Ieong Lao, Christopher Lau, Jennifer Lewis, Sharon Lewis, Hang Li, Shaoping Lin, Phoebe Loh, Eda Malaj, Jami Milam, Rose Morales-Diaz, Fares Najar, Thuan Nguyen, Ying Ni, Shelly Oommen, Huaqin Pan, Beth Perry, Stacey Phan, Sulan Qi, Yudong Qian, Linda Ray, Qun Ren, Qun Ren, Bruce A. Roe, Steve Shaull, Danica Sloan, Lin Song, Jaime Stone, Jing Tian, Runying Tian, Yonathan Tilahun, Qiaoyan Wang, Ying-Ping Wang, Zhili Wang, Doug White, Jim White, Diana Willingham, Stephen Wong, Heather Wright, Hong Min Wu, Hui Wu, Limei Yang, Ziyun Yao, Younju Yoon, Min Zhan, Guozhong Zhang, Liping Zhou and Hua Zhu

Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany: Stefanie Arndt, Alfred Beck, Katja Borzym, Donald Buczek, Jamel Chelly, Fiona Francis, Katja Heitmann, Steffen Hennig, Celine Hoff, Erich Junker, Petra Kioschis, Sven Klages, Marion Klein, Anna Kosiura, Michael Kube, Ines Langer, Hans Lehrach, Silvia Lehrack, Ines Marquard, Nathalie McDonell, Alfons Meindl, Katja Moll, Anthony Monaco, Andrea Nemeth, Annemarie Poustka, Uwe Radelof, Juliane Ramser, Richard Reinhardt, Simone Schuelzchen, Peter Seranski, Anke Starke, Christina Steffens, Ralf Sudbrak, Kieran Todd and Marie Laure Yaspo

Cold Spring Harbor Laboratory, Lita Annenberg Hazen Genome Center, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA: Melissa de la Bastide, Neilay Dedhia, Lidia Gnoj, Tina Gottesman, Susan Granat, Kristina Haberman, Aliya Hameed, Amy Hasegawa, Jane Hoffman, Emily Huang, Kendall Jenson, Arthur Johnson, Nancy Kaplan, Mohammad Lodhi, Anthony Matero, W. Richard McCombie, Andrew O'Shaughnessy, Laurence Parnell, Ray Preston, Milka Rodriguez, Kristin Schutz, Lei Hoon See, Ravi Shah, Monica Shekher, Nadim Shohdy, Lori Spiegel, I'kori Swaby, Sally Till and Danielle Vil

GBF - German Research Centre for Biotechnology, Mascheroder Weg 1, D-38124 Braunschweig, Germany: Helmut Blöcker, Petra Brandt, Ansgar Conrad, Simone Dose, Maja Grimm, Klaus Hornischer, Doris Järke, Gerhard Kauer, Tschong-Hun Löhnert, Gabriele Nordsiek, Joachim Reichelt, Maren Scharfe and Oliver Schön

Genome Analysis Group. The group consisted of the individuals listed below (in alphabetical order).

Richa Agarwala, L. Aravind16, Jeffrey A. Bailey, Alex Bateman, Serafim Batzoglou, Bruce Birren19, Ewan Birney, Peer Bork, John B. Bouck, Daniel G. Brown19, Christopher B. Burge, Lorenzo Cerutti20, Hsiu-Chuan Chen16, Asif T. Chinwalla, Deanna Church16, Michele Clamp18, Francis S. Collins, Richard R. Copley22, Tobias Doerks21,22, Richard Durbin18, Sean R. Eddy, Evan E. Eichler17, William FitzHugh19, Adam Felsenfeld27, Terrence S. Furey, James Galagan19, Richard A. Gibbs23, James G.R. Gilbert18, Gustavo Glusman37, Cyrus Harmon, Yoshihide Hayashizaki, David Haussler, Henning Hermjakob20, LaDeanna Hillier26, Karsten Hokamp, Tim Hubbard18, Wonhee Jang16, L. Steven Johnson28, Thomas A. Jones28, Simon Kasif, Arek Kaspryzk20, Scot Kennedy, W. James Kent, Paul Kitts16, Eugene V. Koonin16, Ian Korf26, David Kulp30, Doron Lancet, Eric S. Lander19, Todd M. Lowe, Aoife McLysaght33, Jill Mesirov19, Tarjei Mikkelsen34, John V. Moran, Nicola Mulder20, James C. Mullikn18, Chad Nusbaum19, Victor J. Pollara19, Chris P. Ponting, Irit Rubin37, Greg Schuler16, Jörg Schultz22, Guy Slater20, Arian F.A. Smit, Elia Stupka20, John Sulston18, Joseph Szustakowki34, Danielle Thierry-Mieg16, Jean Thierry-Mieg16, Lukas Wagner16, John Wallis26, Robert Waterston26, Raymond Wheeler30, Alan Williams30, Yuri I. Wolf16, Kenneth H. Wolfe33, Kim C. Worley23, Itai Yanai37, Shiaw-Pyng Yang26, Ru-Fang Yeh24 and Michael C. Zody19

 

Library Construction

Roswell Park Cancer Institute, Department of Cancer Genetics. Buffalo, New York 14263 USA. (Present address: BACPAC Resources, Children's Hospital Oakland Research Institute, 747 52nd Street Oakland, CA 94609): Pieter de Jong, Joseph J. Catanese, Kazutoyo Osoegawa, Panayiotis A. Ioannou, Eirik Frengen, Baohui Zhao, Chenyan Wu, Chung Li Shu, Chira Chen, Barbara Swiatkiewicz, Aaron G. Mammoser, Beth A. Palka, Amy Beck, Alfred Cairo, Melanie Hierl, Michel van Geel, Norma Nowak, Jeffrey Conroy, Yu Wang

California Institute of Technology, Division of Biology. Pasadena, California 91125 USA. Hiroaki Shizuya, Sangdun Choi, and Yu-Juin Chen

 

DNA Sequence Databases

GenBank, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA: Richa Agarwala, L. Aravind, Hsiu-Chuan Chen, Deanna Church, Wonhee Jang, Paul Kitts, Eugene V. Koonin, Greg Schuler, Danielle Thierry-Mieg, Jean Thierry-Mieg, Lukas Wagner and Yuri I. Wolf

EMBL, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom: Ewan Birney, Nicole Reashci and Peter Sterk

DNA Data Bank of Japan, Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima-shi, Shizuoka-ken 411-8540, Japan: Kaoru Fukami-Kobayashi, Takashi Gojobori, Kazuho Ikeo, Tadashi Imanishi, Satoru Miyazaki, Ken Nishikawa, Motonori Ohta, Hideaki Sugawara and Yoshio Tateno

Scientific Management

National Human Genome Research Institute, U.S. National Institutes of Health, 31 Center Drive, Bethesda, MD 20892, USA: Francis Collins, Mark S. Guyer, Jane Peterson, Adam Felsenfeld and Kris A. Wetterstrand

Office of Science, U.S. Department of Energy, 19901 Germantown Road, Germantown, MD 20874, USA: Aristides Patrinos

The Wellcome Trust, 183 Euston Road, London, NW1 2BE, United Kingdom: Michael J. Morgan

 

3. Methods and additional notes

Section: Generating the draft genome sequence (p. 864)

Subsection: Clone selection (p. 865)

Page 866, col. 2, para. 3: "Fingerprint data were reviewed ... bias against rearranged clones)."

Seed clones were picked from the growing contigs as follows: We began by identifying fingerprint clone contigs that had been localized to targeted locations and that did not contain any clones that had previously been selected for sequencing. Contigs were localized using mapping data from a variety of sources that could be attached to the fingerprinted clones, including STS/hybridization data from McPherson and colleagues86, FISH data from several sources (C. McPherson et al., ref. 103), STS/PCR mapping data from several sources92,95,103, electronic PCR data (http://www.ncbi.nlm.nih.gov/STS/) matching the BAC end sequences with mapped STSs, and others. Beginning with the largest available clone in a valid contig (clones >250 kb were excluded to avoid artifacts), the FPC program451 evaluated the fingerprints of all of the clones in the contig to determine the largest clone for which all but two of the individual bands in the restriction fragment pattern were shared (confirmed; having a band of equivalent size, ±3%) with bands in the patterns of flanking clones (again, ignoring flanking clones >250 kb). (Since the restriction enzyme used to produce the clone inserts is different from the enzyme used to produce the fingerprints, two bands may arise from the insert-vector junction that are not found in the genome or in flanking clones.) Selected clones were then checked for excessive overlap with previously selected or sequenced clones and with each other. The allowable overlap at this stage was varied to suit the demands of the project.

Clones (walking clones) extending from seed or other selected clones were selected as follows: In the early phases of the effort, clones were not necessarily correctly ordered within a fingerprint clone contig and indeed not all of the available clones had necessarily been incorporated into the contig. Starting with a previously selected (seed) clone, the FPC program compared the restriction fragment pattern of that clone with the patterns of all of the clones in the fingerprint database that overlapped with the seed clone. It then iteratively analyzed the clones identified in the first round of analysis to identify the additional clones that overlapped with those. In this way, a set of overlapping clones was identified and the clones in the set were ordered based on their overlap statistics. After ordering, all of the valid clones were identified (valid clones were defined as those with all but three of their bands confirmed by clones within 4 clones on either side). Any clone that also had outside evidence of overlap, e.g. through BAC end sequence matches or shared STS/hybridization data, was selected for further evaluation. In cases with more than one clone with such outside evidence, the clone with the lowest overlap statistic (i.e., the one that was least redundant) was selected (in the case of ties, the largest clone was favored). Where there was no outside evidence, a clone was picked based on evaluation of the overlaps. The candidate clone was the first one that was found to have the minimal overlap with the seed clone (initially <20% overlap, rising to 30% in later phases of the mapping effort; the percentage overlap was estimated by dividing the sum of the sizes of the common bands by the size of the smaller of the two clones). To be picked, the clone also had to be bridged to the seed clone by a third, intermediate clone that confidently (<1e-4) overlapped both the seed clone and the candidate clone. The candidate clone was then further evaluated for fingerprint overlap with previously selected or sequenced clones.
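
The percentage-overlap estimate described above can be illustrated with a minimal sketch. It is not the FPC implementation: the greedy band matching, the function names, and the assumption that a clone's size equals the sum of its band sizes are all illustrative choices. Only the 3% band tolerance and the division by the smaller clone come from the text.

```python
def shared_bands(bands_a, bands_b, tolerance=0.03):
    """Bands of clone A that pair with an unused band of clone B within +/- tolerance."""
    remaining = sorted(bands_b)
    shared = []
    for size in sorted(bands_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared.append(size)
                del remaining[i]  # each band of B may confirm only one band of A
                break
    return shared


def percent_overlap(bands_a, bands_b, tolerance=0.03):
    """Sum of shared band sizes divided by the size of the smaller clone, in percent.

    Assumption: clone size is approximated by the sum of its band sizes.
    """
    common = sum(shared_bands(bands_a, bands_b, tolerance))
    smaller = min(sum(bands_a), sum(bands_b))
    return 100.0 * common / smaller


# Hypothetical fingerprints (band sizes in bp), not real data.
seed = [1000, 2000, 3000, 4000]
candidate = [2010, 5000, 6000]
overlap = percent_overlap(seed, candidate)
```

With these made-up fingerprints only the 2,000 bp band is shared, so the estimated overlap is 20% of the smaller clone.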

Once clones were ordered within fingerprint clone contigs, a similar algorithm that exploited the known clone order was used to pick the walking clones. This algorithm was also adapted to pick a spanning/walking clone for complex contigs with 2 or more clones in the sequencing pipeline, using the fingerprint map as a guide.

Subsection: Sequencing (p. 867)

Page 868, left-hand column, line 20: "By examining ... 500 bp."

The sizes of the gaps between adjacent initial sequence contigs in draft clones were measured using alignments of the initial sequence contigs from individual draft clones to contigs of size ≥ 40 kb from overlapping clones, usually finished clones. 10,999 gaps were examined. 1,726 gaps larger than 6,000 bp were discarded as probable artefacts due to misassemblies or incorrect alignments. The mean size of the gaps between the initial sequence contigs in draft clones was 554 bases. When the cutoff for discarding gaps was lowered to 3000 bp or raised to 12,000 bp, the mean gap size decreased to about 400 bp (estimated from 9,801 gaps) and increased to about 800 bp (estimated from 11,972 gaps) accordingly, indicating that there is still considerable uncertainty in the mean value. The 554 bp estimate for the mean gap size was used, along with the number of initial sequence contigs (Table 7) and the total number of bases in the initial sequence contigs (data not shown) to estimate the percentage of the draft clones that were covered by the initial sequence contigs. It was thus determined that, on average, about 96% of the draft clones was covered; assuming a mean gap size between 400 and 800 bp, the range in coverage is about 94-97%.
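
As a rough illustration of the arithmetic, the sketch below computes a mean gap size under a discard cutoff and the resulting coverage fraction. The function names and the example clone (160,000 contig bases, 12 gaps) are hypothetical; the underlying totals were not shown in the paper, so only the 400, 554 and 800 bp gap sizes are taken from the text.

```python
def mean_gap(gap_sizes, cutoff_bp):
    """Mean gap size after discarding gaps larger than the cutoff."""
    kept = [g for g in gap_sizes if g <= cutoff_bp]
    return sum(kept) / len(kept)


def covered_fraction(contig_bases, n_gaps, mean_gap_bp):
    """Fraction of a clone covered by its initial sequence contigs."""
    return contig_bases / (contig_bases + n_gaps * mean_gap_bp)


# Hypothetical draft clone: 160,000 bases in contigs separated by 12 gaps.
coverage_low = covered_fraction(160_000, 12, 800)
coverage_mid = covered_fraction(160_000, 12, 554)
coverage_high = covered_fraction(160_000, 12, 400)
```

For this made-up clone the three gap sizes give coverage of roughly 94%, 96% and 97%, which shows how sensitive the estimate is to the assumed mean gap.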

This comment also pertains to page 874, left-hand column, line 57: "Assuming that the sequence gaps ... gaps within the draft sequenced clones"

Subsection: Assembly of the draft genome (p. 868)

Page 868, right-hand column, line 47: "To eliminate such problems, sequenced clones were associated with the fingerprint clone contigs in the physical map ..."

An FPC match statistic better than 1e-7 for the sequenced clone against the fpc fingerprint database was considered significant, based on empirical evidence. This match level was the weakest value used for placement when there was other confirmatory evidence to support the placement. In the absence of additional supportive data, a match score of better than 1e-9 was required for placement. In general, only the best match was used. Other confirmatory evidence included BAC end matches; the BAC end sequences were obtained from NCBI (dbGSS; http://www.ncbi.nlm.nih.gov/dbGSS/index.html). Only BAC end sequences with 15 or fewer matches to the genomic sequence were used to eliminate repetitive sequences. Additional information used to place clones included BAC paired-end sequence matches, shared STS matches, and "believed" sequence overlap relationships determined by investigators at the NCBI and at UC-Santa Cruz. In instances in which the data led to conflicting placements, the data were weighted based on estimates of reliability. In some cases, if there was conflicting placement data or only weak data for placement and, according to GigAssembler, the sequenced clone failed to overlap any clones in the assembly at their original placement positions, a placement was attempted at secondary sites suggested by the placement data.
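
The two-tier threshold can be summarized in a short sketch. The function names are hypothetical, and "better than" is interpreted here as a numerically smaller match statistic; the weighting of conflicting evidence and the secondary-site placement are not modeled.

```python
def usable_bac_end(n_genome_matches, max_matches=15):
    """BAC end sequences with more than 15 genomic matches are treated as repetitive."""
    return n_genome_matches <= max_matches


def accept_placement(match_statistic, has_confirming_evidence):
    """Apply the 1e-7 threshold with confirmatory evidence, 1e-9 without it."""
    threshold = 1e-7 if has_confirming_evidence else 1e-9
    return match_statistic < threshold
```

Under this reading, a clone with a match statistic of 1e-8 would be placed only if a BAC end match, shared STS, or similar evidence supported the placement.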

Page 869, left-hand column, line 48: "Of these 942 contigs with sequenced clones ..."

In general, merges between fingerprint clone contigs were based primarily on evaluation of the fingerprint data. Information about the STS map location of the fingerprint contigs was used to prevent spurious merges, to break spurious contigs and to suggest possible merges that had not been previously recognized. In addition, 62 contigs were merged on the basis of sequence overlap information, supported by STS map positions.

Subsection: Quality assessment (p. 871)

Sub-subsection: Alignment of the fingerprint clone contigs (p. 873)

Page 873, right-hand column, line 28: "The positions of most of the STSs ... about 1.7% differed from one or more of them."

We localized the STS markers from seven different physical maps (the Genethon101 and Marshfield (http://research.marshfieldclinic.org/genetics/ ) genetic maps, the GeneMap99100, the G3 and Stanford TNG radiation hybrid maps (http://www-shgc.stanford.edu/Mapping/Marker/STSindex.html), and the Whitehead YAC and radiation hybrid map29) on the draft genome sequence using e-PCR, allowing one mismatch per primer and the default distance constraints between primers (50 bp deviation from expected size of product). Only those markers that were uniquely placed on the draft sequence were considered. There were 62,239 such markers. Of these, 1,095, or 1.7%, were mapped by ePCR to a chromosome of the draft sequence that was different from the chromosome indicated by the information from a genetic or radiation hybrid map.

Subsection: representation of random raw sequences (p. 874)

Page 875, left-hand column, line 9: "We compared the raw sequences ... using the BLAST computer program."

We processed whole genome shotgun reads from four independently constructed libraries as follows. All reads with fewer than 300 bases of PHRED quality 20 or greater were removed. The remaining reads were then trimmed for vector and for quality, looking at the 5' end for the first window with at least 15 contiguous non-vector bases of PHRED >20 and at the 3' end, starting from the left cutoff, for 12 contiguous non-vector bases with PHRED <20. Only trimmed reads that had >95% of their trimmed bases with PHRED >20 and a length of >250 bases were kept. The reads after trimming were composed of 40% GC base pairs. Reads were masked for repeats using the RepeatMasker program (A.F.A. Smit & P. Green, http://repeatmasker.genome.washington.edu/cgi-bin/RM2_req.pl) and for low entropy data using the nseg option of BLAST (W. Gish, unpublished; http://blast.wustl.edu). Reads were retained and used only if there were at least 100 consecutive bases of PHRED quality 20 or greater and 100 consecutive unmasked bases.
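
The final retention criterion can be sketched as follows. The function names are hypothetical, and the sketch assumes that the high-quality run and the unmasked run are checked independently of each other; the text does not say whether the two runs must coincide.

```python
def longest_run(flags):
    """Length of the longest run of consecutive true values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def retain_read(qualities, masked, min_quality=20, min_run=100):
    """Keep a read with >= min_run consecutive high-quality bases and
    >= min_run consecutive unmasked bases (checked independently; an assumption)."""
    good_quality = longest_run(q >= min_quality for q in qualities)
    unmasked = longest_run(not m for m in masked)
    return good_quality >= min_run and unmasked >= min_run
```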

Based on a test data set of random reads from finished projects, the following BLAST parameters were found to match 100% of the reads without false matches: -filter seg S=170 S2=150 W=13 gapW=4 gapS2=150 M=5 N=-11 Q=11 R=11. The set of masked trimmed reads was compared to the 7 October 2000 freeze of the HTGS data set, to all of GenBank and to the TSC SNP database using BLASTN 2.0MP (W. Gish, unpublished; http://blast.wustl.edu). The highest scoring match was aligned against the read using CROSSMATCH, demanding alignment of the full trimmed read at ≥97% identity for genomic sequence and with appropriate topological constraints for the SNP reads. Typically 1-2% of the matches were eliminated by this step.

 

Page 875, left-hand column, line 30: "We found that 88% of the bases of these cDNAs could be aligned ..."

We aligned the RefSeq cDNA sequences to the draft genome using the psLayout program104 and gathered statistics on the percentage of cDNA bases that aligned at various percent identity thresholds.

The distal 200 bases of each cDNA were not included in the computation of the percentage of aligning bases because alignments in these regions are less reliable. If any cDNA aligned in more than one way, each cDNA base involved in any alignment was counted only once. At a threshold of 98% identity for the alignments, we found that 87.9% of the cDNA bases aligned somewhere in the draft genome. When the threshold was increased to 99% identity, the percentage of aligning bases fell to 85.83%, and when the threshold was decreased to 97% identity, it rose to 88.5%. Further decreases in the threshold all the way down to 90% identity only increased the percentage of aligning bases one more percentage point, so the value of approximately 88% aligning bases, achieved by requiring 98% identity, represents a knee in the curve.
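
Counting each cDNA base only once when alignments overlap amounts to taking the union of the aligned intervals. The sketch below shows one way to do that; it is not the psLayout code, and the function name, the half-open coordinate convention, and the example coordinates are assumptions. The scored region is passed in by the caller so that the excluded distal bases can be left out.

```python
def aligned_fraction(alignments, region_start, region_end):
    """Fraction of cDNA positions in [region_start, region_end) covered by
    at least one alignment; overlapping alignments are counted once."""
    clipped = sorted(
        (max(start, region_start), min(end, region_end))
        for start, end in alignments
    )
    covered = 0
    cursor = region_start
    for start, end in clipped:
        start = max(start, cursor)  # skip positions already counted
        if end > start:
            covered += end - start
            cursor = end
    return covered / (region_end - region_start)


# Hypothetical 1,200-base cDNA scored over its first 1,000 bases.
fraction = aligned_fraction([(0, 500), (400, 700), (900, 1200)], 0, 1000)
```

Here the three alignments cover positions 0-700 and 900-1000 of the scored region, so the fraction is 0.8 even though the raw alignment lengths sum to more than that.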

Section: Broad genomic landscape (p. 875)

page 876, right-hand column, line 9: "In addition, the human cytogenetic map ..."

The locations of the cytogenetically mapped clones on the draft genome sequence can be viewed at http://genome.ucsc.edu/goldenPath/mapPlots . Further information about the individual clones can be obtained at http://www.ncbi.nlm.nih.gov/genome/cyto/ and http://www.ncbi.nlm.nih.gov/genome/guide. Here, as well as on the browser at http://genome.ucsc.edu and http://www.ensembl.org/ , they can be viewed in the context of other genome annotation.

Subsection: Long-range variation in GC content (p. 876)

Page 877, left-hand column, line 30: "About three-quarters of the genome-wide variance ... consistent with a homogeneous distribution"

All 3,312 windows of length 300 kb that had at least eight gap-free 20 kb subwindows and did not contain more than 50% simple repeats were extracted from the draft genome sequence. The average sample variance of the GC content of the subwindows of a window was 7.3%. The sample variance of all subwindows genome-wide (N = 36,562) was 27.4%. Hence, the variance of GC content within the 20 kb subwindows of a 300 kb window accounts for approximately one quarter of the overall variance of the GC content among all 20 kb subwindows in this sample. The average sample standard deviation of the GC content of the subwindows of a window was 2.4%.
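
The comparison of within-window and genome-wide variance can be sketched with the standard library. The function name and the toy input are hypothetical; the real analysis used windows with at least eight 20 kb subwindows each.

```python
from statistics import mean, variance


def within_window_share(windows):
    """Average within-window sample variance divided by the sample variance
    of all subwindow GC values pooled together."""
    within = mean(variance(w) for w in windows)
    overall = variance([gc for w in windows for gc in w])
    return within / overall


# Toy example: GC percentages of subwindows in two windows (not real data).
toy_share = within_window_share([[40.0, 42.0], [50.0, 52.0]])

# The values quoted in the text give about one quarter.
quoted_share = 7.3 / 27.4
```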

Page 877, left-hand column, line 34: "In fact, the hypothesis ... draft genome sequence."

For each of the 3,312 windows of length 300 kb, we tested the hypothesis that its 20 kb subwindows were sampled from a homogeneous GC distribution. The distribution was defined to have mean m equal to the GC content in the combined subwindows of the 300 kb window, and the bases were taken as independent. Under this distribution, the GC content of a 20 kb subwindow would have mean m and variance σ² = m(100-m)/20000. For m = 41%, the typical value, this gives σ² = 0.121%, which is about 0.017 times the average sample variance of 7.3%. For each window, the variance σ² and the sample variance ŝ² were determined, along with the value χ² = (n-1)ŝ²/σ², where n is the number of subwindows of the window. Under the hypothesis of homogeneity, the statistic χ² should have an approximately chi-square distribution with n-1 degrees of freedom. However, for every one of the 3,312 windows, χ² > 31.5, which rejects the hypothesis of homogeneity with p-value >> 0.995.
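
The expected variance and the test statistic can be reproduced with a few lines. This is a sketch with hypothetical function names; the choice of 15 subwindows in the example is illustrative, and the comparison against the chi-square distribution itself is left out because it would need a statistics library.

```python
def expected_variance(mean_gc, subwindow_bp=20000):
    """Variance of subwindow GC content (in percent) if bases were independent."""
    return mean_gc * (100.0 - mean_gc) / subwindow_bp


def homogeneity_statistic(sample_variance, mean_gc, n_subwindows, subwindow_bp=20000):
    """(n - 1) times the sample variance, divided by the expected variance."""
    return (n_subwindows - 1) * sample_variance / expected_variance(mean_gc, subwindow_bp)


typical_variance = expected_variance(41.0)
typical_statistic = homogeneity_statistic(7.3, 41.0, 15)
```

For a mean GC content of 41% the expected variance is about 0.121, and a window with the average sample variance of 7.3 and 15 subwindows would yield a statistic in the hundreds, far above the 31.5 cutoff.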

Another way to test the hypothesis of homogeneity is to look in each 300 kb window for one 20 kb subwindow whose GC content differs significantly from the mean m for that window. In these tests, all 300 kb windows with less than 50% simple repeats and less than 25% gaps were tested (N = 10,596). Under the assumptions above, if X is the GC content of a subwindow, then D = (X-m)/sqrt[m(100-m)/20000] should have an approximately normal distribution. However, in all but four windows there is a subwindow with |D| > 3.0, i.e. the GC content of the subwindow is more than 3.0 standard deviations from the mean of the window. The p-value for such a deviation is 0.0026. Considering that there are 15 possible subwindows, this gives an overall p-value of 0.039, i.e. the hypothesis of homogeneity is rejected with a p-value greater than 0.96.
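
The deviation statistic and the correction for 15 subwindows can be checked numerically. The sketch below uses hypothetical function names, computes the two-sided normal tail with the complementary error function, and multiplies by the number of subwindows, which appears to be how the overall value in the text was obtained.

```python
import math


def deviation(gc, mean_gc, subwindow_bp=20000):
    """Standardized deviation D of a subwindow's GC content from the window mean."""
    return (gc - mean_gc) / math.sqrt(mean_gc * (100.0 - mean_gc) / subwindow_bp)


def two_sided_p(d):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(abs(d) / math.sqrt(2.0))


def corrected_p(p, n_tests):
    """Upper bound on the chance that any of n_tests exceeds the threshold."""
    return min(1.0, p * n_tests)


p_single = two_sided_p(3.0)
p_overall = corrected_p(0.0026, 15)
```

The computed single-test value is close to 0.0027, in line with the 0.0026 quoted above, and 15 times 0.0026 gives the overall 0.039.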

The above analysis was repeated using 5 kb subwindows of 300 kb windows, and the hypothesis of homogeneity was rejected for all windows with p-value greater than 0.96, and with greater confidence for those windows tested with the chi-square test. Similar results were also obtained for 5 kb subwindows of 100 kb windows: all but thirteen windows were rejected with p-value greater than approximately 0.95, and all but three were rejected from those examined with the chi-square test. Since any region of 200 kb must contain one of the regions of 100 kb we tested for homogeneity, this indicates that there are few if any regions of 200 kb in the genome with homogeneous GC content.

Page 877, right-hand column, line 25: "Estimated band locations �"

Bands were assigned by a dynamic programming algorithm that attempted to maximize the number of cytogenetically mapped clones that lie within the range of possible sub-bands predicted from FISH, with special emphasis on high-resolution FISH-mapped clones provided by investigators at the National Cancer Institute103. The band positions were optimized subject to the constraint that the bands must appear in the known order along the draft genome sequence. Slight penalties for band size deviation from the standard fractional sizes were also imposed, so that in the absence of any FISH-mapped clones at all in a particular region, and given that there are no constraints from surrounding regions, the program would produce sub-bands corresponding to the standard fractional band lengths.

Section: Repeat content of the human genome (p. 879)

Subsection: Distribution of GC content (p. 884)

Concerning the subdivision of the draft genome sequence into 50 kb pieces of similar GC level: the same results are obtained however the sequence is subdivided, as long as the fragments are around 50 kb long. For the analyses shown in Figures 22 to 26 specifically, the draft genome sequence was subdivided into fragments of 40-60 kb (averaging 50 kb) overlapping by 1 kb. These fragments were created on the fly by the RepeatMasker program, and a repeat analysis was done for each. The repeat information files were grouped by the GC level of the fragment, and processed according to need.
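The subdivision and grouping can be illustrated with a short sketch. It is a simplification under stated assumptions: fragments have a fixed length (RepeatMasker chose lengths between 40 and 60 kb), the bin width of 2% GC is arbitrary, and all function names are hypothetical.

```python
from collections import defaultdict

def fragments(seq, size=50_000, overlap=1_000):
    """Yield (start, fragment); consecutive fragments share `overlap` bases."""
    step = size - overlap
    for start in range(0, max(len(seq) - overlap, 1), step):
        yield start, seq[start:start + size]

def gc_percent(fragment):
    """GC percentage over unambiguous bases; None if there are none."""
    s = fragment.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if gc + at else None

def group_by_gc(seq, bin_width=2.0, size=50_000, overlap=1_000):
    """Map the lower edge of each GC bin to the start positions of its fragments."""
    bins = defaultdict(list)
    for start, frag in fragments(seq, size, overlap):
        gc = gc_percent(frag)
        if gc is not None:
            bins[int(gc // bin_width) * bin_width].append(start)
    return dict(bins)
```

For example, `group_by_gc(sequence)` returns a dictionary whose keys are GC levels and whose values list the fragments at that level, which is the grouping needed before per-bin repeat statistics are computed.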

For the analyses shown in Figures 23 and 25, the number of repeat copies was compared. The number of individual insertions per megabase of DNA of a particular GC level was extracted from the RepeatMasker output (RepeatMasker provides information on which fragments originated from the same inserted transposable element). The Y axis is the ratio of the frequency of Alu (Fig. 23) or LINE1 (Fig. 25) to the average frequency of these elements in the genome.

Subsection: Segmental Duplications (p. 889)

Our assessment of low copy repeats (genomic duplications) within the draft genome sequence involved a global analysis of all non-overlapping sequence. The analysis used a combination of DNA sequence analysis software and a suite of Perl scripts developed for paralogy detection (J. A. Bailey and E. E. Eichler, in preparation). The basic methodology included: repeatmasking (RepeatMasker v.4/20) of all reference sequences for common repeats; the removal and splicing of such repeat segments; global BLAST analysis of the segments for the identification of non-overlapping high-scoring segments, using relaxed affine gapping parameters which allowed large gaps up to 1 kb to be traversed (parameters: -G 180 -E 1 -q -80 -r 30 -z 3000000000 -Y 3000000000 -e 1e-10 -F F); and the reintroduction of common repeat elements into each pairwise alignment, followed by optimal global alignment of the segments using the program ALIGN (E.W. Myers and W. Miller, CABIOS (1989) 4:11-17). To detect internal duplications within each query segment, a modified version of blastz (W. Miller, unpublished) was used with similar relaxed gap parameters (B=2 M=30 I=-80 V=-80 O=180 E=1 W=14 Y=1400). Alignment statistics were generated (program: align_scorer), and alignments that equaled or exceeded the threshold of 1000 bases aligned with over 90% similarity (i.e. gaps excluded) were analyzed. Generation of global alignments also acted as a safeguard against false positives from BLAST analysis. In cases of extremely large gaps (>1 kb), alignments were fractured. Such cases were detected and merged for gaps up to 20 kb.
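The alignment threshold can be expressed as a small filter. The sketch below is illustrative only and does not reproduce align_scorer; the `Alignment` record and the function name are hypothetical, and the optional `max_identity` argument is an assumed convenience for the upper identity cutoffs used elsewhere in these methods to exclude probable allelic overlaps.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    aligned_bases: int  # aligned positions, gaps excluded
    matches: int        # identical positions among the aligned bases

    @property
    def percent_identity(self):
        if self.aligned_bases == 0:
            return 0.0
        return 100.0 * self.matches / self.aligned_bases

def is_candidate_duplication(aln, min_bases=1000, min_identity=90.0,
                             max_identity=None):
    """Keep alignments of >= min_bases with identity above min_identity.

    If max_identity is given, alignments at or above it are rejected.
    """
    if aln.aligned_bases < min_bases or aln.percent_identity <= min_identity:
        return False
    return max_identity is None or aln.percent_identity < max_identity
```

Gap positions are left out of both counts, so a long alignment interrupted by a large insertion is judged only on the bases that actually align.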

Subsection: Pericentromeres and telomeres (p. 890)

Chromosome 22 (May 2000, Sanger Centre) and Chromosome 21 (Sept., NCBI) were analyzed for large duplications as described. For interchromosomal duplications, each chromosome was analyzed versus the NT accession contigs (NCBI) and versus all remaining HTGS accessions (draft and finished). A final global alignment threshold, >90%; >=1000 bases, was used. Due to unassembled allelic overlaps, sequences containing highly similar alignments (>99.5% NT; >99.0% HTGS) were excluded as probable allelic overlaps. The duplicated sequences for chromosome 21 and chromosome 22 were graphically viewed using the program parasight (J. A. Bailey and E.E. Eichler, in preparation).

Subsection: Genome-wide analysis of segmental duplications. (p. 891)

Finished sequence included all assembled sequence from NCBI within the NT dataset (version of 5 September 2000). A global alignment threshold (>90%; >=1000 bases) was used for comparisons between finished sequences. Further selection limited alignments for analyses to those with less than 99.5% identity, as those above that level were likely to represent unassembled allelic overlaps.

The 15 July 2000 version of the draft genome sequence was used as the basis for the duplication analysis of the entire human draft. A final global alignment threshold (>90%, >=1000 bases and <98%) defined the limits of detection for duplicated sequence. Sequence alignments (>98%) appear to represent mainly missed allelic overlaps, many of which were subsequently merged in later releases of the assembly (e.g. 7 October 2000). Final validation of duplicated segments >98% within the working draft will require finished sequence data and/or experimental validation.

Section: Gene content of the human genome (p. 892)

Subsection: Noncoding RNAs (p. 892)

To identify transfer RNA genes, we used tRNAscan-SE version 1.21 [T.M. Lowe, S.R. Eddy. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955-964 (1997)] to analyze the 7 October 2000 version of the draft genome sequence. tRNAscan-SE predicted 504 tRNA genes and 144 tRNA-derived pseudogenes. Three of the predicted genes had a non-canonical anticodon loop length, preventing tRNAscan-SE from unambiguously identifying the anticodon; although there are many possible explanations for them, for our current purposes we classified these as probable pseudogenes. After manual examination of the tRNAs with unlikely anticodons, four more of the predicted genes were also classified as probable pseudogenes: a putative UAA suppressor, a putative UAG suppressor, and two putative UGA-reading selenocysteine tRNAs. The remaining gene predictions were not examined manually. We know that a small number of the 497 "true" tRNA genes are likely to be pseudogenes or parts of tRNA-derived repetitive sequence elements because tRNAscan-SE's ability to separate pseudogenes from true genes is not perfect. Because tRNAscan-SE models tRNA consensus secondary structure, it is not a reliable detector of divergent tRNA pseudogenes. To more accurately estimate the number of tRNA-derived pseudogenes, all 648 sequences detected by tRNAscan-SE were used as WU-BLASTN queries (see below), and another 173 significantly related sequences were detected, bringing the estimated pseudogene count to 324.
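The counts in this paragraph follow from simple bookkeeping, which the snippet below retraces. The variable names are ours; every number is taken from the text.

```python
predicted_genes = 504        # tRNAscan-SE gene predictions
predicted_pseudogenes = 144  # tRNAscan-SE pseudogene predictions
reclassified = 3 + 4         # odd anticodon loops + manually rejected anticodons
blast_only_related = 173     # extra related sequences found with WU-BLASTN

total_hits = predicted_genes + predicted_pseudogenes
true_genes = predicted_genes - reclassified
pseudogenes = predicted_pseudogenes + reclassified + blast_only_related

print(total_hits, true_genes, pseudogenes)  # 648 497 324
```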

To identify all ncRNA homologues other than tRNA genes, we performed sequence similarity searches using WashU BLASTN 2.0MP (W. Gish, unpublished; http://blast.wustl.edu) on the 7 October 2000 genome assembly, with parameters "-kap wordmask=seg B=50000 W=8" and the default DNA scoring matrix. True genes were operationally defined as BLAST hits with ≥95% identity over ≥95% of the length of the query. Related sequences (e.g. pseudogenes) were operationally defined as all other BLAST hits with P-values ≤ 0.001.

To reconcile our tRNA gene count of 497 with the larger number of 1310 generally found in textbook references, we reexamined the primary data in a classic paper by Hatlen and Attardi252. The textbook estimate of 1310 human tRNA genes was based on their observation that purified and labelled human 4S RNA (i.e. the tRNA population) hybridizes to HeLa genomic DNA and saturates at a fraction of about 1.1 × 10^-5 of the genome. The molecular weight of the human genome was thought at that time to be 3.1 × 10^12 (about 4.7 billion bases). Recalculation using the current estimated genome size of 3.2 billion bases [T.R. Tiersch, R.W. Chandler, S.S. Wachtel, S. Elias. Reference standards for flow cytometry and application in comparative studies of nuclear DNA content. Cytometry 10, 706-710 (1989); this paper] gives an estimate of 890 tRNA-complementary loci instead of 1310. Hatlen and Attardi also noted, but at the time could not explain, a puzzling length heterogeneity in their hybridized genomic loci. We believe that they were observing the tRNA pseudogene population, many of which are truncated copies of tRNA genes; therefore we believe their hybridization-based estimate of ~890 loci included tRNA pseudogenes (of which we count 324 in the genome) in addition to the true tRNA genes (of which we count 497 in the genome).
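The recalculation is a proportional rescaling: for a fixed saturation fraction, the number of tRNA-complementary loci scales with the assumed genome size. A minimal sketch, assuming that proportionality and using a hypothetical function name:

```python
def rescale_loci(old_estimate, old_genome_bases, new_genome_bases):
    """Rescale a hybridization-based locus count to a new genome size."""
    return old_estimate * new_genome_bases / old_genome_bases

loci = rescale_loci(1310, 4.7e9, 3.2e9)
counted_in_draft = 497 + 324  # true genes plus pseudogenes counted in this work
```

The rescaled value is about 892, which rounds to the figure of 890 given above, and the 821 loci counted in the draft sequence are of the same order.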

Subsection: Protein-coding genes (p. 896)

Sub-subsection: Exploring properties of known genes (p. 896)

Known genes were aligned with Spidey (S. Wheelan et al., manuscript in preparation) and Acembly (D. Thierry-Mieg and J. Thierry-Mieg, unpublished; http://www.acedb.org/ ), both of which align the cDNA to the genome while allowing for introns. The results from the two programs were in broad agreement. The cDNAs were taken from 5,364 RefSeq entries (1 September 2000 release). The alignments of the cDNAs to the genome could be classified by the proportion of the cDNA that aligned to the genome and by the percentage of identical nucleotides between the cDNA and the genomic sequence. In most cases, there was an unambiguous location for a cDNA. However, some proportion at each level of coverage had more than one site with high identity matches; in these cases, one of the locations was arbitrarily chosen.

Sub-subsection: Towards a complete index of human genes (p. 898)

Creating an initial gene index (p. 899)

Ensembl: Ensembl aims to predict coding sequences of true genes with high confidence, by only predicting coding sequence regions which have confirming evidence across their entire length. The sources of confirmation are cDNA, EST and protein-based similarity. The Genscan computer program was run across the individual fragments of the genome and the resulting peptides were used to search vertebrate mRNA sources (extracted from the EMBL databank; http://www.ebi.ac.uk/index.html), EST (vertebrate dbEST; ftp://ncbi.nlm.nih.gov/genbank ) and a non-redundant protein database (SWIR; http://www.ebi.ac.uk/swissprot/ ). Protein hits of greater than 200 bits similarity were then further processed by using the GeneWise program with the similar protein against the assembled draft genome sequence (the 17 July 2000 version). A final gene-building method was then used to merge all the resulting information, namely the Genscan predictions with confirming similarity at a number of exons and the GeneWise gene predictions. The method only accepted a join between two exons if consistent similarity evidence was found on each exon with the following thresholds: (a) all GeneWise predictions were accepted, although redundant GeneWise predictions were discarded; and (b) for exons predicted by Genscan, a single protein or cDNA similarity of at least 100 bits, or at least two EST hits of at least 100 bits each, was required. This final process allows for alternative splicing, although modeling alternative splicing has not been optimised. Ensembl produced 35,500 gene predictions with 44,860 transcripts.
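The exon-level thresholds (a) and (b) can be written as a small predicate. This is a sketch of the stated thresholds only, not the Ensembl gene-building code: the dictionary layout and function names are hypothetical, and the additional requirement that evidence be consistent between the two exons is not modeled.

```python
def genscan_exon_supported(protein_or_cdna_bits, est_bits, threshold=100.0):
    """Rule (b): one protein/cDNA hit, or two EST hits, at >= threshold bits."""
    strong_single = any(b >= threshold for b in protein_or_cdna_bits)
    strong_ests = sum(1 for b in est_bits if b >= threshold)
    return strong_single or strong_ests >= 2

def exon_supported(exon):
    """Rule (a): GeneWise exons are always accepted; otherwise apply rule (b)."""
    if exon["source"] == "genewise":
        return True
    return genscan_exon_supported(exon.get("protein_or_cdna_bits", ()),
                                  exon.get("est_bits", ()))

def accept_join(exon_a, exon_b):
    """A join is accepted only if both exons carry sufficient evidence."""
    return exon_supported(exon_a) and exon_supported(exon_b)
```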

Merge procedure to produce a final protein set: To generate a single protein set for further analysis we merged the known protein sequences from RefSeq (version of 29 Sept 2000), SWISSPROT (Release 39.6 of 30 Aug 2000), TREMBL (TrEMBL Release 14.17 of 1 Oct 2000) and TREMBL_NEW (1 Oct 2000) with the gene predictions. The later protein analysis required a non-redundant protein set where genes were represented as a single protein sequence; in the case of alternative splicing, a single, representative protein sequence was required. We are aware of the obvious limitations of this representation of the human proteome, but accommodating alternative splicing in the downstream analysis was very complex.

The genome prediction data set was prepared as follows: the Ensembl and Genie predictions were merged by examining overlap of coding exons in genomic coordinates. Two gene predictions were merged if a single coding exon on the same strand overlapped. From this set of merged predictions, we used only the Ensembl+Genie and the Ensembl-only predictions. In cases where there was more than one prediction, or for Ensembl genes, more than one transcript, we chose the longest protein sequence from each merged unit to represent the gene. The protein level merge then occurred by comparing the union of all the data sources in an all-vs-all FASTA comparison using default parameters. Two protein sequences were merged if the match covered at least 95% of the shorter sequence, and identity was ≥ 95%, which takes into account both nearly identical protein sequences and also nearly identical fragments.
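The two merge criteria in this paragraph can be sketched as follows. The function names and argument layout are hypothetical; exon coordinates are assumed to be inclusive, and the match length and identity are assumed to come from the FASTA comparison.

```python
def exons_overlap(a, b):
    """a, b: (strand, start, end), inclusive; True if same strand and overlapping."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def predictions_merge(exons_a, exons_b):
    """Two gene predictions merge if any pair of coding exons overlaps."""
    return any(exons_overlap(a, b) for a in exons_a for b in exons_b)

def proteins_merge(len_a, len_b, match_len, percent_identity,
                   min_coverage=0.95, min_identity=95.0):
    """Merge if the match covers >= 95% of the shorter sequence at >= 95% identity."""
    shorter = min(len_a, len_b)
    if shorter <= 0:
        return False
    return match_len / shorter >= min_coverage and percent_identity >= min_identity
```

Measuring coverage against the shorter sequence is what lets a nearly identical fragment merge with its full-length counterpart.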

Special attention was needed to prevent overrepresentation of alternative splice forms. First, we expanded the Swissprot and Trembl databases to represent known splice variants in the protein merge, but only took a single protein (the canonical database sequence) for the final protein set. An additional cull for alternative splice forms which remained as separate proteins was produced by taking the corresponding DNA sequences of the known proteins (RefSeq, SWISSPROT, TREMBL and TREMBL_NEW) and matching back to the genome using the SSAHA program without requiring a valid gene structure alignment. If the DNA derived from two protein sequences matched at over 28 base pairs at the same location, the longer protein sequence was used. Finally, clear bacterial contamination (proteins which had an almost identical match to a bacterial protein) was removed.

Quality Control on the protein set: We took 31 genes which we could confirm as being unavailable at the time of the gene builds (22 from RefSeq, 9 from the Sanger Centre gene identification program on chromosome X). 3 of the 31 sequences could not be found in the genome assembly. Using the wublastp program (http://blast.wustl.edu) with default parameters, we matched the 31 sequences to the IPI.1 set and visually inspected the alignments. 19 sequences showed a clear match to an IPI protein; 14 hit a single IPI protein, 3 hit 2 IPI proteins, 1 hit 3 IPI proteins and 1 hit 4 IPI proteins.

RIKEN mouse cDNAs. We took a random sample (1,000) of known genes, Ensembl-Genie genes and Ensembl-only genes and matched them to the RIKEN cDNA set of 15,294 cDNAs using the TBLASTN program (http://www.ncbi.nlm.nih.gov/BLAST/ ) with default parameters, at the 1e-6 E-value significance level.

The IPI and IGI can be found at http://www.ensembl.org/IPI/.

Additional information for Table 23 (p. 902). All of the tables of Interpro are accessible through http://www.sanger.ac.uk/Users/agb/Ensembl.

Section: Segmental history of the human genome (p. 908)

Subsection: Conserved segments between human and mouse (p. 908)

Putatively orthologous sequences were determined in two ways. Curated orthologues determined at the Jackson Laboratory (www.informatics.jax.org) were obtained by FTP. In addition, orthologues were calculated at the NCBI using the program megaBLAST [Z. Zhang et al., J. Comput. Biol. 7, 203-214 (2000)]. In order to calculate orthologues, non-EST mRNA sequences found in LocusLink (http://www.ncbi.nlm.nih.gov:80/LocusLink/) were obtained for both human and mouse. The megaBLAST analysis was performed first using the mouse sequence as the query and the human sequence as the database. A second analysis was performed in which the human sequence was the query and the mouse sequence was the database. Reciprocal best hits were retained as putative orthologues.
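The reciprocal-best-hit step can be sketched as below. The input format (query, subject, score triples) and the function names are assumptions made for illustration; parsing of megaBLAST output is not shown, and ties are resolved in favour of the first hit seen.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, score). Return {query: best subject}."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(mouse_vs_human, human_vs_mouse):
    """Return sorted (mouse, human) pairs that are each other's best hit."""
    m2h = best_hits(mouse_vs_human)
    h2m = best_hits(human_vs_mouse)
    return sorted((m, h) for m, h in m2h.items() if h2m.get(h) == m)
```

A pair is kept only when the best human hit of a mouse sequence has that same mouse sequence as its own best hit.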

mRNA sequences were aligned to the draft genome sequence (7 October 2000 version) using the mRNA alignment tool Spidey (S. Wheelan et al., manuscript in preparation). Only mRNAs that could be aligned with high confidence (>90% of the mRNA, including the entire coding sequence, had to align, the worst exon had to have a pc_id >95%, and at least one exon had to have a pc_id >98%), and where more than 50% of the mRNA was found, were kept. If an mRNA aligned to more than one contig, efforts were made to determine the most likely location. Alignments that were in conflict with LocusLink map locations were disregarded.

Segments in the conserved synteny map were determined as follows. A segment had to contain at least 2 genes from the same area of the mouse genome. In addition to the mouse genes having to be on the same chromosome, the genes had to be on the same part of the chromosome (note the 7 breakpoints on the X chromosome). A cutoff of 15 cM was chosen, so if two mouse genes were from the same chromosome, but >15 cM apart, then a breakpoint was made. A large cutoff was made because the MGD genetic map is an integrated map, and thus the margin of confidence is large.
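A segment-building pass consistent with these rules might look like the following sketch. It assumes the genes are already ordered along the human sequence and that the 15 cM cutoff is applied between neighbouring genes, which the text does not state explicitly; the function name and tuple layout are hypothetical.

```python
def conserved_segments(genes, max_gap_cm=15.0, min_genes=2):
    """genes: list of (name, mouse_chromosome, mouse_cM) in human genome order.

    A breakpoint is placed where the mouse chromosome changes or where
    neighbouring genes lie more than max_gap_cm apart on the mouse map.
    """
    segments, current = [], []
    for gene in genes:
        if current:
            _, prev_chr, prev_cm = current[-1]
            if gene[1] != prev_chr or abs(gene[2] - prev_cm) > max_gap_cm:
                if len(current) >= min_genes:
                    segments.append(current)
                current = []
        current.append(gene)
    if len(current) >= min_genes:
        segments.append(current)
    return segments
```

Runs containing a single gene are discarded, matching the requirement that a segment contain at least 2 genes from the same area of the mouse genome.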

Section: Applications to medicine and biology

Subsections: Disease genes (p. 911) and Drug targets (p. 912)

A total of 971 OMIM loci which had links to the SwissProt or Sptrembl databases were used to define a non-exhaustive set of disease genes. For protein targets of pharmaceutical interest, the list published by Drews427 was manually mapped to protein database identifiers wherever possible, resulting in a list of 603 drug target proteins. These were matched using wublastp with default parameters [S.F. Altschul et al. Basic local alignment search tool. J Mol Biol 215, 403-10 (1990)] to the genome protein database IPI.1. The results were filtered to focus primarily on potential paralogues. Thus, distant similarity of only a single domain was rejected. Highly similar proteins, which might arise from artificial duplications in genome assembly, were also rejected. After experimenting with a number of criteria, the following heuristic was used: for pairs on the same chromosome, matches with 70% to 90% identity over at least 50 amino acids were accepted, whereas for pairs on different chromosomes, matches with 70% to 95% identity over at least 50 amino acids were accepted. A number of these putative paralogues were then examined by eye to see whether the similarity differences were spread evenly throughout the protein, rather than alternating between regions of high similarity and weak similarity. The putative paralogues were also compared against other forms of data (e.g., EST databases) to verify the gene prediction.
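The paralogue heuristic reduces to a range check on identity and alignment length. In the sketch below the function name is hypothetical and the bounds are treated as inclusive, which the text does not specify.

```python
def accept_paralogue(same_chromosome, percent_identity, aligned_aa):
    """Accept 70-90% identity (same chromosome) or 70-95% (different
    chromosomes), in both cases over at least 50 aligned amino acids."""
    upper = 90.0 if same_chromosome else 95.0
    return aligned_aa >= 50 and 70.0 <= percent_identity <= upper
```

The tighter upper bound for pairs on the same chromosome reflects the concern that highly similar neighbours may be artificial duplications from the assembly.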

Supplementary information for Table 24. Probable vertebrate-specific horizontal gene transfers in the human genome

A. Set of 113 probable vertebrate-specific horizontal acquisitions of bacterial genes

Protein ID | Orthologs in other vertebrates | Representation in bacteria | Protein function
AAG01853, AAG01854, AAG01855, IGI_M1_ctg12913_93 | Pig, rat, chicken | Scattered, best hit in Thermotoga | Formiminotransferase cyclodeaminase
AAG09731 | Mouse | Many bacteria, best hit in Streptomyces | Predicted methyltransferase
BAA91937 | Rat | Most bacteria, best hits in Gram-positive | Fatty acid synthase component, thioesterase
BAB13402, IGI_M1_ctg17129_30 | Many mammals, electric ray | Many bacteria, best hit in Mycobacterium | Membrane protein of cholinergic synaptic vesicles, quinone oxidoreductase
P31639, P53794, CAB81772, IGI_M1_ctg14654_1, O15280, P13866, CAC00574, IGI_M1_ctg19042_15 | Many mammals | Many bacteria, best hit in Vibrio | Na/glucose (and other solutes) cotransporter
CAB96131 | Many mammals | Gram-positive bacteria, best hit in Streptomyces | Neuraminidase
gi4885285 | Rat, mouse | Many bacteria, best hit in Legionella | UDP-N-acetylglucosamine-2-epimerase/N-acylmannosamine kinase
gi6912516 | Cow | Most bacteria, best hit in Synechocystis | Methionine sulfoxide reductase
Q92819, gi8850215 | Many mammals, chicken, Xenopus, Danio | Many bacteria, best hit in Rhizobium | Hyaluronan synthase
gi8922122 | Mouse | Many bacteria and archaea, best hit in Synechocystis | Predicted arylsulfatase
gi8922697, IGI_M1_ctg595_96 | Mouse, pig | Actinomycetes and several other bacteria, best hit in Streptomyces | Predicted oxidoreductase fused to multitransmembrane domain (bacterial homologs have oxidoreductase domain only)
gi8923007 | Rat, pig | Gram-positive bacteria, Thermotoga | Betaine-homocysteine methyltransferase
gi8923543, O60363, O75202, O75203 | Mouse, rat | Most bacteria, best hit in Bacillus | Acetyl-CoA synthetase
IGI_M1_ctg14294_11, IGI_M1_ctg15247_50 | Mouse, rat | Most bacteria, best hit in Methanothermobacter | Acetyl-CoA synthetase
gi8923844 | Rat | Many bacteria, best hit in Anabaena | Soluble adenylate cyclase
IGI_M1_ctg13129_34 | Rat, pig, Xenopus | Actinomycetes, Pseudomonas (best hit) | Glycine amidotransferase
IGI_M1_ctg13284_29 | Mouse | Many bacteria, best hit in Streptomyces | Predicted methyltransferase
IGI_M1_ctg14210_35 | Mouse | Many bacteria, best hit in Aquifex | CMP-N-acetylneuraminic acid synthase
P45381, IGI_M1_ctg14333_22 | Mouse, rat, cow | Cyanobacteria, gamma-proteobacteria, best hit in Synechocystis | Aspartoacylase
IGI_M1_ctg15880_11 | Zebrafish | Gamma-proteobacteria | Unknown
IGI_M1_ctg15970_12 | Mouse, rat | Gamma-proteobacteria | Thiopurine S-methyltransferase
P21397, P27338, IGI_M1_ctg16029_6, IGI_M1_ctg16029_9 | Many mammals, rainbow trout | Many bacteria, best hits in Mycobacterium, Micrococcus, Synechocystis | Monoamine oxidase
IGI_M1_ctg16704_2 | Rat, mouse, Oryzias (fish) | Gamma-proteobacteria, gram-positive bacteria, best hit in Pseudomonas | Quinone reductase
IGI_M1_ctg16942_3 | Rat | Many bacteria, best hit in Thermotoga | Aldo-keto reductase
IGI_M1_ctg17057_13, IGI_M1_ctg17565_12 | Mouse, rattlesnake, scomber | Gram-positive bacteria, Cyanobacteria, best hit in Bacillus | L-amino acid oxidase
IGI_M1_ctg25185_50 | Mouse, zebrafish, fugu | Many bacteria, best hit in Pseudomonas | Predicted epoxide hydrolase
IGI_M1_ctg19053_32 | Rat, mouse | Many bacteria, best hit in E. coli | Phosphoglycerate mutase 1
O00154 | Rat | Most bacteria, best hit in Bacillus | Acyl-CoA-thioesterase
O43826 | Mouse, rat | Many bacteria, best hit in Chlamydia | Glucose-6-phosphate transporter
P10745 | Many mammals, chicken, Xenopus, zebrafish etc. | Many bacteria, best hit in Vibrio | Interphotoreceptor retinol-binding protein (tail-specific protease in bacteria)
P11245 | Many mammals, chicken | Many bacteria, best hit in Mycobacterium | Arylamine N-acetyltransferase
P16455 | Mouse, hamster | Many bacteria, best hit in Bacillus | Methylated-DNA-protein-cysteine methyltransferase
P29372 | Mouse, rat | Many bacteria, best hit in Streptomyces | N-methylpurine DNA glycosylase
P46597 | Cow, chicken | Actinomycetes | Hydroxyindole O-methyltransferase
P51570 | Mouse | Most bacteria, best hit in E. coli | Galactokinase
Q14397 | Rat, Xenopus | Many bacteria, best hit in Vibrio | Glucokinase regulator, predicted sugar phosphate isomerase
Q9UBM0 | Mouse, rat | Most bacteria, best hit in E. coli | 8-oxo-dGTPase
Q9UHN1 | Mouse, chicken | Many bacteria, best hit in Thermus | Mitochondrial DNA polymerase, regulatory subunit (glycyl-tRNA synthetase in bacteria)
gi10047132 | NONE | Many bacteria, best hit in Thermotoga | 3-ketoacyl-acyl-carrier protein reductase
gi4759048 | NONE | All bacteria, best hit in Deinococcus | Ribosomal protein L33 homolog
P28330, gi7656849 | NONE | Most bacteria, best hit in Pseudomonas | Acyl-CoA dehydrogenase
gi7662276 | NONE | Gamma-proteobacteria | Unknown
gi7705582 | NONE | Most bacteria, best hit in Pseudomonas | Glutamine synthetase
gi7705660 | NONE | Most bacteria, best hits in gram-positive | Surfactant protein B, adenylosuccinate lyase
gi7705929 | NONE | Most bacteria, best hit in Bacillus | Neutral sphingomyelinase
gi7705953 | NONE | All bacteria, best hit in Thermotoga | Unknown
gi8922911 | NONE | Most bacteria, best hit in Rickettsia | Oxygen-independent coproporphyrinogen oxidase
gi8922946 | NONE | Synechocystis, Pseudomonas | Predicted oxidoreductase
gi8923001 | NONE | Many bacteria, best hit in Rickettsia | Predicted a/b superfamily hydrolase
gi8923417 | NONE | Many bacteria, best hit in Aquifex | ADP-ribosylglycohydrolase
IGI_M1_ctg12730_25 | NONE | Many bacteria, best hit in Agrobacterium | In bacteria, protein involved in conjugal DNA transfer and secretion (VirB10)
Q9ULI2, IGI_M1_ctg12741_7 | NONE | Most bacteria, best hit in Haemophilus | Ribosomal protein S6 - glutamic acid ligase
IGI_M1_ctg12824_124 | NONE | Gamma-proteobacteria | Thiol-disulfide isomerase
IGI_M1_ctg12824_69 | NONE | Many bacteria, best hit in Rickettsia | In bacteria, protein involved in conjugal DNA transfer (VirB4)
IGI_M1_ctg13002_32 | NONE | Most bacteria, best hit in Corynebacterium | Homoserine dehydrogenase
IGI_M1_ctg13238_61 | NONE | Cyanobacteria, Actinomycetes, Sphingomonas (best hit in Synechocystis) | 4-oxalomesaconate hydratase
IGI_M1_ctg13284_79 | NONE | Gamma-proteobacteria | Predicted helicase
IGI_M1_ctg13305_116 | NONE | Many bacteria, best hit in Thermotoga | Predicted phosphatase (homolog of histone H2A macro domain)
IGI_M1_ctg13419_28 | NONE | Most bacteria, best hit in E. coli | Cyclopropane-fatty-acid-phospholipid synthase
IGI_M1_ctg13419_35 | NONE | Many bacteria, best hit in Synechocystis | Predicted membrane transporter
IGI_M1_ctg13459_1 | NONE | Most bacteria, best hit in Chlamydophila | Acyl-CoA thioester hydrolase
IGI_M1_ctg13492_20 | NONE | Many bacteria and archaea, best hit in Chlamydia | dCTP deaminase
IGI_M1_ctg13715_89 | NONE | Most bacteria, best hit in Salmonella | Ribonucleoside-diphosphate reductase
IGI_M1_ctg14250_20 | NONE | Many bacteria, best hit in Vibrio | N-acetylneuraminate lyase
IGI_M1_ctg14293_4 | NONE | Many bacteria, best hit in Pseudomonas | Triacylglycerol lipase
IGI_M1_ctg14420_10, IGI_M1_ctg14420_109 | NONE | Many bacteria, best hit in Synechocystis | Predicted membrane symporter
IGI_M1_ctg15343_7 | NONE | Gamma-proteobacteria | Anaerobic ribonucleoside-diphosphate reductase
IGI_M1_ctg16010_18 | NONE | All bacteria, best hit in Borrelia | Predicted metal-dependent hydrolase
IGI_M1_ctg16516_13 | NONE | Many bacteria, best hit in Streptomyces | Polyketide synthase
IGI_M1_ctg16537_325 | NONE | Most bacteria, best hit in Bacillus | Orotate phosphoribosyl transferase
IGI_M1_ctg16537_333 | NONE | Gram-positive bacteria, best hit in Bacillus | Quinone oxidoreductase
IGI_M1_ctg18743_55 | NONE | Many bacteria, best hit in E. coli | Selenophosphate synthase
IGI_M1_ctg19042_43 | NONE | Many bacteria, best hit in Xylella | GTPase involved in ferrous iron transport
IGI_M1_ctg19053_28 | NONE | Gram-positive bacteria | ATP-dependent nuclease subunit, helicase
IGI_M1_ctg19053_29 | NONE | Most bacteria, best hit in Bacillus | Acetylornithine aminotransferase
IGI_M1_ctg19053_30 | NONE | Many bacteria, best hit in Bacillus |
IGI_M1_ctg19053_31 | NONE | Most bacteria, best hit in Synechocystis | Mg-chelatase ATPase subunit
IGI_M1_ctg19241_54 | NONE | Gamma-proteobacteria | Predicted membrane transporter
IGI_M1_ctg25107_24 | NONE | Many bacteria, best hit in Bacillus | Non-ribosomal peptide synthetase
IGI_M1_ctg_52, O76044 | NONE | Proteobacteria, gram-positive bacteria, best hit in E. coli | Beta-xylosidase
O43600 | NONE | Most bacteria, best hits in gamma-proteobacteria | Di-tripeptide ABC transporter, transmembrane subunit
O75588 | NONE | Most bacteria, best hit in Neisseria | tRNA (guanine N1)-methyltransferase
P19971 | NONE | Many bacteria, best hit in Deinococcus | Thymidine phosphorylase
Q16490 | NONE | Most bacteria, best hit in Neisseria | Melanoma-associated antigen; DNA repair ATPase RecN in bacteria
Q99540 | NONE | All bacteria, best hit in Bacillus | NAD-dependent DNA ligase
Q9ULF2 | NONE | Many bacteria, best hit in Thermotoga | Predicted phosphatase; homologs of histone H2A macro domain
gi4505321 | NONE | Many bacterial plasmids; best hit in Leuconostoc | Myelin transcription factor; plasmid replication protein in bacteria

B. The list of 110 genes initially considered to be candidates for horizontal gene transfer, but later determined not to be.

AAF28477

AAF28928

AAF35253

AAF35254

AAF62400

AAG09699

AAG09720

AAG09721

BAA92632

BAA95147

IGI_M1_ctg12284_1

IGI_M1_ctg12483_8

IGI_M1_ctg12567_31

IGI_M1_ctg12824_61

IGI_M1_ctg12824_91

IGI_M1_ctg12830_11

IGI_M1_ctg13002_39

IGI_M1_ctg13102_6

IGI_M1_ctg13297_17

IGI_M1_ctg13305_26

IGI_M1_ctg13313_62

IGI_M1_ctg13585_3

IGI_M1_ctg13708_109

IGI_M1_ctg13760_30

IGI_M1_ctg14409_24

IGI_M1_ctg14601_98

IGI_M1_ctg14715_17

IGI_M1_ctg14715_21

IGI_M1_ctg14757_3

IGI_M1_ctg14814_21

IGI_M1_ctg15014_27

IGI_M1_ctg15034_3

IGI_M1_ctg15071_1

IGI_M1_ctg15101_11

IGI_M1_ctg15313_8

IGI_M1_ctg15814_20

IGI_M1_ctg15841_3

IGI_M1_ctg15964_93

IGI_M1_ctg15972_32

IGI_M1_ctg16291_5

IGI_M1_ctg16479_110

IGI_M1_ctg16574_45

IGI_M1_ctg16586_6

IGI_M1_ctg16842_57

IGI_M1_ctg17140_20

IGI_M1_ctg18008_10

IGI_M1_ctg18099_16

IGI_M1_ctg18251_36

IGI_M1_ctg18484_6

IGI_M1_ctg19153_2

IGI_M1_ctg19241_70

IGI_M1_ctg25118_2

IGI_M1_ctg25123_18

IGI_M1_ctg25280_48

IGI_M1_ctg36_71

IGI_M1_ctg5499_13

IGI_M1_ctg62_5

O14644

O43174

O75627

O75839

O95165

P00966

P01241

P01242

P01243

P09587

P10265

P10415

P15559

P16083

P17643

P18440

P30042

P34913

P40126

P50440

P51580

P51606

Q07820

Q08426

Q12765

Q14191

Q15031

Q15333

Q16548

Q16873

Q92839

Q92843

Q99735

Q9UGX5

Q9UNL5

Q9Y539

Q9Y540

Q9Y541

gi4557791

gi4826688

gi4885515

gi5174389

gi6005882

gi6912664

gi7019521

gi7706168

gi8922766

gi8922871

gi8923586

gi8923752

gi8923900

gi9506741

gi9845285

 

 

4. Errata and other corrections

p. 887. A reference was omitted in the last sentence of the fifth paragraph in the right-hand column. The sentence should read "...may have arisen by LINE-based transduction of 3' flanking sequences205, 205a, 206." Reference 205a is Goodier, J.L., Ostertag, E.M., Kazazian, H.H. Transduction of 3'-flanking sequences is common in L1 retrotransposition. Hum. Mol. Genet. 9, 653-657 (2000).

Figure 33, p. 892. The units on the Y axis are bp, not kb.

There are errors in both the legend and the labels at the top of the figure. The legend should read: "Sequence properties of segmental duplications. Distributions of length and per cent nucleotide identity are shown as a function of the number of aligned bp from the finished vs. finished human genomic sequence dataset. Intrachromosomal (blue), interchromosomal (red)."

p. 898, line 31. The final phrase of the sentence, "...and the representativeness of currently 'known' human genes.", should be deleted. The sentence should read "Before discussing the gene predictions for the human genome, it is useful to consider background issues, including previous estimates of the number of human genes and lessons learned from worms and flies."

p. 900, line 38. Remove "...(see above)...".

In Table 22 (Properties of the IGI/IPI human protein set, p. 900), the number of Matches to nonhuman proteins (third column) in the Ensembl data set (third row) should be 8,126, not 81,126.

The legend to Figure 41 (p. 905) should begin: "For each of 27 common domain families, the number of different Pfam domain types that co-occur with the family in each of the five eukaryotic proteomes. The 27 families were chosen to include the 10 most common domain families in each proteome. The data are ranked...."

5. Additional Acknowledgements

A. The following is a list of the contributors of unpublished human genomic sequence deposited in GenBank and used in the development of the draft genome sequence, all of whom gave us permission to use their unpublished data.

E. Chen et al., Center for Genetic Medicine and Applied Biosystems; USA

S.-F. Tsai, National Yang-Ming University, Institute of Genetics, Taipei, 155 Li-Rong St Section 2, Peitou, Taiwan 11221, Republic of China

Y. Nakamura, K. Koyama, et al., Institute of Medical Science, the University of Tokyo, Human Genome Center, Laboratory of Molecular Medicine, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan

G. Kremmidiotis and D. Callen, Cytogenetics & Molecular Genetics, Women's & Children's Hospital, 72 King William Rd, Adelaide, SA 5006, Australia

K.T. Montgomery, S.T. Lau and R. Kucherlapati, Albert Einstein College of Medicine, Department of Molecular Genetics, 1300 Morris Park Avenue, Bronx, NY 10461, USA

V. Kodoyianni, Y. Ge, G.K. Krummel, L. Grable, J. Severin, M. Shannon, A. Brower, A.S. Olsen and L.M. Smith, Department of Chemistry, University of Wisconsin, 1101 University Ave., Madison, WI 53706, USA

B. Weiss et al., Human Genetics, University of Utah, 20 S. 2030 E., Rm 308, Salt Lake City, Utah 84112, USA

E.S. Fitzpatrick et al., Department of Human Genetics, Merck & Co. Inc, SumneyTown Pike, West Point, PA 19486, USA

T. Shina, Tokai University School of Medicine, Molecular Life Science, 2; Bohseidai, Isehara, Kanagawa 259-1193, Japan

E. Ben-Asher, N. Avidan, T. Olender, D. Lancet, L. Salmon and H. Tamary, Department of Molecular Genetics, Weizmann Institute of Science, P.O.Box 26, Rehovot 76100, Israel

K. Yoshinaga, K. Sakurada and A. Horii, Tohoku University School of Medicine, Department of Molecular Pathology, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan

R.M. Crowl, D. Luk and M. Milnamow, Arthritis Research, Novartis Pharmaceuticals Corp., 556 Morris Ave., Summit, NJ 07901, USA

L.M. Gouya, C. Martin, J.-C.P. Deybach and H.V. Puy, Biochemistry and Molecular Genetics, INSERM U409, Hopital Louis Mourier, 178, Rue des Renouillers, Colombes, 92700, France

M. Stark, M. Creaven and D. Grafham, Genetic Cancer Susceptibility Unit, International Agency for Research on Cancer, 150 Cours Albert-Thomas, Lyon Cedex 08 69372, France

D. Kedra, J. Trifunovic, E. Seroussi, J. Jacobson, I. Fransson and J. Dumanski, Department of Molecular Medicine, Karolinska Hospital, Stockholm, Sweden

L.K. O'Brien, H.F. Sims and A.W. Strauss, Pediatrics, St. Louis Children's Hospital, 1 Children's Place, St. Louis, MO 63110, USA

S. Richards, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA

E.H. Rozemuller and M.G.J. Tilanus, Pathology, University Hospital Utrecht, P.O.Box 85500, Utrecht 3508GA, The Netherlands

T. Nobukani and Y. Murakami, National Cancer Center Research Institute, Oncogene Div.; 5-1-1, Tsukiji, Chuo-ku, Tokyo 104-0045, Japan

P. Verhasselt, Janssen Research Foundation, Beerse, Belgium

B. Additional Citations. The following papers, listed in descending order of the amount of sequence contributed, are those that first reported the published sequences included in the draft genome sequence.

  1. Loftus, B. J. et al. Genome duplications and other features in 12 Mb of DNA sequence from human chromosome 16p and 16q. Genomics 60, 295-308 (1999).
  2. Shiina, T. et al. Molecular dynamics of MHC genesis unraveled by sequence analysis of the 1,796,938-bp HLA class I region. Proc Natl Acad Sci U S A 96, 13282-7. (1999).
  3. Matsuda, F. et al. The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus. J Exp Med 188, 2151-62. (1998).
  4. Su, J.-S. & Tsai, S.-F. The Complete Genomic DNA Sequence of the Human ADH Gene Complex (in preparation).
  5. Mimori, K. et al. Cancer-specific chromosome alterations in the constitutive fragile region FRA3B. Proc Natl Acad Sci U S A 96, 7456-61. (1999).
  6. Gomyo, H. et al. A 2-Mb sequence-ready contig map and a novel immunoglobulin superfamily gene IGSF4 in the LOH region of chromosome 11q23.2. Genomics 62, 139-46. (1999).
  7. Ciccodicola, A. et al. Differentially regulated and evolved genes in the fully sequenced Xq/Yq pseudoautosomal region. Hum Mol Genet 9, 395-401. (2000).
  8. Minaguchi, T. et al. Complete DNA sequence and characterization of a 330-kb VNTR-rich region on chromosome 6q27 that is commonly deleted in ovarian cancer. DNA Res 6, 131-6. (1999).
  9. Shiina, T. et al. Nucleotide sequencing analysis of the 146-kilobase segment around the IkBL and MICA genes at the centromeric end of the HLA class I region. Genomics 47, 372-82. (1998).
  10. Nishiwaki, T., Daigo, Y., Tamari, M., Fujii, Y. & Nakamura, Y. Molecular cloning, mapping, and characterization of two novel human genes, ORCTL3 and ORCTL4, bearing homology to organic-cation transporters. Cytogenet Cell Genet 83, 251-5. (1998).
  11. Ruddy, D. A. et al. A 1.1-Mb transcript map of the hereditary hemochromatosis locus. Genome Res 7, 441-56. (1997).
  12. Mizuki, N. et al. Nucleotide sequence analysis of the HLA class I region spanning the 237-kb segment around the HLA-B and -C genes. Genomics 42, 55-66. (1997).
  13. Inoue, H. et al. Sequence of the FRA3B common fragile region: implications for the mechanism of FHIT deletion. Proc Natl Acad Sci U S A 94, 14584-9. (1997).
  14. Touchman, J. W. et al. The genomic region encompassing the nephropathic cystinosis gene (CTNS): complete sequencing of a 200-kb segment and discovery of a novel gene within the common cystinosis-causing deletion. Genome Res 10, 165-73. (2000).
  15. Beck, S. et al. Evolutionary dynamics of non-coding sequences within the class II region of the human MHC. J Mol Biol 255, 1-13. (1996).
  16. Nomiyama, H. et al. Organization of the chemokine gene cluster on human chromosome 17q11.2 containing the genes for CC chemokine MPIF-1, HCC-2, HCC-1, LEC, and RANTES. J Interferon Cytokine Res 19, 227-34. (1999).
  17. Toguchida, J. et al. Complete genomic sequence of the human retinoblastoma susceptibility gene. Genomics 17, 535-43. (1993).
  18. Erlandsson, R., Wilson, J. F. & Paabo, S. Sex chromosomal transposable element accumulation and male-driven substitutional evolution in humans. Mol Biol Evol 17, 804-12. (2000).
  19. van Geel, M. et al. The FSHD region on human chromosome 4q35 contains potential coding regions among pseudogenes and a high density of repeat elements. Genomics 61, 55-65. (1999).
  20. Chen, Q. et al. Sequence of a 131-kb region of 5q13.1 containing the spinal muscular atrophy candidate genes SMN and NAIP. Genomics 48, 121-7. (1998).
  21. Baens, M., Peeters, P., Guo, C., Aerssens, J. & Marynen, P. Genomic organization of TEL: the human ETS-variant gene 6. Genome Res 6, 404-13. (1996).
  22. Smith, T. M. et al. Complete genomic sequence and analysis of 117 kb of human DNA containing the gene BRCA1. Genome Res 6, 1029-49. (1996).
  23. Tauchi, H. et al. Sequence analysis of an 800-kb genomic DNA region on chromosome 8q21 that contains the Nijmegen breakage syndrome gene, NBS1. Genomics 55, 242-7. (1999).
  24. Boldog, F. et al. Chromosome 3p14 homozygous deletions and sequence analysis of FRA3B. Hum Mol Genet 6, 193-203. (1997).
  25. Kedra, D. et al. The germinal center kinase gene and a novel CDC25-like gene are located in the vicinity of the PYGM gene on 11q13. Hum Genet 100, 611-9. (1997).
  26. Brand-Arpon, V. et al. A genomic region encompassing a cluster of olfactory receptor genes and a myosin light chain kinase (MYLK) gene is duplicated on human chromosome regions 3q13-q21 and 3p13. Genomics 56, 98-110. (1999).
  27. Glusman, G. et al. Sequence, structure, and evolution of a complete human olfactory receptor gene cluster. Genomics 63, 227-45. (2000).
  28. Bednarek, A. K. et al. WWOX, a novel WW domain-containing protein mapping to human chromosome 16q23.3-24.1, a region frequently affected in breast cancer. Cancer Res 60, 2140-5. (2000).
  29. Wang, A. et al. Association of unconventional myosin MYO15 mutations with human nonsyndromic deafness DFNB3. Science 280, 1447-51. (1998).
  30. Connelly, M. A., Zhang, H., Kieleczawa, J. & Anderson, C. W. The promoters for human DNA-PKcs (PRKDC) and MCM4: divergently transcribed genes located at chromosome 8 band q11. Genomics 47, 71-83. (1998).
  31. O'Keefe, D. S. et al. Mapping, genomic organization and promoter analysis of the human prostate-specific membrane antigen gene. Biochim Biophys Acta 1443, 113-27. (1998).
  32. Osborne, L. R. et al. Identification of genes from a 500-kb region at 7q11.23 that is commonly deleted in Williams syndrome patients. Genomics 36, 328-36. (1996).
  33. Andre, C. et al. Sequence analysis of two genomic regions containing the KIT and the FMS receptor tyrosine kinase genes. Genomics 39, 216-26. (1997).
  34. Yang, Y. et al. A 1-Mb PAC contig spanning the common eliminated region 1 (CER1) in microcell hybrid-derived SCID tumors. Genomics 62, 147-55. (1999).
  35. Levy-Lahad, E. et al. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene. Genomics 34, 198-204. (1996).
  36. Ellis, M. C. et al. HLA class II haplotype and sequence analysis support a role for DQ in narcolepsy. Immunogenetics 46, 410-7. (1997).
  37. Giacalone, J. et al. Sequence assembly of a BAC clone from the human Y chromosome using high-resolution optical restriction maps. Gene (in preparation).
  38. Riethman, H. C. et al. Integration of telomere sequences with the draft human genome sequence. Nature 409, 948-951 (2001).
  39. Barry, A. E. et al. The 10q25 neocentromere and its inactive progenitor have identical primary nucleotide sequence: further evidence for epigenetic modification. Genome Res 10, 832-8. (2000).
  40. Crowl, R. M. et al. Genomic organization and promoter characterization of the human HTRA (PRSS11) gene (in preparation).
  41. Winfield, S. L., Tayebi, N., Martin, B. M., Ginns, E. I. & Sidransky, E. Identification of three additional genes contiguous to the glucocerebrosidase locus on chromosome 1q21: implications for Gaucher disease. Genome Res 7, 1020-6. (1997).
  42. Collins, F. S. & Weissman, S. M. The molecular genetics of human hemoglobin. Prog Nucleic Acid Res Mol Biol 31, 315-462. (1984).
  43. Qu, X. Y. et al. Analysis of a 69-kb contiguous genomic sequence at a putative tumor suppressor gene locus on human chromosome 6q27. DNA Seq 9, 189-204. (1998).
  44. Chen, E. Y. et al. The human growth hormone locus: nucleotide sequence, biology, and evolution. Genomics 4, 479-97. (1989).
  45. Zucman-Rossi, J., Legoix, P., Victor, J. M., Lopez, B. & Thomas, G. Chromosome translocation based on illegitimate recombination in human tumors. Proc Natl Acad Sci U S A 95, 11786-91. (1998).
  46. McCombie, W. R. et al. Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nat Genet 1, 348-53. (1992).
  47. Granadino, B., Beltran-Valero de Bernabe, D., Fernandez-Canon, J. M., Penalva, M. A. & Rodriguez de Cordoba, S. The human homogentisate 1,2-dioxygenase (HGO) gene. Genomics 43, 115-22. (1997).
  48. Witke, W. F. et al. Complete structure of the human Gc gene: differences and similarities between members of the albumin gene family. Genomics 16, 751-4. (1993).
  49. Burn, T. C. et al. Analysis of the genomic sequence for the autosomal dominant polycystic kidney disease (PKD1) gene predicts the presence of a leucine-rich repeat. The American PKD1 Consortium (APKD1 Consortium). Hum Mol Genet 4, 575-82. (1995).
  50. Reichel, M. et al. Rapid isolation of chromosomal breakpoints from patients with t(4;11) acute lymphoblastic leukemia: implications for basic and clinical research. Cancer Res 59, 3357-62. (1999).
  51. Guenet, L. et al. Human release factor eRF1: structural organisation of the unique functional gene on chromosome 5 and of the three processed pseudogenes. FEBS Lett 454, 131-6. (1999).
  52. Shiina, T. et al. The beta-1,3-galactosyltransferase-4 (B3GALT4) gene is located in the centromeric segment of the human MHC class II region. Immunogenetics 51, 75-8. (2000).
  53. Wiedemann, L. M., MacGregor, A. & Caldas, C. Analysis of the region of the 5' end of the MLL gene involved in genomic duplication events. Br J Haematol 105, 256-64. (1999).
  54. Kishi, F. & Tabuchi, M. Human natural resistance-associated macrophage protein 2: gene cloning and protein identification. Biochem Biophys Res Commun 251, 775-83. (1998).
  55. Hampe, A., Shamoon, B. M., Gobet, M., Sherr, C. J. & Galibert, F. Nucleotide sequence and structural organization of the human FMS proto-oncogene. Oncogene Res 4, 9-17. (1989).
  56. Kikuti, Y. Y. et al. Physical mapping 220 kb centromeric of the human MHC and DNA sequence analysis of the 43-kb segment including the RING1, HKE6, and HKE4 genes. Genomics 42, 422-35. (1997).
  57. Freitas, E. M. et al. Sequencing of 42kb of the APO E-C2 gene cluster reveals a new gene: PEREC1. DNA Seq 9, 89-100. (1998).
  58. Nelson, J. E. & Krawetz, S. A. Characterization of a human locus in transition. J Biol Chem 269, 31067-73. (1994).
  59. Bock, J. H. et al. Nucleotide sequence analysis of the human KCNJ1 potassium channel locus. Gene 188, 9-16. (1997).
  60. Huber, R. et al. DNA methylation in transcriptional repression of two differentially expressed X-linked genes, GPC3 and SYBL1. Proc Natl Acad Sci U S A 96, 616-21. (1999).
  61. Yoshiura, K. et al. Characterization of a novel gene disrupted by a balanced chromosomal translocation t(2;19)(q11.2;q13.3) in a family with cleft lip and palate. Genomics 54, 231-40. (1998).
  62. Kuivaniemi, H., Tromp, G., Chu, M. L. & Prockop, D. J. Structure of a full-length cDNA clone for the prepro alpha 2(I) chain of human type I procollagen. Comparison with the chicken gene confirms unusual patterns of gene conservation. Biochem J 252, 633-40. (1988).
  63. Iris, F. J. et al. Dense Alu clustering and a potential new member of the NF kappa B family within a 90 kilobase HLA class III segment. Nat Genet 3, 137-45. (1993).
  64. Yoshitake, S., Schach, B. G., Foster, D. C., Davie, E. W. & Kurachi, K. Nucleotide sequence of the gene for human factor IX (antihemophilic factor B). Biochemistry 24, 3736-50. (1985).
  65. Degen, S. J., Rajput, B. & Reich, E. The human tissue plasminogen activator gene. J Biol Chem 261, 6972-85. (1986).
  66. Beckmann, J. S. et al. Identification of muscle-specific calpain and beta-sarcoglycan genes in progressive autosomal recessive muscular dystrophies. Neuromuscul Disord 6, 455-62. (1996).
  67. Bashir, R. et al. A gene related to Caenorhabditis elegans spermatogenesis factor fer-1 is mutated in limb-girdle muscular dystrophy type 2B. Nat Genet 20, 37-42. (1998).
  68. Kimura, Y. et al. Genomic structure and chromosomal localization of GML (GPI-anchored molecule-like protein), a gene induced by p53. Genomics 41, 477-80. (1997).
  69. Brookman, K. W. et al. ERCC4 (XPF) encodes a human nucleotide excision repair protein with eukaryotic recombination homologs. Mol Cell Biol 16, 6553-62. (1996).
  70. Murray, J. et al. Comparative sequence analysis of human minisatellites showing meiotic repeat instability. Genome Res 9, 130-6. (1999).
  71. Furuno, N. et al. Complete nucleotide sequence of the human RCC1 gene involved in coupling between DNA replication and mitosis. Genomics 11, 459-61. (1991).
  72. Jenkins, J. K. et al. Intracellular IL-1 receptor antagonist promoter: cell type-specific and inducible regulatory regions. J Immunol 158, 748-55. (1997).
  73. Bottenus, R. E., Ichinose, A. & Davie, E. W. Nucleotide sequence of the gene for the b subunit of human factor XIII. Biochemistry 29, 11195-209. (1990).
  74. Evans, P. & Kemp, J. Exon/intron structure of the human transferrin receptor gene. Gene 199, 123-31. (1997).
  75. Yatsunami, K. et al. Structure of the L-histidine decarboxylase gene. J Biol Chem 269, 1554-9. (1994).
  76. Ala-Kokko, L. et al. Conservation of the sizes of 53 introns and over 100 intronic sequences for the binding of common transcription factors in the human and mouse genes for type II procollagen (COL2A1). Biochem J 308, 923-9. (1995).
  77. Madeyski, K., Lidberg, U., Bjursell, G. & Nilsson, J. Structure and organization of the human carboxyl ester lipase locus. Mamm Genome 9, 334-8. (1998).
  78. Kelly, A. & Trowsdale, J. Complete nucleotide sequence of a functional HLA-DP beta gene and the region between the DP beta 1 and DP alpha 1 genes: comparison of the 5' ends of HLA class II genes. Nucleic Acids Res 13, 1607-21. (1985).
  79. Lawrance, S. K., Das, H. K., Pan, J. & Weissman, S. M. The genomic organisation and nucleotide sequence of the HLA-SB(DP) alpha gene. Nucleic Acids Res 13, 7515-28. (1985).