Introduction - here we give a brief overview of GeneLynx's purpose and organization.


Text search - if you want to locate a gene using a keyword or other identifier.


Quick Search - searches through all keywords and alphanumeric identifiers (including accession numbers). Simple, fast and powerful.

Advanced search - searches specified identifiers, logically combined in various ways.

NOTE TO THE USERS - you can help us make it better.

BLAST search - if you want to locate a gene in GeneLynx using a query nucleotide or protein sequence.


The contents of a GeneLynx record - on finding your way through the list of links for a particular gene.


User comment submission - how you can help us improve the quality of GeneLynx data by correcting errors you notice and pointing out information that is missing.


Batch retrieval of GeneLynx IDs - now you can submit a list of identifiers (accession numbers, OMIM IDs, LocusLink numbers etc.) and GeneLynx will match them to the corresponding GeneLynx records. The result is a list in either HTML or plain text format, which you can use for linking your database to GeneLynx.


Submitting a new resource for inclusion in GeneLynx - if you are in charge of a database with entries relating to human genes and would like GeneLynx to include links to it, we have tried hard to make this as easy as possible for both you and ourselves. Please read how to submit the data for your resource, and it will be included in GeneLynx in no time!




  • Appendix 1: Resources linked to GeneLynx

  • Appendix 2: Identifiers searchable by Quick Search

  • Appendix 3: Identifiers searchable by Advanced Search

  • Appendix 4: Browser compatibility issues


Introduction


GeneLynx is a service whose aim is to provide the fastest route from an IDENTIFIER for a given HUMAN GENE to all the data available for that gene on the web. It is organized as a set of HYPERLINKS leading to numerous databases and other resources. GeneLynx currently links to 32 resources, and new ones are added regularly. For a complete list of linked resources, see Appendix 1.


An IDENTIFIER to search GeneLynx with is almost any piece of information that pertains to the gene of interest. It can be, for example:


a keyword, usually a word contained in the name or description of the gene, but also any alphanumeric identifier such as an accession number or ID (e.g. Swiss-Prot ID). The list of identifiers that can be used as keywords is given in Appendix 2. They can be used in either Quick Search or Advanced Text Search.

an identifier of a specific type, either numeric or alphanumeric, entered together with its type in the Advanced Text Search form. The list of identifiers that can be searched this way is given in Appendix 3.

a nucleotide or protein sequence, which can be used to locate a gene in the GeneLynx system by searching it against the cDNA and EST sequences associated with GeneLynx records. The searches are performed using BLAST 2.1.

Text search


There are two ways to search by text. In most cases the simpler Quick Search will be more than powerful enough to locate a GeneLynx record for a gene of interest. Use Advanced Search only in cases where


it is not possible to formulate a query that is precise enough using Quick Search, or

you want to search GeneLynx by purely numeric identifiers, such as those of GDB, LocusLink, or OMIM, which are currently not searchable by Quick Search.

Quick Search


The Quick Search can be accessed from either the GeneLynx Home Page or the Text Search page. There is no difference in function between the two.



"Combine terms with" option. Currently the query is split into words, and some characters (punctuation, brackets, special characters) are ignored. Each word is then looked up in the GeneLynx index table.

      If the user has chosen to combine terms (words) with "AND", only the GeneLynx records containing all words from the query are displayed in the result list.

      If the terms were combined with "OR", the list of hits contains GeneLynx records for genes that contain at least one of the terms in the query. Each term is given a score that is inversely related to the number of GeneLynx records sharing that term. For example, the term "SYS_HUMAN" (a Swiss-Prot ID) is associated with only one GeneLynx record, and therefore has a higher score than, e.g., the term "clathrin", which in the current version of GeneLynx is associated with 21 records. The list of hits is sorted by decreasing sum of scores per record.
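The "Combine terms with" behaviour can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the GeneLynx implementation: the in-memory `index` (a hypothetical mapping from lower-cased term to a set of record IDs), the tokenizing pattern, and the score of 1/n for a term shared by n records are all assumptions; the text above only says that the score is inversely related to n.

```python
import re
from collections import defaultdict


def tokenize(query):
    """Split a query into lower-cased words.

    Assumed rule: letters, digits, '_' and '.' form words (so "SYS_HUMAN"
    and "Hs.4888" survive); everything else acts as a separator.
    """
    words = (w.strip(".") for w in re.findall(r"[A-Za-z0-9_.]+", query))
    return [w.lower() for w in words if w]


def search(query, index, combine="AND"):
    """Return matching record IDs, best first.

    index: hypothetical mapping lower-cased term -> set of record IDs.
    """
    terms = tokenize(query)
    if combine == "AND":
        # Only records containing every word of the query.
        sets = [index.get(t, set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []
    # OR: each distinct term shared by n records adds 1/n (assumed formula).
    scores = defaultdict(float)
    for term in set(terms):
        records = index.get(term, set())
        for record in records:
            scores[record] += 1.0 / len(records)
    # Decreasing summed score; ties broken by record ID.
    return sorted(scores, key=lambda r: (-scores[r], r))
```

With a toy index in which "sys_human" points to record 101 and "clathrin" to records 101, 102 and 103, `search("SYS_HUMAN clathrin", index, "AND")` returns `[101]`, while the OR variant returns `[101, 102, 103]`, with 101 first because it also matches the rarer term.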

"Exclude low scoring hits" option. This option only affects searches performed with the "Combine terms with OR" option. Some terms in the index are associated with a large number of GeneLynx entries, e.g. "protein" (9533) or "gene" (1895). Excluding low-scoring hits means removing from the list the hits that contain only such frequent terms. For example, if the query was "divalent iron protein", the top of the list will hold records that contain all three terms, followed by records that contain both "divalent" and "iron", then those that contain only "divalent", then those matching only "iron". If low-scoring hits are excluded, the list will not contain the records matching only "protein".

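A sketch of the "Exclude low scoring hits" filter, again in Python and under explicit assumptions: `index` is a hypothetical term-to-record-set mapping, and the hypothetical `max_records` cutoff decides when a term counts as "frequent" (the text gives no number). A hit survives only if it matches at least one query term that is not frequent.

```python
def exclude_low_scoring(hits, query_terms, index, max_records=1000):
    """Drop hits that match only frequent query terms.

    hits: record IDs in ranked order (e.g. from an OR search).
    index: hypothetical mapping term -> set of record IDs.
    max_records: hypothetical cutoff; a term shared by more records than
    this counts as "frequent" (the real GeneLynx criterion is not stated).
    """
    informative = [t for t in query_terms if 0 < len(index.get(t, ())) <= max_records]
    # Keep a hit only if it contains at least one non-frequent term;
    # the ranked order of the surviving hits is preserved.
    return [r for r in hits if any(r in index[t] for t in informative)]
```

Applied to a ranked OR result for "divalent iron protein", this keeps the records that mention "divalent" or "iron" and drops those that contain only "protein", provided "protein" lies above the cutoff.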

Advanced Search


The Advanced text search can be accessed from the Text Search page. It is meant to be used when you cannot formulate the query precisely enough with Quick Search, or for searching specific numeric identifiers.


Advanced Search Form


The terms can be combined with "AND" (lists the records that contain all the identifiers), "OR" (lists the records that contain at least one of the given identifiers), or "BUTNOT" (lists the records that contain the first identifier, but none of the identifiers below it).
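The three operators map directly onto set operations. The Python sketch below assumes each identifier entered in the form has already been looked up and turned into a set of GeneLynx record IDs (the `hit_sets` argument is hypothetical); it illustrates the semantics only, not how GeneLynx evaluates the query.

```python
def combine(hit_sets, operator):
    """Combine per-identifier hit sets as described for the Advanced Search form.

    hit_sets: hypothetical list of sets of GeneLynx record IDs, one per
    identifier, in the order the identifiers appear in the form.
    """
    if not hit_sets:
        return set()
    first, rest = hit_sets[0], hit_sets[1:]
    if operator == "AND":
        return first.intersection(*rest)  # records with every identifier
    if operator == "OR":
        return first.union(*rest)  # records with at least one identifier
    if operator == "BUTNOT":
        # Records with the first identifier but none of the ones below it.
        return first.difference(*rest)
    raise ValueError(f"unknown operator: {operator}")
```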


NOTE TO THE USERS


We plan to evolve the GeneLynx search systems described above based on your feedback. If you need a search feature that you know you or others would use often, please send your suggestions to Boris Lenhard. We feel that many of the more "intelligent" search algorithms are slow and unnecessarily fuzzy, retrieving too many irrelevant hits, so we would like our users to guide us in the right direction.


BLAST Search


GeneLynx provides a basic BLAST search utility specifically designed for retrieving GeneLynx records.




There are three ways to submit a sequence:


Paste the sequence: Paste the sequence from the clipboard. The sequence can be in raw (sequence only) or FASTA format (the ID line is simply ignored). It can also contain numbers and spaces (which are both ignored), which makes direct pasting of the sequence portions of GenBank/EMBL/Swiss-Prot records possible.


Enter a valid nucleotide or protein accession number: the accession number of any sequence that can be retrieved from NCBI can be used.


Enter the name of a sequence file. If the sequence that you want to submit is in a local file on your computer, enter its filename with the full path. The sequence can again be in raw or FASTA format. If the file contains multiple sequences in FASTA format, only the first one is used.
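The clean-up rules described above for pasted text and uploaded files can be sketched as follows. The function name `clean_sequence` is hypothetical, and upper-casing the result is an added assumption; otherwise the sketch only applies the stated rules: the FASTA ID line is ignored, numbers and whitespace are removed, and only the first FASTA record is kept.

```python
def clean_sequence(text):
    """Extract a bare sequence from pasted text or file contents.

    Applies the rules above: a FASTA ID line is ignored, numbers and
    whitespace are dropped, and only the first FASTA record is used.
    Upper-casing the result is an extra assumption.
    """
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        body = []
        for line in lines[1:]:
            if line.startswith(">"):
                break  # start of a second FASTA record: ignore the rest
            body.append(line)
    else:
        body = lines  # raw format: every line is sequence
    joined = "".join(body)
    return "".join(ch for ch in joined if not (ch.isdigit() or ch.isspace())).upper()
```

For example, pasting the two lines `1 atggcg tacgat` and `13 ccgt` from the sequence part of a GenBank record yields `ATGGCGTACGATCCGT`.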


Other options available are:



Sequence type: Hopefully this is obvious. Soon the default type will be "Auto", i.e. GeneLynx will try to determine the type from the sequence itself.


E-value threshold: a basic option for regulating the significance threshold of BLAST hits. If you think it could be controlled in a better way, please send your suggestions to Boris Lenhard.
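How the planned "Auto" sequence type will work is not described, so the following is only one plausible heuristic, not the GeneLynx method: count the fraction of letters that are A, C, G, T, U or N, and call the sequence nucleotide if that fraction reaches an arbitrary 90% threshold. Both the rule and the threshold are assumptions.

```python
def guess_sequence_type(seq, threshold=0.9):
    """Guess 'nucleotide' or 'protein' for a cleaned sequence.

    Hypothetical heuristic: the fraction of A, C, G, T, U and N among the
    letters is compared with an arbitrary threshold (90% by default).
    """
    letters = [ch for ch in seq.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(ch in "ACGTUN" for ch in letters)
    return "nucleotide" if nucleotide_like / len(letters) >= threshold else "protein"
```

Short or unusually composed protein sequences can fool a rule like this, which is one reason an explicit Sequence type choice remains useful.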


The submitted sequence is searched against a set of (1) all cDNAs associated with GeneLynx records, and (2) assembled EST sequences of EST-only GeneLynx records. A typical list of hits looks as follows:




The hits are sorted by increasing E-value, with only one (best-scoring) cDNA per GeneLynx record shown. The link to the NCBI record for the best-scoring cDNA sequence is given in the rightmost column.
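The way the hit list is condensed (E-value cut-off, one best sequence per GeneLynx record, sorting by increasing E-value) can be sketched in Python. The `Hit` record type, its field names and the default threshold of 1e-10 are assumptions chosen for illustration, not GeneLynx defaults.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    record_id: int   # GeneLynx record the matched cDNA/EST belongs to
    accession: str   # accession of the matched sequence (hypothetical)
    evalue: float


def best_hits(hits, evalue_threshold=1e-10):
    """Keep the best-scoring hit per GeneLynx record, sorted by E-value.

    The default threshold is an arbitrary placeholder.
    """
    best = {}
    for hit in hits:
        if hit.evalue > evalue_threshold:
            continue  # not significant enough
        current = best.get(hit.record_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.record_id] = hit
    # Increasing E-value: the most significant record comes first.
    return sorted(best.values(), key=lambda h: h.evalue)
```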


The contents of a GeneLynx record


Each GeneLynx record is a set of hyperlinks and, as a rule, corresponds to one human gene. If you happen to find an entry that appears to violate this rule, you are strongly encouraged to submit a comment for that record.


Resource categories


The links point to different resources, which are divided into the following resource categories:


Summary pages - a somewhat arbitrary category, containing links to resources that provide a summary for the gene and/or an extensive set of further links. Most of these are gene-based like GeneLynx itself. One notable exception is Swiss-Prot, which is protein-based (and also a member of the "Protein sequences" category, making it the only resource that is duplicated on the page), but which contains a wealth of links to other resources.


Genomic Resources - resources that provide information on the gene in the context of its location in the human genome. It includes resources with genomic sequences, chromosome maps etc.


Transcripts - collection of resources on mRNA/cDNA sequences.


Protein sequences - major protein sequence databases.


Protein structure and domains - a collection of links on protein tertiary structure, protein domains and patterns.


Protein function and disease links - this category will probably be split in two in future releases of GeneLynx. It contains links to resources on enzyme function, metabolic pathways and disease associations of a given gene product.


Homologs - this is a growing category of links to information on nonhuman genes and proteins homologous to the gene of current GeneLynx record.


ESTs - although ESTs are also transcribed sequences, their abundance and chaotic content led us to put them in a separate group. It contains links to EST sequences and assembled EST clusters.


The categories are subject to change (as is anything else, if necessary).


Information available for the links


The layout of a GeneLynx category is:




Since GeneLynx is envisioned as a collection of links, and not a summary of textual information about the gene, the current version does not contain descriptions of the links' content. Instead, links open in separate windows, so that several resource pages can be inspected simultaneously while retaining access to the other links.


Note: Once GeneLynx reaches Release 1.0, it will also contain "mouse-over" descriptions of link contents (i.e. upon placing the mouse pointer on, e.g., a Swiss-Prot link, the contents of the Swiss-Prot record's DE field will appear in the browser status line, and above the link itself in browsers supporting that feature). We would appreciate users' feedback on the potential usefulness of that feature, since it is not trivial to implement.


User Comment Submission


We believe that the user comment submission system is one of the major features that make GeneLynx more usable than similar resources. It is simple and self-explanatory, but here is a quick tour of its features.


The comment submission form is accessed by clicking on the link that looks like this:




and is located immediately below the name and the locus of the gene. By clicking on the above link on the page of GeneLynx record #2611, you access its comment submission form:




The fields are:


Type of comment - Choose among 'Error report/correction', 'Supplementary information' or 'Other'. This field is for the curator's orientation only.


Comment title - Here you state the reason for your comment.


Enter comment - Here you type the text of the actual comment. You are encouraged to be as detailed as possible here, since that will make the curator's life easier. Feel free to include web addresses and literature references that support your comment, if the problem is not trivial.


Attach file(s) to the comment - along the lines of being as detailed as possible, you may want to include some files (documents, alignments, BLAST search reports etc.) that corroborate the claim you make in the comment.


E-mail, Name, Company/Institution - I do not need to explain those, do I :)


Remember, all of the form fields are there so that the curator can make corrections more quickly. In the great majority of cases the curator will be much less knowledgeable about a particular gene than the person who sets out to send a comment on it.


When you submit a comment, you will receive an E-mail message confirming it. The comment will simultaneously appear at the bottom of the Comment form page, so that other GeneLynx users will be able to read it (and all the previous comments on that record) before submitting their own. In the above example (#2611), there were no previous comments, so after our comment submission and a subsequent visit to the #2611 comment page, the bottom of the page looked like this:




In the above example, two records with the same set of links have been detected, which usually means that the GeneLynx clustering algorithm has made a rare error of not grouping all cDNAs of the FALZ gene into a single cluster. A GeneLynx curator will inspect the cDNAs for both GeneLynx records (#2611 and #20402) and, if they really belong to the same gene, merge them into a single one. In this case, the merged record would be #2611, and #20402 would be retired. If the user accesses the #20402 page later (e.g. via a previously made bookmark), she will reach a page with a retirement notice and a link pointing to the #2611 record.
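A retirement notice can be thought of as a mapping from a retired record ID to the record it was merged into. The sketch below, with a hypothetical `retired_to` dictionary, follows such links until it reaches an active record; following chains of several merges is an assumption beyond the single-merge case described above.

```python
def resolve_record(record_id, retired_to):
    """Follow retirement notices to the record that is currently active.

    retired_to: hypothetical mapping retired ID -> ID it was merged into.
    Chains (A merged into B, later B into C) are followed to the end.
    """
    seen = set()
    while record_id in retired_to:
        if record_id in seen:
            raise ValueError("circular merge history")
        seen.add(record_id)
        record_id = retired_to[record_id]
    return record_id
```

For the example above, `resolve_record(20402, {20402: 2611})` returns `2611`, and an active ID such as 2611 is returned unchanged.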


Linking to GeneLynx - batch retrieval of GeneLynx IDs


(This service is currently experimental. We need your feedback.)



This service might be of interest to you in the following cases:


Suppose you have performed a database search or an experiment involving a large number of nucleotide or protein sequences, and obtained a list of identifiers (nucleotide accession numbers, protein sequence identifiers, HUGO numbers of genes etc.). GeneLynx enables you to paste or upload this list to the Batch GeneLynx page and obtain a list in which these identifiers are associated with the appropriate GeneLynx IDs and descriptions.

You have a list of identifiers from a resource that is included in GeneLynx, and want to retrieve a list of GeneLynx numbers for linking from your resource's web pages.

In either case, all you need is a list of identifiers (nucleotide accession numbers, protein sequence identifiers etc.) that you submit by pasting or uploading it on the "Linking to GeneLynx" page. E.g. if you have a collection of Swiss-Prot IDs:




Paste the IDs (or select a plain text file containing the IDs to upload). Select the output format and the sorting criterion. (If you choose to sort by submitted identifiers and the submitted identifiers are numbers, a numerical sort is automatically performed; otherwise the identifiers are sorted alphabetically.)
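The sorting rule can be expressed in a few lines of Python. Treating the list as numeric only when every submitted identifier consists of digits is an assumption; the text does not say how mixed lists are handled.

```python
def sort_identifiers(ids):
    """Sort numerically if every identifier is a number, else alphabetically."""
    if ids and all(i.isdigit() for i in ids):
        return sorted(ids, key=int)  # "9" before "10"
    return sorted(ids)  # plain alphabetical (string) order
```

For `["10", "9", "100"]` this gives `["9", "10", "100"]`, whereas a plain alphabetical sort would give `["10", "100", "9"]`.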


The HTML output looks like this:




It is meant for browsing, and provides descriptions and links to the corresponding GeneLynx records. The HTML page can also be saved locally for later use.


The text format is a two-column, tab-delimited format that is easily parsed by computer programs or loaded into a spreadsheet or database table:


Y167_HUMAN	109
SMN_HUMAN	189
KEAP_HUMAN	193
HK32_HUMAN	307
NR41_HUMAN	478
CIB1_HUMAN	683
KP58_HUMAN	749
PTPG_HUMAN	758
GCSR_HUMAN	809
CCAC_HUMAN	994
RTN1_HUMAN	1475
MAFG_HUMAN	1522
STX4_HUMAN	1811
N4BM_HUMAN	2114


...
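Reading the text output back into a program takes only a few lines, for example in Python with the standard `csv` module. The function name is hypothetical, and keeping a list of GeneLynx IDs per identifier is an assumption in case one identifier matches more than one record.

```python
import csv
import io


def parse_batch_output(text):
    """Map each submitted identifier to its GeneLynx ID(s).

    Assumes one identifier / GeneLynx ID pair per line, separated by a tab.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        identifier, genelynx_id = row[0].strip(), row[1].strip()
        mapping.setdefault(identifier, []).append(int(genelynx_id))
    return mapping
```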



Submitting a new resource for inclusion in GeneLynx


(This service is currently experimental. We need your feedback.)


To facilitate (and encourage) addition of new database resources to the collection of GeneLynx links by their authors and curators, we have developed a (highly experimental) standardized procedure for external resource submission. Currently, a resource to be submitted must meet the following requirements:


Its individual records must be accessible on the web via a URL that includes their unique identifier.

Each unique identifier should be associated with an identifier belonging to a resource that is already included in GeneLynx. The associations should be listed in a two-column plain text file (tab-, space- or comma-delimited).



NEW001	2049
NEW002	79154
NEW003	7067
NEW004	64979
NEW005	55438
NEW006	5324
NEW007	57528
NEW008	55116
NEW009	215
NEW010	10421
NEW011	29056
NEW012	10072
NEW013	56617
NEW014	29108
NEW015	7442

...

In the above case, the second column contains LocusLink IDs (the order of columns can be reversed). Now go to the GeneLynx resource submission page and fill in the submission form as follows:




General Information about the resource is necessary for the proper labelling and categorization of the new resource.


Resource name is the name that will appear in the resource label on a GeneLynx record page.


Resource home page is the URL of the new resource's home page, preferably with additional information about the resource.


GeneLynx Category is the label for a group of related resources (see The contents of a GeneLynx record).


The data file section is where you specify the location of the newres2loc file. You can either specify its URL on an HTTP or anonymous FTP server, or upload it from your local file system. However, we recommend that you specify the URL, preferably with the "Check for updates" option. That way, the file will be reloaded for each GeneLynx update, so you can just keep the file current instead of resubmitting it to GeneLynx each time it changes.


In "The existing identifier is a" menu, choose the identifier to which your resource's identifier is matched in the file you are submitting (in the above case "LocusLink number").


The "Existing identifier is in column" should be obvious. In our example, the existing identifier is LocusLink number which is in column 2 of the newres2loc file.
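Before submitting, it may be worth checking the association file locally. The Python sketch below is a hypothetical pre-flight check, not part of GeneLynx: it accepts tab-, space- or comma-delimited lines, uses the `existing_column` parameter (1 or 2) to mirror the "Existing identifier is in column" setting, and rejects lines that do not have exactly two columns.

```python
import re


def read_associations(text, existing_column=2):
    """Read a two-column association file into (new_id, existing_id) pairs.

    existing_column (1 or 2) says which column holds the identifier
    already known to GeneLynx, e.g. a LocusLink number.
    """
    if existing_column not in (1, 2):
        raise ValueError("existing_column must be 1 or 2")
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Tabs, spaces and commas are all accepted as delimiters.
        fields = [f for f in re.split(r"[\t ,]+", line.strip()) if f]
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(fields)}")
        existing = fields[existing_column - 1]
        new = fields[2 - existing_column]
        pairs.append((new, existing))
    return pairs
```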


The Linking section is where you type the URL-forming rule for retrieving records of the new resource. Type a triple hash (###) where the actual identifier should be inserted.
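The ### substitution can be illustrated with a short Python function. The template URL in the usage note is a made-up example, and percent-encoding the identifier with `urllib.parse.quote` is an assumption; GeneLynx may insert identifiers verbatim.

```python
from urllib.parse import quote


def record_url(template, identifier):
    """Build a record URL by replacing every ### in the template."""
    if "###" not in template:
        raise ValueError("template must contain the ### placeholder")
    # Percent-encode the identifier (assumption) so spaces etc. are URL-safe.
    return template.replace("###", quote(identifier, safe=""))
```

With the hypothetical template `http://www.example.org/cgi-bin/show?id=###`, `record_url(template, "NEW001")` returns `http://www.example.org/cgi-bin/show?id=NEW001`.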


In Contact information, your E-mail address is required in order to receive the confirmation of successful submission and of final inclusion in GeneLynx by the curator. A GeneLynx curator may also contact you for additional information.


After you submit the required information, you will be taken to the Submitted resource test page, which looks like this:




Below the table containing some simple figures about the submission (you should check them to make sure they make sense), there is the Test table with up to 10 randomly picked pairs of newly formed associations between new resource identifiers and GeneLynx records. There are three things you should check:


that the new resource identifiers link to the proper web addresses; click on them to open the corresponding web pages in a new window

that the selected new resource identifiers are associated with the appropriate GeneLynx records; click on a GeneLynx ID in the second column to open the GeneLynx record

after you open the GeneLynx record page, locate the new resource on the page. It should not be too hard - its Resource label is the only one coloured red:



Check that it is in the desired category and that the links work as they should.


If everything is in order, click "Confirm" on the Submitted resource test page, and you are done. You will receive an E-mail message confirming the successful submission, and a GeneLynx curator will notify you of further developments.


Appendix 1


Resources linked to GeneLynx

Summary pages
  • UniGene
  • LocusLink
  • GeneCards
  • Swiss-Prot
  • KEGG gene
  • EGAD
  • euGenes
  • MIPS
  • HumanPSD

Genomic resources
  • Genomic sequences
  • GDB
  • GenAtlas
  • Ensembl gene

Transcripts
  • RefSeq
  • cDNA sequences
  • Ensembl transcript

Protein sequences
  • Swiss-Prot
  • TrEMBL
  • PIR
  • GenPept

Protein structure and domains
  • PDB
  • Closest PDB structure - a link to the most similar available PDB structure for a given GeneLynx gene product, as determined by HSSP.
  • HSSP
  • InterPro
  • PRINTS
  • PFAM
  • BLOCKS
  • SBASE
  • PROSITE

Protein function and disease links
  • GeneOntology (at MGI)
  • MEROPS proteases
  • ENZYME database
  • WIT
  • BRENDA
  • OMIM
  • GeneClinics

Networks and Pathways
  • KEGG pathway
  • PubGene

Homologs
  • Nucleotide
  • Protein
  • UniGene
  • LocusLink
  • MGD

ESTs
  • STACK cluster
  • EST sequences






Appendix 2


Identifiers searchable by Quick Search




Identifier - Remark

Keyword - Any word from the description/definition lines of the UniGene, LocusLink, Swiss-Prot, TrEMBL and HUGO records associated with GeneLynx records.

Nucleotide accession number - GenBank, EMBL, DDBJ and EST sequences; human only. If the query returns no hits, try a BLAST search with that accession number.

Protein accession number - Swiss-Prot, TrEMBL, PIR and GenPept sequences. If the query returns no hits, try a BLAST search with that accession number.

Swiss-Prot ID - e.g. SYS_HUMAN

PIR ID - not the same as the PIR accession number

PDB ID

UniGene ID - for Quick Search, use the full identifier, e.g. Hs.4888

HUGO symbol - official human gene symbols

Other gene symbols - gene symbols other than the official (HUGO) ones, but used in the literature and databases



Appendix 3


Identifiers searchable by Advanced Search


Under construction...coming soon.


Appendix 4


Browser compatibility issues


GeneLynx has been tested on several platforms with different web browsers. While we tried to make the interface as browser-independent as possible, some browsers just do not allow that. However, all the detected incompatibilities are cosmetic, making the pages less pretty than we intended them to be :) without affecting access to any of the GeneLynx capabilities. Here is a table that summarizes our experience:


MS Internet Explorer
  Version: 4.0, 5.0, 5.5
  OS: MS Windows (9x, NT, 2000)
  Known problems: None

Netscape
  Version: 6.0
  OS: Linux 2.2.17, Linux 2.4.4
  Known problems: None

Netscape Communicator
  Version: 4.5, 4.61, 4.74
  OS: Linux 2.2.x, Tru64 Unix (Compaq)
  Known problems:
    • The page layout is altered incorrectly upon window resizing
    • Mouse-over underlining of links does not work
    • There are no boxes around items in the left-side menu on all pages

Konqueror
  Version: 2.1.1, 2.1beta2
  OS: Linux 2.2.x, Tru64 Unix (Compaq)
  Known problems: Occasional minor occurrences of incorrect font attribute rendering; otherwise much better and faster than Netscape 4.x

Lynx
  Version: several
  OS: Linux 2.2.x, Tru64 Unix (Compaq)
  Known problems: Just for those of Spartan spirit. Most features work, but the layout is hard to follow. (No, we do not intend to work toward improving it - we strongly believe that a really negligible number of Lynx users will ever have heard of GeneLynx, and vice versa.)

If you experience problems other than those listed in the table, please let us know.



Send comments and questions to Boris Lenhard