Introduction - here we give a brief overview of GeneLynx's purpose and organization.
Text search - if you want to locate a gene using a keyword or other identifier.
Quick Search - searches through all keywords and alphanumeric identifiers (including accession numbers). Simple, fast and powerful.
Advanced search - searches specified identifiers, logically combined in various ways.
NOTE TO THE USERS - you can help us make it better
BLAST search - if you want to locate a gene in GeneLynx using a query nucleotide or protein sequence.
The contents of a GeneLynx record - on finding your way through the list of links for a particular gene.
User comment submission - how you can help us improve the quality of GeneLynx data by correcting errors you notice and pointing out missing information.
Batch retrieval of GeneLynx IDs - now you can submit a list of identifiers (accession numbers, OMIM IDs, LocusLink numbers etc.) and GeneLynx will match them to the corresponding GeneLynx records. The result is a list in either HTML or plain-text format, which you can use for linking your database to GeneLynx.
Submitting a new resource for inclusion in GeneLynx - if you are in charge of a database with entries relating to human genes and would like GeneLynx to include links to it, we have tried hard to make this as easy as possible for both you and us. Please read how to submit the data for your resource, and it will be included in GeneLynx in no time!
- Appendix 1: Resources linked to GeneLynx
- Appendix 2: Identifiers searchable by Quick Search
- Appendix 3: Identifiers searchable by Advanced Search
- Appendix 4: Browser compatibility issues
Introduction
GeneLynx is a service whose aim is to provide the fastest route from an IDENTIFIER for a given HUMAN GENE to all the data available for that gene on the web. It is organized as a set of HYPERLINKS leading to numerous databases and other resources. GeneLynx currently links to 32 resources, and new ones are added regularly. For a complete list of linked resources, see Appendix 1.
An IDENTIFIER to search GeneLynx with can be almost any piece of information that pertains to the gene of interest, for example:
a keyword, usually a word contained in the name or description of the gene, but also any alphanumeric identifier such as an accession number or ID (e.g. Swiss-Prot ID). The list of identifiers that can be used as keywords is given in Appendix 2. They can be used in either Quick Search or Advanced Text Search.
an identifier of a specific type, either numeric or alphanumeric, entered together with its type in the Advanced Text Search form. The identifiers that can be searched this way are listed in Appendix 3.
a nucleotide or protein sequence, which is used to locate a gene in GeneLynx by searching it against the cDNA and EST sequences associated with GeneLynx records. The searches are performed using BLAST 2.1.
Text search
There are two ways to perform a text search. In most cases the simpler Quick Search will be more than powerful enough to locate the GeneLynx record for a gene of interest. Use Advanced Search only in cases where
it is not possible to formulate a query that is precise enough using Quick Search, or
you want to search GeneLynx by purely numeric identifiers, such as those of GDB, LocusLink, or OMIM, which are currently not searchable by Quick Search.
Quick Search
The Quick Search can be accessed from either the GeneLynx Home Page or the Text Search page. The two work identically.
"Combine terms with" option. Currently the query is split into words, and some characters (punctuation, brackets, special characters) are ignored. Each word is then looked up in the GeneLynx index table.
If the user has chosen to combine terms (words) with "AND", only those GeneLynx records containing all words from the query are displayed in the result list.
If the terms are combined with "OR", the list of hits contains the GeneLynx records that contain at least one of the query terms. Each term is given a score that is inversely related to the number of GeneLynx records sharing the term. For example, the term "SYS_HUMAN" (a Swiss-Prot ID) is associated with only one GeneLynx record, and therefore has a higher score than, e.g., the term "clathrin", which in the current version of GeneLynx is associated with 21 records. The list of hits is sorted by decreasing sum of scores per record.
"Exclude low scoring hits" option. This option only affects searches performed with the "Combine terms with OR" option. Some terms in the index are associated with a large number of GeneLynx entries, e.g. "protein" (9533) or "gene" (1895). Excluding low-scoring hits means removing from the list the hits that contain only such frequent terms. For example, if the query was "divalent iron protein", the top of the list will hold records that contain all three terms, followed by records that contain both "divalent" and "iron", then those that contain only "divalent", then those matching only "iron". If low-scoring hits are excluded, the list will not contain the records matching only "protein".
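The behaviour of the two Quick Search options can be illustrated with a small Python sketch. It is not the GeneLynx implementation: the toy index and its record numbers are hypothetical, the tokenizer's exact character set is an assumption, the score formula 1/n (n = number of records sharing a term) is only one choice consistent with "inversely related", and FREQUENT_CUTOFF is a hypothetical threshold for what counts as a "frequent" term.

```python
import re
from collections import defaultdict

# Hypothetical toy index: term -> set of GeneLynx record IDs (not real GeneLynx data).
INDEX: dict[str, set[int]] = {
    "divalent": {1, 2},
    "iron": {1, 3, 4},
    "protein": {1, 2, 3, 5, 6, 7, 8, 9},
}
# Hypothetical threshold: a term found in more records than this is "frequent".
FREQUENT_CUTOFF = 5


def tokenize(query: str) -> list[str]:
    """Split a query into lower-cased words, ignoring punctuation and brackets."""
    words = (w.strip(".").lower() for w in re.findall(r"[\w.]+", query))
    return list(dict.fromkeys(w for w in words if w))  # de-duplicate, keep order


def term_score(term: str) -> float:
    """Assumed score: 1 / (number of records sharing the term)."""
    n = len(INDEX.get(term, ()))
    return 1.0 / n if n else 0.0


def quick_search(query: str, combine: str = "OR",
                 exclude_low_scoring: bool = False) -> list[int]:
    terms = tokenize(query)
    if not terms:
        return []
    hit_sets = [INDEX.get(t, set()) for t in terms]
    if combine == "AND":
        records = set.intersection(*hit_sets)   # records containing every term
    elif combine == "OR":
        records = set.union(*hit_sets)          # records containing any term
    else:
        raise ValueError(f"unknown combine mode: {combine!r}")

    scores: dict[int, float] = defaultdict(float)
    matched: dict[int, set[str]] = defaultdict(set)
    for term, hits in zip(terms, hit_sets):
        for rec in hits & records:
            scores[rec] += term_score(term)
            matched[rec].add(term)

    if combine == "OR" and exclude_low_scoring:
        # Drop records whose matched terms are all frequent ones.
        frequent = {t for t in terms if len(INDEX.get(t, ())) > FREQUENT_CUTOFF}
        records = {r for r in records if not matched[r] <= frequent}

    # Highest summed score first; ties broken by record ID for a stable order.
    return sorted(records, key=lambda r: (-scores[r], r))
```

With this toy index, `quick_search("divalent iron protein", exclude_low_scoring=True)` keeps records 1 to 4 and drops records 5 to 9, which match only "protein".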
Advanced Search
The Advanced Text Search can be accessed from the Text Search page. It is meant for cases where you cannot formulate a query precisely enough using Quick Search, or for searching specific numeric identifiers.
Advanced Search Form
The terms can be combined with "AND" (lists the records that contain all of the identifiers), "OR" (lists the records that contain any of the given identifiers), or "BUTNOT" (lists the records that contain the first identifier but none of the identifiers below it).
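The three operators correspond to plain set operations on the record IDs matched by each identifier. The Python sketch below assumes, hypothetically, that each identifier row of the form has already been resolved to a set of GeneLynx record IDs; how that lookup is done is not described here, and the function name is illustrative only.

```python
def advanced_search(hit_sets: list[set[int]], operator: str) -> set[int]:
    """Combine per-identifier hit sets as the Advanced Search form describes.

    hit_sets[0] holds the record IDs matching the first identifier and
    hit_sets[1:] those matching the identifiers below it (hypothetical input).
    """
    if not hit_sets:
        return set()
    first, rest = set(hit_sets[0]), hit_sets[1:]
    if operator == "AND":      # records containing all identifiers
        return first.intersection(*rest)
    if operator == "OR":       # records containing any of the identifiers
        return first.union(*rest)
    if operator == "BUTNOT":   # records with the first identifier but none of the others
        return first.difference(*rest)
    raise ValueError(f"unknown operator: {operator!r}")
```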
NOTE TO THE USERS
The GeneLynx search systems described above are planned to evolve based on feedback provided by you. If you need a search feature that you know you or others would use often, please send your suggestions to Boris Lenhard. We feel that many of the more "intelligent" search algorithms are slow and unnecessarily fuzzy, retrieving too many irrelevant hits, so we would like our users to guide us in the right direction.
BLAST Search
GeneLynx provides a basic BLAST search utility specifically designed for retrieving GeneLynx records.
There are three ways to submit a sequence:
Paste the sequence: Paste the sequence from the clipboard. The sequence can be in raw (sequence only) or FASTA format (the ID line is simply ignored). It can also contain numbers and spaces (which are both ignored), which makes direct pasting of the sequence portions of GenBank/EMBL/Swiss-Prot records possible.
Enter a valid nucleotide/protein accession number: the accession number of any sequence that can be retrieved from NCBI.
Enter the name of a sequence file. If the sequence that you want to submit is in a local file on your computer, enter its filename with the full path. The sequence can be in raw or FASTA format. If the file contains multiple sequences in FASTA format, only the first one is used.
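The input rules for pasted text and sequence files can be sketched in Python as follows. This is an illustration of the rules stated above, not GeneLynx code; the function names are hypothetical, and upper-casing the result is an added assumption.

```python
def clean_pasted_sequence(text: str) -> str:
    """Pasted text: skip FASTA ID lines, ignore digits and whitespace."""
    residues = []
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # the FASTA ID line is simply ignored
        residues.extend(ch for ch in line if not (ch.isdigit() or ch.isspace()))
    return "".join(residues).upper()


def first_sequence(lines) -> str:
    """Raw or FASTA lines: return only the first sequence."""
    residues = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header or residues:
                break  # a second record starts here; stop reading
            seen_header = True
            continue
        residues.append("".join(line.split()))
    return "".join(residues).upper()


def first_sequence_from_file(path: str) -> str:
    with open(path) as handle:
        return first_sequence(handle)
```

Because digits and spaces are dropped, the numbered sequence portion of a GenBank record (e.g. `1 acgtacgtac gtacgt`) reduces to the bare sequence.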
Other options available are:
Sequence type: Hopefully this is obvious. Soon the default type will be "Auto", i.e. GeneLynx will try to determine the type from the sequence itself.
E-value threshold: Basic option for regulating the threshold significance of BLAST hits. If there are users who think it could be controlled in a more adequate manner, please send your suggestions to Boris Lenhard.
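The text does not say how the planned "Auto" sequence type will be determined. One common heuristic, assumed here purely for illustration, is to treat a sequence as nucleotide when nearly all of its letters are A, C, G, T, U or N; the 90% threshold and the function name are hypothetical.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Hypothetical "Auto" heuristic: mostly nucleotide letters -> nucleotide."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"
```

A short protein sequence rich in A and T could in principle be misclassified, which is one reason an explicit type choice remains useful.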
The submitted sequence is searched against a set of (1) all cDNAs associated with GeneLynx records, and (2) assembled EST sequences of EST-only GeneLynx records. A typical list of hits looks as follows:
The hits are sorted by increasing E-value, with only one (the best-scoring) cDNA shown per GeneLynx record. The link to the NCBI record for the best-scoring cDNA sequence is given in the rightmost column.
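Keeping one best-scoring cDNA per record and sorting by E-value can be sketched as below. The `Hit` fields are a hypothetical representation of a BLAST hit already mapped to its GeneLynx record; they are not a GeneLynx data structure.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    genelynx_id: int      # GeneLynx record the matching cDNA belongs to
    cdna_accession: str   # accession of the matching cDNA
    evalue: float         # BLAST E-value (lower is better)


def best_hit_per_record(hits) -> list[Hit]:
    """Keep the lowest-E-value hit for each record, sorted by increasing E-value."""
    best: dict[int, Hit] = {}
    for hit in hits:
        current = best.get(hit.genelynx_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.genelynx_id] = hit
    return sorted(best.values(), key=lambda h: h.evalue)
```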
The contents of a GeneLynx record
GeneLynx groups its hyperlinks into GeneLynx records, and as a rule each record corresponds to one human gene. If you happen to find an entry that appears to violate this rule, you are strongly encouraged to submit a comment for that record.
Resource categories
The links point to different resources, which are divided into the following categories:
Summary pages - a somewhat arbitrary category, containing links to resources that provide a summary for the gene and/or an extensive set of further links. Most of these are gene-based like GeneLynx itself. One notable exception is Swiss-Prot, which is protein-based (it is also a member of the "Protein sequences" category, making it the only resource that appears twice on the page), but which contains a wealth of links to other resources.
Genomic Resources - resources that provide information on the gene in the context of its location in the human genome. It includes resources with genomic sequences, chromosome maps etc.
Transcripts - collection of resources on mRNA/cDNA sequences.
Protein sequences - major protein sequence databases.
Protein structure and domains - a collection of links on protein tertiary structure, protein domains and patterns.
Protein function and disease links - this category will probably be split in two in future releases of GeneLynx. It contains links to resources on enzyme function, metabolic pathways and disease associations of a given gene product.
Homologs - this is a growing category of links to information on nonhuman genes and proteins homologous to the gene of current GeneLynx record.
ESTs - although ESTs are also transcribed sequences, their abundance and chaotic content led us to put them in a separate group. It contains links to EST sequences and assembled EST clusters.
The categories are subject to change (as is anything else, if necessary).
Information available for the links
The layout of a GeneLynx category is:
Since GeneLynx is envisioned as a collection of links, and not a summary of textual information about the gene, the current version does not contain descriptions of the links' content. Instead, links open in separate windows, so that several resource pages can be inspected simultaneously while retaining access to the other links.
Note: Once GeneLynx reaches Release 1.0, it will also contain "mouse-over" descriptions of link contents (i.e. upon placing the mouse pointer on, e.g., a Swiss-Prot link, the contents of the Swiss-Prot record's DE field will appear in the browser status line, and above the link itself in browsers supporting that feature). We would appreciate users' feedback on the potential usefulness of that feature, since it is not trivial to implement.
User Comment Submission
We believe that the user comment submission system is one of the major features that make GeneLynx more usable than similar resources. It is simple and self-explanatory, but here is a quick tour of its features.
The comment submission form is accessed by clicking on the link that looks like this:
and is located immediately below the name and the locus of the gene. By clicking on the above link on the page of GeneLynx record #2611, you access its comment submission form:
The fields are:
Type of comment - Choose among 'Error report/correction', 'Supplementary information' or 'Other'. This field is for the curator's orientation only.
Comment title - Here you state the reason for your comment.
Enter comment - Here you type the text of the actual comment. You are encouraged to be as detailed as possible, since that will make the curator's life easier. If the problem is not trivial, feel free to include web addresses and literature references that support your comment.
Attach file(s) to the comment - along the lines of being as detailed as possible, you may want to include some files (documents, alignments, BLAST search reports etc.) that corroborate the claim you make in the comment.
E-mail, Name, Company/Institution - I do not need to explain those, do I :)
Remember, all of the form fields are there so that the curator can make corrections more quickly. In the great majority of cases the curator will be much less knowledgeable about a particular gene than the person who sets out to send a comment on it.
When you submit a comment, you will receive an E-mail message confirming it. The comment will simultaneously appear at the bottom of the Comment form page, so that other GeneLynx users will be able to read it (and all previous comments on that record) before submitting their own. In the above example (#2611), there were no previous comments, so after our comment submission, the bottom of the #2611 comment page looked like this:
In the above example, two records with the same set of links have been detected, which usually means that the GeneLynx clustering algorithm has made the rare error of not grouping all cDNAs of the FALZ gene into a single cluster. A GeneLynx curator will inspect the cDNAs for both GeneLynx records (#2611 and #20402), and if they really belong to the same gene, merge them into a single one. In this case, the merged record would be #2611, and #20402 would be retired. If the user accesses the #20402 page later (e.g. via a previously made bookmark), she will see a page with a retirement notice and a link pointing to the #2611 record.
Linking to GeneLynx - batch retrieval of GeneLynx IDs
(This service is currently experimental. We need your feedback.)
This service might be of interest to you in the following cases:
Suppose you have searched a database or performed an experiment involving a large number of nucleotide or protein sequences, and obtained a list of identifiers (nucleotide accession numbers, protein sequence identifiers, HUGO numbers of genes etc.). GeneLynx enables you to paste or upload this list to the Batch GeneLynx page and obtain a list in which these identifiers are associated with the appropriate GeneLynx IDs and descriptions.
You have a list of identifiers from a resource that is included in GeneLynx, and want to retrieve a list of GeneLynx numbers for linking from your resource's web pages.
In either case, all you need is a list of identifiers (nucleotide accession numbers, protein sequence identifiers etc.) that you submit by pasting or uploading it on the "Linking to GeneLynx" page. E.g. if you have a collection of Swiss-Prot IDs:
Paste the IDs (or select a plain text file containing the IDs to upload). Select the output format and the sorting criterion. (If you choose to sort by submitted identifiers and the submitted identifiers are numbers, a numerical sort is performed automatically; otherwise the identifiers are sorted alphabetically.)
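The matching and sorting rule can be sketched in Python. The dictionary standing in for the GeneLynx index is hypothetical (its three entries are taken from the sample text output in this section), and the "genelynx" sort option is an assumption, suggested by that sample output being ordered by GeneLynx ID.

```python
def batch_lookup(identifiers, index, sort_by="submitted"):
    """Pair each submitted identifier with its GeneLynx ID (None if unmatched)."""
    idents = [i.strip() for i in identifiers if i.strip()]
    rows = [(i, index.get(i)) for i in idents]
    if sort_by == "submitted":
        if idents and all(i.isdigit() for i in idents):
            rows.sort(key=lambda row: int(row[0]))   # numeric sort for numbers
        else:
            rows.sort(key=lambda row: row[0])        # alphabetical otherwise
    elif sort_by == "genelynx":                      # hypothetical option
        # Matched identifiers by GeneLynx ID; unmatched ones last.
        rows.sort(key=lambda row: (row[1] is None, row[1] if row[1] is not None else 0))
    else:
        raise ValueError(f"unknown sort criterion: {sort_by!r}")
    return rows
```

The numeric check matters because an alphabetical sort would place "100" before "9".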
The HTML output looks like this:
It is meant for browsing, and provides descriptions of and links to the corresponding GeneLynx records. The HTML page can also be saved locally for later use.
The text format is a two-column, tab-delimited format that is easily parsed by computer programs or loaded into a spreadsheet or database table:
Y167_HUMAN	109
SMN_HUMAN	189
KEAP_HUMAN	193
HK32_HUMAN	307
NR41_HUMAN	478
CIB1_HUMAN	683
KP58_HUMAN	749
PTPG_HUMAN	758
GCSR_HUMAN	809
CCAC_HUMAN	994
RTN1_HUMAN	1475
MAFG_HUMAN	1522
STX4_HUMAN	1811
N4BM_HUMAN	2114
...
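Because the text output is plain tab-delimited text, it can be read with Python's standard `csv` module. The sketch below assumes each data row holds an identifier and a numeric GeneLynx ID, and skips any row that does not (such as the "..." placeholder above); the function name is illustrative.

```python
import csv
import io


def parse_batch_text(text: str) -> dict[str, int]:
    """Map submitted identifiers to GeneLynx IDs from the tab-delimited output."""
    mapping = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) >= 2 and row[1].strip().isdigit():
            mapping[row[0].strip()] = int(row[1])
    return mapping
```

For a file on disk, pass the file's contents (e.g. `open(path).read()`) to the function.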
Submitting a new resource for inclusion in GeneLynx
(This service is currently experimental. We need your feedback.)
To facilitate (and encourage) addition of new database resources to the collection of GeneLynx links by their authors and curators, we have developed a (highly experimental) standardized procedure for external resource submission. Currently, a resource to be submitted must meet the following requirements:
Its individual records must be accessible on the web via a URL that includes their unique identifier.
Each unique identifier should be associated with an identifier belonging to a resource that is already included in GeneLynx. The associations should be listed in a two-column plain text file (tab-, space- or comma-delimited).
NEW001	2049
NEW002	79154
NEW003	7067
NEW004	64979
NEW005	55438
NEW006	5324
NEW007	57528
NEW008	55116
NEW009	215
NEW010	10421
NEW011	29056
NEW012	10072
NEW013	56617
NEW014	29108
NEW015	7442
...
In the above example file (newres2loc), the second column contains LocusLink IDs (the order of columns can be reversed). Now go to the GeneLynx resource submission page and fill in the submission form as follows:
General Information about the resource is necessary for the proper labelling and categorization of the new resource.
Resource name is the name that will appear in the resource label on a GeneLynx record page.
Resource home page is the URL of the new resource's home page, preferably with additional information about the resource.
GeneLynx Category is the label for a group of related resources (see The contents of a GeneLynx record).
The data file section is where you specify the location of the newres2loc file. You can either specify its URL on an HTTP or anonymous FTP server, or upload it from your local file system. We recommend specifying the URL, preferably with the "Check for updates" option selected. The file will then be reloaded for each GeneLynx update, so you can simply keep the file current instead of resubmitting it to GeneLynx each time it changes.
In "The existing identifier is a" menu, choose the identifier to which your resource's identifier is matched in the file you are submitting (in the above case "LocusLink number").
The "Existing identifier is in column" should be obvious. In our example, the existing identifier is LocusLink number which is in column 2 of the newres2loc file.
Linking section is where you type the URL-forming rule for retrieving records of the new resource. Type a triple hash (###) where the actual identifier should be inserted.
In Contact information, your E-mail is required in order to receive confirmation of successful submission and of final inclusion in GeneLynx by the curator. A GeneLynx curator may also contact you for additional information.
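The two technical parts of the submission, reading the association file and applying the ### URL rule, can be sketched in Python. This is an illustration under assumptions, not the GeneLynx submission code: the function names are hypothetical, `http://www.example.org/record?id=###` is a placeholder rule, and URL-encoding the identifier is an added precaution that the form itself does not mention.

```python
import re
from urllib.parse import quote


def parse_association_file(text: str, existing_column: int = 2) -> list[tuple[str, str]]:
    """Return (new_resource_id, existing_id) pairs from a two-column file.

    Columns may be separated by tabs, spaces or commas; existing_column says
    which column (1 or 2) holds the identifier already known to GeneLynx.
    """
    if existing_column not in (1, 2):
        raise ValueError("existing_column must be 1 or 2")
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = [f for f in re.split(r"[\t ,]+", line.strip()) if f]
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected two columns, got {fields}")
        new_id, existing_id = fields if existing_column == 2 else fields[::-1]
        pairs.append((new_id, existing_id))
    return pairs


def record_url(template: str, identifier: str) -> str:
    """Insert an identifier where the ### placeholder appears in the URL rule."""
    if "###" not in template:
        raise ValueError("URL rule must contain ###")
    return template.replace("###", quote(identifier, safe=""))
```

For the example file above, `parse_association_file(text)` yields pairs such as `("NEW001", "2049")`, and `record_url("http://www.example.org/record?id=###", "NEW001")` gives `http://www.example.org/record?id=NEW001`.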
After you submit the required information, you will come to Submitted resource test page which looks like this:
Under the table containing some simple figures about the submission (you should check them to make sure they make sense), there is the Test table with up to 10 randomly picked pairs of newly formed associations between new resource identifiers and GeneLynx records. There are three things you could check:
that the new resource identifiers link to the proper web addresses; click on them to open the corresponding web pages in a new window
that the selected new resource identifiers are associated with the appropriate GeneLynx records; click on a GeneLynx ID in the second column to open the GeneLynx record
after you open the GeneLynx record page, locate the new resource on the page. It should not be too hard - its Resource label is the only one coloured red:
Check that it is in the desired category and that the links work as they should.
If everything is in order, click "Confirm" on the Submitted resource test page, and you are done. You will receive an E-mail message confirming the successful submission, and a GeneLynx curator will notify you of further developments.
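The Test table described above shows up to 10 randomly picked associations. A minimal Python sketch of such a selection, assuming the associations are available as a list of pairs (the function name and the optional seed are hypothetical):

```python
import random


def pick_test_pairs(associations, k=10, seed=None):
    """Pick up to k distinct associations at random for manual checking."""
    pool = list(associations)
    rng = random.Random(seed)  # a fixed seed makes the pick reproducible
    return rng.sample(pool, min(k, len(pool)))
```

Sampling without replacement means no pair is shown twice, and a file with fewer than 10 associations is shown in full.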
Appendix 1
Resources linked to GeneLynx
- Summary pages
- UniGene Home page
- LocusLink
- GeneCards Home page · About
- Swiss-Prot
- KEGG gene Home page
- EGAD Home page
- euGenes
- MIPS Home page · About
- HumanPSD Home page
- Genomic resources
- Genomic sequences
- GDB Home page
- GenAtlas Home page · About
- Ensembl gene Home page
- Transcripts
- RefSeq About
- cDNA sequences
- Ensembl transcript Home page
- Protein sequences
- Swiss-Prot Home page
- TrEMBL Home page
- PIR Home page · About
- GenPept
- Protein structure and domains
- PDB
- Closest PDB structure - a link to the most similar available PDB structure for a given GeneLynx gene product, as determined by HSSP.
- HSSP
- InterPro
- PRINTS
- PFAM
- BLOCKS
- SBASE
- PROSITE
- Protein function and disease links
- GeneOntology (at MGI)
- MEROPS proteases
- ENZYME database
- WIT
- BRENDA
- OMIM
- GeneClinics
- Networks and Pathways
- KEGG pathway
- PubGene
- Homologs
- Nucleotide
- Protein
- UniGene
- LocusLink
- MGD
- ESTs
- STACK cluster
- EST sequences
Appendix 2
Identifiers searchable by Quick Search
Identifier - Remark
Keyword - Any word from the description/definition lines of the UniGene, LocusLink, Swiss-Prot, TrEMBL and HUGO records associated with GeneLynx records.
Nucleotide accession number - GenBank, EMBL, DDBJ and EST sequences. Human only. If the query returns no hits, try a BLAST search with that accession number.
Protein accession number - Swiss-Prot, TrEMBL, PIR and GenPept sequences. If the query returns no hits, try a BLAST search with that accession number.
Swiss-Prot ID - e.g. SYS_HUMAN
PIR ID - not the same as the PIR accession number
PDB ID
UniGene ID - for Quick Search, use the full identifier, e.g. Hs.4888
HUGO symbol - official human gene symbols
Other gene symbols - gene symbols other than the official (HUGO) ones, but used in literature and databases.
Appendix 3
Identifiers searchable by Advanced Search
Under construction...coming soon.
Appendix 4
Browser compatibility issues
GeneLynx has been tested on several platforms with different web browsers. While we tried to make the interface as browser-independent as possible, some browsers simply do not allow it. However, all detected incompatibilities are cosmetic: they make the pages less pretty than we intended :) without affecting access to any of the GeneLynx capabilities. Here is a table that summarizes our experience:
Browser, version (OS) - Known problems
MS Internet Explorer 4.0, 5.0, 5.5 (MS Windows 9x, NT, 2000) - None
Netscape 6.0 (Linux 2.2.17, Linux 2.4.4) - None
Netscape Communicator 4.5, 4.61, 4.74 (Linux 2.2.x, Compaq Tru64 Unix) - The page layout is altered incorrectly upon window resizing; mouse-over underlining of links does not work; there are no boxes around items in the left-side menu on all pages.
Konqueror 2.1.1, 2.1beta2 (Linux 2.2.x, Compaq Tru64 Unix) - Occasional minor occurrences of incorrect font attribute rendering; otherwise much better and faster than Netscape 4.x.
Lynx, several versions (Linux 2.2.x, Compaq Tru64 Unix) - Just for those of Spartan spirit. Most features work, but the layout is hard to follow. (No, we do not intend to work toward improving it - we strongly believe that a really negligible number of Lynx users will have ever heard of GeneLynx and vice versa.)
If you experience problems other than those listed in the table, please let us know.
Send comments and questions to Boris Lenhard