SourceForge.net Logo

Genes Database Schema


About this Document

This document was written by Sean Lamb (slambo@msn.fullfeed.com), lead developer on the Genes project (http://genes.sourceforge.net). If you have any questions about this, you can either email the author or participate in the genes-devel mailing list (list homepage is at http://lists.sourceforge.net/mailman/listinfo/genes-devel).


Database Schema Logic

There are two tables off of which the entire database revolves: Person and Source. The Person table, although it is the single most important table in the database, is also the smallest. It contains exactly one column: per_id, the person ID used by the database.

The other key table, Source, is important because, in an ideal world, every piece of information should be documented as to where the information came from. Now, in the real world, this won't happen, especially due to the tedious nature of documenting research, however, the functionality is there if the researcher wants (or feels the need) to use it.

From these two base tables, the database takes steroids and breaks out into no less than 24 distinct tables, which allows the greatest amount of flexibility for the researcher.

About the Person Records - Names

Everything about a person and everything that a person experiences can be broken down into one of four categories: Names, Attributes, Events and Families. The contents of the Name table should be fairly self-explanatory: Prefix, Given Name (including any middle names), Surname, Postfix, Notes, Surety, Primary flag and Effective Date (the four fields Calendar, Year, Month and Day make up the Effective Date, following the logic in the GenDate class). While the first five fields should be obvious to the researcher, the Surety and Primary fields may require some explanation.

The Surety field in the Name table, like in all the other tables that have a Surety field, represents the level of surety that this record contains the actual fact. This level of surety can be on any integer scale (i.e. -3 to +3, 0 to 10, etc.) with one end of the spectrum designated as most sure (definite fact) and the other end as least sure (wild speculation, and probably false). So, if the user chooses a range of -3 to +3 with +3 representing fact and -3 representing fiction, a user would put +3 for resources of irrefutable surety (i.e. the researcher is entering his or her own name). Family legends might start at the other end of the spectrum until proven by credible and reliable resources.

The Primary field in the Name table is also an idea that flows through to other tables. While a person can have many names throughout a lifetime, but the name marked as Primary is the name that will appear in spaces where only one name will fit (i.e. on a Pedigree Chart). Looking at recent history makes it apparent why we need to keep names in a separate table, rather than as a field in the Person table. For example, a woman born as Anne Smith in 1952 may have changed her name during the "Summer of Love" to something like Butterfly Moon. She may have reared a child under this name as well, which means that Butterfly Moon would appear on the child's birth certificate as the mother's name. Later, when she married in 1983, she may have changed her name again to take her new husband's last name, perhaps becoming Butterfly Jacobsen. So in this example, we have a person who is listed on legal documents under three names, which means that the researcher needs to keep track of three names. However, one of these names needs to be designated as the primary name. In most cases, this will be the person's birth name, but the researcher may decide to use any of the names that have been discovered as the primary. It's all a matter of how the researcher wants the person listed on the reports.

Since each name can be referenced in many resources, we include a table to store this information as well in the Name-source table. Its behavior is the same as that of the Event-source table explained below.

About the Person Records - Events

Everything that happens to a person, a birth, a marriage, a school graduation, a census count, is an event. But, you say, a person can only be born once, so why not store that information in the Person table with the ID? Well, genealogy research is just that - research. When we look back far enough, we find people where nobody really knows exactly when a certain person may have been born, or if that person was born at all. We may find six different documents that relate six different birth dates for the same person. If we kept only one of those birthdates in the Person table, we may make the mistake of choosing the wrong date to store. Sometimes such discrepancies are found as little as 60 years ago, which means that every genealogist is likely to find conflicting data.

Putting the events in a separate table, we enable the researcher to track each of these different events so that the primary event (much like the primary name) can be easily switched to another record of the same type. Events in the Event table are categorized by an event type, which defines the text that will appear both on the screen and in reports as well as the number of main participants in the event - a graduation, for example, will have one main participant, while a marriage will have two main participants.

A birth, although it really involves at least three people, is categorized here as having only one main participant, the person who is being born. The reasoning here is that as we research, we will most often find all of the information for any particular person in pieces. We will find a person's name and gender in one resource, birth date in a second resource, a list of siblings in a third, and maybe a census record in a fourth. We may never find the person's parents in our research. Furthermore, when we look at the way that we associate birth dates with people, we don't usually write that two people had a child on such-and-such a date; we usually write that so-and-so was born on a certain date. Then, after writing this, we usually write the names of the person's parents in another location entirely. The parental relationships in this database are tracked in the Family and Child tables.

Additionally, every event can be witnessed by one or more other people in our database. For example, at a formal wedding ceremony, both the bride's and groom's families normally attend. Having this information in the database can help us track where a person was and when a person lived (or died) more efficiently. If we know that cousin Joe attended Sarah's wedding, we know that he didn't die until after that date, and was born before that date. This information is stored in the Witness table. This table contains merely the Event ID and the Person ID of the witness, because all of the person's information is stored elsewhere while all of the event's information is stored in the Event table. However, we don't establish a Witness-source table because we use the Event-source table to link the event to the resource where we found the information.

Since every event in a person's life can be referenced in a myriad of resources, we have a table that links events to resources - Event-source. The general information about the resource is stored in the Source table, and the Event-source table includes the foreign keys from both Event and Source to create the link between the two objects. Also, in Event-source, is any information that is specific to where in the resource the information can be obtained (i.e. page numbers, the surety level for this event, any additional detail like the film reel or volume number, and freeform notes).

About the Person Records - Attributes

Often, in our research, we will find information that is extremely helpful but that can't really be thought of as an event. For example, we may find a resource that states that a great-great- aunt died unmarried and without issue (without parenting a child). If we find this information in a reliable resource, we can avoid spending valuable time looking for people who never existed in the first place.

We might also find, in another example, that great-grandfather was enlisted in the Army. If we don't have the muster rolls or know of any battles in which he participated, then we don't have an event we can use to track this military information. We may find it helpful the next time we go to the local library to have a list of people in our database who did serve in one of the armed forces so that we then have the ability to find the muster rolls and battles.

It is information like this that the Attributes are designed for. Each attribute in the database has a type and a value. The type identifies what information is being stored for the attribute while the value is the data itself. To continue with the two examples above, the woman who died unmarried and without issue might have a NeverMarried attribute set to Y and a WithoutIssue attribute also set to Y. The great-grandfather who served in the army might have a Military attribute set to A to denote Army service.

Attributes, like events can be referenced in a multitude of resources, so we include the Attribute-source table to record this information. This table behaves the same way as the Event-source table explained above.

About the Person Records - Families

One of the foundations of genealogy research is the Family Group Sheet. So, it only follows that one of the most important aspect of this research is establishing the familial relationships. We track these relationships in the database with the Person, Family and Child tables. The Family table is fairly straightforward, including fields for foreign keys from Person that indicate the Mother and Father, as well as any notes about the family. The marriage and/or separation information is already stored in the Event table, so we merely reference that table with the Mother and Father person IDs to find any marriage and separation events to display on screens and reports.

Since every family has anywhere from zero to 15 or more children (although this may seem a bit extreme by today's standards...), not counting adoptions or foster children, we list the child relationships in the Child table. The fields here are also rather simple, foreign keys for each of the Family ID and Person ID, a freeform note field, and a character to identify the type of relationship to this family (i.e. S or D for Son or Daughter, A for Adopted, T for Step, etc.).

Since individual resources can have information on families and children, there is a table that links the Child records to resources (Child-source) and a table that links the Family records to resources (Family-source). These two tables behave just like the Event-source table mentioned above.

About the Resources - Sources

The Source table includes general information about a resource that will aid a researcher in finding that resource again in the future. It is a fairly simple matter of tracking the information that would be included in any typical bibliography: resource name, author, publisher, copyright date and publishing location. We also make spaces for the researcher to include additional information about the resource, including: ISBN, repository and/or storage location (place), library catalog number, the location within the repository where the source can be found (i.e. building and floor numbers), a description of the resource and a list of surnames that are included in the resource.

Often, in doing our research, we will find information that is published by someone who is in our database. For this case, we also include the Person ID field in the Source table. By storing the Person ID of the author, if it's available, we save memory and time because we don't have to retype the name information here.

About the Resources - Repositories

One important aspect of databases is that they allow the researcher to track information without retyping it all whenever a report is needed. Since each resource is stored in a repository somewhere, be it a public library, Family History Center or a private collection, we need to know where a resource is if we are ever going to reference it again. The Repository table is basically the "yellow pages" (or "business pages" if you prefer) of the telephone directory. The Repository table includes the name, notes and an identifier for the Place, which is described below.

About the Locations - Places, Cities, Counties, States, Countries and Continents

Since a location for an event can also be the location for a private resource collection, and also to avoid having to retype the same location over and over again (which increases the possibility of errors) while entering data, location information is stored in a separate table. The Place table structure should be familiar to anyone who has filled out address information on a form before. There are no foreign keys, only the primary key of the Place ID.

Working our way into more general terms about places, every place is, usually, part of a city or town somewhere. Generalizing the locations further, every place is within a county or parish, every county is within a state or province, every state is within a country and every country is within a continent. Since political boundaries change throughout time (as in the reunification of Germany, or the breakup of the Soviet Union), we allow the user to enter the place information exactly as it is written in the resource, but we save the names in these other tables so the user doesn't have to always retype the same names over and over again. While the user is entering data, we programmatically examine what has been entered into a field, and compare it with the keys in the appropriate region table, then prompt the user if this is the correct location.

Note: since not everyone will want these prompts popping up all the time, we provide a global option for the user whether or not to lookup this information as it is being typed. Regardless of the option that is chosen, the user can type in the full location as it appears in the resource.

About the Database Information - Researchers and Research Logs

When we print reports from the database, we are often preparing information to send to other researchers. The report will, quite naturally, include information about the people in the database, but should also have the option of including the researcher's information automatically so that the user doesn't have to add it later. We can store the data compiler's location in a configuration file, but it becomes more handy to integrate a whole list of researchers into the database when we think of keeping track of all the names of people with whom we've corresponded.

The Researcher table in this database can be thought of as a miniature phone book. The database compiler's information will be stored as researcher ID number 1, while other researchers' information will be stored as other records in the table. In addition to all the usual fields in a contact list, we also include a field for the surname list that the researcher is interested in. This way, we can easily pick out the right person to correspond with for more research.

In doing our research, we don't want to waste time studying the same resources more than a couple times for specific information, so we include a research log. The Research-log table is mainly made up of foreign keys from each of the other types of objects that describe the people in the database: Person, Name, Attribute, Event and Family; it also includes foreign keys for the resources and locations where the research has taken place: Source, Researcher, Repository and Place.

Note that for any record in the Research Log, each of these foreign keys is optional. The research that we do takes place on specific dates, so we record the date here as well, just like a date in the rest of the database. We also include notes and detail fields for the user to fill in information about what has been researched and what still needs to be examined.

Possibilities for Expansion

There are a ouple of places in this database schema where data could still be split out into additional tables if necessary. For example, the user's preferences can be stored in a table that contains exactly one record, like in some other database applications. In this way, we avoid creating global variables to hold information and just look it up when we need it.

Resources can be stored in multiple locations as well. We can create a table (maybe called Source-repository) that links the resources to the repositories. However, it wasn't done here because typically when we find a resource, we always go back to the same place to examine that resource. Right now, if we find the same resource in multiple locations, creating a second or third (or however many) Source record will be the way to store this information. Besides, most of the researchers that I've talked to are not researching professionally and don't need quite as extensive a level of detail as this.


Table and Field List

Attribute
Field NameField TypeKeys
atr_aty_idlong integerforeign key from Attribute-type
atr_idlong integerprimary key
atr_per_idlong integerforeign key from Person
atr_primarybit(1)
atr_suretyinteger
atr_valuecharacter(1)
Attribute-source
Field NameField TypeKeys
atr-sou_atr_idlong integerprimary key (foreign key from Attribute)
atr-sou_detailmemo
atr-sou_notememo
atr-sou_pagelistmemo
atr-sou_sou_idlong integerprimary key (foreign key from Source)
atr-sou_suretyinteger
Attribute-type
Field NameField TypeKeys
aty_idlong integerprimary key
aty_textcharacter(25)
aty_validvaluescharacter(50)
Child
Field NameField TypeKeys
chi_fam_idlong integerforeign key from Family
chi_idlong integerprimary key
chi_notememo
chi_per_idlong integerforeign key from Person
chi_relationshipcharacter(1)
City
Field NameField TypeKeys
cit_abbr5character(5)
cit_countrymemo
cit_countymemo
cit_idlong integerprimary key
cit_namememo
cit_statememo
Continent
Field NameField TypeKeys
cnt_abbr3character(3)
cnt_idlong integerprimary key
cnt_namecharacter(10)
Country
Field NameField TypeKeys
ctr_abbr2character(2)
ctr_abbr3character(3)
ctr_idlong integerprimary key
ctr_namememo
County
Field NameField TypeKeys
cty_abbrcharacter(5)
cty_countrymemo
cty_idlong integerprimary key
cty_namememo
cty_statememo
Event
Field NameField TypeKeys
evt_calendarinteger
evt_dayinteger
evt_detailmemo
evt_idlong integerprimary key
evt_monthinteger
evt_per_id1long integerforeign key from Person
evt_per_id2long integerforeign key from Person
evt_plc_idlong integerforeign key from Place
evt_primarybit(1)
evt_suretyinteger
evt_ety_idlong integerforeign key from Event-type
evt_yearlong integer
Event-source
Field NameField TypeKeys
evt-sou_detailmemo
evt-sou_evt_idlong integerprimary key (foreign key from Event)
evt-sou_notememo
evt-sou_pagelistmemo
evt-sou_sou_idlong integerprimary key (foreign key from Source)
evt-sou_suretyinteger
Event-type
Field NameField TypeKeys
ety_abbrcharacter(1)
ety_idlong integerprimary key
ety_numpersonbit(1)
ety_sentencememo
ety_shorttextcharacter(5)
ety_textcharacter(25)
Family
Field NameField TypeKeys
fam_father_per_idlong integerforeign key from Person
fam_idlong integerprimary key
fam_mother_per_idlong integerforeign key from Person
fam_notememo
Family-source
Field NameField TypeKeys
fam-sou_detailmemo
fam-sou_fam_idlong integerprimary key (foreign key from Family)
fam-sou_notememo
fam-sou_pagelistmemo
fam-sou_sou_idlong integerprimary key (foreign key from Source)
fam-sou_suretyinteger
Name
Field NameField TypeKeys
nam_calendarinteger
nam_dayinteger
nam_givenmemo
nam_idlong integerprimary key
nam_monthinteger
nam_notememo
nam_per_idlong integerforeign key from Person
nam_postfixmemo
nam_prefixmemo
nam_primarybit(1)
nam_suretyinteger
nam_surnamememo
nam_yearlong integer
Name-source
Field NameField TypeKeys
nam-sou_detailmemo
nam-sou_nam_idlong integerprimary key (foreign key from Name)
nam-sou_notememo
nam-sou_pagelistmemo
nam-sou_sou_idlong integerprimary key (foreign key from Source)
nam-sou_suretyinteger
Person
Field NameField TypeKeys
per_idlong integerprimary key
Place
Field NameField TypeKeys
plc_address1memo
plc_address2memo
plc_citymemo
plc_countrymemo
plc_countymemo
plc_descriptionmemo
plc_idlong integerprimary key
plc_namememo
plc_notesmemo
plc_statememo
Repository
Field NameField TypeKeys
rep_idlong integerprimary key
rep_namememo
rep_notesmemo
rep_plc_idlong integerforeign key from Place
Researcher
Field NameField TypeKeys
res_emailmemo
res_faxmemo
res_idlong integerprimary key
res_namememo
res_notememo
res_per_idlong integerforeign key from Person
res_phonememo
res_plc_idlong integerforeign key from Place
res_preferredcontactcharacter(1)
res_primarybit(1)
res_surnamelistmemo
Research-log
Field NameField TypeKeys
rlg_atr_idlong integerforeign key from Attribute
rlg_calendarinteger
rlg_dayinteger
rlg_detailmemo
rlg_evt_idlong integerforeign key from Event
rlg_fam_idlong integerforeign key from Family
rlg_idlong integerprimary key
rlg_monthinteger
rlg_nam_idlong integerforeign key from Name
rlg_notememo
rlg_per_idlong integerforeign key from Person
rlg_plc_idlong integerforeign key from Place
rlg_rep_idlong integerforeign key from Repository
rlg_res_idlong integerforeign key from Researcher
rlg_sou_idlong integerforeign key from Source
rlg_yearlong integer
Source
Field NameField TypeKeys
sou_authormemo
sou_author_per_idlong integerforeign key from Person
sou_catalognumbermemo
sou_copyrightmemo
sou_descriptionmemo
sou_idlong integerprimary key
sou_isbncharacter(13)
sou_namememo
sou_plc_idlong integerforeign key from Place
sou_publishermemo
sou_publishlocationmemo
sou_rep_idlong integerforeign key from Repository
sou_replocationmemo
sou_surnamelistmemo
State
Field NameField TypeKeys
stt_abbrcharacter(5)
stt_abbr2character(2)
stt_countrymemo
stt_idlong integerprimary key
stt_namememo

Entity Relationship Diagrams

Note: these diagrams are split into many pieces merely for readability. In the databsae, there are no such splits.


             +--------+ 1             M +------+
             | Person |-----------------| Name |
             +--------+                 +------+
               |1 |1 |1
               |  |  |                M +-------+ M       1 +------------+
               |  |  +------------------| Event |-----------| Event-type |
               |  |                     +-------+           +------------+
               |  |    M +---------+ M   |1
               |  +------| Witness |-----+
               |         +---------+
               |                      M +-----------+ M   1 +----------------+
               +------------------------| Attribute |-------| Attribute-type |
                                        +-----------+       +----------------+




            +--------+ 1   (Parent)   M +--------+
            | Person |------------------| Family |
            +--------+                  +--------+
                    |1                   |1
                    |     M +-------+ M  |
                    +-------| Child |----+
                            +-------+



            +------+ 1         M +-------------+ M      1 +--------+
            | Name |-------------| Name-source |----------| Source |
            +------+             +-------------+          +--------+
                                                           |1 |1 |1
            +-------+ 1        M +--------------+ M        |  |  |
            | Event |------------| Event-source |----------+  |  |
            +-------+            +--------------+             |  |
                                                              |  |
            +-----------+ 1    M +------------------+ M       |  |
            | Attribute |--------| Attribute-source |---------+  |
            +-----------+        +------------------+            |
                                                                 |
            +--------+ 1       M +---------------+ M             |
            | Family |-----------| Family-source |---------------+
            +--------+           +---------------+




            +--------+ M       1 +------------+ M      1 +-------+
            | Source |-----------| Repository |----------| Place |
            +--------+           +------------+          +-------+
                                                          |1 |1
            +-------+ M                                   |  |
            | Event |-------------------------------------+  |
            +-------+                                        |
                                                             |
            +--------------+ M                               |
            | Research-log |---------------------------------+
            +--------------+




            +------------+ 1                       M +--------------+
            | Researcher |---------------------------| Research-log |
            +------------+                           +--------------+
               |M                                     |M |M |M |M |M
               |1                                     |  |  |  |  |
            +--------+ 1                              |  |  |  |  |
            | Person |--------------------------------+  |  |  |  |
            +--------+                                   |  |  |  |
                                                         |  |  |  |
            +-----------+ 1                              |  |  |  |
            | Attribute |--------------------------------+  |  |  |
            +-----------+                                   |  |  |
                                                            |  |  |
            +------+ 1                                      |  |  |
            | Name |----------------------------------------+  |  |
            +------+                                           |  |
                                                               |  |
            +-------+ 1                                        |  |
            | Event |------------------------------------------+  |
            +-------+                                             |
                                                                  |
            +--------+ 1                                          |
            | Family |--------------------------------------------+
            +--------+



            +-------+ 1                           M +--------------+
            | Place |-------------------------------| Research-log |
            +-------+                               +--------------+
                                                     |M |M |M
            +------------+ 1                         |  |  |
            | Repository |---------------------------+  |  |
            +------------+                              |  |
                                                        |  |
            +------------+ 1                            |  |
            | Researcher |------------------------------+  |
            +------------+                                 |
                                                           |
            +--------+ 1                                   |
            | Source |-------------------------------------+
            +--------+



            +------+
            | City |        ---\
            +------+            \
                                 \
            +--------+            \
            | County |      -------\ 
            +--------+              \
                                     \
            +-------+                 \
            | State |       ----------->   These tables are not relationally linked to others or
            +-------+                 /    to each other due to the nature of the research.  We
                                     /     won't necessarily have all of the location information
            +---------+             /      at once, and political boundaries change names and
            | Country |     -------/       locations over time.  They are included to save typing
            +---------+           /        time and prevent errors.
                                 /
            +-----------+       /
            | Continent |   ---/
            +-----------+