A repository stores data and metadata (information about your data). Administrators must add at least one repository so that Discovery Center users can sign in, load (or link to) their data, and run profiling activities. Each repository server can hold multiple repositories.
You can add a new repository at any time. During the process you can also review available data connections and add a new data connection as needed.
To add a new repository to the current repository server
Do one of the following:
- On the navigation menu, click Data Access > New Repository.
- On the Home page, go to the Add panel. In the Choose a category drop-down list, select New Repository, then click the plus icon.
- On the Find Repositories, Search Results page, click New Repository.
The New Repository window opens.
For Name, enter a unique name for the repository. Note the following:
- Use a combination of lowercase English alpha characters, numbers, and underscore characters (_) to ensure that the name is acceptable to all file systems. Spaces are not allowed.
- Symbols and non-English characters are not valid as these names are also used as directory names.
- When you enter a name, the Save button is enabled. You can accept the default values and click Save to add the repository or modify the values as outlined in the following steps.
Note: After the repository is added, you cannot rename it.
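Because a repository cannot be renamed after it is added, it can help to check a candidate name before you enter it. The following is a minimal sketch of the naming guidance above, not the validation that Discovery Center itself performs. The function name `is_safe_repository_name` and the sample names are hypothetical.

```python
import re

# Assumed rule set, taken from the naming guidance above:
# lowercase English letters, digits, and underscores only.
SAFE_NAME = re.compile(r"[a-z0-9_]+")


def is_safe_repository_name(name: str) -> bool:
    """Return True if the name uses only the recommended characters."""
    return SAFE_NAME.fullmatch(name) is not None


# Hypothetical candidate names.
for candidate in ["sales_2017", "Sales 2017", "ventes_été", ""]:
    print(repr(candidate), is_safe_repository_name(candidate))
```

Only the first candidate passes: the second contains an uppercase letter and a space, the third contains a non-English character, and the fourth is empty.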
For Language / Region, do one of the following:
- Keep the default option Original to use the non-collated sort method.
- Click the drop-down list and select the appropriate locale (your unique language or region identifier) for your collation method. To filter the drop-down list, start typing the name of the language or region you want.
Note: After the repository is added, you cannot change this selection. This prevents compatibility issues and also allows you to upgrade existing repositories without corrupting them.
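To illustrate why the sort method matters, the sketch below shows an ordering without collation. It assumes that a non-collated sort compares raw Unicode code points, which is a common meaning of the term but is not confirmed here for the Original option. The sample values are hypothetical.

```python
# Assumption: a "non-collated" sort compares raw Unicode code points.
names = ["apple", "Zebra", "Äpfel", "banana"]

# In code-point order, uppercase ASCII letters sort before lowercase ones,
# and accented letters sort after both.
code_point_order = sorted(names)
print(code_point_order)  # ['Zebra', 'apple', 'banana', 'Äpfel']
```

A locale-based collation would typically place "Äpfel" next to "apple" and "Zebra" last, so the choice affects how sorted values appear.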
For Data Profiling Pattern, do one of the following:
- Keep the default option default.
- Click the drop-down list and select one of the following patterns.
Patterns are the shape of a data value described by coded values and are useful for identifying data deviations and duplications.
Pattern Code Examples default
Jane Smith would be displayed as a4_a5
06/31/2017 would be displayed as d2pd2pd4
Jane Smith would be displayed as ul3_ul4
06/31/2017 would be displayed as d2Sd2Sd4
Jane Smith would be displayed as AAAA AAAAA
06/31/2017 would be displayed as NN/NN/NNNN
See default pattern examples. Same codes are used.
Note: For more information about how to use patterns, see Data Profiling Patterns.
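The examples above can be reproduced with a short sketch. It is an illustration inferred only from the two sample values, not the product's pattern engine. The function names `run_length_pattern` and `positional_pattern` are hypothetical, and the handling of characters that do not appear in the samples is an assumption.

```python
from itertools import groupby


def _char_class(ch: str) -> str:
    """Classify one character: a = letter, d = digit, _ = space, p = other."""
    if ch.isalpha():
        return "a"
    if ch.isdigit():
        return "d"
    if ch == " ":
        return "_"
    return "p"


def run_length_pattern(value: str) -> str:
    """Collapse each run of letters or digits into code + run length (a4, d2)."""
    parts = []
    for cls, run in groupby(value, key=_char_class):
        n = len(list(run))
        if cls in ("a", "d"):
            parts.append(f"{cls}{n}")
        else:
            # Assumption: spaces and punctuation are emitted once per character.
            parts.append(cls * n)
    return "".join(parts)


def positional_pattern(value: str) -> str:
    """Replace each letter with A and each digit with N; keep everything else."""
    return "".join(
        "A" if ch.isalpha() else "N" if ch.isdigit() else ch
        for ch in value
    )


print(run_length_pattern("Jane Smith"))   # a4_a5
print(run_length_pattern("06/31/2017"))   # d2pd2pd4
print(positional_pattern("Jane Smith"))   # AAAA AAAAA
print(positional_pattern("06/31/2017"))   # NN/NN/NNNN
```

Values that share a pattern have the same shape, so a value whose pattern differs from the rest of its column is a candidate data deviation.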
For Public Cache Size, do one of the following:
- Keep the default option 256 (the recommended minimum).
- Click the drop-down list and select a cache size of 128, 512, or 1024.
The public cache size specifies the server memory, in megabytes, allocated for the repository. This value is site-specific. In general, a larger value improves drill-down performance but reduces the total number of repositories you can add to the server.
- Additional cache size guidelines:
- Performance will rapidly decline if the total cache size for all repositories on the server exceeds available server memory. Therefore, 256 MB is the recommended cache size for each repository on the server. If you exceed available server memory and you attempt to add a repository, a message will display that says there is not enough memory to process the command.
- When setting the cache size, note that the cache size allowed per data connection can be specified as a performance setting in the Repository Manager. This value specifies the maximum memory, in megabytes, to be used when loading data, analyzing attributes, and creating indexes.
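The first guideline above amounts to a simple budget check: the sum of all public cache sizes should stay within available server memory. The sketch below is a rough planning aid under that assumption. The function `can_add_repository` and the memory figures are hypothetical, and the check ignores memory used by the operating system and by data connections.

```python
def can_add_repository(existing_cache_mb, new_cache_mb, server_memory_mb):
    """Return True if the combined public caches still fit in server memory."""
    total = sum(existing_cache_mb) + new_cache_mb
    return total <= server_memory_mb


# Hypothetical repositories already on the server, sizes in MB.
existing = [256, 256, 512]

print(can_add_repository(existing, 256, 2048))   # True: 1280 MB in total
print(can_add_repository(existing, 1024, 2048))  # True: exactly 2048 MB
print(can_add_repository(existing, 1024, 1536))  # False: 2048 MB exceeds 1536 MB
```

Keeping each repository at 256 MB leaves room for the most repositories before the limit is reached.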
Click Save to add the repository to the server. If the repository name you entered is already in use, the following message displays: "Name is in use by another repository. Enter a unique name". Change the name and click Save again.
A confirmation summary displays showing details of the newly added repository.
Click Edit Settings to make changes to the repository's data profiling pattern and public cache size. The Edit Repository window opens. Make changes as needed, then click Save to return to the confirmation summary.
Click Review Data Connections to see a list of the available data connections and to add a new connection as needed. The New Repository: Data Connections window opens. In the Available Connections table, note the name and type for all connections that can be used by the repository.
Do one of the following:
- To add another data connection for the repository to access, click New Data Connection. The New Data Connection window opens. For more information, see Adding New Data Connections.
- Click Done to return to the confirmation summary.
Do one of the following:
- Click New Repository to continue adding repositories.
- Click Done to close the window.