Data repository setup - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

The "Data Repository" allows you to manage and control access to data on file systems through the Data360 Analyze Directory. The data that you access can be stored on the Data360 Analyze local file system or on a shared drive.

You can assign permissions to resources such as data flows, nodes and data collections to ensure that separate teams can access only the assets that belong to them.

The data repository setup consists of:

  • Data Connections - Data connections are target file system locations that are accessible to the OS user on the machine that is running Data360 Analyze. They allow system administrators to define connections to a number of file system locations within Data360 Analyze. The target location can be either a file system location on the Data360 Analyze host machine, or a mapped drive that is available to the Data360 Analyze OS user on the host machine.
  • Mount Points - Mount points allow system administrators to mount a specified file system location, relative to the target location defined in a data connection, to either a Data360 Analyze directory or a data collection.
  • Data Collections - Data collections give users a view, within the Data360 Analyze Directory, of the data files and data directories at a file system location. That location is defined by the mount point used by the data collection together with the data connection that the mount point references.
Note: If you want to fully secure access to data files from nodes within data flows, you need to set up Execution Sandboxing. See Enable execution sandboxing.
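
The relationship between a data connection and a mount point can be sketched as a small data model. This is an illustration only, not the Data360 Analyze API: the class names, field names and the `mounted_location` helper are hypothetical, and the sketch assumes that the mounted location is simply the connection target joined with the mount point's relative path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataConnection:
    """Hypothetical model of a data connection (a target file system location)."""
    name: str
    target: str  # for example a local folder or a mapped drive


@dataclass(frozen=True)
class MountPoint:
    """Hypothetical model of a mount point."""
    connection: DataConnection
    relative_path: str   # relative to connection.target; may be empty
    directory_path: str  # a Data360 Analyze directory or data collection


def mounted_location(mount: MountPoint) -> str:
    """Return the file system location that the mount point exposes.

    Assumption: the location is the connection target joined with the
    relative path, using forward slashes.
    """
    base = mount.connection.target.rstrip("/")
    relative = mount.relative_path.strip("/")
    return f"{base}/{relative}" if relative else f"{base}/"
```

A data collection placed at or below `directory_path` would then show the files found under the location returned by `mounted_location`.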

Examples

Connect to an external file system

You have a file system with this structure:
D:/ 
	|--Finance
		|--Payroll
		|--Accounts
			|--accountsJan.csv
			|--accountsFeb.xlsx

To make this structure visible in Data360 Analyze, an admin user completes these tasks:

  1. Create a new data connection to the shared network drive D:/.
  2. Create a folder under "Public Documents" called "Finance Documents".
  3. To grant access to the files at the new data connection location only to users who are working on this project, configure permissions on the "Finance Documents" folder so that only a specific group of users can access its contents.
  4. Configure a mount point that mounts the D:/ connection, with the relative path "Finance", to //Public Documents/Finance Documents.
  5. In the Finance Documents folder, create a data collection called "Accounts".

In the Data360 Analyze Directory, this is configured as follows:

Public Documents#directory
	|--Finance Documents#directory <-- Mount here: connection: D:/ Relative path: /Finance
		|--Accounts#data-collection
			|--accountsJan.csv
			|--accountsFeb.xlsx

With this setup, end users who have access to the "Finance Documents" folder can access data from the file system location D:/Finance. When they browse the "Accounts" data collection, the files are listed and accessed using Data360 Analyze paths. For example, the file D:/Finance/Accounts/accountsJan.csv has the Data360 Analyze path //Public Documents/Finance Documents/Accounts/accountsJan.csv. You can use this style of reference within data flows.
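
The path translation in this example can be sketched in a few lines of Python. This is not how Data360 Analyze resolves paths internally. The `MOUNTS` table and the `to_file_system_path` function are hypothetical names, and the sketch assumes that the deepest mount point covering a Directory path determines the file system location.

```python
# Hypothetical mount table: Directory path -> file system location it exposes.
MOUNTS = {
    "//Public Documents/Finance Documents": "D:/Finance",
}


def to_file_system_path(directory_path: str, mounts: dict[str, str] = MOUNTS) -> str:
    """Translate a Data360 Analyze Directory path into a file system path."""
    # Try the longest (deepest) mount path first so nested mounts take priority.
    for mount_path in sorted(mounts, key=len, reverse=True):
        if directory_path == mount_path or directory_path.startswith(mount_path + "/"):
            remainder = directory_path[len(mount_path):]
            return mounts[mount_path] + remainder
    raise LookupError(f"No mount point covers {directory_path!r}")


print(to_file_system_path(
    "//Public Documents/Finance Documents/Accounts/accountsJan.csv"
))
# prints D:/Finance/Accounts/accountsJan.csv
```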

Access files that exist on the file system

  1. Create a data collection in the "Public Documents" folder called "Finance Documents".
  2. Configure a mount point using the D:/ data connection, with a relative path of "Finance", mounting directly into the "Finance Documents" data collection.

In the Data360 Analyze Directory, this is configured as follows:

Public Documents#directory
	|--Finance Documents#data-collection <-- Mount here: connection: D:/ Relative path: /Finance
		|--Payroll
		|--Accounts
			|--accountsJan.csv
			|--accountsFeb.xlsx
Note: If you create a data collection before configuring a mount point, you will see a warning that the data collection is not mounted. In this case, you need to configure the mount point after creating the data collection.
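
The "not mounted" condition in the note can be pictured as a simple lookup. The sketch below is illustrative only: `covering_mount` is a hypothetical helper, not a product function. It assumes that a data collection counts as mounted when a mount point exists either on the collection itself, as in this example, or on a directory above it, as in the previous example.

```python
from typing import Optional


def covering_mount(collection_path: str, mount_paths: list[str]) -> Optional[str]:
    """Return the deepest mount point at or above a data collection, if any."""
    candidates = [
        mount for mount in mount_paths
        if collection_path == mount
        or collection_path.startswith(mount.rstrip("/") + "/")
    ]
    # The longest matching path is the closest mount point to the collection.
    return max(candidates, key=len) if candidates else None


collection = "//Public Documents/Finance Documents"

# Before the mount point is configured, nothing covers the collection.
if covering_mount(collection, []) is None:
    print("Warning: data collection is not mounted")

# After the mount point is configured, the collection is covered.
print(covering_mount(collection, ["//Public Documents/Finance Documents"]))
```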

Default data repository configuration

After a fresh install of Data360 Analyze, you can upload and download data files using this default configuration:

  • A default data connection is created on the <siteDir>/shared folder.
  • Default mount points are created using the default data connection under the Users directory, with no relative path, and under the Public Documents directory //Public Documents/Public Data, with a relative path of 'Public'.

  • Default data collections are created for the Admin user profile and for the Public Documents folder. As a result, the Admin user can access <siteDir>/shared/admin/admin data, and users can access <siteDir>/shared/Public/. Each user that is subsequently created uses the same shared mount point as the initial Admin user, and a data collection is set up for that user as part of the user creation process.

You can then create additional data connections, mount points and data collections as needed. It is recommended that you change the default data connection location to save space. See Create a data connection.

For upgrade information, see the Data360 Analyze Server Installation Guide.