You can use ad hoc queries to perform direct SQL queries on files in data stores without having to first create a data view, allowing you to explore and understand your data before creating an Analysis. Ad hoc queries should be used only when access is infrequent, and performance is not critical.
Configure a data store with ad hoc queries
Complete the following steps to configure a data store with ad hoc queries.
- Select the Pipelines menu at the top of the page.
- Click the menu button to the right of the data store and select Edit >Edit Stage.
- Select the Details tab.
- Make sure Allow Ad Hoc Queries is checked.
- Click Save.
Requirements for ad hoc queries
- The Ad Hoc Query setting is available and selectable for file based data stores only. For internal database data stores, the setting is selected by default and disabled.
- Ad hoc queries are allowed when the Record Delimiter is "New Line" or "Carriage Return New Line".
- For Internal S3 and External Default Data Stores on the cloud version of the product, the Ad Hoc Query setting is only available for Data Stores that were created in version 3.0 of the product or later.
- For Internal HDFS, External HDFS, and External Default Data Stores on the enterprise edition of the product, the Ad Hoc query setting is only supported if Apache Hive is configured.
- For Internal S3/HDFS and External Default Data Stores, ad hoc queries are not supported if any field names in the Data Store clash with one of the following reserved field names: file-id, work-id, month, date, year, version, datastore-name, devenv-id, owning-member-id.
- External, non-default Data Stores of type HDFS or S3 will allow you to specify partition fields that will be used to parition the data when performing ad hoc queries. This parameter is available in the Data Store's Other tab.
- For External Data Stores using the Delimited layout, ad hoc queries are only supported if the record delimiter is Newline or Carriage Return Newline.
Improve ad hoc query performance
You can improve the performance of ad hoc queries by using the Cache Query Results option. When Cache Query Results is selected, duplicate queries use cached data. Cached data expires one hour after it is cached.
Depending on your data store settings, the Cache Query Results option might appear grayed out (disabled). If this is the case, the default setting cannot be changed.
By default, Cache Query Results is not selected for internal data stores if the Store Repository Type is Database. It is selected (and grayed out) for internal data stores if the Store Repository Type is S3 or HDFS. It is also selected (and grayed out) for external data stores if the Store Repository Type is S3, HDFS or Database, and for Default external data stores where the Layout Type supports ad hoc querying.
Cache Query Results is not selected (and grayed out) for all other data stores.