While your HDFS may eventually contain many different types of data, Precisely recommends that you start with only a few sources. That usually means a subset of the segments in a legacy IMS database or a small number of relational database tables. Your data may also come from several different platforms; pick just one to get started.
Because most implementations will use Connect CDC (SQData) change data capture to collect the data sent to HDFS, it is easy to forget that downstream consumers may need access to data that has not changed in some time and therefore has never been published to HDFS. There are several methods for performing an "Initial Load", and they vary depending on the original source of the data and whether it is hosted on the Mainframe, Linux or Windows. See the Initial Load sections of the applicable Change Data Capture reference documentation for more details. Precisely also recommends giving special consideration to the HDFS file names and metadata associated with this historical data, so that consumers can tell it apart from data delivered by change data capture.
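One way to approach the file naming question is to encode the load type and an as-of date in the HDFS path itself. The following is a minimal sketch only, written in Python. The root directory `/data/landing`, the `load_type=` and `as_of=` path segments, the `.json` extension and the function name `build_hdfs_path` are all hypothetical choices made for illustration; none of them is a Connect CDC (SQData) convention or requirement.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical landing directory; substitute the root used at your site.
HDFS_ROOT = PurePosixPath("/data/landing")

# Hypothetical labels separating historical data from captured changes.
LOAD_TYPES = ("initial", "cdc")


def build_hdfs_path(source: str, table: str, load_type: str, as_of: date) -> str:
    """Return an HDFS path that records where the data came from and how it was loaded.

    The layout is an assumption for illustration:
    <root>/<source>/<table>/load_type=<type>/as_of=<YYYY-MM-DD>/<table>.json
    """
    if load_type not in LOAD_TYPES:
        raise ValueError(f"load_type must be one of {LOAD_TYPES}, got {load_type!r}")
    path = (
        HDFS_ROOT
        / source
        / table
        / f"load_type={load_type}"
        / f"as_of={as_of.isoformat()}"
        / f"{table}.json"
    )
    return str(path)


if __name__ == "__main__":
    # Historical data written by an Initial Load.
    print(build_hdfs_path("ims", "customer", "initial", date(2024, 1, 15)))
    # Later changes delivered by change data capture.
    print(build_hdfs_path("ims", "customer", "cdc", date(2024, 1, 16)))
```

Keeping the load type in the path lets a downstream consumer select or exclude historical data without opening the files. Whether that belongs in the path, in file-level metadata, or in both depends on the tools that will read the data.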