Using Tableau Prep

Working in Tableau Prep

Open Tableau Prep and use two Input steps to bring in the Timesheet and Calendar Scaffold data sources. Next, use a Join step to join the Timesheet data source to the Calendar Scaffold on Date. This join brings the Pay Period numbers into the Timesheet data.

  1. Why and how do you use Prep to solve data prep? Join this introductory class to find out how to clean, shape and combine data the 'Tableau Prep way'.
  2. To run an R script, Tableau Prep needs two functions: the first contains the actual R code you want to use on your data; the second tells Tableau Prep how to return the data by defining the output schema.

Note: Starting in version 2020.4, you can now create and edit flows in Tableau Server and Tableau Online. The content in this topic applies to all platforms, unless specifically noted. For more information about authoring flows on the web, see Tableau Prep on the Web.

At any point in your flow you can manually save your work, or let Tableau automatically do it for you when creating or editing flows on the web. When working with flows on the web, there are a few differences.

Tableau Prep Builder
  • View a preview of the data in your flow in Tableau Desktop.
  • Include direct file connections in your flow input or package your files and publish the packaged flow to your server.
  • Output your flow to a file, published data source, or to a database (version 2020.3.1 and later).

Tableau Prep on the web
  • Create and edit flows on the web.
  • Upload files for your flow inputs and connect to a variety of data sources.
  • Output your flow to a published data source or to a database.

To keep data fresh you can manually run your flows from Tableau Prep Builder or from the command line. You can also run flows published on Tableau Server or Tableau Online manually or on a schedule. For more information about running flows, see Run your flow.
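As a rough sketch of a command-line run, the Python snippet below assembles a call to the Tableau Prep Builder command-line script. The script location, flow path, and credentials file are placeholders, and the flags used here (`-t` for the flow file, `-c` for a connections file, `-inc` for incremental refresh) are assumptions to check against the help for your installed version.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_prep_command(cli_script: str, flow_file: str,
                       credentials_file: Optional[str] = None,
                       incremental: bool = False) -> List[str]:
    """Assemble the argument list for a command-line flow run."""
    command = [cli_script, "-t", flow_file]
    if credentials_file:
        command += ["-c", credentials_file]
    if incremental:
        command.append("-inc")
    return command


def run_flow(command: List[str]) -> int:
    """Run the flow and return the process exit code."""
    if not Path(command[0]).exists():
        raise FileNotFoundError(f"CLI script not found: {command[0]}")
    return subprocess.run(command, check=False).returncode


# Hypothetical paths; replace them with the locations on your machine.
example = build_prep_command(
    cli_script=r"C:\path\to\tableau-prep-cli.bat",
    flow_file=r"C:\flows\timesheet.tfl",
    credentials_file=r"C:\flows\credentials.json",
)
```

Keeping the argument list separate from the call that executes it makes it easy to log or review the command before a scheduled job runs it.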

Save a flow

In Tableau Prep Builder, you can manually save your flow to back up your work before performing any additional operations. Your flow is saved in the Tableau Prep flow (.tfl) file format.

You can also package your local files (Excel, Text Files, and Tableau extracts) with your flow to share with others, just like packaging a workbook for sharing in Tableau Desktop. Only local files can be packaged with a flow. Data from database connections, for example, aren't included.

In web authoring, local files are automatically packaged with your flow. Direct file connections aren't yet supported.

When you save a packaged flow, the flow is saved as a Packaged Tableau Flow File (.tflx).

  • To manually save your flow, from the top menu, select File > Save.

  • In Tableau Prep Builder, to package your data files with your flow, from the top menu, do one of the following:

    • Select File > Export Packaged Flow

    • Select File > Save As. Then in the Save As dialog, select Packaged Tableau Flow Files from the Save as type drop down menu.

Automatically save your flows on the web

Supported in Tableau Server version 2020.4 and later.

If you create or edit flows on the web, as soon as you make a change to the flow (connect to a data source, add a step, and so on) your work is automatically saved every few seconds as a draft so you won't lose your work.

You can only save flows to the server you are currently signed in to. You can't create a draft flow on one server and then save or publish it to another server. If you want to publish the flow to a different project on the server, use the File > Publish As menu option, then select your project from the dialog.

Draft flows can only be seen by you until you publish them and make them available to anyone who has permissions to access the project on your server. Flows in a draft status are tagged with a Draft badge so you can easily find your flows that are in progress. If the flow has never been published, a Never Published badge is shown next to the Draft badge.

After a flow is published and you edit and republish the flow, a new version is created. You can see a list of flow versions in the Revision History dialog. From the Explore page, click the Actions menu and select Revision History.

For more information about managing revision history, see Work with Content Revisions in the Tableau Desktop help.

Note: Autosave is enabled by default. It is possible, but not recommended, for administrators to disable autosave on a site. To turn off autosave, use the Tableau Server REST API method 'Update Site' and set the flowAutoSaveEnabled attribute to false. For more information, see Tableau Server REST API Site Methods: Update Site.
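For administrators who do need to change this setting, the following Python sketch builds (but does not send) an Update Site request that sets flowAutoSaveEnabled. The server URL, API version, site ID, and token are placeholders, and the request shape (a PUT to the site resource with a tsRequest body and an X-Tableau-Auth header) is an assumption to confirm against the REST API reference for your server version.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_update_site_request(server_url: str, api_version: str, site_id: str,
                              auth_token: str, autosave_enabled: bool) -> urllib.request.Request:
    """Build an Update Site request that sets the flowAutoSaveEnabled attribute."""
    ts_request = ET.Element("tsRequest")
    ET.SubElement(ts_request, "site",
                  flowAutoSaveEnabled=str(autosave_enabled).lower())
    body = ET.tostring(ts_request, encoding="utf-8")
    url = f"{server_url.rstrip('/')}/api/{api_version}/sites/{site_id}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"X-Tableau-Auth": auth_token, "Content-Type": "application/xml"},
    )


# Placeholder values for illustration only.
request = build_update_site_request(
    "https://tableau.example.com", "3.10", "site-id-placeholder", "token-placeholder", False
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```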

Automatic file recovery

Supported in Tableau Prep Builder version 2020.3.3 and later.

By default, Tableau Prep Builder automatically saves a draft of any open flows if the application freezes or crashes. Draft flows are saved in your Recovered Flows folder in your My Tableau Prep repository. The next time you open the application, a dialog is shown with a list of the recovered flows to select from. You can open a recovered flow and continue where you left off, or delete the recovered flow file if you don't need it.

Note: If you have recovered flows in your Recovered Flows folder, this dialog shows every time you open the application until that folder is empty.

If you don't want this feature enabled, as an Administrator, you can turn it off during install or after install. For more information about how to turn off this feature, see Turn off file recovery in the Tableau Desktop and Tableau Prep Deployment Guide.

View flow output in Tableau Desktop

Note: This option is not available on the web.

Sometimes when you're cleaning your data you might want to check your progress by looking at it in Tableau Desktop. When your flow opens in Tableau Desktop, Tableau Prep Builder creates a permanent Tableau .hyper file and a Tableau data source (.tds) file. These files are saved in your Tableau repository in the Datasources folder so you can experiment with your data at any time.

When you open the flow in Tableau Desktop, you can see the data sample that you are working with in your flow with the operations applied to it, up to the step that you selected.

Note: While you can experiment with your data, Tableau only shows you a sample of your data and you won't be able to save the workbook as a packaged workbook (.twbx). When you are ready to work with your data in Tableau, create an output step in your flow and save the output to a file or as a published data source, then connect to the full data source in Tableau.

To view your data sample in Tableau Desktop do the following:

  1. Right-click the step where you want to view your data, and select Preview in Tableau Desktop from the context menu.

  2. Tableau Desktop opens on the Sheet tab.

Create data extract files and published data sources

Important: Starting in Tableau Prep Builder version 2020.3.1, Tableau Data Extract (.tde) files are no longer supported for flow output. Any flows published to a server version 2020.3 and later that output to this file type must be converted to output to the Hyper Extract (.hyper) file type. Otherwise the flow will fail to run. If the flow is published to Tableau Server or Tableau Online, download the flow, change the output type and republish the flow to avoid flow run errors.

To create your flow output, run your flow. When you run your flow, your changes are applied to your entire data set. Running the flow results in a Tableau Data Source (.tds) and a Tableau Data Extract (.hyper) file.

Note: You can publish data extracts or published data sources to Tableau Server version 10.0 and later as well as to Tableau Online.

Tableau Prep Builder

You can create an extract file from your flow output to use in Tableau Desktop or to share your data with third parties. Create an extract file in the following formats:

  • Hyper Extract (.hyper): This is the latest Tableau extract file type and can only be consumed by Tableau Desktop or Tableau Server version 10.5 and later.

  • Comma Separated Value (.csv): Save the extract to a .csv file to share your data with third parties. The encoding of exported CSV file will be UTF-8 with BOM.

  • Microsoft Excel (.xlsx): Starting in version 2021.1.2, you can output your flow data to a Microsoft Excel spreadsheet. Legacy Microsoft Excel .xls file types are not supported.
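Because exported CSV files are encoded as UTF-8 with a BOM, a reader that ignores the byte order mark can end up with stray characters in the first header name. A minimal Python sketch, using a simulated export file and a hypothetical helper name, shows one way to read such a file:

```python
import csv
import tempfile
from pathlib import Path


def read_prep_csv(path):
    """Read a CSV file, dropping a leading UTF-8 byte order mark if present."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))


# Simulate an exported file: a BOM followed by UTF-8 encoded CSV text.
sample = Path(tempfile.mkdtemp()) / "output.csv"
sample.write_bytes(b"\xef\xbb\xbf" + "Name,Hours\nZoë,8\n".encode("utf-8"))

rows = read_prep_csv(sample)
```

The `utf-8-sig` codec also reads files without a BOM, so the same helper works for CSV files from other tools.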

Tableau Prep Builder and on the web

You can publish your output as a published data source to Tableau Server or Tableau Online to share your data and provide centralized access to the data you have cleaned, shaped, and combined.

You can also save your flow output to a database to create, replace, or append the table data with your clean, prepared flow data. For more information, see Save flow output data to external databases.

You can also run your flow using incremental refresh. This option saves time and resources and enables you to refresh only new data instead of your full data set. For information about how to configure and run your flow using incremental refresh, see Refresh Flow Data Using Incremental Refresh.

Note: To publish Tableau Prep Builder output to Tableau Server, the Tableau Server REST API must be enabled. For more information, see REST API Requirements in the Tableau REST API Help. To publish to a server that uses Secure Socket Layer (SSL) encryption certificates, additional configuration steps are needed on the machine running Tableau Prep Builder. For more information, see Before you Install in the Tableau Desktop and Tableau Prep Builder Deployment Guide.

Create an extract to a file

Note: This output option is not available when creating or editing flows on the web.

  1. Click the plus icon on a step and select Add Output.

    If you have run the flow before, click the run flow button on the Output step. This runs the flow and updates your output.

    The Output pane opens and shows you a snapshot of your data.

  2. In the left pane select File from the Save output to drop-down list. In prior versions, select Save to file.

  3. Click the Browse button, then in the Save Extract As dialog, enter a name for the file and click Accept.

  4. In the Output type field, select from the following output types:

    • Tableau Data Extract (.hyper)

    • Comma Separated Values (.csv)

  5. (Tableau Prep Builder version 2020.2.1 and later) In the Write Options section, view the default write option to write the new data to your files and make any changes as needed. For more information, see Configure write options.

    • Create table: This option creates a new table or replaces the existing table with the new output.

    • Append to table: This option adds the new data to your existing table. If the table doesn't already exist, a new table is created and subsequent runs will add new rows to this table.

      Note: Append to table isn't supported for .csv output types. For more information about supported refresh combinations, see Flow refresh options.

  6. Click Run Flow to run the flow and generate the extract file.

Create an extract to a Microsoft Excel Worksheet

Supported in Tableau Prep Builder version 2021.1.2 and later. This output option is not currently supported on Tableau Server or Tableau Online.

When you output flow data to a Microsoft Excel worksheet you can create a new worksheet or append or replace data in an existing worksheet. The following conditions apply:

  • Only Microsoft Excel .xlsx file formats are supported.
  • The worksheet rows begin at cell A1.
  • When appending or replacing data, the first row is assumed to be headers.
  • Header names are added when creating a new worksheet, but not when adding data to an existing worksheet.
  • Any formatting or formulas in existing worksheets aren't applied to the flow output.
  • Writing to named tables or ranges is not currently supported.

Output flow data to a Microsoft Excel worksheet file

  1. Click the plus icon on a step and select Add Output.

    If you have run the flow before, click the run flow button on the Output step. This runs the flow and updates your output.

    The Output pane opens and shows you a snapshot of your data.

  2. In the left pane select File from the Save output to drop-down list.

  3. Click the Browse button, then in the Save Extract As dialog, enter or select the file name and click Accept.

  4. In the Output type field, select Microsoft Excel (.xlsx).

  5. In the Worksheet field, select the worksheet you want to write your results to, or enter a new name in the field instead, then click on Create new table.
  6. In the Write Options section, select one of the following write options:

    • Create table: Creates or re-creates (if the file already exists) the worksheet with your flow data.

    • Append to table: Adds new rows to an existing worksheet. If the worksheet doesn't exist, one is created and subsequent flow runs add rows to that worksheet.

    • Replace data: Replaces all of the existing data except the first row in an existing worksheet with the flow data.

      A field comparison shows you the fields in your flow that match the fields in your worksheet, if it already exists. If the worksheet is new, then a one-to-one field match is shown. Any fields that don't match are ignored.

  7. Click Run Flow to run the flow and generate the Microsoft Excel extract file.

Create a published data source

  1. Click the plus icon on a step and select Add Output.

    Note: Tableau Prep Builder refreshes previously published data sources and maintains any data modeling (for example calculated fields, number formatting, and so on) that might be included in the data source. If the data source can't be refreshed, the data source will be replaced instead.

  2. The output pane opens and shows you a snapshot of your data.

  3. From the Save output to drop-down list, select Published data source (Publish as data source in previous versions). Complete the following fields:

    • Server (Tableau Prep Builder only): Select the server where you want to publish the data source and data extract. If you aren't signed in to a server you will be prompted to sign in.

      Note: Starting in Tableau Prep Builder version 2020.1.4, after you sign into your server, Tableau Prep Builder remembers your server name and credentials when you close the application. The next time you open the application, you are already signed into your server.

      On the Mac, you may be prompted to provide access to your Mac keychain so Tableau Prep Builder can securely use SSL certificates to connect to your Tableau Server or Tableau Online environment.

      If you are outputting to Tableau Online include the pod your site is hosted on in the 'serverUrl'. For example, 'https://eu-west-1a.online.tableau.com' not 'https://online.tableau.com'.

    • Project: Select the project where you want to load the data source and extract.

    • Name: Enter a file name.

    • Description: Enter a description for the data source.

  4. (Tableau Prep Builder version 2020.2.1 and later) In the Write Options section, view the default write option to write the new data to your files and make any changes as needed. For more information, see Configure write options.

    • Create table: This option creates a new table or replaces the existing table with the new output.

    • Append to table: This option adds the new data to your existing table. If the table doesn't already exist, a new table is created and subsequent runs will add new rows to this table.

  5. Click Run Flow to run the flow and publish the data source.
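The Tableau Online note in step 3 can be checked before a flow is run. The sketch below is a hypothetical helper, not part of any Tableau tool, that reports whether a server URL names a pod in front of the shared online.tableau.com domain:

```python
from urllib.parse import urlparse

ONLINE_SUFFIX = ".online.tableau.com"


def has_pod_prefix(server_url: str) -> bool:
    """Return True when the host has a pod name before online.tableau.com."""
    host = (urlparse(server_url).hostname or "").lower()
    pod = host[: -len(ONLINE_SUFFIX)] if host.endswith(ONLINE_SUFFIX) else ""
    return bool(pod)


with_pod = has_pod_prefix("https://eu-west-1a.online.tableau.com")
without_pod = has_pod_prefix("https://online.tableau.com")
```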

Save flow output data to external databases

Supported in Tableau Prep Builder version 2020.3.1 and later and on Tableau Server and Tableau Online starting in version 2020.4.

Important: This feature enables you to permanently delete and replace data in an external database. Be sure that you have permissions to write to the database.
To prevent data loss, you can use the Custom SQL option to make a copy of your table data and run it before writing the flow data to the table.

You can connect to data from any of the connectors that Tableau Prep Builder or the web supports and output data to an external database. This enables you to add or update data in your database with clean, prepped data from your flow each time the flow is run. This feature is available for both incremental and full refresh options. For more information about how to configure incremental refresh, see Refresh Flow Data Using Incremental Refresh.

When you save your flow output to an external database, Tableau Prep does the following:

  1. Generates the rows and runs any SQL commands against the database.
  2. Writes the data to a temporary table (or staging area if outputting to Snowflake) in the output database.
  3. If the operation is successful, the data is moved from the temporary table (or your staging area for Snowflake) into the destination table.
  4. Runs any SQL commands that you want to run after writing the data to the database.

If the SQL script fails, the flow will fail. However, your data will still be loaded to your database tables. You can try running the flow again or manually run your SQL script on your database to apply it.
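The order of operations above can be illustrated with a small Python sketch that uses an in-memory SQLite database as a stand-in for the output database. This is not how Tableau Prep is implemented; the function, table names, and the two-column layout are all assumptions made for the example. The SQL that runs before the write copies the table, as suggested earlier for preventing data loss, and the SQL that runs after the write fails on purpose to show that the loaded rows remain.

```python
import sqlite3


def write_with_staging(conn, table, rows, pre_sql=None, post_sql=None):
    """Follow the pre-SQL, staging, move, post-SQL order with SQLite as a stand-in."""
    if pre_sql:
        conn.executescript(pre_sql)                       # 1. SQL that runs before the write
    staging = f"{table}_staging"                          # hypothetical staging table name
    conn.execute(f"DROP TABLE IF EXISTS {staging}")
    conn.execute(f"CREATE TABLE {staging} AS SELECT * FROM {table} WHERE 0")
    conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)   # 2. write to staging
    conn.execute(f"INSERT INTO {table} SELECT * FROM {staging}")     # 3. move to destination
    conn.commit()
    conn.execute(f"DROP TABLE {staging}")
    if post_sql:
        conn.executescript(post_sql)                      # 4. SQL that runs after the write


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timesheet (employee TEXT, hours REAL)")
conn.execute("INSERT INTO timesheet VALUES ('Ana', 7.5)")
conn.commit()

try:
    write_with_staging(
        conn, "timesheet", [("Ben", 8.0)],
        pre_sql="CREATE TABLE timesheet_backup AS SELECT * FROM timesheet;",
        post_sql="CREATE INDEX idx_missing ON no_such_table (employee);",  # fails on purpose
    )
    post_sql_failed = False
except sqlite3.OperationalError:
    post_sql_failed = True

loaded = conn.execute("SELECT employee FROM timesheet ORDER BY employee").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM timesheet_backup").fetchone()[0]
```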

Output options

You can select the following options when writing data to a database. If the table doesn't already exist, it's created when the flow is first run.

  • Append to table: This option adds data to an existing table. If the table doesn't exist, the table is created when the flow is first run and data is added to that table with each subsequent flow run.
  • Create table: This option creates a new table with the data from your flow. If the table already exists, the table and any existing data structure or properties defined for the table is deleted and replaced with a new table that uses the flow data structure. Any fields that exist in the flow are added to the new database table.
  • Replace data: This option deletes the data in your existing table and replaces it with the data in your flow, but preserves the structure and properties of the database table. If the table doesn't exist, the table is created when the flow is first run and the table data is replaced with each subsequent flow run.
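To make the difference between the three options concrete, the following Python sketch applies each one to an in-memory SQLite table. It only mirrors the behavior described above; the function and table names are hypothetical, and a real database would carry more structure than this example does.

```python
import sqlite3


def apply_write_option(conn, option, table, columns, rows):
    """Apply 'append', 'create', or 'replace' semantics to a SQLite table."""
    if option not in {"append", "create", "replace"}:
        raise ValueError(f"unknown write option: {option}")
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None

    if option == "create" and exists:
        conn.execute(f"DROP TABLE {table}")       # structure and data are discarded
        exists = False
    if not exists:
        conn.execute(f"CREATE TABLE {table} ({column_list})")   # first run creates the table
    elif option == "replace":
        conn.execute(f"DELETE FROM {table}")      # data removed, structure kept
    conn.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", rows
    )
    conn.commit()


def table_sql(conn, table):
    """Return the CREATE statement SQLite stored for the table."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (employee VARCHAR(20) NOT NULL, hours REAL)")
fields = ["employee", "hours"]

apply_write_option(conn, "append", "hours", fields, [("Ana", 7.5)])
apply_write_option(conn, "append", "hours", fields, [("Ben", 8.0)])
count_after_append = conn.execute("SELECT COUNT(*) FROM hours").fetchone()[0]

apply_write_option(conn, "replace", "hours", fields, [("Cy", 6.0)])
count_after_replace = conn.execute("SELECT COUNT(*) FROM hours").fetchone()[0]
sql_after_replace = table_sql(conn, "hours")

apply_write_option(conn, "create", "hours", fields, [("Di", 5.0)])
sql_after_create = table_sql(conn, "hours")
```

In this sketch the NOT NULL constraint survives the replace run but is gone after the create run, which is the distinction the option descriptions draw between preserving and replacing the table structure.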

Additional options

In addition to the write options, you can also include custom SQL scripts or add a new table to your database.

  • Custom SQL scripts: Enter your custom SQL and select whether to run your script before, after, or both before and after data is written to the database tables. You can use these scripts to create a copy of your database table before the flow data is written to the table, add an index, add other table properties, and so on.
  • Add a new table: Add a new table with a unique name to the database instead of selecting one from the existing table list. If you want to apply a schema other than the default schema (Microsoft SQL Server and PostgreSQL), you can specify it using the syntax [schema name].[table name].

Supported databases and database requirements

Tableau Prep supports writing flow data to tables in a select number of databases. Flows that run on a schedule in Tableau Online can only write to these databases if they are cloud-hosted.

Some databases have data restrictions or requirements. Tableau Prep may also impose some limits to maintain peak performance when writing data to the supported databases. The following table lists the databases where you can save your flow data and any database restrictions or requirements. Data that doesn't meet these requirements can result in errors when running the flow.

Note: Setting character limits for your fields is not yet supported. However, you can create tables in your database that include character limit constraints, then use the Replace data option to replace your data but maintain the table's structure in your database.

Database: requirements or restrictions

Amazon Redshift
  • Collation sequences aren't supported. See the Amazon Redshift documentation for more information.
  • Field names are converted to all lowercase.
  • Up to 8192 characters can be written for text field values. Longer values will be truncated.

Microsoft SQL Server
  • Up to 3072 characters can be written for text field values. Longer values will be truncated.

MySQL
  • Up to 8192 characters can be written for text field values. Longer values will be truncated.

Oracle
  • Field and table names can't exceed 30 characters.
  • Up to 1000 characters can be written for text field values. Longer values will be truncated.
  • Special characters in field names may cause errors.

PostgreSQL
  • Up to 8192 characters can be written for text field values. Longer values will be truncated.

Snowflake
  • Up to 8192 characters can be written for text field values. Longer values will be truncated.
  • Warehouse options must be set to auto-resume to enable Tableau Prep to write data to the database warehouse. For more information, see Auto-suspension and Auto-resumption in the Snowflake documentation.

Teradata
  • Up to 1000 characters can be written for text field values. Longer values will be truncated.
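Since longer text values are truncated rather than rejected, it can help to check your data against these limits before running a flow. The Python sketch below encodes the limits listed above in a lookup table and reports values that would be cut off. The helper name is hypothetical, and it counts Python string characters, which is an assumption about how each database measures length.

```python
# Text field limits taken from the list above.
TEXT_LIMITS = {
    "Amazon Redshift": 8192,
    "Microsoft SQL Server": 3072,
    "MySQL": 8192,
    "Oracle": 1000,
    "PostgreSQL": 8192,
    "Snowflake": 8192,
    "Teradata": 1000,
}


def find_truncated_values(database, rows):
    """Return (row_index, field, length) for text values longer than the limit."""
    limit = TEXT_LIMITS[database]
    return [
        (index, field, len(value))
        for index, row in enumerate(rows)
        for field, value in row.items()
        if isinstance(value, str) and len(value) > limit
    ]


sample_rows = [
    {"id": 1, "note": "x" * 1001},
    {"id": 2, "note": "short note"},
]
oracle_issues = find_truncated_values("Oracle", sample_rows)
mysql_issues = find_truncated_values("MySQL", sample_rows)
```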

Save flow data to a database

Note: Writing flow output to a database using Windows Authentication isn't supported. If you use this method of authentication, you'll need to change the connection authentication to use the username and password.
You can embed your credentials for the database when publishing the flow. For more information about embedding credentials, see the Databases section in Publish a flow from Tableau Prep Builder.

  1. Click the plus icon on a step and select Add Output.
  2. From the Save output to drop-down list, select Database table.
  3. In the Settings tab, enter the following information:
    • In the Connection drop-down list, select the database connector where you want to write your flow output. Only supported connectors are shown. This can be the same connector that you used for your flow input or a different connector. If you select a different connector, you'll be prompted to sign in.

      Important: Make sure you have write permission to the database you select. Otherwise the flow might only partially process the data.

    • In the Database drop-down list, select the database where you want to save your flow output data.
    • In the Table drop-down list, select the table where you want to save your flow output data. Depending on the Write Option you select, a new table will be created, the flow data will replace any existing data in the table, or flow data will be added to the existing table.

      To create a new table in the database, enter a unique table name in the field instead, then click on Create new table. When you run the flow for the first time, no matter which write option you select, the table is created in the database using the same schema as the flow.

  4. The output pane shows you a snapshot of your data. A field comparison shows you the fields in your flow that match the fields in your table, if the table already exists. If the table is new, then a one-to-one field match is shown.

    If there are any field mismatches, a status note shows you any errors.

    • No match: Field is ignored: Fields exist in the flow but not in the database. The field won't be added to the database table unless you select the Create table write option and perform a full refresh. Then the flow fields are added to the database table and use the flow output schema.
    • No match: Field will contain Null values: Fields exist in the database but not in the flow. The flow passes a Null value to the database table for the field. If the field does exist in the flow, but is mismatched because the field name is different, you can navigate to a cleaning step and edit the field name to match the database field name. For information about how to edit field name, see Apply cleaning operations.
    • Error: Field data types do not match: The data type assigned to a field in both the flow and the database table you are writing your output to must match, otherwise the flow will fail. You can navigate to a cleaning step and edit the field data type to fix this. For more information about changing data types, see Review the data types assigned to your data.
  5. Select a write option. You can select a different option for full and incremental refresh and the option is applied when you select your flow run method. For more information about running your flow using incremental refresh, see Refresh Flow Data Using Incremental Refresh.
    • Append to table: This option adds data to an existing table. If the table doesn't exist, the table is created when the flow is first run and data is added to that table with each subsequent flow run.
    • Create table: This option creates a new table. If the table with the same name already exists, the existing table is deleted and replaced with the new table. Any existing data structure or properties defined for the table are also deleted and replaced with the flow data structure. Any fields that exist in the flow are added to the new database table.
    • Replace data: This option deletes the data in your existing table and replaces it with the data in your flow, but preserves the structure and properties of the database table.
  6. (Optional) Click on the Custom SQL tab and enter your SQL script. You can enter a script to run before, after, or both before and after the data is written to the table.

  7. Click Run Flow to run the flow and write your data to your selected database.
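The field comparison in step 4 can be approximated with a short Python sketch. It takes two dictionaries that map field names to data type names and returns the status note for each field. The function name, the "Match" label, and the type names are assumptions made for illustration; Tableau Prep performs this comparison itself in the output pane.

```python
def compare_fields(flow_fields, table_fields):
    """Map each field name to the status note that applies to it.

    Both arguments are dicts of field name -> data type name.
    """
    statuses = {}
    for name, flow_type in flow_fields.items():
        if name not in table_fields:
            # In the flow but not in the database table.
            statuses[name] = "No match: Field is ignored"
        elif table_fields[name] != flow_type:
            statuses[name] = "Error: Field data types do not match"
        else:
            statuses[name] = "Match"  # hypothetical label for matching fields
    for name in table_fields:
        if name not in flow_fields:
            # In the database table but not in the flow.
            statuses[name] = "No match: Field will contain Null values"
    return statuses


result = compare_fields(
    flow_fields={"employee": "string", "hours": "string", "pay_period": "integer"},
    table_fields={"employee": "string", "hours": "decimal", "manager": "string"},
)
```

A check like this before a scheduled run can surface a renamed field or a changed data type early, so it can be fixed in a cleaning step.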
