
Best Practices for Syncing Large Data Sets in iPaaS.com

Advice on getting your data populated


Syncing a large volume of data through iPaaS.com, whether it's tens of thousands of products, customers, or orders, behaves very differently from a standard ongoing sync. Records can appear to take longer to process, errors can surface in unexpected places, and impatience often leads to decisions that make things slower, not faster.

This article explains why large syncs work the way they do, what controls sync speed, and why the hub-and-spoke architecture behind iPaaS.com is the right approach even when an initial sync takes longer than expected.

Why large syncs are different

In standard integration operation, data flows in real time: a record changes in System A, a webhook fires, iPaaS.com processes it, and System B is updated within seconds.

A large initial sync is the opposite of that. Instead of a trickle of individual changes, you're asking the platform to ingest, normalize, and distribute thousands (or tens of thousands) of records, each of which may require multiple API calls to complete.

A single product sync, for example, might require separate API calls for:

  • The parent product record

  • Each product option (size, color, etc.)

  • Each variant

  • Category assignments

  • Inventory per location

A catalog of 10,000 products can easily require 100,000+ API calls before the first record reaches its destination. Multiply that across systems and data types, and it becomes clear why large syncs need to be approached differently.
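As a rough back-of-the-envelope check (the per-product call counts below are illustrative assumptions, not fixed numbers from iPaaS.com; real counts vary by integration and catalog shape):

```python
# Rough estimate of API call volume for an initial product sync.
# Per-product call counts are illustrative assumptions only.
PRODUCTS = 10_000

calls_per_product = (
    1      # parent product record
    + 2    # product options (e.g., size and color)
    + 6    # variants, one call each
    + 1    # category assignments
    + 3    # inventory, one call per location
)

total_calls = PRODUCTS * calls_per_product
print(f"{total_calls:,} API calls")  # 130,000 API calls
```

Even modest per-record assumptions put a 10,000-product catalog well past 100,000 calls before distribution to a second system begins.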

The #1 factor controlling sync speed: throttle settings

The single biggest lever you have over how fast (or slow) a large sync runs is each system’s throttling configuration in Subscription Settings.

Throttling is not a bug or a limitation; it's a control system designed to protect both iPaaS.com and the external APIs it connects to from being overwhelmed. Every connected system has finite resources, and hitting those limits mid-sync is worse than running slower in the first place, for several reasons:

  • All API calls made before the failure are wasted (see below)

  • Update vs. add mapping mismatches: if the parent product is created and the transfer then fails partway through, the retry will be an update call. If your update mapping differs from your add mapping, you may end up with incomplete data.

  • Some systems penalize you for exceeding your limits. For example, if you are allowed 100 calls per hour and make a 101st call, your next batch of calls may be smaller, or you may have to wait longer before a new batch is granted.

Where to find throttle settings

Navigate to Subscription Management → Subscriptions → [your subscription] → Edit.

The key settings are:

  • API Throttle Limit: Maximum API calls FROM iPaaS.com allowed within the throttle window. Set to 0 for all modern integrations (Salesforce, Shopify, etc.). Any non-zero value causes iPaaS.com to actively manage webhook pacing in a way that creates idle gaps during bulk transfers, significantly slowing throughput. The only exception is legacy systems without their own rate limiting (e.g., Nav, Counterpoint).

  • API Throttle Seconds: The time window (in seconds) for the FROM iPaaS.com limit above. Paired with the Throttle Limit; has no effect when the Throttle Limit is 0. Set to 0 always.

  • API Throttle Limit (TO IPAAS): Maximum API calls TO iPaaS.com allowed within the throttle window. The same guidance applies: set to 0 for all modern integrations; the only exception is legacy systems without their own rate limiting.

  • API Throttle Seconds (TO IPAAS): The time window (in seconds) for the TO iPaaS.com limit above. Paired with the Throttle Limit; has no effect when the Throttle Limit is 0. Set to 0 always.

  • Concurrent Connections: How many simultaneous API connections are open at once.

  • Batch Executions: How many repetitive calls are grouped together.

⚠️ Why Throttle Limit is often set to 500 — and why you should change it

New subscriptions frequently default to a Throttle Limit of 500. This is the wrong setting for most integrations. When a non-zero throttle limit is hit, iPaaS.com moves pending webhooks into a secondary holding queue and feeds them back gradually — creating a stop-start pattern where webhooks process for a few seconds, then sit idle for the remainder of the rate window. During large bulk transfers, this cycle can add hours to an otherwise fast sync.
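To see how much time the stop-start cycle can add, consider a simplified model. All of the numbers here (window length, processing rate, webhook count) are assumptions chosen for illustration, not iPaaS.com internals:

```python
# Illustrative model of the stop-start pattern caused by a non-zero
# throttle limit. All numbers are assumptions for the example.
THROTTLE_LIMIT = 500     # webhooks allowed per rate window
WINDOW_S = 60            # assumed window length in seconds
PROCESS_RATE = 50        # webhooks/second the platform could otherwise sustain

total_webhooks = 100_000

# Throttled: each window processes 500 webhooks, then sits idle
# until the window resets, regardless of available capacity.
throttled_time_s = (total_webhooks / THROTTLE_LIMIT) * WINDOW_S

# Unthrottled: limited only by sustained processing rate.
unthrottled_time_s = total_webhooks / PROCESS_RATE

print(throttled_time_s / 3600)    # 3.33... hours
print(unthrottled_time_s / 3600)  # 0.55... hours
```

Under these assumptions the throttle turns a sub-hour transfer into a multi-hour one, which matches the behavior described above.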


First thing to do on any new subscription: set both throttle fields to 0. Use Concurrent Connections (based on what is allowed by the external system) to control throughput instead — that's the right lever.

These values are pre-populated by the integration, but they can be adjusted. For most use cases, the recommended approach is:

  • Set Throttle Limit to 0 (unlimited)

  • Use Concurrent Connections as your primary speed control

Think of concurrent connections as a dial between a fire hose and a garden hose. High values push data hard in short bursts. Lower values create a steady, sustained flow. For large syncs, a steady flow is almost always more reliable.

⚠️ Critical: If a transfer hits a throttle limit partway through processing, all previous API calls in that transfer are wasted and the entire transaction must restart from scratch. A transaction that requires 10 API calls but fails on call 9 costs you 9 calls and produces nothing. Lowering concurrent connections prevents this failure pattern.
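The cost of those restarts adds up quickly. A quick calculation (the 20% mid-transaction failure rate is an assumption for illustration):

```python
# Compare total API calls when some transactions fail mid-way and
# must restart from scratch. The failure rate is an assumed figure.
CALLS_PER_TRANSACTION = 10
transactions = 1_000

# Aggressive settings: assume 20% of transactions fail on call 9,
# wasting those calls, then succeed on a full retry.
failed = int(transactions * 0.20)
wasted_calls = failed * 9
burst_total = transactions * CALLS_PER_TRANSACTION + wasted_calls

# Steady settings: assume every transaction completes on the first try.
steady_total = transactions * CALLS_PER_TRANSACTION

print(burst_total, steady_total)  # 11800 10000
```

Under these assumptions, bursting spends 18% more API calls to move exactly the same data, and every wasted call also counts against the external system's rate limit.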

Why the hub-and-spoke model is the right approach — even if the initial sync takes longer

iPaaS.com is built on a hub-and-spoke architecture. Every system in your iPaaS.com account connects to a central hub rather than directly to each other. Data always flows:

System A → iPaaS.com Hub → System B

It never goes directly from System A to System B.

This architecture is optimal for several reasons:

1. Data is normalized once, distributed many times

When a record arrives at the hub, iPaaS.com normalizes it to a common standard. That normalized record can then be sent to any number of downstream systems without re-processing or re-mapping. If you add a third system later, you don't re-sync from the original source; you distribute from the hub.

2. Error tracking is centralized

In a point-to-point integration, a failure in one connection is invisible to the others. In iPaaS.com's hub model, every error is logged in one place. You can see exactly where in the chain a record failed, retry it, and get detailed debug output, all from a central dashboard.

3. The platform is built to handle queue-based volume

iPaaS.com uses a queueing system, not batch scheduling, to process data. Records are ingested, normalized, and distributed in order. The system autoscales to handle load. This is fundamentally different from trying to push a bulk export directly between two systems and hoping they both stay responsive.

Note: Even when iPaaS.com polls, it queues up the individual records.

4. Redundant API calls are eliminated

In a direct A-to-B architecture, the same data often needs to be fetched multiple times across multiple connections. The hub model reduces this by treating iPaaS.com as the authoritative source of normalized data, cutting down the total number of external API calls required.

5. One place to monitor, one place to fix

During a large sync, something will inevitably need attention: a missing field, an unmatched category, a rate limit hit, etc. Having everything visible and actionable in one place (error logs, activity logs, manual sync, debug mode) is a major operational advantage.

How to approach a large sync

Step 1: Plan your data flow order

Dependencies matter. Syncing in the wrong order will produce errors that have nothing to do with your configuration.

The recommended sequence is:

  1. Locations / warehouses

  2. Payment methods

  3. Shipping methods

  4. Products and categories

  5. Customers

  6. Transactions / orders

A transaction record that references a shipping method or SKU that doesn't exist yet in the destination system will fail, even if the data itself is perfect.

Note: the above order is a general best practice, but make sure that it applies to your configuration before proceeding.

Step 2: Configure collision handling before you start

When syncing a large dataset, you will almost certainly encounter records that already exist in the destination system. Decide in advance how to handle them:

  • Error: Reject the duplicate

  • Remap and Link: Link to the existing record and treat as an update

  • Update and Link: Update the existing record and link

  • Update and No Link: Update without linking

Leaving this unset before a bulk load is a common source of unexpected errors. Note that not all integrations support collision handling.

Step 3: Use the right sync method for your volume

iPaaS.com offers three approaches to bulk loading:

Initialization: Available in Subscription Settings under "Initialize Data." Designed for bulk loading smaller data collections. Not available for all integrations (for example, HubSpot currently does not support initialization), so check your subscription settings first.

Polling (/poll scope): For polling-based integrations, you can trigger a full data pull via Manual Sync. First clear the "last polled date" in your integration settings so that all historical records are captured, not just recent ones; if the field is not shown, it has not been set and you can proceed.

Note that many integrations have settings about how far back to poll on the first polling job. Make sure this setting, if present, is configured appropriately.

Postman Runner: For large datasets that exceed the ~3,000 ID limit of Manual Sync, Postman Runner automates adding records to the iPaaS.com queue via API. This is the recommended approach for very large volumes. Allow 100–500ms between requests to avoid overwhelming the queue.

⚠️ When using the API, do not rely on HTTP 200 status alone to confirm records are queued. The endpoint may accept a request even if the scope is invalid or misspelled — verify by checking the Events Queue in your subscription dashboard.
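If you prefer a script to Postman Runner, the same pacing can be done in a few lines. This is a sketch, not the official client: the endpoint URL, scope name, and payload shape below are placeholders, so substitute the actual values from your iPaaS.com subscription and API documentation:

```python
import json
import time
import urllib.request

# Placeholder endpoint, scope, and token: replace with the real
# values from your subscription before use.
QUEUE_URL = "https://api.example.com/ipaas/queue"
API_TOKEN = "YOUR_TOKEN"

def queue_record(record_id: str) -> int:
    """POST one record ID to the queue endpoint; return the HTTP status."""
    body = json.dumps({"scope": "product/poll", "id": record_id}).encode()
    req = urllib.request.Request(
        QUEUE_URL, data=body,
        headers={"Authorization": f"Bearer {API_TOKEN}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def queue_all(record_ids, send=queue_record, delay_s=0.25):
    """Queue records one at a time with a 100-500 ms gap between requests."""
    statuses = []
    for rid in record_ids:
        statuses.append(send(rid))
        # An HTTP 200 alone does not prove the record was queued; a
        # misspelled scope can still return 200. Verify in the Events Queue.
        time.sleep(delay_s)
    return statuses
```

The `send` parameter exists so the pacing loop can be exercised without a live endpoint; in production you would call `queue_all(ids)` with the default.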

Step 4: Always test with a small batch first

Before running a full dataset, sync 5–10 records and verify they process correctly end-to-end. This catches mapping issues, missing fields, and collision handling misconfigurations before they affect thousands of records.

Step 5: Monitor throughout the sync

Don't start a large sync and walk away. Use:

  • Activity Logs: Real-time view of data as it moves through the system

  • Events Queue: Confirm records are being ingested and processed

  • Error Logs: Surface failures quickly; use bulk retry or bulk dismiss as appropriate

  • Debug Mode (via Manual Sync): Prioritize individual records and get detailed logging for hard-to-diagnose failures. Note that during a large sync, debugging might take longer than typical.

Common issues during large syncs

Sync is very slow

Most likely cause: Throttle settings are too conservative or external API rate limits are being hit repeatedly.

Check your Concurrent Connections setting and review whether the external system has published rate limits. If you're seeing 429 errors in the Activity Logs, the external system is rejecting requests. You should lower your concurrent connections and allow the queue to drain before resuming.
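If you are driving the queue yourself (for example, via the Postman Runner approach described above), a standard exponential backoff on 429 responses keeps the load steady instead of hammering a throttled API. A minimal sketch, independent of any particular HTTP client:

```python
import time

def call_with_backoff(call, max_retries=5, base_delay_s=1.0):
    """Retry a rate-limited call with exponential backoff.

    `call` is any function returning an HTTP status code;
    429 means the external system is asking you to slow down.
    """
    for attempt in range(max_retries):
        status = call()
        if status != 429:
            return status
        # Wait 1s, 2s, 4s, ... before retrying a throttled request.
        time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("still throttled after retries")
```

Some APIs also return a `Retry-After` header on 429 responses; when present, honoring it is better than a fixed backoff schedule.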

Transfers failing mid-process

Most likely cause: Hitting a throttle limit after some (but not all) API calls in a transaction have completed.

Lower your Concurrent Connections. Use the "garden hose" approach: it takes longer, but every transaction completes cleanly. Partial transactions that fail are retried from scratch, making high-volume bursting more expensive overall.

Duplicate records appearing in the destination system

Most likely cause: Collision handling was not configured before the bulk load.

Review your collision handling settings and decide whether existing records should be updated, linked, or rejected. Run a small test batch to confirm behavior before proceeding.

Records failing due to missing related data

Most likely cause: Syncing in the wrong order. For instance, orders before products, or products before categories.

Review the recommended data flow order above. Use the Implementation QA Checklist as a reference before beginning bulk operations.

Records appear to sync but aren't showing in the destination

Most likely cause: A mapping filter is excluding them, or an error filter is triggering silently.

Review your mapping collections for active C# filter expressions. Check Error Logs for records that may have been flagged and dismissed. Use Debug Mode to trace a specific record through the full pipeline.

Performance tips

  • Increase batch executions gradually. Higher batch sizes improve throughput but reduce error granularity. Start small and increase once you've confirmed the sync is stable.

  • Limit scope with mapping filters. If you only need a subset of data (e.g., a specific product category or orders past a certain date), use a mapping filter to reduce the volume before it enters the queue.

  • Back up your subscription config before bulk operations. Export your configuration from Subscription Management → View Details & Export before making throttle or mapping changes. See Subscription Management – View Details & Export.

  • Disable unused mapping collections. If a mapping collection isn't needed for the sync, add a false filter (1==2) or delete it to prevent unnecessary processing overhead.

Summary

Large syncs in iPaaS.com are slower than real-time syncs by design, and that's the right trade-off. The hub-and-spoke architecture ensures data is normalized once, errors are visible and actionable in one place, and the queueing system handles volume without overwhelming connected systems.

The biggest controllable factor in sync speed is your throttle configuration in Subscription Settings, specifically Concurrent Connections. Set it too high and you'll hit external API limits, waste transactions, and restart from scratch. Set it right and the sync runs steadily to completion.

Plan your data flow order, configure collision handling before you start, use the right sync method for your volume, and monitor throughout. A large sync that takes a few hours and completes cleanly is far better than one that finishes in minutes but leaves thousands of failed records behind.
