Data Type Handling
FFIEC Data Connect provides comprehensive data type handling across multiple protocols (SOAP/REST) and output formats, ensuring data integrity and precision from the original XBRL source through to your final data structure.
Overview
The library manages data types across three key dimensions:
Protocol Layer (SOAP vs REST)
| Aspect | SOAP API | REST API |
|---|---|---|
| Credentials | WebserviceCredentials | OAuth2Credentials |
| Null Values | np.nan | pd.NA |
| Compatibility | 100% backward compatible | Enhanced integer handling |
| Use Case | Existing integrations | New implementations |
Type Detection in XBRL Processing
The library automatically detects data types from XBRL unit references:
| XBRL Unit | Value Example | Python Type | Description |
|---|---|---|---|
| USD | | int | Monetary values (÷1000) |
| PURE | | float | Ratios and percentages |
| | | | Non-monetary numerics |
| Boolean | | bool | Boolean indicators |
| Other | | str | Text values |
Special Processing Rules
USD Values: Automatically divided by 1000 using integer division (//)
Type Preservation: Original types stored in the data_type field
Null Handling: Protocol-specific null values applied
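The rules above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the library's code: convert_value is a hypothetical helper, the "BOOLEAN" unit label is assumed, and None stands in for whichever protocol-specific null the library applies.

```python
from typing import Optional, Tuple, Union

Value = Union[int, float, bool, str, None]


def convert_value(unit: str, raw: Optional[str]) -> Tuple[str, Value]:
    """Return (data_type, value) for one raw value (hypothetical helper)."""
    unit = unit.upper()
    if unit == "USD":
        # Monetary values are scaled to thousands with integer division.
        return "int", None if raw is None else int(raw) // 1000
    if unit == "PURE":
        # Ratios and percentages keep their fractional part.
        return "float", None if raw is None else float(raw)
    if unit == "BOOLEAN":  # assumed label for boolean indicators
        return "bool", None if raw is None else raw.strip().lower() == "true"
    # Anything else is kept as text.
    return "str", raw


print(convert_value("USD", "1500000"))  # ('int', 1500)
print(convert_value("USD", "-1500"))    # ('int', -2)
print(convert_value("PURE", "0.0825"))  # ('float', 0.0825)
```

Note that Python's // operator floors toward negative infinity, so a negative amount such as -1500 becomes -2 rather than -1. Whether that rounding direction matters depends on how the reported figures are used downstream.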
SOAP vs REST Null Handling
Why Different Null Handling Strategies?
The library uses different null strategies for SOAP and REST to solve a critical problem while maintaining backward compatibility:
The Problem:
When pandas DataFrames contained np.nan values in integer columns, pandas would automatically convert those columns to float64 to accommodate the NaN values. This resulted in integer values displaying with decimal points (e.g., 1000.0 instead of 1000), which was both aesthetically problematic and semantically incorrect for financial data that should be represented as whole numbers.
The Solution:
SOAP Path (Original): Continues using np.nan to ensure 100% backward compatibility for existing integrations. Existing code that expects np.nan behavior continues to work unchanged.
REST Path (Enhanced): Uses pd.NA, pandas' newer null value, which works with nullable integer types (Int64). This allows integer columns to remain integers even when they contain null values.
Design Philosophy:
This dual approach follows the principle of “never break existing code.” Users who have built systems around the SOAP API can upgrade the library without any changes to their code, while new REST API users automatically benefit from improved type handling.
# Example of the problem this solves:
# Before (with np.nan):
df['int_data'] # Shows: [1000.0, 2000.0, NaN, 3000.0]
# After (with pd.NA for REST):
df['int_data'] # Shows: [1000, 2000, <NA>, 3000]
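The difference can be reproduced with pandas alone, without calling the FFIEC API. The sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

values = [1000, 2000, None, 3000]

# np.nan forces the whole column to float64.
as_float = pd.Series([np.nan if v is None else v for v in values])

# pd.NA with the nullable Int64 dtype keeps the integers intact.
as_int = pd.Series([pd.NA if v is None else v for v in values], dtype="Int64")

print(as_float.dtype, as_float.iloc[0])  # float64 1000.0
print(as_int.dtype, as_int.iloc[0])      # Int64 1000
```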
Technical Implementation
The differentiation between SOAP and REST null handling is implemented at the XBRL processor level through the use_rest_nulls parameter:
SOAP calls: Automatically use use_rest_nulls=False (default), applying np.nan
REST calls: Explicitly set use_rest_nulls=True, applying pd.NA
User transparency: This is handled automatically based on credential type; users never need to specify this parameter
This approach ensures that:
Zero configuration needed: The library automatically selects the appropriate null handling based on your credentials
No breaking changes: Existing SOAP users see no changes in behavior
Optimal for each protocol: Each API path gets the most appropriate null handling for its use case
Future-proof: As pandas evolves, REST users automatically benefit from improvements to nullable types
SOAP API (Original Behavior)
| Data Type | Null Value | Pandas Conversion | Final dtype |
|---|---|---|---|
| Integer | np.nan | Direct to Int64 | Nullable integer |
| Float | np.nan | Direct to float64 | Standard float |
| Boolean | np.nan | Direct to boolean | Nullable boolean |
| String | None | Direct to string | Pandas string |
# SOAP path - original behavior preserved
processed_ret = xbrl_processor._process_xml(
data,
date_format,
use_rest_nulls=False # Default
)
REST API (Enhanced Behavior)
| Data Type | Null Value | Intermediate | Pandas Conversion | Final dtype |
|---|---|---|---|---|
| Integer | pd.NA | → None | → Int64 | Nullable integer |
| Float | pd.NA | → np.nan | → float64 | Standard float |
| Boolean | pd.NA | → None | → boolean | Nullable boolean |
| String | None | (unchanged) | → string | Pandas string |
# REST path - enhanced handling
processed_ret = xbrl_processor._process_xml(
data,
date_format,
use_rest_nulls=True # Explicit for REST
)
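The pd.NA → None → Int64 conversion on the REST path might look roughly like the following. This is a sketch, not the library's internal code: the records and the identifiers ITEM_A and ITEM_B are placeholders.

```python
import pandas as pd

# Hypothetical records with pd.NA marking missing values.
records = [
    {"mdrm": "ITEM_A", "int_data": 1500},
    {"mdrm": "ITEM_B", "int_data": pd.NA},
]

# Step 1: replace pd.NA with None so every missing value is a plain Python null.
cleaned = [
    {key: (None if value is pd.NA else value) for key, value in row.items()}
    for row in records
]

# Step 2: build the column directly as a nullable Int64 array.
# Going through pd.array avoids a detour via float64.
df = pd.DataFrame({"mdrm": [row["mdrm"] for row in cleaned]})
df["int_data"] = pd.array([row["int_data"] for row in cleaned], dtype="Int64")

print(df.dtypes["int_data"])  # Int64
```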
Output Format Type Mapping
List Output (output_type="list")
Returns raw dictionaries with the following structure:
{
'mdrm': str, # MDRM identifier
'rssd': str, # RSSD ID
'quarter': str/date, # Based on date_output_format
'data_type': str, # 'int', 'float', 'bool', or 'str'
'int_data': int/null, # Value if data_type='int'
'float_data': float/null, # Value if data_type='float'
'bool_data': bool/null, # Value if data_type='bool'
'str_data': str/None # Value if data_type='str'
}
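Because data_type names the populated column, one way to read the value from a record is to build the key from it. typed_value is a hypothetical helper, and the record below uses made-up values with None as the null placeholder.

```python
from typing import Any, Dict


def typed_value(record: Dict[str, Any]) -> Any:
    """Return the value stored in the column named by the record's data_type."""
    return record[f"{record['data_type']}_data"]


record = {
    "mdrm": "ITEM_A",          # placeholder identifier
    "rssd": "480228",
    "quarter": "2023-12-31",   # actual form depends on date_output_format
    "data_type": "int",
    "int_data": 1500,
    "float_data": None,
    "bool_data": None,
    "str_data": None,
}

print(typed_value(record))  # 1500
```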
Pandas Output (output_type="pandas")
Creates DataFrames with nullable types for proper null handling:
| Column | dtype | Description | Null Support |
|---|---|---|---|
| mdrm | | MDRM identifier | No |
| rssd | | RSSD ID as string | No |
| quarter | | Based on date_output_format | No |
| data_type | | Type indicator | No |
| int_data | Int64 | Nullable integer | Yes |
| float_data | float64 | Standard float | Yes |
| bool_data | boolean | Nullable boolean | Yes |
| str_data | string | Pandas string | Yes |
Key Benefits:
No .0 suffix on integer values
Proper null handling with pandas nullable types
Type-safe operations
Polars Output (output_type="polars")
Direct conversion with native nullable types:
| Column | Polars Type | Null Support | Notes |
|---|---|---|---|
| mdrm | | No | String type |
| rssd | | No | String type |
| quarter | | No | Based on date_output_format |
| data_type | | No | String type |
| int_data | | Yes | Native nullable |
| float_data | | Yes | Native nullable |
| bool_data | | Yes | Native nullable |
| str_data | | Yes | Native nullable |
Integer Display Examples
Problem Scenario (Before Fix)
# Original issue: integers showing as floats
df['int_data'] # Shows: 1000.0, 2000.0, 3000.0
Current Behavior - SOAP Path
# Input: XBRL with USD value "1500000"
# Processing: 1500000 // 1000 = 1500 (integer division)
# Storage: np.int64(1500) with np.nan for nulls
# DataFrame: Int64 dtype
# Display: 1500 (no .0 suffix)
Current Behavior - REST Path
# Input: JSON with integer 1500000
# Processing: 1500000 // 1000 = 1500
# Storage: np.int64(1500) with pd.NA for nulls
# Conversion: pd.NA → None → Int64
# Display: 1500 (no .0 suffix)
Type Conversion Decision Tree
Input Data
├── SOAP API (WebserviceCredentials)
│ ├── XBRL Processing
│ │ ├── Detect Type (USD/PURE/etc.)
│ │ └── Apply np.nan for nulls
│ └── Output Format
│ ├── list → Raw dicts with np.nan
│ ├── pandas → DataFrame with Int64/float64/boolean
│ └── polars → DataFrame with native nullable types
│
└── REST API (OAuth2Credentials)
├── XBRL Processing
│ ├── Detect Type
│ └── Apply pd.NA for nulls
└── Output Format
├── list → Raw dicts with pd.NA
├── pandas → Convert pd.NA → None/np.nan → nullable dtypes
└── polars → Convert pd.NA → None → native nulls
Date Format Handling
The date_output_format parameter affects the quarter column:
| Format | Example Output | Python Type | Pandas dtype | Polars Type |
|---|---|---|---|---|
| | | | | |
| | | | | |
| | | | | |
Common Patterns and Best Practices
Working with Integer Data
# Recommended: Use nullable integer operations
df['int_data'].sum() # Handles NA/null correctly
df['int_data'].fillna(0) # Replace nulls with 0
# Avoid: calling float() on individual values without a null check
float(df['int_data'].iloc[0]) # Raises TypeError if the value is pd.NA
Type Checking
# Check for integer rows
int_rows = df[df['data_type'] == 'int']
# Access specific typed columns
integers = df['int_data'].dropna()
floats = df['float_data'].dropna()
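A slightly fuller sketch of the same pattern, run against a hand-built frame rather than real API output. The identifiers and values are placeholders.

```python
import pandas as pd

# Hand-built frame that mimics the typed columns described in this section.
df = pd.DataFrame({
    "mdrm": ["ITEM_A", "ITEM_B", "ITEM_C"],
    "data_type": ["int", "float", "int"],
    "int_data": pd.array([1500, None, None], dtype="Int64"),
    "float_data": [None, 0.0825, None],
})

# Rows whose value lives in the integer column.
int_rows = df[df["data_type"] == "int"]

# Integer rows where the value itself is missing.
missing = int_rows[int_rows["int_data"].isna()]

print(len(int_rows))             # 2
print(missing["mdrm"].tolist())  # ['ITEM_C']
```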
Null Handling
# Pandas: Use pd.isna() for universal null checking
pd.isna(df['int_data']) # Works with both NaN and NA
# Polars: Use native is_null()
df_polars['int_data'].is_null()
Type Preservation During Operations
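# Maintains Int64 type
df['int_data'] * 1000 # Result is still Int64
# True division converts to a float dtype
df['int_data'] / 1000 # Result becomes Float64 (nullable float) in recent pandas versions
# Floor division maintains the integer type
df['int_data'] // 1000 # Result is still Int64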
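To check how a given pandas installation treats these operations, the resulting dtypes can be printed directly. The series below is sample data, not API output.

```python
import pandas as pd

s = pd.Series([1500, pd.NA, 3000], dtype="Int64")

# Print the dtype each operator produces on a nullable integer column.
for label, result in [("*", s * 1000), ("/", s / 1000), ("//", s // 1000)]:
    print(f"{label:>2}: {result.dtype}")
```

Multiplication and floor division keep the integer dtype, while true division moves the column to a float dtype. The missing entry stays missing in all three results.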
Migration Guide
From Existing SOAP Integration
No changes required. The library maintains 100% backward compatibility:
# Existing code continues to work unchanged
df = collect_data(
session=connection.session,
creds=soap_credentials,
reporting_period="2023-12-31",
rssd_id="480228",
output_type="pandas"
)
# Returns DataFrame with same types as before
Adopting REST API
To leverage enhanced REST features:
# Use OAuth2Credentials for REST
creds = OAuth2Credentials(username="user", token="token")
df = collect_data(
session=None, # Not needed for REST
creds=creds,
reporting_period="2023-12-31",
rssd_id="480228",
output_type="pandas"
)
# Returns DataFrame with enhanced null handling
Overriding Null Value Handling
The force_null_types Parameter
While the library automatically selects the appropriate null handling based on your API choice (SOAP vs REST), you can override this behavior using the force_null_types parameter available in collect_data() and collect_ubpr_facsimile_data() functions.
When to Use This Parameter:
Testing and Migration: Test how your code would behave with different null handling before switching APIs
Compatibility Issues: Work around specific compatibility requirements in your data pipeline
Performance Comparison: Compare the behavior of both null handling approaches
Gradual Migration: SOAP users can preview REST-style null handling without changing credentials
Parameter Options
| Value | Behavior | Use Case |
|---|---|---|
| None | Automatic based on API type | Normal operation - recommended |
| "numpy" | Force np.nan | Legacy compatibility, SOAP-like behavior |
| "pandas" | Force pd.NA | Better integer display, REST-like behavior |
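The selection logic described by these options can be sketched as follows. resolve_null_mode and its is_rest argument are hypothetical names used for illustration; the library's internal implementation may differ.

```python
from typing import Optional


def resolve_null_mode(is_rest: bool, force_null_types: Optional[str] = None) -> str:
    """Return "numpy" or "pandas" for a call (hypothetical helper)."""
    if force_null_types is None:
        # Automatic: REST calls get pd.NA, SOAP calls get np.nan.
        return "pandas" if is_rest else "numpy"
    if force_null_types not in ("numpy", "pandas"):
        raise ValueError(f"unsupported force_null_types: {force_null_types!r}")
    # An explicit override wins regardless of the API in use.
    return force_null_types


print(resolve_null_mode(is_rest=False))                             # numpy
print(resolve_null_mode(is_rest=True))                              # pandas
print(resolve_null_mode(is_rest=False, force_null_types="pandas"))  # pandas
```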
Usage Examples
Example 1: SOAP User Testing Pandas Null Handling
# SOAP credentials normally use np.nan
soap_creds = WebserviceCredentials(username="user", password="pass")
# Test with pandas null handling without switching to REST
df = collect_data(
session=connection.session,
creds=soap_creds,
reporting_period="2023-12-31",
rssd_id="480228",
output_type="pandas",
force_null_types="pandas" # Override to use pd.NA
)
# Now integers display without .0 suffix
Example 2: REST User Requiring NumPy Compatibility
# REST credentials normally use pd.NA
rest_creds = OAuth2Credentials(username="user", token="token")
# Force numpy nulls for compatibility with legacy analysis code
df = collect_data(
session=None,
creds=rest_creds,
reporting_period="2023-12-31",
rssd_id="480228",
output_type="pandas",
force_null_types="numpy" # Override to use np.nan
)
# Now compatible with code expecting np.nan
Example 3: Comparing Both Approaches
# Compare integer handling with different null types
for null_type in [None, "numpy", "pandas"]:
    df = collect_data(
        session=connection.session,
        creds=credentials,
        reporting_period="2023-12-31",
        rssd_id="480228",
        output_type="pandas",
        force_null_types=null_type
    )
    print(f"Null type: {null_type or 'automatic'}")
    print(f"Integer sample: {df['int_data'].iloc[0]}")
    print(f"Has .0 suffix: {'.0' in str(df['int_data'].iloc[0])}\n")
Implementation Notes
Performance: No significant performance difference between null types
Memory Usage: pd.NA uses slightly more memory but provides better type safety
Compatibility: np.nan is more widely compatible with older pandas versions
Future-Proof: pd.NA is the recommended approach for new pandas code
Best Practices
Use defaults when possible: Let the library choose based on your API
Document overrides: If you override, comment why in your code
Test thoroughly: When overriding, test all data operations
Consider migration: If consistently overriding, consider switching APIs
Warning
Overriding null types may cause unexpected behavior if your code assumes specific null handling. Test thoroughly when using force_null_types.
Troubleshooting
NAType Error
Symptom: float() argument must be a string or a real number, not 'NAType'
Cause: Attempting float conversion on pandas NA value
Solution: Use pd.isna() for null checking or .fillna() before conversion
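Both remedies can be sketched with pandas alone. to_float is a hypothetical helper, and the series is sample data.

```python
import pandas as pd


def to_float(value, default=None):
    """Convert a scalar to float, returning default for NaN, None, or pd.NA."""
    return default if pd.isna(value) else float(value)


s = pd.Series([1500, pd.NA, 3000], dtype="Int64")

# Scalar path: check for nulls before converting each value.
converted = [to_float(v) for v in s]
print(converted)  # [1500.0, None, 3000.0]

# Column path: fill nulls first, then convert the whole column.
filled = s.fillna(0).astype("float64")
print(filled.tolist())  # [1500.0, 0.0, 3000.0]
```

Filling with 0 changes sums and averages, so the scalar path with an explicit default is the safer choice when a missing value must stay distinguishable from a reported zero.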
Integers Display with .0
Symptom: Integer values show as 1000.0 instead of 1000
Cause: The column was mixed with float values, or regular division (/) was applied to it
Solution: Ensure the column uses the Int64 dtype, and use integer division (//) instead of /
Type Loss in Operations
Symptom: Int64 column becomes float64 after operation
Cause: Operation that produces non-integer results
Solution: Use integer-preserving operations or explicitly cast back
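Casting back explicitly might look like this, assuming a pandas version in which nullable float columns support round(). Rounding first matters because the cast to Int64 is only safe for whole-number values.

```python
import pandas as pd

s = pd.Series([1500, pd.NA, 3000], dtype="Int64")

# True division leaves the integer dtype.
scaled = s / 3

# Round to whole numbers, then cast back to the nullable integer dtype.
restored = scaled.round().astype("Int64")

print(restored.dtype)  # Int64
```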
Performance Considerations
| Operation | SOAP | REST | Notes |
|---|---|---|---|
| Null checking | Fast | Fast | Both optimized |
| DataFrame creation | Standard | Slightly slower | REST has extra conversion |
| Memory usage | Standard | ~Same | Nullable types similar |
| Integer operations | Fast | Fast | Int64 optimized |
API Reference
Internal Type Handling
| Function | Purpose | SOAP Behavior | REST Behavior |
|---|---|---|---|
| _process_xml | Parse XBRL | Uses np.nan | Uses pd.NA |
| | Process single item | Returns typed value | Returns typed value |
| DataFrame conversion | Create pandas DF | Direct with np.nan | Converts pd.NA |
Version History
v2.0.0: Added REST API support with enhanced null handling
v1.x.x: Original SOAP-only implementation with np.nan