How to Use Fictitious Data in Software Testing
Using real CPF, CNPJ, or credit card data in development and staging environments is a legal and ethical risk. Understand how mathematically valid fictitious data makes testing safer, more reproducible, and LGPD-compliant.
Why fictitious data is essential in testing
Every system that collects user data, whether an e-commerce platform, a banking system, an HR app, or a health platform, needs to be thoroughly tested before going to production. The temptation to use real data is strong: it is convenient and reflects real situations. But this practice creates serious risks. Brazil's LGPD (Law 13.709/2018) prohibits using personal data for purposes incompatible with those for which it was collected. A development environment holding real customer data can expose the company to fines of up to 2% of its revenue in Brazil for the previous fiscal year, capped at R$50 million per infraction.
Beyond legal risk, there is operational risk: real data in test environments is often reachable by developers, service providers, and CI/CD tools that should never see sensitive information. If that data leaks, the damage to the company's reputation can be difficult or impossible to repair. Fictitious data removes this exposure: you test with the same structure and complexity as real data, without exposing any real individuals or companies.
What does 'mathematically valid' mean?
A random sequence of digits is not enough: the value has to pass the same validations the system applies. A CPF, for example, has nine base digits followed by two check digits calculated by a modulus 11 algorithm. If you submit a CPF made of random digits, the validation layer will almost always reject it before you reach the flow you want to test. The same applies to CNPJ, CNH, RENAVAM, Título de Eleitor (voter registration), and other Brazilian documents.
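As an illustration of the modulus 11 check described above, here is a minimal Python sketch. The function names cpf_check_digits and is_valid_cpf are hypothetical and not part of any Help4Dev API. The rejection of repeated-digit sequences is a common validator convention, included here as an assumption about how your own validation layer behaves.

```python
import re


def cpf_check_digits(base: str) -> str:
    """Return the two check digits for the first nine digits of a CPF."""
    if not re.fullmatch(r"[0-9]{9}", base):
        raise ValueError("base must be exactly nine digits")
    digits = [int(ch) for ch in base]
    for _ in range(2):
        # Weights count down to 2: 10..2 for the first check digit,
        # 11..2 for the second (which also covers the first check digit).
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return f"{digits[9]}{digits[10]}"


def is_valid_cpf(value: str) -> bool:
    """Check a CPF given with or without dots and dashes."""
    cpf = re.sub(r"[^0-9]", "", value)
    if len(cpf) != 11:
        return False
    # Sequences such as 111.111.111-11 satisfy the arithmetic but are
    # conventionally rejected by validators.
    if cpf == cpf[0] * 11:
        return False
    return cpf[9:] == cpf_check_digits(cpf[:9])


print(cpf_check_digits("529982247"))   # → 25
print(is_valid_cpf("529.982.247-25"))  # → True
print(is_valid_cpf("529.982.247-26"))  # → False
```

Changing a single digit breaks the checksum, which is why hand-typed or random numbers rarely get past a validation layer.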
Help4Dev generates data that passes all these algorithmic checks. A CPF generated here has the correct structure and valid check digits — it will pass your application's validation layer, allowing you to test the actual system behavior: database persistence, external API integration, report generation, and all business flows.
Strategies for organizing fictitious data in projects
A good practice is to create a fixtures or seeds file with fixed fictitious data for the most common scenarios in your system. For example: a standard individual CPF, an active company CNPJ, a credit card for each major brand. Keep these values under version control in the repository so that the entire team and the CI/CD pipelines use the same data.
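A fixtures module along these lines could look like the sketch below. The Fixture class, the scenario names, and the fixture() helper are all hypothetical. The values are widely used test numbers: the CPF and CNPJ pass their check-digit algorithms, and the card number passes the Luhn check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixture:
    name: str   # scenario name used by the tests
    kind: str   # "cpf", "cnpj" or "card"
    value: str  # digits only, stored without a mask


# Fixed fictitious values, committed to the repository alongside the tests.
FIXTURES = (
    Fixture("standard_individual", "cpf", "52998224725"),
    Fixture("active_company", "cnpj", "11222333000181"),
    Fixture("visa_card", "card", "4111111111111111"),
)

_BY_NAME = {item.name: item for item in FIXTURES}


def fixture(name: str) -> str:
    """Return the value for a named scenario, failing loudly on typos."""
    try:
        return _BY_NAME[name].value
    except KeyError:
        raise KeyError(f"unknown fixture: {name!r}") from None


print(fixture("standard_individual"))  # → 52998224725
```

Referring to values by scenario name rather than pasting digits into each test keeps failures readable and lets the team change a value in one place.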
For tests that require variation, such as checking whether the system correctly handles multiple registrations, use programmatic generators that create valid fictitious data on demand. Faker-style libraries exist for Python, JavaScript, Java, and Ruby, and some of them include CPF and CNPJ generators. Validating the generated values with the Help4Dev tools is a useful additional check.
Most common use cases
User registration: forms requiring CPF, RG, or CNH need to be tested with valid documents from different states.
E-commerce: checkout flows that validate the buyer's CPF and the company's CNPJ on invoices.
HR systems: employee registration with PIS/PASEP, CPF, and work record data.
Healthcare: platforms issuing AIH (Hospital Admission Authorization) need valid numbers to test DATASUS integrations.
In all these cases, Help4Dev offers specific generators for each document type, with mask options (with or without dots and dashes) and automatic rotation to quickly generate multiple values during manual testing sessions.
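When the same fictitious value has to be fed to forms that expect a mask and to APIs that expect bare digits, a pair of small helpers is usually enough. The sketch below uses the hypothetical names strip_mask, mask_cpf, and mask_cnpj, and assumes the usual punctuation for each document.

```python
import re


def strip_mask(value: str) -> str:
    """Remove dots, dashes, slashes and any other non-digit characters."""
    return re.sub(r"[^0-9]", "", value)


def mask_cpf(value: str) -> str:
    """Format 11 digits as 000.000.000-00."""
    digits = strip_mask(value)
    if len(digits) != 11:
        raise ValueError("a CPF has 11 digits")
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"


def mask_cnpj(value: str) -> str:
    """Format 14 digits as 00.000.000/0000-00."""
    digits = strip_mask(value)
    if len(digits) != 14:
        raise ValueError("a CNPJ has 14 digits")
    return f"{digits[:2]}.{digits[2:5]}.{digits[5:8]}/{digits[8:12]}-{digits[12:]}"


print(mask_cpf("52998224725"))            # → 529.982.247-25
print(mask_cnpj("11222333000181"))        # → 11.222.333/0001-81
print(strip_mask("11.222.333/0001-81"))   # → 11222333000181
```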
Integrating fictitious data into automation pipelines
Test frameworks like Cypress, Playwright, Selenium, and JUnit allow parameterizing test cases with multiple data sets. An efficient approach is to maintain a JSON or CSV file with 10 to 20 valid fictitious CPFs and CNPJs and iterate over them in registration and validation tests. This increases coverage without increasing script complexity.
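The data-driven approach can be sketched in Python with the standard unittest module; the same shape carries over to the frameworks named above. The inline JSON stands in for a versioned data file, and is_valid_cpf is a hypothetical placeholder for whatever your application actually exposes.

```python
import json
import unittest

# Stand-in for a versioned JSON file; in a project this would be read
# with json.load() from a path of your choosing.
CPF_DATA = json.loads("""
[
  {"cpf": "52998224725", "valid": true},
  {"cpf": "11144477735", "valid": true},
  {"cpf": "52998224726", "valid": false}
]
""")


def is_valid_cpf(cpf: str) -> bool:
    """Hypothetical function under test; swap in your application's validator."""
    if len(cpf) != 11 or not cpf.isdecimal() or cpf == cpf[0] * 11:
        return False
    digits = [int(ch) for ch in cpf[:9]]
    for _ in range(2):
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return cpf[9:] == f"{digits[9]}{digits[10]}"


class CpfValidationTest(unittest.TestCase):
    def test_every_row(self):
        # subTest reports each failing row separately instead of
        # stopping at the first one.
        for row in CPF_DATA:
            with self.subTest(cpf=row["cpf"]):
                self.assertEqual(is_valid_cpf(row["cpf"]), row["valid"])
```

Running the module with python -m unittest executes the test. Adding a scenario then means adding a row to the data, not another test function.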
For more sophisticated automations, consider creating an internal microservice or utility function that implements CPF and CNPJ generation algorithms in your project's language. The algorithms are public and easy to implement — our article on the CPF and CNPJ algorithm explains the full step-by-step process.
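As a sketch of such a utility function for CNPJ, the Python code below uses the weights commonly documented for the 14-digit numeric format. The function names are hypothetical, and fixing the branch number at 0001 (head office) is an assumption made for simplicity.

```python
import random

# Weights for the first and second check digits of a numeric CNPJ.
FIRST_WEIGHTS = (5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2)
SECOND_WEIGHTS = (6,) + FIRST_WEIGHTS


def _check_digit(digits, weights) -> int:
    """Modulus 11 check digit for the given digits and weights."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def cnpj_check_digits(base: str) -> str:
    """Return the two check digits for the first twelve digits of a CNPJ."""
    if len(base) != 12 or not base.isdecimal():
        raise ValueError("base must be exactly twelve digits")
    digits = [int(ch) for ch in base]
    first = _check_digit(digits, FIRST_WEIGHTS)
    second = _check_digit(digits + [first], SECOND_WEIGHTS)
    return f"{first}{second}"


def generate_cnpj(rng: random.Random) -> str:
    """Eight random root digits, branch 0001, then the computed check digits."""
    base = "".join(str(rng.randint(0, 9)) for _ in range(8)) + "0001"
    return base + cnpj_check_digits(base)


print(cnpj_check_digits("112223330001"))  # → 81
print(len(generate_cnpj(random.Random(7))))  # → 14
```

Unlike the CPF, whose weights simply count down to 2, the CNPJ weights wrap around from 2 back to 9, so the two documents need separate weight tables.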