A brief example of using Afnic Open Data
09 July 2019 - By Stéphane Bortzmeyer
Which domain names are derived from a first name?
Afnic publishes open data about .fr domain names at https://opendata.afnic.fr/en/. Here is a brief example of how these data can be used, cross-referenced with another open data set on first names in France.
Domain name registrants have a vast choice of names. They can derive the domain name from their family name, or choose a descriptive name. If Jean Dupont wants to create a website on gardening, he can choose jean-dupont.fr, or dupont-jardinage.fr or jean-jardinage.fr or a wide range of other names. We shall focus here on domain names based on a first name.
The first question is how to find them. The list of .fr domain names is available at https://opendata.afnic.fr/en. Download "A- domain names fr.zip" (I won't give you the link, it changes every time) and unzip it: you end up with a file in CSV format (in fact, the fields are separated by semicolons, not commas). The two fields that matter to us are the first (the domain name) and the 11th (the date of deletion: if this field is filled in, the domain no longer exists). So we now have the list of domain names under .fr. We still have to find those that derive from a first name. (We also need to re-encode the file in UTF-8 because it uses an older character encoding.)
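As a minimal sketch of this first step, the code below reads semicolon-separated rows and keeps the domain names whose 11th field is empty. The function name `active_domains`, the sample rows, and the ISO-8859-1 encoding mentioned in the comment are assumptions made for illustration; the article does not name the file's original encoding or say whether it has a header row.

```python
import csv

def active_domains(lines):
    """Yield domain names whose 11th field (date of deletion) is empty."""
    for row in csv.reader(lines, delimiter=";"):
        if len(row) < 11:
            continue  # skip short or malformed rows
        name, deleted = row[0], row[10]
        if not deleted.strip():
            yield name.lower()

# Made-up rows with 11 fields each; only the first and last fields matter here.
sample = [
    ";".join(["example-one.fr"] + [""] * 10),
    ";".join(["example-two.fr"] + [""] * 9 + ["01-02-2019"]),
]
print(list(active_domains(sample)))  # ['example-one.fr']

# With the real file, one would pass an open file object instead, e.g.
# open(path, encoding="iso-8859-1", newline="") -- the encoding is an assumption.
```

Passing any iterable of text lines keeps the function usable both on an open file and on small test samples like the one above.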
Is there a list of first names in France, like the list of domain names? Yes, the French National Institute for Statistics and Economic Studies (INSEE) distributes such a list. Here too we get a zipped file that, once unzipped, gives us a list of first names. Reading the documentation is recommended, because using this file is a bit complicated. A first analysis shows that the file contains 32,704 first names. Now let's look for the domain names that are first names.
A first, trivial program tells us that 13,518 domain names are formed in this way, including the classics marie.fr and jean.fr but also my own first name (stéphane.fr exists), as well as brunehilde.fr and lucrezia.fr. But this is insufficient, because the program only detects domain names that are exactly a first name. We would like to widen the search to domain names that contain a first name.
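The exact-match test can be sketched as a set lookup. This is an illustration, not the author's program: the function name `exact_first_name_domains` and the tiny sample lists are hypothetical, and the domain names are assumed to be in Unicode form already.

```python
def exact_first_name_domains(domains, first_names):
    """Return the .fr domain names whose label is exactly a first name."""
    names = {n.lower() for n in first_names}
    suffix = ".fr"
    return [
        d for d in domains
        if d.endswith(suffix) and d[:-len(suffix)].lower() in names
    ]

domains = ["marie.fr", "jean.fr", "lejardindelola.fr"]
first_names = ["Marie", "Jean", "Lola"]
print(exact_first_name_domains(domains, first_names))  # ['marie.fr', 'jean.fr']
```

Note that lejardindelola.fr is not reported, which is exactly the limitation described above.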
I'll spoil the surprise immediately: it won't work well, because many of the first names are so short that they turn up everywhere. The INSEE file includes names such as Al or Bo, but also single letters (negligence of the town clerk?). So we have to reduce the list of first names. Let's start by keeping only the most common ones; some first names are very rare. (The popularity of first names follows a decreasing exponential.) By accepting only the first names given to more than 1,000 people during the period in question, we reduce the number of names to 3,042 but, and this is what matters, these names still cover 93.8% of the population.
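The filtering step and the coverage figure can be sketched as follows, assuming the INSEE counts have already been aggregated into a dictionary mapping each first name to the number of people who received it. The function name `common_names` and the counts below are invented for illustration and are not INSEE data.

```python
def common_names(counts, threshold=1000):
    """Keep names given to more than `threshold` people; return them with their coverage."""
    kept = {name: n for name, n in counts.items() if n > threshold}
    coverage = sum(kept.values()) / sum(counts.values())
    return kept, coverage

# Invented counts, chosen so that the arithmetic is easy to check.
counts = {"marie": 50_000, "jean": 40_000, "lola": 9_000, "bo": 600, "al": 400}
kept, coverage = common_names(counts)
print(sorted(kept), f"{coverage:.1%}")  # ['jean', 'lola', 'marie'] 99.0%
```

Because popularity drops off so quickly, removing the long tail of rare names shrinks the list a great deal while barely reducing the share of the population covered.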
This time, we find too many domain names: 34.42%. This is because some of the remaining names are still quite short, which creates many false positives. While lejardindelola.fr does contain the first name Lola, service-catholique-funerailles-boulogne-billancourt.fr is a false positive (it contains the first name Illan). In short, we shall have to move to a more subtle algorithm.
Next step: not only do we keep only the 3,042 most frequent first names used in the previous test, but we also consider a domain name to be derived from a first name only if one of the following conditions is met:
- the domain name is equal to a first name (michèle.fr),
- the first name is more than six letters long and is at the beginning of the domain name (charlesdegaulleroissyparkingaeroport.fr),
- the domain name begins with a first name less than six letters long that is followed by a dash (zora-creation.fr).
With these rules, we find that 147,094 domain names, or 4.31% of the total, are derived from a first name. There are still false negatives and false positives (like france-boissons.fr, where the first word probably refers to the country and not the first name), but nothing is perfect in data analysis.
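The three rules can be sketched as a single predicate. This is one possible reading, not the author's code: the names `derived_from_first_name` and `LONG_NAME_LEN` are hypothetical, the set of first names is a tiny sample, and domain names are assumed to be in Unicode form. The rules as stated do not say what happens to names of exactly six letters; since france-boissons.fr is cited as a match, the sketch assumes such names fall under the dash rule.

```python
LONG_NAME_LEN = 6  # names longer than this may match as a bare prefix

def derived_from_first_name(domain, first_names):
    """Apply the three rules to one domain name; `first_names` must be lowercase."""
    label = domain.lower().rsplit(".", 1)[0]  # drop the ".fr" suffix
    # Rule 1: the domain name is exactly a first name.
    if label in first_names:
        return True
    for name in first_names:
        if not label.startswith(name):
            continue
        # Rule 2: a long first name at the beginning of the domain name.
        if len(name) > LONG_NAME_LEN:
            return True
        # Rule 3: a short first name immediately followed by a dash.
        # Assumption: names of exactly six letters are treated as short.
        if label[len(name):len(name) + 1] == "-":
            return True
    return False

names = {"michèle", "charles", "zora", "illan", "lola", "france"}
for d in ["michèle.fr", "charlesdegaulleroissyparkingaeroport.fr",
          "zora-creation.fr", "france-boissons.fr", "lejardindelola.fr",
          "service-catholique-funerailles-boulogne-billancourt.fr"]:
    print(d, derived_from_first_name(d, names))
```

In this sketch lejardindelola.fr becomes a false negative and france-boissons.fr remains a false positive, which matches the limitations the article acknowledges.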
Note that some things could still be improved. I did not attempt any fuzzy matching, for example, so the name Théophile will not be found in theophile.fr. (The INSEE data are of variable quality in terms of spelling; for example, this particular name is sometimes written Théophile and sometimes Theophile.) Another trap: first names are strongly subject to fashion, and the INSEE database goes back to 1900. It might be worth excluding first names that were only given in the past.
So we can now start studying the history of these domain names based on a first name: do they have a better renewal rate than others, for example? But I focused here on what was available as open data.
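One simple step towards fuzzier matching, which was not done in this study, would be to fold accents on both sides before comparing. The sketch below uses Python's standard `unicodedata` module; the function name `fold_accents` is hypothetical, and this only handles diacritics, not other spelling variants.

```python
import unicodedata

def fold_accents(text):
    """Lowercase `text` and strip combining marks (é -> e, è -> e, ...)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

print(fold_accents("Théophile"))  # theophile
print(fold_accents("Théophile") == fold_accents("theophile"))  # True
```

Applied to both the first names and the decoded domain labels, this would let Théophile match theophile.fr, at the cost of a few extra false positives.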
Thanks to Alexander Mayrhofer, from the .at registry (Austria), for the idea, the explanations and the algorithm. The rest only concerns programmers:
- The programs were written in Python.
- The names in the database distributed by Afnic are encoded in Punycode (for example, stéphane.fr is written xn--stphane-cya.fr). To get the real name, you have to convert them with encodings.idna.ToUnicode(domain).
- The trivial algorithm, which tests every first name against every domain name, nests two loops. This is obviously dreadfully inefficient, so I used regular expressions with the Python re module. The program builds a single expression containing all the first names and applies it to each domain in turn.
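The last two points can be sketched together. This is an illustration under stated assumptions rather than the author's code: instead of calling `encodings.idna.ToUnicode` directly, it uses Python's built-in `idna` codec, which decodes a full domain name one label at a time; the helper names `to_unicode` and `build_matcher` and the three sample first names are hypothetical.

```python
import re

def to_unicode(domain):
    """Decode a Punycode domain name using Python's built-in idna codec."""
    return domain.encode("ascii").decode("idna")

def build_matcher(first_names):
    """Compile one regular expression that matches any of the first names."""
    # Longest names first, so the alternation prefers the longest candidate.
    ordered = sorted(first_names, key=len, reverse=True)
    return re.compile("|".join(re.escape(n) for n in ordered))

matcher = build_matcher({"lola", "stéphane", "jean"})
for ace in ["xn--stphane-cya.fr", "lejardindelola.fr", "example.fr"]:
    name = to_unicode(ace)
    hit = matcher.search(name)
    print(name, "->", hit.group(0) if hit else None)
```

Compiling the alternation once and reusing it for every domain moves the inner loop over first names into the regular expression engine, which is the point of the optimisation described above.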